deriv_mg_tune: when switching the configuration, the initial MG setup is used for the first inversion #609

Open
kostrzewa opened this issue Mar 13, 2024 · 0 comments

When the MG autotuner has found a good setup on the first gauge configuration:

tuning_iteration: 300/300
cur_tuning_lvl: 0
cur_tuning_dir: mg_smoother_tol
steps_done_in_cur_dir: 0


             mg_mu_factor: (1.000000, 2.250000, 90.000000) -> (1.000000, 2.250000, 90.000000)
 mg_coarse_solver_maxiter: (45, 50, 25) -> (45, 50, 25)
     mg_coarse_solver_tol: (0.100000, 0.400000, 0.100000) -> (0.100000, 0.400000, 0.100000)
               mg_nu_post: (3, 1, 2) -> (3, 1, 2)
                mg_nu_pre: (0, 0, 0) -> (0, 0, 0)
          mg_smoother_tol: (0.200000, 0.100000, 0.200000) -> (0.300000, 0.100000, 0.200000)
                 mg_omega: (0.900000, 0.850000, 0.850000) -> (0.900000, 0.850000, 0.850000)

# TM_QUDA: Time for updateMultigridQuda 3.959178e+00 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/updateMultigridQuda
GCR: Convergence at 80 iterations, L2 relative residual: iterated = 3.030996e-11, true = 3.030996e-11 (requested = 3.162278e-11)
# TM_QUDA: Time for invertQuda 1.192162e+01 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/invertQuda


QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 2.250000, 90.000000)
 mg_coarse_solver_maxiter: (45, 50, 25)
     mg_coarse_solver_tol: (0.100000, 0.400000, 0.100000)
               mg_nu_post: (3, 1, 2)
                mg_nu_pre: (0, 0, 0)
          mg_smoother_tol: (0.200000, 0.100000, 0.200000)
                 mg_omega: (0.900000, 0.850000, 0.850000)
Timing: 11.767856, Iters: 80
-------------------------------------------

and the configuration is switched:

# Trying to read gauge field from file conf.0240 in double precision.
# Constructing LEMON reader for file conf.0240 ...
found header xlf-info, will now read the message
found header ildg-format, will now read the message
found header ildg-binary-data, will now read the message
# Time spent reading 309 Gb was 27.9 s.
# Reading speed: 11.1 Gb/s (43.3 Mb/s per MPI process).
found header scidac-checksum, will now read the message
# Scidac checksums for gaugefield conf.0240:
#   Calculated            : A = 0x6ce26943 B = 0x57720413.
#   Read from LIME headers: A = 0x6ce26943 B = 0x57720413.
# Reading ildg-format record:
#   Precision = 64 bits (double).
#   Lattice size: LX = 128, LY = 128, LZ = 128, LT = 256.
# Input parameters:
#   Precision = 64 bits (double).
#   Lattice size: LX = 128, LY = 128, LZ = 128, LT = 256.
# Finished reading gauge field.
# Computed plaquette value: 0.583358774141.

The first inversion will again be done with the initial MG setup with which the tuner was started:

# TM_QUDA: mu = 0.000540000000, kappa = 0.137972174000, csw = 1.611200000000
# TM_QUDA: using MG solver to invert operator with 2kappamu = 0.000149009948
# TM_QUDA: MG level 0, extent of (xyzt) dim 0: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 1: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 2: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 3: 64
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG setting coarse mu scaling factor on level 0 to 1.000000
# TM_QUDA: MG level 1, extent of (xyzt) dim 0: 8
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 1, extent of (xyzt) dim 1: 8
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 1, extent of (xyzt) dim 2: 8
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG level 1, extent of (xyzt) dim 3: 16
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG setting coarse mu scaling factor on level 1 to 1.000000
# TM_QUDA: MG setting coarse mu scaling factor on level 2 to 30.000000
# TM_QUDA: Destroying MG Preconditioner Setup
# TM_QUDA: Performing MG Preconditioner Setup for gauge_id: 3.000000
# TM_QUDA: Generating MG Setup with mu = 0.000540000000 instead of 0.000540000000
# TM_QUDA: Time for MG_Preconditioner_Setup 3.509506e+02 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/MG_Preconditioner_Setup
# TM_QUDA: Time for reorder_spinor_eo_toQuda 5.094417e-02 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/reorder_spinor_eo_toQuda
GCR: Convergence at 350 iterations, L2 relative residual: iterated = 2.314415e-04, true = 2.314415e-04 (requested = 3.162278e-11)
# TM_QUDA: Time for invertQuda 1.227894e+03 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/invertQuda

and only then will the tuned setup be applied:

             mg_mu_factor: (1.000000, 2.250000, 90.000000) -> (1.000000, 2.250000, 90.000000)
 mg_coarse_solver_maxiter: (45, 50, 25) -> (45, 50, 25)
     mg_coarse_solver_tol: (0.100000, 0.400000, 0.100000) -> (0.100000, 0.400000, 0.100000)
               mg_nu_post: (3, 1, 2) -> (3, 1, 2)
                mg_nu_pre: (0, 0, 0) -> (0, 0, 0)
          mg_smoother_tol: (0.200000, 0.100000, 0.200000) -> (0.200000, 0.100000, 0.200000)
                 mg_omega: (0.900000, 0.850000, 0.850000) -> (0.900000, 0.850000, 0.850000)

# TM_QUDA: Time for updateMultigridQuda 3.961939e+00 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/updateMultigridQuda
GCR: Convergence at 81 iterations, L2 relative residual: iterated = 2.871780e-11, true = 2.871780e-11 (requested = 3.162278e-11)

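The correct behaviour would be to use the tuned setup already for the very first inversion on the new configuration. The current behaviour can be extremely wasteful if the initial setup does not converge or is very slow: in the log above, the first inversion on conf.0240 stopped after 350 iterations at a relative residual of 2.3e-04 (requested 3.2e-11) and took about 1228 s in invertQuda, whereas the tuned setup converged in 80 iterations and about 12 s on the previous configuration.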
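To make the intended logic concrete, here is a minimal sketch in Python of the selection rule. It is not tmLQCD code: `TunerState` and `params_for_setup` are hypothetical names invented for illustration, and the only values taken from the logs are the `mg_mu_factor` triples (initial `(1.0, 1.0, 30.0)`, tuned `(1.0, 2.25, 90.0)`) and the gauge_id `3.0`. The gauge_id `1.0` in the first call is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Params = Dict[str, Tuple[float, ...]]


@dataclass
class TunerState:
    """Hypothetical stand-in for the tuner's bookkeeping (not a tmLQCD struct)."""
    initial: Params                    # setup the tuner was started with
    best: Optional[Params] = None      # best setup found so far, if any
    gauge_id: Optional[float] = None   # gauge field the current MG setup belongs to


def params_for_setup(state: TunerState, new_gauge_id: float) -> Params:
    """Pick the parameters for the MG setup on the given gauge field.

    Once a tuned set exists, a configuration switch should regenerate the
    setup with that set instead of falling back to the initial one.
    """
    state.gauge_id = new_gauge_id
    return state.best if state.best is not None else state.initial


# Before any tuning result exists, the initial setup is the only choice.
state = TunerState(initial={"mg_mu_factor": (1.0, 1.0, 30.0)})
first = params_for_setup(state, new_gauge_id=1.0)

# After tuning has recorded a best set, a new gauge field picks it up at once.
state.best = {"mg_mu_factor": (1.0, 2.25, 90.0)}
second = params_for_setup(state, new_gauge_id=3.0)
```

In the actual code base the equivalent decision would presumably have to be made before the MG preconditioner setup is regenerated for the new gauge_id, so that the first inversion on the new configuration already runs with the tuned parameters.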

@kostrzewa kostrzewa self-assigned this Mar 13, 2024