When using coarsening in the X2 direction (corresponding to $\theta$ in spherical coordinates), I get an illegal memory access error when I run Idefix on GPU (seen on A100 and RTX 2080 SUPER). The same configuration works on CPU.
In my case, I use a 2D axisymmetric spherical grid, and I split my radial grid into a uniform grid in the inner domain and a log grid in the outer domain. To prevent the cells of the uniform grid in $r$ from becoming too thin in the $\theta$ direction close to the center of the domain, I need to coarsen the grid in the $\theta$ direction.
In my coarsening function, `radiusCoars` is the radius that delimits the uniform and the log grids.
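The coarsening function itself is not reproduced in this report. To illustrate why $\theta$ coarsening is needed on the inner uniform grid, here is a minimal standalone C++ sketch. It is not the setup's function and does not use the Idefix API. It assumes the convention, as I understand it from the Idefix documentation, that level 1 means no coarsening and level n merges 2^(n-1) fine cells. The helper name `CoarseningLevel` and the `maxLevel` cap are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: pick a coarsening level in theta so that the coarsened
// angular width r * dtheta * 2^(level-1) is comparable to the radial width dr.
// Assumed convention: level 1 = no coarsening, level n = 2^(n-1) cells merged.
int CoarseningLevel(double r, double dr, double dtheta, int maxLevel) {
  assert(r > 0.0 && dr > 0.0 && dtheta > 0.0 && maxLevel >= 1);
  // Aspect ratio of a fine cell: radial width over angular width.
  const double aspect = dr / (r * dtheta);
  if (aspect <= 1.0) return 1;  // cell is not thin in theta: no coarsening
  const int level = 1 + static_cast<int>(std::floor(std::log2(aspect)));
  return std::min(level, maxLevel);
}
```

With the values from the idefix.ini printed in the log (uniform radial spacing of 1e-4 between 2e-4 and 1e-2, and 1024 cells over $\pi/2$ in $\theta$), the fine-cell aspect ratio is about 326 at r = 2e-4 and about 6.5 at r = 1e-2, so this rule would give level 9 at the inner edge and level 3 at `radiusCoars`.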
The error seems to occur when the number of points becomes too large.
For example:
- with one process on one RTX 2080 SUPER, the error seems to occur when the total number of points comes close to 2^22 (it works for 2^21 points but not for 2^22 points).
- with two MPI processes (on two RTX 2080 SUPER), the limit also increases by a factor of 2.
- increasing the number of points only in the log grid (which is not coarsened) also seems to be a trigger.
- if I remove the coarsening, I don't get this error, whatever the number of points.
It also seems to be triggered more easily when I increase the maximum coarsening level (via `coarsIncrement` in the coarsening function).
Looking at the log in debug mode, the error seems to appear when Idefix tries to coarsen the magnetic field while keeping divB equal to zero (in the second FLUID_CoarsenFlow_BXsn idefix_for loop in the src/fluid/coarsenFlow.hpp file).
Thanks in advance for your help.
Error message:
----> Profiler::Init...
----> ...returned
[Idefix ASCII-art logo]
Idefix version 2.0.04-00e3db93
Built against Kokkos 30500
Main: Initialization stage.
----> Grid::Grid(Input)...
----> ...returned
----> GridHost::GridHost(Grid)...
----> ...returned
----> GridHost::MakeGrid...
----> ...returned
----> GridHost::SyncToDevice...
----> ...returned
----> DataBlock::DataBlock...
--------> GridHost::GridHost(Grid)...
--------> ...returned
--------> GridHost::SyncFromDevice...
--------> ...returned
--------> DataBlock::ExtractSubdomain...
------------> idefix_for(coordinates)...
------------> ...returned
------------> idefix_for(coordinates)...
------------> ...returned
------------> idefix_for(coordinates)...
------------> ...returned
--------> ...returned
-------->DataBlock::MakeGeometry()...
------------> idefix_for(init_coarsening)...
------------> ...returned
------------> idefix_for(Volumes)...
------------> ...returned
------------> idefix_for(GeometricalCentersX1)...
------------> ...returned
------------> idefix_for(GeometricalCentersX2)...
------------> ...returned
------------> idefix_for(GeometricalCentersX3)...
------------> ...returned
------------> idefix_for(AreaX1)...
------------> ...returned
------------> idefix_for(AreaX2)...
------------> ...returned
------------> idefix_for(AreaX3)...
------------> ...returned
--------> ...returned
--------> Dump::Init...
--------> ...returned
--------> GridHost::GridHost(Grid)...
--------> ...returned
--------> GridHost::SyncFromDevice...
--------> ...returned
--------> Fluid::Fluid...
------------> StateContainer::PushArray...
------------> ...returned
------------> StateContainer::PushArray...
------------> ...returned
------------> idefix_for(ComputePLMweights)...
------------> ...returned
------------> idefix_for(ComputePLMweights)...
------------> ...returned
------------> ConstrainedTransport::Init...
------------> ...returned
------------> Boundary::Boundary...
Phys MHD
------------> ...returned
--------> ...returned
--------> Gravity::Gravity...
------------> SelfGravity::Init...
----------------> Laplacian::Laplacian...
--------------------> Laplacian::InitInternalGrid...
------------------------> GridHost::GridHost(Grid)...
------------------------> ...returned
------------------------> GridHost::SyncFromDevice...
------------------------> ...returned
------------------------> idefix_for(InternalGridCopy)...
------------------------> ...returned
------------------------> idefix_for(InternalGridFill)...
------------------------> ...returned
------------------------> idefix_for(Volumes)...
------------------------> ...returned
------------------------> idefix_for(AreaX1)...
------------------------> ...returned
------------------------> idefix_for(AreaX2)...
------------------------> ...returned
------------------------> idefix_for(AreaX3)...
------------------------> ...returned
--------------------> ...returned
--------------------> Laplacian::InitPreconditioner...
------------------------> idefix_for(ResetPrecond)...
------------------------> ...returned
------------------------> idefix_for(InitPrecond)...
------------------------> ...returned
------------------------> idefix_for(InitPrecond)...
------------------------> ...returned
--------------------> ...returned
--------------------> Laplacian::PreComputeLaplacian...
------------------------> idefix_for(L_Factor)...
------------------------> ...returned
--------------------> ...returned
----------------> ...returned
----------------> idefix_for(InitDensity)...
----------------> ...returned
------------> ...returned
--------> ...returned
----> ...returned
----> TimeIntegrator::TimeIntegrator(Input...)...
--------> StateContainer::AllocateAs...
--------> ...returned
----> ...returned
----> Output::Output...
----> ...returned
Setup:: Coarsening is on.
----> Output::EnrollUserDefVariable...
----> ...returned
Main: initialisation finished.
-----------------------------------------------------------------------------
Input Parameters using input file idefix.ini:
-----------------------------------------------------------------------------
[Boundary]
X1-beg userdef
X1-end outflow
X2-beg axis
X2-end reflective
X3-beg periodic
X3-end periodic
[Gravity]
gravCst 39.476926408897626
potential selfgravity
skip 1
[Grid]
; X1-grid 2 2e-4 196 u 1e-2 2640 l 5.e3 X2-grid 1 0.0 512 u 1.570796326794896
X1-grid 2 2e-4 98 u 1e-2 1320 l 5.e3
X2-grid 1 0.0 1024 u 1.570796326794896
X3-grid 1 0.0 1 u 6.283185307179586
coars_increment 1
coarsening static X2
radiusCoars 1e-2
[Hydro]
csiso userdef
gamma 1.666667
solver hll
[Output]
dmp 0.001
dmp_dir ./Try5_resartTry3_newBoundary/
log 100
uservar PhiP Cs InvDt dV T AX3
vtk 0.001
vtk_dir ./Try5_resartTry3_newBoundary/
[SelfGravity]
boundary-X1-beg origin
boundary-X1-end nullpot
boundary-X2-beg axis
boundary-X2-end nullgrad
boundary-X3-beg periodic
boundary-X3-end periodic
maxIter 10000
skip 1
solver PBICGSTAB
targetError 1e-4
[Setup]
Mc 1.
T0 10.
alpha 0.25
beta 0.0
mu 4.
[TimeIntegrator]
CFL 0.8
CFL_max_var 1.1
check_nan 100
first_dt 1.e-7
max_runtime 96
maxdivB 1e-2
nstages 2
tstop 24000.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Input: Compiled with DOUBLE PRECISION arithmetic.
Input: DIMENSIONS=2.
Input: COMPONENTS=3.
Input: Kokkos CUDA target ENABLED.
Grid: full grid size is
Direction X1: userdef 0.01....4096....5000 outflow
Direction X2: axis 0....1024....1.5708 reflective
Grid: static grid coarsening enabled in direction(s) X2
Hydro: solving MHD equations.
Hydro: Using EXPERIMENTAL vector potential formulation for MHD.
Hydro: Reconstruction: 2nd order (PLM Van Leer)
EquationOfState: isothermal with user-defined cs function.
RiemannSolver: hll (MHD).
ConstrainedTransport: Using UCT_CONTACT averaging scheme.
Axis: Axis regularisation ENABLED.
Axis: Full 2pi regularisation around the axis.
Gravity: ENABLED.
Gravity: G=39.4769.
Gravity: self-gravity ENABLED.
SelfGravity: Using preconditionned BICGSTAB solver.
SelfGravity: using origin boundary with 280 additional radial points.
----> Bicgstab::ShowConfig...
Bicgstab: TargetError: 0.0001
Bicgstab: Maximum iterations: 10000
----> ...returned
TimeIntegrator: using 2nd Order (RK2) integrator.
TimeIntegrator: Using adaptive dt with CFL=0.8 .
TimeIntegrator: will stop after 96 hours.
Main: Creating initial conditions.
----> Setup::Initflow...
--------> DataBlockHost::DataBlockHost(DataBlock)...
--------> ...returned
InnitFlow:: Sarting a new collapse.
-------->DataBlockHost::SyncToDevice()...
--------> ...returned
----> ...returned
----> ConstrainedTransport::ComputeMagFieldfromA...
--------> idefix_for(ComputeMagFieldFromA)...
--------> ...returned
--------> Fluid::CoarsenMagField...
------------> idefix_for(FLUID_CoarsenFlow_BXsn)...
------------> ...returned
------------> idefix_for(FLUID_CoarsenFlow_BXsn)...
------------> ...returned
--------> ...returned
----> ...returned
----> DataBlock::ComputeGridCoarseningLevels...
--------> User-defined Coarsening function...
------------> idefix_for(set_coarsening)...
------------> ...returned
--------> ...returned
-------->DataBlock::CheckCoarseningLevels()...
------------> DataBlockHost::DataBlockHost(DataBlock)...
------------> ...returned
------------>DataBlockHost::SyncFromDevice()...
------------> ...returned
--------> ...returned
----> ...returned
----> Fluid::CoarsenFlow...
--------> idefix_for(FLUID_CoarsenFlow)...
--------> ...returned
----> ...returned
----> Fluid::CoarsenMagField...
--------> idefix_for(FLUID_CoarsenFlow_BXsn)...
--------> ...returned
--------> idefix_for(FLUID_CoarsenFlow_BXsn)...
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/aborderi/src/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:150
Traceback functionality not available
Aborted
runtime information:
Idefix version 2.0.04-00e3db93
Master branch
Kokkos 30500
Seen on GPU A100 and RTX 2080 SUPER
Hi @Anto6453, thanks for the report. To reproduce this, could you send me the full setup you are using (i.e. setup.cpp + idefix.ini)? It can be by email if you don't want your setup to appear publicly.
The problem is that gridCoarsening tries to reconstruct the normal component of B from the divergence of the two tangential components of B stored in Vs. When DIMENSIONS<3, one of these components is not defined (because it is not stored on cell faces), so the read goes out of bounds and results in a segfault (as it turns out, also on CPUs). The fix, #230, will be part of the Idefix v2.0.05 release.
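To make the explanation above concrete, here is a minimal standalone sketch of divergence-preserving coarsening, under simplifying assumptions: a uniform 2D Cartesian grid, coarsening along y only, and a hypothetical `FaceField2D` layout. It is not the code in src/fluid/coarsenFlow.hpp. The tangential component is averaged over each coarse block, then the normal component on the interior faces is rebuilt from div B = 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Face-centred field on a uniform 2D Cartesian grid (illustrative layout only).
struct FaceField2D {
  int nx, ny;
  double dx, dy;
  std::vector<double> bx;  // x-faces: ny rows of (nx + 1) values
  std::vector<double> by;  // y-faces: (ny + 1) rows of nx values
  FaceField2D(int nx_, int ny_, double dx_, double dy_)
      : nx(nx_), ny(ny_), dx(dx_), dy(dy_),
        bx((nx_ + 1) * ny_, 0.0), by(nx_ * (ny_ + 1), 0.0) {}
  double& Bx(int j, int i) { return bx[j * (nx + 1) + i]; }
  double& By(int j, int i) { return by[j * nx + i]; }
  // Discrete divergence of cell (j, i).
  double Div(int j, int i) {
    return (Bx(j, i + 1) - Bx(j, i)) / dx + (By(j + 1, i) - By(j, i)) / dy;
  }
};

// Fill the faces from a corner-centred potential Az, so div B = 0 by construction.
void InitFromPotential(FaceField2D& f) {
  auto az = [](int j, int i) { return std::sin(0.7 * i) * std::cos(0.4 * j); };
  for (int j = 0; j < f.ny; ++j)
    for (int i = 0; i <= f.nx; ++i)
      f.Bx(j, i) = (az(j + 1, i) - az(j, i)) / f.dy;
  for (int j = 0; j <= f.ny; ++j)
    for (int i = 0; i < f.nx; ++i)
      f.By(j, i) = -(az(j, i + 1) - az(j, i)) / f.dx;
}

// Coarsen along y in blocks of `factor` fine cells.
void CoarsenY(FaceField2D& f, int factor) {
  assert(factor >= 1 && f.ny % factor == 0);
  for (int j0 = 0; j0 < f.ny; j0 += factor) {
    // Step 1: replace the tangential component Bx by its block average.
    for (int i = 0; i <= f.nx; ++i) {
      double sum = 0.0;
      for (int j = j0; j < j0 + factor; ++j) sum += f.Bx(j, i);
      for (int j = j0; j < j0 + factor; ++j) f.Bx(j, i) = sum / factor;
    }
    // Step 2: rebuild the normal component By on the interior faces of the
    // block, marching up from the block's lower face with div B = 0.
    // In 3D a Bz difference would enter here as well; in 2D that face-centred
    // component does not exist and must not be read.
    for (int i = 0; i < f.nx; ++i)
      for (int j = j0; j < j0 + factor - 1; ++j)
        f.By(j + 1, i) = f.By(j, i) - f.dy / f.dx * (f.Bx(j, i + 1) - f.Bx(j, i));
  }
}

// Largest |div B| over all cells.
double MaxAbsDiv(FaceField2D& f) {
  double m = 0.0;
  for (int j = 0; j < f.ny; ++j)
    for (int i = 0; i < f.nx; ++i) m = std::max(m, std::fabs(f.Div(j, i)));
  return m;
}
```

Calling `InitFromPotential` and then `CoarsenY` on a small grid should leave `MaxAbsDiv` at round-off level, because the outer faces of each block are untouched and the block-averaged fluxes sum to the same total. The comment in step 2 marks where, in my reading of the explanation, a missing third face-centred component would be accessed when DIMENSIONS<3.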