
Add versioninfo to MSE comparsion #2350

Closed


Sbozzolo (Member) commented Nov 9, 2023

Not a solution to unstable MSE problems, but this will print something like:

Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 32 virtual cores
Environment:
  LD_LIBRARY_PATH = /central/software/hdf5/1.12.2-ompi415//lib:/central/software/OpenMPI/4.1.5_cuda-12.2//lib:/central/software/ucx/1.14.1_cuda-12.2//lib:/central/software/CUDA/12.2/lib64:/central/software/CUDA/12.2/extras/CUPTI/lib64:/central/software/CUDA/12.2/targets/x86_64-linux/lib:/central/software/julia/1.9.3/lib
  LD_RUN_PATH = /central/software/OpenMPI/4.1.5_cuda-12.2//lib:/central/software/ucx/1.14.1_cuda-12.2//lib:/central/software/CUDA/12.2/lib64:/central/software/CUDA/12.2/extras/CUPTI/lib64:/central/software/CUDA/12.2/targets/x86_64-linux/lib

This can help identify runs that are known to have MSE problems (e.g., because they did not run on broadwell nodes).
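The diff for this PR is not shown in the thread, so the following is only a minimal sketch of how a job script could produce output like the above. It assumes `julia` is on `PATH`; the function name `print_julia_versioninfo` is hypothetical. `versioninfo()` comes from Julia's `InteractiveUtils` standard library.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR): print Julia's versioninfo
# at the start of a job so the node's CPU and environment end up in the log.
print_julia_versioninfo() {
    if command -v julia >/dev/null 2>&1; then
        # InteractiveUtils is loaded explicitly because `julia -e` does not
        # load it automatically the way the interactive REPL does.
        julia -e 'using InteractiveUtils; versioninfo()'
    else
        echo "julia not found on PATH; skipping versioninfo"
    fi
}

print_julia_versioninfo
```

The `CPU:` and `LLVM:` lines are the ones that reveal the node's microarchitecture (here `skylake-avx512`).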

This helps determine whether the node is non-standard
@simonbyrne (Member)

If you just want the features, we can also get that by

scontrol show node $SLURM_NODELIST

I can add this to the default output

@simonbyrne (Member)

I've added this to the environment script, so it should now display something like:

NodeName=hpc-92-37 Arch=x86_64 CoresPerSocket=14
   CPUAlloc=22 CPUEfctv=28 CPUTot=28 CPULoad=20.66
   AvailableFeatures=broadwell
   ActiveFeatures=broadwell
   Gres=gpu:p100:4
   NodeAddr=hpc-92-37 NodeHostName=hpc-92-37 Version=22.05.6
   OS=Linux 3.10.0-1160.53.1.el7.x86_64 #1 SMP Fri Jan 14 13:59:45 UTC 2022
   RealMemory=250000 AllocMem=145408 FreeMem=187328 Sockets=2 Boards=1
   State=MIXED+RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=30 Owner=N/A MCS_label=N/A
   Partitions=any
   BootTime=2023-09-20T09:28:59 SlurmdStartTime=2023-11-02T07:05:01
   LastBusyTime=2023-11-09T09:28:17
   CfgTRES=cpu=28,mem=250000M,billing=28,gres/gpu=4
   AllocTRES=cpu=22,mem=142G,gres/gpu=2
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
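To flag non-broadwell runs automatically, the `ActiveFeatures` line can be pulled out of this output. The sketch below is not part of the environment script; the helper names `active_features` and `has_feature` are hypothetical, and it assumes a single-node job so that `scontrol show node` prints exactly one `ActiveFeatures` line.

```shell
#!/bin/sh
# Hypothetical helper: read `scontrol show node` output on stdin and print
# the value of the ActiveFeatures field (a comma-separated feature list).
active_features() {
    sed -n 's/^[[:space:]]*ActiveFeatures=\([^[:space:]]*\).*/\1/p'
}

# Hypothetical helper: read the same output on stdin and succeed only if
# the feature given as $1 is in the ActiveFeatures list.
has_feature() {
    case ",$(active_features)," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only runs inside a Slurm job where scontrol and SLURM_NODELIST exist.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_NODELIST:-}" ]; then
    if scontrol show node "$SLURM_NODELIST" | has_feature broadwell; then
        echo "node has the broadwell feature"
    else
        echo "warning: node is not broadwell"
    fi
fi
```

Wrapping the list in commas before matching avoids false positives when one feature name is a prefix of another.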

Sbozzolo (Member, Author) commented Nov 9, 2023

> If you just want the features, we can also get that by
>
> scontrol show node $SLURM_NODELIST
>
> I can add this to the default output

Yes, but I wanted to have everything immediately available on the job.

Sbozzolo closed this Nov 9, 2023