Green computing
OsmanSeckinSimsek edited this page Jan 27, 2023 · 6 revisions
- Power is measured in watts (W).
- Energy is measured in joules (J): 1 joule is 1 watt sustained for 1 second (1 J = 1 W·s).
- daint: installed in /project/c32/skach/pmt/
- lumi: installed in ...
Get power usage with:
srun pmt/bin/NVML-test ./my-cuda-exe
srun pmt/bin/ROCM-test ./my-hip-exe
Typical job output (GPU usage only):
Runtime: 0.851007 s
Joules: 37.369 J
Watt: 43.9115 W
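The three values are tied together by energy = average power × runtime. A minimal Python check of that relation on the sample output above (the variable names are illustrative and not part of PMT):

```python
# Values copied from the sample output above.
runtime_s = 0.851007    # "Runtime" line, in seconds
avg_power_w = 43.9115   # "Watt" line, average power in watts

# Energy (J) = average power (W) * elapsed time (s).
energy_j = avg_power_w * runtime_s

print(f"{energy_j:.3f} J")  # prints "37.369 J", matching the "Joules" line
```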
Export PMT_DUMPFILE=mydata to save the collected data to a file.
- NVIDIA/cuda: nvml/NVML.cpp calls nvmlDeviceGetPowerUsage
- AMD/rocm: rocm/ROCM.cpp calls rsmi_dev_power_ave_get
- sphexa-cuda --init sedov --ascii -s 1 -n 200
# SPHEXA: develop/4ee066d0
# 1 MPI-3.1 process(es) with 12 OpenMP-201511 thread(s)/process
Data generated for 8000000 global particles
Domain synchronized, nLocalParticles 8000000
# domain::sync: 0.0480743s 0.235J
# FindNeighbors: 0.0109477s 0.016J
# XMass: 0.389342s 0.319J
# mpi::synchronizeHalos: 0.0103546s 0.019J
# Normalization & Gradh: 0.396792s 0.235J
# EquationOfState: 0.00975014s 0.031J
# mpi::synchronizeHalos: 0.00656075s 0.143J
# IadVelocityDivCurl: 0.465183s 0.233J
# mpi::synchronizeHalos: 0.00887381s 0.023J
# AVswitches: 0.432815s 0.229J
# mpi::synchronizeHalos: 0.00886888s 0.252J
# MomentumAndEnergy: 0.626999s 0.22J
# Timestep: 0.00885275s 0.023J
# UpdateQuantities: 0.0065658s 0.165J
# UpdateSmoothingLength: 0.00900962s 0.019J
...
=== Total time for iteration(1) 2.44552s
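Dividing each per-function energy by its duration gives the average power drawn while that function ran. A small Python sketch, assuming the log lines keep the "# name: <seconds>s <joules>J" layout shown above; the parse_profile helper is hypothetical and not part of SPH-EXA:

```python
import re

# Matches lines such as "# domain::sync: 0.0480743s 0.235J".
# Function names may contain ':' or '&', so the name is matched lazily
# up to the first ':' that is followed by whitespace and a number.
LINE_RE = re.compile(
    r"^#\s*(?P<name>.+?):\s+(?P<sec>[\d.eE+-]+)s\s+(?P<joule>[\d.eE+-]+)J\s*$"
)

def parse_profile(text):
    """Return a list of (function name, seconds, joules) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m["name"], float(m["sec"]), float(m["joule"])))
    return rows

# Three lines copied from the output above.
sample = """\
# domain::sync: 0.0480743s 0.235J
# XMass: 0.389342s 0.319J
# MomentumAndEnergy: 0.626999s 0.22J
"""

for name, sec, joule in parse_profile(sample):
    # Average power in watts over the duration of the function.
    print(f"{name:20s} {joule / sec:8.3f} W")
```

Header lines such as "# SPHEXA: develop/4ee066d0" do not match the pattern and are skipped.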
Set up the environment with perftools loaded, for example:
module swap PrgEnv-cray PrgEnv-gnu
module load perftools
cmake --build ...
# -> myexe
Instrument the executable with pat_build, using its tracing flag:
pat_build -f -g mpi build/myexe
# -> myexe+pat
Run the job (--cpu_bind is mandatory):
srun --cpu_bind=rank -n8 --ntasks-per-node=1 ./myexe+pat
Report the power usage for each compute node with pat_report:
pat_report -O program_energy -s pe=ALL myexe+pat+35827-8637756t > rpt.txt
A typical output will be:
This table shows energy and power usage for the nodes with the
maximum, mean, and minimum usage, as well as the sum of usage over
all nodes.
Energy and power for accelerators is also shown, if applicable.
For further explanation, see the "General table notes" below, or
use: pat_report -v -O program_energy ...
Table 1: Program energy and power usage (from Cray PM)

  Node |      Node |  Process | Node Id
Energy | Power (W) |     Time |  PE
   (J) |           |          |

 9,203 | 3,589.734 | 2.563700 | Total
|---------------------------------------------
| 1,440 |   561.585 | 2.564172 | nid.7
|       |           |          |  pe.7
| 1,292 |   504.010 | 2.563443 | nid.4
|       |           |          |  pe.4
|    -- |        -- | 2.563432 | nid.6
|       |           |          |  pe.6
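The per-node rows can be cross-checked by dividing node energy by process time, which should reproduce the reported node power to within rounding. A Python sketch on the figures above; the rows list is typed in by hand for illustration, and a row that pat_report printed as "--" is represented as None and skipped:

```python
# (node id, energy in J or None for "--", process time in s, reported power in W)
rows = [
    ("nid.7", 1440.0, 2.564172, 561.585),
    ("nid.4", 1292.0, 2.563443, 504.010),
    ("nid.6", None,   2.563432, None),
]

def mean_power(energy_j, time_s):
    """Average power in watts, or None when the energy reading is missing."""
    if energy_j is None:
        return None
    return energy_j / time_s

for node, energy_j, time_s, reported_w in rows:
    power = mean_power(energy_j, time_s)
    if power is None:
        print(f"{node}: no energy reading")
    else:
        print(f"{node}: {power:.1f} W computed, {reported_w:.1f} W reported")
```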
- Tested on Lumi, Piz Daint and Alps.
Check that Slurm is configured to collect energy counters:
scontrol show conf |grep AcctGatherEnergyType
AcctGatherEnergyType = acct_gather_energy/pm_counters
If it is, the energy consumed by a job can be reported with: sacct -o ConsumedEnergy -j myjobid
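sacct reports ConsumedEnergy in joules, and converting to kilowatt-hours can make jobs easier to compare. A Python sketch, assuming the field may carry a K, M or G suffix for multiples of 1000; the exact formatting may differ between Slurm versions, so treat the suffix handling as an assumption. to_joules and to_kwh are hypothetical helpers:

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s

# Assumed suffixes for abbreviated values such as "1.44K".
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9}

def to_joules(field):
    """Convert a ConsumedEnergy field (e.g. '9203' or '9.2K') to joules."""
    field = field.strip()
    if field and field[-1].upper() in SUFFIXES:
        return float(field[:-1]) * SUFFIXES[field[-1].upper()]
    return float(field)

def to_kwh(joules):
    """Convert joules to kilowatt-hours."""
    return joules / JOULES_PER_KWH

# Example with a made-up field value.
print(to_kwh(to_joules("9203")))
```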