
Green computing

OsmanSeckinSimsek edited this page Jan 27, 2023 · 6 revisions

Measuring Power Usage

  • Power is measured in watts (W).
  • Energy is measured in joules (J); 1 joule equals 1 watt-second, i.e. 1 W sustained for 1 s.
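The relation between the two units can be sketched in a few lines of Python. This is only an illustration of energy = power × time; the function names are hypothetical and not part of any tool mentioned on this page.

```python
# Minimal sketch of the watt/joule relation (helper names are illustrative).
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy used when drawing a constant power for a given duration."""
    return power_watts * seconds


def average_power_watts(energy_j: float, seconds: float) -> float:
    """Average power over a run, from total energy and wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return energy_j / seconds


def joules_to_kwh(energy_j: float) -> float:
    """Convert joules to kilowatt-hours."""
    return energy_j / JOULES_PER_KWH


if __name__ == "__main__":
    # A device drawing 100 W for one minute.
    print(energy_joules(100.0, 60.0), "J")
```

Dividing a measured energy by the runtime gives the average power, which is a quick sanity check on any of the reports below.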

pmt.git

  • daint: installed in /project/c32/skach/pmt/
  • lumi: installed in ...

> GPUs

Get power usage with:

srun pmt/bin/NVML-test ./my-cuda-exe
srun pmt/bin/ROCM-test ./my-hip-exe

Typical job output (GPU usage only):

Runtime: 0.851007 s
Joules: 37.369 J
Watt: 43.9115 W
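For post-processing many runs, the three-line summary can be parsed and cross-checked (Watt should equal Joules / Runtime). The sketch below assumes the output format is exactly as in the sample above; the function names are hypothetical and not part of pmt.

```python
import re

# Matches the three summary lines shown above; the format is assumed
# from that sample, not taken from the pmt documentation.
_LINE = re.compile(r"^(Runtime|Joules|Watt):\s*([-+0-9.eE]+)\s*(s|J|W)\s*$")


def parse_pmt_summary(text: str) -> dict:
    """Return {'Runtime': s, 'Joules': J, 'Watt': W} for the lines found."""
    values = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def is_consistent(values: dict, rel_tol: float = 1e-3) -> bool:
    """Check that the reported power equals energy divided by runtime."""
    expected = values["Joules"] / values["Runtime"]
    return abs(expected - values["Watt"]) <= rel_tol * abs(values["Watt"])


sample = """Runtime: 0.851007 s
Joules: 37.369 J
Watt: 43.9115 W"""

if __name__ == "__main__":
    summary = parse_pmt_summary(sample)
    print(summary, is_consistent(summary))
```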

Run export PMT_DUMPFILE=mydata before the job to save the collected data to a file.

API

Code instrumentation (function level)

  • sphexa-cuda --init sedov --ascii -s 1 -n 200

# SPHEXA: develop/4ee066d0
# 1 MPI-3.1 process(es) with 12 OpenMP-201511 thread(s)/process
Data generated for 8000000 global particles
Domain synchronized, nLocalParticles 8000000
# domain::sync: 0.0480743s 0.235J
# FindNeighbors: 0.0109477s 0.016J
# XMass: 0.389342s 0.319J
# mpi::synchronizeHalos: 0.0103546s 0.019J
# Normalization & Gradh: 0.396792s 0.235J
# EquationOfState: 0.00975014s 0.031J
# mpi::synchronizeHalos: 0.00656075s 0.143J
# IadVelocityDivCurl: 0.465183s 0.233J
# mpi::synchronizeHalos: 0.00887381s 0.023J
# AVswitches: 0.432815s 0.229J
# mpi::synchronizeHalos: 0.00886888s 0.252J
# MomentumAndEnergy: 0.626999s 0.22J
# Timestep: 0.00885275s 0.023J
# UpdateQuantities: 0.0065658s 0.165J
# UpdateSmoothingLength: 0.00900962s 0.019J
...
=== Total time for iteration(1) 2.44552s
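Functions such as mpi::synchronizeHalos appear several times per iteration, so it can help to sum time and energy per function name. The following is a minimal sketch that assumes the "# name: <seconds>s <joules>J" line format shown in the log above; the helper name is hypothetical and not part of SPH-EXA.

```python
import re
from collections import defaultdict

# "# <function name>: <seconds>s <joules>J"; names may contain "::" and spaces.
_NUM = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
_ENTRY = re.compile(rf"^#\s*(.+?):\s*({_NUM})s\s+({_NUM})J\s*$")


def sum_per_function(log_text: str) -> dict:
    """Return {name: (total_seconds, total_joules)} over all matching lines."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for line in log_text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:  # header and "=== Total time" lines do not match
            name, seconds, joules = match.groups()
            totals[name][0] += float(seconds)
            totals[name][1] += float(joules)
    return {name: tuple(pair) for name, pair in totals.items()}


sample_log = """# SPHEXA: develop/4ee066d0
# domain::sync: 0.0480743s 0.235J
# mpi::synchronizeHalos: 0.0103546s 0.019J
# mpi::synchronizeHalos: 0.00656075s 0.143J
=== Total time for iteration(1) 2.44552s"""

if __name__ == "__main__":
    # Print functions sorted by energy, largest first.
    ranked = sorted(sum_per_function(sample_log).items(),
                    key=lambda item: item[1][1], reverse=True)
    for name, (seconds, joules) in ranked:
        print(f"{name}: {seconds:.6f} s, {joules:.3f} J")
```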

Cray's perftools

> Compute node(s) energy usage

Set up the environment with perftools loaded, for example:

module swap PrgEnv-cray PrgEnv-gnu
module load perftools
cmake --build ...
# -> myexe

Instrument the executable with pat_build; -g mpi enables tracing of the MPI function group:

pat_build -f -g mpi build/myexe
# -> myexe+pat

Run the job (--cpu_bind is mandatory):

srun --cpu_bind=rank -n8 --ntasks-per-node=1 ./myexe+pat

Report the power usage for each compute node with pat_report:

pat_report -O program_energy -s pe=ALL myexe+pat+35827-8637756t > rpt.txt

Typical output:

  This table shows energy and power usage for the nodes with the
    maximum, mean, and minimum usage, as well as the sum of usage over
    all nodes.
    Energy and power for accelerators is also shown, if applicable.
  For further explanation, see the "General table notes" below, or
    use:  pat_report -v -O program_energy ...

Table 1:  Program energy and power usage (from Cray PM)

   Node |      Node |  Process | Node Id
 Energy | Power (W) |     Time |  PE
    (J) |           |          |
   9,203 | 3,589.734 | 2.563700 | Total
|---------------------------------------------
|  1,440 |   561.585 | 2.564172 | nid.7
|        |           |          |  pe.7
|  1,292 |   504.010 | 2.563443 | nid.4
|        |           |          |  pe.4
|     -- |        -- | 2.563432 | nid.6
|        |           |          |  pe.6

  • Tested on LUMI, Piz Daint and Alps.
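The per-node rows of rpt.txt can be extracted for further analysis, for instance to list nodes whose energy column shows "--". The sketch below assumes the table layout is exactly as in the sample above and reads "--" as "no value reported"; the function names are hypothetical and not part of perftools.

```python
import re

# Data rows look like "|  1,440 |   561.585 | 2.564172 | nid.7" (the Total
# row has no leading "|"). Header, separator and "pe.N" lines do not match.
_ROW = re.compile(
    r"^\|?\s*([\d,]+|--)\s*\|\s*([\d,.]+|--)\s*\|\s*([\d.]+)\s*\|\s*(\S+)\s*$"
)


def _number(field: str):
    """Convert '1,440' to 1440.0; '--' (assumed: not reported) to None."""
    return None if field == "--" else float(field.replace(",", ""))


def parse_program_energy(text: str) -> list:
    """Return one dict per data row of the program_energy table."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            energy, power, time_s, label = match.groups()
            rows.append({"label": label, "energy_j": _number(energy),
                         "power_w": _number(power), "time_s": float(time_s)})
    return rows


def nodes_without_energy(rows: list) -> list:
    """Labels of rows whose energy column was '--'."""
    return [row["label"] for row in rows if row["energy_j"] is None]


sample_table = """   9,203 | 3,589.734 | 2.563700 | Total
|---------------------------------------------
|  1,440 |   561.585 | 2.564172 | nid.7
|        |           |          |  pe.7
|     -- |        -- | 2.563432 | nid.6
|        |           |          |  pe.6"""

if __name__ == "__main__":
    parsed = parse_program_energy(sample_table)
    print(parsed)
    print(nodes_without_energy(parsed))
```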

Slurm

Check that Slurm is configured to collect energy counters:

scontrol show conf |grep AcctGatherEnergyType
AcctGatherEnergyType = acct_gather_energy/pm_counters

If so, job energy usage can be reported with: sacct -o ConsumedEnergy -j myjobid
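To turn the accounting data into an average power per job, the raw fields can be combined. The sketch below assumes sacct is called with the parsable options -n -P -X and the fields JobID, ElapsedRaw (seconds) and ConsumedEnergyRaw (joules); the job IDs and values in the sample are made up for illustration, and the function name is hypothetical.

```python
# Assumed command (not executed here):
#   sacct -n -P -X -o JobID,ElapsedRaw,ConsumedEnergyRaw -j myjobid
# which prints one "jobid|seconds|joules" line per job allocation.


def parse_sacct_energy(text: str) -> dict:
    """Return {job_id: {'elapsed_s', 'energy_j', 'avg_power_w'}}.

    Lines with a missing or non-numeric energy field (for example when
    energy accounting is not enabled) are skipped.
    """
    jobs = {}
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3:
            continue
        job_id, elapsed_raw, energy_raw = fields
        if not (elapsed_raw.isdigit() and energy_raw.isdigit()):
            continue
        elapsed_s = int(elapsed_raw)
        energy_j = int(energy_raw)
        avg_power_w = energy_j / elapsed_s if elapsed_s > 0 else None
        jobs[job_id] = {"elapsed_s": elapsed_s, "energy_j": energy_j,
                        "avg_power_w": avg_power_w}
    return jobs


# Hypothetical sacct output: the second job has no energy recorded.
sample_sacct = "12345|120|24000\n12346|60|\n"

if __name__ == "__main__":
    for job_id, info in parse_sacct_energy(sample_sacct).items():
        print(job_id, info)
```

In a real workflow the text would come from running the sacct command above, for example through subprocess.run with capture_output=True and text=True.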