sjones-hep-ph-liv-ac-uk edited this page Nov 21, 2016 · 19 revisions

Introduction

This wiki describes ways to compare the accounting records at your site with the accounting records stored in APEL. It is based on the system we developed at Liverpool, but it is intended to be both portable and extensible. If the batch system or CE at your site is not covered by any of the sections, we encourage you to extend the software using scripts and techniques similar to those presented. Andrew McNab (Andrew.Mcnab@cern.ch) can give you write access to the repository for bug fixes and new methods.

Installing this software

For the time being, I recommend installing this software directly from the GitHub repo as follows:

git clone https://:@github.com/gridpp/audit.git

The nodes that need the software are described in the specific sections later.

Obtaining the APEL accounting estimates

You'll need a Linux system with a browser for this step. Log in as root and install the software, as described above, under the /root directory or a similar location.

Start the browser and go to the "next" version of the EGI accounting portal:

https://accounting-next.egi.eu/

In the web page, click Research Infrastructure, Tier 2 ...

In the Row Variable, select Submit Host, and choose the start and end times. The period is usually one month, e.g. Oct 2016 to Oct 2016. This is not as meaningless as it looks: it means from the start of Oct to the end of Oct.

You will make two runs of the report. On the first run, select Number of Jobs as the metric and click Update. Below the table is a button to download the data as CSV. Do so, and save the file in ~/audit/APEL/ as jobcount.csv.

On the second run, select Normalised Sum of Elapsed * Number of Processors as the metric. Again, click Update and save the data as a CSV, calling the file hs06.csv this time.

On the command line, run this for each of your submit hosts:

$ ./totals.pl hs06.csv hepgrid97
Numeric column totals:
      1859428.0362    1859428    0.09  
$ ./totals.pl jobcount.csv hepgrid97
Numeric column totals:
      76217    76217    0.23  

This means that, for the hepgrid97.ph.liv.ac.uk CE, APEL knows of 76217 jobs, and thinks they did 1,859,428 HS06 Hours of work.

Do that for all your CEs/submit hosts. Where you are using VAC, put in the leading characters of the VAC hostname; for Liverpool, this would look like this:

$ ./totals.pl hs06.csv vac01.ph.liv.ac.uk
Numeric column totals:
      484434.3381    484435    0  
$ ./totals.pl jobcount.csv vac01.ph.liv.ac.uk
Numeric column totals:
      111806    111806    0.31  

The work for each VAC factory is totalled, and this shows that APEL knows of 111806 jobs, and thinks they did 484,434 HS06 Hours of work.
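
The prefix matching described above can be illustrated with a minimal sketch. This is not the totals.pl script from the repository; sum_for_prefix is a hypothetical helper, and it assumes a simple comma-separated layout with the submit host in the first field and the value of interest in a numeric column (the second by default). The CSV you download from the portal may be laid out differently.

```shell
# Hypothetical helper, NOT the repository's totals.pl.
# Sums one numeric column of a CSV file over every row whose first field
# starts with the given host prefix.
# Assumption: comma-separated rows, host name in field 1.
sum_for_prefix() {
    csv_file=$1
    prefix=$2
    column=${3:-2}    # column to total; defaults to the second field
    awk -F',' -v p="$prefix" -v c="$column" '
        index($1, p) == 1 { total += $c }    # host begins with the prefix
        END { printf "%.4f\n", total }
    ' "$csv_file"
}

# Example call (file name from the text, column number is an assumption):
#   sum_for_prefix hs06.csv vac 2
```

A short prefix such as vac matches every host whose name begins with it, so the rows for all such hosts are added together.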

Obtaining Torque accounting estimates

Log in as root on your Torque headnode and install the software, as described above, under the /root directory or a similar location.

Get a list of the files that cover the period you want, so the script below doesn't have to plough through them all. Since some records for October might lie in the files for the neighbouring months, list those too. Note: the location of the files may vary; check with your site administrator.

rm -f recordFilesCoveringPeriod
ls /var/lib/torque/server_priv/accounting/201609* >> recordFilesCoveringPeriod
ls /var/lib/torque/server_priv/accounting/201610* >> recordFilesCoveringPeriod
ls /var/lib/torque/server_priv/accounting/201611* >> recordFilesCoveringPeriod
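
The same list can be built with a small function, which may be convenient when the period or the directory changes. This is only a sketch: list_record_files is a hypothetical helper that is not part of the repository, and it assumes the accounting files are named by date (YYYYMMDD), as the ls commands above suggest.

```shell
# Hypothetical helper: print the accounting files for the given month
# prefixes (YYYYMM), one path per line. Months with no files are skipped.
list_record_files() {
    accounting_dir=$1
    shift
    for month in "$@"; do
        for f in "$accounting_dir"/"$month"*; do
            # An unmatched glob is left as a literal string, so test that
            # the path really exists before printing it.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}

# Example call, equivalent to the three ls commands above:
#   list_record_files /var/lib/torque/server_priv/accounting \
#       201609 201610 201611 > recordFilesCoveringPeriod
```

Writing the list with a single > redirection also removes the need to delete the old list file first.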

Get the UNIX epochs for the start and end of the period in question (in this case, from the start of 1st October up to, but not including, the start of 1st November):

startEpoch=`date --date="Oct 01 00:00:00 UTC 2016" +%s`
endEpoch=`date   --date="Nov 01 00:00:00 UTC 2016" +%s`
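
It is easy to get a time zone or a month wrong here, so it can be worth converting the epochs back to readable dates before using them. The sketch below assumes GNU date (the same tool used above); to_epoch and from_epoch are hypothetical helper names introduced only for this illustration.

```shell
# Convert a timestamp such as "Oct 01 00:00:00 UTC 2016" to a UNIX epoch.
to_epoch() {
    date --date="$1" +%s
}

# Convert an epoch back to a readable UTC timestamp, to check it by eye.
from_epoch() {
    date -u --date="@$1" '+%Y-%m-%d %H:%M:%S'
}

startEpoch=$(to_epoch "Oct 01 00:00:00 UTC 2016")
endEpoch=$(to_epoch "Nov 01 00:00:00 UTC 2016")

echo "start: $startEpoch ($(from_epoch "$startEpoch") UTC)"
echo "end:   $endEpoch ($(from_epoch "$endEpoch") UTC)"
# The period should span a whole number of days (31 for October).
echo "days:  $(( (endEpoch - startEpoch) / 86400 ))"
```

For October 2016 the two epochs should be 1475280000 and 1477958400, which are 31 days apart.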

Now run the script to get the job data for that period. You have to pass it the publishing benchmark to which you scale. At Liverpool, we scale to 10 HS06, which is 2500 bogoSpecInt2K, hence we use 2500. Also give it the start and end epochs.

./extractRecordsBetweenEpochs.pl recordFilesCoveringPeriod 2500 $startEpoch $endEpoch > table.oct
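
The value 2500 passed above follows from the usual conversion of 1 HS06 = 250 bogoSpecInt2K, which is consistent with the Liverpool figures quoted (10 HS06 = 2500). If your site scales to a different benchmark, a sketch like the following gives the number to pass; hs06_to_si2k is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: convert a publishing benchmark from HS06 to
# bogoSpecInt2K, assuming the conversion factor 1 HS06 = 250 SI2K.
hs06_to_si2k() {
    awk -v h="$1" 'BEGIN { printf "%.0f\n", h * 250 }'
}

hs06_to_si2k 10    # prints 2500, the value used at Liverpool
```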

Add up the table to get the result for the month.

./accu.pl table.oct

The work done for that month, in HS06 Hours, should pop out. The job count for the month is represented by the number of lines in the table file.
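
Counting the lines and setting the result against the APEL figure can be sketched as follows. Both functions are hypothetical helpers introduced for illustration; they are not part of the repository, and the percentage shown is just one possible way of expressing the difference.

```shell
# Count the jobs in a table file: one job per line.
count_jobs() {
    wc -l < "$1" | tr -d ' '
}

# Percentage difference of a local figure relative to the APEL figure.
# Prints "n/a" if the APEL figure is zero.
percent_diff() {
    awk -v l="$1" -v a="$2" 'BEGIN {
        if (a == 0) { print "n/a"; exit }
        printf "%.2f\n", (l - a) / a * 100
    }'
}

# Example call, using the APEL job count shown earlier for hepgrid97:
#   percent_diff "$(count_jobs table.oct)" 76217
```

A negative result means the local records show fewer jobs than APEL, a positive one more.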

Obtaining ARC accounting estimates

Obtaining VAC accounting estimates

Obtaining other accounting estimates

TBD

Comparison
