The xls_parser.py script “transpiles” the XLS file used by the FAPP profiler into a Python script (xls_parser.out.py).
xls_parser.out.py mimics the calculations of the original XLS file (without needing access to the XLS file or Excel) and outputs the results in JSON format.
To generate the xls_parse.out.py file, run:
xls_parse.py /path/to/cpu_pa_report.xlsm
Requirements: openpyxl (e.g. install it with pip install openpyxl).
The cpu_pa_report.xlsm file must already be “loaded”, i.e. you must open it in Excel once, load the data from the CSV files as described in the fapp manual, and save it. xls_parse.py doesn’t work with the “empty” cpu_pa_report.xlsm as provided by fapp.
Currently, the openpyxl package gives the following 3 warnings when reading the XLSM file. These can be ignored.
/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:296: UserWarning: Failed to load a conditional formatting rule. It will be discarded. Cause: expected <class 'float'>
  warn(msg)
/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Conditional Formatting extension is not supported and will be removed
  warn(msg)
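If the warnings get in the way, they can be filtered with the standard warnings module. The following is only a minimal sketch, not part of the tool: the helper name ignore_openpyxl_noise is hypothetical, and the message prefixes are copied from the three warnings listed above, so they may need adjusting for other openpyxl versions.

```python
import re
import warnings
from contextlib import contextmanager

# Message prefixes copied from the three warnings quoted above.
OPENPYXL_NOISE = (
    "Unknown extension is not supported",
    "Failed to load a conditional formatting rule",
    "Conditional Formatting extension is not supported",
)


@contextmanager
def ignore_openpyxl_noise():
    """Temporarily ignore the three known UserWarnings (hypothetical helper)."""
    with warnings.catch_warnings():
        for prefix in OPENPYXL_NOISE:
            # `message` is a regex matched against the start of the warning text.
            warnings.filterwarnings(
                "ignore", message=re.escape(prefix), category=UserWarning
            )
        yield


# Usage (assumes openpyxl is installed):
#     import openpyxl
#     with ignore_openpyxl_noise():
#         workbook = openpyxl.load_workbook("/path/to/cpu_pa_report.xlsm")
```

Filtering by message rather than silencing every UserWarning keeps any new, unexpected warning visible.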
xls_parser.out.py should not need to be regenerated for runs that use the same number of threads as, or fewer threads than, the run whose data was loaded into the XLSM file used to generate it. For example, if the XLSM file was loaded with CSV data obtained by profiling an 8-thread application, the generated xls_parser.out.py should work for single-thread applications, but might produce wrong results (or fail) for 12-thread applications.
Generally, reusing the same xls_parser.out.py to profile an application after modifying its code should yield correct results. Any errors or incorrect results should be reported as an issue.
# XMLPATH: the directory with paN.xml files
python xls_parse.out.py $XMLPATH
# see
python xls_parse.out.py --help
# for more options
The generated xls_parse.out.py can be used to print (in JSON format) all the derived counters calculated in the Excel file (without the need for Excel).
xls_parse.out.py has one mandatory input: the path to a directory with the paX.xml files (where X = 1, ..., 17) generated using the fapp utility:
# One measurement and one XML export per event set (pa1 ... pa17)
for i in $(seq 1 17); do
    fapp -C -d ./tmp${i} -Icpupa,nompi -Hevent=pa${i} ${BINARY}
    fapp -A -d ./tmp${i} -Icpupa -txml -o ./pa${i}.xml
done
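Before running the generated script, it can help to confirm that all 17 XML files are present. The check below is a small sketch under the assumption that the files are named pa1.xml through pa17.xml, as in the loop above; the function name missing_pa_files is hypothetical and not part of the tool.

```python
from pathlib import Path


def missing_pa_files(xml_dir, count=17):
    """Return the names of the expected paN.xml files not found in xml_dir."""
    root = Path(xml_dir)
    expected = [f"pa{i}.xml" for i in range(1, count + 1)]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_pa_files.py $XMLPATH   (script name is illustrative)
    missing = missing_pa_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All 17 paN.xml files found.")
```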
Note that unlike Excel, which reads CSV files, xls_parse.out.py reads XML files.
The output is printed to stdout in JSON format as (nested) dictionaries.
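Because the output is plain JSON on stdout, it can be consumed from another Python script. The sketch below assumes that stdout contains only the JSON document; the helper names run_generated_script and flatten are hypothetical, and no particular counter names are assumed. Flattening the nested dictionaries into path-like keys makes it easier to compare two runs side by side.

```python
import json
import subprocess
import sys


def run_generated_script(script, xml_dir):
    """Run the generated script on xml_dir and parse its stdout as JSON."""
    result = subprocess.run(
        [sys.executable, script, xml_dir],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return json.loads(result.stdout)


def flatten(nested, prefix=""):
    """Turn nested dictionaries into one flat dict with 'a/b/c' style keys."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Usage (paths are placeholders):
#     counters = flatten(run_generated_script("xls_parse.out.py", "/path/to/xml"))
#     for name, value in sorted(counters.items()):
#         print(name, value)
```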
It is inconvenient to download 17 files (plus the XLS) to a local PC to get the desired measurements, especially since profiling is rarely done for a single application. Usually, one takes multiple measurements to compare different runs or applications, or one is trying to improve an application based on the measurements. Using the Excel approach is tedious and error-prone in the first situation, and extremely frustrating in the second. It is also common for programmers to use Linux (and not Windows), which makes running Excel a further burden.
The ‘xls-parser’ tool can be released as open source under the following conditions:
Distribution of the generated file (xls_parse.out.py) is not permitted.
No support is provided for the values derived from the tool, and they are not guaranteed.
If you find the software useful, please cite it:
@software{vatai2022xlsparser,
author = {Vatai, Emil},
month = {March},
year = {2022},
title = {{XLS parser}},
howpublished = {\url{https://github.com/RIKEN-RCCS/xls-parser}},
version = {v1.0.3}
}