Backend that converts `qastle` to run on an ATLAS xAOD backend.
It allows you to query hierarchical data stored in a ROOT file that has been written using the ATLAS xAOD format.
A short list of some of the features that are supported by the xAOD
C++ translator follows.
Many, but not all, parts of the python language are supported. As a general rule, anything that is a statement or flow control is not supported. So no `if`, `while`, or `for` statements, for example. Assignment isn't supported, which may sound limiting, but this is a functional implementation so it is less limiting than one might think.
What follows are the parts of the language that are covered:
- Function calls, method calls, property references, and lambda calls (and lambda functions), with some limitations.
- Integer indexing into arrays
- Limited tuple support as a means of collecting information together, or as an output to a ROOT file.
- Limited list support (in the same way as above). In particular, the `append` method is not supported as that modifies the list, rather than creating a new one.
- Unary, binary, and comparison operations. Only 2 argument comparisons are supported (e.g. `a > b` and not `a > b > c`).
- Using `and` and `or` to combine conditional expressions. Note that this is written as `&` and `|` when writing an expression due to the fact python demands a `bool` return from `and` and `or` when written in code.
- The conditional if expression (`10 if a > 10 else 20`)
- Floating point numbers, integers, and strings.
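
The `&` / `|` point above can be illustrated with a minimal sketch. The `Expr` class here is a hypothetical stand-in written for this example and is not part of `func_adl`; it only shows why an operator that may return any object can build up an expression, while `and` needs a truth value immediately.

```python
class Expr:
    """Hypothetical stand-in for an expression object (not part of func_adl)."""

    def __init__(self, text: str):
        self.text = text

    def __and__(self, other: "Expr") -> "Expr":
        # `&` may return any object, so it can build a new, larger expression.
        return Expr(f"({self.text} && {other.text})")

    def __or__(self, other: "Expr") -> "Expr":
        # `|` works the same way as `&`.
        return Expr(f"({self.text} || {other.text})")

    def __bool__(self) -> bool:
        # `and` / `or` ask for a truth value right away, which an
        # unevaluated expression cannot provide.
        raise TypeError("expression has no truth value until the query runs")


pt_cut = Expr("pt > 30")
eta_cut = Expr("abs(eta) < 2.5")

combined = pt_cut & eta_cut
print(combined.text)  # (pt > 30 && abs(eta) < 2.5)

try:
    pt_cut and eta_cut  # python calls bool(pt_cut) here
except TypeError as err:
    print("`and` failed:", err)
```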
You can call the functions that are supported by the C++ objects as long as the required arguments are primitive types. Listed below are special extra functions attached to various objects in the ATLAS xAOD data model.
The event object has the following special functions to access collections: `Jets`, `Tracks`, `EventInfo`, `TruthParticles`, `Electrons`, `Muons`, and `MissingET`. Each function takes a single argument, the name of the bank in the xAOD. For example, for the electrons one can pass `"Electrons"`.
Adding new collections is fairly easy.
Template functions don't make sense yet in python.
- `getAttribute`: this function is templated, so it must be called as either `getAttributeFloat` or `getAttributeVectorFloat`.
- Math Operators: +, -, *, /, %, **
- Comparison Operators: <, <=, >, >=, ==, !=
- Unary Operators: +, -, not
- Math functions are pulled from the C++ `cmath` library: `sin`, `cos`, `tan`, `acos`, `asin`, `atan`, `atan2`, `sinh`, `cosh`, `tanh`, `asinh`, `acosh`, `atanh`, `exp`, `ldexp`, `log`, `ln`, `log10`, `exp2`, `expm1`, `ilogb`, `log1p`, `log2`, `scalbn`, `scalbln`, `pow`, `sqrt`, `cbrt`, `hypot`, `erf`, `erfc`, `tgamma`, `lgamma`, `ceil`, `floor`, `fmod`, `trunc`, `round`, `rint`, `nearbyint`, `remainder`, `remquo`, `copysign`, `nan`, `nextafter`, `nexttoward`, `fdim`, `fmax`, `fmin`, `fabs`, `abs`, `fma`.
- Do not use `math.sin` in a call; if you do, you'll get an exception during resolution saying that it doesn't know how to translate `math`. Plain `sin` is just fine.
- For things like `sum`, `min`, `max`, etc., use the `Sum`, `Min`, `Max` LINQ predicates.
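
To see how `sin(...)` and `math.sin(...)` differ, it can help to look at how python itself parses the two calls. The sketch below uses only the standard `ast` module; `call_target` is a helper made up for this illustration, and the expressions are just strings that are parsed, never executed. It is not the backend's own resolution code.

```python
import ast


def call_target(expr: str) -> str:
    """Describe what a call expression is calling, as seen in the python AST."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError(f"not a call expression: {expr}")
    func = node.func
    if isinstance(func, ast.Name):
        # A bare function name, e.g. sin(...)
        return f"name:{func.id}"
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        # An attribute lookup on a name, e.g. math.sin(...)
        return f"attribute:{func.value.id}.{func.attr}"
    return type(func).__name__


print(call_target("sin(j.phi())"))       # name:sin
print(call_target("math.sin(j.phi())"))  # attribute:math.sin
```

In the second form the name `math` appears as its own node in the tree, which is consistent with the exception message mentioning `math` rather than `sin`.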
It is possible to inject metadata into the `qastle` query to alter the behavior of the C++ code production. Each sub-section below describes a different type of metadata. To invoke this, use the `Metadata` call, which takes a stream as input and outputs the same stream; its argument is a dictionary that contains the metadata.
A few things about metadata:
- No two metadata blocks can have the same name and different content. However, it is legal for them to have different dependencies. In that case, the multiple blocks are treated as a single block with a union of the dependencies.
- Exceptions (`ValueError`) are raised if the dependency graph can't be completed, or a circular dependency is discovered.
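
The two rules above can be sketched as follows. This is only an illustration of the described behavior, not the backend's implementation: the block layout and the key names `name`, `body`, and `depends_on` are assumptions made for this example.

```python
from typing import Dict, List, Set


def merge_blocks(blocks: List[dict]) -> Dict[str, dict]:
    """Merge same-named blocks, taking the union of their dependencies."""
    merged: Dict[str, dict] = {}
    for block in blocks:
        name = block["name"]  # hypothetical key
        deps = set(block.get("depends_on", []))  # hypothetical key
        if name in merged:
            if merged[name]["body"] != block["body"]:
                raise ValueError(f"block {name!r} redefined with different content")
            merged[name]["depends_on"] |= deps
        else:
            merged[name] = {"body": block["body"], "depends_on": deps}
    return merged


def resolve_order(merged: Dict[str, dict]) -> List[str]:
    """Return block names so that every block comes after its dependencies."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name not in merged:
            raise ValueError(f"dependency {name!r} is not defined")
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        for dep in sorted(merged[name]["depends_on"]):
            visit(dep)
        visiting.remove(name)
        order.append(name)

    for name in merged:
        visit(name)
    return order


blocks = [
    {"name": "jets", "body": "A", "depends_on": ["base"]},
    {"name": "jets", "body": "A", "depends_on": ["tools"]},  # same content, new dependency
    {"name": "base", "body": "B"},
    {"name": "tools", "body": "C", "depends_on": ["base"]},
]
merged = merge_blocks(blocks)
print(sorted(merged["jets"]["depends_on"]))  # ['base', 'tools']
print(resolve_order(merged))  # ['base', 'tools', 'jets']
```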
If you have a method that returns a non-standard type, use this metadata type to tell the backend the return type. There are two different forms for this metadata: one if a single item is returned, and another if a collection of items is returned.
For a single item:
Key | Description | Example |
---|---|---|
metadata_type | The metadata type | "add_method_type_info" |
type_string | The object the method applies to, fully qualified, C++ | "xAOD::Jet" |
method_name | Name of the method | "pT" |
return_type | Type returned, C++, fully qualified | "float" , "float*" , "float**" |
deref_count | Number of times to dereference object before invoking this method (optional) | 2 |
Note: `deref_count` is used when an object can hold onto other objects and exposes them through dereferencing (e.g. by overriding `operator*`). If it is zero (as it mostly is, since `operator*` isn't often overridden), then it can be omitted.
For a collection:
Key | Description | Example |
---|---|---|
metadata_type | The metadata type | "add_method_type_info" |
type_string | The object the method applies to, fully qualified, C++ | "xAOD::Jet" |
method_name | Name of the method | "jetWeights" |
return_type_element | The type of the collection element | "float" |
return_type_collection | The type of the collection | vector<float> , vector<float>* |
deref_count | Number of times to dereference object before invoking this method (optional) | 2 |
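
As a sketch, the two forms written out as python dictionaries look like the following. The keys and example values come from the tables above; `returns_collection` is a hypothetical helper added only to show how the forms differ. Each dictionary would be the argument to the `Metadata` call described earlier.

```python
# Single-item form: the method returns one value.
jet_pt_info = {
    "metadata_type": "add_method_type_info",
    "type_string": "xAOD::Jet",
    "method_name": "pT",
    "return_type": "float",
    # "deref_count" omitted: it is optional and defaults to zero
}

# Collection form: the method returns a collection of values.
jet_weights_info = {
    "metadata_type": "add_method_type_info",
    "type_string": "xAOD::Jet",
    "method_name": "jetWeights",
    "return_type_element": "float",
    "return_type_collection": "vector<float>",
}


def returns_collection(info: dict) -> bool:
    """Hypothetical helper: tell the two forms apart by their keys."""
    return "return_type_collection" in info


print(returns_collection(jet_pt_info))       # False
print(returns_collection(jet_weights_info))  # True
```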
CMS and ATLAS store their basic reconstruction objects as collections (e.g. jets, etc.). You can define new collections on the fly with the following metadata.
For ATLAS:
Key | Description | Example |
---|---|---|
metadata_type | The metadata type | "add_atlas_event_collection_info" |
name | The name of the collection (used to access it from the dataset object) | "TruthParticles" |
include_files | List of include files to use when accessing collection | ['file1.h', 'file2.h'] |
container_type | The container object that is filled | "'xAOD::ElectronContainer'" |
element_type | The element in the container. In atlas this is a pointer. | "xAOD::Electron" |
contains_collection | Some items are singletons (like EventInfo) | True or False |
For CMS:
Key | Description | Example |
---|---|---|
metadata_type | The metadata type | "add_cms_event_collection_info" |
name | The name of the collection (used to access it from the dataset object) | "Vertex" |
include_files | List of include files to use when accessing collection | ['DataFormats/VertexReco/interface/Vertex.h'] |
container_type | The container object that is filled | "'reco::VertexCollection'" |
element_type | The element in the container. In atlas this is a pointer. | "reco::Vertex" |
contains_collection | Some items are singletons (like EventInfo) | True or False |
element_pointer | Indicates if the element type is a pointer | True or False |
Any include files listed will be added to the top of the `query.cpp` file that is generated. While ordering is maintained within a single `Metadata` query here, it is not maintained between different `Metadata` calls.
All includes are done with straight quotes:
#include "file1.hpp"
#include "file2.hpp"
Key | Description | Example |
---|---|---|
metadata_type | The metadata type | "include_files" |
files | List of files to include. | ["file1.hpp", "file2.hpp"] |
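
A minimal sketch of how an `include_files` block maps onto the include lines shown above. `render_includes` is a hypothetical helper written for this illustration, not the backend's own code.

```python
def render_includes(metadata: dict) -> str:
    """Turn an include_files metadata block into C++ #include lines."""
    if metadata.get("metadata_type") != "include_files":
        raise ValueError("expected an include_files metadata block")
    # Order within a single block is preserved, using straight quotes.
    return "\n".join(f'#include "{name}"' for name in metadata["files"])


block = {
    "metadata_type": "include_files",
    "files": ["file1.hpp", "file2.hpp"],
}
print(render_includes(block))
# #include "file1.hpp"
# #include "file2.hpp"
```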
The `xAOD` code only renders the `func_adl` expression as a ROOT file. The ROOT file contains a simple `TTree` in its root directory.
- If `AsROOTTTree` is the top level `func_adl` node, then the tree name and file name are taken from that expression. Only a sequence of python tuples or a single item can be understood by `AsROOTTTree`.
- If a `Select` sequence of `int` or `double` is the last `func_adl` expression, then a file called `xaod_output.root` will be generated, and it will contain a `TTree` called `atlas_xaod_tree` with a single column, called `col1`.
- If a `Select` sequence of `tuple` is the last `func_adl` expression, then a file called `xaod_output.root` will be generated, and it will contain a `TTree` called `atlas_xaod_tree` with columns named `col1`, `col2`, etc.
- If a `Select` sequence of dictionaries is the last `func_adl` expression, then a file called `xaod_output.root` will be generated, and it will contain a `TTree` called `atlas_xaod_tree`, with column names taken from the dictionary keys.
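
The default column naming described above can be summarized in a short sketch. `default_columns` is a hypothetical helper that takes one sample item of the final `Select` sequence; it only mirrors the rules in the list and is not the backend's code.

```python
from typing import List


def default_columns(sample) -> List[str]:
    """Column names for atlas_xaod_tree, given one item of the final sequence."""
    if isinstance(sample, dict):
        # Dictionary keys become the column names.
        return list(sample.keys())
    if isinstance(sample, tuple):
        # Tuples get positional names: col1, col2, ...
        return [f"col{i}" for i in range(1, len(sample) + 1)]
    if isinstance(sample, (int, float)):
        # A single number gets a single column.
        return ["col1"]
    raise TypeError(f"unsupported item type: {type(sample).__name__}")


print(default_columns(42.0))                     # ['col1']
print(default_columns((1.0, 2.0, 3)))            # ['col1', 'col2', 'col3']
print(default_columns({"pt": 1.0, "eta": 0.5}))  # ['pt', 'eta']
```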
`ServiceX` (and the `servicex` frontend package) can convert from ROOT to other formats like a `pandas.DataFrame` or an `awkward` array.
Setting up the development environment:
- After creating a virtual environment, do a setup-in-place:
pip install -e .[test]
To run tests:
- `pytest -m "not atlas_xaod_runner and not cms_aod_runner"` will run the fast tests.
- `pytest -m "atlas_xaod_runner"` and `pytest -m "cms_aod_runner"` will run the slow tests for ATLAS and CMS respectively. These require docker to be installed and available on your command line: `docker` is invoked via python's `os.system`, so it needs to be available to the test runner.
- The CI on github is set up to run tests against python `3.7`, `3.8`, and `3.9` (only the non-xaod-runner tests).
Contributing:
- Develop in another repo or on a branch
- Submit a PR against the `master` branch.
In general, the `master` branch should pass all tests all the time. Releases are made by tagging on the `master` branch.
Publishing to PyPI:
- Automated by declaring a new release (or pre-release) in github's web interface
It is possible to set up and use the `xAOD` backend locally if you have `docker` installed on your machine. To do so, first install the local flavor of this package:
pip install func_adl_xAOD[local]
You can then use the `xAODDataset` object or the `CMSRun1AODDataset` object to execute `qastle` running on a docker image for ATLAS or CMS Run 1 AOD, locally.
- Specify the local path to files you want to run on in the arguments to the constructor
- Files are run serially, and in a blocking way
- This code is designed for development and testing work, and is not designed for large-scale production running on local files (not that that couldn't be done).
When something odd happens and you really want to look at the C++ output, you can do this by including the following code somewhere before the `xAOD` backend is executed. This will turn on logging that will dump the output from the run and will also dump the C++ header and source files that were used to execute the query.
import logging
logging.basicConfig()
logging.getLogger("func_adl_xAOD.common.local_dataset").setLevel(level=logging.DEBUG)
- In general, the first two lines are a good thing to have in your notebooks, etc. It allows you to see where warning messages are coming from and might help when things are going sideways.