Tools for working with NanoAOD (requiring only python + root, not CMSSW)
First of all, make sure the active branch of this repo is "VBS_PG" and not master. If you are reading these lines, you should already be in the right branch.
Then fork the repo and verify that the default branch of your nanoAOD-tools fork on GitHub is VBS_PG. If it is not, you can change it under "Settings".
For the ML machinery to work properly, you need Python 3. As a consequence, you should install a CMSSW release later than 11_3_X.
After logging in and moving into your favourite CMSSW_X_Y_Z/src, run the following commands:
git clone git@github.com:<your_github_name>/nanoAOD-tools.git PhysicsTools/NanoAODTools
scram b -j 8
cd PhysicsTools/NanoAODTools
It is important that, before cloning the repo locally, you have made sure the default branch of YOUR GitHub repo is VBS_PG. If "master" is the default one, you will clone all my W' stuff (good, but not great).
After cloning, run:
git branch -u origin/VBS_PG
git remote add upstream git@github.com:anpicci/nanoAOD-tools.git
This way, you can commit the changes you made on lxplus and push them to the right branch of your GitHub repo, as follows:
git commit -a -m "anything useful to describe your changes"
git push
As usual, if you add or remove any file, run
git add <your-local-file>
or
git rm <your-local-file>
before committing and pushing.
If you want your changes to appear in the central repo (i.e., this one), so as to share your work with the other people working with it, open a pull request from the browser and send me an email to let me know about your update.
If you want to pull central updates into both lxplus and GitHub, I suggest running this command on lxplus, followed by git push to update your GitHub fork as well:
git pull upstream VBS_PG
Standalone checkout (requiring only python + ROOT, not CMSSW): you need to set up python 2.7 and a recent ROOT version first.
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git NanoAODTools
cd NanoAODTools
bash standalone/env_standalone.sh build
source standalone/env_standalone.sh
Repeat only the last command at the beginning of every session.
Please never commit the build directory or the empty __init__.py files created by the script.
Alternatively, to check out the package inside a CMSSW area:
cd $CMSSW_BASE/src
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git PhysicsTools/NanoAODTools
cd PhysicsTools/NanoAODTools
cmsenv
scram b
The script to run the post-processing step is scripts/nano_postproc.py.
The basic syntax of the command is the following:
python scripts/nano_postproc.py /path/to/output_directory /path/to/input_tree.root
Here is a summary of its features:
- the -s,--postfix option is used to specify the suffix that will be appended to the input file name to obtain the output file name. It defaults to _Friend in friend mode, _Skim in full mode.
- the -c,--cut option is used to pass a string expression (using the same syntax as in TTree::Draw) that will be used to select events. It cannot be used in friend mode.
- the -J,--json option is used to pass the name of a JSON file that will be used to select events. It cannot be used in friend mode.
- if run with the --full option (default), the output will be a full nanoAOD file. If run with the --friend option, instead, the output will be a friend tree that can be attached to the input tree. In the latter case, it is not possible to apply any kind of event selection, as the number of entries in the parent and friend tree must be the same.
- the -b,--branch-selection option is used to pass the name of a file containing directives to keep or drop branches from the output tree. The file should contain one directive per line, among keep/drop (wildcards allowed as in TTree::SetBranchStatus) or keepmatch/dropmatch (python regexp matching the branch name), as shown in this example file. The --bi and --bo options allow you to specify the keep/drop file separately for the input and output trees.
- the --justcount option will cause the script to print out the number of selected events, without actually writing the output file.
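The keep/drop logic can be sketched in plain Python as below. This is not the framework's own implementation: it assumes the directives are applied in file order with the last matching one winning, that branches are kept by default, and that text after # is a comment, and it approximates the TTree::SetBranchStatus wildcards with fnmatch. The helpers parse_directives and selected_branches are hypothetical.

```python
import fnmatch
import re


def parse_directives(lines):
    """Turn lines such as 'drop *' or 'keepmatch ^n(Jet|Electron)$' into (op, pattern) pairs."""
    directives = []
    for raw in lines:
        # Assumption: text after '#' is a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        op, pattern = line.split(None, 1)
        if op not in ("keep", "drop", "keepmatch", "dropmatch"):
            raise ValueError("unknown directive: %r" % op)
        directives.append((op, pattern.strip()))
    return directives


def selected_branches(branch_names, directives):
    """Return the branch names that survive the directives, applied in order."""
    kept = []
    for name in branch_names:
        keep = True  # assumption: a branch is kept unless some directive drops it
        for op, pattern in directives:
            if op in ("keep", "drop"):
                # Shell-style wildcards, approximating TTree::SetBranchStatus.
                matched = fnmatch.fnmatchcase(name, pattern)
            else:
                # keepmatch/dropmatch: python regexp matched from the start of the name.
                matched = re.match(pattern, name) is not None
            if matched:
                keep = op.startswith("keep")  # the last matching directive wins
        if keep:
            kept.append(name)
    return kept
```

For instance, with the directives drop *, keep Jet_* and keepmatch ^n(Jet|Electron)$, the branches nJet, Jet_pt, Electron_pt, nElectron and rho reduce to nJet, Jet_pt and nElectron.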
Please run with --help for a complete list of options.
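As an illustration of the -s,--postfix defaults, the output path could be built as in the following sketch. The helper output_file_name is hypothetical, and it assumes the suffix is inserted right before the .root extension of the input file name.

```python
import os


def output_file_name(input_file, output_dir, postfix=None, friend=False):
    """Hypothetical helper: output path = output_dir / <input stem><postfix><extension>."""
    if postfix is None:
        # Defaults described for -s,--postfix.
        postfix = "_Friend" if friend else "_Skim"
    stem, ext = os.path.splitext(os.path.basename(input_file))
    # Assumption: the suffix goes right before the file extension.
    return os.path.join(output_dir, stem + postfix + ext)
```

For example, output_file_name("/eos/cms/store/user/andrey/f.root", "outDir") gives outDir/f_Skim.root on a POSIX system.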
It is possible to import modules that will be run on each entry passing the event selection, and can be used to calculate new variables that will be included in the output tree (both in friend and full mode) or to apply event filter decisions.
We will use python/postprocessing/examples/exampleModule.py as an example. The module definition file, containing a simple constructor
exampleModuleConstr = lambda: exampleProducer(jetSelection=lambda j: j.pt > 30)
should be imported using the following syntax:
python scripts/nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule exampleModuleConstr
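The -I option takes a dotted python module path and the name of a constructor defined in it. A minimal sketch of how such a pair could be resolved with importlib is shown below; load_modules is a hypothetical helper, and the assumption that the constructor is called with no arguments is presumably why exampleModuleConstr is written as a zero-argument lambda.

```python
import importlib


def load_modules(import_specs):
    """Hypothetical sketch of resolving "-I <dotted.module.path> <constructor>" pairs.

    Each python module is imported by its dotted name, the constructor is looked
    up with getattr, and it is called without arguments to build the instance.
    """
    instances = []
    for module_path, constructor_name in import_specs:
        python_module = importlib.import_module(module_path)
        constructor = getattr(python_module, constructor_name)
        instances.append(constructor())
    return instances
```

Outside CMSSW, the mechanism can be tried with standard-library names, e.g. load_modules([("collections", "OrderedDict")]) returns a list holding one empty OrderedDict; inside CMSSW the pair would be the exampleModule path and exampleModuleConstr from the command above.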
Let us now examine the structure of the exampleProducer module class. All modules must inherit from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop.Module.
- the __init__ constructor function should be used to set the module options.
- the beginFile function should create the branches that you want to add to the output file, calling the branch(branchname, typecode, lenVar) method of wrappedOutputTree. typecode should be the ROOT TBranch type ("F" for float, "I" for int, etc.). lenVar should be the name of the variable holding the length of array branches (for instance, branch("Electron_myNewVar","F","nElectron")). If the lenVar branch does not exist already (this can happen if you create a new collection, see an example here), it will be automatically created.
- the analyze function is called on each event. It should return True if the event is to be retained, False if it should be dropped.
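The structure above can be sketched as follows. To keep the example runnable without CMSSW or ROOT, Module and FakeOutputTree are stand-ins (in a real module you would inherit from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop.Module and receive the real wrappedOutputTree, whose beginFile arguments are assumed here to match), while GoodJetCounter, the nGoodJet branch and the simplified jet objects are hypothetical.

```python
from types import SimpleNamespace


class Module(object):
    """Stand-in for ...framework.eventloop.Module, only so the sketch runs without CMSSW."""

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        pass

    def analyze(self, event):
        return True


class FakeOutputTree(object):
    """Stand-in for wrappedOutputTree: records declared branches and filled values."""

    def __init__(self):
        self.branches = {}
        self.values = {}

    def branch(self, name, typecode, lenVar=None):
        self.branches[name] = (typecode, lenVar)

    def fillBranch(self, name, value):
        self.values[name] = value


class GoodJetCounter(Module):
    """Hypothetical module: counts selected jets and keeps events with enough of them."""

    def __init__(self, jetSelection, minJets=1):
        # Module options are set in the constructor.
        self.jetSelection = jetSelection
        self.minJets = minJets

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        # Declare the new single-value integer branch ("I") in the output tree.
        self.out = wrappedOutputTree
        self.out.branch("nGoodJet", "I")

    def analyze(self, event):
        # Simplified jet objects; a real module would use Collection(event, "Jet").
        jets = [SimpleNamespace(pt=pt) for pt in event.Jet_pt]
        good = [j for j in jets if self.jetSelection(j)]
        self.out.fillBranch("nGoodJet", len(good))
        # True keeps the event, False drops it.
        return len(good) >= self.minJets
```

With jetSelection=lambda j: j.pt > 30, an event with Jet_pt = [45.0, 22.0, 31.5] gets nGoodJet = 2 and is kept.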
See the effect of keep/drop instructions by running:
python scripts/nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule exampleModuleConstr -s _exaModu_keepdrop --bi scripts/keep_and_drop_input.txt --bo scripts/keep_and_drop_output.txt
and comparing it to the previous command (without --bi and --bo).
The output branch created by exampleModuleConstr is the same in both cases, but the second command drops all other branches when creating the output tree, and it also runs faster.
The event interface, defined in PhysicsTools.NanoAODTools.postprocessing.framework.datamodel, allows you to dynamically construct views of objects organized in collections, based on the branch names, for instance:
from PhysicsTools.NanoAODTools.postprocessing.framework.datamodel import Collection
electrons = Collection(event, "Electron")
if len(electrons) > 1: print(electrons[0].someVar + electrons[1].someVar)
electrons_highpt = [x for x in electrons if x.pt > 50]
and this will access the elements of the Electron_someVar and Electron_pt branch arrays. Event variables can be accessed simply by event.someVar, for instance event.rho.
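To make the idea of a view concrete, here is a deliberately simplified, hypothetical re-implementation (CollectionView and ObjectView are not the framework's classes): attribute access on the i-th object is translated into an index into the corresponding flat branch, and the collection length is assumed to come from the n<Name> counter branch. A SimpleNamespace stands in for the event.

```python
from types import SimpleNamespace


class ObjectView(object):
    """Hypothetical view of the i-th object: obj.pt reads event.<prefix>_pt[i]."""

    def __init__(self, event, prefix, index):
        self._event = event
        self._prefix = prefix
        self._index = index

    def __getattr__(self, var):
        # Only called for names not found normally, i.e. the branch variables.
        return getattr(self._event, "%s_%s" % (self._prefix, var))[self._index]


class CollectionView(object):
    """Hypothetical, simplified analogue of Collection(event, "Electron")."""

    def __init__(self, event, prefix):
        self._event = event
        self._prefix = prefix
        # Assumption: the length is read from the n<prefix> counter branch.
        self._len = getattr(event, "n" + prefix)

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        # Raising IndexError also ends plain "for x in collection" loops.
        if not 0 <= index < self._len:
            raise IndexError("object index out of range")
        return ObjectView(self._event, self._prefix, index)
```

For an event with nElectron = 2 and Electron_pt = [62.0, 18.5], CollectionView(event, "Electron")[0].pt returns 62.0, and only the first electron survives a pt > 50 selection.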
The output branches should be filled by calling the fillBranch(branchname, value) method of wrappedOutputTree. value should be the desired value for single-value branches, or an iterable with the correct length for array branches. It is not necessary to fill the lenVar branch explicitly, as this is done automatically using the length of the passed iterable.
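The automatic filling of the lenVar branch can be illustrated with the hypothetical MiniOutputTree below, a stand-in for wrappedOutputTree that only records one event's values in a dictionary; it does not show what the real tree does when the counter branch already exists in the input, and the myEventVar branch is made up for the example.

```python
class MiniOutputTree(object):
    """Hypothetical stand-in for wrappedOutputTree that keeps one event's values in a dict."""

    def __init__(self):
        self._branches = {}  # branch name -> (typecode, lenVar or None)
        self.row = {}        # values filled for the current event

    def branch(self, name, typecode, lenVar=None):
        self._branches[name] = (typecode, lenVar)

    def fillBranch(self, name, value):
        _typecode, lenVar = self._branches[name]
        if lenVar is None:
            # Single-value branch: store the value as is.
            self.row[name] = value
        else:
            # Array branch: accept any iterable and derive the counter from its length.
            values = list(value)
            self.row[name] = values
            self.row[lenVar] = len(values)
```

Filling Electron_myNewVar with a two-element iterable sets nElectron to 2 without any explicit fill of the counter.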
Now, let's have a look at another example, the python/postprocessing/examples/mhtjuProducerCpp.py file. Similarly, it should be imported using the following syntax:
python scripts/nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.mhtjuProducerCpp mhtju
This module's producer has the same structure as exampleProducer, but in addition it uses C++ code, src/mhtjuProducerCppWorker.cc, to calculate the mht variable. This code is loaded in the __init__ method of the producer.