TreeCorr is a package for efficiently computing 2-point and 3-point correlation functions.
- The code is hosted at https://github.com/rmjarvis/TreeCorr
- It can compute correlations of regular number counts, weak lensing shears, or scalar quantities such as convergence or CMB temperature fluctutations.
- 2-point correlations may be auto-correlations or cross-correlations. This includes shear-shear, count-shear, count-count, kappa-kappa, etc. (Any combination of shear, kappa, and counts.)
- 3-point correlations currently can only be auto-correlations. This includes shear-shear-shear, count-count-count, and kappa-kappa-kappa. The cross varieties are planned to be added in the near future.
- Both 2- and 3-point functions can be done with the correct curved-sky calculation using RA, Dec coordinates, on a Euclidean tangent plane, or in 3D using either (RA,Dec,r) or (x,y,z) positions.
- The front end is in Python, which can be used as a Python module or as a standalone executable using configuration files. (The executable is corr2 for 2-point and corr3 for 3-point.)
- The actual computation of the correlation functions is done in C++ using ball trees (similar to kd trees), which make the calculation extremely efficient.
- When available, OpenMP is used to run in parallel on multi-core machines.
- Approximate running time for 2-point shear-shear is ~30 sec * (N/10^6) / core for a bin size b=0.1 in log(r). It scales as b^(-2). This is the slowest of the various kinds of 2-point correlations, so others will be a bit faster, but with the same scaling with N and b.
- The running time for 3-point functions are highly variable depending on the range of triangle geometries you are calculating. They are significantly slower than the 2-point functions, but many orders of magnitude faster than brute force algorithms.
- If you use TreeCorr in published research, please reference: Jarvis, Bernstein, & Jain, 2004, MNRAS, 352, 338 (I'm working on new paper about TreeCorr, including some of the improvements I've made since then, but this will suffice as a reference for now.)
- Record on the Astrophyics Source Code Library: http://ascl.net/1508.007
- Developed by Mike Jarvis. Fee free to contact me with questions or comments at mikejarvis17 at gmail. Or post an issue (see below) if you have any problems with the code.
The code is licensed under a FreeBSD license. Essentially, you can use the
code in any way you want, but if you distribute it, you need to include the
file TreeCorr_LICENSE
with the distribution. See that file for details.
The easiest way to install TreeCorr is with pip:
pip install treecorr
If you have previously installed TreeCorr, and want to upgrade to a new released version, you should do:
pip install treecorr --upgrade
Depending on the write permissions of the python distribution for your specific system, you might need to use one of the following variants:
sudo pip install treecorr pip install treecorr --user
The latter installs the Python module into ~/.local/lib/python3.7/site-packages
,
which is normally already in your PYTHONPATH, but it puts the executables
corr2
and corr3
into ~/.local/bin
which is probably not in your PATH.
To use these scripts, you should add this directory to your PATH. If you would
rather install into a different prefix rather than ~/.local, you can use:
pip install treecorr --install-option="--prefix=PREFIX"
This would install the executables into PREFIX/bin
and the Python module
into PREFIX/lib/python3.7/site-packages
.
If you would rather download the tarball and install TreeCorr yourself, that is also relatively straightforward:
All required dependencies should be installed automatically for you by setup.py, so you should not need to worry about these. But if you are interested, the dependencies are:
- numpy
- pyyaml
- LSSTDESC.Coord
- cffi
They can all be installed at once by running:
pip install -r requirements.txtThe last dependency is the only one that typically could cause any problems, since it in turn depends on a library called libffi. This is a common thing to have installed already on linux machines, so it is likely that you won't have any trouble with it, but if you get errors about "ffi.h" not being found, then you may need to either install it yourself or update your paths to include the directory where ffi.h is found.
See https://cffi.readthedocs.io/en/latest/installation.html for more information about installing cffi, including its libffi dependency.
Note
Two additional modules are not required for basic TreeCorr operations, but are potentially useful.
- fitsio is required for reading FITS catalogs or writing to FITS output files.
- pandas will signficantly speed up reading from ASCII catalogs.
These are both pip installable:
pip install fitsio pip install pandas
But they are not installed with TreeCorr automatically.
You can download the latest tarball from:
https://github.com/rmjarvis/TreeCorr/releases/Or you can clone the repository using either of the following:
git clone git@github.com:GalSim-developers/GalSim.git git clone https://github.com/GalSim-developers/GalSim.gitwhich will start out in the current stable release branch.
Either way, cd into the TreeCorr directory.
You can then install TreeCorr in the normal way with setup.py. Typically this would be the command:
python setup.py installIf you don't have write permission in your python distribution, you might need to use:
python setup.py install --userIn addition to installing the Python module
treecorr
, this will install the executablescorr2
andcorr3
in abin
folder somewhere on your system. Look for a line like:Installing corr2 script to /anaconda3/binor similar in the output to see where the scripts are installed. If the directory is not in your path, you will also get a warning message at the end letting you know which directory you should add to your path if you want to run these scripts.
If you want to run the unit tests, you can do the following:
pip install -r test_requirements.txt cd tests nosetests
This software is able to compute a variety of two-point correlations:
NN: | The normal two-point correlation function of number counts (typically galaxy counts). |
---|---|
GG: | Two-point shear-shear correlation function. |
KK: | Nominally the two-point kappa-kappa correlation function, although any scalar quantity can be used as "kappa". In lensing, kappa is the convergence, but this could be used for temperature, size, etc. |
NG: | Cross-correlation of counts with shear. This is what is often called galaxy-galaxy lensing. |
NK: | Cross-correlation of counts with kappa. Again, "kappa" here can be any scalar quantity. |
KG: | Cross-correlation of convergence with shear. Like the NG calculation, but weighting the pairs by the kappa values the foreground points. |
This software is not yet able to compute three-point cross-correlations, so the only avaiable three-point correlations are:
NNN: | Three-point correlation function of number counts. |
---|---|
GGG: | Three-point shear correlation function. We use the "natural components" called Gamma, described by Schneider & Lombardi (2003) (Astron.Astrophys. 397, 809) using the triangle centroid as the reference point. |
KKK: | Three-point kappa correlation function. Again, "kappa" here can be any scalar quantity. |
The executables corr2 and corr3 each take one required command-line argument, which is the name of a configuration file:
corr2 config_file corr3 config_file
A sample configuration file for corr2 is provided, called sample.params. See the TreeCorr wiki for the complete documentation about the allowed parameters.
You can also specify parameters on the command line after the name of the configuration file. e.g.:
corr2 config_file file_name=file1.dat gg_file_name=file1.out corr2 config_file file_name=file2.dat gg_file_name=file2.out ...
This can be useful when running the program from a script for lots of input files.
The typical usage in python is in three stages:
- Define one or more Catalogs with the input data to be correlated.
- Define the correlation function that you want to perform on those data.
- Run the correlation by calling
process
. - Maybe write the results to a file or use them in some way.
For instance, computing a shear-shear correlation from an input file stored in a fits file would look something like the following:
>>> import treecorr >>> cat = treecorr.Catalog('cat.fits', ra_col='RA', dec_col='DEC', ... ra_units='degrees', dec_units='degrees', ... g1_col='GAMMA1', g2_col='GAMMA2') >>> gg = treecorr.GGCorrelation(min_sep=1., max_sep=100., bin_size=0.1, ... sep_units='arcmin') >>> gg.process(cat) >>> xip = gg.xip # The xi_plus correlation function >>> xim = gg.xim # The xi_minus correlation function
For more involved worked examples, see our Jupyter notebook tutorial.
And for the complete details about all aspects of the code, see the Sphinx-generated documentation.
If you are used to using corr2
with a configuration file,
and want to learn how to do the same thing in pythonn, there is also a
guide
to migrating over.
If you find a bug running the code, please report it at:
https://github.com/rmjarvis/TreeCorr/issues
Click "New Issue", which will open up a form for you to fill in with the details of the problem you are having.
If you would like to request a new feature, do the same thing. Open a new issue and fill in the details of the feature you would like added to TreeCorr. Or if there is already an issue for your desired feature, please add to the discussion, describing your use case. The more people who say they want a feature, the more likely I am to get around to it sooner than later.