Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computing NN correlation from simulated catalogues without a random catalogue #124

Open
alexander-mead opened this issue Apr 26, 2021 · 5 comments

Comments

@alexander-mead
Copy link

I've been using treecorr to compute the halo-halo correlation function from haloes identified in an N-body simulation. I am working in periodic 3D, with x, y and z positions of haloes. I am interested in auto correlations, as well as cross correlations between haloes in different mass bins. If I understand treecorr correctly, then I also have to generate some random catalogues (e.g., https://rmjarvis.github.io/TreeCorr/_build/html/guide.html#using-random-catalogs) to form the estimator.

However, I feel that if I am using simulated data with no masks etc. then I shouldn't need to generate a random catalogue, since in this case I know exactly what RR, DR, RD (https://rmjarvis.github.io/TreeCorr/_build/html/nn.html#treecorr.NNCorrelation.calculateXi) are in this case, since the background density should be completely uniform on average. Is there a way to bypass the need for a random catalogue in this case? Or am I misunderstanding something?

@rmjarvis
Copy link
Owner

You can do whatever you want with the NN pair counts. The calculateXi function is available for people who do use randoms (which is typical for data). But if you want to do something else with the pair counts, you certainly can.

@alexander-mead
Copy link
Author

alexander-mead commented Apr 28, 2021

Ah, I see. I had to dig into the details of the NNcorrelation object, but now I have a function that does exactly what I need.

def calculateXi_sim(nn, Vsim, N1, N2=None):
    
    # Checks
    if (nn.npairs.all() != nn.weight.all()):
        raise ValueError('This does not currently work for weighted data')
    
    # Get the limits of the bin in r
    rmin, rmax = _calculate_rmin_rmax(nn)
    nr = nn.nbins
    
    # Calculate the correlation function from the pair counts
    nn.xi = np.zeros((nr))
    nn.varxi = np.zeros((nr))
    for i in range(nr):
        V = (4./3.)*np.pi*(rmax[i]**3-rmin[i]**3) # Volume of (possibly thick) shell
        if N2 is None:
            N12 = (N1**2)*V/Vsim # Expected number of pairs: Auto-correlation
        else: 
            N12 = N1*N2*V/Vsim # Expected number of pairs: Cross-correlation
        nn.xi[i] = -1.+nn.npairs[i]/N12 # Construct correlation function
        nn.varxi[i] = nn.npairs[i]/N12**2 # Variance in measured correlation function assuming Poisson stats for pair counts
        
    return nn.xi, nn.varxi

Would something like this (admittedly very basic) functionality be something worth including within the official treecorr codebase? I only suggest this because I asked some colleagues about how to do this calculation with treecorr and they told me that I should create a random catalogue with ~100 times the number of points relative to my halo samples and then to use the calculateXi function. If I understand treecorr correctly, using a random catalogue in this case is equivalent to using Monte-Carlo integration to find the volume of a spherical shell very precisely, which seems like a waste of computing power (and time).

@rmjarvis
Copy link
Owner

rmjarvis commented May 1, 2021

If it's really onerous, I can think about adding something to TreeCorr to do this for you. But (a) it feels fairly trivial to me, and (b) the code you posted is currently hard-coded to your particular scenario (periodic metric, 3D coords), so I'd have to generalize that if this were going to be a regular feature.

Also, you didn't post your _calculate_rmin_rmax function, but I believe it could just be

rmin, rmax = nn.left_edges, nn.right_edges

I'll leave this issue open in case any other users want to weigh in on how useful it would be. If there is enough interest, I'll consider it.

@alexander-mead
Copy link
Author

My function does exactly what I need it to do now, so this certainly doesn't need to be added to TreeCorr for my sake. However, I can tidy my function up, insert it into TreeCorr and make a pull request if you ever need that; although I'm sure you could probably do it much faster and more neatly than I could.

I can't think how to generalise it beyond a periodic metric since I think you need no mask for this to work... I guess you could make it work for a full curved sky simulation easily enough. The 2D-periodic case is straightforward, but I don't know how to assess the 'dimension' from the NNcorrelation object. I also can't see how one could generalise the function to use weighted data.

PS. My _calculate_rmin_rmax function should have been what you wrote... but I was recalculating these values.

@rmjarvis
Copy link
Owner

rmjarvis commented Sep 24, 2021

FYI, a recent paper has formulae generalizing this to non-periodic geometries, so apparently other people have been thinking along these lines as well.

https://arxiv.org/abs/2107.06918

If there is enough interest, I could implement these in TreeCorr. I'm imagining an API along the lines of:

rr = NNCorrelation(...)
rr.make_analytic_rr_periodic(npoints, period=period)  # periodic geometry, as above
rr.make_analytic_rr_rec_2d(npoints, x_size=x_size, y_size=y_size)  # 2d rectangle geometry
rr.make_analytic_rr_rec_3d(npoints, x_size=x_size, y_size=y_size, z_size=z_size)  # 3d prism geometry
... etc.

This would be done in lieu of running on a random catalog for cases where the geometry is simple enough that the analytic formulae are accurate. Then you would follow this with the normal calculateXi function or whatever else you would have done if using a real random catalog.

I don't think it's generally possible to do this when there are weights or non-trivial masks. But I suppose there are enough use cases where people run on simulated data with simple geometries, or where an approximate value of the random is sufficient, that it could be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants