
Calculate the correlation of millions of index pairs #348

Answered by mulimoen
Liripo asked this question in Q&A

You should retain the sparse structure and compute the coefficients as cheaply as possible. Something like the code below might be faster (note: I have not tested this myself; it is merely an example, and you need to test it thoroughly yourself).

fn pearson_correlation(x: sprs::CsVecView<f64>, y: sprs::CsVecView<f64>) -> f64 {
    fn sum_x_x2(x: sprs::CsVecView<f64>) -> (f64, f64) {
        x.data()
            .iter()
            .fold((0.0, 0.0), |(x0, x1), &x| (x0 + x, x1 + x.powi(2)))
    }

    assert_eq!(x.dim(), y.dim());

    let (sum_x, sum_x2) = sum_x_x2(x);
    let (sum_y, sum_y2) = sum_x_x2(y);

    let sum_xy = x.dot(y);

    let n = x.dim() as f64;
    let numerator = n * sum_…
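
For comparison, here is a minimal, dependency-free sketch of the same idea: compute Pearson's r from running sums over the stored (non-zero) entries only, using the standard formula r = (n Σxy − Σx Σy) / sqrt((n Σx² − (Σx)²)(n Σy² − (Σy)²)). It is not the code from the answer above. `SparseVec`, `sum_and_sum_sq`, `sparse_dot` and `pearson_sparse` are hypothetical names introduced for illustration, and the struct is only a stand-in for a sprs vector view, assuming sorted indices and implicit zeros.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a sparse vector view: `indices` is sorted,
/// `values[k]` is the entry at `indices[k]`, every other entry is zero.
struct SparseVec<'a> {
    dim: usize,
    indices: &'a [usize],
    values: &'a [f64],
}

/// Sum and sum of squares; zeros contribute nothing, so only stored values are visited.
fn sum_and_sum_sq(v: &SparseVec) -> (f64, f64) {
    v.values
        .iter()
        .fold((0.0, 0.0), |(s, s2), &x| (s + x, s2 + x * x))
}

/// Dot product via a two-pointer merge over the sorted index lists.
fn sparse_dot(a: &SparseVec, b: &SparseVec) -> f64 {
    let (mut i, mut j, mut acc) = (0, 0, 0.0);
    while i < a.indices.len() && j < b.indices.len() {
        match a.indices[i].cmp(&b.indices[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                acc += a.values[i] * b.values[j];
                i += 1;
                j += 1;
            }
        }
    }
    acc
}

/// Pearson correlation over all `dim` entries, including the implicit zeros.
/// Returns NaN when either vector is constant (zero variance).
fn pearson_sparse(x: &SparseVec, y: &SparseVec) -> f64 {
    assert_eq!(x.dim, y.dim);
    let n = x.dim as f64;
    let (sx, sx2) = sum_and_sum_sq(x);
    let (sy, sy2) = sum_and_sum_sq(y);
    let sxy = sparse_dot(x, y);
    let numerator = n * sxy - sx * sy;
    let denominator = ((n * sx2 - sx * sx) * (n * sy2 - sy * sy)).sqrt();
    numerator / denominator
}
```

The cost per pair is proportional to the number of stored entries rather than to the full dimension. With millions of pairs, it is probably worth computing the per-vector sums once per vector and reusing them, so that only the dot product is evaluated per pair.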

Answer selected by Liripo