Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Louvain algorithm for undirected graphs #1277

Draft
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

jpacold
Copy link
Contributor

@jpacold jpacold commented Sep 5, 2024

I finally had some time to work on #1141, so far only on the Rust side and only for undirected graphs. Opening this for feedback on how the Rust code is organized, and maybe to restart the discussion on whether this should go here or in petgraph.

The big additions here are:

  • metrics.rs

    • Modularity: a trait for graphs for which we can compute modularity (by extension this means we can apply the Louvain method)
    • modularity: computes the modularity, given a graph and a set of subsets.
    • Partition: a struct holding a graph partition. This mainly exists to keep the implementation of modularity organized.
  • louvain.rs

    • InnerGraph: at each level of the algorithm we work with an aggregated graph, where each node of this graph corresponds to one of the communities that were identified in the previous level). InnerGraph keeps track of this correspondence.
    • LouvainAlgo: helper functions for updating the Partition at each level of the algorithm.
    • one_level_undirected: most of the business logic is in here since it constructs the new communities at each level, roughly equivalent to _one_level in the networkx implementation.
    • louvain_communities: repeatedly calls one_level_undirected, stopping either after a fixed number of iterations or if the modularity improvement falls below the given threshold.

I realize this is a lot to review at once and can go into more detail about each piece if necessary.

Copy link
Collaborator

@IvanIsCoding IvanIsCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a great start! I would call it more than a draft given it already has good tests.

I left some Rust comments, I need to study the algorithm more myself to give more "algorithmic" advice

rustworkx-core/src/community/louvain.rs Outdated Show resolved Hide resolved
rustworkx-core/src/community/louvain.rs Outdated Show resolved Hide resolved
}

#[test]
fn test_louvain_karate_club_graph() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly reminder to myself that we should add a Karate club graph generator. It could be a separate PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #1280 gets merged this test will hopefully be much shorter

// happen to get the same result as:
// import networkx as nx
// g = nx.karate_club_graph()
// communities = nx.community.louvain_communities(g, weight='weight', seed=12)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a brittle test (https://testing.googleblog.com/2024/04/how-i-learned-to-stop-writing-brittle.html) but leave a TODO there and we can revisit it later. I am glad it passes though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet what to do with this. I can adjust the resolution parameter so that we consistently get two output communities. However, there is still a lot of dependence on the seed (in networkx I get at least 5 different partitions with the same parameters)

rustworkx-core/src/community/louvain.rs Outdated Show resolved Hide resolved
};

let m: f64 = total_edge_weight(self.graph);
sigma_internal / m - resolution * sigma_total_squared / (m * m)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to reorganize this expression to avoid floating point erros, specially because of the common factor 1/m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started doing this in 58857fd. Other than the common factor what are the possible issues? I think sigma_internal, sigma_total_squared / m, and m are all about the same order of magnitude but could be missing something.

@jpacold
Copy link
Contributor Author

jpacold commented Sep 10, 2024

  • Fixed two mistakes in the modularity gain calculation (I am actually a little surprised that the tests passed earlier and need to come up with some more)
  • Addressed some of the comments

It looks as if the CI is failing because I used associated type bounds.

@IvanIsCoding
Copy link
Collaborator

  • Fixed two mistakes in the modularity gain calculation (I am actually a little surprised that the tests passed earlier and need to come up with some more)
  • Addressed some of the comments

It looks as if the CI is failing because I used associated type bounds.

Indeed our current MSRV is Rust 1.70 from June 2023. Rust 1.79 is from June 2024. In the past we used to follow Debian stable's Rustc which believe it or not is in 1.63 still.

I'd try rewriting it before we discuss bumping MSR, 1.79 might break a couple people that install from source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants