Commit

communities
yy committed Feb 20, 2024
1 parent dfe73d5 commit d681d62
Showing 4 changed files with 582 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/m07-communities/class.md
@@ -0,0 +1,6 @@
Probably the most basic way to quantify network community structure is in terms of categorical homophily.
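
As a rough illustration of what "categorical homophily" can mean in practice, the sketch below computes Newman's assortativity coefficient for a categorical node attribute: the fraction of edges joining same-label nodes, compared with what random mixing would give. The function name `categorical_assortativity`, the toy edge list, and the labels are all made up for this example and are not part of the course material; networkx offers a comparable measure as `nx.attribute_assortativity_coefficient`.

```python
from collections import Counter


def categorical_assortativity(edges, category):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: iterable of undirected (u, v) pairs.
    category: dict mapping each node to its label.
    Assumes at least two labels appear on edge endpoints
    (otherwise the denominator below is zero).
    """
    edges = list(edges)
    m = len(edges)

    # Observed fraction of edges whose endpoints share a label.
    same = sum(category[u] == category[v] for u, v in edges) / m

    # Fraction of edge endpoints ("stubs") attached to each label.
    stubs = Counter()
    for u, v in edges:
        stubs[category[u]] += 1
        stubs[category[v]] += 1

    # Expected same-label fraction if stubs were paired at random.
    expected = sum((count / (2 * m)) ** 2 for count in stubs.values())

    # 1 means perfect homophily, 0 means random mixing, negative means heterophily.
    return (same - expected) / (1 - expected)


# Hypothetical toy graph: two triangles ("a" and "b") joined by one bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

r = categorical_assortativity(toy_edges, toy_labels)
print(round(r, 3))  # 0.714
```

Here six of the seven edges stay within a group, while random pairing of stubs would put only half of them there, so the coefficient is (6/7 - 1/2) / (1 - 1/2) = 5/7.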


This paper overviews multiple perspectives for community detection.

- M. T. Schaub et al., [The many facets of community detection in complex networks](https://appliednetsci.springeropen.com/articles/10.1007/s41109-017-0023-6), Applied Network Science 2:4 (2017) https://doi.org/10.1007/s41109-017-0023-6
279 changes: 279 additions & 0 deletions docs/m07-communities/lab07-link.ipynb
@@ -0,0 +1,279 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table class=\"m01-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/yy/netsci-course/blob/master/m08-communities/link_communities_assignment.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a href=\"https://github.com/yy/netsci-course/blob/master/m08-communities/link_communities_assignment.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View on Github</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Link communities\n",
"For this assignment we will look at link communities and how they differ from node communities. To do this we will use the algorithm discussed in the reading (\"Link communities reveal multiscale complexity in networks\") and in the link community video on Canvas.\n",
"\n",
"A small Python module has been prepared that will allow you to use the link community algorithm with Anaconda and Python 3.5. To install the module, open a terminal or shell and run:\n",
"\n",
"```\n",
"pip install git+https://github.com/Nathaniel-Rodriguez/linkcom.git\n",
"```\n",
"\n",
"This will install the package locally in your Anaconda site-packages directory (the same place where the `conda` command would install new packages). Make sure you have [git](https://git-scm.com/) installed first. If you are using Windows you will need to use the Anaconda command prompt when using pip, so that it adds the package to Anaconda. If you have trouble installing the package you can just unpack the zip file from the [github repository](https://github.com/Nathaniel-Rodriguez/linkcom/tree/master/linkcom) and put the linkcom folder in your working directory.\n",
"\n",
"To use the package you can do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import linkcom"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code has been adapted so that it works with networkx graphs. The link communities algorithm requires a simple undirected graph, meaning no self-loops and no parallel edges. Weighted graphs are supported."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# First let's import networkx\n",
"import networkx as nx\n",
"\n",
"# And generate a new graph\n",
"my_graph = nx.erdos_renyi_graph(100, 0.1)\n",
"\n",
"# We need to make sure this is a graph of type Graph\n",
"print(type(my_graph).__name__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the graph you load isn't of type Graph (it may be a MultiGraph or DiGraph), it is easy to convert it to one (the conversion merges parallel edges and drops edge directions, but keeps any self-loops):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_graph = nx.Graph(my_graph)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using linkcom\n",
"Now let's call the `cluster` method in `linkcom` to cluster the links of the graph. The `cluster` method takes several optional arguments:\n",
"\n",
"```\n",
"linkcom.cluster(nx_graph, threshold=None, is_weighted=False, weight_key='weight', dendro_flag=False, to_file=False, basename=\"clustering\", delimiter='\\t')\n",
"```\n",
"\n",
"`threshold`: sets the similarity cut-off for the dendrogram.\n",
"\n",
"`is_weighted`: can be `True` or `False` depending upon whether the graph has weights or not. Set this to `True` if the graph is weighted.\n",
"\n",
"`weight_key`: specifies what attribute the edges have that has weight values. In networkx it is convention that this key be set to `weight`. Most functions in networkx will assume this is the key. This is also the default value for the cluster method.\n",
"\n",
"`dendro_flag`: specifies whether to return the dendrogram (only applicable if the graph is unweighted and no threshold is given).\n",
"\n",
"`to_file`: specifies whether to write the outputs to file. Several files will be written and given names based on `basename` with elements separated by `delimiter`.\n",
"\n",
"These outputs will be written to file:\n",
"\n",
"```\n",
"Three text files with extensions .edge2comm.txt, .comm2edges.txt,\n",
"and .comm2nodes.txt store the communities.\n",
"\n",
"edge2comm, an edge on each line followed by the community\n",
"id (cid) of the edge's link comm:\n",
"node_i <delimiter> node_j <delimiter> cid <newline>\n",
"\n",
"comm2edges, a list of edges representing one community per line:\n",
"cid <delimiter> ni,nj <delimiter> nx,ny [...] <newline>\n",
"\n",
"comm2nodes, a list of nodes representing one community per line:\n",
"cid <delimiter> ni <delimiter> nj [...] <newline>\n",
"\n",
"The output filename contains the threshold at which the dendrogram\n",
"was cut, if applicable, or the threshold where the maximum\n",
"partition density was found, and the value of the partition \n",
"density.\n",
"\n",
"If no threshold was given to cut the dendrogram, a file ending with\n",
"'_thr_D.txt' is generated, containing the partition density as a\n",
"function of clustering threshold.\n",
"\n",
"If the dendrogram option was given, two files are generated. One with\n",
"'.cid2edge.txt' records the id of each edge and the other one with\n",
"'.linkage.txt' stores the linkage structure of the hierarchical \n",
"clustering. In the linkage file, the edge in the first column is \n",
"merged with the one in the second at the similarity value in the \n",
"third column.\n",
"```\n",
"\n",
"The cluster method returns a tuple whose elements depend on the arguments:\n",
"\n",
"- If no threshold is given, then a tuple is returned with: (dict) dictionary with keys=edges and values=community membership, (float) best similarity, (float) best partition density, (list) partition density list.\n",
"\n",
"- If dendro_flag is given (only applicable if no threshold), then a tuple is returned with: (dict) dictionary with keys=edges and values=community membership, (float) best similarity, (float) best partition density, (list) partition density list, (dict) keys=edges and values=community membership for original, (list) dendrogram.\n",
"\n",
"- If threshold is given, a tuple is returned with: (dict) dictionary with keys=edges and values=community membership, (float) partition density at threshold.\n",
"\n",
"You will mostly just be interested in the dictionary that contains the community assignment data, which is always the first element of the tuple. Let's do a short example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e2c, S, D, Dlist = linkcom.cluster(my_graph)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Dlist"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we print `e2c` we will see that each edge has a community membership:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(e2c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this is a random graph, we do not expect any meaningful communities, and indeed all edges share a single label, `56` (the community ID; your value may differ because the graph is random), so there don't appear to be any link communities in the graph.\n",
"\n",
"We can now take our results and put them back into our graph so that it can be saved and viewed in Gephi. We can do this using the [`set_edge_attributes`](https://networkx.github.io/documentation/stable/reference/generated/networkx.classes.function.set_edge_attributes.html?highlight=set_edge_attributes#networkx.classes.function.set_edge_attributes) function in networkx. It works just like the `set_node_attributes` function from previous assignments, but with edges instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Put the link communities into the graph\n",
"# Note: this function's argument order depends on the networkx version.\n",
"# In networkx 1.x (before 2.0) this line should be\n",
"# nx.set_edge_attributes(my_graph, \"linkcom\", e2c)\n",
"nx.set_edge_attributes(my_graph, e2c, \"linkcom\")\n",
"\n",
"# Save the graph to file\n",
"nx.write_gexf(my_graph, \"my_graph.gexf\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now if we were to open the graph in Gephi we should be able to use the link communities to color the graph edges. In Gephi you may need to click the little attribute type button so that edge attributes are set to [ranked rather than numeric](https://gephi.org/tutorials/gephi-tutorial-quick_start.pdf). This is because we want to color the edges according to their membership and not with a gradient. Additionally, in order to keep the edge colors when saving the graph you will need to make sure the edge color in the Preview tab is set to `original`. Lastly, since you will be looking at link communities (which determine the node membership in link clustering) it will be helpful to increase the size of the edges in Gephi so the colors are more visible and so you can detect nodes that belong to multiple communities.\n",
"\n",
"## The task\n",
"Choose one of these two datasets:\n",
"\n",
"One is the NetSci collaboration graph. The nodes of the graph are people, and links are formed between people who have co-authored a scientific paper in network science. You can download it from [here](http://vlado.fmf.uni-lj.si/pub/networks/data/collab/netscience.htm). Hint: load with `read_pajek`.\n",
"\n",
"The graph has ~1500 nodes and is partly disconnected. If you have difficulty working with the full graph or trouble loading it into Gephi, you can use the largest connected component (which only has about 350 nodes). The largest connected component of a graph can be returned from networkx using:\n",
"\n",
"```\n",
"largest_component = my_graph.subgraph(max(nx.connected_components(my_graph), key=len)).copy()\n",
"```\n",
"\n",
"Most of the interesting structure is in this component anyway, so you don't lose much apart from the scientists and small groups that worked in isolation.\n",
"\n",
"Another option is the word association graph. Download: [here](https://www.dropbox.com/s/oky3cwwtwy1dfs0/word.edgelist?dl=0)\n",
"\n",
"Follow these steps for the assignment:\n",
"\n",
"Gephi users:\n",
"1. Load the graph and run the link communities clustering algorithm on it using the `linkcom` module, following the instructions above.\n",
"2. Save the link communities to the graph and save the graph to file.\n",
"3. In Gephi choose a good layout for the graph.\n",
"4. Run the modularity command to generate communities for the nodes.\n",
"5. Color the edges according to the link communities and the nodes according to the communities found by Gephi. Remember to take care in choosing the resolution parameter.\n",
"6. How well does link clustering do at detecting community structure? How well does Gephi's node modularity do at detecting community structure? What do you think the communities represent? You can just provide a qualitative observation based on your intuition.\n",
"7. What are the similarities and differences between the communities detected by either algorithm?\n",
"8. i. (For NetSci Collaboration Graph) Which authors have a prominent position in multiple communities? What do you think these authors' roles are? <br>\n",
" ii. (For Word Association Graph) Which words have a prominent position in multiple communities? What do you think these words' roles are?\n",
"9. What other features do you notice about the graph that are captured with overlapping communities?\n",
"10. Save your visualization to file.\n",
"11. Once complete, submit a PDF document to Canvas that contains your responses and your graph visualization (since this is a larger graph feel free to crop the figure so that it only includes parts relevant to your responses).\n",
"\n",
"Cytoscape users:\n",
"1. Instead of the code above, you may want to write the edges and their attributes into a `.csv` (comma-separated values) file. In Cytoscape, use \"File\" -> \"Import\" -> \"Table from file\".\n",
"2. Follow the instructions above. You can use some community detection plugins of your choice. It is ok if some details are different."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
295 changes: 295 additions & 0 deletions docs/m07-communities/lab07.ipynb

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -97,5 +97,7 @@ nav:
- m05-scalefree/lab05.ipynb
- "Module 6: Centralities":
- m06-centrality/lab06.ipynb
- "Module 7: Communities":
- m07-communities/lab07.ipynb
- "Resources":
- resources/index.md
