A recipe for analysing co-hashtag networks with Gephi
This recipe illustrates how to analyse a co-hashtag network with Gephi. In particular, this recipe shows how to deal with very dense networks (i.e., hairballs), and it focuses on which strategies may be used to detect clusters of hashtags used together. The recipe touches upon four main strategies, which can and should be used iteratively until clusters of hashtags can be identified, named and interpreted: tweaking layout parameters, resizing nodes based on frequency, filtering nodes, and colouring nodes based on a community detection algorithm.
This recipe can be used to work with any co-hashtag network (e.g., from Twitter or Instagram). It assumes you already have a working network file that can be opened with Gephi (.gdf, .gephi, or two tables with edges and nodes data).
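If you do not yet have such a file, the sketch below shows one possible way to build one. It is only an illustration and not part of the original recipe: the sample `posts` and the function names `build_cohashtag_network` and `to_gdf` are hypothetical, and it assumes each post has already been reduced to its list of hashtags. It counts how often each hashtag occurs (the frequency column used later for filtering and resizing) and how often each pair of hashtags appears in the same post (the edge weight), then serialises both as GDF text.

```python
from collections import Counter
from itertools import combinations

def build_cohashtag_network(posts):
    """Return (hashtag frequency, pair co-occurrence counts) for a list of posts."""
    frequency = Counter()
    cooccurrence = Counter()
    for hashtags in posts:
        # Lower-case and de-duplicate so each hashtag counts once per post.
        unique = sorted({tag.lower() for tag in hashtags})
        frequency.update(unique)
        # Every unordered pair in the same post is one co-occurrence.
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence

def to_gdf(frequency, cooccurrence):
    """Serialise nodes and edges as GDF text (node section, then edge section)."""
    lines = ["nodedef>name VARCHAR,label VARCHAR,frequency INTEGER"]
    for tag, count in sorted(frequency.items()):
        lines.append(f"{tag},{tag},{count}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for (a, b), weight in sorted(cooccurrence.items()):
        lines.append(f"{a},{b},{weight}")
    return "\n".join(lines) + "\n"

# Hypothetical sample: each inner list holds the hashtags of one post.
posts = [
    ["climate", "strike", "fridaysforfuture"],
    ["climate", "strike"],
    ["climate", "energy"],
]
frequency, cooccurrence = build_cohashtag_network(posts)
gdf_text = to_gdf(frequency, cooccurrence)
print(gdf_text)
```

Writing `gdf_text` to a file with a `.gdf` extension should give a file that Gephi can open, with `frequency` available as a node column.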
The goal of the following procedure is to identify clusters of hashtags that are often used together. Depending on the size and shape of the dataset, different parameters may be needed, so this guide is not meant to be followed literally. You might need to go over these four strategies more than once, adjust parameters, and repeat. Moreover, it is good to embrace ambiguity as an interpretative asset for visual network analysis.
When working with particularly big networks, you need to filter out some nodes based on various parameters to find a balance between complexity and readability. In this step, we apply different filters that can be used to reduce the size of the network. Depending on your dataset, you might need to filter more or fewer nodes (or not filter any).
- The first thing to do is always to delete the query nodes, that is, the hashtags used for collecting the co-hashtag network. These are usually the biggest nodes in the network (as they are connected to most of the other nodes). Go to the Data Laboratory, sort the column containing hashtag frequency from high to low, select the query hashtags at the top of the table, right-click, and select Delete All.
- You can also filter out hashtags with a low frequency. In Overview, find the Filters panel (usually on the right of the interface). Under Attributes → Range, select the filter named after the column containing hashtag frequency. Drag the filter into the bottom panel. Using the slider at the bottom, set the minimum hashtag frequency you want to keep (you can also type the number).
- We then calculate the Degree of each node. In the Statistics Panel, select Average Degree and click Run.
- We then filter the network again based on the calculated Degree. In the Filters panel, under Attributes → Range, select Degree. Drag the filter into the bottom panel as a subfilter (over the “Drag subfilter here” button). Using the slider at the bottom, select the minimum Degree (you can also type the number).
- Select Filter to activate the filters
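The logic of these filtering steps can be sketched outside Gephi as follows. This is a simplified illustration under stated assumptions, not a description of Gephi's internals: the function name `filter_network`, the sample data, and the thresholds are hypothetical, and the degree is computed once after deleting the query nodes, so both thresholds are checked against stored values in the same way as the frequency and Degree columns.

```python
from collections import Counter

def filter_network(frequency, edges, query_tags, min_frequency, min_degree):
    """Return (kept nodes, kept edges) after the filtering steps."""
    # 1. Delete the query nodes and every edge attached to them.
    nodes = {tag for tag in frequency if tag not in query_tags}
    edges = [(a, b) for a, b in edges if a in nodes and b in nodes]

    # 2. Compute the degree once, on the network without query nodes.
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # 3. Keep nodes that pass both the frequency and the degree threshold.
    kept = {
        tag for tag in nodes
        if frequency[tag] >= min_frequency and degree[tag] >= min_degree
    }
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges

# Hypothetical data: "climate" is the hashtag used to collect the dataset.
frequency = {"climate": 90, "strike": 40, "energy": 25, "solar": 12, "typo": 1}
edges = [
    ("climate", "strike"), ("climate", "energy"), ("climate", "solar"),
    ("climate", "typo"), ("strike", "energy"), ("energy", "solar"),
    ("strike", "solar"),
]
kept, kept_edges = filter_network(
    frequency, edges, query_tags={"climate"}, min_frequency=5, min_degree=2
)
print(sorted(kept), len(kept_edges))
```

Raising `min_frequency` or `min_degree` removes more nodes, so in practice the thresholds are adjusted until the network is readable without losing the hashtags of interest.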
In this step, we resize nodes based on how many times hashtags occur in the dataset. This step assumes that you have a column in the node table containing hashtag frequency.
- In the Data Laboratory, select the Nodes tab and locate the column containing hashtag frequency.
- In Overview, go to the Appearance panel and select Nodes
- In the Appearance panel, click on the Size icon (three sized circles)
- In the Appearance panel, click on Ranking
- From the drop-down menu, select the name of the column containing hashtag frequency
- Set values for minimum and maximum size: Gephi will map hashtag frequency values against this numerical domain.
- Click Apply
- You may want to modify minimum and maximum values and click Apply again.
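The mapping from frequency to node size can be pictured as a rescaling of one numerical range onto another. The sketch below assumes a plain linear interpolation between the minimum and maximum size; the function name `rank_sizes` and the sample values are hypothetical and only serve as an illustration.

```python
def rank_sizes(frequency, min_size, max_size):
    """Map each frequency linearly onto the range [min_size, max_size]."""
    low, high = min(frequency.values()), max(frequency.values())
    span = high - low
    sizes = {}
    for tag, value in frequency.items():
        # Position of this value between the smallest and largest frequency.
        ratio = (value - low) / span if span else 0.0
        sizes[tag] = min_size + ratio * (max_size - min_size)
    return sizes

# Hypothetical frequencies for three hashtags.
frequency = {"strike": 40, "energy": 25, "solar": 10}
sizes = rank_sizes(frequency, min_size=10, max_size=50)
print(sizes)
```

The least frequent hashtag receives the minimum size and the most frequent the maximum, which is why widening the gap between the two values makes frequent hashtags stand out more.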
In this step, we apply a layout algorithm for network spatialisation. We will use ForceAtlas2, a force-directed layout, where nodes tend to be pushed apart, and edges attract nodes. The result is a continuous layout that seeks a balance between these two opposing forces. Parameters can be tweaked to improve the layout. For more information on reading force-directed layouts in visual network analysis, you can refer to this paper.
- In Overview, locate the Layout panel (in the default workspace, usually in the bottom-left part of the interface)
- From the drop-down menu, choose a layout algorithm. We will use ForceAtlas2
- Modifying the parameters in the menu will change the layout behaviour. You can hover over each parameter with the mouse to read a detailed description. This guide focuses on the main ones.
- Scaling and Gravity: these two parameters in combination can control how sparse or dense your network will be. Increase scaling for a more sparse graph. Increase Gravity for a denser graph.
- LinLog mode: for very connected networks, it may be useful to check this option. The LinLog option modifies how the attraction between connected nodes is calculated, making it proportional to the logarithm of the distance (instead of linearly proportional). Choosing LinLog mode will result in more visible clusters.
- After setting all the parameters, click Run to run the layout algorithm. As long as you do not press Stop, Gephi will continue to calculate the position of the nodes, trying to balance the forces of repulsion and attraction.
- You may want to modify parameters depending on how the network looks. It may take some time before clusters of nodes become identifiable.
- When done, press Stop to stop the layout algorithm.
- The Prevent overlap function may be used at the end of the process to avoid overlapping nodes. Check the box and click Run again.
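To make the role of Scaling, Gravity and LinLog mode more concrete, the sketch below writes out the forces in a simplified form, following the formulas in the published description of ForceAtlas2. It assumes unweighted edges and ignores options such as Prevent overlap; the function and parameter names are hypothetical and do not correspond to Gephi's code.

```python
import math

def attraction(distance, linlog=False):
    """Pull between two connected nodes (edge weight assumed to be 1)."""
    return math.log(1 + distance) if linlog else distance

def repulsion(degree_a, degree_b, distance, scaling=1.0):
    """Push between any two nodes; well-connected nodes repel more strongly."""
    return scaling * (degree_a + 1) * (degree_b + 1) / distance

def gravity(degree, strength=1.0):
    """Pull of a node towards the centre of the layout."""
    return strength * (degree + 1)

# Compare linear and LinLog attraction at increasing distances.
for d in (1, 10, 100):
    print(d, attraction(d), round(attraction(d, linlog=True), 2))
```

In this simplified form, a higher `scaling` strengthens repulsion and spreads the graph out, a higher gravity `strength` pulls nodes towards the centre, and LinLog attraction grows much more slowly with distance than linear attraction.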
Community detection techniques are helpful in identifying tightly connected nodes (i.e., clusters of hashtags most often used together). In this step, we apply a community detection algorithm and colour nodes accordingly.
- In Overview, go to the Statistics panel
- Run Modularity
- In the Modularity settings box, you can set the resolution value: lower it to get more communities, raise it to get fewer, bigger communities.
- Click Ok and close the Report box
- Go to the Appearance panel and select Nodes
- In the Appearance panel, click on the Color icon (small icon of the colour palette)
- Choose Partition. From the drop-down menu, select Modularity Class and Apply
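The score that this kind of community detection tries to increase can be illustrated with a small calculation. The sketch below computes the standard modularity of a given partition for an undirected, unweighted network. It is an assumption-laden illustration only: the function name `modularity` and the toy network are hypothetical, it scores a partition rather than searching for one, and it does not model the resolution parameter.

```python
from collections import Counter

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = Counter()      # edges with both ends in the same community
    total_degree = Counter()  # sum of node degrees per community
    for a, b in edges:
        total_degree[community_of[a]] += 1
        total_degree[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    # Share of internal edges minus the share expected at random.
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Hypothetical toy network: two triangles joined by a single edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two > q_one)
```

Splitting the toy network into its two triangles scores higher than keeping all nodes in one group, which is the kind of comparison that leads to nodes being assigned to different Modularity Class values.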