A recipe for finding associated hashtags from a co-hashtag network
This recipe illustrates a way to streamline hashtag snowballing, that is, starting from an initial selection of hashtags, finding other associated hashtags, and building a more extensive list of hashtags. This process helps expand a list of hashtags for querying social media platforms (Twitter or Instagram) to obtain a set of posts. The recipe assumes that you have an initial list of hashtags (provided by domain experts or compiled otherwise) and that you have used it to collect a co-hashtag network (i.e., a network of hashtags used together in the same posts). It is based on the notion of “query snowballing” (Rogers, 2018). We will use Gephi and Google Sheets to explore the co-hashtag network in search of other relevant hashtags.
This recipe works with any co-hashtag network (e.g., from Twitter or Instagram). It assumes you already have a network file that Gephi can open (.gdf, .gephi, or two tables containing the edge and node data).
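To make the idea of a co-hashtag network and its edge weights concrete, here is a minimal Python sketch of how such a network can be derived from posts: every pair of hashtags that appears in the same post adds 1 to that pair's weight. The `posts` data, the `co_hashtag_edges` helper, and the lower-casing step are illustrative assumptions; the tool you used to collect your network may count differently.

```python
from collections import Counter
from itertools import combinations


def co_hashtag_edges(posts):
    """Count how many posts use each unordered pair of hashtags together.

    posts: an iterable of hashtag lists, one list per post (hypothetical input).
    Returns a Counter mapping (tag_a, tag_b) -> weight, with tag_a < tag_b.
    """
    weights = Counter()
    for tags in posts:
        # Lower-case and de-duplicate within a post (an assumption; collection
        # tools may treat letter case and repeated hashtags differently).
        unique = sorted({t.lower() for t in tags})
        # Each unordered pair co-occurring in this post gains one unit of weight.
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


posts = [
    ["climate", "ClimateChange", "energy"],
    ["climate", "energy"],
    ["climatechange", "protest"],
]
for (a, b), w in co_hashtag_edges(posts).most_common():
    print(a, b, w)
```

Here the pair climate/energy gets weight 2 because it appears in two posts; every other pair appears once.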
- Open the co-hashtag network in Gephi. Depending on the file type (.gephi, .gdf), you might need to set some import options.
- Go to the Data Laboratory and select the edges table.
- Click Export Table, choose a filename and a folder, and save the table as a .csv file.
- Go to the Data Laboratory and select the nodes table.
- Click Export Table, choose a filename and a folder, and save the table as a .csv file.
- Create a new file in Google Sheets.
- Import the nodes table: File → Import → select the nodes table file.
- In the import file dialog, select “Replace current sheet” as the import location.
- In the same spreadsheet, create a new sheet by clicking the + button in the bottom-left corner of the interface.
- In the new sheet, import the edges table: File → Import → select the edges table file.
- In the import file dialog, again select “Replace current sheet”.
- To avoid confusion, rename the two sheets “nodes” and “edges” by double-clicking their names at the bottom of the interface (the formulas below refer to the “nodes” sheet by name).
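If you prefer to script the remaining steps instead of (or alongside) the spreadsheet, the two exported files can be loaded in Python with the standard `csv` module. This is a minimal sketch: the column names `Id`, `Label`, `Source`, `Target` and `Weight` are assumptions based on Gephi's usual export headers (check the first row of your own files), and the in-memory file contents stand in for real exports.

```python
import csv
import io


def read_table(f):
    """Read a CSV table exported from Gephi's Data Laboratory into a list of
    dicts, one per row, keyed by the header names (all values are strings)."""
    return list(csv.DictReader(f))


# Hypothetical file contents used in place of real exports. For real files use,
# e.g., open("edges.csv", newline="", encoding="utf-8-sig"); "utf-8-sig" also
# copes with a byte-order mark at the start of the file, if there is one.
nodes_csv = io.StringIO("Id,Label\n1,climate\n2,energy\n3,protest\n")
edges_csv = io.StringIO(
    "Source,Target,Type,Weight\n"
    "1,2,Undirected,5.0\n"
    "1,3,Undirected,2.0\n"
)

nodes = read_table(nodes_csv)
edges = read_table(edges_csv)
print(len(nodes), "nodes,", len(edges), "edges")
print(edges[0]["Source"], edges[0]["Target"], edges[0]["Weight"])
```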
The edges table describes the connections between hashtags and the strength of those connections (weight), that is, how many times two hashtags have been used together. This may vary depending on how the network dataset was generated, but most commonly the edges table refers to the two nodes of each connection by a numerical Id that uniquely identifies a node. Each unique Id is paired with the hashtag name in the Label column of the nodes table. In this step, we use a spreadsheet lookup function (VLOOKUP, i.e., vertical lookup) to copy hashtag names from the nodes table into the edges table, which makes the dataset easier to explore in the next step. In practice, we add two columns to the edges table, Source Label and Target Label, and populate them with hashtag names from the nodes table.
1. In the edges sheet, insert a new empty column to the right of the Source column and name it “Source Label”.
2. In the first cell below the header of the “Source Label” column (B2), enter the following formula and press Enter:
=VLOOKUP(A2,nodes!A:B,2,false)
(This assumes that Source is column A, that the nodes table is in a sheet named “nodes”, and that its first two columns (A and B) are the Id and Label columns. The last argument, false, requests an exact match.)
3. Fill the formula down the rest of the column by double-clicking the small square at the bottom-right corner of the cell (or by dragging it down).
4. Insert a new empty column to the right of the Target column and name it “Target Label”.
5. In the first cell below the header of the “Target Label” column (D2), enter the following formula and press Enter:
=VLOOKUP(C2,nodes!A:B,2,false)
(Target is now column C because inserting the “Source Label” column shifted it one position to the right. As before, this assumes that the nodes table is in a sheet named “nodes”, with Id and Label in columns A and B.)
6. Fill the formula down the column as in step 3.
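The two VLOOKUP formulas amount to a key lookup from Id to Label. The following minimal Python sketch does the same, assuming the edges and nodes tables are lists of dicts using the column names above (for example, as returned by `csv.DictReader`); `add_labels` is a hypothetical helper, not part of any library.

```python
def add_labels(edges, nodes):
    """Add 'Source Label' and 'Target Label' to each edge row, like the two VLOOKUPs.

    edges, nodes: lists of dicts keyed by column name (hypothetical inputs).
    """
    # Build the Id -> Label lookup once; exact key matching mirrors VLOOKUP(..., false).
    label_of = {row["Id"]: row["Label"] for row in nodes}
    labeled = []
    for row in edges:
        new_row = {
            "Source": row["Source"],
            # None plays the role of #N/A when an Id is missing from the nodes table.
            "Source Label": label_of.get(row["Source"]),
            "Target": row["Target"],
            "Target Label": label_of.get(row["Target"]),
        }
        # Keep the remaining columns (Type, Weight, ...) after the four above.
        for key, value in row.items():
            if key not in new_row:
                new_row[key] = value
        labeled.append(new_row)
    return labeled


nodes = [{"Id": "1", "Label": "climate"}, {"Id": "2", "Label": "energy"}]
edges = [
    {"Source": "1", "Target": "2", "Weight": "5.0"},
    {"Source": "1", "Target": "9", "Weight": "1.0"},  # Id 9 is not in nodes
]
for row in add_labels(edges, nodes):
    print(row["Source Label"], row["Target Label"], row["Weight"])
```

The column order of the output rows matches the sheet after steps 1 to 6: Source, Source Label, Target, Target Label, then the original remaining columns.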
Now that the edges table lists co-hashtag connections with readable hashtag names and their strength (i.e., weight), we can explore the hashtags and expand the list.
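- Filter the edges table on the Source Label column to show only the rows for a hashtag of interest. In an undirected network the same hashtag can also appear in the Target Label column, so check that column too.
- Sort the filtered rows by Weight, from highest to lowest, to identify the hashtags most often used together with it.
- When you spot a relevant hashtag among the most strongly connected ones, copy it into a new sheet.
- Repeat for other hashtags, including the ones you have just added, to snowball the list further.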
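The exploration loop can also be sketched in Python, assuming the labeled edges table is available as a list of dicts with Source Label, Target Label and Weight columns. `neighbors` reproduces filtering on one hashtag (in either label column) and sorting by weight; `snowball_candidates` repeats that for every hashtag in a seed list and proposes the strongest new co-hashtags. Both function names and the `top_n` cut-off are illustrative choices, and the result is only a shortlist to review by hand, as in the spreadsheet workflow.

```python
def neighbors(edges, hashtag):
    """Return (co-hashtag, weight) pairs for `hashtag`, strongest first.

    Both label columns are checked, since in an undirected network the
    hashtag can be stored as either the source or the target of an edge.
    """
    found = []
    for row in edges:
        if row["Source Label"] == hashtag:
            found.append((row["Target Label"], float(row["Weight"])))
        elif row["Target Label"] == hashtag:
            found.append((row["Source Label"], float(row["Weight"])))
    # Equivalent to sorting the filtered sheet by Weight, highest first.
    return sorted(found, key=lambda pair: pair[1], reverse=True)


def snowball_candidates(edges, seeds, top_n=3):
    """Collect, for each seed, its top_n strongest co-hashtags not already in seeds."""
    candidates = {}
    for seed in seeds:
        ranked = [(tag, w) for tag, w in neighbors(edges, seed) if tag not in seeds]
        for tag, weight in ranked[:top_n]:
            # Keep the highest weight seen for each candidate across seeds.
            candidates[tag] = max(weight, candidates.get(tag, 0.0))
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)


edges = [
    {"Source Label": "climate", "Target Label": "energy", "Weight": "5"},
    {"Source Label": "protest", "Target Label": "climate", "Weight": "3"},
    {"Source Label": "climate", "Target Label": "weather", "Weight": "1"},
    {"Source Label": "energy", "Target Label": "solar", "Weight": "4"},
]
print(neighbors(edges, "climate"))
print(snowball_candidates(edges, {"climate", "energy"}, top_n=2))
```

Adding the accepted candidates to `seeds` and calling `snowball_candidates` again gives the next round of snowballing.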