Friday, 10 January 2020

Social media network Analysis: Going from TAGS to Gephi

A previous post introduced TAGS, the brilliant Twitter capture and analysis tool developed by Martin Hawksey (@mhawksey). In this post, I am going to show how to take the data collected and transfer some of it into the network visualisation tool Gephi.


Setting up Gephi
The software is open-source and can be downloaded from https://gephi.org/. Gephi does have some system requirements, which can be found here, but it should run on a lot of machines (Mac, PC and Linux with a graphics card). Java will need to be added if it is not already installed.

I ran into a problem, and was very relieved to find out I was not the only one: Gephi would not work because it couldn't find Java. The bug is known and a fix is available; how to apply it can be found at https://bryanemichael.wordpress.com/2017/04/11/gephi-0-9-1-wont-work-unless-you-specific-a-java-folder/ (thank you @NomadWarMachine for sharing this) or, on a Mac, at http://forum-gephi.org/viewtopic.php?f=3&t=3580#p10873

Hopefully Gephi now works.



Transferring the data from TAGS 
We are going to have a play with showing who replies to whom (a crude method, but it shows some of the links happening). The video below shows the stages in action.




Stage 1: Open the TAGS spreadsheet and go to the Archive sheet. Save this sheet as a CSV file.

Stage 2: In a spreadsheet package, open the new CSV file and remove the first column (id_str). Save it again as a CSV file and it is ready to load into Gephi. Don't close the spreadsheet; we will want it later.
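If you would rather script this step than do it by hand, here is a minimal sketch in Python using only the standard csv module. It assumes, as in the stage above, that id_str is the first column; the function name and the sample rows are made up for illustration.

```python
import csv
import io


def drop_first_column(src, dst):
    """Copy CSV rows from src to dst, leaving out the first column."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row[1:])


# In-memory demonstration with made-up rows. For real files, pass in
# open("archive.csv", newline="", encoding="utf-8") and an output file
# opened with "w" (both file names are hypothetical).
sample_archive = io.StringIO("id_str,from_user,text\n1,alice,hello\n2,bob,hi\n")
nodes_csv = io.StringIO()
drop_first_column(sample_archive, nodes_csv)
print(nodes_csv.getvalue())
```

The result is the same table without id_str, which is what Stage 3 imports as a nodes table.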

Stage 3: Open Gephi and, from the File menu, import the spreadsheet. It will give an error; just change the "Import as:" setting to "Nodes table", then press Next >, then Finish, and then OK. You will probably get a graph with lots of dots (e.g. like the figure below) but not much more, because there are no connections between the nodes yet.



Stage 4: Go back to the spreadsheet used in Stage 2. In this example we are only interested in connections based on replies, so delete all columns except from_user and in_reply_to_screen_name, and save the spreadsheet as a CSV file under a different file name. Rename the from_user heading to Source and the in_reply_to_screen_name heading to Target (Gephi needs these two headings for the next stage), then save again.
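The same stage can be sketched in Python. This is only an illustration: it assumes the CSV still has the from_user and in_reply_to_screen_name headings described above, and the function name and sample rows are invented. Rows with no reply target are kept with an empty Target, mirroring what the manual steps produce.

```python
import csv
import io


def build_edge_list(src, dst):
    """Write a Source,Target CSV from the from_user and
    in_reply_to_screen_name columns of a TAGS archive CSV."""
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for row in reader:
        # Tweets that are not replies have no target; keep them as blanks.
        source = row.get("from_user") or ""
        target = row.get("in_reply_to_screen_name") or ""
        writer.writerow([source, target])


# In-memory demonstration with made-up rows; swap in real file objects
# to process an actual export.
sample_tweets = io.StringIO(
    "from_user,text,in_reply_to_screen_name\n"
    "alice,hello,bob\n"
    "carol,hi,\n"
)
edges_csv = io.StringIO()
build_edge_list(sample_tweets, edges_csv)
print(edges_csv.getvalue())
```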

Stage 5: Now go back into Gephi and import the new spreadsheet from Stage 4. This time it should find Source and Target and not give an error. Follow the clicks through; at some point it will say that there are missing values for Target. Ignore this, but set the merge setting to something like Sum. You should now get a plot a bit like the one seen in the video, with nodes connected by edges in most cases. At the bottom of the graph there is an arrow icon; click on it and we can add labels to the graph. It is worth playing around with this to work out how it operates.
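To get a feel for what a Sum merge does with repeated replies, here is a small Python sketch. It is my reading of the idea rather than Gephi's own code: it assumes every row counts as a weight of 1, that repeated Source and Target pairs are added together, and that rows with a missing Target are skipped. The names and sample data are made up.

```python
from collections import Counter


def sum_parallel_edges(edges):
    """Count repeated (source, target) pairs, skipping rows with a blank end."""
    return Counter((s, t) for s, t in edges if s and t)


# Made-up replies: alice replies to bob twice, bob replies once,
# and carol's tweet is not a reply at all.
sample_edges = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", ""),
]
edge_weights = sum_parallel_edges(sample_edges)
for (source, target), weight in edge_weights.items():
    print(source, "->", target, "weight", weight)
```

Direction is kept here, so alice replying to bob and bob replying to alice count as two separate edges.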



Now that we have the data in Gephi, we can apply different ways to present and analyse the network. This will be discussed in a later blog post.

Some more information on this and some other techniques can be found at:








All views and opinions are the author's and do not necessarily reflect those of any organisation they are associated with. Twitter: @scottturneruon
