- can this be the visualised?
- how does the group work? Or rather can we get a sense of how the group works?
- Is there some insights that we can draw about the group from more quantitative approaches - if you like (and I do) social network analysis based on publications.
The particular group is a group of computing researchers based at the University of Northampton. The data comes from the University's repository - NECTAR. It is not expected that the process will reveal a highly nuanced analysis, lots of personal aspects important in research collaboration won't be picked up; but the goal is just to have a starting point.
Collecting the Data.
This is the easiest part, in this case, because the data comes from the repository we just need it in a form suited to the tools that follow. A two-stage process was followed
- Go to http://nectar.northampton.ac.uk/view/divisions/SSTCT.html the page linking to all the papers in the repository categorised as from the computing team;
- From there export the list into Reference Manager/.RIS format the tool for the next stage can take information in that format.
Starting the visualisation: coauthors
My tool of choice is VosViewer (http://www.vosviewer.com/). Once the software is running, find the create button and then load your .RIS file from the previous stage, I select to not include all groups just to visualise the main group (13 records were excluded). Some of the other settings including authors even if they were a co-author on one paper.
In this example we get the following:
Connections appear as straight lines and the maximum possible lines was set to show all the lines. It seems to show 'hubs' the larger the circle the more documents they have in the repository. The last thing I am going to do is save the network as a Pajek network file using the save button on the lefthand menu and select the option.
Putting some numbers to it!
Love the visuals but now I want to quantify the node (the authors) role in the network of co-authors. All of that is really saying can we get some new insights from looking at the network, as a social networking analysis task. My favourite tool for this is the widely used free and open-source Gephi (https://gephi.org/).
Load in the Pajek formatted file saved in the last stage (it should have a .net extension). Once the file is loaded (in the options make sure is set to be an undirected graph), it usually puts the viewer into the Data Laboratory view, change the view to Overview. The graph may now appear as a single circle on the screen, now the fun can begin. Down the lefthand side of the screen there is a Tab marked Layout which brings-up a dropdown menu of different layouts have a play, select one and press run and see what they do.
Down the right side of the screen there a number of options. What we are going to focus on here, as an example, is the Edge Overview and the one option there Average Path Length; find the option and press run a screen will come up saying the three measures (see above) we will get, press ok, and then change the view to Data Laboratory. Below are some initial insights for these measures
- Betweenness Centrality- sorting based on the one showed the 'hubs' that were presented in the first figure score highly. One aspect I wasn't expecting was two MSc Computing student who have published scored quite highly due to the papers they published linking authors who haven't collaborated previously
- Closeness Centrality: Again the 'hubs' scored highly but also so did one of the MSc students.
- Eccentricity: A low eccentricity score was seen for the hubs in general.
The insight I have found most interesting is the idea that MSc students publishing might have a positive knock-on effect on connecting staff researchers.
All views and opinions are the author's and do not necessarily reflected those of any organisation they are associated with. Twitter: @scottturneruon
No comments:
Post a Comment