An initial study of researcher relationships within DCU’s School of Computing revealed little collaboration. In this investigation, we explored this observation further. First, we scraped faculty members’ Google Scholar and DORAS profiles to extract key information, using the BeautifulSoup and Selenium libraries on PySpark. Next, we tested our initial observation by applying graph algorithms with GraphX. Overall, we believe that the results of this investigation are important for understanding research partnerships within DCU. We hope that our work provides useful information that enables cooperation, especially for early-career researchers who may not know the expertise and influence of certain faculty members.
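To give a feel for the extraction step, here is a minimal sketch of pulling titles and author lists out of a profile page. It is not the project's scraper: it uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup, and the markup, class names (`pub`, `title`, `authors`) and researcher names are invented for illustration. Real Google Scholar and DORAS pages are structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup: each publication is a <tr class="pub"> holding an
# <a class="title"> and a <div class="authors">. Names are made up.
SAMPLE_HTML = """
<table>
  <tr class="pub"><td><a class="title">Graph Mining at Scale</a>
      <div class="authors">A. Byrne, C. Doyle</div></td></tr>
  <tr class="pub"><td><a class="title">Neural Retrieval</a>
      <div class="authors">C. Doyle</div></td></tr>
</table>
"""


class PublicationParser(HTMLParser):
    """Collects one {"title", "authors"} record per <tr class="pub"> row."""

    def __init__(self):
        super().__init__()
        self.publications = []
        self._field = None  # field currently being read: "title", "authors" or None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "tr" and css_class == "pub":
            # A new publication row starts: open an empty record.
            self.publications.append({"title": "", "authors": []})
        elif css_class in ("title", "authors") and self.publications:
            self._field = css_class

    def handle_endtag(self, tag):
        # Leaving a title link or an authors block ends the current field.
        if tag in ("a", "div"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        record = self.publications[-1]
        if self._field == "title":
            record["title"] += text
        else:
            # Author lists are assumed to be comma-separated.
            record["authors"].extend(
                name.strip() for name in text.split(",") if name.strip()
            )


def parse_publications(html):
    """Return the list of publication records found in an HTML string."""
    parser = PublicationParser()
    parser.feed(html)
    return parser.publications


if __name__ == "__main__":
    for publication in parse_publications(SAMPLE_HTML):
        print(publication["title"], publication["authors"])
```

The author lists produced this way are the raw material for a collaboration graph: every pair of names that shares a publication becomes an edge.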
Our midway and final reports can be found in the reports folder.
We use GraphX (Scala) to implement our graph algorithms. sbt assembly is used to build a Scala application that exposes an endpoint for every graph algorithm. The bash script used to start the master/worker nodes and run the graph algorithms can be found here. The Scala app can be found here.
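As an illustration of the kind of question a graph algorithm can answer here, the sketch below builds a co-authorship graph and finds its connected components, i.e. groups of researchers linked by at least one chain of shared publications. This is plain Python, not the project's Scala code. This README does not list which algorithms the application implements, so connected components is chosen only as an example (GraphX offers it as `connectedComponents`). The author labels and publication lists are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(publications):
    """Return an adjacency map linking every pair of authors on the same paper."""
    graph = defaultdict(set)
    for authors in publications:
        for author in authors:
            graph[author]  # touch the key so solo authors appear as isolated vertices
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the components as sorted lists, using an iterative depth-first search."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Invented example: each inner list is the author list of one publication.
PUBLICATIONS = [
    ["A", "B"],
    ["B", "C"],
    ["D"],
    ["E", "F"],
]

if __name__ == "__main__":
    graph = build_coauthor_graph(PUBLICATIONS)
    for component in connected_components(graph):
        print(component)
```

Many small components relative to the number of researchers would be consistent with the low level of collaboration noted in the initial study, while one large component would suggest the opposite.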
Our PySpark analysis can be found in this notebook.
A video of us running the described technologies can be found here.