Graph Analysis of citations among DCU researchers using GraphX (Apache Spark).

reidya3/CitationGraphAnalysis

Citation Graph Analysis

Anthony Reidy, 18369643. Kian Sweeney, 18306226.


Preamble

An initial study of researcher relationships within DCU’s School of Computing revealed little collaboration between faculty members. In this investigation, we examined that observation in more depth. First, we scraped faculty members’ Google Scholar and DORAS profiles to extract information such as publications and citations, using the BeautifulSoup and Selenium libraries in PySpark. We then tested our initial observation by applying graph algorithms implemented in GraphX. We believe the results give a useful picture of research partnerships within DCU, and we hope they will encourage cooperation, especially among early-career researchers who may not yet know the expertise and influence of particular faculty members.
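As a rough illustration of how scraped publication lists can be turned into a collaboration graph, the sketch below builds an undirected co-authorship edge list from a mapping of researcher to paper titles, then lists researchers who share no paper with anyone else in the input. The input format, the toy names and the functions coauthor_edges and isolated_researchers are hypothetical; the project's actual scraped data (collected with BeautifulSoup and Selenium) may be structured differently, and matching papers by lower-cased title is a simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_edges(papers_by_researcher):
    """Return a sorted list of (a, b) pairs of researchers who share at least one paper.

    papers_by_researcher maps a researcher name to an iterable of paper titles
    (hypothetical format; titles are matched after stripping and lower-casing).
    """
    # Invert the mapping: normalised paper title -> set of researchers listing it.
    authors_by_paper = defaultdict(set)
    for researcher, titles in papers_by_researcher.items():
        for title in titles:
            authors_by_paper[title.strip().lower()].add(researcher)

    edges = set()
    for authors in authors_by_paper.values():
        # Every pair of researchers on the same paper becomes one undirected edge.
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return sorted(edges)


def isolated_researchers(papers_by_researcher, edges):
    """Researchers who appear in no co-authorship edge."""
    connected = {name for edge in edges for name in edge}
    return sorted(set(papers_by_researcher) - connected)


if __name__ == "__main__":
    # Toy data for illustration only; these are not scraped results.
    papers = {
        "Researcher A": ["Paper 1", "Paper 2"],
        "Researcher B": ["paper 2 ", "Paper 3"],
        "Researcher C": ["Paper 4"],
    }
    edges = coauthor_edges(papers)
    print(edges)
    print(isolated_researchers(papers, edges))
```

A high share of isolated researchers in such an edge list is one simple way to quantify the "little collaboration" observation before moving on to heavier graph algorithms.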

Report

Our midway and final reports can be found in the reports folder.

Technologies Used

[Image: technologies used in this project]

Graph Analytics

We use GraphX (Scala) to implement our graph algorithms. The Scala application is packaged with sbt assembly and exposes an endpoint for each graph algorithm. The bash script used to start the master/worker nodes and run the graph algorithms can be found here. The Scala app can be found here.
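To make the "one endpoint per graph algorithm" idea concrete, here is a minimal plain-Python sketch of the same dispatch pattern over a small directed citation edge list of (citing, cited) pairs. It is not GraphX or Scala code: the ALGORITHMS table, the algorithm names, the run function, the damping factor of 0.85 and the fixed iteration count are all assumptions for illustration. PageRank here is ordinary power iteration, and connected components uses union-find, labelling each component with its smallest vertex id in the same spirit as GraphX's connectedComponents; GraphX's own operators may differ in convergence criteria and normalisation.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (citing, cited) edges; ranks sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def connected_components(edges):
    """Map each node to the smallest node id in its (undirected) component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller, so each root is its set's minimum.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra
    return {node: find(node) for node in list(parent)}


def in_degree(edges):
    """Number of incoming citations per cited node."""
    counts = defaultdict(int)
    for _, dst in edges:
        counts[dst] += 1
    return dict(counts)


# Hypothetical name -> function table standing in for the per-algorithm endpoints.
ALGORITHMS = {
    "pagerank": pagerank,
    "connected_components": connected_components,
    "in_degree": in_degree,
}


def run(algorithm, edges):
    """Dispatch to one algorithm by name, rejecting unknown names."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return ALGORITHMS[algorithm](edges)


if __name__ == "__main__":
    # Toy citation graph with integer vertex ids; not real DCU data.
    citations = [(1, 2), (3, 2), (2, 4), (5, 6)]
    print(run("in_degree", citations))
    print(run("connected_components", citations))
    ranks = run("pagerank", citations)
    print(max(ranks, key=ranks.get))
```

Keeping the dispatch table in one dictionary means adding an algorithm requires only one new entry, which loosely mirrors adding a new endpoint to the Scala application.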

PySpark Analysis

Our PySpark analysis can be found in this notebook.

Video Link

A video showing us running the technologies described above can be found here.