Citation Graph Analysis

Anthony Reidy, 18369643. Kian Sweeney, 18306226.

Preamble

An initial study of researcher relationships within DCU’s School of Computing revealed little collaboration. In this investigation, we explore that observation further. First, we scraped faculty members’ Google Scholar and DORAS profiles to extract information, using the BeautifulSoup and Selenium libraries on PySpark. Next, we examined our initial observation in more depth by applying graph algorithms with GraphX. Overall, we believe that the results of this investigation are crucial for understanding research partnerships within DCU. We hope that our work provides useful information that enables cooperation, especially for early-career researchers who may not know the expertise and influence of certain faculty members.

Report

Our midway and final reports can be found in the reports folder.

Technologies Used

Graph Analytics

We use GraphX (Scala) to implement our graph algorithms. We use sbt-assembly to build a Scala application in which an endpoint exists for every graph algorithm. The bash script used to start the master/worker nodes and run the graph algorithms can be found here. The Scala app can be found here.
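To illustrate the "one endpoint per graph algorithm" layout described above, here is a minimal sketch in plain Python. It does not use GraphX and is not the code from our Scala app. The edge list, the two algorithms chosen (collaborator counts and connected components), and the names `EDGES`, `ENDPOINTS`, and `run` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical co-authorship edges: each pair is two researchers sharing a paper.
EDGES = [("A", "B"), ("B", "C"), ("D", "E")]


def build_adjacency(edges):
    """Build an undirected adjacency map from (author, co-author) pairs."""
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency


def degrees(edges):
    """Count the distinct collaborators of each researcher."""
    adjacency = build_adjacency(edges)
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def connected_components(edges):
    """Group researchers linked by any chain of co-authorship."""
    adjacency = build_adjacency(edges)
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited researcher.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(sorted(group))
    return components


# Assumed layout: one named entry point per algorithm.
ENDPOINTS = {"degrees": degrees, "components": connected_components}


def run(name, edges):
    """Look up an algorithm by name and apply it to the edge list."""
    if name not in ENDPOINTS:
        raise ValueError(f"unknown algorithm: {name}")
    return ENDPOINTS[name](edges)


if __name__ == "__main__":
    print(run("degrees", EDGES))
    print(run("components", EDGES))
```

Keeping the algorithms behind a name-to-function table means a new algorithm can be added without changing the calling code, which is roughly what a per-algorithm endpoint provides.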

PySpark Analysis

Our PySpark analysis can be found in this notebook.

Video Link

The link to the video showcasing us running the described technologies can be found here.

