This project focuses on implementing a graph-oriented database using Neo4j to explore and analyze citation networks.
The dataset used in this project is a network of academic papers and their associated metadata.
The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. Each paper is associated with an abstract, authors, year, venue, and title.
The dataset can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, and more.
DBLP-Citation-network V12: 4,894,081 papers and 45,564,149 citation relationships (2020-04-09)
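Conceptually, each record carries the fields listed above. The Python dictionary below is purely illustrative; the field names come from the description above, and the exact schema and nesting of dblp.v12.json may differ:

```python
# Illustrative only: a paper record with the fields described above.
# The real dblp.v12.json schema may name or nest these fields differently.
example_paper = {
    "title": "Example Paper Title",
    "abstract": "Short summary of the paper...",
    "authors": ["Author One", "Author Two"],
    "year": 2018,
    "venue": "Example Conference",
}
```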
The project uses a Neo4j database to store the data; it can be deployed locally or on a cloud service. For local deployment, download and install Neo4j Desktop.
Create a new database in Neo4j Desktop and set the following configurations:
- Database Name: citation-network
- Password: <db_pass>
- User: <db_user>
- Port: 7687
- URI: localhost
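Once the database is created and started, you can confirm it is reachable from Python with the official neo4j driver (`pip install neo4j`). This is a minimal sketch assuming the Bolt port and the credentials configured above:

```python
from neo4j import GraphDatabase

# Assumed values: adjust to the user/password you set in Neo4j Desktop.
URI = "bolt://localhost:7687"
AUTH = ("<db_user>", "<db_pass>")

driver = GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()  # raises an exception if the database is not reachable
print("Connected to Neo4j at", URI)
driver.close()
```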
Open the database configuration (neo4j.conf) and adjust the memory settings. The heap is used for query execution, while the page cache holds the graph store files in memory, so leave a few GB free for the operating system. On a PC with 16 GB of RAM, I used the following values:
server.memory.heap.initial_size=8g
server.memory.heap.max_size=8g
server.memory.pagecache.size=6g
Download the dataset from Kaggle and extract 'dblp.v12.json' into the ./dataset folder.
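The file is large, so a quick sanity check is easier with a streaming parser. The sketch below uses the third-party ijson package (not part of the project's requirements) and assumes the file is a single top-level JSON array of paper objects with a "title" field:

```python
import ijson  # streaming JSON parser: pip install ijson

# Assumption: dblp.v12.json is one large JSON array of paper objects.
with open("./dataset/dblp.v12.json", "rb") as f:
    for i, paper in enumerate(ijson.items(f, "item")):
        print(paper.get("title"))  # print a few titles as a sanity check
        if i >= 4:
            break
```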
Create a .env file in the project's root folder and add the following variables with the corresponding values:
DB_URI="localhost:7687"
DB_NAME="citation-network"
DB_PASS="<db_pass>"
DB_USER="<db_user>"
# Optional. If you want to use a test database.
TEST_DB_NAME="citation-network-test"
TEST_DB_PASS="<test_db_pass>"
TEST_DB_USER="<test_db_user>"
DATASET_PATH="./dataset/dblp.v12.json" # If you downloaded a different version, update the file name.
# Optional. If not set, the default values are used.
BATCH_SIZE_PAPER_NODES=5000
BATCH_SIZE_REQUIRED_NODES=10000
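The project's scripts read these values from the environment. Purely as an illustration of how they might be consumed (not the project's actual code), the sketch below loads the .env file with python-dotenv and opens a session against the configured database; it can be run once the dependencies below are installed, and it assumes the Bolt scheme must be prepended to DB_URI:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv
from neo4j import GraphDatabase

load_dotenv()  # reads the .env file in the current working directory

# DB_URI is stored without a scheme in this .env, so prepend bolt:// here.
uri = f"bolt://{os.getenv('DB_URI')}"
auth = (os.getenv("DB_USER"), os.getenv("DB_PASS"))

driver = GraphDatabase.driver(uri, auth=auth)
with driver.session(database=os.getenv("DB_NAME")) as session:
    count = session.run("MATCH (n) RETURN count(n) AS n").single()["n"]
    print(f"Nodes currently in the database: {count}")
driver.close()
```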
Open a terminal in the project's root folder and run:
python -m venv .venv
Activate the virtual environment. On Windows:
.venv\Scripts\activate
On Linux/macOS:
source .venv/bin/activate
Install the required dependencies by running:
pip install -r requirements.pip
Apply the labels to the database by running the commands below and following the prompts in the terminal:
cd database/utils
install_labels.bat
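The actual labels and constraints are defined by the script above. Purely as an illustration of the kind of statement such a step runs, a uniqueness constraint on paper IDs could be created from Python like this; the Paper label and id property are assumptions, and the Cypher uses Neo4j 5 syntax:

```python
from neo4j import GraphDatabase

# Hypothetical example: the real labels/constraints come from install_labels.bat.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("<db_user>", "<db_pass>"))
with driver.session(database="citation-network") as session:
    # Ensure each Paper node has a unique id (Neo4j 5 syntax;
    # Neo4j 4.x uses "ON (p:Paper) ASSERT p.id IS UNIQUE" instead).
    session.run(
        "CREATE CONSTRAINT paper_id_unique IF NOT EXISTS "
        "FOR (p:Paper) REQUIRE p.id IS UNIQUE"
    )
driver.close()
```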
Load the data into the database by running the script below and following the prompts in the terminal:
python database/populate_db_batches.py
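The script writes nodes in batches (see BATCH_SIZE_PAPER_NODES above). As a simplified sketch of that pattern, not the project's actual implementation, batched writes with UNWIND look roughly like this; the Paper label and the id/title/year fields are assumptions:

```python
from neo4j import GraphDatabase

BATCH_SIZE = 5000  # mirrors BATCH_SIZE_PAPER_NODES

def write_papers(tx, rows):
    # MERGE one Paper node per row; UNWIND keeps it to a single query per batch.
    tx.run(
        "UNWIND $rows AS row "
        "MERGE (p:Paper {id: row.id}) "
        "SET p.title = row.title, p.year = row.year",
        rows=rows,
    )

def load_in_batches(papers, uri, auth, db):
    # papers: an iterable of dicts streamed from the JSON file.
    driver = GraphDatabase.driver(uri, auth=auth)
    with driver.session(database=db) as session:
        batch = []
        for paper in papers:
            batch.append(paper)
            if len(batch) >= BATCH_SIZE:
                # execute_write requires neo4j driver 5+; 4.x uses write_transaction.
                session.execute_write(write_papers, batch)
                batch = []
        if batch:
            session.execute_write(write_papers, batch)
    driver.close()
```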