Install and configure Ricgraph

This page describes how to install Ricgraph for a single user on Linux. If you would like to use Ricgraph in a multi-user environment on Linux, you will need to install Ricgraph differently; please read Ricgraph as a server on Linux. If you are not sure which option suits your situation best, install Ricgraph for a single user on Linux, as described on this page. If you would like to install Ricgraph on Windows (not recommended), read Ricgraph on Windows at the end of this page.

On this page you can find:

Return to main README.md file.

Ricgraph Makefile

A Ricgraph installation involves a number of steps. To make installing (parts of) Ricgraph easier, Ricgraph provides a Makefile that automates several of these steps. A Makefile command is executed by typing:

make [target]

To use the Ricgraph Makefile, first go to your home directory on Linux and then download it, by typing:

cd
wget https://raw.githubusercontent.com/UtrechtUniversity/ricgraph/main/Makefile

In the command make [target], [target] specifies what has to be done. Assuming that you are in your home directory, you can execute one of these commands to list the possible targets:

make
make help
make all

Usually, the make command is already installed. If you get a "command not found" error message, install it using your Linux package manager.

If you read the documentation below or on the page Ricgraph as a server on Linux, you will notice that some sections start by mentioning a Makefile command. If you execute that command, the steps in that section are done automatically. Sometimes you will still have to do some post-install steps, e.g. choosing a password for the graph database.

Installation instructions for a single user

Ricgraph can use two graph database backends: Neo4j and Memgraph.

Neo4j

Neo4j has several products:

Memgraph

Memgraph is an in-memory graph database and is therefore generally faster than Neo4j. However, it has not yet been tested extensively with Ricgraph. Read Install and start Memgraph.

Requirements

The easiest way to use Ricgraph is in a Linux virtual machine (VM), which you can create using e.g. VirtualBox. A VM with 25GB of disk space and 4GB of memory will work. What you need depends on the (size of the) sources you plan to harvest and on the capabilities of your computer: the more, the better. The author uses a VM with 35GB of disk space, 10GB of memory and 3 vCPUs on an 11th gen Intel i7 mobile processor.

Ricgraph has been developed with Python 3.11. For some features you need at least Python 3.9. E.g., if you have Ubuntu 20.04, you can install Python 3.11 as follows:

  • Login as user root.
  • Type the following commands:
    add-apt-repository ppa:deadsnakes/ppa
    apt install python3.11
    
  • Exit from user root.
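You can check whether a Python interpreter is recent enough with a small sketch like the one below. It only encodes the two versions mentioned above: 3.11, which Ricgraph was developed with, and 3.9, the minimum for some features. The function name check_python_version is hypothetical and is not part of Ricgraph.

```python
import sys

# Versions taken from the text above: Ricgraph is developed with 3.11,
# and some features need at least 3.9.
DEVELOPED_WITH = (3, 11)
MINIMUM_FOR_SOME_FEATURES = (3, 9)


def check_python_version(version=None):
    """Return a short verdict for a (major, minor) version tuple.

    Hypothetical helper for illustration; by default it checks the
    interpreter that runs this code.
    """
    if version is None:
        version = sys.version_info[:2]
    version = tuple(version[:2])
    if version >= DEVELOPED_WITH:
        return "ok"
    if version >= MINIMUM_FOR_SOME_FEATURES:
        return "ok, but consider Python 3.11"
    return "too old, some features will not work"
```

Run it with the interpreter you plan to use, e.g. python3.11, and call print(check_python_version()).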

Steps to take

  1. Install your graph database backend (choose one of these):
  2. Download Ricgraph.
  3. Use a Python virtual environment and install Python requirements.
  4. Create and update the Ricgraph initialization file. This is also the place where you specify which graph database backend you use.
  5. Start
  6. Execute queries and visualize the results.

Other things you might want to do, if you use Neo4j:

Install Neo4j Desktop

To install, you can either use the Ricgraph Makefile and execute command make install_neo4j_desktop, or follow the steps below.

  1. Install Neo4j Desktop Edition (it is free). To do this, go to the Neo4j Deployment Center. Go to section "Neo4j Desktop". Choose the latest version of Neo4j Desktop. Download the Linux version. It is an AppImage, so it can be installed and used without root permissions. You will be asked to fill in a form before you can download. In the following screen you will be given a "Neo4j Desktop Activation Key". Save it.
  2. The downloaded file is called something like neo4j-desktop-X.Y.Z-x86_64.AppImage, where X.Y.Z is a version number. Make it executable using "chmod 755 [filename]".

Post-install steps Neo4j Desktop

  1. Start Neo4j Desktop by clicking on the downloaded file.
  2. Accept the license.
  3. Enter your activation key in the right part of the screen. Click "Activate". If you do not have a key, fill in the left part of the screen. Click "Register with Email". Wait a while.
  4. Choose whether you would like to participate in anonymous reporting.
  5. You may be offered updates for Neo4j Desktop components; please install them.
  6. Move your mouse to "Example Project" in the left column. A red trash can icon appears. Click it to remove the Example Project database "Movie DBMS". Confirm. Then wait a while.
  7. The text "No projects found" will appear. Create a project by clicking the button "+ New Project".
  8. The text "Project" appears with the text "Add a DBMS to get started". Click on the "+ Add" button next to it and select "Local DBMS". Leave the name as it is ("Graph DBMS") and fill in a password. Click "Create". Also, insert the password in field graphdb_password in the Ricgraph initialization file, see below.
  9. Exit Neo4j Desktop using the "File" menu and select "Quit". If your database is still running, a message similar to "Your DBMS [name] is running, are you sure you want to quit" appears; choose "Stop DBMS, then quit".
  10. Ready.

Now we need to find the port number which Neo4j Desktop is using:

  1. Start Neo4j Desktop. Start the Graph DBMS.
  2. Click on the words "Graph DBMS". At the right (or below, depending on the width of the Neo4j Desktop window) a new screen appears. Look at the tab "Details". Note the port number next to "Bolt port" (the default value is 7687). Insert this port number in field graphdb_port in the Ricgraph initialization file, see below.
  3. Ready.
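To check that the Bolt port you noted actually accepts connections, you can use the sketch below. It uses only Python's socket module. A successful connection shows only that some process is listening on that port. It does not prove that the process is Neo4j, and it does not check your username or password. The function name bolt_port_is_open is hypothetical.

```python
import socket


def bolt_port_is_open(host="localhost", port=7687, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    7687 is the default Bolt port mentioned above; pass the port you
    noted if it differs. Refused connections and timeouts both return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, print(bolt_port_is_open(port=7687)) should print True while the Graph DBMS is running.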

Install Bloom configuration

This is only necessary if you plan to use Bloom. If you are not sure, skip this step for now; you can come back to it later.

  1. Start Neo4j Desktop.
  2. Click on the icon on the left side of Neo4j Desktop.
  3. Click on "Neo4j Bloom". A new window appears.
  4. In this window, click on the icon at the top left. A Bloom "Perspective" slides out (Neo4j has an extensive description of how to use it).
  5. Click on "neo4j > Untitled Perspective 1".
  6. A new window appears. Right of the words "Untitled Perspective 1" there are three vertical dots. Click on it. Click on "Delete". The perspective "Untitled Perspective 1" is removed.
  7. In the same window, right of the word "Perspectives" click on the word "Import". A file open window appears. Go to directory neo4j_config that is part of Ricgraph and select file ricgraph_bloom_config.json. Click "Open". The perspective "ricgraph_bloom_config" is loaded.
  8. Click on the text "ricgraph_bloom_config".
  9. Note that the text "neo4j > Untitled Perspective 1" has changed into "neo4j > ricgraph_bloom_config".
  10. A few centimeters below "neo4j > ricgraph_bloom_config", just below the text "Add category", click on the oval "RicgraphNode". At the right, a new window will appear.
  11. In this window, below the word "Labels", check if an oval box with the text "RicgraphNode" is shown. If not, click on "Add labels", click on "RicgraphNode".
  12. Click on the icon to go back to the main screen of Bloom.
  13. Click on the cog icon. You might want to set "Use classic search" to "on".
  14. Ready.

Download Ricgraph

If you use the Ricgraph Makefile, this is done automatically while creating a Python virtual environment (see the following section).

You can choose between two types of downloads for Ricgraph:

  • The latest released version. Go to the Release page of Ricgraph, choose the most recent version, download either the zip or tar.gz version.
  • The "cutting edge" version. Go to the GitHub page of Ricgraph, click the green button "Code", choose tab "Local", choose "Download zip".

Use a Python virtual environment and install Python requirements

To do this, you can either use the Ricgraph Makefile and execute command make install_ricgraph_as_singleuser, or follow the steps below.

To be able to use Ricgraph, you will need a Python virtual environment. Virtual environments are a kind of lightweight Python environments, each with their own independent set of Python packages installed in their site directories. A virtual environment is created on top of an existing Python installation. There are two ways of doing this:

  • Using Python's venv module;
  • Using a Python Integrated development environment (IDE).

Using Python's venv module
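The sketch below shows how you might create the virtual environment from Python itself, using the standard library's venv module. The result should be roughly the same as running python3.11 -m venv on the command line. The helper name create_ricgraph_venv and the directory ~/ricgraph_venv are only examples; Ricgraph does not require a particular name or location.

```python
import venv
from pathlib import Path


def create_ricgraph_venv(target_dir, with_pip=True):
    """Create a virtual environment in target_dir with Python's venv module.

    with_pip=True also installs pip into the new environment, which you
    need afterwards to install requirements.txt. Returns the environment path.
    """
    target = Path(target_dir).expanduser()
    venv.create(target, with_pip=with_pip)
    return target
```

For example, run create_ricgraph_venv("~/ricgraph_venv") once with the Python version you want to use. Then activate the environment in your shell with source ~/ricgraph_venv/bin/activate.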

Using a Python Integrated development environment (IDE)

  • Using a Python Integrated development environment (IDE), such as PyCharm. An IDE will automatically generate a virtual environment, and any time you use the IDE, it will "transfer" you to that virtual environment. It will also help to execute and debug your scripts.
    • If PyCharm does not automatically generate a virtual environment, you need to go to File --> Settings --> Project: [your project name] --> Python Interpreter, and check if there is a valid interpreter in the right column next to "Python Interpreter". If not, add one, using "Add Interpreter", and choose for example "Add Local Interpreter". A venv will be generated.

    • Next, unzip or tar xf the downloaded file for Ricgraph (see previous section).

    • Install the Python requirements. Depending on the Python IDE, single or double-click on file requirements.txt. Probably, a button or text appears that asks you to install requirements. Click on it.

      If this does not work, type in the IDE (PyCharm) Terminal:

      pip3.11 install -r requirements.txt
      

      You may need to replace 3.11 in pip3.11 with the Python version you use.
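After installing the requirements, you can check that every package listed in requirements.txt is present in the active environment. The sketch below is deliberately simple. It reads only the package name at the start of each line and ignores version specifiers, comments and pip options such as -r. The function name missing_requirements is hypothetical.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_file="requirements.txt"):
    """Return the package names from a requirements file that are not installed.

    Only the distribution name at the start of each line is checked;
    version specifiers, extras, comments and options (lines starting
    with '-') are ignored.
    """
    missing = []
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if not match:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if the package is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Activate the virtual environment, start Python in the Ricgraph directory, and run print(missing_requirements()). An empty list means that all listed packages were found.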

Notable dependencies used in Ricgraph:

  • PyAlex. PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API.

Ricgraph initialization file

Ricgraph requires an initialization file. A sample file is included as ricgraph.ini-sample. You need to copy this file to ricgraph.ini and modify it to include settings for your graph database backend, and API keys and/or email addresses for other systems you plan to use.
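The copy step can be done safely with the sketch below, which never overwrites an existing ricgraph.ini. This protects settings you have already filled in. It assumes that ricgraph.ini-sample and ricgraph.ini are in the same directory. The function name create_ini_from_sample is hypothetical.

```python
import shutil
from pathlib import Path


def create_ini_from_sample(ricgraph_dir="."):
    """Copy ricgraph.ini-sample to ricgraph.ini, unless ricgraph.ini exists.

    Returns the path of ricgraph.ini. An existing ricgraph.ini is left
    untouched, so settings you already filled in are kept.
    """
    directory = Path(ricgraph_dir)
    sample = directory / "ricgraph.ini-sample"
    target = directory / "ricgraph.ini"
    if not target.exists():
        shutil.copyfile(sample, target)
    return target
```

Call it from the directory where you unpacked Ricgraph, then edit the resulting ricgraph.ini.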

Settings for graph database backend

Ricgraph has a [GraphDB] section where you have to specify the graph database backend that you will be using. First, you will need to set the parameter graphdb to the graph database backend name (you can choose between neo4j and memgraph). Further down that section, you will have to fill in six parameters for hostname, port number, username, etc. The comments in the initialization file explain how to do that.
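Before starting a harvest, you can sanity-check these settings with Python's configparser. The sketch below checks only the keys named on this page: graphdb, graphdb_port and graphdb_password. It assumes that these keys are set directly in the [GraphDB] section. The names of the other parameters, such as hostname and username, are not repeated here; follow the comments in ricgraph.ini-sample for those. The function name check_graphdb_settings is hypothetical.

```python
import configparser

# Backends named in the text above.
VALID_BACKENDS = {"neo4j", "memgraph"}


def check_graphdb_settings(ini_text):
    """Return a list of problems found in the [GraphDB] section (empty if none).

    Checks only graphdb, graphdb_port and graphdb_password; it assumes
    these keys appear directly in the [GraphDB] section.
    """
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    if not config.has_section("GraphDB"):
        return ["section [GraphDB] is missing"]
    section = config["GraphDB"]
    problems = []
    backend = section.get("graphdb", "").strip()
    if backend not in VALID_BACKENDS:
        problems.append(f"graphdb must be neo4j or memgraph, not {backend!r}")
    port = section.get("graphdb_port", "").strip()
    if not port.isdigit():
        problems.append(f"graphdb_port must be a number, not {port!r}")
    if not section.get("graphdb_password", "").strip():
        problems.append("graphdb_password is empty")
    return problems
```

For example: with open("ricgraph.ini") as f: print(check_graphdb_settings(f.read())).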

Extending Ricgraph with new properties in the nodes

Optionally, you can extend Ricgraph by adding new properties to nodes. Before you can do this, download Ricgraph.

RICGRAPH_NODEADD_MODE

There is a parameter RICGRAPH_NODEADD_MODE in the initialization file which influences how nodes are added to Ricgraph. Suppose we harvest a source system and that results in the following table:

FULL_NAME   ORCID
Name-1      0000-0001-1111-1111
Name-2      0000-0001-1111-2222
Name-3      0000-0001-1111-2222
Name-4      0000-0001-1111-3333

Name-2 and Name-3 have the same ORCID. This may be correct, e.g. if Name-2 is a name variant of Name-3 (John Doe vs. J. Doe). It may also be incorrect, e.g. if Name-2 is John and Name-3 is Peter (possibly caused by a typing mistake in a source system). Ricgraph has no way of knowing which of these two cases applies.

RICGRAPH_NODEADD_MODE can be either strict or lenient:

  • strict (default setting): only add nodes to Ricgraph which conform to the model described in the Implementation details. In the example above, ORCID 0000-0001-1111-2222 will not be inserted.
  • lenient: add every node. In the example above, ORCID 0000-0001-1111-2222 will be inserted.
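The difference between the two modes can be illustrated with the example table. The sketch below is a simplification, not Ricgraph's actual rule. It assumes that "conforms to the model" means only that an ORCID must not occur with more than one distinct name. The function name orcids_to_insert is hypothetical.

```python
from collections import defaultdict

# The example table from above: (FULL_NAME, ORCID).
HARVESTED = [
    ("Name-1", "0000-0001-1111-1111"),
    ("Name-2", "0000-0001-1111-2222"),
    ("Name-3", "0000-0001-1111-2222"),
    ("Name-4", "0000-0001-1111-3333"),
]


def orcids_to_insert(rows, mode="strict"):
    """Return the set of ORCIDs that would be inserted in the given mode.

    Simplified illustration: 'strict' skips an ORCID that appears with
    more than one distinct name; 'lenient' keeps every ORCID.
    """
    if mode not in ("strict", "lenient"):
        raise ValueError("mode must be 'strict' or 'lenient'")
    names_per_orcid = defaultdict(set)
    for full_name, orcid in rows:
        names_per_orcid[orcid].add(full_name)
    if mode == "lenient":
        return set(names_per_orcid)
    return {orcid for orcid, names in names_per_orcid.items() if len(names) == 1}
```

With the table above, orcids_to_insert(HARVESTED, "strict") leaves out 0000-0001-1111-2222, while orcids_to_insert(HARVESTED, "lenient") returns all three ORCIDs.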

This will have the following consequences:

  • strict: since ORCID 0000-0001-1111-2222 will not be inserted, a research output from a person with that ORCID may not be inserted in Ricgraph, or it may be inserted without being linked to the person with this ORCID.

  • lenient: as described in Implementation details, a person-root node "represents" a person. Person identifiers (such as ORCID) and research outputs are connected to the person-root node of a person. That means that the person-root node is connected to everything a person has contributed to.

    In the example above, ORCID 0000-0001-1111-2222 is inserted. That means that the person-roots of the two persons Name-2 and Name-3 are "merged", and that all research outputs of Name-2 and Name-3 will be connected to one person-root node. After this has been done, there is no way to know which research output belongs to Name-2 and which to Name-3.

    As said, that is fine if Name-2 and Name-3 are name variants, but not fine if they are different names. (Side note: if you want to capture spelling variants, you may want to use a fuzzy string match library such as TheFuzz.)
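One way to picture this merging of person-roots in lenient mode is a union-find (disjoint-set) structure. Each name starts in its own group, and a shared identifier joins two groups into one. The sketch below is only an analogy. Ricgraph stores person-roots as nodes and edges in the graph database, and the class and function names here are hypothetical.

```python
# The example table from above: (FULL_NAME, ORCID).
EXAMPLE_ROWS = [
    ("Name-1", "0000-0001-1111-1111"),
    ("Name-2", "0000-0001-1111-2222"),
    ("Name-3", "0000-0001-1111-2222"),
    ("Name-4", "0000-0001-1111-3333"),
]


class PersonRoots:
    """Tiny union-find used as an analogy for merging person-roots."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def person_roots(rows):
    """Group names that end up under the same person-root via a shared ORCID."""
    roots = PersonRoots()
    for full_name, orcid in rows:
        roots.union(("name", full_name), ("orcid", orcid))
    groups = {}
    for full_name, _ in rows:
        groups.setdefault(roots.find(("name", full_name)), set()).add(full_name)
    return sorted(sorted(names) for names in groups.values())
```

With the example table, person_roots(EXAMPLE_ROWS) yields three groups. Name-2 and Name-3 share one group, which mirrors the single merged person-root described above.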

Lenient is advisable if the sources you harvest do not contain errors. However, the author of Ricgraph has noticed that error-free sources are rare; therefore the default is strict.

Using Ricgraph

Before you can do anything with Ricgraph, you need to harvest sources, see Ricgraph harvest scripts. After you have harvested sources, you can execute queries and visualize the results, see Query and visualize Ricgraph.

Dumping and restoring the Ricgraph database

Depending on your situation (whether you use Neo4j Desktop or Neo4j Community Edition), this section lists the methods for dumping and restoring the Ricgraph database:

Create a Neo4j Desktop database dump of Ricgraph

To create a Neo4j Desktop database dump of Ricgraph, follow these steps:

  1. Start Neo4j Desktop if it is not running, or stop the graph database if it is running.
  2. Hover over the name of your graph database (probably "Graph DBMS"), and click on the three horizontal dots at the right.
  3. Select "Dump".
  4. Your graph database will be dumped. This may take a while. When it is ready, a message appears.
  5. Ready.

Create a Neo4j Community Edition database dump of Ricgraph

To do this, you can either use the Ricgraph Makefile and execute command make dump_graphdb_neo4j_community, or follow the steps below.

To create a Neo4j Community Edition database dump of Ricgraph, follow these steps:

  1. Login as user root.
  2. Stop Neo4j Community Edition:
    systemctl stop neo4j.service
    
  3. To be able to create a Neo4j database dump you need to set several permissions on /etc/neo4j:
    chmod 640 /etc/neo4j/*
    chmod 750 /etc/neo4j
    
  4. Do the database dump:
    neo4j-admin database dump --expand-commands system --to-path=[path to database dump directory]
    neo4j-admin database dump --expand-commands neo4j --to-path=[path to database dump directory]
    
  5. Start Neo4j Community Edition:
    systemctl start neo4j.service
    
  6. Check the log for any errors, use one of:
    systemctl -l status neo4j.service
    journalctl -u neo4j.service
    
  7. Exit from user root.
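If you script step 4 yourself instead of using the Makefile target, the sketch below builds the same two neo4j-admin commands. It dumps the system database first and then the neo4j database, into the same directory. By default it only prints the commands. With dry_run=False it runs them through subprocess and stops at the first failure. It must still be run as root with Neo4j stopped, as in the steps above. The function names are hypothetical, and the directory is whatever you would substitute for [path to database dump directory].

```python
import shlex
import subprocess


def neo4j_dump_commands(dump_dir):
    """Return the two neo4j-admin dump commands from step 4 above."""
    return [
        ["neo4j-admin", "database", "dump", "--expand-commands", database,
         f"--to-path={dump_dir}"]
        for database in ("system", "neo4j")
    ]


def run_dumps(dump_dir, dry_run=True):
    """Print the dump commands, and run them if dry_run is False.

    check=True raises CalledProcessError at the first command that fails,
    so the second dump is not attempted after a failed first one.
    """
    for command in neo4j_dump_commands(dump_dir):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```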

Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Desktop

To restore a Neo4j Desktop database dump of Ricgraph in Neo4j Desktop, follow these steps:

  1. Start Neo4j Desktop if it is not running, or stop the graph database if it is running.
  2. Click on the button "Add" on the right side of "Project" and select "File".
  3. Select the file "neo4j.dump" from a previous Neo4j Desktop database dump. This file will be added to the "File" section a little down the "Project" window.
  4. Hover over this file and click on the three horizontal dots at the right.
  5. Select "Create new DBMS from dump".
  6. Give it a name, e.g. "Graph DBMS from import file".
  7. When asked, enter the password you have specified in the Ricgraph initialization file (this saves you from entering a new password in that file).
  8. A new local graph database is being created. This may take a while.
  9. Hover over the newly created graph database and click "Start" to run it.
  10. Once it is active, install the Bloom configuration.
  11. Now you are ready to explore the data using Bloom or Ricgraph Explorer.

Restore a Neo4j Desktop database dump of Ricgraph in Neo4j Community Edition

To restore a Neo4j Desktop database dump of Ricgraph in Neo4j Community Edition, follow these steps:

  1. Login as user root.
  2. Stop Neo4j Community Edition:
    systemctl stop neo4j.service
    
  3. To be able to restore a Neo4j database dump you need to set several permissions on /etc/neo4j:
    chmod 640 /etc/neo4j/*
    chmod 750 /etc/neo4j
    
  4. Save the old database:
    cd /var/lib/neo4j
    mv data/ data-old
    
  5. Go back to your working directory and restore the database dump:
    cd
    neo4j-admin database load --expand-commands neo4j --from-path=[path to database dump directory] --overwrite-destination=true
    
    For [path to database dump directory], specify only the directory, not the directory plus the file name of the database dump (that file name is neo4j.dump; the neo4j-admin command infers it automatically).
  6. Set the correct permissions on /var/lib/neo4j/data:
    cd /var/lib/neo4j
    chown -R neo4j:neo4j data
    
  7. Start Neo4j Community Edition:
    systemctl start neo4j.service
    
  8. Check the log for any errors, use one of:
    systemctl -l status neo4j.service
    journalctl -u neo4j.service
    
  9. In your web browser, go to http://localhost:7474/browser.
  10. Neo4j will ask you to log in; use username neo4j and password neo4j.
  11. Neo4j will ask you to change your password, for the new password, enter the password you have specified in the Ricgraph initialization file (this saves you from entering a new password in that file).
  12. Restart Ricgraph Explorer if you use Ricgraph in a multi-user environment:
    systemctl restart ricgraph_explorer_gunicorn.service
    
  13. Check the log for any errors, use one of:
    systemctl -l status ricgraph_explorer_gunicorn.service
    journalctl -u ricgraph_explorer_gunicorn.service
    
  14. Done. If all works well you might want to remove your old database:
    cd /var/lib/neo4j
    rm -r data-old
    
  15. Exit from user root.
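Step 5 above requires passing a directory, not the dump file, to --from-path. If you script the restore, a small helper like the one below can normalize the argument. It accepts either the directory or the path of a file ending in .dump, such as neo4j.dump, and returns the directory in both cases. The function name dump_directory is hypothetical.

```python
from pathlib import Path


def dump_directory(path):
    """Return the directory to pass to neo4j-admin --from-path.

    Accepts either the dump directory itself or the path of a dump file
    (e.g. .../neo4j.dump) and returns the directory in both cases.
    """
    p = Path(path).expanduser()
    if p.suffix == ".dump":
        p = p.parent
    return str(p)
```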

Restore a Neo4j Community Edition database dump of Ricgraph in Neo4j Community Edition

To do this, you can either use the Ricgraph Makefile and execute command make restore_graphdb_neo4j_community, or follow the steps below.

To restore a Neo4j Community Edition database dump of Ricgraph in Neo4j Community Edition, follow these steps:

  1. Login as user root.
  2. Stop Neo4j Community Edition:
    systemctl stop neo4j.service
    
  3. To be able to restore a Neo4j database dump you need to set several permissions on /etc/neo4j:
    chmod 640 /etc/neo4j/*
    chmod 750 /etc/neo4j
    
  4. Save the old database:
    cd /var/lib
    mv neo4j/ neo4j-old
    mkdir /var/lib/neo4j
    
  5. Go back to your working directory and restore the database dump:
    cd
    neo4j-admin database load --expand-commands system --from-path=[path to database dump directory] --overwrite-destination=true
    neo4j-admin database load --expand-commands neo4j --from-path=[path to database dump directory] --overwrite-destination=true
    
    For [path to database dump directory], specify only the directory, not the directory plus the file name of the database dump; the neo4j-admin command infers the file name automatically.
  6. Set the correct ownership of /var/lib/neo4j:
    cd /var/lib
    chown -R neo4j:neo4j neo4j
    
  7. Start Neo4j Community Edition:
    systemctl start neo4j.service
    
  8. Check the log for any errors, use one of:
    systemctl -l status neo4j.service
    journalctl -u neo4j.service
    
  9. Restart Ricgraph Explorer if you use Ricgraph in a multi-user environment:
    systemctl restart ricgraph_explorer_gunicorn.service
    
  10. Check the log for any errors, use one of:
    systemctl -l status ricgraph_explorer_gunicorn.service
    journalctl -u ricgraph_explorer_gunicorn.service
    
  11. Done. If all works well you might want to remove your old database:
    cd /var/lib
    rm -r neo4j-old
    
  12. Exit from user root.

Ricgraph on Windows

If you would like to install Ricgraph on Windows, you are, as far as known, the first person to do so. The creator of Ricgraph has no experience in developing software on Windows, so please let me know which steps you have taken, so I can add them to this documentation. If you are a Windows user, I recommend creating a Linux virtual machine using e.g. VirtualBox, and installing Ricgraph in that virtual machine as described above.