flavoenzymes

⚠️ Important:

Run all commands from the root of the repo.
On Windows, replace / with \ when specifying paths.
If you normally run python3 instead of python, use python3 if you have difficulty setting up the virtual environment.

Getting Started

Prerequisites

You must have Python 3.3 or above.
- Check whether you do by running python3 --version or python --version.
You must have pip installed.
- Check whether you do by running pip3 --version or pip --version.
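The same two checks can also be done from inside Python. The snippet below is a minimal sketch, not part of the repo: the function name `check_prerequisites` is made up for illustration, and it only tests the interpreter that runs it.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 3)  # minimum version stated in this README


def check_prerequisites(version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        problems.append(
            "Python %d.%d or above is required, found %d.%d" % (MIN_PYTHON + found)
        )
    # find_spec returns None when the module cannot be located
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not installed for this interpreter")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Prerequisites look fine.")
```

If this reports a problem with python but not with python3 (or the other way round), use the working command for every step that follows.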

Quick start

python env_setup.py
# IMPORTANT: activate your virtual environment using instructions printed from the command above
pip install -r requirements.txt
python scrape_flavoenzymes.py

If you get stuck, follow these instructions:

Virtual environment setup

Create virtual environment.
- python modules/helpers/env_setup.py
Activate the virtual environment

If you don't, all packages will be installed to your global environment. If you are OK with that, skip this step.
- On MacOS or Linux run:
  - source flav_env/bin/activate
- On Windows run:
  - flav_env\Scripts\activate.bat
Install dependencies within the environment.
- pip install -r requirements.txt
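For orientation, here is a rough sketch of what an environment setup script could do with the standard library `venv` module. It is an assumption about the general approach, not a copy of the repo's env_setup.py; only the flav_env name and the two activation commands come from the steps above.

```python
import os
import venv

ENV_DIR = "flav_env"  # environment name used in the activation commands above


def activation_command(env_dir=ENV_DIR, os_name=os.name):
    """Return the shell command that activates the environment on this platform."""
    if os_name == "nt":  # Windows
        return env_dir + "\\Scripts\\activate.bat"
    return "source " + env_dir + "/bin/activate"  # macOS or Linux


def create_environment(env_dir=ENV_DIR):
    """Create the virtual environment with pip available inside it."""
    venv.create(env_dir, with_pip=True)
    return activation_command(env_dir)
```

Calling `create_environment()` from the root of the repo would create the flav_env folder and return the command to run next.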

Run the pipeline

Scraping all the data

python scrape_flavoenzymes.py

More information:

This will try to scrape all the information from all the websites that have been configured.
If an existing file is found at ./export/scraped_flavoenzymes.json, the program will only update it if new entries are found.
Inside modules/scrapers you can find blacklist.csv and whitelist.csv. These files let you list enzymes that should always be skipped or always fetched. Try this approach before hardcoding anything in the code.
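The behaviour described above can be pictured with the sketch below. It is illustrative only and makes several assumptions that this README does not confirm: each CSV holds one enzyme identifier per row in its first column, the export file is a JSON object keyed by enzyme identifier, and an enzyme that appears in both lists is skipped. All function names are hypothetical.

```python
import csv
import json
import os


def read_id_list(path):
    """Read the first column of a CSV file into a set; a missing file gives an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as handle:
        return {row[0].strip() for row in csv.reader(handle) if row and row[0].strip()}


def select_enzymes(candidates, blacklist, whitelist):
    """Drop blacklisted candidates, then append whitelisted ones not already kept."""
    kept = [c for c in candidates if c not in blacklist]
    extra = sorted(w for w in whitelist if w not in kept and w not in blacklist)
    return kept + extra


def merge_new_entries(existing, scraped):
    """Add entries missing from `existing` in place; return how many were added."""
    added = 0
    for key, value in scraped.items():
        if key not in existing:
            existing[key] = value
            added += 1
    return added


def update_export(path, scraped):
    """Rewrite the export file only when the scrape produced new entries."""
    existing = {}
    if os.path.exists(path):
        with open(path) as handle:
            existing = json.load(handle)
    added = merge_new_entries(existing, scraped)
    if added:
        with open(path, "w") as handle:
            json.dump(existing, handle, indent=2)
    return added
```

With this design, entries already present in the export are never overwritten, so a repeated run that finds nothing new leaves the file untouched.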

Loading data into Neo4j

Here is a list of useful commands to run:

Importing files

Create from URL

WITH "https://raw.githubusercontent.com/supervanya/flavoenzymes/master/export/kegg.json" AS url

Create from local file

WITH "kegg.json" AS url

Create from JSON

If creating from a local file, replace the link with the file name and place the file in the import folder of Neo4j.

WITH "https://raw.githubusercontent.com/supervanya/flavoenzymes/master/export/kegg.json" AS url
CALL apoc.load.json(url) YIELD value AS enzymes
UNWIND keys(enzymes) AS enzName
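// One Enzyme node per top-level key in the JSON
MERGE (e:Enzyme {name: enzName})

// Link the enzyme to each substrate it binds
FOREACH (subsName IN enzymes[enzName].SUBSTRATE |
    MERGE (s:Substrate {name: subsName})
    MERGE (s)<-[:binds]-(e)
)

// Link the enzyme to each product it releases
FOREACH (prodName IN enzymes[enzName].PRODUCT |
    MERGE (p:Product {name: prodName})
    MERGE (p)<-[:releases]-(e)
)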
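To see what the import statement above produces, the following sketch derives the same nodes and relationships in plain Python. It assumes the JSON has the shape the Cypher implies (an object keyed by enzyme name, each value holding SUBSTRATE and PRODUCT lists). The sample data is made up and not taken from kegg.json, and `graph_from_enzymes` is a hypothetical helper.

```python
def graph_from_enzymes(enzymes):
    """Return (nodes, relationships) matching what the MERGE statements create.

    nodes: set of (label, name); relationships: set of (enzyme, type, target).
    Sets mirror MERGE, which does not create the same node or relationship twice.
    """
    nodes = set()
    relationships = set()
    for enz_name, record in enzymes.items():
        nodes.add(("Enzyme", enz_name))
        for subs_name in record.get("SUBSTRATE", []):
            nodes.add(("Substrate", subs_name))
            relationships.add((enz_name, "binds", subs_name))
        for prod_name in record.get("PRODUCT", []):
            nodes.add(("Product", prod_name))
            relationships.add((enz_name, "releases", prod_name))
    return nodes, relationships


# Made-up sample data for illustration only
sample = {
    "enzyme A": {"SUBSTRATE": ["compound X", "compound Y"], "PRODUCT": ["compound Z"]},
    "enzyme B": {"SUBSTRATE": ["compound X"], "PRODUCT": []},
}
nodes, relationships = graph_from_enzymes(sample)
print(len(nodes), "nodes,", len(relationships), "relationships")
# prints: 5 nodes, 4 relationships
```

Note that "compound X" becomes a single Substrate node shared by both enzymes. A compound that is a substrate of one enzyme and a product of another would end up as two separate nodes, one per label.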

Queries

Show all nodes (this will limit to 300 or your settings)

MATCH (n) RETURN n

25 enzymes with anything they bind

MATCH (n:Enzyme) 
RETURN (n)-[:binds]->()
LIMIT 25

25 enzymes with anything they bind and release

MATCH (n)
RETURN ()<-[:releases]-(n)-[:binds]->() 
LIMIT 25

Specific enzyme with all links

MATCH p=(e:Enzyme)-->()
WHERE e.ec="ec:1.2.99.7" 
RETURN p

MATCH (e:Enzyme)
MATCH path = (e)-[]->(s:Substrate)
RETURN path;
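The queries above can also be run from Python. The sketch below assumes the official `neo4j` driver package is installed (this README does not say whether it is listed in requirements.txt). The function name and connection details are placeholders. It sends the "specific enzyme" query with the EC number passed as a query parameter instead of being written into the query text.

```python
# Parameterised version of the "specific enzyme with all links" query.
ENZYME_LINKS_QUERY = (
    "MATCH p=(e:Enzyme)-->() "
    "WHERE e.ec = $ec "
    "RETURN p"
)


def fetch_enzyme_links(uri, user, password, ec):
    """Run the query against a Neo4j server and return the matching paths."""
    # Imported here so this module still loads when the driver is not installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(ENZYME_LINKS_QUERY, ec=ec)
            return [record["p"] for record in result]
    finally:
        driver.close()
```

A call might look like `fetch_enzyme_links("bolt://localhost:7687", "neo4j", "password", "ec:1.2.99.7")`, with the URI and credentials replaced by your own.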

Other Modules

BruceSorter: A CLI to help with sorting flavoenzymes and filtering out false positives
