Add docker volume to create directory to store pickled obo files. Adjust startup script to generate obo graph if not found in mounted host directory. Add shell script to build image, make host directory, and run the container.
nanglo123 committed Sep 19, 2024
1 parent 3cc1ded commit a6a21c3
Showing 6 changed files with 36 additions and 36 deletions.
15 changes: 1 addition & 14 deletions docker/Dockerfile
@@ -1,16 +1,3 @@
-# Create an initial docker image to generate the graph and transfer it to the
-# second docker image
-# We do this to avoid involving additional imports in the second docker image
-FROM python:3.10-slim AS graph-builder
-
-WORKDIR /graphs
-RUN apt-get update && apt-get install -y git
-RUN pip install pyobo networkx obonet
-
-# Copy and run the script to generate the pickled graph
-COPY generate_graph.py /graphs/generate_graph.py
-RUN python generate_graph.py
-
 FROM ubuntu:focal
 
 WORKDIR /sw
@@ -42,6 +29,7 @@ RUN wget -O /sw/nodes.tsv.gz https://askem-mira.s3.amazonaws.com/dkg/$domain/bui
     sed -i 's/#dbms.security.auth_enabled/dbms.security.auth_enabled/' /etc/neo4j/neo4j.conf && \
     neo4j-admin import --delimiter='TAB' --skip-duplicate-nodes=true --skip-bad-relationships=true --nodes /sw/nodes.tsv.gz --relationships /sw/edges.tsv.gz
 
+COPY generate_obo_graphs.py /sw/generate_obo_graphs.py
 # Python packages
 RUN python -m pip install --upgrade pip && \
     python -m pip install git+https://github.com/gyorilab/mira.git@main#egg=mira[web,uvicorn,dkg-client,dkg-construct] && \
@@ -56,6 +44,5 @@ RUN python -m pip install --upgrade pip && \
 RUN wget -O /sw/sir_flux_span.json https://raw.githubusercontent.com/gyorilab/mira/main/tests/sir_flux_span.json
 
 RUN mkdir -p /graphs
-COPY --from=graph-builder /graphs/relabeled_obo_graph.pkl /graphs/relabeled_obo_graph.pkl
 COPY startup.sh startup.sh
 ENTRYPOINT ["/bin/bash", "/sw/startup.sh"]
7 changes: 7 additions & 0 deletions docker/README.md
@@ -40,6 +40,13 @@ docker run -p 8771:8771 -p 7687:7687 -e MIRA_NEO4J_URL=bolt://0.0.0.0:7687 mira:
 
 This exposes a REST API at `http://localhost:8771`. This also exposes Neo4j's bolt port at port 7687.
 
+
+Running the `build_run_docker.sh` script builds the docker image,
+creates the directory `docker/mounted_graph_storage` to store the pickled obo
+graphs, and starts the container. The first time you run the script and
+start the container, it will take a few minutes to generate and store the
+pickled graphs.
+
 ## MIRA Metaregistry
 
 The MIRA metaregistry contains the prefixes and their associated metadata for all use cases.
5 changes: 5 additions & 0 deletions docker/build_run_docker.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+docker build --tag mira_epi_dkg:latest .
+mkdir -p mounted_graph_storage
+docker run --detach -v ./mounted_graph_storage:/graphs -p 7474:7474 -p 8771:8771 -p 7687:7687 -e MIRA_NEO4J_URL=bolt://0.0.0.0:7687 --name mira mira_epi_dkg:latest
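
The script passes a relative host path (`./mounted_graph_storage`) to `-v`. Whether the Docker CLI accepts a relative bind-mount source is reported to depend on its version, so a more portable variant resolves the directory to an absolute path first. The sketch below is a hypothetical Python equivalent of the script's last two steps and is not part of the commit. `build_run_command` is an illustrative name, and the function only assembles the argument list rather than invoking Docker.

```python
from pathlib import Path
from typing import List


def build_run_command(storage_dir: Path,
                      image: str = "mira_epi_dkg:latest") -> List[str]:
    """Assemble `docker run` arguments with an absolute bind-mount source."""
    # Create the host directory first (same effect as `mkdir -p`).
    storage_dir.mkdir(parents=True, exist_ok=True)
    host_path = storage_dir.resolve()
    return [
        "docker", "run", "--detach",
        "-v", f"{host_path}:/graphs",          # host directory -> /graphs in the container
        "-p", "7474:7474", "-p", "8771:8771", "-p", "7687:7687",
        "-e", "MIRA_NEO4J_URL=bolt://0.0.0.0:7687",
        "--name", "mira",
        image,
    ]
```

Passing the returned list to `subprocess.run` would start the container; printing it first is a cheap way to check the resolved mount path.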
4 changes: 3 additions & 1 deletion docker/generate_graph.py → docker/generate_obo_graphs.py
@@ -10,11 +10,13 @@ def download_convert_ncbitaxon_obo_to_graph():
     version = get_version(resource_prefix)
 
     # Checks to see if the pickled ncbitaxon obo graph exists in the container
-    cached_relabeled_obo_graph_path = Path("/graphs/relabeled_obo_graph.pkl")
+    cached_relabeled_obo_graph_path = Path("/graphs/ncbitaxon_obo_graph.pkl")
     if not cached_relabeled_obo_graph_path.exists():
         _, obo_path = _ensure_ontology_path(resource_prefix, force=False,
                                             version=version)
         obo_graph = read_obo(obo_path)
+
+        # Normalize node indices
         relabeled_graph = networkx.relabel_nodes(obo_graph,
                                                  lambda node_index:
                                                  node_index.lower())
10 changes: 10 additions & 0 deletions docker/startup.sh
@@ -1,4 +1,14 @@
 #!/bin/bash
+
+# Check if the ncbitaxon pickled graph file exists
+if [ ! -f /graphs/ncbitaxon_obo_graph.pkl ]; then
+    echo "Pickled ncbitaxon obo graph file not found. Generating it"
+    python /sw/generate_obo_graphs.py
+else
+    echo "Pickled ncbitaxon obo graph file already exists in the container in
+    /graphs/"
+fi
+
 neo4j start
 sleep 100
 neo4j status
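
The check in `startup.sh` and the `exists()` test in `generate_obo_graphs.py` together form a generate-if-missing cache: the expensive graph build runs only when the pickle is absent from the mounted directory. Below is a minimal sketch of that pattern. It assumes a plain dictionary stands in for the networkx graph, and `build_graph` and `load_or_build` are hypothetical names, not functions from MIRA.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # stand-in for the networkx graph (assumption)


def build_graph() -> Graph:
    """Placeholder for the slow download-and-convert step."""
    raw = {"NCBITaxon:9606": ["NCBITaxon:9605"], "NCBITaxon:9605": []}
    # Lowercase the node identifiers, mirroring the relabel_nodes call in the commit.
    return {node.lower(): [t.lower() for t in targets]
            for node, targets in raw.items()}


def load_or_build(cache_path: Path, builder: Callable[[], Graph]) -> Graph:
    """Return the cached graph, generating and pickling it on the first call."""
    if cache_path.exists():
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    graph = builder()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, "wb") as fh:
        pickle.dump(graph, fh)
    return graph


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "ncbitaxon_obo_graph.pkl"
        first = load_or_build(path, build_graph)   # builds and writes the pickle
        second = load_or_build(path, build_graph)  # reads the pickle back
        print(first == second)
```

Because the cache lives in a bind-mounted host directory, the pickle survives container removal, so only the first start pays the generation cost.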
31 changes: 10 additions & 21 deletions mira/dkg/construct.py
@@ -434,14 +434,11 @@ def extract_ontology_subtree(curie: str, add_subtree: bool = False):
     under the corresponding entry's subtree in its respective ontology.
     Relation information is also extracted with this option.
-    Running this method for the first time for each specific resource will
-    take a long time (minutes) as the obo resource file has to be downloaded,
-    converted to a networkx graph, have their node indices normalized, and
-    pickled.
-    Subsequent runs of this method will take a few seconds as the pickled
+    Execution of this method will take a few seconds as the pickled
     graph object has to be loaded.
     Currently we only support the addition of ncbitaxon terms.
     Parameters
     ----------
     curie :
@@ -463,21 +460,13 @@ def extract_ontology_subtree(curie: str, add_subtree: bool = False):
     resource_prefix = curie.split(":")[0]
     if resource_prefix == "ncbitaxon":
         type = "class"
-        version = get_version(resource_prefix)
-        cached_relabeled_obo_graph_path = prefix_directory_join(resource_prefix,
-                                                                name="relabeled_obo_graph.pkl",
-                                                                version=version)
-        if not cached_relabeled_obo_graph_path.exists():
-            _, obo_path = _ensure_ontology_path(resource_prefix, force=False,
-                                                version=version)
-            obo_graph = read_obo(obo_path)
-            relabeled_graph = networkx.relabel_nodes(obo_graph,
-                                                     lambda node_index: node_index.lower())
-            with open(cached_relabeled_obo_graph_path,'wb') as relabeled_graph_file:
-                pickle.dump(relabeled_graph, relabeled_graph_file)
-        else:
-            with open(cached_relabeled_obo_graph_path,'rb') as relabeled_graph_file:
-                relabeled_graph = pickle.load(relabeled_graph_file)
+        cached_relabeled_obo_graph_path = (Path(__file__).resolve().parents[2]
+                                           / "docker" /
+                                           "mounted_graph_storage" /
+                                           "ncbitaxon_obo_graph.pkl")
+
+        with open(cached_relabeled_obo_graph_path,'rb') as relabeled_graph_file:
+            relabeled_graph = pickle.load(relabeled_graph_file)
     else:
         return nodes, edges
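
The new lookup resolves the pickle path relative to `construct.py` itself: `parents[2]` climbs from `mira/dkg/construct.py` to the repository root, so the file is expected under `docker/mounted_graph_storage/`. The sketch below illustrates that resolution with a made-up checkout location (`/home/user/mira`). It also adds a hypothetical `load_cached_graph` helper that raises a descriptive error when the pickle has not been generated yet; the commit itself opens the file directly.

```python
import pickle
from pathlib import Path, PurePosixPath


def graph_path_for(module_file: PurePosixPath) -> PurePosixPath:
    """Mirror the commit's path arithmetic for a given module location."""
    # parents[0] is mira/dkg, parents[1] is mira, parents[2] is the repository root.
    return (module_file.parents[2] / "docker" / "mounted_graph_storage"
            / "ncbitaxon_obo_graph.pkl")


def load_cached_graph(path: Path):
    """Load the pickled graph, failing with a hint if it is missing (hypothetical helper)."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; run docker/build_run_docker.sh to generate it"
        )
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Example with an assumed checkout under /home/user/mira:
example = graph_path_for(PurePosixPath("/home/user/mira/mira/dkg/construct.py"))
print(example)
```

One consequence of this design is that the lookup only succeeds from a source checkout whose `docker/mounted_graph_storage` directory has already been populated by a container run.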
