Merge pull request #384 from ielis/python/include-generated-files
Include the generated files into Python library
pnrobinson authored Jul 4, 2024
2 parents 04bbf84 + 7372760 commit 804a217
Showing 27 changed files with 640 additions and 200 deletions.
45 changes: 45 additions & 0 deletions .github/workflows/python.yml
@@ -0,0 +1,45 @@
# This workflow generates the Python protobuf bindings and type stubs with Maven and runs the Python tests.

name: Python CI with Maven and Pytest

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]
  workflow_dispatch:

jobs:
  run-python-ci:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']

    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Generate Python bindings with Maven
        run: ./mvnw -B package -DskipTests # we run tests elsewhere

      - name: Install Python bindings
        run: |
          cd python && python3 -m pip install .[test]
      - name: Run Python tests
        run: |
          cd python && pytest
7 changes: 7 additions & 0 deletions .gitignore
@@ -25,6 +25,9 @@ buildNumber.properties
# Avoid ignoring Maven wrapper jar file (.jar files are usually ignored)
!maven-wrapper.jar

# VS Code files
.vscode/

### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio

@@ -127,6 +130,10 @@ nb-configuration.xml
### Python template
!requirements.txt

# We do not track the generated Protobuf files for now.
python/**/*_pb2.py
python/**/*_pb2.pyi

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
2 changes: 1 addition & 1 deletion .mvn/wrapper/maven-wrapper.properties
@@ -1 +1 @@
-distributionUrl=https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.8.1/apache-maven-3.8.1-bin.zip
+distributionUrl=https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.9.6/apache-maven-3.9.6-bin.zip
77 changes: 26 additions & 51 deletions deploy-python.sh
@@ -1,30 +1,15 @@
# Create Temporary Destination
# Phenopackets folder
TEMP_DIRECTORY=$(mktemp -d)
echo "Building phenopacket distribution files in temporary directory at $TEMP_DIRECTORY"
TEMP_DIRECTORY_PYTHON_MODULE="$TEMP_DIRECTORY/phenopackets"
TEMP_DIRECTORY_TESTS_MODULE="$TEMP_DIRECTORY/tests"
TEMP_DIRECTORY_VIRTUAL_ENV="$TEMP_DIRECTORY/phenopackets-venv"
declare -a pyfiles=("base" "phenopackets" "biosample" "disease" "genome" "individual" "interpretation" "medical_action" "measurement" "meta_data" "pedigree" "phenotypic_feature" "vrsatile")
# Functions
createInitFile(){
echo "import pkg_resources" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
echo "__version__ = pkg_resources.get_distribution('phenopackets').version" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
for i in "${pyfiles[@]}"
do
echo "from .${i}_pb2 import *" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
done
}
#!/usr/bin/env bash

replaceImports(){
for i in "${pyfiles[@]}"
do
sed -i '' 's/from phenopackets.schema.v2.core/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
sed -i '' 's/from ga4gh.vrsatile.v1/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
sed -i '' 's/from ga4gh.vrs.v1/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
done
}
# Build and Deploy Python Package
# We assume the script is run from the top-level repository folder as ./deploy-python.sh

DIRECTORY=./python
echo "Building phenopacket distribution files in directory at $DIRECTORY"

# Ensure we generated the protobuf Python files.
./mvnw clean compile

cd $DIRECTORY || { echo "Deployment FAILED. Couldn't find directory" ; exit 1; }
createVirtualEnvironment(){
echo "Creating Python virtual environment at ${1}"
python3 -m venv "${1}" &> /dev/null
@@ -39,42 +24,32 @@ createVirtualEnvironment(){
echo "Virtual environment created successfully";
}

# Create python module
mkdir $TEMP_DIRECTORY_PYTHON_MODULE
createInitFile
cp ./target/generated-sources/protobuf/python/phenopackets/schema/v2/phenopackets_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/phenopackets/schema/v2/core/* $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/ga4gh/vrsatile/v1/vrsatile_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/ga4gh/vrs/v1/vrs_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
replaceImports
# Create tests module
mkdir $TEMP_DIRECTORY_TESTS_MODULE
cp ./src/test/python/* $TEMP_DIRECTORY_TESTS_MODULE
# Copy Packaging files
cp requirements.txt setup.py pom.xml LICENSE README.rst $TEMP_DIRECTORY

# Create Python venv in virtual directory
TEMP_DIRECTORY_VIRTUAL_ENV="phenopackets-venv"
createVirtualEnvironment $TEMP_DIRECTORY_VIRTUAL_ENV
cd $TEMP_DIRECTORY || { echo "Deployment FAILED. Couldn't cd to temp directory" ; exit 1; }
# shellcheck disable=SC1090
source "$TEMP_DIRECTORY_VIRTUAL_ENV/bin/activate"
pip install -r "$TEMP_DIRECTORY/requirements.txt"
# Dependencies for building/deploying
python3 -m pip install setuptools wheel twine xmltodict || { echo "Deployment FAILED. Failed to install python dependencies" ; exit 1; }

# Test
pip install -e .
python3 setup.py test || { echo "Deployment FAILED. Unittest Failure" ; exit 1; }
# Build
python3 setup.py sdist bdist_wheel || { echo "Deployment FAILED. Building python package" ; exit 1; }
python3 -m pip install ".[test]"
pytest || { echo "Deployment FAILED. Unittest Failure" ; exit 1; }

# Install dependencies for building/deploying
python3 -m pip install build twine || { echo "Deployment FAILED. Failed to install python dependencies" ; exit 1; }
# Build
python3 -m build || { echo "Deployment FAILED. Building python package" ; exit 1; }
# Deploy - Remove --repository testpypi flag for production.
if [ $1 = "release-prod" ]; then
if [ "$1" = "release-prod" ]; then
python3 -m twine upload dist/*
elif [ $1 = "release-test" ]; then
elif [ "$1" = "release-test" ]; then
python3 -m twine upload --repository testpypi dist/*
else
echo "Python Release was prepared successfully. No release argument provided, use one of [release-prod, release-test] to make the production/test release."
fi



# Clean up
echo "Cleaning up the build environment and the build files"
deactivate
rm -rf build dist ${TEMP_DIRECTORY_VIRTUAL_ENV}
cd ..
./mvnw clean
140 changes: 140 additions & 0 deletions docs/python.rst
@@ -0,0 +1,140 @@
.. _rstpython:

###################################
Working with Phenopackets in Python
###################################

Similarly to :ref:`Java <rstjava>`, the :ref:`Phenopacket Schema <rstschema>` can be considered the source of truth
for the specification, and the JSON produced by an arbitrary implementation can be used to interoperate
with other services. Nevertheless, we **strongly** recommend using the `phenopackets` library available
from the Python Package Index (PyPI), or the Python bindings generated by the Protobuf compiler from the Protobuf files.

Here we provide a brief overview of the `phenopackets` library.


Install `phenopackets` into your Python environment
***************************************************

The `phenopackets` package can be installed from PyPI by running:

.. code-block:: shell

  python3 -m pip install phenopackets

We use `pip` to install `phenopackets` and its required dependencies.
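
To confirm that the installation worked, one option is to ask the standard library for the installed version. This is only a minimal sketch and not part of the `phenopackets` API: the helper name `installed_version` is hypothetical, and it returns `None` instead of raising when the distribution is absent.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "phenopackets") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: report whether the package is visible to this interpreter.
version = installed_version()
print("phenopackets:", version if version is not None else "not installed")
```

Running this with the same interpreter that ran `pip` helps catch the common case where the package landed in a different environment.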


Create building blocks programmatically
***************************************

Let's start by importing all building blocks of Phenopacket Schema v2:

>>> import phenopackets.schema.v2 as pps2

Now we can access all building blocks of the v2 Phenopacket Schema via the `pps2` alias.

For instance, we can create an :ref:`Ontology class <rstontologyclass>` that corresponds to a Human Phenotype Ontology
term for *Spherocytosis* (`HP:0004444`):

>>> spherocytosis = pps2.OntologyClass(id='HP:0004444', label='Spherocytosis')
>>> spherocytosis # doctest: +NORMALIZE_WHITESPACE
id: "HP:0004444"
label: "Spherocytosis"

All schema building blocks, including `OntologyClass`, are available under the `pps2` alias and can be created with constructors that accept keyword arguments.
The constructors reject unknown field names:

>>> pps2.OntologyClass(foo='bar')
Traceback (most recent call last):
...
ValueError: Protocol message OntologyClass has no "foo" field.

We do not have to provide all attributes at creation time. We can also set the fields one by one
using Python property syntax to achieve the same outcome:

>>> spherocytosis2 = pps2.OntologyClass()
>>> spherocytosis2.id = 'HP:0004444'
>>> spherocytosis2.label = 'Spherocytosis'
>>> spherocytosis == spherocytosis2
True

However, setting field values with property syntax only works for
`singular <https://protobuf.dev/reference/python/python-generated/#singular-fields-proto3>`_ (non-message) fields,
such as `bool`, `int`, `str`, or `float`; the assignment will *NOT* work for message fields:

>>> pf = pps2.PhenotypicFeature()
>>> pf.type = spherocytosis # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
AttributeError: Assignment not allowed to composite field "type" in protocol message object.

To set a message field, we must use the `CopyFrom` method:

>>> pf.type.CopyFrom(spherocytosis)
>>> pf # doctest: +NORMALIZE_WHITESPACE
type {
id: "HP:0004444"
label: "Spherocytosis"
}
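
As the name suggests, `CopyFrom` stores a copy of the message, not a reference to it. The sketch below illustrates this, and also shows that a message can be passed to the constructor as a keyword argument, which copies it as well. It assumes the `phenopackets` library is installed with the `phenopackets.schema.v2` layout used above; the function name `copy_semantics_demo` is hypothetical, and the demo is skipped when the import fails.

```python
try:
    import phenopackets.schema.v2 as pps2
except ImportError:  # library missing or an older package layout: skip the demo
    pps2 = None


def copy_semantics_demo():
    """Show that message fields hold copies of the messages assigned to them."""
    if pps2 is None:
        return None
    term = pps2.OntologyClass(id="HP:0004444", label="Spherocytosis")

    # Option 1: copy an existing message into the field.
    pf = pps2.PhenotypicFeature()
    pf.type.CopyFrom(term)

    # Option 2: pass the message to the constructor, which also copies it.
    pf_kw = pps2.PhenotypicFeature(type=term)

    # Changing the source afterwards does not affect either copy.
    term.label = "changed"
    return pf.type.label, pf_kw.type.label, pf == pf_kw


result = copy_semantics_demo()
print(result)
```

Because the field holds its own copy, later edits must go through the field itself, for example `pf.type.label = '...'`.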

Finally, a repeated field can be set using list-like semantics:

>>> modifiers = (
... pps2.OntologyClass(id='HP:0003623', label='Neonatal onset'),
... pps2.OntologyClass(id='HP:0011010', label='Chronic'),
... )
>>> pf.modifiers.extend(modifiers)
>>> pf # doctest: +NORMALIZE_WHITESPACE
type {
id: "HP:0004444"
label: "Spherocytosis"
}
modifiers {
id: "HP:0003623"
label: "Neonatal onset"
}
modifiers {
id: "HP:0011010"
label: "Chronic"
}

See `Protobuf documentation <https://protobuf.dev/reference/python/python-generated/#repeated-fields>`_
for more info.
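
Beyond `extend`, repeated message fields support a few other list-like operations. The following is a minimal sketch under the same assumption as before (the `phenopackets` library installed with the `phenopackets.schema.v2` layout); the function name `repeated_field_demo` is hypothetical, and the demo returns `None` when the library cannot be imported.

```python
try:
    import phenopackets.schema.v2 as pps2
except ImportError:  # library missing or an older package layout: skip the demo
    pps2 = None


def repeated_field_demo():
    """Exercise list-like operations on the repeated `modifiers` field."""
    if pps2 is None:
        return None
    pf = pps2.PhenotypicFeature()

    # add() creates a new element in place from keyword arguments.
    pf.modifiers.add(id="HP:0003623", label="Neonatal onset")
    # extend() copies the given messages onto the end of the field.
    pf.modifiers.extend([pps2.OntologyClass(id="HP:0011010", label="Chronic")])

    ids = [m.id for m in pf.modifiers]  # iteration
    first = pf.modifiers[0].label       # indexing
    count = len(pf.modifiers)           # length

    # Assigning a list to the field is rejected; use add() or extend() instead.
    try:
        pf.modifiers = []
        assignable = True
    except AttributeError:
        assignable = False

    del pf.modifiers[0]                 # item deletion
    return ids, first, count, assignable, len(pf.modifiers)


result = repeated_field_demo()
print(result)
```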


Building blocks I/O
*******************

Once an instance holds data, we can serialize it into Protobuf's wire format:

>>> binary_str = pf.SerializeToString()
>>> binary_str
b'\x12\x1b\n\nHP:0004444\x12\rSpherocytosis*\x1c\n\nHP:0003623\x12\x0eNeonatal onset*\x15\n\nHP:0011010\x12\x07Chronic'

and get the same content back:

>>> pf2 = pps2.PhenotypicFeature()
>>> _ = pf2.ParseFromString(binary_str)
>>> pf == pf2
True
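
To see what the wire format contains, the bytes printed above can be decoded by hand. The sketch below uses only the standard library and handles just length-delimited fields (wire type 2), which is all this particular message uses; it is an illustration, not a replacement for `ParseFromString`. The helper names are hypothetical. The field numbers 2 and 5 are read off the bytes and appear to correspond to `type` and `modifiers`.

```python
from typing import Iterator, List, Tuple

# The serialized PhenotypicFeature from the example above.
WIRE = (
    b"\x12\x1b\n\nHP:0004444\x12\rSpherocytosis"
    b"*\x1c\n\nHP:0003623\x12\x0eNeonatal onset"
    b"*\x15\n\nHP:0011010\x12\x07Chronic"
)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # a clear high bit marks the last byte
            return value, pos
        shift += 7


def length_delimited_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field number, payload) for each length-delimited field."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


def decode_strings(buf: bytes) -> List[Tuple[int, str]]:
    """Decode a nested message whose fields are all UTF-8 strings."""
    return [(n, p.decode("utf-8")) for n, p in length_delimited_fields(buf)]


# Outer fields are nested OntologyClass messages; decode each payload in turn.
decoded = [(n, decode_strings(p)) for n, p in length_delimited_fields(WIRE)]
for entry in decoded:
    print(entry)
```

Each outer entry pairs a field number with the `(field number, string)` pairs of the nested message, which shows why the binary form is compact but not self-describing: the field names live only in the schema.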

We can also dump the content of the building block to a *JSON* string or to a `dict` with Python objects using
`MessageToJson <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToJson>`_
or `MessageToDict <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToDict>`_
functions:

>>> from google.protobuf.json_format import MessageToDict
>>> json_dict = MessageToDict(pf)
>>> json_dict
{'type': {'id': 'HP:0004444', 'label': 'Spherocytosis'}, 'modifiers': [{'id': 'HP:0003623', 'label': 'Neonatal onset'}, {'id': 'HP:0011010', 'label': 'Chronic'}]}

We complete the JSON round-trip using
`Parse <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.Parse>`_
or `ParseDict <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.ParseDict>`_
functions:

>>> from google.protobuf.json_format import ParseDict
>>> pf2 = ParseDict(json_dict, pps2.PhenotypicFeature())
>>> pf == pf2
True
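
Since the `dict` returned by `MessageToDict` holds only plain Python types, it can also be written to and read from text with the standard `json` module. The sketch below reuses the dictionary printed above as a literal so that it runs without the `phenopackets` library.

```python
import json

# The dict printed by MessageToDict in the example above.
json_dict = {
    "type": {"id": "HP:0004444", "label": "Spherocytosis"},
    "modifiers": [
        {"id": "HP:0003623", "label": "Neonatal onset"},
        {"id": "HP:0011010", "label": "Chronic"},
    ],
}

# Serialize to text (for example, before writing to a file), then read it back.
text = json.dumps(json_dict, indent=2)
restored = json.loads(text)
print(restored == json_dict)
```

The restored dictionary can then be handed to `ParseDict` as shown above to rebuild the message.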

7 changes: 7 additions & 0 deletions docs/variant.rst
@@ -109,6 +109,13 @@ Variation
be it a genomic, transcript or protein variation. VRS also provides mechanisms for representing haplotypes and systemic
variation such as Copy Number Variants (CNVs).

.. note::

When introduced in Phenopacket Schema v2, a protobuf version of VRS (github.com/ga4gh/vrs-protobuf)
was derived from the source VRS representation in JSON schema and used for phenopackets.
The `vrs-protobuf` message structure is losslessly transformable but syntactically distinct
from the native VRS JSON schema.

.. _rstvcfrecord:

VcfRecord
1 change: 1 addition & 0 deletions docs/working.rst
@@ -20,6 +20,7 @@ produced as part of the build (:ref:`rstjavabuild`).
:maxdepth: 1

Working with Phenopackets in Java <java>
Working with Phenopackets in Python <python>
Working with Phenopackets in C++ <cpp>

Security disclaimer