Include the generated files into Python library #384
# Conflicts: # deploy-python.sh # requirements.txt # setup.py
This is pretty non-idiomatic Python
I would say if we are truly reusing a separate package's concept of gene, then let's coordinate with them, have them release a vrs package to PyPI, and just reuse that. If we truly need to vary this, then use standard vendoring patterns.

I don't understand the need for versioning at the module/sub-package level. Why not just use standard Python versioning? Poetry has really good support for this; we use this elsewhere in Monarch. Just have releases 1.x.x and 2.x.x on PyPI, and at some point 3.0.0rc1 and then 3.0.0, etc. Follow standard semantic versioning. One exception to this is when breaking changes are introduced, e.g. with pydantic v2 there is a compatibility layer.

A more radical approach would be to have ga4gh at the top level, with subpackages for ga4gh.phenopackets, vrs, ... I don't think ga4gh is coherent enough yet though, so I think you're better off having phenopackets at the top level.

I can see the point about aligning with the protobuf, but is the protobuf ideally organized? Perhaps this could be reorganized after v2? (and perhaps use something less restrictive than protobuf...) |
Hi @cmungall yes, these are good points.
I assume you are referring to the So yes, we would need the VRS people to release Java and Python artifacts. I am not sure, however, that I would be heard there. Perhaps @cmungall, @julesjacobsen or @pnrobinson can convince them that they must deploy versions to PyPI if they want their code to be used?
Two things. First, we already have this structure in the repo for some reason (see the screenshot above for
Protobuf indeed has some quirks and I would be happy to participate in discussions regarding what is the best for

Overall, I think your points regarding the namespaces, versions, and protobuf are good. Do you think they should be addressed within this PR, or would addressing them later be sufficient? |
To add to the above, I would support exploring a LinkML approach to a version 2.1 or 3 of the schema. There are many moving parts that constrain the current version to have some of the oddities it has above, but these are implementation details that extremely few users will care about, and software should hide them from everybody but developers. |
@ahwagner 👀 👆 |
I'd like to add that all planning and provisions for a potential v3 of Phenopackets must originate, assess requirements, be discussed, and receive approval in the context of GA4GH ClinPheno. @ielis, if you aren't already, let's ensure you onboard to GA4GH and join the ClinPheno WS. @pnrobinson @julesjacobsen @cmungall @ahwagner @mcourtot -- 👀 👆 |
Hi @monicacecilia it is understood that the updates must be discussed in the manner you outlined. This PR makes no backward incompatible changes - everything that worked in |
…enable generating Python type stubs.
Include Python type stubs in `phenopackets` library
I don't have any strong feelings on what should be done in the short term. It seems we have to do some unusual things to make up for limitations of protobuf. I can be of more use when we start planning on v3 and do things in a more conventional way, without protobuf constraints. |
Thanks for the tag @monicacecilia. @ielis, FWIW, the VRS implementation in Phenopackets carries a lot of unusual artifacts by virtue of being "round-tripped" from JSON Schema (VRS native) -> protobuf (Phenopackets native) -> JSON Schema. The VRS team does maintain both JSON Schema and Python implementations for VRS, I've linked the VRS v1.2 specs that were used with Phenopackets v2. |
Thanks @ahwagner! Maybe this is known already, but it looks like there are significant deviations. E.g., if we look at the SequenceInterval in this phenopackets example: And compare with VRS SequenceInterval: https://github.com/ga4gh/vrs/blob/76542a903b913110e67811885a8958625bbc3aae/schema/vrs.yaml#L301-L326
|
Hello @ahwagner thank you very much for your comment and for pointing out the Python implementation. Unfortunately, the data classes of the VRS Python implementation most likely cannot be imported/reused in Phenopacket Schema, and we must refer to their protobuf definitions vrsatile.proto and vrs.proto. The protobuf definitions are namespaced at

# Import a Phenopacket Schema element
from phenopackets.schema.v2.phenopackets_pb2 import Phenopacket
# Import a VRS element
from ga4gh.vrs.v1.vrs_pb2 import Gene

The issue with the above is that the bindings are generated into two top-level packages: This PR proposes to move the VRS stuff within PS, to result in imports like this (Python):

from phenopackets.schema.v2.phenopackets_pb2 import Phenopacket
from phenopackets.vrs.v1.vrs_pb2 import Gene

and all the other languages will stay the same. |
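As a rough illustration of why the bindings end up under two top-level packages: the Python modules generated by protoc mirror the directory layout of the `.proto` files and carry a `_pb2` suffix. The sketch below only computes module names; `proto_to_module` and `relocate` are hypothetical helpers written for this example (they are not part of the deployment script), and the `.proto` path is inferred from the import shown above rather than taken from the repository.

```python
from pathlib import PurePosixPath


def proto_to_module(proto_path: str) -> str:
    """Map a .proto path (relative to the proto root) to its generated Python module name."""
    path = PurePosixPath(proto_path)
    if path.suffix != ".proto":
        raise ValueError(f"not a .proto file: {proto_path}")
    # Directories become packages; the file stem gets the `_pb2` suffix.
    return ".".join(path.with_suffix("").parts) + "_pb2"


def relocate(module: str, old_root: str, new_root: str) -> str:
    """Re-home a module under a different top-level package (hypothetical helper)."""
    head, _, tail = module.partition(".")
    if head != old_root or not tail:
        raise ValueError(f"{module} is not a submodule of {old_root}")
    return f"{new_root}.{tail}"


# Path inferred from the import `ga4gh.vrs.v1.vrs_pb2` (an assumption).
generated = proto_to_module("ga4gh/vrs/v1/vrs.proto")
proposed = relocate(generated, old_root="ga4gh", new_root="phenopackets")
print(generated)
print(proposed)
```

With a single top-level package, only one name is claimed in the user's environment, which is the point of moving the VRS bindings under `phenopackets`.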
Yes–the protobuf implementation was there to help the Phenopackets team implement VRS under Protobuf, but we don't support protobuf as part of the GA4GH Standard or reference implementation precisely to avoid the deviations that we have observed in the Phenopackets project, caused by translating the Protobuf'd VRS schema back to JSON Schema. My recommendation is as it was at the time this decision was made by the Phenopackets team: Phenopackets in JSON Schema should use the GA4GH standard representation of VRS, not an alternate schema derived from VRS-protobuf and sharing the VRS name. Relying on the standard representation would also enable reuse of Python libraries for VRS. Assuming that using the standard VRS JSON Schema is off the table, then I think you are unfortunately left with publishing the phenopackets-flavored VRS as part of the phenopackets package, as you suggest. Ideally this could be distinguished from the VRS standard in the documentation in some way. |
It looks like there is no remaining controversy. I will merge this so we can move forward, unless somebody protests in the next week. There are various issues with advantages and disadvantages, but I think we will leave major changes for a future version 3.0. |
My only request is that the documentation reflects this is not the standard JSON representation of VRS, otherwise I have no concerns. Would we be able to add that to the scope of this PR?
That makes sense. We need to add a comment to this page: https://github.com/ga4gh/vrs-protobuf/tree/d045bb0c65152a0cb32177dfc21148cc13d40fbe |
I think it is as simple as adding a
|
…not care as long as we get the expected exception type.
Thanks, everybody, I think we are all set! |
The PR improves the code of `phenopackets` - the library for working with phenopackets in Python.

Current state of things

`phenopackets` consists of the Python bindings for Phenopacket Schema building blocks. The library allows creating phenopackets programmatically, as well as JSON and protobuf I/O.

All building blocks are exported from the top-level package:
or
The issue
We do not "namespace" the building blocks. For instance, to use VRS Gene, we do:
This is not good, because `Gene` is here to stay and we cannot add another `Gene` to the Schema without breaking changes. On top of this, we cannot work with >1 Schema versions (unlike Java).

The proposal
The PR updates the deployment script to maintain the hierarchy set by the proto files. For the time being (open to discussion), we keep the `v2.0.2` elements at the top-level, in order to not break existing code. So, the code below will still work:

However, we need to discourage users from using this from now on, and this is the new way to import the building blocks:
These imports are longer, but they provide several benefits
`v1.0.0` elements, or the future iterations. The following should work after the PR is merged:

@julesjacobsen @cmungall @pnrobinson please have a look and let me know your thoughts. I hope the PR is not too big; we tried to document it well. I am not asking @iimpulse for review since we worked on this together.
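For reviewers who want to see the two ideas side by side (versioned namespaces that can coexist, and legacy top-level names that keep working but are discouraged), here is a minimal sketch using stand-in modules. Everything in it is hypothetical: `demo_pkg`, the placeholder classes, and the deprecation hook are assumptions made for illustration, and the hook (a module-level `__getattr__`, per PEP 562) is one possible technique, not necessarily what this PR implements.

```python
import types
import warnings

# Stand-ins for two versioned binding modules (placeholders, not real protobuf messages).
v1 = types.ModuleType("demo_pkg.schema.v1")
v2 = types.ModuleType("demo_pkg.schema.v2")
v1.Phenopacket = type("Phenopacket", (), {"schema_version": "1"})
v2.Phenopacket = type("Phenopacket", (), {"schema_version": "2"})

# Stand-in for the top-level package that used to export everything flat.
top = types.ModuleType("demo_pkg")

# Legacy top-level names and the namespaced module they now live in.
_MOVED = {"Phenopacket": v2}


def _legacy_getattr(name):
    """Module-level __getattr__ (PEP 562): resolve legacy names and warn."""
    target = _MOVED.get(name)
    if target is None:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    warnings.warn(
        f"'demo_pkg.{name}' is deprecated; use '{target.__name__}.{name}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(target, name)


top.__getattr__ = _legacy_getattr

# Both schema versions are usable in the same process under distinct namespaces.
old_packet = v1.Phenopacket()
new_packet = v2.Phenopacket()

# The flat, legacy name still resolves, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_cls = top.Phenopacket
```

The design choice in this sketch is that the flat name is resolved lazily, so the warning fires only for code that still uses the old import style.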