Integrate other databases #6
Some more databases:
And can use this one to get SMILES from IUPAC |
After some preliminary research it seems MetaCyc is the easiest to add since they have a REST API. They even have a nice service to search for foreign keys (e.g. PubChem or KEGG), see here. However, the only issue is that you have to search on an organism specific basis. The url to search is something like:
Where ORGID is the organism ID. @DeniseSl22 Is it okay to make the ranker only usable for human datasets for now? It should be easy to add other organisms in the future. However, bear in mind that the more databases are added, the harder it will be to keep the organism restriction broad, since some databases may support fewer organisms. |
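To make the organism restriction point concrete, here is a minimal sketch: the organisms the ranker can support are the intersection of what each integrated database supports, so every added database can only shrink that set. The database names and organism sets are placeholders invented for illustration, not real coverage data, and `common_organisms` is a hypothetical helper.

```python
# Hypothetical coverage table. The database names and organism sets are
# illustrative placeholders, NOT real coverage data for any service.
SUPPORTED_ORGANISMS = {
    "database_a": {"human", "mouse", "yeast"},
    "database_b": {"human", "mouse"},
    "database_c": {"human"},
}


def common_organisms(coverage, databases):
    """Return the organisms that every selected database supports."""
    selected = [coverage[name] for name in databases]
    if not selected:
        # No databases selected, so there is nothing to intersect.
        return set()
    return set.intersection(*selected)


# Each extra database can only keep or shrink the usable set.
two_dbs = common_organisms(SUPPORTED_ORGANISMS, ["database_a", "database_b"])
all_dbs = common_organisms(SUPPORTED_ORGANISMS, list(SUPPORTED_ORGANISMS))
```

With these placeholder sets, two databases still share more than one organism, but adding the third leaves only the single organism they all have in common.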
Yeah sure. Is PubChem then searched for humans only as well? |
Oh btw; Egon just told me there is a new service (I will get the details through mail) which allows automated search through articles (for a lot of publishers, not Elsevier). Perhaps we can do something with that as well (I remembered you told me that a specific search through literature was really missing when you guys were looking at the VOCs dataset) |
Here the info from Egon: |
Hmm cool! CrossRef I guess is the most well known so can integrate that first. |
Using MetaCyc, the flow is:
The second step is what needs to be discussed. What information do we actually want to retrieve from MetaCyc? If you see here, there is quite a lot we can do. The obvious ones are:
But, there are some others in this list that could be interesting. Potentially, you can go however deep you like - getting the ID required for the next query from the previous query.
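A rough sketch of the "go as deep as you like" idea, where each query supplies the ID for the next one. Everything here is hypothetical: `follow_chain`, the `next_id` key, and the in-memory lookup tables stand in for real API calls and do not reflect MetaCyc's actual responses.

```python
def follow_chain(start_id, steps, max_depth=None):
    """Run lookups in sequence, feeding each result's ID into the next step.

    `steps` is a list of (name, lookup) pairs. Each lookup takes an ID and
    returns a dict containing a "next_id" key, or None if nothing was found.
    The "next_id" convention is an assumption made for this sketch.
    """
    results = {}
    current_id = start_id
    for depth, (name, lookup) in enumerate(steps):
        if max_depth is not None and depth >= max_depth:
            break  # stop once the requested depth is reached
        record = lookup(current_id)
        if record is None:
            break  # nothing found for this ID, so the chain ends here
        results[name] = record
        current_id = record.get("next_id")
        if current_id is None:
            break  # the record gives no ID for a further query
    return results


# In-memory stand-ins for real API calls (made-up IDs and names).
FAKE_COMPOUNDS = {"CPD-1": {"name": "compound one", "next_id": "PWY-1"}}
FAKE_PATHWAYS = {"PWY-1": {"name": "pathway one", "next_id": None}}

STEPS = [("compound", FAKE_COMPOUNDS.get), ("pathway", FAKE_PATHWAYS.get)]
full = follow_chain("CPD-1", STEPS)
shallow = follow_chain("CPD-1", STEPS, max_depth=1)
```

The `max_depth` argument is one way to cap how far the ranker digs, which also caps the number of API requests per compound.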
@egonw and @DeniseSl22 could you provide some input? |
I would go to number of pathways and number of substrates... |
Hi Jacob, Just found some info on the ChEBI website that they have an API.... |
Oh wow! How did I not see that? For my reference: here's the API library for Python: https://github.com/libChEBI/libChEBIpy |
Yeah, I am an awesome googler :p
Kind regards,
Denise Slenter MSc
|
Oh, and another one I came across (HMDB API): I think you didn't look at this because Egon already checked whether the compounds were in HMDB and ChEBI (which a lot of them weren't). So, this could help other people find which compounds they do not have to investigate any further :) |
Currently, this only ranks through PubChem's API. It would be nice to use other APIs to rank compounds. Then we should probably rename this project too. We would have to discuss how other databases are implemented, e.g. simply rank by the total number of "hits" across all databases, or allow filtering of search parameters, who knows. The algorithm probably needs to be a bit more complex to give an accurate indication of the amount of data available for each compound in the dataset.
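As a starting point for that discussion, here is a minimal sketch of the simplest option: ranking by the total number of hits across all databases. `rank_by_total_hits` and the hit counts are made up for illustration and are not how the ranker currently works.

```python
def rank_by_total_hits(hits_by_database):
    """Rank compounds by their summed hit counts across all databases.

    `hits_by_database` maps a database name to a {compound: hit_count} dict.
    Returns (compound, total_hits) pairs, highest total first. Ties are
    broken alphabetically so the order is deterministic.
    """
    totals = {}
    for hits in hits_by_database.values():
        for compound, count in hits.items():
            totals[compound] = totals.get(compound, 0) + count
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up hit counts, purely illustrative.
EXAMPLE_HITS = {
    "pubchem": {"benzene": 5, "toluene": 2},
    "metacyc": {"benzene": 1, "acetone": 2},
}
ranking = rank_by_total_hits(EXAMPLE_HITS)
# ranking == [("benzene", 6), ("acetone", 2), ("toluene", 2)]
```

A plain sum lets the database with the largest counts dominate the ranking, which is one reason a more complex algorithm (for example per-database weighting or normalisation) may be needed.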
Other databases (please add):