Integrate other databases #6

jacobwindsor · 2017-03-12T14:54:43Z

Currently, this only ranks using PubChem's API. It would be nice to use other APIs to rank compounds as well, and then we should probably rename this project too. We would have to discuss how other databases are integrated: for example, simply rank by the total number of "hits" across all databases, or allow filtering of search parameters. The algorithm probably needs to be a bit more complex to give an accurate indication of the amount of data available for each compound in the dataset.
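As a starting point for that discussion, the simplest option (ranking by the total number of hits across all databases) could look roughly like the sketch below. The compound names and hit counts are made-up placeholders, and `rank_by_total_hits` is a hypothetical helper, not something in the current code.

```python
from typing import Dict, List, Tuple


def rank_by_total_hits(hits: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Rank compounds by the sum of their hit counts across all databases.

    `hits` maps compound name -> {database name -> hit count}.
    Returns (compound, total) pairs, highest total first; ties are
    broken alphabetically so the order is deterministic.
    """
    totals = {compound: sum(per_db.values()) for compound, per_db in hits.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Placeholder counts, for illustration only.
example_hits = {
    "compound A": {"PubChem": 12, "MetaCyc": 3},
    "compound B": {"PubChem": 40, "MetaCyc": 0},
    "compound C": {"PubChem": 12, "MetaCyc": 3},
}
ranking = rank_by_total_hits(example_hits)
```

A plain sum lets one very large database dominate the result, which is one reason a weighted or normalised score may be worth discussing.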

Other databases (please add):

Google scholar (would have to pay a little bit)
Scifinder (don't think it's possible)
WikiData
Brenda (http://www.brenda-enzymes.org/)
KEGG (although PubChem already uses KEGG)
HMDB
WikiPathways
ChEBI (https://www.ebi.ac.uk/chebi/)

jacobwindsor · 2017-03-27T07:50:16Z

Some more databases:

And can use this one to get SMILES from IUPAC

jacobwindsor · 2017-04-21T18:13:19Z

After some preliminary research it seems MetaCyc is the easiest to add since they have a REST API. They even have a nice service to search for foreign keys (e.g. PubChem or KEGG), see here.

However, the only issue is that you have to search on an organism-specific basis. The URL to search is something like:

http://websvc.biocyc.org/[ORGID]/foreignid?ids=[DATABASE-NAME]:[FOREIGNID]

Where ORGID is the organism ID.
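Filling in that template could be sketched as follows. `foreignid_url` is a hypothetical helper, and the organism ID `HUMAN` and the identifier `1234` are assumed example values, not taken from a real query.

```python
from urllib.parse import quote


def foreignid_url(org_id: str, database: str, foreign_id: str) -> str:
    """Fill in the foreignid URL template quoted above."""
    ids = f"{database}:{foreign_id}"
    return (
        "http://websvc.biocyc.org/"
        f"{quote(org_id, safe='')}/foreignid?ids={quote(ids, safe=':')}"
    )


# "HUMAN" and "1234" are assumed example values.
url = foreignid_url("HUMAN", "PUBCHEM", "1234")
```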

@DeniseSl22 Is it okay to make the ranker only usable for human datasets for now? It should be easy to add other organisms in the future. However, bear in mind that the more databases are added, the harder it will be to keep the organism restriction broad, since some databases may support fewer organisms.

DeniseSl22 · 2017-04-24T06:15:12Z

Yeah sure. Is PubChem then searched for humans only as well?
Perhaps we can add an option in the future where people can say which organism they want to filter on ;)

DeniseSl22 · 2017-04-24T09:07:04Z

Oh btw, Egon just told me there is a new service (I will get the details through mail) which allows automated search through articles (for a lot of publishers, not Elsevier). Perhaps we can do something with that as well (I remember you told me that a specific search through literature was really missing when you guys were looking at the VOCs dataset).

DeniseSl22 · 2017-04-24T10:08:43Z

Here is the info from Egon:
CrossRef API (citation counts): https://github.com/CrossRef/rest-api-doc/blob/master/rest_api.md
EuroPubMedCentral API: http://europepmc.org/RestfulWebService#cites
Initiative for Open Citations: https://i4oc.org/

jacobwindsor · 2017-04-24T10:11:38Z

Hmm cool! CrossRef is, I guess, the most well known, so we can integrate that first.
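A rough idea of what a first CrossRef integration might involve is sketched below. It assumes the public `works` endpoint with the `query` and `rows` parameters and a `total-results` field under `message` in the JSON response. Note that this counts matching works rather than citations, and the sample response is made up so the example runs without network access.

```python
import json
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(compound_name: str) -> str:
    """Build a free-text query URL; rows=0 asks for the summary only."""
    return CROSSREF_WORKS + "?" + urlencode({"query": compound_name, "rows": 0})


def total_results(response_text: str) -> int:
    """Read the number of matching works from a CrossRef JSON response."""
    payload = json.loads(response_text)
    return int(payload["message"]["total-results"])


# Made-up response body, for illustration only.
sample_response = '{"status": "ok", "message": {"total-results": 42, "items": []}}'
url = crossref_query_url("acetic acid")
count = total_results(sample_response)
```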

jacobwindsor · 2017-04-24T20:14:24Z

Using MetaCyc, the flow is:

Get the MetaCyc ID using the PubChem ID with `https://metacyc.org/META/foreignid?ids=PUBCHEM:&fmt=json`
Retrieve the set of MetaCyc objects concerning that compound with http://websvc.biocyc.org/apixml?fn=[API-FUNCTION]&id=[ORGID]:[OBJECT-ID]&detail=[none|low|full]
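The second request could be assembled as in the sketch below. `apixml_url` is a hypothetical helper, and `CPD-1` is a placeholder object ID rather than a real MetaCyc identifier.

```python
from urllib.parse import urlencode


def apixml_url(function: str, org_id: str, object_id: str, detail: str = "none") -> str:
    """Build the apixml URL from the template above."""
    if detail not in {"none", "low", "full"}:
        raise ValueError("detail must be one of: none, low, full")
    query = urlencode(
        {"fn": function, "id": f"{org_id}:{object_id}", "detail": detail},
        safe=":",  # keep the ORGID:OBJECT-ID separator readable
    )
    return "http://websvc.biocyc.org/apixml?" + query


# "CPD-1" is a placeholder object ID.
url = apixml_url("pathways-of-compound", "META", "CPD-1", detail="low")
```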

The second step is what needs to be discussed. What information do we actually want to retrieve from MetaCyc? If you see here, there is quite a lot we can do.

The obvious ones are:

pathways-of-compound
reactions-of-compound

But, there are some others in this list that could be interesting. Potentially, you can go however deep you like - getting the ID required for the next query from the previous query.

all-products-of-gene
binding-site-transcription-factors
chromosome-of-gene
compounds-of-pathway
containers-of
containing-tus
direct-activators
direct-inhibitors
enzymes-of-gene
enzymes-of-pathway
enzymes-of-reaction
genes-of-pathway
genes-of-protein
genes-of-reaction
genes-regulated-by-gene
genes-regulating-gene
modified-containers
modified-forms
monomers-of-protein
pathways-of-compound
pathways-of-gene
reactions-of-compound
reactions-of-enzyme
reactions-of-gene
regulator-proteins-of-transcription-unit
regulon-of-protein
substrates-of-reaction
top-containers
transcription-unit-activators
transcription-unit-binding-sites
transcription-unit-genes
transcription-unit-inhibitors
transcription-unit-mrna-binding-sites
transcription-unit-promoter
transcription-unit-terminators
transcription-unit-transcription-factors
transcription-units-of-gene
transcription-units-of-protein

@egonw and @DeniseSl22 could you provide some input?

egonw · 2017-04-26T07:55:57Z

I would go for the number of pathways and the number of substrates...
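Counting results from an XML response might be sketched as below. The element names and layout of `sample_xml` are assumptions made for illustration; the real BioCyc XML may be structured differently, so the tag name would need checking against an actual response.

```python
import xml.etree.ElementTree as ET


def count_elements(xml_text: str, tag: str) -> int:
    """Count elements named `tag` directly under the root of an XML document."""
    root = ET.fromstring(xml_text)
    return len(root.findall(tag))


# Made-up response; the real BioCyc XML layout may differ.
sample_xml = """<ptools-xml>
  <metadata><num_results>2</num_results></metadata>
  <Pathway ID="META:PWY-A"/>
  <Pathway ID="META:PWY-B"/>
</ptools-xml>"""
pathway_count = count_elements(sample_xml, "Pathway")
```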

DeniseSl22 · 2017-05-16T07:22:49Z

Hi Jacob,

Just found some info on the ChEBI website that they have an API....
Perhaps useful to add this to the Ranker Program?

https://www.ebi.ac.uk/chebi/libchebi.do

jacobwindsor · 2017-05-16T09:25:22Z

Oh wow! How did I not see that?

For my reference: here's the API library for Python: https://github.com/libChEBI/libChEBIpy

DeniseSl22 · 2017-05-16T11:00:59Z

Yeah, I am an awesome googler :p

DeniseSl22 · 2017-05-17T07:06:55Z

Oh and another one I came across (HMDB API):
https://github.com/mzmine/mzmine2/issues/195

I think you didn't look at this because Egon already checked if the compounds were in HMDB and ChEBI (which a lot of them weren't). So, this could help other people to find which compounds they do not have to investigate any further :)

jacobwindsor added enhancement help wanted question labels Mar 12, 2017

jacobwindsor self-assigned this Mar 12, 2017
