Protein Annotations in PINT
The annotations that are imported automatically are UniProtKB and OMIM:
- UniProtKB: The PINT annotations module retrieves the annotations of all the proteins stored in the server and saves them on the server under PINT_HOME_PATH/uniprot. It automatically creates a subfolder for each new UniProt release that it finds.
- OMIM: OMIM annotations are imported through the API provided at https://omim.org/api. To use it, you have to register at OMIM to get an API key, which you should include in the configuration.
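To check which UniProt releases have already been stored, the subfolders of the uniprot folder can simply be listed. The following is a minimal sketch, assuming only what is stated above (one subfolder per release); the class `ReleaseFolders`, the method `listReleaseFolders` and the default folder name are hypothetical and are not part of PINT.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PINT.
class ReleaseFolders {

    /**
     * Returns the names of the direct subfolders of the given folder,
     * sorted alphabetically. Returns an empty list if the folder does
     * not exist or cannot be read.
     */
    public static List<String> listReleaseFolders(File uniprotFolder) {
        final List<String> names = new ArrayList<>();
        final File[] children = uniprotFolder.listFiles(File::isDirectory);
        if (children != null) {
            for (final File child : children) {
                names.add(child.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Pass the path of your own PINT_HOME_PATH/uniprot folder as the first argument.
        final File folder = new File(args.length > 0 ? args[0] : "uniprot");
        for (final String name : listReleaseFolders(folder)) {
            System.out.println(name);
        }
    }
}
```

How the release subfolders are named is not documented here, so the sketch only reports the names it finds and does not try to interpret them.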
Other external resources that are linked:
- PRIDE Cluster
- IntAct database
- ComplexPortal
First, define where you are going to store the UniProt annotations on your computer:
final File uniprotReleasesFolder = new File("C:\\Users\\salvador\\Desktop\\uniprotKB");
Then, create a UniprotProteinLocalRetriever object using that location:
final UniprotProteinLocalRetriever uplr = new UniprotProteinLocalRetriever(uniprotReleasesFolder, true);
Then, use that object to query your accession numbers of interest:
final Collection<String> accs = Arrays.asList("P12345"); // replace with your own UniProt accession numbers
final Map<String, Entry> annotatedProteins = uplr.getAnnotatedProteins(null, accs);
Note that you can query as many accessions as you want at once. The retriever makes multiple remote queries to the EBI UniProt service in parallel, with 200 proteins per query (the maximum defined by EBI).
This downloads the UniProt entries in XML format (e.g. P12345) and indexes them, so that subsequent queries for the same accessions are fast.
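To illustrate the 200-proteins-per-query limit, here is a minimal sketch of how a collection of accessions can be split into batches of that size. It is not the library's actual implementation, which handles this internally; the class `AccessionBatcher` and the method `partition` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of the library.
class AccessionBatcher {

    // Batch size taken from the 200-per-query limit mentioned above.
    public static final int MAX_PER_QUERY = 200;

    /** Splits the accessions into consecutive batches of at most batchSize items. */
    public static List<List<String>> partition(Collection<String> accessions, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<String> all = new ArrayList<>(accessions);
        final List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < all.size(); start += batchSize) {
            final int end = Math.min(start + batchSize, all.size());
            batches.add(new ArrayList<>(all.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        final List<String> accs = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            accs.add("ACC" + i); // placeholder identifiers, not real accessions
        }
        final List<List<String>> batches = partition(accs, MAX_PER_QUERY);
        // 450 accessions are split into batches of 200, 200 and 50
        System.out.println(batches.size() + " batches");
    }
}
```

You do not need to do this yourself when calling getAnnotatedProteins; the sketch only shows why a query of, say, 450 accessions results in three remote requests.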
Once the query has finished, you obtain a map in which the keys are the protein accessions and the values are the UniProt Entry objects.
Note that in UniProt a protein can have multiple accession numbers pointing to the same entry. The query result reflects this: for example, if you query the protein P12345, which is also defined by the secondary accession G1SKL2, the returned map will have the keys "P12345" and "G1SKL2" pointing to the same Entry object.
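Because several keys can point to the same object, the size of the map can be larger than the number of distinct entries. The following minimal sketch collects each distinct value only once by comparing object identity. The class `DistinctEntries` and the method `distinctByIdentity` are hypothetical, and a plain `Object` stands in for the library's Entry type so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library.
class DistinctEntries {

    /**
     * Returns each distinct value object once, compared by identity (==),
     * in the iteration order of the map.
     */
    public static <V> List<V> distinctByIdentity(Map<String, V> byAccession) {
        final Set<V> seen = Collections.newSetFromMap(new IdentityHashMap<V, Boolean>());
        final List<V> distinct = new ArrayList<>();
        for (final V value : byAccession.values()) {
            if (seen.add(value)) { // add() returns false if this exact object was already seen
                distinct.add(value);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // A plain Object stands in for the Entry returned by the library.
        final Object entry = new Object();
        final Map<String, Object> annotated = new LinkedHashMap<>();
        annotated.put("P12345", entry); // primary accession
        annotated.put("G1SKL2", entry); // secondary accession, same object
        System.out.println(annotated.size() + " keys, "
                + distinctByIdentity(annotated).size() + " distinct entries");
    }
}
```

Identity comparison is used here on the assumption that the same Entry instance is stored under each accession, as described above; if that does not hold in your version of the library, you would need to compare entries by their primary accession instead.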
The Entry object represents the XML structure of a UniProt entry.
However, the class UniprotEntryUtil provides multiple useful static methods that you will find very handy, such as:
*
Proteomics Yates Laboratory
Salvador Martínez-Bartolomé (salvador at scripps.edu)
Research Associate
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037
Git-Hub profile