Protein Annotations in PINT

Salvador Martínez de Bartolomé edited this page Jul 29, 2021 · 9 revisions

Annotations

PINT automatically imports annotations from two sources: UniProtKB and OMIM.

  • UniProtKB: The PINT annotations module retrieves the annotations of all the proteins stored in the server and saves them on the server.
    The annotations are stored in the server folder PINT_HOME_PATH/uniprot. A subfolder is created automatically for each new UniProt release that is found.

  • OMIM: OMIM annotations are imported through the API provided at https://omim.org/api. To use it, you must register with OMIM to obtain an API key and include that key in the configuration.

Other external resources that are linked:

How to use the pint.annotations module

First, define where you are going to store the UniProt annotations on your computer:

final File uniprotReleasesFolder = new File("C:\\Users\\salvador\\Desktop\\uniprotKB");

Then, create a UniprotProteinLocalRetriever object using that location:

final UniprotProteinLocalRetriever uplr = new UniprotProteinLocalRetriever(uniprotReleasesFolder, true);

Then, use that object to query your accession numbers of interest:

// Replace this example list with your own UniProt accession numbers.
final Collection<String> accs = Arrays.asList("P12345");
final Map<String, Entry> annotatedProteins = uplr.getAnnotatedProteins(null, accs);

Note that you can query as many accessions as you want at once. The retriever sends multiple remote queries to the EBI UniProt service in parallel, with 200 proteins per query (the maximum allowed by EBI).
This downloads the XML-formatted UniProt entries (e.g. P12345) and indexes them so that subsequent queries for the same accessions are fast.
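The batching described above happens inside the retriever, so you do not need to implement it yourself. Purely as an illustration of the idea, here is a minimal sketch of splitting a list of accessions into groups of at most 200. The class `AccessionBatches` and its `partition` method are hypothetical names, not part of PINT, and this is not a copy of how PINT does it internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of PINT: illustrates splitting accessions
// into batches no larger than a given size.
final class AccessionBatches {

    // Batch size taken from the text above (200 proteins per query).
    static final int MAX_PER_QUERY = 200;

    static List<List<String>> partition(List<String> accessions, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < accessions.size(); start += batchSize) {
            final int end = Math.min(start + batchSize, accessions.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(accessions.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        final List<String> accs = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            accs.add("ACC" + i); // placeholder identifiers, not real accessions
        }
        final List<List<String>> batches = partition(accs, MAX_PER_QUERY);
        System.out.println(batches.size() + " batches"); // 200 + 200 + 50
    }
}
```

With 450 placeholder accessions, the sketch yields three batches, the last one holding the remaining 50.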
Once the query has finished, you obtain a map in which the keys are the protein accessions and the values are the UniProt Entry objects.
Note that in UniProt a protein can have multiple accession numbers pointing to the same entry, and the returned map reflects this. For example, if you query the protein P12345, which also has the secondary accession G1SKL2, the map will contain the keys "P12345" and "G1SKL2", both pointing to the same Entry object.
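Because several keys can point to the same object, the size of the map may be larger than the number of distinct entries. If you need to process each entry only once, one option is to collect the values into an identity-based set. The following is a minimal sketch under that assumption. `DistinctEntries` and `distinctValues` are hypothetical names, and a plain `Object` stands in for the real Entry type so that the example runs without PINT on the classpath.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of PINT.
final class DistinctEntries {

    // Returns the distinct value objects of a map, compared by identity (==),
    // so a value reachable through several keys is counted once.
    static <V> Set<V> distinctValues(Map<String, V> byAccession) {
        final Set<V> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        distinct.addAll(byAccession.values());
        return distinct;
    }

    public static void main(String[] args) {
        // Stand-in for a UniProt Entry object; the real type comes from PINT.
        final Object entry = new Object();
        final Map<String, Object> annotated = new HashMap<>();
        annotated.put("P12345", entry); // primary accession
        annotated.put("G1SKL2", entry); // secondary accession, same object
        System.out.println(annotated.size() + " keys, "
                + distinctValues(annotated).size() + " distinct entries");
    }
}
```

In real code you would pass the `Map<String, Entry>` returned by `getAnnotatedProteins` instead of the stand-in map.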

The Entry object represents the XML structure of a UniProt entry.
Rather than navigating that structure by hand, you can use the static methods of the class UniprotEntryUtil, which you will find very handy, such as:
[Figure: UniprotEntryUtil static methods]