Adding a new Bank Provider
This document is intended for the following audience: developers.
BioDocumentViewer was originally designed to query and retrieve sequence information from public web services: the NCBI Entrez eUtils service (NCBI, Bethesda, USA) and the EB-eye Search service (EBI, Hinxton, UK).
However, the software uses the Model-View-Controller (MVC) paradigm as well as a simple plugin architecture to enable the addition of new services, even ones that do not serve sequence data.
To do its work, BioDocumentViewer (BDV) connects to these web services through what we call a BankProvider.
In turn, a BankProvider defines the list of banks available for that service. Each such bank is a BankType, and this BankType is responsible for providing the MVC implementation:
- a QueryModel: it defines the fields that are available to query the remote server;
- a QueryEngine: it defines the controller, i.e. the entity responsible for managing transactions with the remote server. This entity is closely associated with a ServerConfiguration;
- a Search and a Summary: the data models handled by BDV to present query results to the user. These two data models are augmented by the presentation layer, called SummaryDocPresentationModel.
Now, all these entities are Java interfaces you have to implement to set up a new data bank provider for BDV.
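To make the relationship between these entities more concrete, here is a minimal sketch of how a provider, its banks and their MVC parts might fit together. All type and method names below (`DemoBankProvider`, `DemoBankType`, `queryableFields()`, etc.) are hypothetical stand-ins invented for illustration; the real BDV interfaces declare their own methods, which are not reproduced here. The Search and Summary data models are left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the BDV interfaces (not the real API).
interface DemoQueryModel {
  // Fields a user can query on the remote server.
  List<String> queryableFields();
}

interface DemoQueryEngine {
  // Human-readable description of the remote target.
  String describeTarget();
}

interface DemoBankType {
  String name();
  DemoQueryModel queryModel();
  DemoQueryEngine queryEngine();
}

interface DemoBankProvider {
  String providerName();
  // One provider exposes one or more banks.
  List<DemoBankType> banks();
}

// A toy provider exposing a single "protein" bank.
final class DemoProvider implements DemoBankProvider {
  @Override
  public String providerName() {
    return "Demo";
  }

  @Override
  public List<DemoBankType> banks() {
    DemoBankType protein = new DemoBankType() {
      @Override
      public String name() {
        return "protein";
      }

      @Override
      public DemoQueryModel queryModel() {
        return () -> Arrays.asList("Title", "Organism");
      }

      @Override
      public DemoQueryEngine queryEngine() {
        return () -> "remote protein bank";
      }
    };
    return Arrays.asList(protein);
  }

  public static void main(String[] args) {
    for (DemoBankType bank : new DemoProvider().banks()) {
      System.out.println(bank.name() + " -> " + bank.queryModel().queryableFields());
    }
  }
}
```

The point of the layering is that the application only needs to know the provider: everything else is reached through the banks it exposes.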
##Real services
To understand how to implement such services, you can have a look at those already available in BDV to query and retrieve sequence data from NCBI, EBI and Ensembl.
BDV was originally designed to handle the NCBI Entrez databank system (NCBI, Bethesda, USA), and more precisely to query, retrieve and display sequence information from nucleotide and protein banks.
It was then extended to use the EBI-Search databank system (EBI, Hinxton, UK), again to deal with sequence information.
More recently, I added an Ensembl service to deal with a different type of data: variations. This shows that BDV is open to managing different flavours of sequence-related data.
Those three services are contained in appropriate packages, as follows:
##How to create a new service?
To illustrate how to add a new BankProvider service, let's take the example of the NCBI one; it is actually the one for which BDV was originally designed.
###Step 1 - analyze the remote API
To access NCBI databanks, one can use the NCBI Entrez eUtils service. This is a URL-based API relying on several services, among which:
- ESearch: to query NCBI given some terms such as sequence ID, gene name, publication date, organism, etc.
- ESummary: to retrieve short descriptions of these sequences given their IDs.
- EFetch: to query NCBI with a sequence ID and retrieve the full entry.
These NCBI remote services return data that BDV then has to handle.
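As an illustration of what "URL-based" means here, the following sketch builds one request URL for each of the three services. The base URL and the `db`, `term`, `id`, `retmax`, `rettype` and `retmode` parameters belong to the public eUtils API; the `EutilsUrls` class and its method names are hypothetical helpers written for this page and are not part of BDV. No network call is made.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of BDV) that builds NCBI eUtils request URLs. */
final class EutilsUrls {
  // Public base URL of the NCBI Entrez eUtils service.
  static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

  private EutilsUrls() {}

  /** ESearch: find the IDs of entries matching a query term. */
  static String esearch(String db, String term, int retmax) {
    return BASE + "esearch.fcgi?db=" + enc(db) + "&term=" + enc(term) + "&retmax=" + retmax;
  }

  /** ESummary: short descriptions for a comma-separated list of IDs. */
  static String esummary(String db, String ids) {
    return BASE + "esummary.fcgi?db=" + enc(db) + "&id=" + enc(ids);
  }

  /** EFetch: full entry for one ID, in the requested format (e.g. "fasta"). */
  static String efetch(String db, String id, String rettype) {
    return BASE + "efetch.fcgi?db=" + enc(db) + "&id=" + enc(id)
        + "&rettype=" + enc(rettype) + "&retmode=text";
  }

  // Percent-encode a parameter value so it is safe inside a query string.
  private static String enc(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(esearch("protein", "kinase AND human[Organism]", 20));
    System.out.println(esummary("protein", "123,456"));
    System.out.println(efetch("protein", "123", "fasta"));
  }
}
```

A typical session chains the three calls: ESearch returns IDs, ESummary turns those IDs into short descriptions, and EFetch retrieves the full entry the user selects.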
###Step 2 - prepare data models
Experimenting with the three NCBI services mentioned above led us to design the data model used by BDV. This data model is made of concrete implementations of the Search and Summary interfaces. Instances are created by these two classes:
- EntrezSearchLoader: handles NCBI XML data served by the ESearch service and creates BDV Search instances
- EntrezSummaryLoader: handles NCBI XML data served by the ESummary service and creates BDV Summary instances
During the preparation of these classes, we had to deal with the official NCBI DTDs: esearch.dtd and esummary-v1.dtd. They were used to automatically generate Java/XML binding classes using the JAXB framework, part of Oracle's official Java SDK. DTDs are located here and generated classes are in packages bzh.plealog.bioinfo.docviewer.service.ncbi.model.esearch and bzh.plealog.bioinfo.docviewer.service.ncbi.model.esummary.
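To give an idea of what a loader has to do, here is a minimal sketch that extracts the list of IDs from an ESearch answer. Note that it deliberately uses the plain DOM parser shipped with the JDK instead of JAXB-generated classes, so it is not how EntrezSearchLoader works; the `ESearchIdReader` class is hypothetical, and the sample XML is a trimmed, hand-written answer without DOCTYPE declaration (a real answer references the NCBI DTD).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical reader (not part of BDV): pulls entry IDs out of ESearch XML. */
final class ESearchIdReader {
  // Hand-written sample shaped like an ESearch answer, DOCTYPE omitted.
  static final String SAMPLE =
      "<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
      + "<IdList><Id>123</Id><Id>456</Id></IdList></eSearchResult>";

  private ESearchIdReader() {}

  /** Returns the text of every Id element, in document order. */
  static List<String> readIds(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList nodes = doc.getElementsByTagName("Id");
      List<String> ids = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        ids.add(nodes.item(i).getTextContent().trim());
      }
      return ids;
    } catch (Exception e) {
      // Malformed XML or parser setup failure.
      throw new IllegalArgumentException("Cannot parse ESearch answer", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readIds(SAMPLE));
  }
}
```

Generated binding classes give typed access to every element of the DTD, which is more convenient than DOM traversal once more than a couple of fields are needed.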
###Step 3 - setup the query engine
The QueryEngine is the component used by BDV to handle connections to the remote service. Here, we target NCBI, so we set up these classes:
- EntrezServerConfiguration: contains the URL templates used by BDV to query NCBI services;
- EntrezQueryEngine: contains the controller enabling BDV to connect to NCBI.
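The split between a configuration holding URL templates and an engine using them can be sketched as follows. Everything here is an assumption made for illustration: the class names, the `@DB@` and `@TERM@` placeholder syntax, the example.org URL, and the idea of passing the transport as a function (which simply keeps the sketch runnable without network access). The actual EntrezServerConfiguration and EntrezQueryEngine may be organized differently.

```java
import java.util.function.Function;

/** Hypothetical configuration: holds URL templates only, no connection logic. */
final class TemplateServerConfiguration {
  // Placeholder syntax (@DB@, @TERM@) is an assumption of this sketch.
  final String searchTemplate;

  TemplateServerConfiguration(String searchTemplate) {
    this.searchTemplate = searchTemplate;
  }
}

/** Hypothetical controller: fills a template, then delegates the transfer. */
final class TemplateQueryEngine {
  private final TemplateServerConfiguration config;
  // Maps a URL to the raw answer; a real engine would open an HTTP connection.
  private final Function<String, String> transport;

  TemplateQueryEngine(TemplateServerConfiguration config, Function<String, String> transport) {
    this.config = config;
    this.transport = transport;
  }

  /** Builds the search URL for the given bank and term, then runs the query. */
  String search(String db, String term) {
    String url = config.searchTemplate.replace("@DB@", db).replace("@TERM@", term);
    return transport.apply(url);
  }

  public static void main(String[] args) {
    TemplateServerConfiguration config =
        new TemplateServerConfiguration("https://example.org/search?db=@DB@&term=@TERM@");
    // Fake transport that echoes the URL instead of contacting a server.
    TemplateQueryEngine engine = new TemplateQueryEngine(config, url -> "fetched:" + url);
    System.out.println(engine.search("protein", "kinase"));
  }
}
```

Keeping the templates out of the engine means a change of server address or URL layout only touches the configuration.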
###Step 4 - setup the query model
As you may have seen when using BDV, it provides a graphical query editor. More precisely, this editor derives directly from the BLAST Filter Tool, which is used to filter BLAST results using a set of constraints called a filter.
In the context of BDV, we made a particular use of the BLAST Filter Tool: we just wanted to reuse the graphical filter editor, not the filtering engine. For that purpose, we had to set up a BFilter-based data model and a concrete implementation of the QueryModel interface.
Considering NCBI, we actually designed a specific query data model to target each specific databank we wanted to query, e.g. nucleotides, proteins, structures, etc. The root data model is:
and it is associated with several other ones that serve the purpose of:
- setting up a filter data model for each NCBI databank we want to use with BDV; e.g. ProteinQueryModel;
- converting an NCBI Entrez query into a BFilter instance; e.g. EntrezQueryExpressionBuilder.
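One plausible reading of the expression-building step is turning a set of field criteria into an Entrez query string. The sketch below shows that idea under simplifying assumptions: the `Criterion` class is a hypothetical stand-in for the BFilter data model (whose API is not reproduced here), criteria are only combined with AND, and the `value[Field]` notation is the standard Entrez field-tag syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch (not part of BDV): builds an Entrez term from criteria. */
final class EntrezTermSketch {
  /** One criterion of the filter: a value searched in a given Entrez field. */
  static final class Criterion {
    final String field;
    final String value;

    Criterion(String field, String value) {
      this.field = field;
      this.value = value;
    }
  }

  private EntrezTermSketch() {}

  /** Joins all criteria with AND, using the value[Field] Entrez notation. */
  static String toTerm(List<Criterion> criteria) {
    StringJoiner joiner = new StringJoiner(" AND ");
    for (Criterion c : criteria) {
      // Quote multi-word values so they are searched as a phrase.
      String value = c.value.contains(" ") ? "\"" + c.value + "\"" : c.value;
      joiner.add(value + "[" + c.field + "]");
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    List<Criterion> criteria = Arrays.asList(
        new Criterion("Title", "kinase"),
        new Criterion("Organism", "Homo sapiens"));
    // prints: kinase[Title] AND "Homo sapiens"[Organism]
    System.out.println(toTerm(criteria));
  }
}
```

The resulting string is what would be sent as the `term` parameter of an ESearch request.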
###Step 5 - setup the list of banks
To provide the list of banks available for querying a particular service, we have to implement the BankType interface.
For the NCBI service, we set up: EntrezBank.
###In short...
All in all, the entire NCBI service is contained in package src.bzh.plealog.bioinfo.docviewer.service.ncbi.
The EBI and [Ensembl](https://github.com/pgdurand/BioDocumentViewer/tree/master/src/bzh/plealog/bioinfo/docviewer/service/ensembl) services (contained in their respective packages) follow exactly the same design as the NCBI one. So, follow these principles to implement your own bank service provider.
##Using a plugin
You can set up your own BankProvider as an external JAR, as follows. Among other benefits, this enables you to maintain your code outside the BDV project.
###Step 1 - create your BankProvider
From the BDV project, make the BDV JAR:
ant makejar
Then copy that JAR, as well as BDV dependencies, to your own project and design the Java code of your own BankProvider.
###Step 2 - package your code
While packaging your code within a JAR file, add the following attribute in your manifest:
doc-viewer-bank-provider=com.foo.bar.MyBankProvider
Here, "com.foo.bar.MyBankProvider" is your implementation of a BankProvider. You can also set a comma-separated list of class names if you have designed several BankProviders.
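To check that your JAR declares the attribute as expected, you can read it back with the standard `java.util.jar.Manifest` class. The sketch below is only an illustration of that check, not the loading code used by BDV: the `ProviderManifestSketch` class is hypothetical, and it assumes the attribute is stored in the main section of the manifest using the usual `Name: value` manifest syntax.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

/** Hypothetical sketch (not BDV code): lists provider classes named in a manifest. */
final class ProviderManifestSketch {
  // Attribute name taken from this page.
  static final String ATTRIBUTE = "doc-viewer-bank-provider";

  private ProviderManifestSketch() {}

  /** Parses manifest text; assumes standard "Name: value" lines. */
  static Manifest parse(String text) {
    try {
      return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Splits the comma-separated attribute value into trimmed class names. */
  static List<String> providerClassNames(Manifest manifest) {
    List<String> names = new ArrayList<>();
    String value = manifest.getMainAttributes().getValue(ATTRIBUTE);
    if (value == null) {
      return names; // attribute not declared
    }
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    Manifest manifest = parse(
        "Manifest-Version: 1.0\n"
        + "doc-viewer-bank-provider: com.foo.bar.MyBankProvider, com.foo.bar.Two\n");
    System.out.println(providerClassNames(manifest));
  }
}
```

For a real JAR, `new java.util.jar.JarFile(path).getManifest()` returns the `Manifest` object to pass to `providerClassNames`.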
###Step 3 - install your plugin
Finally, simply copy your JAR file next to the BDV application and set up the classpath accordingly.
Start BDV with the argument:
-DV_PROVIDER=your-provider-name
Your plugin will be automatically loaded and BDV will display your service.