This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Using the Graphical User Interface

mattb112885 edited this page May 16, 2014 · 2 revisions

Note: The graphical user interface requires installation of the easygui Python module and access to an X server.
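
A quick way to confirm both prerequisites before launching the GUI is sketched below. This is not part of ITEP; it is a standalone check written for Python 3, and the function name `check_gui_prerequisites` is hypothetical. It assumes the X server is advertised through the `DISPLAY` environment variable, as is usual on Linux and other Unix systems.

```python
import importlib.util
import os


def check_gui_prerequisites(environ=None):
    """Return a list of human-readable problems (empty if none were found)."""
    environ = os.environ if environ is None else environ
    problems = []
    # find_spec looks the module up without actually importing it.
    if importlib.util.find_spec("easygui") is None:
        problems.append("easygui is not importable; try 'pip install easygui'")
    # ASSUMPTION: an X server is reachable only if DISPLAY is set and non-empty.
    if not environ.get("DISPLAY"):
        problems.append("DISPLAY is not set; no X server seems reachable")
    return problems


if __name__ == "__main__":
    for problem in check_gui_prerequisites():
        print("WARNING:", problem)
```

If the script prints nothing, both checks passed. A set `DISPLAY` does not guarantee that the X server accepts connections, so treat this as a first filter only.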

Note: These features were introduced after the initial ITEP release, so you may need to run 'git pull' to obtain these scripts.

We have implemented several of the most common analyses based on genes of interest in a user-friendly graphical user interface (GUI). This is useful for many quick-and-dirty analyses but is not as flexible or as feature-complete as the command line interface. We recommend learning both for maximum utility of the toolkit.

As a reminder, before running the UI (or any other ITEP script), make sure the SourceMe.sh file is sourced so that the UI can find the database and all the dependent libraries. To do this, change (cd) to your ITEP root directory and run:

$ source SourceMe.sh

Finding a gene to analyze

TODO: Needs to be implemented

Single gene analysis

A UI has been implemented to study a single gene and its relatives. To begin, you will need either a supported alias for the gene (e.g. a locus tag) or the gene's ITEP ID. Aliases are translated to ITEP IDs using the aliases file (aliases/aliases). Begin by running:

$ python gui/SingleGeneAnalysis.py

You should see a screen that looks like the following:

TODO: Screenshot

Type the locus tag or ITEP ID of the gene of interest into the box that appears. Let's suppose that we are interested in studying relatives of the C. beijerinckii phosphofructokinase, Cbei_1843. Type that into the box:

TODO: Screenshot

When you click OK, the UI checks that it can find your gene. If you get an error saying the gene is not found, it is probably because the alias is not in the aliases file (aliases/aliases). ITEP can only recognize locus tags and gene names if they are found in the original GenBank file that was provided as input when it was set up.
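
The lookup described above can be sketched as follows. This is an illustration only, not ITEP's own code: it assumes each line of the aliases file is tab-delimited with the ITEP ID in the first column and one alias in the second, which may not match the real layout of aliases/aliases. The function names and the example ID are hypothetical.

```python
def load_aliases(lines):
    """Map each alias to an ITEP ID.

    ASSUMPTION: every line looks like 'ITEP_ID<TAB>alias'. The real
    aliases/aliases file may be laid out differently.
    """
    alias_to_id = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gene_id, alias = fields[0], fields[1]
        alias_to_id[alias] = gene_id
    return alias_to_id


def resolve_gene(query, alias_to_id):
    """Return the ITEP ID for a query, or None if the query is unknown."""
    if query in alias_to_id.values():
        return query  # the query is already a known ITEP ID
    return alias_to_id.get(query)


# Made-up ID, for illustration only.
example_lines = ["example_itep_id\tCbei_1843\n"]
aliases = load_aliases(example_lines)
print(resolve_gene("Cbei_1843", aliases))   # the ID mapped to the alias
print(resolve_gene("not_a_gene", aliases))  # None: alias is not in the file
```

A `None` result corresponds to the "gene not found" error: the query is neither a known ID nor an alias present in the file.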

In this case we are successful, and are greeted with a new dialog:

TODO: Screenshot

The dialog contains:

  • A list of data on the gene you picked (check to make sure you picked the gene you intended), and
  • A list of analysis options

The data provided for your gene includes its ITEP ID, organism, annotation, contig and gene location.

The following analysis options are available. In all cases you will be able to look at the results and will be given an option to save them to a file for further investigation (in some cases, for technical reasons, you will be asked whether you want a file first).

  • Amino acid FASTA: Get an amino acid FASTA file containing your gene.
  • Gene neighborhood: Get a diagram of neighboring genes (labeled with all available aliases)
  • Get similar genes by BLASTN: Fetches the cached BLASTN results using your gene as a query and returns a table
  • Get similar genes by BLASTP: Fetches the cached BLASTP results using your gene as a query and returns a table
  • Nucleotide FASTA: Get a nucleotide FASTA file containing your gene.
  • Related genes in other organisms: Analyze clustering results for clusters (gene family) containing your gene. See below.
  • Run tBLASTn against a group of organisms: Set up and run a tBLASTn using your gene against the group of organisms in a chosen cluster run
  • Show conserved domain hits: Generates a diagram showing the 10 strongest hits to NCBI's conserved protein domain database (CDD)

To get gene neighborhoods or run tBLASTn, you will be asked for a run ID (which specifies the list of organisms and the methods and parameters used to predict protein families). The gene neighborhood script uses this information to decide how to color the neighbors: genes with the same color belong to the same gene family. The tBLASTn script uses this information to identify the list of organisms against which to search for relatives of your gene.
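
The coloring rule can be illustrated with a minimal sketch. This is not the ITEP neighborhood script; the palette, the function name `color_by_family`, and the gene and family IDs are all hypothetical. It only shows the idea that every gene is colored by the family it belongs to in the chosen cluster run.

```python
from itertools import cycle

# Hypothetical palette; colors repeat if there are more families than entries.
PALETTE = ["red", "blue", "green", "orange", "purple"]


def color_by_family(neighbors):
    """Assign a color to each gene so that family members share a color.

    neighbors: list of (gene_id, family_id) pairs in neighborhood order.
    Returns a dict mapping gene_id to a color name.
    """
    colors = cycle(PALETTE)
    family_color = {}
    gene_color = {}
    for gene_id, family_id in neighbors:
        if family_id not in family_color:
            family_color[family_id] = next(colors)  # first time we see this family
        gene_color[gene_id] = family_color[family_id]
    return gene_color


# Made-up neighborhood: geneA and geneC belong to the same family.
neighborhood = [("geneA", "fam1"), ("geneB", "fam2"), ("geneC", "fam1")]
for gene, color in color_by_family(neighborhood).items():
    print(gene, color)
```

Choosing a different run ID changes the family assignments, and therefore which neighbors share a color.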

Analysis of related genes

When you choose "Related genes in other organisms", you will be asked to pick a run ID from the list of available cluster runs (this specifies the organisms and algorithm parameters used to predict which genes are related). The run ID is used to identify the particular predicted protein family to which your gene belongs.

The following analyses can be done on your protein family of interest using the UI. As before, you will always be given the option to save the results to a file for further analysis.

  • Display a crude tree with neighborhoods attached: Builds a protein tree (using MAFFT and FastTree with no curation of the alignment) for all proteins in the selected protein family, with neighborhoods attached (see this tutorial for an example)
  • Get a presence absence table: Builds a presence-absence table identifying which organisms in the cluster run possess members in the selected protein family and which do not.
  • Get BLAST support for a protein family: Find all the BLAST hits between members of the selected protein family.
  • Get information on related genes: Display the gene IDs, annotations, organisms, locations, and sequences for all the genes in the selected protein family.
  • Make a crude AA alignment: Generate a fast alignment of proteins in the selected family using MAFFT (note this is the alignment that would be used above to make the tree). Returns results in FASTA format.
  • Make a crude Newick tree from AA alignment: Make a Newick tree (does not display the tree, only makes a Newick file for you to save and analyze elsewhere)
  • Make a crude Newick tree with ITEP IDs: Same as above but does not translate ITEP IDs to human readable IDs. This is useful if you want to do further manipulation of the tree or its members using ITEP tools.
  • Make a GML file for import into cytoscape: Create a GML (graph markup language) formatted file of BLAST hits between organisms which can be imported into Cytoscape or other graph displaying programs to visualize the evidence for a cluster. More details are available here.
  • Make amino acid FASTA file: Create a (unaligned) amino acid FASTA file for proteins in the selected family.
  • Make nucleotide FASTA file: Create an (unaligned) nucleotide FASTA file for the genes encoding proteins in the selected family.
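
To make the presence-absence option more concrete, here is a minimal sketch of how such a table can be derived from family membership. It is not the ITEP implementation; the function name `presence_absence`, the organism names, and the gene IDs are hypothetical, and the output layout (organism, 0/1 flag, gene IDs) is an assumption chosen for illustration.

```python
def presence_absence(organisms, family_members):
    """Build one row per organism for a single protein family.

    organisms: every organism in the cluster run.
    family_members: (organism, gene_id) pairs belonging to the family.
    Returns a list of (organism, present, gene_ids) rows.
    """
    genes_by_org = {}
    for organism, gene_id in family_members:
        genes_by_org.setdefault(organism, []).append(gene_id)
    rows = []
    for organism in organisms:
        gene_ids = genes_by_org.get(organism, [])
        rows.append((organism, bool(gene_ids), gene_ids))
    return rows


# Hypothetical organisms and gene IDs, for illustration only.
organisms = ["Organism A", "Organism B", "Organism C"]
members = [
    ("Organism A", "geneA1"),
    ("Organism C", "geneC1"),
    ("Organism C", "geneC2"),
]
for organism, present, gene_ids in presence_absence(organisms, members):
    print("\t".join([organism, "1" if present else "0", ",".join(gene_ids)]))
```

Organisms that appear in the cluster run but have no member in the family still get a row, which is what distinguishes "absent" from "not analyzed".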