arvestad/alv

alv: a command-line alignment viewer

View your DNA or protein multiple-sequence alignments right at your command line. No need to launch a GUI!

Note: alv requires Python v3.6 or later. Earlier versions may also work, but this has not been tested.

Latest feature additions

  • If you have more than one alignment in your input file, then the first alignment is output unless you use the --alignment-index (-ai) option to choose another.

  • alv is now adapted for use in Python notebooks (tested on Jupyter) through two convenience functions, 'view' and 'glimpse'. Both functions take a BioPython alignment object and output a view of the alignment.

    Writing

    from Bio import AlignIO
    msa = AlignIO.read('PF00005.fa', 'fasta')
    import alv
    alv.view(msa)
    

    in a Jupyter notebook cell and evaluating will yield a colored alignment in the alv style.

    For large alignments, the glimpse function is convenient since a subset of the alignment, selected as an easily detected conserved region, is shown.

    alv.glimpse(msa)
    

    For more usage information, run help(alv.view) in a notebook cell.

Features

  • Command-line based, no GUI, so easy to script viewing of many (typically small) MSAs.
  • Reads alignments in FASTA, Clustal, PHYLIP, NEXUS, and Stockholm formats, from file or stdin.
  • Output is formatted to suit your terminal. You can also set the alignment width with option -w.
  • Can color alignments of coding DNA by the amino acids that the codons translate to.
  • Guesses sequence type (DNA/RNA/AA/coding) by default. You can override with option -t.
  • Order sequences explicitly, alphabetically, or by sequence similarity.
  • Restrict coloring to where you don't have indels or where there is a lot of conservation.
  • Focus on variable columns with the options --only-variable and --only-variable-excluding-indels, contributed by nikostr, which constrain coloring to columns with variation and to columns with variation not counting indels, respectively.
  • The command alv -g huge_msa.fa displays a cut-out of the MSA, guaranteed to fit one terminal page without scrolling or MSA line breaking, that is meant to give you an idea of alignment quality and contents.
  • Write alv -r 20 huge_msa.fa to get a view of the MSA containing only 20 randomly selected sequences.
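
Since alv is a plain command-line tool, viewing many alignments can be scripted. The following is a minimal Python sketch, not part of alv itself: build_alv_command is a hypothetical helper that only assembles options mentioned in the list above (-w, -g, -r, -k), and it does not check whether a given combination of options makes sense.

```python
def build_alv_command(path, width=None, glimpse=False, random_count=None, keep_colors=False):
    """Assemble an argument list for one alv invocation (hypothetical helper)."""
    cmd = ["alv"]
    if width is not None:
        cmd += ["-w", str(width)]         # -w: set the alignment width
    if glimpse:
        cmd.append("-g")                  # -g: one-page cut-out of a large MSA
    if random_count is not None:
        cmd += ["-r", str(random_count)]  # -r: show N randomly selected sequences
    if keep_colors:
        cmd.append("-k")                  # -k: keep colors when piping or redirecting
    cmd.append(path)
    return cmd


# The file name is a placeholder; substitute your own alignment.
print(" ".join(build_alv_command("huge_msa.fa", glimpse=True)))
# prints: alv -g huge_msa.fa
```

The returned list can be passed to subprocess.run once alv is installed and on your PATH.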

Install

Recommended installation is:

pip install --upgrade pip
pip install alv

If you have a half-modern BioPython installed, Python v3.4 should work. BioPython is a dependency and will only get installed automatically with pip install alv if you are using Python v3.6 or later, because BioPython was apparently not on PyPI before that.

Examples

Quick viewing of a small alignment:

alv msa.fa

This autodetects sequence type (AA, DNA, RNA, coding DNA), colors the sequences, and formats the alignment for easy viewing in your terminal. When applying alv to an alignment of coding DNA, the coding property is autodetected and colors are therefore applied to codons instead of nucleotides.

Screenshot: seven coding DNA sequences
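
To illustrate what coloring by codon means, here is a small Python sketch that maps each codon of an aligned coding sequence to its amino acid using the standard genetic code. It is only an illustration of the idea, not alv's implementation; the function name codon_labels and the choice of '-' for gap codons and 'X' for unrecognized codons are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_labels(aligned_dna):
    """Return (codon, amino acid) pairs for an aligned coding sequence.

    Gap-only codons map to '-'; any other unrecognized codon maps to 'X'.
    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = aligned_dna.upper()
    pairs = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = "-" if codon == "---" else CODON_TABLE.get(codon, "X")
        pairs.append((codon, aa))
    return pairs


print(codon_labels("ATGGCC---TGA"))
# prints: [('ATG', 'M'), ('GCC', 'A'), ('---', '-'), ('TGA', '*')]
```

A viewer can then pick one color per amino acid and apply it to all three nucleotides of the codon.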

View three sequences, accessions a, b, and c, from an alignment:

alv -so a,b,c msa.fa

Feed the alignment to less for paging support:

alv -k msa.fa | less -R

The -k option ensures that alv keeps coloring the alignment (by default, piping and redirection remove colors), and the -R option instructs less to interpret color codes.
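
Many command-line tools decide whether to emit color codes by checking if their output is a terminal. The Python sketch below shows that general convention together with a keep-colors override in the spirit of -k. It is an assumption about how such behavior is commonly implemented, not a description of alv's code; should_color and emit are hypothetical names.

```python
import sys

GREEN = "\033[32m"
RESET = "\033[0m"


def should_color(stream, keep_colors=False):
    """Color when writing to a terminal, or when explicitly asked to keep colors."""
    return keep_colors or stream.isatty()


def emit(text, stream=sys.stdout, keep_colors=False):
    """Write text to stream, wrapped in ANSI color codes only when appropriate."""
    if should_color(stream, keep_colors):
        text = f"{GREEN}{text}{RESET}"
    stream.write(text + "\n")
```

When the output goes to a pipe or a file, isatty() returns False and the text is written without escape codes unless keep_colors is set.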

Choose to view a sub-alignment:

alv -sa 30 60 msa.fa

This selects and views columns 30 to 59 of msa.fa, keeping track of the "original" column indexes in the output.
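
The range is half-open, in the same way as a Python slice: the start column is included and the end column is not. The sketch below illustrates this with plain strings. It assumes 0-based column numbering, which the text above does not state, and sub_alignment is a hypothetical function, not part of alv.

```python
def sub_alignment(sequences, start, end):
    """Cut columns start..end-1 out of every sequence (half-open, like a slice).

    Returns the original column indexes alongside the cut sequences.
    """
    columns = list(range(start, end))
    cut = {name: seq[start:end] for name, seq in sequences.items()}
    return columns, cut


# Toy alignment with two 80-column sequences.
msa = {"a": "ACGT" * 20, "b": "ACGA" * 20}
columns, cut = sub_alignment(msa, 30, 60)
print(columns[0], columns[-1], len(cut["a"]))
# prints: 30 59 30
```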

For developers

  • Run pip install -e . to get an "editable" install, while coding.
  • Run python -m build to prepare a distributable file.

Screenshots

Full PFAM domain

All of the sequences in PFAM's seed alignment for PF00005

PF00005 seed MSA

Yeast sequences from PF00005

Using the option -sm YEAST, we reduce the alignment to the sequences with a matching accession.

Small MSA from PF00005