tmerge

Build highly accurate full-length transcripts from third generation sequencing alignments.

Figure: tmerge vs StringTie2 and FLAIR for transcript-level sensitivity and precision. Measurements performed with GFFCompare on 43 SIRV datasets sequenced with ONT.

tmerge compares transcript structures (or read-to-genome alignments) present in the input and attempts to reduce transcript redundancy, i.e., merge compatible input transcripts into non-redundant transcript models.

tmerge is fast and can typically process several millions of aligned long reads in a few minutes.

Installation

pip install tmerge

It is recommended to install tmerge within a virtual environment

Usage

tmerge offers both a CLI and a Python module. The CLI is built upon the Python module and includes several built in "plugins" (see below for description of plugins).

CLI

Once you have installed tmerge via pip, the CLI will be available on your PATH. If you have installed tmerge into a virtual environment, you will need to activate that virtual environment to run the CLI.

Run tmerge --help for a description of the options.

Python module

You may import tmerge as a Python module and call the merge function to run it. merge takes 2 mandatory kwargs (input_path and output_path) and two optional kwargs (tolerance and processes). Any additional keyword arguments will be sent to any registered plugins (see below).

from tmerge import merge
input_path = "my.gff"
output_path = "output.gff"

merge(input_path=input_path, output_path=output_path) # Will block until completion

# Can now do more things with the output file
with open(output_path, "a") as f:
    f.write("# A comment \n")
    f.flush()

Plugins

Plugins allow you to "hook" into tmerge's lifecycle events and allow you to view, edit or remove the transcripts passing through tmerge and adapt it to your lab's specific needs. For example, adding Hi-Seq support.

This section explains how to write plugins and register them to tmerge.

Transcript Models and Contigs

Before writing a plugin, it is important to understand the concept of Transcript Models and Contigs. Transcripts are represented in tmerge as TranscriptModel objects, at first these are the transcripts defined in the input file but are altered throughout the lifecycle of tmerge, either having other transcript models merged into them or removed entirely. Contigs are lists of overlapping transcript models. Merging of transcript models is only performed within a contig and not between contigs.

Plugin class

A plugin is a simple class that registers itself to one or more "hooks" in it's init method. It receives the hooks dict as it's first argument followed by all of the kwargs that are passed to tmerge.merge.

class Counter:
    def __init__(self, hooks,**kwargs):
        self.count = 0
        hooks["transcript_added"].tap(self.add_one)

    def add_one(self, *args):
        self.count += 1

    def print(self, *args);
        print(f"There are {self.count} transcripts")

Editing transcripts

Some of the hooks send transcripts to the hooked-in function (see table below). You can edit or remove any of these transcripts and changes will be reflected in the output merged file. Further, any key/value pairs added to the meta dict will be appended to the "attributes" column of the output merged GFF.

class MyPointlessPlugin:
    def __init__(self, hooks, extra_attribute, bad_id, **kwargs):
        self.extra_attribute = extra_attribute
        self.bad_id = bad_id

        hooks["transcript_added"].tap(self.add_meta)
        hooks["contig_merged"].tap(self.remove_if_matches)

    def add_meta(self, transcript, *args):
        # When tmerge.merge(input_path=output, output_path=output, extra_attribute="Pointless") is ran 'extra: "Pointless"' will be added to the attributes column for every transcript
        self.transcript.meta["extra"] = self.extra_attribute

    def remove_if_matches(self, contig, *args):
        # Running tmerge.merge(input_path=output, output_path=output, bad_id="bad") will remove any transcript with the id of "bad" from the result
        for transcript in contig:
            if transcript.id = self.bad_id:
                transcript.remove() # Flags a transcript for removal

Registering plugins

Simple list

The easiest way to provide tmerge with plugins is to pass the plugins kwarg to tmerge.merge.

from myplugins import MySimplePlugin, MyAdvancedPlugin
from tmerge import merge

merge(
    input_path="input.gff",
    output_path="output.gff",
    plugins=[
        MySimplePlugin,
        MyAdvancedPlugin
    ]
)

Dynamic Plugin Discovery

If you're already using setup_tools in your project, then you can use dynamic plugin discovery to easily drop in plugins to tmerge.

In your project's setup.py add your plugin to the tmerge.plugins group:

# setup.py
setup(
    ...
    entry_points={
        "tmerge.plugins": "plugin_name = my.plugin.module.MyPlugin"
    }
)

This will automatically register your plugin with tmerge and the plugin will be executed with tmerge.merge.

Lifecycle events

Hook Name	When?	Arguments sent to hooked-in functions
chromosome_parsed	When one chromosome is parsed from the input	`chromosome` (list of `TranscriptModel`s)
transcript_added	When a transcript is added to a contig	`transcript` (`TranscriptModel`)
contig_built	When one contig (group of overlapping transcripts) is built	`contig` (list of `TranscriptModel`s)
transcripts_merged	When one transcript is merged into another	`host_transcript` (`TranscriptModel`), `merged_transcript` (`TranscriptModel`) `host_transcript` is the transcript that has had `merged_transcript` merged into it
contig_merged	When one contig is fully merged	`contig` (list of `TranscriptModel`s)
contig_complete	Contig has been fully merged, transcript flagged for removal removed, and queued for writing	`contig` (list of `TranscriptModel`s)
merging_complete	All transcripts have been merged	None
pre_sort	Just before the merged output is sorted	None
post_sort	Just after the merged output is sorted	None
complete	Everything complete	None

Examples

See the tmerge/plugins/ folder for examples of various plugins.

Authors

Julien Lagarde, CRG, Barcelona, contact julienlag@gmail.com

Jacob Windsor, CRG, Barcelona, contact me@jcbwndsr.com

Name		Name	Last commit message	Last commit date
Latest commit History 292 Commits
images		images
tmerge		tmerge
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tmerge

Installation

Usage

CLI

Python module

Plugins

Transcript Models and Contigs

Plugin class

Editing transcripts

Registering plugins

Simple list

Dynamic Plugin Discovery

Lifecycle events

Examples

Authors

About

Releases

Packages

Languages

License

jacobwindsor/tmerge

Folders and files

Latest commit

History

Repository files navigation

tmerge

Installation

Usage

CLI

Python module

Plugins

Transcript Models and Contigs

Plugin class

Editing transcripts

Registering plugins

Simple list

Dynamic Plugin Discovery

Lifecycle events

Examples

Authors

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages