The meaning and differences in TisType #27

Open
ruixuan-zhang opened this issue Sep 20, 2022 · 10 comments
Comments

@ruixuan-zhang

Dear developer,

Good day. Thank you for your development and maintenance of this software.

I was wondering if you could explain the definitions of the different classes of TisType?

I see in README that TisType refers to the relative position of the TIS to annotated ORF of the transcript.

First, in my results, I got some predictions like 3' UTR, 5'UTR and Extended.

  • Can I understand the class Extended this way: if a transcript assembled from Ribo-seq data aligns to the annotated CDS region, is continuous without a frameshift, and extends outside the annotated CDS, it is annotated as Extended?
  • Do 5'UTR and 3'UTR mean that the TIS of a transcript aligns to these untranslated regions and is not assembled into the transcript of the CDS part (or is not in the same frame)?

Second, I also got some Internal and Internal:CDSFrameOverlap

  • I see CDSOverlap means the ORF overlaps with annotated CDS in another transcript in the same reading frame.
  • Does Internal mean that a predicted ORF
    • locates within an annotated CDS (both ends locate within the annotated one)
    • is in different frame
  • Does Internal:CDSFrameOverlap mean that a predicted ORF lies within an annotated CDS but in the same frame?

Finally, I am working on a virus genome with a high coding density. What if a predicted ORF starts in the upstream gene's CDS or 3'UTR region and ends in the downstream gene's CDS region in a different frame? What will the TisType be: Novel or 3'UTR?

Thank you very much in advance!!
Ruixuan

@zhpn1024
Owner

zhpn1024 commented Sep 20, 2022

I was wondering if you could explain the definitions of the different classes of TisType?

I see in README that TisType refers to the relative position of the TIS to annotated ORF of the transcript.

First, in my results, I got some predictions like 3' UTR, 5'UTR and Extended.

  • Can I understand the class Extended this way: if a transcript assembled from Ribo-seq data aligns to the annotated CDS region, is continuous without a frameshift, and extends outside the annotated CDS, it is annotated as Extended?
    Yes. With no frameshift and no intervening stop codon, the result is an extended form of the annotated CDS.
  • Do 5'UTR and 3'UTR mean that the TIS of a transcript aligns to these untranslated regions and is not assembled into the transcript of the CDS part (or is not in the same frame)?
    An ORF of the 5'UTR type may partially overlap the annotated CDS, but not in the same frame.

Second, I also got some Internal and Internal:CDSFrameOverlap

  • I see CDSOverlap means the ORF overlaps with annotated CDS in another transcript in the same reading frame.

  • Does Internal mean that a predicted ORF

    • locates within an annotated CDS (both ends locate within the annotated one)
      Only the TIS position is considered; both ends need not lie within the annotated CDS.
    • is in different frame
      Right.
  • Does Internal:CDSFrameOverlap mean that a predicted ORF lies within an annotated CDS but in the same frame?
    The predicted ORF is not in the same frame as the annotated CDS of this transcript, but it is in the same frame as a CDS in another transcript.

Finally, I am working on a virus genome with a high coding density. What if a predicted ORF starts in the upstream gene's CDS or 3'UTR region and ends in the downstream gene's CDS region in a different frame? What will the TisType be: Novel or 3'UTR?
The type is based only on the start position. It should be 5'UTR if the ORF starts upstream. 'Novel' means the transcript has no CDS annotation, so the result depends on the annotation of the given transcript.
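
The rules stated in this reply can be sketched roughly as follows. This is only an illustration under assumptions, not Ribo-TISH's actual code: the function name `classify_tis_type` is hypothetical, all positions are taken to be 0-based half-open transcript coordinates, the 3'UTR branch and the treatment of an in-frame upstream TIS that hits a stop codon before the CDS are inferred from the answers above, and the ':CDSFrameOverlap' suffix is not modeled.

```python
from typing import Optional


def classify_tis_type(tis: int, orf_stop: int,
                      cds_start: Optional[int],
                      cds_stop: Optional[int]) -> str:
    """Guess a TisType label from the TIS position relative to the annotated CDS.

    tis / orf_stop: predicted ORF start and 3' end of its stop codon.
    cds_start / cds_stop: annotated CDS of the same transcript, or None.
    """
    # 'Novel' means the transcript carries no CDS annotation at all.
    if cds_start is None or cds_stop is None:
        return "Novel"
    if tis == cds_start:
        return "Annotated"
    # Python's % is non-negative here, so upstream TIS positions work too.
    in_frame = (tis - cds_start) % 3 == 0
    if tis < cds_start:
        # Assumption: "no frameshift or stop codon" is modeled as an in-frame
        # ORF that reads through to the annotated stop.
        if in_frame and orf_stop == cds_stop:
            return "Extended"
        return "5'UTR"
    if tis < cds_stop:
        # Only the TIS position matters, not where the ORF ends.
        return "Truncated" if in_frame else "Internal"
    # Assumption: a TIS downstream of the annotated stop is labeled 3'UTR.
    return "3'UTR"


if __name__ == "__main__":
    # Hypothetical transcript with an annotated CDS at [50, 200).
    for tis, stop in [(20, 200), (21, 60), (80, 200), (81, 120), (210, 260)]:
        print(tis, stop, classify_tis_type(tis, stop, 50, 200))
```

Because the label depends on the annotation of the transcript being examined, the same genomic ORF could receive different labels on different transcripts.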


@ruixuan-zhang
Author

Thank you for your prompt reply! Now I understand better about TisType.

I was wondering if you could help me with one more question.

Suppose the TIS of a predicted ORF is aligned to the annotated TIS, but the end of the predicted ORF extends beyond the annotated one, for example in stop codon recoding or stop codon bypass events. What will the TisType be in this case?

My preliminary guess is that the TisType is Annotated and that the extension at the 3' end can be found by comparing GenomePos or Start:Stop with the CDS region in the annotation file, right?

Thank you very much!

@zhpn1024
Owner

The type should be 'Annotated'.
Prediction of 3' end extension is not currently supported. This may happen in the case of different stop codons. If so, you can compare the predicted stop position with the annotated one. In addition, the 3' extended CDS region may be identified as a separate 3'UTR ORF if it contains a TIS codon.
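
The suggested comparison of the predicted stop with the annotated one could look something like the sketch below. The function name `three_prime_extension` and the example coordinates are hypothetical; intervals are assumed to be 0-based half-open genomic coordinates on a single-exon transcript, so the 3' end of an ORF is its upper bound on the + strand and its lower bound on the - strand.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (low, high), 0-based half-open genomic coordinates


def three_prime_extension(orf: Interval, cds: Interval, strand: str) -> int:
    """Return how many nt the predicted ORF runs past the annotated stop (0 if none)."""
    orf_lo, orf_hi = orf
    cds_lo, cds_hi = cds
    if strand == "+":
        # On the + strand the stop codon ends at the upper bound.
        return max(0, orf_hi - cds_hi)
    # On the - strand the stop codon ends at the lower bound.
    return max(0, cds_lo - orf_lo)


if __name__ == "__main__":
    # Hypothetical ORF sharing the annotated TIS but ending 30 nt further 3'.
    print(three_prime_extension((1000, 1330), (1000, 1300), "+"))
    print(three_prime_extension((970, 1300), (1000, 1300), "-"))
```

A nonzero result for an ORF whose TisType is 'Annotated' would flag a candidate 3' extension for manual inspection.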

@ruixuan-zhang
Author

Thank you very much for your patient explanation!

@ruixuan-zhang
Author

Dear Zhang,

Good day. Sorry, I have another question about the meaning of "GenomePos" and "Start & Stop".

In the README file, it is written as

  • Genome position and strand of TIS site, 0 based, half open
  • Start: TIS of the ORF on transcript
  • Stop: 3' end of stop codon on transcript

Can I understand in a way that

  • Genome position: ORF region (From 5'UTR to 3'UTR)
  • Start: Start codon
  • Stop: Stop codon

By the way, does Truncated represent cases in which the predicted start codon lies downstream of the annotated start codon and is in the same frame (f0)?

I asked this because I got a result like below.

  • The x-axis is the annotated information of a gene (0 represents the annotated start codon).
  • The orange area is from GenomePos and annotated as truncated.
  • This looked strange to me, so I guess GenomePos represents the whole ORF region rather than the CDS region, right?

[Image: Screen Shot 2022-10-04 at 16 58 34]

Thank you very much in advance.

Ruixuan

@zhpn1024
Owner

zhpn1024 commented Oct 4, 2022

'Truncated' should mean what you suppose.
For the example, could you provide the detailed information, including Transcript, CDS, GenomePos, Start and Stop? Start and Stop are relative to the 5' end of the transcript (usually the 5'UTR) and correspond to the two positions of GenomePos.

@zhpn1024
Owner

zhpn1024 commented Oct 4, 2022

A new module, 'transplot', has been added to the GitHub repository but is not formally released yet. You can git clone the repository and try plotting with 'ribotish transplot' and the '--morecds' option.

@ruixuan-zhang
Author

Yeah, sure

  • GenomePos: 830018-830306
  • CDS (from gff file) 830019 - 830393
  • Strand: -
  • Start: 87
  • Stop: 375
  • TisType: Truncated

I found my mistake in plotting: I forgot to consider the strand information.

With the strand taken into account, this case makes sense: the start of the ORF is truncated. GenomePos represents the start codon : stop codon region predicted by RiboTISH, right?

How can I know where the transcript starts? Does RiboTISH use and follow the annotation in the gff file?

Thank you!

@zhpn1024
Owner

zhpn1024 commented Oct 4, 2022

Right.
The transcript start site is taken from the transcript annotation in the gff file.
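
The numbers from the example above can be checked with a small strand-aware calculation. This is a sketch under assumptions: the helper names are hypothetical, the gff CDS is converted from 1-based inclusive to 0-based half-open coordinates to match GenomePos, and the transcript is treated as a single exon so that genomic distances equal transcript distances.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (low, high), 0-based half-open genomic coordinates


def gff_to_half_open(start: int, end: int) -> Interval:
    """Convert a 1-based inclusive gff interval to 0-based half-open."""
    return start - 1, end


def offset_from_annotated_start(orf: Interval, cds: Interval, strand: str) -> int:
    """Distance in nt from the annotated start codon to the predicted TIS,
    measured 5'->3' along the transcript (positive means downstream)."""
    orf_lo, orf_hi = orf
    cds_lo, cds_hi = cds
    if strand == "+":
        return orf_lo - cds_lo
    # On the - strand the 5' end of each interval is its upper bound.
    return cds_hi - orf_hi


if __name__ == "__main__":
    cds = gff_to_half_open(830019, 830393)   # CDS from the gff file
    orf = (830018, 830306)                   # GenomePos from the example
    offset = offset_from_annotated_start(orf, cds, "-")
    print(offset, offset % 3)                # 87 0
    print(orf[1] - orf[0])                   # 288, i.e. Stop - Start = 375 - 87
```

The offset of 87 nt is a multiple of 3 and lies inside the annotated CDS, which agrees with the Truncated label. It also equals the reported Start of 87, which suggests that the 5' end of this transcript coincides with the annotated CDS start; that is an inference from these numbers, not something stated in the thread.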

@ruixuan-zhang
Author

Thank you very much for your patient explanation!
