How to encode non-canonical amino acids into search? #90

Open
ciancone94 opened this issue Aug 31, 2023 · 17 comments

Comments

@ciancone94

ciancone94 commented Aug 31, 2023

Hello,

A collaborator asked me how to encode a non-canonical amino acid into the search. This same amino acid would also be the one that crosslinks. Let's call the amino acid 'X', with a mass difference of 50 Da relative to the standard amino acid (e.g., D). Is it possible to feed XiSearch a fasta that carries the mass difference between the standard amino acid and the newly incorporated one? This would just be for one protein, not for the whole proteome. For example:

Sequence: ASDFK, Modified sequence: ASXFK

Can I upload a fasta with: ASD(+50)FK? Is there a format I need to follow? Francis seemed to recall being able to hard-code acetylation sites on XiSearch, but he can't remember how he did this.

Otherwise, would I just search with all 'D' residues carrying a mass shift of 50 Da?

Thanks,

Anthony

@grandrea
Contributor

grandrea commented Aug 31, 2023

Hey,

Follow the rules for site-specific modifications here https://github.com/Rappsilber-Laboratory/xisearch#modification-settings

In short, put something like this in your fasta for a variable modification. To make it a fixed modification instead, remove the parentheses in the fasta. Mod names are arbitrary but have to be lowercase.

ASD(mod)FK

and then in the .config, define the modification with the deltamass relative to the unmodified amino acid.

modification:known::SYMBOLEXT:mod;MODIFIED:D;DELTAMASS:50
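
As a rough illustration of the two pieces described above (a tagged residue in the fasta plus a matching `modification:known` line), the following sketch builds both from a position and a delta mass. The helper names `tag_residue` and `known_modification_line` are hypothetical and not part of xiSearch; only the output format follows the lines shown in this thread.

```python
def tag_residue(sequence: str, position: int, ext: str, variable: bool = True) -> str:
    """Insert a modification tag after the residue at 1-based `position`.

    variable=True  -> parentheses around the tag (variable modification)
    variable=False -> no parentheses (fixed modification)
    """
    if not ext.islower():
        raise ValueError("modification names have to be lowercase")
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    tag = f"({ext})" if variable else ext
    return sequence[:position] + tag + sequence[position:]


def known_modification_line(ext: str, residue: str, delta_mass: float) -> str:
    """Format a config line in the style quoted in this thread."""
    return f"modification:known::SYMBOLEXT:{ext};MODIFIED:{residue};DELTAMASS:{delta_mass}"


print(tag_residue("ASDFK", 3, "mod"))                  # ASD(mod)FK
print(tag_residue("ASDFK", 3, "mod", variable=False))  # ASDmodFK
print(known_modification_line("mod", "D", 50))
```

The last call prints the same config line as the one given above for a 50 Da shift on D.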

@ciancone94
Author

Not sure how I missed that, thanks for your help!

@lutzfischer
Member

If you want it to be site-specific, you can also encode it in the fasta file.

@cxdummies

cxdummies commented Mar 19, 2024

Can one then make use of this specific modified amino acid in other setting lines? For example, would the following lines work?

crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:Linker;MASS:123.45678;FIRSTLINKEDAMINOACIDS:*;SECONDLINKEDAMINOACIDS:E,D,Dmod

digestion:PostAAConstrainedDigestion:DIGESTED:D,Dmod;ConstrainingAminoAcids:;NAME=Enzyme

loss:AminoAcidRestrictedLoss:NAME:Loss;aminoacids:Dmod;MASS:123;cterm

Additionally, can one create a fixed modification on this modified amino acid? Would this line work?

modification:fixed::SYMBOL:Dmodabc;MODIFIED:Dmod;MASS:100

@grandrea
Contributor

Sorry, I don't understand. Is the non-canonical amino acid also a crosslinker, or just a different amino acid?

@grandrea
Contributor

By default, a crosslinker that crosslinks to D will also crosslink to Dmod, as far as I understand.

@cxdummies

cxdummies commented Mar 25, 2024

The question is, in general, if I define a modified amino acid in the fasta sequence, do I have to add this specific modified amino acid to the settings of an enzyme, crosslinker and fixed/variable modifications?

Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?

for instance

ProtA
AnCnDn
ProtB
ACD
ProtC
EFG

@grandrea
Contributor

There is no general answer to this question, is what I am trying to say. It depends on what you want to do.

Labelling is typically not defined as a modification but with the label keyword: https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#isotope-labelling

The label keyword will search every amino acid as a heavy or light version of itself (or with whatever custom mass you give in the list). So my suggestion would be

 LABEL:HEAVY::SYMBOL:Dn15;MODIFIED:D;MASS:116.023978035
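
The mass in the line above appears to be the total mass of a D residue with its single nitrogen replaced by 15N. A minimal sketch of that arithmetic, assuming standard monoisotopic residue masses and isotope masses (the table and the function name `n15_mass` are illustrative and not taken from xiSearch):

```python
# Mass difference between 15N and 14N in Da, from standard isotope masses.
N15_SHIFT = 15.0001088983 - 14.0030740044

# Monoisotopic residue mass (Da) and number of nitrogen atoms for a few residues.
RESIDUES = {
    "D": (115.026943, 1),
    "K": (128.094963, 2),
    "R": (156.101111, 4),
}


def n15_mass(residue: str) -> float:
    """Residue mass with every nitrogen atom replaced by 15N."""
    mass, nitrogen_count = RESIDUES[residue]
    return mass + nitrogen_count * N15_SHIFT


print(f"{n15_mass('D'):.6f}")  # 116.023978
```

Residues with more than one nitrogen (K, R and others) shift by a multiple of that difference, so each residue needs its own mass in a full 15N scheme.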

If instead you really want to define only a single protein as 100% labelled, I think you are going about it the right way. The crosslinker will react with the modified amino acid, but I don't know what happens if you use a protease that cuts at that amino acid. @lutzfischer may be able to clarify this, and the same question for losses.

For modifications defined in fasta, you should use the known modifications, not fixed (again see near the end of https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#modification-settings )

modification:known::SYMBOLEXT:ph;MODIFIED:S;DELTAMASS:79.966331

for a fasta like

ACKASphAK

No brackets in the sequence for a fixed modification.

As an aside, I suggest using the DELTAMASS and SYMBOLEXT nomenclature so that you can use Unimod modification masses rather than total masses: https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#modification-settings
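
For orientation, the two notations are related by simple addition: the total mass of a modified residue is the unmodified residue mass plus the delta mass. The sketch below assumes standard monoisotopic residue masses; the function names are made up for illustration. It uses the phosphorylation delta mass from the example above and the Unimod delta mass of carbamidomethylation (57.021464).

```python
# Standard monoisotopic residue masses in Da (not read from xiSearch).
RESIDUE_MASS = {"S": 87.032028, "C": 103.009185}


def total_mass(residue: str, delta: float) -> float:
    """Total mass as used with MASS: residue mass plus delta mass."""
    return RESIDUE_MASS[residue] + delta


def delta_mass(residue: str, total: float) -> float:
    """Delta mass as used with DELTAMASS: total mass minus residue mass."""
    return total - RESIDUE_MASS[residue]


print(round(total_mass("S", 79.966331), 6))  # phosphoserine, total mass
print(round(delta_mass("C", 160.03065), 6))  # carbamidomethyl, delta mass
```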

@grandrea
Contributor

grandrea commented Mar 25, 2024

I see now what you mean by a fixed modification on a site-specific modified amino acid. Again, I don't know, sorry. I will test it because I am also curious. With label it works.

@cxdummies

Is it permitted to define multiple modified amino acids on one line? Is it necessary to list each modified amino acid on a separate line?

modification:known::SYMBOLEXT:ph;MODIFIED:S,T;DELTAMASS:79.966331

or

modification:known::SYMBOLEXT:ph;MODIFIED:S;DELTAMASS:79.966331
modification:known::SYMBOLEXT:ph;MODIFIED:T;DELTAMASS:79.966331

@grandrea
Contributor

Both should work, but you should not put "X" (any amino acid) or "nterm" (protein N-terminus) in such a list; those go on separate lines.

@lutzfischer
Member

lutzfischer commented Mar 26, 2024

One note up front: you can use any modification in other lines, but you have to define the modification first. Xi parses the config file strictly linearly, i.e. anything self-defined that you use somewhere has to be defined above that point.
So modifications that you want to use as part of digestion or crosslinking rules need to be defined above these rules.
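
To make the ordering rule concrete, here is a small sketch of a pre-flight check that reports modified residues used before they are defined. It is not xiSearch code and only understands the line syntax quoted in this thread; the key list, the built-in symbols and the function names are all assumptions.

```python
# Keys whose values are comma-separated residue lists in the lines quoted in this
# thread (an assumption; the real parser may accept other keys).
RESIDUE_KEYS = {"MODIFIED", "DIGESTED", "CONSTRAININGAMINOACIDS",
                "FIRSTLINKEDAMINOACIDS", "SECONDLINKEDAMINOACIDS", "AMINOACIDS"}
# Symbols assumed to need no definition.
BUILTIN = {"*", "nterm", "cterm"}


def parse_fields(line: str) -> dict:
    """Map each upper-cased KEY to its value for every 'KEY:value' part of a line."""
    fields = {}
    for part in line.strip().split(";"):
        tokens = part.split(":")
        if len(tokens) >= 2:
            fields[tokens[-2].upper()] = tokens[-1]
    return fields


def undefined_symbols(config_lines):
    """Return (line number, symbol) pairs for symbols used before being defined."""
    defined, problems = set(), []
    for number, line in enumerate(config_lines, start=1):
        fields = parse_fields(line)
        for key in sorted(RESIDUE_KEYS & fields.keys()):
            for symbol in filter(None, fields[key].split(",")):
                plain = len(symbol) == 1 and symbol.isupper()
                if not plain and symbol not in BUILTIN and symbol not in defined:
                    problems.append((number, symbol))
        # Register what this line defines only after checking what it uses.
        if line.lower().startswith(("modification:", "label:")):
            if "SYMBOL" in fields:
                defined.add(fields["SYMBOL"])
            if "SYMBOLEXT" in fields:
                for base in filter(None, fields.get("MODIFIED", "").split(",")):
                    defined.add(base + fields["SYMBOLEXT"])
    return problems


config = [
    "crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:Linker;"
    "MASS:123.45678;FIRSTLINKEDAMINOACIDS:*;SECONDLINKEDAMINOACIDS:E,D,Dmod",
    "modification:known::SYMBOLEXT:mod;MODIFIED:D;DELTAMASS:50",
]
print(undefined_symbols(config))        # [(1, 'Dmod')]
print(undefined_symbols(config[::-1]))  # []
```

With the crosslinker line first, `Dmod` is reported as undefined; reversing the two lines puts the definition above its use and the report is empty.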

@grandrea

> by default crosslinker that crosslink to D will also crosslink to Dmod as far as I understand.

That is only true for labels, as these are assumed not to change the relevant chemical properties. Modifications, however, need to be mentioned in the enzyme and crosslinker definitions.
So if D and Dmod both need to be crosslinkable or digestible, then both need to be mentioned in the specificities.

@cxdummies

> The question is, in general, if I define a modified amino acid in the fasta sequence, do I have to add this specific modified amino acid to the settings of an enzyme, crosslinker and fixed/variable modifications?

Yes that would be the case.

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?

Not as a labeling schema. The closest you can do is either to define variable modifications in the fasta for each residue (but that will probably result in an exploding search space) or to define the protein twice in the fasta file, with and without the (fixed) modification. In both cases you should define the modified residue as a known modification and add the right ones to the specificities of crosslinker and enzyme.
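
The second option (the same protein listed twice, once with every residue tagged) could be generated along these lines. This is only a sketch: the function name, the `_heavy` header suffix and the `>` header prefix are assumptions, and the tag `n` follows the `AnCnDn` example earlier in the thread.

```python
def labelled_copy(header: str, sequence: str, ext: str, suffix: str = "_heavy") -> str:
    """Return a FASTA entry in which every residue carries `ext` as an unbracketed tag."""
    tagged = "".join(residue + ext for residue in sequence)
    return f">{header}{suffix}\n{tagged}"


print(">ProtA\nACD")
print(labelled_copy("ProtA", "ACD", "n"))
# >ProtA
# ACD
# >ProtA_heavy
# AnCnDn
```

Each tagged residue (An, Cn, Dn here) would then need its own known modification definition and an entry in the crosslinker and enzyme specificities, as described above.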

@cxdummies

Hi Lutz,

Would settings like this work? Can the fixed and variable modifications recognise the declared known modification?

modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2

modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
modification:fixed::SYMBOL:Cacm;MODIFIED:Ca;MASS:161.03065

modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
modification:variable::SYMBOL:Mbox;MODIFIED:Mb;MASS:149.035395

@cxdummies

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?
>
> Not as a labeling schema - the closest you can do is either define variable modifications in the fasta for each residue - but that will probably result in an exploding search space - or define the protein twice in the fasta file - with and without (fixed) modification. in both cases you should define the modified residue as known modification and add the right ones to the specificities of crosslinker and enzyme.

I tried defining the protein twice in the fasta file, declaring the modification:known and adding the right ones to the specificities of crosslinker and protease. Unfortunately, XiSearch didn't identify the modified protein at all.

@cxdummies

Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file.

Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?

modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065

@lutzfischer
Member

> Would settings like this work? Can the fixed and variable modifications recognise the declared known modification?
>
> modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
> modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2
>
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
> modification:fixed::SYMBOL:Cacm;MODIFIED:Ca;MASS:161.03065
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:variable::SYMBOL:Mbox;MODIFIED:Mb;MASS:149.035395

Yes, but in that case you could define it a bit more compactly as:

modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2

modification:fixed::SYMBOLEXT:cm;MODIFIED:C,Ca;DELTAMASS:57.021464
modification:variable::SYMBOLEXT:ox;MODIFIED:M,Mb;DELTAMASS:15.99491463

The resulting fixed modifications would be Ccm and Cacm as well (SYMBOLEXT is cumulative), and the variable ones Mox and Mbox.
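
A short sketch of what "cumulative" means here, assuming the extension is simply appended to each listed symbol and the delta mass added to that symbol's mass. The residue masses are standard monoisotopic values and the `expand` helper is hypothetical, not xiSearch code.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {"C": 103.009185, "M": 131.040485}


def expand(modified: str, ext: str, delta: float, known: dict) -> dict:
    """Append `ext` to every symbol in the comma-separated list and add `delta` to its mass."""
    return {base + ext: known[base] + delta for base in modified.split(",")}


masses = dict(RESIDUE_MASS)
masses.update(expand("C", "a", 1, masses))            # known: Ca
masses.update(expand("M", "b", 2, masses))            # known: Mb
fixed = expand("C,Ca", "cm", 57.021464, masses)       # Ccm, Cacm
variable = expand("M,Mb", "ox", 15.99491463, masses)  # Mox, Mbox

for symbol, mass in {**fixed, **variable}.items():
    print(symbol, round(mass, 5))
```

Under these assumptions the computed masses agree with the explicit MASS values in the longer form of the config to within rounding.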

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?
>
> Not as a labeling schema - the closest you can do is either define variable modifications in the fasta for each residue - but that will probably result in an exploding search space - or define the protein twice in the fasta file - with and without (fixed) modification. in both cases you should define the modified residue as known modification and add the right ones to the specificities of crosslinker and enzyme.
>
> I tried defining the protein twice in the fasta file, declaring the modification:known and adding the right ones to the specificities of crosslinker and protease. Unfortunately, XiSearch didn't identify the modified protein at all.

Not sure why this should fail. Can you send me the config/Fasta (lutz dot fischer tu-berlin dot de)? Then I can have a look if I understand what went wrong here.

> Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file.
>
> Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065

It should create a labelled version of these as well, i.e. if the label schema is n15 you should see Ccmn15 as a modification.
BUT looking at the code, I think there might be some problems there. I need to check what happens, especially in connection with fasta-defined modifications on top. Sorry, labels are somewhat untested at the moment, and I will have to see when I can test/fix that.

@cxdummies

cxdummies commented Aug 20, 2024

> Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file.
>
> Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
>
> It should create a labelled version of these as well. I.e. if the label schema is n15 you should see Ccmn15 as a modification. BUT looking at the code I think there might be some problems there. Need to check what happens there - especially in connection with fasta defined modifications on top. Sorry label are somewhat untested at the moment and will have to see when I can test/fix that.

I found "Mox5" and "Ccm5" on the identified peptide sequences, but they were detected only on the unlabelled peptides, although they were expected to sit on the 15N-labelled peptides.

It would be really great if this could be further developed. 15N could be very useful in some applications.
