-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proteoform bifurcation when reading xml with features (chain, signal, etc.) #609
Proteoform bifurcation when reading xml with features (chain, signal, etc.) #609
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No major concerns other than fixing or commenting a magic number. Was mildly concerned about potential performance issues, but after testing it, method runtime is pretty fast.
Proteomics/Protein/Protein.cs
Outdated
{ | ||
if(_proteolysisProducts.Any(p=>p.OneBasedBeginPosition == (proteolysisProductEndPosition + 1))) | ||
{ | ||
cleavagePostions.Add(proteolysisProductEndPosition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing added to cleavagePositions list if proteolysisProductEndPositions is greater than zero. So the foreach loop on line 839 will still evaluate the empty list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting amendment to the combinatorics. Why did you choose to use the "biomarker" term? Is similar logic used in ProSight for expanding proteoform entries and calling them biomarkers? The testing is extensive, nice work! 😃
…o fragmentpairs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid updates. Big fan of the new, informative variable names in TestFullProteinReadWrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked this over a few times, and I was trying to see if it all made sense before merging, and it's finally making more sense to me what's going on. Here are some comments and questions, and some changes and comments requested. Sorry about that, but should be pretty painless!
@@ -595,96 +593,33 @@ public static void TestProteolysisBothTermini() | |||
} | |||
|
|||
[Test] | |||
public static void TestAddBiomarkersIntactOnly() | |||
{ | |||
//Note: existing proteoloysis products remain on the list with no additional proteolysis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any thoughts on these tests that were deleted? Were they not helpful regarding this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intact biomarkers were added to deal w/ methionine cleavage. these are now done at run time in mm.
@@ -930,40 +930,6 @@ public static void CountTargetsWithMatchingDecoys() | |||
} | |||
} | |||
|
|||
[Test] | |||
public static void TestPeptideWithSetModsReturnsBiomarkersInTopDown() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these tests covered above, or are they not useful for the intended purpose? Are there other top-down cleavage tests that cover them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now deprecated. Biomarkers are added during database read and not during digestion.
@@ -225,7 +225,31 @@ public void ParseFeatureEndElement(XmlReader xml, IEnumerable<string> modTypesTo | |||
} | |||
else if (FeatureType == "peptide" || FeatureType == "propeptide" || FeatureType == "chain" || FeatureType == "signal peptide") | |||
{ | |||
ProteolysisProducts.Add(new ProteolysisProduct(OneBasedBeginPosition, OneBasedEndPosition, FeatureType)); | |||
string type = FeatureType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment before this block to describe what's going on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly noting that this block produces descriptions relevant to top-down biomarkers following the (
character
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. but also good for bottom up. we look for these features there too.
@@ -117,10 +117,14 @@ public class ProteinDbWriter | |||
} | |||
writer.WriteEndElement(); | |||
} | |||
foreach (var proteolysisProduct in protein.ProteolysisProducts) | |||
|
|||
//for now we are not going to write top-down biomarkers generated for top-down biomarker search. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a note that biomarker-specific type information is included after (
, so that's clearer for posterity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mzLib/Proteomics/Protein/Protein.cs
Outdated
{ | ||
if (position - 1 >= minimumProductLength) | ||
{ | ||
string leftType = "N-terminal Portion of Singly Cleaved Protein" + "(" + 1 + "-" + position + ")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use string interpolation? It's faster and this might be done quite a bit because it's in a loop. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated
string leftType = $"N-terminal Portion of Singly Cleaved Protein(1-{position})"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mzLib/Proteomics/Protein/Protein.cs
Outdated
|
||
if (BaseSequence.Length - position - 1 >= minimumProductLength) | ||
{ | ||
string rightType = "C-terminal Portion of Singly Cleaved Protein" + "(" + (position + 1).ToString() + "-" + BaseSequence.Length + ")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string interpolation here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
} | ||
|
||
public void CleaveOnceBetweenProteolysisProducts(int minimumProductLength = 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a method comment before this method describing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding ///
on the line before should open one up in Visual Studio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
{ | ||
foreach (int proteolysisProductEndPosition in proteolysisProductEndPositions) | ||
{ | ||
if (_proteolysisProducts.Any(p => p.OneBasedBeginPosition == (proteolysisProductEndPosition + 1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments describing what these conditionals represent would be helpful
{ | ||
string leftType = "N-terminal Portion of Singly Cleaved Protein" + "(" + 1 + "-" + position + ")"; | ||
ProteolysisProduct leftProduct = new(1, position, leftType); | ||
if (!_proteolysisProducts.Any(p => p.OneBasedBeginPosition == leftProduct.OneBasedBeginPosition && p.OneBasedEndPosition == leftProduct.OneBasedEndPosition)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment this conditional, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Proteoforms are not always fully cleaved into their constituent fragments. Some fragments are left joined together. This PR adds fragments cleaved once to the list of proteolysis products used in search. See explanation in issue #441 . Currently, biomarkers are added using default parameters when the database is loaded. Also, these added biomarkers ARE NOT WRITTEN TO DATABASES after search (just the original biomarkers are). We add biomarkers degraded at either end by up to five AAs and we allow one missed cleavage between existing proteolysis products.
#441