Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proteoform bifurcation when reading xml with features (chain, signal, etc.) #609

Merged
merged 23 commits into from
Mar 21, 2022

Conversation

trishorts
Copy link
Contributor

@trishorts trishorts commented Feb 15, 2022

Proteoforms are not always fully cleaved into their constituent fragments. Some fragments are left joined together. This PR adds fragments cleaved once to the list of proteolysis products used in search. See explanation in issue #441 . Currently, biomarkers are added using default parameters when the database is loaded. Also, these added biomarkers ARE NOT WRITTEN TO DATABASES after search (just the original biomarkers are). We add biomarkers degraded at either end by up to five AAs and we allow one missed cleavage between existing proteolysis products.

#441

@trishorts trishorts changed the title [WIP] Proteoform fragment pairs [WIP] Proteoform bifurcation when reading xml with features (chain, signal, etc.) Feb 15, 2022
@trishorts trishorts changed the title [WIP] Proteoform bifurcation when reading xml with features (chain, signal, etc.) Proteoform bifurcation when reading xml with features (chain, signal, etc.) Feb 16, 2022
avcarr2
avcarr2 previously approved these changes Feb 17, 2022
Copy link
Contributor

@avcarr2 avcarr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major concerns other than fixing or commenting a magic number. Was mildly concerned about potential performance issues, but after testing it, method runtime is pretty fast.

Proteomics/Protein/Protein.cs Outdated Show resolved Hide resolved
{
if(_proteolysisProducts.Any(p=>p.OneBasedBeginPosition == (proteolysisProductEndPosition + 1)))
{
cleavagePostions.Add(proteolysisProductEndPosition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing added to cleavagePositions list if proteolysisProductEndPositions is greater than zero. So the foreach loop on line 839 will still evaluate the empty list.

Proteomics/Protein/Protein.cs Outdated Show resolved Hide resolved
nbollis
nbollis previously approved these changes Feb 18, 2022
Copy link
Member

@nbollis nbollis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@trishorts trishorts dismissed stale reviews from nbollis and avcarr2 via 7f05cc7 February 23, 2022 14:14
nbollis
nbollis previously approved these changes Feb 24, 2022
Copy link
Member

@nbollis nbollis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

acesnik
acesnik previously approved these changes Mar 9, 2022
Copy link
Collaborator

@acesnik acesnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting amendment to the combinatorics. Why did you choose to use the "biomarker" term? Is similar logic used in ProSight for expanding proteoform entries and calling them biomarkers? The testing is extensive, nice work! 😃

@acesnik acesnik dismissed stale reviews from nbollis and themself via 6a1d64b March 9, 2022 20:35
@acesnik acesnik requested a review from nbollis March 9, 2022 20:37
avcarr2
avcarr2 previously approved these changes Mar 18, 2022
Copy link
Contributor

@avcarr2 avcarr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid updates. Big fan of the new, informative variable names in TestFullProteinReadWrite.

Copy link
Collaborator

@acesnik acesnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked this over a few times, and I was trying to see if it all made sense before merging, and it's finally making more sense to me what's going on. Here are some comments and questions, and some changes and comments requested. Sorry about that, but should be pretty painless!

@@ -595,96 +593,33 @@ public static void TestProteolysisBothTermini()
}

[Test]
public static void TestAddBiomarkersIntactOnly()
{
//Note: existing proteoloysis products remain on the list with no additional proteolysis.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any thoughts on these tests that were deleted? Were they not helpful regarding this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intact biomarkers were added to deal w/ methionine cleavage. these are now done at run time in mm.

@@ -930,40 +930,6 @@ public static void CountTargetsWithMatchingDecoys()
}
}

[Test]
public static void TestPeptideWithSetModsReturnsBiomarkersInTopDown()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these tests covered above, or are they not useful for the intended purpose? Are there other top-down cleavage tests that cover them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now deprecated. Biomarkers are added during database read and not during digestion.

@@ -225,7 +225,31 @@ public void ParseFeatureEndElement(XmlReader xml, IEnumerable<string> modTypesTo
}
else if (FeatureType == "peptide" || FeatureType == "propeptide" || FeatureType == "chain" || FeatureType == "signal peptide")
{
ProteolysisProducts.Add(new ProteolysisProduct(OneBasedBeginPosition, OneBasedEndPosition, FeatureType));
string type = FeatureType;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment before this block to describe what's going on?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly noting that this block produces descriptions relevant to top-down biomarkers following the ( character

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. but also good for bottom up. we look for these features there too.

@@ -117,10 +117,14 @@ public class ProteinDbWriter
}
writer.WriteEndElement();
}
foreach (var proteolysisProduct in protein.ProteolysisProducts)

//for now we are not going to write top-down biomarkers generated for top-down biomarker search.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a note that biomarker-specific type information is included after (, so that's clearer for posterity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
if (position - 1 >= minimumProductLength)
{
string leftType = "N-terminal Portion of Singly Cleaved Protein" + "(" + 1 + "-" + position + ")";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use string interpolation? It's faster and this might be done quite a bit because it's in a loop. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated

string leftType = $"N-terminal Portion of Singly Cleaved Protein(1-{position})"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if (BaseSequence.Length - position - 1 >= minimumProductLength)
{
string rightType = "C-terminal Portion of Singly Cleaved Protein" + "(" + (position + 1).ToString() + "-" + BaseSequence.Length + ")";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string interpolation here too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

public void CleaveOnceBetweenProteolysisProducts(int minimumProductLength = 7)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a method comment before this method describing it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding /// on the line before should open one up in Visual Studio

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
foreach (int proteolysisProductEndPosition in proteolysisProductEndPositions)
{
if (_proteolysisProducts.Any(p => p.OneBasedBeginPosition == (proteolysisProductEndPosition + 1)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments describing what these conditionals represent would be helpful

{
string leftType = "N-terminal Portion of Singly Cleaved Protein" + "(" + 1 + "-" + position + ")";
ProteolysisProduct leftProduct = new(1, position, leftType);
if (!_proteolysisProducts.Any(p => p.OneBasedBeginPosition == leftProduct.OneBasedBeginPosition && p.OneBasedEndPosition == leftProduct.OneBasedEndPosition))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment this conditional, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@acesnik acesnik merged commit c883155 into smith-chem-wisc:master Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants