Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[After Pools] Digestion Optimization and Terminal Bug Fix #823

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

nbollis
Copy link
Member

@nbollis nbollis commented Jan 14, 2025

PR Classification

Code cleanup, performance optimization, and terminal mod bug fix.

Pull request was loaded into MetaMorpheus 1.0.6 and profiled using Jenkins Data.

The terminal mod change resulted in an increase in PSMs, Peptides, and Proteins.

This PR reduces:

  • Total Runtime: -14%
  • Calls to Mod Fits: -4.9 billion calls
  • Calls to GetFixedModsOneIsNTerminus: -75 million calls
  • BioPolymerWithSetModsExtensions for full and essential sequences: -15% runtime due to string interpolation

PR Summary

This pull request focuses on improving code readability, performance, and maintainability by optimizing methods, refactoring logic, and removing unused code.

  • Updated BioPolymerWithSetModsExtensions to use nullable Modification? type and string interpolation.
  • Optimized DetermineFullSequence method by initializing StringBuilder with an initial capacity.
  • Refactored SetFixedModsOneIsNorFivePrimeTerminus method to use a ref parameter and simplified logic.
  • Replaced manual dictionary and list creation with pooled objects in ProteolyticPeptide.cs and NucleolyticOligo.cs.
  • Added try-finally blocks to ensure pooled objects are returned after use in ProteolyticPeptide.cs and NucleolyticOligo.cs.

Added `using` directive for `MzLibUtil`. Introduced a static readonly `HashSetPool<int>` named `HashSetPool` to manage a pool of hash sets. Updated `DigestionAgent` constructor to initialize `HashSetPool`. Refactored `GetDigestionSiteIndices` to use a hash set from `HashSetPool` for storing indices, ensuring no duplicates. Explicitly added start and end of protein sequence as cleavage sites. Implemented `try-finally` block to return hash set to pool after use. Final list of indices is now sorted before returning.
Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 95.19520% with 16 lines in your changes missing coverage. Please review.

Project coverage is 77.43%. Comparing base (5443e36) to head (349d7b1).

Files with missing lines Patch % Lines
mzLib/Proteomics/ProteolyticDigestion/Protease.cs 86.11% 1 Missing and 4 partials ⚠️
...teomics/ProteolyticDigestion/ProteolyticPeptide.cs 94.44% 1 Missing and 3 partials ⚠️
mzLib/Omics/Digestion/DigestionProduct.cs 89.65% 1 Missing and 2 partials ⚠️
...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs 95.89% 0 Missing and 3 partials ⚠️
mzLib/Omics/BioPolymerWithSetModsExtensions.cs 94.44% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #823      +/-   ##
==========================================
+ Coverage   77.37%   77.43%   +0.06%     
==========================================
  Files         220      223       +3     
  Lines       33197    33280      +83     
  Branches     3441     3446       +5     
==========================================
+ Hits        25686    25772      +86     
+ Misses       6939     6936       -3     
  Partials      572      572              
Files with missing lines Coverage Δ
mzLib/MzLibUtil/ObjectPools/DictionaryPool.cs 100.00% <100.00%> (ø)
mzLib/MzLibUtil/ObjectPools/HashSetPool.cs 100.00% <100.00%> (ø)
mzLib/MzLibUtil/ObjectPools/ListPool.cs 100.00% <100.00%> (ø)
mzLib/Omics/Digestion/DigestionAgent.cs 90.32% <100.00%> (+0.84%) ⬆️
...ib/Omics/Modifications/ModificationLocalization.cs 97.95% <100.00%> (+0.45%) ⬆️
mzLib/Omics/BioPolymerWithSetModsExtensions.cs 95.50% <94.44%> (+0.15%) ⬆️
mzLib/Omics/Digestion/DigestionProduct.cs 95.96% <89.65%> (+1.19%) ⬆️
...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs 96.63% <95.89%> (+0.24%) ⬆️
...teomics/ProteolyticDigestion/ProteolyticPeptide.cs 92.43% <94.44%> (+0.54%) ⬆️
mzLib/Proteomics/ProteolyticDigestion/Protease.cs 95.78% <86.11%> (+0.22%) ⬆️

- Simplified initial check for `possibleVariableModifications.Count` and replaced `yield return null` with `yield break`.
- Adjusted indentation and loop structure for clarity.
- Refactored nested loop to remove unnecessary braces and streamline logic.
- Simplified construction of `modificationPattern` dictionary by removing redundant checks and directly using `modIndex`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant