
Eval metrics and circular import bug fix. #380

Merged: Lilferrit merged 33 commits into dev from eval-metrics-fix on Sep 25, 2024
Conversation

Lilferrit (Contributor)

Implemented bug fixes to resolve #378 and #379. Also implemented a unit test for ModelRunner.log_metrics to guard against incorrect behavior in the future.


codecov bot commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 96.03960% with 4 lines in your changes missing coverage. Please review.

Project coverage is 94.37%. Comparing base (aee3534) to head (60524af).
Report is 3 commits behind head on dev.

Files with missing lines          Patch %   Lines
casanovo/denovo/model_runner.py   93.02%    3 Missing ⚠️
casanovo/casanovo.py              96.29%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #380      +/-   ##
==========================================
+ Coverage   94.32%   94.37%   +0.05%     
==========================================
  Files          12       13       +1     
  Lines        1039     1102      +63     
==========================================
+ Hits          980     1040      +60     
- Misses         59       62       +3     


@bittremieux (Collaborator) left a comment

Looks good. One more thing to test/verify: multiple skipped spectra in a row. And then I think switching to None for skipped spectra instead of an empty string, and updating aa_match_batch to handle this, improves clarity.
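The suggested None handling could look roughly like the following sketch. Note that `match_batch` here is a hypothetical simplification: the real `aa_match_batch` in `casanovo/denovo/evaluate.py` has a different signature and does tolerance-aware residue matching, not exact string comparison.

```python
def match_batch(predictions, ground_truth):
    """Tally residue matches, skipping spectra with no prediction (None).

    Simplified stand-in for aa_match_batch: exact residue comparison only.
    """
    n_match = n_pred = n_true = 0
    for pred, true in zip(predictions, ground_truth):
        n_true += len(true)
        if pred is None:
            # Skipped spectrum: contributes no predicted residues at all.
            continue
        n_pred += len(pred)
        n_match += sum(p == t for p, t in zip(pred, true))
    return n_match, n_pred, n_true

# match_batch(["PEP", None], ["PEP", "PET"]) -> (3, 3, 6)
```

Using `None` instead of an empty string makes the "no prediction" case explicit rather than relying on `len("") == 0` falling out of the arithmetic.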

@Lilferrit (Contributor, Author) commented Sep 20, 2024

The updated aa_match_batch now handles amino acid precision for skipped spectra differently. For example, say that for the ground-truth peptides PEP and PET the model (correctly) infers only PEP and skips PET; the amino acid precision will now be 100%. Before, the first amino acid of any skipped spectrum was treated as an incorrect inference while the remaining amino acids weren't counted at all, so the amino acid precision in this case would be 0.75, and I'm not sure that was correct.
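In numbers, the two tallies for the PEP/PET example work out as follows (`aa_precision` here is just the ratio, not an actual helper from evaluate.py):

```python
def aa_precision(n_correct, n_predicted):
    # Amino acid precision: correct residues / residues counted as predicted.
    return n_correct / n_predicted

# New behavior: the skipped PET contributes nothing, so only PEP's
# three correctly predicted residues enter the tally.
new_precision = aa_precision(3, 3)  # -> 1.0

# Old behavior: the first residue of the skipped PET was tallied as one
# incorrect prediction, so 3 of the 4 counted residues are correct.
old_precision = aa_precision(3, 4)  # -> 0.75
```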

* csv logger

* optimizer metrics logger

* metrics logging unit tests

* config item retrieval, additional requested changes

* Generate new screengrabs with rich-codex

* changelog update

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@bittremieux (Collaborator) left a comment

Looks good. Let's finish the discussion on Slack how to account for skipped spectra when calculating amino acid precision before merging.

I realized a situation where the evaluation might fail though: If we have multiple predictions per spectrum (i.e. top_match in the config > 1). I think that this might have failed in the previous implementation as well though. And it's not super obvious how to handle this situation (or if we should—but maybe at least a check/warning).

@Lilferrit (Contributor, Author)

I realized a situation where the evaluation might fail though: If we have multiple predictions per spectrum (i.e. top_match in the config > 1). I think that this might have failed in the previous implementation as well though. And it's not super obvious how to handle this situation (or if we should—but maybe at least a check/warning).

To me the most intuitive way to handle this is to only evaluate the highest confidence PSM for each spectrum. If I'm understanding everything correctly, the current implementation just matches whatever PSM happens to come first in writer.psms to the ground-truth spectra, which might be OK, but as you said I'm also not sure it's worth implementing special handling for the case where top_match > 1. For now I'll just add a warning that is logged if top_match is greater than one.
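Such a check could be as simple as the sketch below; the function name, the logger setup, and passing top_match as a bare integer are all assumptions for illustration, not the actual Casanovo code.

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_multiple_matches(top_match: int) -> bool:
    """Warn when more than one PSM per spectrum will be reported.

    Returns True if a warning was emitted, which makes the check easy
    to unit test.
    """
    if top_match > 1:
        logger.warning(
            "top_match is %d; evaluation only considers a single PSM per "
            "spectrum, so reported metrics may not reflect all predictions.",
            top_match,
        )
        return True
    return False
```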

@wsnoble (Contributor) commented Sep 23, 2024

To me the most intuitive way to handle this is to only evaluate the highest confidence PSM for each spectrum.

I agree with this.

@bittremieux (Collaborator) left a comment

Minor comments.

@Lilferrit Lilferrit merged commit 396f838 into dev Sep 25, 2024
7 checks passed
@Lilferrit Lilferrit deleted the eval-metrics-fix branch September 25, 2024 20:06
Successfully merging this pull request may close these issues:

- Intermittent circular import error
- log_metrics skipped spectra incorrect behavior

3 participants