Add Casanovo-DB Functionality #325

Status: Merged (95 commits, merged Nov 18, 2024)

Commits:
258edb4 begin adding tests for annotate mode (VarunAnanth2003, Mar 27, 2024)
30f5984 add basic test for annotate mode (VarunAnanth2003, Mar 29, 2024)
186bc0f added test case for annotate mode and modified method (VarunAnanth2003, Apr 9, 2024)
a8f50f4 very rough sketch of db upgrade (untested) (VarunAnanth2003, Apr 11, 2024)
dae9c8a small upgrades to documentation (VarunAnanth2003, Apr 15, 2024)
7f95ae5 better output formatting (VarunAnanth2003, Apr 15, 2024)
278436b all tests added (VarunAnanth2003, Apr 27, 2024)
949ea93 remove minor debugging print statement (VarunAnanth2003, Apr 27, 2024)
da5ef5e Generate new screengrabs with rich-codex (github-actions[bot], Apr 27, 2024)
53f6bec remove excess info logs, add monkeypatch to tests (VarunAnanth2003, Apr 27, 2024)
e2cbce8 Merge branch 'dev_db_search' of https://github.com/Noble-Lab/casanovo… (VarunAnanth2003, Apr 27, 2024)
81aa073 mp fix (VarunAnanth2003, Apr 27, 2024)
0ecbd80 fix line lengths and modify test (VarunAnanth2003, May 7, 2024)
ee6638e Generate new screengrabs with rich-codex (github-actions[bot], May 7, 2024)
392ccaf Merge branch 'dev' into dev_db_search (VarunAnanth2003, May 7, 2024)
2d57513 justins requested fixes (VarunAnanth2003, May 7, 2024)
3cfb795 added minor changes as requested by Wout (VarunAnanth2003, Jun 17, 2024)
49f44ad partial fixes requested by wout. Lots of subclassing removed (VarunAnanth2003, Jun 18, 2024)
d967c42 documentation fixes and starting to cleanup batching code (VarunAnanth2003, Jun 18, 2024)
ea1f97d cleaned up on_predict_batch_end, TODOs for calc_mz (VarunAnanth2003, Jun 19, 2024)
8825506 add proper calc_mz calculation with depthcharge (VarunAnanth2003, Jun 27, 2024)
f25ace8 rough implementation (VarunAnanth2003, Jul 2, 2024)
f7dfbc8 tested implementation of db search (VarunAnanth2003, Jul 3, 2024)
e2ce317 fix for issue with 0 candidates (VarunAnanth2003, Jul 3, 2024)
5ef27e0 minor fixes added (VarunAnanth2003, Jul 3, 2024)
5f0675f reordered and renamed variables for consistency (VarunAnanth2003, Jul 3, 2024)
b4fd8ff casanovo-db full working version with code simplification (VarunAnanth2003, Jul 4, 2024)
35ba7d4 Generate new screengrabs with rich-codex (github-actions[bot], Jul 4, 2024)
f8a1a89 fix batching issues (VarunAnanth2003, Jul 8, 2024)
fe8794d Merge branch 'db_search_full' of https://github.com/Noble-Lab/casanov… (VarunAnanth2003, Jul 8, 2024)
7cb8e14 small fixes regarding documentation, import syntax, etc. (VarunAnanth2003, Aug 12, 2024)
b2f08ac add proteindatabase (VarunAnanth2003, Aug 20, 2024)
3d0b0b9 Generate new screengrabs with rich-codex (github-actions[bot], Aug 20, 2024)
812226e finish proteindatabase (VarunAnanth2003, Aug 21, 2024)
df68c1d Merge branch 'db_search_full' of https://github.com/Noble-Lab/casanov… (VarunAnanth2003, Aug 21, 2024)
cfd39e8 all comments addressed (VarunAnanth2003, Aug 23, 2024)
106c4ec new comments addressed (VarunAnanth2003, Aug 28, 2024)
0dfdb2c final adjustments added (VarunAnanth2003, Sep 3, 2024)
4a5b238 minor changes regarding formatting and small efficiency boosts (VarunAnanth2003, Sep 3, 2024)
4352bbd changes before reformatting config (VarunAnanth2003, Sep 3, 2024)
ddff67f replace all occurences of "max_length" with "max_peptide_len" (VarunAnanth2003, Sep 3, 2024)
a3548d0 added nonspecific digestion (VarunAnanth2003, Sep 3, 2024)
e8d4682 minor comments (VarunAnanth2003, Sep 13, 2024)
68b6926 full branch comments addressed (VarunAnanth2003, Sep 13, 2024)
8153ffa Merge pull request #352 from Noble-Lab/db_search_full (VarunAnanth2003, Sep 13, 2024)
dc7d5f4 Merge branch 'dev' into dev_db_search (VarunAnanth2003, Sep 13, 2024)
e8c9c7d Generate new screengrabs with rich-codex (github-actions[bot], Sep 13, 2024)
e474eee updated and fixed failed tests (VarunAnanth2003, Sep 13, 2024)
4e696b4 add mztab validation to dbsearch test (VarunAnanth2003, Sep 14, 2024)
9a24817 Merge branch 'dev' into dev_db_search (VarunAnanth2003, Sep 17, 2024)
4655452 lint fix (VarunAnanth2003, Sep 17, 2024)
5e1b9d7 fix integration test (VarunAnanth2003, Sep 17, 2024)
4d6b726 fix unit tests (VarunAnanth2003, Sep 17, 2024)
5d82e1f Merge branch 'dev' into dev_db_search (VarunAnanth2003, Sep 20, 2024)
e7f0fdc force fix test (VarunAnanth2003, Sep 21, 2024)
813fac0 clean up test_digest_fasta_enzyme (VarunAnanth2003, Sep 21, 2024)
310c3fd adjust test_digest_fasta_mods (VarunAnanth2003, Sep 21, 2024)
1651fd5 Merge branch 'dev' into dev_db_search (VarunAnanth2003, Oct 2, 2024)
775def7 allows top_match filtering for casanovo-db (VarunAnanth2003, Oct 2, 2024)
e35c60d change default value for protein value in PepSpecMatch (VarunAnanth2003, Oct 2, 2024)
79cba59 reverse issues with decoder (VarunAnanth2003, Oct 2, 2024)
c9eb8b7 update test and remove logging statement (VarunAnanth2003, Oct 2, 2024)
2b6198b Merge branch 'dev' into dev_db_search (VarunAnanth2003, Nov 3, 2024)
68e67e8 db_utils fixes (VarunAnanth2003, Nov 3, 2024)
d01dd7f updates to dataloaders, model_runner, and model.py (VarunAnanth2003, Nov 3, 2024)
d581944 near final changes for all but db_utils (VarunAnanth2003, Nov 3, 2024)
092fa2a line length fixes (VarunAnanth2003, Nov 3, 2024)
6d0868c Minor refactoring and type hint fixes (bittremieux, Nov 10, 2024)
6ea0378 Use mask for more efficient candidate filtering (bittremieux, Nov 10, 2024)
408aa4d Reorder methods in logical order (bittremieux, Nov 10, 2024)
65189ee Fix unit tests (bittremieux, Nov 10, 2024)
1efd9dd Directly generate DB peptides as DataFrame (bittremieux, Nov 10, 2024)
f679cdc Fix type hints and line lengths (bittremieux, Nov 10, 2024)
c07ef57 Generate new screengrabs with rich-codex (github-actions[bot], Nov 10, 2024)
09ffdfb Refactor batching to avoid code repetition (bittremieux, Nov 10, 2024)
b17acef Merge remote-tracking branch 'origin/dev_db_search' into dev_db_search (bittremieux, Nov 10, 2024)
ee78442 More minor refactoring (bittremieux, Nov 10, 2024)
7fa5f6f Reformat with black (bittremieux, Nov 10, 2024)
7a42e8b Minor fix (bittremieux, Nov 10, 2024)
17d5880 Fix output name crash (bittremieux, Nov 10, 2024)
fff5ca4 Fix AA score masking (bittremieux, Nov 10, 2024)
d18d874 Fix PSM export (bittremieux, Nov 10, 2024)
b12abd6 Less verbose logging of skipped peptides (bittremieux, Nov 14, 2024)
b577d59 Appropriate end-of-run reporting (bittremieux, Nov 14, 2024)
510953c Fix PSM export from de novo (bittremieux, Nov 14, 2024)
3c69711 Generalize end-of-run reporting (bittremieux, Nov 14, 2024)
1526504 Log additional information on spectra with no matching candidates (bittremieux, Nov 14, 2024)
f18332d Fix linting issue (bittremieux, Nov 14, 2024)
a71c440 Fix some testing warnings (bittremieux, Nov 14, 2024)
4aa257b Log digestion settings (bittremieux, Nov 14, 2024)
d54b66f Reduce logging level for spectra without candidates (bittremieux, Nov 14, 2024)
d97e251 Round peptide masses for consistent sorting (bittremieux, Nov 18, 2024)
db5e00f Fox linting (bittremieux, Nov 18, 2024)
1e565c4 Remove superfluous PSM export (bittremieux, Nov 18, 2024)
18999cf Update changelog (bittremieux, Nov 18, 2024)

CHANGELOG.md: 1 change (1 addition, 0 deletions)

@@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

### Added

- Casanovo-DB mode (`casanovo db_search`) to use Casanovo as a learned score function for sequence database searching (given a FASTA protein database).
- During training, model checkpoints will be saved at the end of each training epoch in addition to the checkpoints saved at the end of every validation run.
- In addition to a local file, model weights can be specified as a URL. Upon initial download, the weights file is cached for future reuse.
- Training and optimizer metrics can now be logged to a CSV file by setting the `log_metrics` config file option to true; the CSV file will be written to a subdirectory of the output directory named `csv_logs`.
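Before any learned score function can rank database peptides for a spectrum, a sequence database search has to turn the FASTA proteins into candidate peptides and keep only those whose mass matches the spectrum's precursor. The snippet below is a generic, minimal sketch of that candidate-generation step, not Casanovo-DB's implementation. The helper names (`tryptic_digest`, `build_index`, `candidates`), the simplified trypsin rule, the use of unmodified residue masses, and the 10 ppm default tolerance are all illustrative assumptions. Rounding masses before sorting echoes the commit "Round peptide masses for consistent sorting", but how Casanovo actually does this is not shown here.

```python
import bisect
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
# Assumption: no fixed/variable modifications; other letters raise KeyError.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H2O = 18.010565  # water added by the free N- and C-termini


def tryptic_digest(protein, missed_cleavages=0, min_len=1, max_len=30):
    """Cleave after K/R unless the next residue is P (simplified trypsin rule)."""
    # Cleavage positions: just after each K/R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` + 1 consecutive fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = protein[bounds[i]:bounds[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + H2O


def build_index(proteins, **digest_kwargs):
    """Sort unique peptides by mass; rounding makes equal masses tie exactly."""
    peptides = {
        pep for prot in proteins for pep in tryptic_digest(prot, **digest_kwargs)
    }
    return sorted((round(peptide_mass(pep), 5), pep) for pep in peptides)


def candidates(index, precursor_mass, tol_ppm=10.0):
    """Return peptides within +/- tol_ppm of a precursor's neutral mass."""
    delta = precursor_mass * tol_ppm * 1e-6
    # Binary search on the sorted (mass, peptide) tuples; "" sorts before and
    # "\uffff" after any peptide string sharing the same boundary mass.
    lo = bisect.bisect_left(index, (precursor_mass - delta, ""))
    hi = bisect.bisect_right(index, (precursor_mass + delta, "\uffff"))
    return [pep for _, pep in index[lo:hi]]


# Usage: candidates for a precursor near the mass of "MK" (about 277.146 Da).
index = build_index(["MKWVTFISLLR"])
print(candidates(index, 277.146))  # ['MK']
```

Each peptide returned by `candidates` would then be scored against the spectrum by the model; that scoring step is deliberately left out of this sketch. Sorting once and binary-searching keeps each per-spectrum lookup logarithmic in the number of database peptides.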
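The changelog entry about loading model weights from a URL describes a download-once, reuse-afterwards behavior. The snippet below is a minimal standard-library sketch of that pattern. The function name `resolve_weights`, the accepted URL schemes, and the cache layout (a hash of the URL plus the original file name) are assumptions for illustration only; they are not Casanovo's code, and Casanovo's actual cache location is not specified here.

```python
import hashlib
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def resolve_weights(location, cache_dir):
    """Return a local path for model weights given as a file path or a URL.

    Hypothetical helper: URLs are downloaded once into `cache_dir` and the
    cached copy is returned on every later call with the same URL.
    """
    parsed = urllib.parse.urlparse(location)
    if parsed.scheme not in ("http", "https", "file"):
        return Path(location)  # plain local path: use it as-is
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the full URL so distinct sources never collide, and
    # keep the original file name so cached files stay recognizable.
    url_hash = hashlib.sha256(location.encode("utf-8")).hexdigest()[:16]
    file_name = Path(parsed.path).name or "weights"
    cached = cache_dir / f"{url_hash}_{file_name}"
    if not cached.exists():
        # Download to a temporary name first, then rename, so an interrupted
        # download never leaves a truncated file that looks cached.
        partial = cached.with_name(cached.name + ".part")
        with urllib.request.urlopen(location) as response:
            with open(partial, "wb") as out:
                shutil.copyfileobj(response, out)
        partial.replace(cached)
    return cached
```

Calling `resolve_weights` twice with the same URL downloads the file only on the first call and returns the same cached path on the second; a plain path such as `model.ckpt` is returned unchanged. Keying on the URL rather than only the file name avoids mixing up two different weight files that happen to share a name.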