From 2e8f57936f300020cec48bdd7d47a3fd78dc97fb Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Mon, 25 Dec 2023 10:42:31 +0100
Subject: [PATCH 1/8] Allow manually triggering the PyPI upload

---
 .github/workflows/publish.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index dea50c18..cccf3356 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -6,6 +6,7 @@ name: PyPI
 on:
   release:
     types: [created]
+  workflow_dispatch:
 
 jobs:
   deploy:

From 226b2cfcac3f82ee0a75533035d766d0441bf585 Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Mon, 25 Dec 2023 10:57:00 +0100
Subject: [PATCH 2/8] Update Actions versions

---
 .github/workflows/lint.yml        | 4 ++--
 .github/workflows/publish.yml     | 7 +++----
 .github/workflows/screenshots.yml | 2 +-
 .github/workflows/tests.yml       | 4 ++--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
index fb937494..2d7fa400 100644
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -14,9 +14,9 @@ jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - name: Setup Python 3.10
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
 
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index cccf3356..e64608c5 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -6,18 +6,17 @@ name: PyPI
 on:
   release:
     types: [created]
-  workflow_dispatch:
 
 jobs:
   deploy:
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
     - name: Set up Python
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v5
       with:
-        python-version: '3.x'
+        python-version: "3.x"
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
diff --git a/.github/workflows/screenshots.yml b/.github/workflows/screenshots.yml
index a9bcf896..7fb04c0d 100644
--- a/.github/workflows/screenshots.yml
+++ b/.github/workflows/screenshots.yml
@@ -16,7 +16,7 @@ jobs:
           ref: ${{ github.head_ref }}
 
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
 
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 08001ed5..53483060 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -21,9 +21,9 @@ jobs:
         os: [ubuntu-latest, windows-latest, macos-latest]
 
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - name: Set up Python 3.10
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"
 

From 6ee6351831411b87411fb3c5618961850542df57 Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Fri, 26 Jan 2024 09:55:32 +0100
Subject: [PATCH 3/8] Document how to use multiple GPUs

---
 docs/faq.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index 15103cac..efc11808 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -1,5 +1,7 @@
 # Frequently Asked Questions
 
+## Running Casanovo
+
 **I installed Casanovo and it worked before, but I after reopening Anaconda it says that Casanovo is not installed.**
 
 Make sure you are in the `casanovo_env` environment. You can ensure this by typing:
@@ -27,6 +29,8 @@ However, the GitHub API is limited to maximum 60 requests per hour per IP addres
 Consequently, if Casanovo has been executed multiple times already, it might temporarily not be able to communicate with GitHub.
 You can avoid this error by explicitly specifying the model file using the `--model` parameter.
 
+## GPU Troubleshooting
+
 **Casanovo is very slow even when running on the GPU. How can I speed it up?**
 
 It is highly recommended to run Casanovo on the GPU to get the maximum performance.
@@ -52,6 +56,22 @@ This means that there was not enough (free) memory available on your GPU to run
 We recommend trying to decrease the `train_batch_size` or `predict_batch_size` options in the [config file](https://github.com/Noble-Lab/casanovo/blob/main/casanovo/config.yaml) (depending on whether the error occurred during `train` or `denovo` mode) to reduce the number of spectra that are processed simultaneously.
 Additionally, we recommend shutting down any other processes that may be running on the GPU, so that Casanovo can exclusively use the GPU.
 
+**How can I run Casanovo on a specific GPU device?**
+
+You can control which GPU(s) Casanovo uses by setting the `devices` option in the [configuration file](https://github.com/Noble-Lab/casanovo/blob/main/casanovo/config.yaml).
+Analogously, this setting also controls the number of cores to use when running on a CPU only (which can be specified using the `accelerator` option).
+
+By default, Casanovo will automatically try to use the maximum number of devices available.
+I.e., if your system has multiple GPUs, Casanovo will utilize all of those for maximum efficiency.
+Alternatively, you can select a specific GPU by specifying the GPU number as the value for `devices`.
+For example, if you have a four-GPU system, when specifying `devices: 1` in your config file Casanovo will only use the GPU with identifier `1`.
+
+The config file functionality only allows specifying a single GPU, by setting its id under `devices`, or all GPUs, by setting `devices: -1`.
+If you want more fine-grained control to use some but not all GPUs on a multi-GPU system, the `CUDA_VISIBLE_DEVICES` environment variable can be used instead.
+For example, by setting `CUDA_VISIBLE_DEVICES=1,3`, only GPUs `1` and `3` will be visible to Casanovo, and specifying `devices: -1` will allow it to utilize both of these.
+
+Note that when using `CUDA_VISIBLE_DEVICES`, the GPU numbers (potentially to be specified under `devices`) are reset to consecutively increase from `0`.
+
 **I see "NotImplementedError: The operator 'aten::index.Tensor'..." when using a Mac with an Apple Silicon chip.**
 
 Casanovo can leverage Apple's Metal Performance Shaders (MPS) on newer Mac computers, which requires that the `PYTORCH_ENABLE_MPS_FALLBACK` is set to `1`:
@@ -62,6 +82,8 @@ export PYTORCH_ENABLE_MPS_FALLBACK=1
 
 This will need to be set with each new shell session, or you can add it to your `.bashrc` / `.zshrc` to set this environment variable by default.
 
+## Training Casanovo
+
 **Where can I find the data that Casanovo was trained on?**
 
 The [Casanovo results reported ](https://doi.org/10.1101/2023.01.03.522621) were obtained by training on two different datasets: (i) a commonly used nine-species benchmark dataset, and (ii) a large-scale training dataset derived from the MassIVE Knowledge Base (MassIVE-KB).
@@ -107,6 +129,8 @@ To include new PTMs in Casanovo, you need to:
 It is unfortunately not possible to finetune a pre-trained Casanovo model to add new types of PTMs.
 Instead, such a model must be trained from scratch.
 
+## Miscellaneous
+
 **How can I generate a precision–coverage curve?**
 
 You can evaluate a trained Casanovo model compared to ground-truth peptide labels using a precision–coverage curve.
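The FAQ text above describes how `CUDA_VISIBLE_DEVICES` renumbers the visible GPUs starting from `0`. The minimal Python sketch below illustrates that renumbering. It maps a logical device index, the number that would go under `devices`, to the physical GPU id. The helper names `visible_gpu_ids` and `physical_gpu` are hypothetical and not part of Casanovo. The sketch assumes the simple comma-separated integer form of the variable (e.g. `1,3`) and ignores UUID-style values.

```python
from typing import List, Optional


def visible_gpu_ids(env: Optional[str]) -> Optional[List[int]]:
    """Parse a CUDA_VISIBLE_DEVICES-style string into physical GPU ids.

    Hypothetical helper for illustration only. Handles only the
    comma-separated integer form (e.g. "1,3"). Returns None when the
    variable is unset, which means that all GPUs are visible.
    """
    if env is None:
        return None
    # Empty tokens are skipped, so an empty string yields no visible GPUs.
    return [int(tok) for tok in env.split(",") if tok.strip()]


def physical_gpu(logical_index: int, env: Optional[str]) -> int:
    """Map a logical device index (as used for `devices`) to a physical GPU id."""
    ids = visible_gpu_ids(env)
    if ids is None:
        # No remapping happens when the variable is unset.
        return logical_index
    # Visible GPUs are renumbered consecutively from 0, in the listed order.
    return ids[logical_index]


if __name__ == "__main__":
    env = "1,3"
    for logical in range(2):
        print(f"logical {logical} -> physical {physical_gpu(logical, env)}")
```

With `CUDA_VISIBLE_DEVICES=1,3`, logical devices `0` and `1` correspond to physical GPUs `1` and `3`, so `devices: -1` would cover exactly those two cards.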
From ebf5cd80c1bec17e76164d090da212d9afd0c434 Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Fri, 26 Jan 2024 10:04:30 +0100
Subject: [PATCH 4/8] Fix new black complaints

---
 casanovo/casanovo.py            | 1 +
 casanovo/config.py              | 1 +
 casanovo/data/datasets.py       | 1 +
 casanovo/data/ms_io.py          | 1 +
 casanovo/denovo/dataloaders.py  | 1 +
 casanovo/denovo/evaluate.py     | 1 +
 casanovo/denovo/model.py        | 1 +
 casanovo/denovo/model_runner.py | 1 +
 casanovo/utils.py               | 3 ++-
 casanovo/version.py             | 1 +
 tests/conftest.py               | 1 +
 tests/unit_tests/test_config.py | 3 ++-
 tests/unit_tests/test_runner.py | 1 +
 13 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/casanovo/casanovo.py b/casanovo/casanovo.py
index 0a1c3618..8bdfa58f 100644
--- a/casanovo/casanovo.py
+++ b/casanovo/casanovo.py
@@ -1,4 +1,5 @@
 """The command line entry point for Casanovo."""
+
 import datetime
 import functools
 import logging
diff --git a/casanovo/config.py b/casanovo/config.py
index 0b5a1e4d..2a420de9 100644
--- a/casanovo/config.py
+++ b/casanovo/config.py
@@ -1,4 +1,5 @@
 """Parse the YAML configuration."""
+
 import logging
 import shutil
 from pathlib import Path
diff --git a/casanovo/data/datasets.py b/casanovo/data/datasets.py
index 23b3d8e3..6244e88f 100644
--- a/casanovo/data/datasets.py
+++ b/casanovo/data/datasets.py
@@ -1,4 +1,5 @@
 """A PyTorch Dataset class for annotated spectra."""
+
 from typing import Optional, Tuple
 
 import depthcharge
diff --git a/casanovo/data/ms_io.py b/casanovo/data/ms_io.py
index 47d99700..7be6ea8c 100644
--- a/casanovo/data/ms_io.py
+++ b/casanovo/data/ms_io.py
@@ -1,4 +1,5 @@
 """Mass spectrometry file type input/output operations."""
+
 import collections
 import csv
 import operator
diff --git a/casanovo/denovo/dataloaders.py b/casanovo/denovo/dataloaders.py
index 998fa66a..fe5d6237 100644
--- a/casanovo/denovo/dataloaders.py
+++ b/casanovo/denovo/dataloaders.py
@@ -1,4 +1,5 @@
 """Data loaders for the de novo sequencing task."""
+
 import functools
 import os
 from typing import List, Optional, Tuple
diff --git a/casanovo/denovo/evaluate.py b/casanovo/denovo/evaluate.py
index 75ac4b6a..cbf9e74f 100644
--- a/casanovo/denovo/evaluate.py
+++ b/casanovo/denovo/evaluate.py
@@ -1,4 +1,5 @@
 """Methods to evaluate peptide-spectrum predictions."""
+
 import re
 from typing import Dict, Iterable, List, Tuple
 
diff --git a/casanovo/denovo/model.py b/casanovo/denovo/model.py
index 39d2027a..b1d51e9c 100644
--- a/casanovo/denovo/model.py
+++ b/casanovo/denovo/model.py
@@ -1,4 +1,5 @@
 """A de novo peptide sequencing model."""
+
 import collections
 import heapq
 import logging
diff --git a/casanovo/denovo/model_runner.py b/casanovo/denovo/model_runner.py
index c7a9cab6..1db53289 100644
--- a/casanovo/denovo/model_runner.py
+++ b/casanovo/denovo/model_runner.py
@@ -1,5 +1,6 @@
 """Training and testing functionality for the de novo peptide sequencing
 model."""
+
 import glob
 import logging
 import os
diff --git a/casanovo/utils.py b/casanovo/utils.py
index b497ac12..4125cd54 100644
--- a/casanovo/utils.py
+++ b/casanovo/utils.py
@@ -1,4 +1,5 @@
-"""Small utility functions"""
+"""Small utility functions."""
+
 import logging
 import os
 import platform
diff --git a/casanovo/version.py b/casanovo/version.py
index d1b7f64e..579db300 100644
--- a/casanovo/version.py
+++ b/casanovo/version.py
@@ -1,4 +1,5 @@
 """Package version information."""
+
 from typing import Optional
 
 
diff --git a/tests/conftest.py b/tests/conftest.py
index a690bd8a..d4e81e36 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1,4 +1,5 @@
 """Fixtures used for testing."""
+
 import numpy as np
 import psims
 import pytest
diff --git a/tests/unit_tests/test_config.py b/tests/unit_tests/test_config.py
index 7a0d7a26..89d32569 100644
--- a/tests/unit_tests/test_config.py
+++ b/tests/unit_tests/test_config.py
@@ -1,4 +1,5 @@
-"""Test configuration loading"""
+"""Test configuration loading."""
+
 import pytest
 import yaml
 
diff --git a/tests/unit_tests/test_runner.py b/tests/unit_tests/test_runner.py
index 6be91831..d1e88e49 100644
--- a/tests/unit_tests/test_runner.py
+++ b/tests/unit_tests/test_runner.py
@@ -1,4 +1,5 @@
 """Unit tests specifically for the model_runner module."""
+
 import pytest
 import torch
 

From e17808b12da2503fa1303f09cc4d853c07aae258 Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Fri, 26 Jan 2024 10:06:38 +0100
Subject: [PATCH 5/8] One more black fix

---
 casanovo/denovo/model_runner.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/casanovo/denovo/model_runner.py b/casanovo/denovo/model_runner.py
index 1db53289..85446118 100644
--- a/casanovo/denovo/model_runner.py
+++ b/casanovo/denovo/model_runner.py
@@ -307,9 +307,9 @@ def initialize_data_module(
         self,
         train_index: Optional[AnnotatedSpectrumIndex] = None,
         valid_index: Optional[AnnotatedSpectrumIndex] = None,
-        test_index: (
-            Optional[Union[AnnotatedSpectrumIndex, SpectrumIndex]]
-        ) = None,
+        test_index: Optional[
+            Union[AnnotatedSpectrumIndex, SpectrumIndex]
+        ] = None,
     ) -> None:
         """Initialize the data module
 

From c3d2bbac7cc2550c524e04accde4765cdf850bd4 Mon Sep 17 00:00:00 2001
From: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
Date: Wed, 7 Feb 2024 00:14:39 -0800
Subject: [PATCH 6/8] Add non-enzymatic dataset to FAQ (#288)

* Add non-enzymatic dataset to FAQ

* Minor text changes

---------

Co-authored-by: Wout Bittremieux
---
 docs/faq.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/faq.md b/docs/faq.md
index efc11808..6096f9bf 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -86,7 +86,7 @@ This will need to be set with each new shell session, or you can add it to your
 
 **Where can I find the data that Casanovo was trained on?**
 
-The [Casanovo results reported ](https://doi.org/10.1101/2023.01.03.522621) were obtained by training on two different datasets: (i) a commonly used nine-species benchmark dataset, and (ii) a large-scale training dataset derived from the MassIVE Knowledge Base (MassIVE-KB).
+The [Casanovo results reported](https://doi.org/10.1101/2023.01.03.522621) were obtained by training on two different datasets: (i) a commonly used nine-species benchmark dataset, and (ii) a large-scale training dataset derived from the MassIVE Knowledge Base (MassIVE-KB).
 
 All data for the _nine-species benchmark_ is available as annotated MGF files [on MassIVE](https://doi.org/doi:10.25345/C52V2CK8J).
 Using these data, Casanovo was trained in a cross-validated fashion, training on eight species and testing on the remaining species.
@@ -97,6 +97,9 @@ To compile this dataset yourself, on the [MassIVE website](https://massive.ucsd.
 This will give you a zipped TSV file with the metadata and peptide identifications for all 30 million PSMs.
 Using the filename (column "filename") you can then retrieve the corresponding peak files from the MassIVE FTP server and extract the desired spectra using their scan number (column "scan").
 
+The _non-enzymatic dataset_, used to train a non-tryptic version of Casanovo, was created by selecting PSMs with a uniform distribution of amino acids at the C-terminal peptide positions from two datasets: MassIVE-KB and PROSPECT.
+Training, validation, and test splits for the non-enzymatic dataset are available as annotated MGF files [on MassIVE](https://doi.org/doi:10.25345/C5KS6JG0W).
+
 **How do I know which model to use after training Casanovo?**
 
 By default, Casanovo saves a snapshot of the model weights after every 50,000 training steps.
From 3a25f5e6c79cfcbdcf677e382ef4694a649c7e5d Mon Sep 17 00:00:00 2001
From: Melih Yilmaz <32707537+melihyilmaz@users.noreply.github.com>
Date: Tue, 13 Feb 2024 11:45:47 -0800
Subject: [PATCH 7/8] Add lr scheduler FAQ

---
 docs/faq.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index 15103cac..395e5d56 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -107,6 +107,15 @@ To include new PTMs in Casanovo, you need to:
 It is unfortunately not possible to finetune a pre-trained Casanovo model to add new types of PTMs.
 Instead, such a model must be trained from scratch.
 
+**How can I change the learning rate schedule used during training?**
+
+By default, Casanovo uses a learning rate schedule that combines linear warm up followd by a cosine wave shaped decay as implemented in [`CosineWarmupScheduler`](https://github.com/Noble-Lab/casanovo/blob/c3d2bbac7cc2550c524e04accde4765cdf850bd4/casanovo/denovo/model.py#L972C7-L972C28) during training.
+To use a different learning rate schedule, the only thing you need to do is to set the [`lr_scheduler`](https://github.com/Noble-Lab/casanovo/blob/c3d2bbac7cc2550c524e04accde4765cdf850bd4/casanovo/denovo/model.py#L966) variable in the `model.py` file to the learning rate scheduler you wish to use, for example:
+
+`lr_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=self.warmup_iters)`
+
+You can use any of the scheduler classes available in [`torch.optim.lr_scheduler`](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) or implement your custom learning rate schedule similar to `CosineWarmupScheduler`.
+
 **How can I generate a precision–coverage curve?**
 
 You can evaluate a trained Casanovo model compared to ground-truth peptide labels using a precision–coverage curve.
From 9789a4913b6f594b623bf16c1a6a9310f7657d77 Mon Sep 17 00:00:00 2001
From: Wout Bittremieux
Date: Wed, 14 Feb 2024 14:30:37 +0100
Subject: [PATCH 8/8] Don't refer to specific line numbers

Because they'll get outdated.
---
 docs/faq.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index 02dccec0..a3103601 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -134,10 +134,12 @@ Instead, such a model must be trained from scratch.
 
 **How can I change the learning rate schedule used during training?**
 
-By default, Casanovo uses a learning rate schedule that combines linear warm up followd by a cosine wave shaped decay as implemented in [`CosineWarmupScheduler`](https://github.com/Noble-Lab/casanovo/blob/c3d2bbac7cc2550c524e04accde4765cdf850bd4/casanovo/denovo/model.py#L972C7-L972C28) during training.
-To use a different learning rate schedule, the only thing you need to do is to set the [`lr_scheduler`](https://github.com/Noble-Lab/casanovo/blob/c3d2bbac7cc2550c524e04accde4765cdf850bd4/casanovo/denovo/model.py#L966) variable in the `model.py` file to the learning rate scheduler you wish to use, for example:
+By default, Casanovo uses a learning rate schedule that combines linear warm up followed by a cosine wave shaped decay (as implemented in `CosineWarmupScheduler` in `casanovo/denovo/model.py`) during training.
+To use a different learning rate schedule, you can specify an alternative learning rate scheduler as follows (in the `lr_scheduler` variable in function `Spec2Pep.configure_optimizers` in `casanovo/denovo/model.py`):
 
-`lr_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=self.warmup_iters)`
+```
+lr_scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, total_iters=self.warmup_iters)
+```
 
 You can use any of the scheduler classes available in [`torch.optim.lr_scheduler`](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) or implement your custom learning rate schedule similar to `CosineWarmupScheduler`.
 
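The FAQ entries above describe the default schedule as a linear warm-up followed by a cosine-shaped decay. The sketch below makes that shape concrete by computing a generic learning rate multiplier with a linear warm-up and then a half-cosine decay to zero. It is an illustrative stand-in, not a copy of Casanovo's `CosineWarmupScheduler`. The function name `warmup_cosine_factor` and its step-based parameters `warmup_steps` and `total_steps` are assumptions.

```python
import math


def warmup_cosine_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning rate multiplier: linear warm-up, then cosine decay to zero.

    Generic illustration only; not Casanovo's CosineWarmupScheduler.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 to 1 over the warm-up period.
        return step / warmup_steps
    # Fraction of the decay phase completed, clamped to at most 1.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half a cosine period: 1 at the start of the decay, 0 at the end.
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 50, 100, 550, 1000):
        print(step, round(warmup_cosine_factor(step, 100, 1000), 3))
```

Assuming PyTorch, you could wrap such a multiplier as `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: warmup_cosine_factor(step, warmup_steps, total_steps))` and assign it to `lr_scheduler`. `LambdaLR` multiplies each parameter group's initial learning rate by the returned factor.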