Updated GNOme database
douweschulte committed Sep 13, 2024
1 parent 8e3ac3c commit 84564d3
Showing 8 changed files with 33 additions and 38 deletions.
26 changes: 0 additions & 26 deletions proforma_grammar.md

This file was deleted.

2 changes: 1 addition & 1 deletion rustyms-imgt-generate/src/combine.rs
@@ -10,7 +10,7 @@ use crate::imgt_gene::IMGTGene;
use crate::structs::DataItem;

use crate::shared::{
- AnnotatedSequence, Annotation, ChainType, Constant, Gene, GeneType, Germline, Germlines,
+ AnnotatedSequence, Annotation, Gene, Germline, Germlines,
Species,
};
use crate::structs::SingleSeq;
2 changes: 1 addition & 1 deletion rustyms/Cargo.toml
@@ -11,7 +11,7 @@ repository = "https://github.com/snijderlab/rustyms"
readme = "README.md"
include = [
"src/**/*",
- "databases/**/*",
+ "databases/**/*.gz",
"README.md",
"build.rs",
"benches/**/*",
23 changes: 23 additions & 0 deletions rustyms/README.md
@@ -78,3 +78,26 @@ It has multiple features which allow you to slim it down if needed (all are enab
* `rand` - allows the generation of random peptides.
* `rayon` - enables parallel iterators using rayon, mostly for `imgt` but also in consecutive align.
* `mzdata` - enables integration with [mzdata](https://github.com/mobiusklein/mzdata) which has more advanced raw file support.

## Sources for the downloaded files

- PSI-MOD: https://github.com/HUPO-PSI/psi-mod-CV (2021-06-13 v1.031.6)
- Unimod: http://www.unimod.org/obo/unimod.obo (2024-08-12 11:33)
- RESID: ftp://ftp.proteininformationresource.org/pir_databases/other_databases/resid/ (2018-04-31 RESIDUES.XML)
- XL-MOD: https://raw.githubusercontent.com/HUPO-PSI/mzIdentML/master/cv/XLMOD.obo (2021-03-23 1.1.12)
- GNO: http://purl.obolibrary.org/obo/gno.obo (2024-05-21) structures: https://glycosmos.org/download/ ('List of all GlyCosmos Glycans data.') (downloaded 2024-07-02)
  - To save space (crates.io has a hard limit on crate size), `gno.obo` is trimmed: any line matching the regex `(property_value: GNO:00000(022|023|041|042|101|102) .*$\n)|(def: .*$\n)|(synonym: .*$\n)|(name: [^ ]*$\n)` is removed, and every match of `(is_a: [^ ]*) ! .*\n` is replaced with `$1\n`.
  - For the same reason, only the first two columns (0 and 1) of the structures CSV file are kept, and the two lines starting with `"` are removed.
- Isotopic atomic masses: https://ciaaw.org/data/IUPAC-atomic-masses.csv (2021-03-17)
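
The `gno.obo` trimming described in the list above can be approximated without a regex engine. The following is a minimal std-only sketch, not the script that was actually used: the function names `is_dropped`, `strip_is_a_comment`, and `trim_obo` are hypothetical, and it assumes every tag starts at the beginning of its line (the original regexes are not anchored to the line start).

```rust
/// Returns true if the line would be removed by the first (deletion) regex.
/// Assumption: tags always start at the beginning of the line.
fn is_dropped(line: &str) -> bool {
    // The property ids listed in the regex alternation.
    const PROPERTY_IDS: [&str; 6] = ["022", "023", "041", "042", "101", "102"];
    if let Some(rest) = line.strip_prefix("property_value: GNO:00000") {
        // The regex requires a space directly after the id.
        return PROPERTY_IDS.iter().any(|id| {
            rest.strip_prefix(*id)
                .map_or(false, |tail| tail.starts_with(' '))
        });
    }
    if line.starts_with("def: ") || line.starts_with("synonym: ") {
        return true;
    }
    // `name: [^ ]*$` only matches names that contain no space, so
    // "name: glycan of molecular weight 40.03 Da" is kept.
    line.strip_prefix("name: ")
        .map_or(false, |name| !name.contains(' '))
}

/// Mirrors the replacement regex: drops the " ! comment" tail of `is_a` lines.
fn strip_is_a_comment(line: &str) -> &str {
    if line.starts_with("is_a: ") {
        if let Some((head, _comment)) = line.split_once(" ! ") {
            return head;
        }
    }
    line
}

/// Applies both steps to a whole OBO document, line by line.
fn trim_obo(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for line in input.lines() {
        if is_dropped(line) {
            continue;
        }
        out.push_str(strip_is_a_comment(line));
        out.push('\n');
    }
    out
}
```

Calling `trim_obo` on the decompressed file contents returns the trimmed text, which can then be compressed again. Names that contain spaces are deliberately kept because the build step reads the mass from them.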

## Ontologies

| Name | Modifications | Numbered | Rules | Diagnostic ions / neutral losses | Description / synonyms / cross ids |
| ------- | ------------- | -------- | ----- | -------------------------------- | ---------------------------------- |
| Unimod | Yes | Yes | Yes | Yes | Yes |
| PSI-MOD | Yes | Yes | Yes | NA | Yes |
| RESID | Yes | Yes | Yes | NA | Yes |
| XL-MOD | Yes | Yes | Yes | Yes | Yes |
| GNO | Yes | NA | NA | NA (solved for all glycans) | NA |

Note that some modifications that do not fit the assumptions of rustyms might be missing from the ontologies. Examples are cross-links with more than two positions (XL-MOD and RESID) and modifications whose diff_formula differs depending on the location they are bound to (RESID). Additionally, only glycans with a specific mass or with a structure are included.
Binary file modified rustyms/databases/GNOme.obo.gz
Binary file not shown.
Binary file modified rustyms/databases/glycosmos_glycans_list.csv.gz
Binary file not shown.
17 changes: 7 additions & 10 deletions rustyms/src/build/gnome.rs
@@ -52,18 +52,15 @@ fn parse_gnome(_debug: bool) -> HashMap<String, GNOmeModification> {
continue;
}
// name: glycan of molecular weight 40.03 Da
- let name = &obj.lines["name"][0];
  let mut modification = GNOmeModification {
  code_name: obj.lines["id"][0][4..].to_lowercase(),
- is_a: obj.lines["is_a"][0]
- .split_once('!')
- .map(|(a, _)| a.trim()[4..].to_lowercase())
- .unwrap(),
- mass: if name.len() > 30 {
- name[27..name.len() - 3].parse::<f64>().ok()
- } else {
- None
- },
+ is_a: obj.lines["is_a"][0].trim()[4..].to_lowercase(),
+ mass: obj
+ .lines
+ .get("name")
+ .map(|e| &e[0])
+ .filter(|n| n.len() > 30)
+ .and_then(|name| name[27..name.len() - 3].parse::<f64>().ok()),
..GNOmeModification::default()
};

1 change: 1 addition & 0 deletions rustyms/src/lib.rs
@@ -11,6 +11,7 @@
clippy::suboptimal_flops,
clippy::too_many_lines
)]
+ #![recursion_limit = "256"]

#[cfg(feature = "align")]
/// Only available with feature `align`.
