Update README.md
rhdolin authored Sep 6, 2024
1 parent 3e18ec4 commit 3fd70a9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion utilities/README.md
@@ -2,7 +2,7 @@
Utilities are used primarily to help load data into MongoDB and to support fast normalization.

## SPDI_Normalization
- This code converts a chromosome-level variant, as derived from a VCF, into a contextual SPDI of the same build, using the algorithm described [here](https://vrs.ga4gh.org/en/stable/impl-guide/normalization.html). To run the code, you'll need to first download GRCh38 and GRCh37 Fasta files from [NCBI Human Genome Resources page](https://www.ncbi.nlm.nih.gov/genome/guide/human/), and change the Python code to point to the downloaded files. The first time you run the code, the Fasta files get indexed, so it'll take longer.
+ This code converts a chromosome-level variant, as derived from a VCF, into a contextual SPDI of the same build, using the algorithm described [here](https://vrs.ga4gh.org/en/stable/impl-guide/normalization.html). To run the code, you'll need to first download GRCh38 and GRCh37 Fasta files (from [this page](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/), navigate to latest GRCh37 and GRCh38 assemblies (e.g. [GRCh37](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/); [GRCh38](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/)). For each assembly, download the FASTA file, which is named "..._genomic.fna.gz") and change the Python code to point to the downloaded files. The first time you run the code, the Fasta files get indexed, so it'll take longer.
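
As a rough illustration of what converting a VCF-style variant into a contextual SPDI involves, here is a minimal sketch. It is not the repository's code: the function name `vcf_to_contextual_spdi` and the sequence label `seq1` are hypothetical, and the reference is a short in-memory string rather than an indexed FASTA file. It trims shared flanking bases, then expands pure insertions and deletions across the repeat region they can slide within.

```python
def vcf_to_contextual_spdi(seq_id, ref_seq, pos, ref, alt):
    """Return a 'seq:position:deleted:inserted' string (0-based position).

    Hypothetical helper for illustration only. `pos` is the 1-based VCF POS,
    `ref_seq` is the whole reference sequence held in memory as a string.
    """
    start = pos - 1
    end = start + len(ref)
    if ref_seq[start:end] != ref:
        raise ValueError("REF allele does not match the reference sequence")

    # Trim bases shared at the end, then at the beginning, of both alleles.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
        end -= 1
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1

    # Substitutions / delins (both non-empty) and no-ops (both empty) stay as trimmed.
    if bool(ref) == bool(alt):
        return f"{seq_id}:{start}:{ref}:{alt}"

    # Pure insertion or deletion: find how far it can slide left and right.
    unit = ref or alt
    n = len(unit)
    left = 0
    while start - left > 0 and ref_seq[start - left - 1] == unit[-(left % n) - 1]:
        left += 1
    right = 0
    while end + right < len(ref_seq) and ref_seq[end + right] == unit[right % n]:
        right += 1

    # Report the whole ambiguous region as deleted, with the change applied inside it.
    new_start = start - left
    deleted = ref_seq[new_start:end + right]
    inserted = ref_seq[new_start:start] + alt + ref_seq[end:end + right]
    return f"{seq_id}:{new_start}:{deleted}:{inserted}"


reference = "GCACACAT"
print(vcf_to_contextual_spdi("seq1", reference, 1, "G", "GCA"))
print(vcf_to_contextual_spdi("seq1", reference, 3, "ACA", "A"))
print(vcf_to_contextual_spdi("seq1", reference, 2, "C", "T"))
```

In this toy reference the inserted or deleted `CA` unit can sit anywhere in the `CACACA` run, so both indels are reported over that whole run; the single-base substitution is reported unchanged.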

## bed2json
Converts a BED file into a format suitable for loading into MongoDB. Chromosome numbering must include 'chr': 'chr1', 'chrX', 'chrY', 'chrM'. BED file must be sorted by chromosome, by position (bedtools sort default).
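
The two input requirements above (a `chr` prefix on chromosome names, and sorting by chromosome then position) can be checked while reading the file. The sketch below is only an illustration under assumptions: the function name `bed_to_records` and the output keys `chrom`, `start`, `end` are hypothetical and are not taken from the repository's bed2json script, and the chromosome order is assumed to be plain string order.

```python
import json


def bed_to_records(lines):
    """Parse BED lines into JSON-ready dicts, enforcing naming and sort order.

    Hypothetical helper for illustration; the real bed2json output format
    may differ.
    """
    records = []
    prev_key = None
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not chrom.startswith("chr"):
            raise ValueError(f"line {line_no}: chromosome {chrom!r} lacks the 'chr' prefix")
        # Assumed ordering: chromosome name as a string, then start position.
        key = (chrom, start)
        if prev_key is not None and key < prev_key:
            raise ValueError(f"line {line_no}: file is not sorted by chromosome and position")
        prev_key = key
        records.append({"chrom": chrom, "start": start, "end": end})
    return records


example = ["chr1\t100\t200\n", "chr1\t150\t300\n", "chr10\t5\t50\n", "chrX\t0\t10\n"]
for record in bed_to_records(example):
    print(json.dumps(record))
```

Failing early on an unprefixed or out-of-order line keeps malformed input from being loaded into MongoDB.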
