diff --git a/utilities/README.md b/utilities/README.md
index b50f6f4e2..77419c1f4 100644
--- a/utilities/README.md
+++ b/utilities/README.md
@@ -2,7 +2,7 @@
 Utilities are used primarily to help load data into MongoDB and to support fast normalization.
 
 ## SPDI_Normalization
-This code converts a chromosome-level variant, as derived from a VCF, into a contextual SPDI of the same build, using the algorithm described [here](https://vrs.ga4gh.org/en/stable/impl-guide/normalization.html). To run the code, you'll need to first download GRCh38 and GRCh37 Fasta files from [NCBI Human Genome Resources page](https://www.ncbi.nlm.nih.gov/genome/guide/human/), and change the Python code to point to the downloaded files. The first time you run the code, the Fasta files get indexed, so it'll take longer.
+This code converts a chromosome-level variant, as derived from a VCF, into a contextual SPDI of the same build, using the algorithm described [here](https://vrs.ga4gh.org/en/stable/impl-guide/normalization.html). To run the code, you'll first need to download the GRCh38 and GRCh37 FASTA files: on [this page](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/), navigate to the latest GRCh37 and GRCh38 assemblies (e.g. [GRCh37](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/), [GRCh38](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/)) and, for each assembly, download the FASTA file named `..._genomic.fna.gz`. Then change the Python code to point to the downloaded files. The first time you run the code, the FASTA files are indexed, so it will take longer.
 
 ## bed2json
 Converts a BED file into a format suitable for loading into MongoDB. Chromosome numbering must include 'chr': 'chr1', 'chrX', 'chrY', 'chrM'. BED file must be sorted by chromosome, by position (bedtools sort default).
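To make the SPDI_Normalization description more concrete, here is a minimal, pure-Python sketch of VRS-style "fully justified" normalization that returns a contextual SPDI string. It is not the repository's implementation. The function name `vcf_to_contextual_spdi`, the toy sequence and the `toy` sequence ID are hypothetical. It assumes the chromosome sequence has already been read from the downloaded FASTA into a plain Python string, that the variant has a single ALT allele, and that POS is the 1-based VCF position.

```python
def vcf_to_contextual_spdi(seq: str, seq_id: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical sketch: VCF-style allele -> fully justified SPDI string.

    seq:    chromosome sequence (assumed already loaded from FASTA); may be soft-masked
    seq_id: sequence identifier to use in the SPDI string
    pos:    1-based VCF POS; ref/alt: VCF REF and a single ALT allele
    """
    def base(i: int) -> str:
        # Compare case-insensitively, since genomic FASTA files are often soft-masked.
        return seq[i].upper()

    start = pos - 1  # SPDI positions (and Python slices) are 0-based
    end = start + len(ref)
    if seq[start:end].upper() != ref.upper():
        raise ValueError("REF does not match the reference sequence")
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return f"{seq_id}:{start}:{ref}:{alt}"

    # 1. Trim the common prefix, then the common suffix, shared by REF and ALT.
    p = 0
    while p < min(len(ref), len(alt)) and ref[p] == alt[p]:
        p += 1
    ref, alt, start = ref[p:], alt[p:], start + p
    s = 0
    while s < min(len(ref), len(alt)) and ref[-1 - s] == alt[-1 - s]:
        s += 1
    ref, alt = ref[: len(ref) - s], alt[: len(alt) - s]
    end = start + len(ref)

    # 2. If both alleles are still non-empty, the change is unambiguous.
    if ref and alt:
        return f"{seq_id}:{start}:{ref}:{alt}"

    # 3. Pure insertion or deletion: roll the inserted/deleted unit as far
    #    left and right as the reference sequence allows.
    unit = ref or alt
    n = len(unit)
    left = 0
    while start - left > 0 and base(start - left - 1) == unit[-1 - (left % n)]:
        left += 1
    right = 0
    while end + right < len(seq) and base(end + right) == unit[right % n]:
        right += 1

    # 4. Expand both alleles to cover the whole ambiguous region.
    new_start, new_end = start - left, end + right
    new_ref = seq[new_start:new_end].upper()
    new_alt = seq[new_start:start].upper() + alt + seq[end:new_end].upper()
    return f"{seq_id}:{new_start}:{new_ref}:{new_alt}"


if __name__ == "__main__":
    # Deleting one A from the toy run "CAAAG" expands to the full AAA -> AA context.
    print(vcf_to_contextual_spdi("CAAAG", "toy", 1, "CA", "C"))  # toy:1:AAA:AA
```

Because the deleted or inserted unit is rolled in both directions, differently aligned VCF representations of the same indel (for example POS 1 `CA`>`C` and POS 3 `AA`>`A` on the toy sequence) should map to the same contextual SPDI.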