Descriptions

This folder archives the pipeline for checking simplex or duplex configurations in SRA ONT raw FAST5 data.

Execution environments

Python scripts for download and analysis These scripts require Python 3.7 or higher and a Linux distribution with hexdump installed. You can verify hexdump installation by running the following command in Linux

Python environment

To recreate the same Python conda environment use:

conda env create -f python3-env.yaml;

Linux environment

hexdump --version;

example output if you have hexdump installed

hexdump: invalid option -- '-'
usage: hexdump [-bcCdovx] [-e fmt] [-f fmt_file] [-n length]
               [-s skip] [file ...]
       hd      [-bcdovx]  [-e fmt] [-f fmt_file] [-n length]
               [-s skip] [file ...]

If you don't have hexdump in your Linux distributions, on Ubuntu try

sudo apt-get update;
sudo apt-get install bsdmainutils;

Pipelines

Build directory structure

mkdir -p ../results;
mkdir -p ../data;

Partial Download of FAST5 Data Due to the typically large size of FAST5 compressed files, only a portion of the data will be downloaded:

python partial_download.py -i DNA_raw_20240109.txt -w ../data > ../results/download_log.txt;

Preprocess and decompressing the download files This step involves preprocessing the downloaded files:

python hexdump_preprocess.py -w ../data;

Analyzing FAST5 Files using hexdump This analysis extracts the flowcell type, sequencing kit, and experiment time, which are essential for determining whether the configuration is simplex or duplex:

python hexdump_check.py -i ../data -o ../results/sra_fast5_ano.txt > ../results/annotation.log;

Repeat our results

To repeat our results, please install the aforementioned conda environment first and run

bash ./run_all.sh;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Descriptions

Execution environments

Python environment

Linux environment

Pipelines

Repeat our results

Files

README.md

Latest commit

History

README.md

File metadata and controls

Descriptions

Execution environments

Python environment

Linux environment

Pipelines

Repeat our results