python_parsers

Python scripts that parse various data files to extract relevant biological information.

parse_PDB accepts a file in Protein Data Bank format (such as 5kk.pdb) and parses it for the distribution of atomic coordinates between hydrophilic and hydrophobic amino acids.
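A minimal sketch of this kind of split, not the script's actual code. It assumes the standard fixed-column `ATOM` record layout of the PDB format; the function name `split_atoms_by_hydropathy` and the membership of the `HYDROPHOBIC` set are illustrative assumptions, since residue classifications vary between hydropathy scales.

```python
# Assumed classification: residues treated as hydrophobic for this sketch.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "GLY"}


def split_atoms_by_hydropathy(pdb_lines):
    """Group ATOM coordinates by the hydropathy class of their residue."""
    groups = {"hydrophobic": [], "hydrophilic": []}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER and other record types
        res_name = line[17:20].strip()  # columns 18-20: residue name
        x = float(line[30:38])          # columns 31-38: x coordinate
        y = float(line[38:46])          # columns 39-46: y coordinate
        z = float(line[46:54])          # columns 47-54: z coordinate
        key = "hydrophobic" if res_name in HYDROPHOBIC else "hydrophilic"
        groups[key].append((x, y, z))
    return groups
```

With a file on disk, the lines could be passed in directly, for example `split_atoms_by_hydropathy(open(path))`, where `path` points at a PDB file.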

parse_chromosome accepts a .txt coding sequence or FASTA file (such as drosophila_2L.txt) and retrieves all coding sequences with their corresponding gene names using matched indices. Parsing for a specific helix-turn-helix motif in the nucleotide sequences returns a list of candidate genes containing this motif, which can be used in a BioMart search for gene ontology.
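The idea can be sketched as follows, assuming FASTA input where each header line starts with `>` and the first word of the header is the gene name. The helper names `read_fasta` and `genes_with_motif` are hypothetical, and the motif is passed in as a regular expression because the actual pattern the script searches for is not given here.

```python
import re


def read_fasta(lines):
    """Map each record name (first word of the header) to its sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def genes_with_motif(records, pattern):
    """Return the names of records whose sequence matches the pattern."""
    motif = re.compile(pattern)
    return [name for name, seq in records.items() if motif.search(seq)]
```

The returned names are the candidate gene list that could then be submitted to a BioMart query.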

adaptor_parse accepts a folder and finds any fastq files held within the sub-directories of the main input directory. It parses these files to retrieve their multiplexing adaptors, which are used to create bulk scripts that run cutadapt for adaptor trimming prior to alignment.
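A rough sketch of the directory walk and command generation, under stated assumptions: how the script actually recovers the adaptor is not described here, so `index_from_header` simply reads the index field that Illumina (Casava 1.8+) headers carry after the last colon, and the adaptor passed to `cutadapt_command` is supplied by the caller. All function names and the `.trimmed` output naming are hypothetical; `-a` and `-o` are cutadapt's 3' adapter and output options.

```python
import os
import shlex


def find_fastq_files(root):
    """Collect fastq paths from root and all of its sub-directories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith((".fastq", ".fastq.gz")):
                found.append(os.path.join(dirpath, filename))
    return found


def index_from_header(header):
    """Return the text after the last colon of a read header (assumed index)."""
    return header.strip().rsplit(":", 1)[-1]


def trimmed_name(path):
    """Insert '.trimmed' before the fastq extension (illustrative naming)."""
    for suffix in (".fastq.gz", ".fastq"):
        if path.endswith(suffix):
            return path[: -len(suffix)] + ".trimmed" + suffix
    return path + ".trimmed"


def cutadapt_command(path, adaptor):
    """Build one cutadapt call that trims a 3' adaptor from one file."""
    return "cutadapt -a {} -o {} {}".format(
        shlex.quote(adaptor), shlex.quote(trimmed_name(path)), shlex.quote(path)
    )
```

Joining one command per file with newlines gives a bulk script that can be run before alignment.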

kallisto_quant accepts two .txt files as input: one containing a list of partial filenames for RNA-seq libraries, and another containing the list of absolute paths for the same libraries. It generates a shell script that quantifies all of the libraries using kallisto quant in single-read mode.
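The script generation could look roughly like this. It is a sketch, not the repository's code: the function name `kallisto_script`, the index filename, and the fragment length and standard deviation defaults are placeholders. kallisto requires `-l` and `-s` whenever `--single` is used, so both appear in each command, and the partial filename is used as the output directory.

```python
import shlex


def kallisto_script(names, paths, index="transcripts.idx", frag_len=200, frag_sd=20):
    """Return a shell script with one single-end kallisto quant call per library."""
    if len(names) != len(paths):
        raise ValueError("names and paths must list the same libraries")
    lines = ["#!/bin/bash"]
    for name, path in zip(names, paths):
        lines.append(
            "kallisto quant -i {} -o {} --single -l {} -s {} {}".format(
                shlex.quote(index), shlex.quote(name), frag_len, frag_sd, shlex.quote(path)
            )
        )
    return "\n".join(lines) + "\n"
```

The two input lists would come from the two .txt files, read line by line in the same order.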

dna_conversion accepts a nucleotide coding file and returns all open reading frames, with each codon translated to its amino acid equivalent.
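One way to sketch this, assuming the standard genetic code, ORFs that run from an `ATG` to the first in-frame stop codon, and a search limited to the three forward reading frames. The names `find_orfs` and `translate` are illustrative, and the script itself may define reading frames differently.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(seq):
    """Return forward-strand ORFs, each from ATG through its stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                orfs.append(seq[start:i + 3])
                start = None
    return orfs


def translate(orf):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))
```

For example, `[translate(orf) for orf in find_orfs(sequence)]` pairs each reading frame found in `sequence` with its protein translation.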
