lakhujanivijay/Awesome-One-Liners: 🚀 Ditch coding! Get it done in a line! This repository consists of quick shell one-liners to manipulate NGS data.

One line, when multiple are not enough 👍

Give a 2-column, tab-separated list of read number and read length from a fastq file

zcat whatever.fastq.gz | paste - - - - | awk -F'\t' '{print NR "\t" length($2)}'
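To see what this kind of pipeline produces, here is a minimal sketch on a toy file. The file name `demo.fastq.gz`, the two reads and the helper name `read_lengths` are made up for illustration. It uses `gzip -dc` in place of `zcat` for portability, and splits on tabs so that spaces in a read header do not shift the sequence column.

```shell
# Toy input: two reads of length 4 and 6 (made-up data, hypothetical file name).
tmp=$(mktemp -d)
printf '@r1 1:N:0:1\nACGT\n+\nIIII\n@r2 1:N:0:1\nACGTAC\n+\nIIIIII\n' \
  | gzip > "$tmp/demo.fastq.gz"

# paste joins each 4-line record with tabs; splitting on tabs keeps the
# sequence in field 2 even when the header contains spaces.
read_lengths() {
  gzip -dc "$1" | paste - - - - | awk -F'\t' '{ print NR "\t" length($2) }'
}

read_lengths "$tmp/demo.fastq.gz"
```

On the toy file this should print two tab-separated rows, `1 4` and `2 6`.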

Count total reads in a fastq file

zcat whatever.fastq.gz | wc -l | awk '{print $1/4}'
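Dividing the line count by 4 only gives the read count when every record occupies exactly four lines. The sketch below adds that check; `count_reads` is a hypothetical helper name and the toy reads are invented. It fails with a message instead of printing a fractional count.

```shell
# count_reads is a hypothetical helper; it assumes 4-line FASTQ records.
count_reads() {
  gzip -dc "$1" | awk '
    END {
      if (NR % 4 != 0) {
        print "line count " NR " is not a multiple of 4" > "/dev/stderr"
        exit 1
      }
      print NR / 4
    }'
}

# Toy file with two reads (made-up data).
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | gzip > "$tmp/demo.fastq.gz"
count_reads "$tmp/demo.fastq.gz"
```

A non-zero exit status here usually points to a truncated file or to sequences wrapped over several lines.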

Change the extension of multiple files at once.

In the example below, the extension changes from .scafSeq to .fa

for f in *.scafSeq; do mv "$f" "$(basename "$f" .scafSeq).fa"; done

The rename command also comes in handy in such cases. For example, rename all .fastq files to .fasta (this is the util-linux rename syntax; the Perl-based rename takes an expression instead)

rename .fastq .fasta *.fastq
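Batch renames are hard to undo, so it can help to preview them first. The sketch below uses only shell parameter expansion. `preview_ext_change` and its `--apply` argument are hypothetical names introduced here, not part of any existing tool.

```shell
# preview_ext_change is a hypothetical helper: it lists old -> new names and
# only renames when a third argument "--apply" is given.
preview_ext_change() {
  old=$1 new=$2 mode=${3:-}
  for f in *."$old"; do
    [ -e "$f" ] || continue              # skip when the glob matched nothing
    target="${f%."$old"}.$new"           # strip the old extension, add the new
    if [ "$mode" = "--apply" ]; then
      mv -- "$f" "$target"
    else
      printf '%s -> %s\n' "$f" "$target"
    fi
  done
}

# Demo in a scratch directory with made-up file names.
cd "$(mktemp -d)" && touch a.scafSeq b.scafSeq
preview_ext_change scafSeq fa           # prints the plan only
preview_ext_change scafSeq fa --apply   # performs the renames
```

Running the preview first shows whether the glob picks up files you did not intend to touch.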

Get A T G C counts for all sequences from a multi fasta file (assumes each sequence is on a single line)

echo -e "seq_id\tA\tT\tG\tC"; while read line; do echo $line | grep ">" | sed 's/>//g'; for i in A T G C;do echo $line | grep -v ">" | grep -o $i | wc -l | grep -v "^0"; done; done < test.fa | paste - - - - -
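The loop above filters out zero counts with `grep -v "^0"`, so a base that is absent from a sequence leaves a gap and shifts the columns, and a sequence wrapped over several lines is not summed. A possible alternative is sketched below with awk. `base_counts` is a hypothetical helper name and the toy sequences are invented.

```shell
# base_counts is a hypothetical helper name. It sums counts over all lines
# of a record, so wrapped sequences work, and prints 0 for absent bases.
base_counts() {
  awk '
    function emit() { if (id != "") print id, a, t, g, c }
    BEGIN { OFS = "\t"; print "seq_id", "A", "T", "G", "C" }
    /^>/  { emit(); id = substr($0, 2); a = t = g = c = 0; next }
          { s = toupper($0)             # count lower-case bases too
            a += gsub(/A/, "", s); t += gsub(/T/, "", s)
            g += gsub(/G/, "", s); c += gsub(/C/, "", s) }
    END   { emit() }' "$1"
}

# Toy input (made-up sequences); s1 is wrapped over two lines.
tmp=$(mktemp -d)
printf '>s1\nACGT\nAAtt\n>s2\nGGGG\n' > "$tmp/test.fa"
base_counts "$tmp/test.fa"
```

For the toy file, `s1` should come out as 3, 3, 1, 1 and `s2` as 0, 0, 4, 0 under the A, T, G, C columns.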

Counting number of sequences in a fasta file:

grep -c "^>" file.fa
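When there are several fasta files, some of them gzipped, the same `grep -c` idea can be wrapped in a loop. This is a small sketch; `count_seqs` is a hypothetical helper name and the file names are made up.

```shell
# count_seqs is a hypothetical helper: prints "file<TAB>number of sequences".
count_seqs() {
  for f in "$@"; do
    case $f in
      *.gz) n=$(gzip -dc "$f" | grep -c '^>') ;;   # decompress on the fly
      *)    n=$(grep -c '^>' "$f") ;;
    esac
    printf '%s\t%s\n' "$f" "$n"
  done
}

# Toy inputs (made-up sequences): one plain, one gzipped.
tmp=$(mktemp -d)
printf '>a\nACGT\n>b\nGG\n' > "$tmp/x.fa"
printf '>c\nTTTT\n' | gzip > "$tmp/y.fa.gz"
count_seqs "$tmp/x.fa" "$tmp/y.fa.gz"
```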

Add something to end of all header lines:

sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa

Clean up a fasta file so only the first column of each header is output:

awk '{print $1}' file.fa > output.fa

Count the number of sequences in clusters generated by CD-HIT:

for i in *.clstr; do echo "$i" ; grep ">Cluster" -B 1 "$i" --no-group-separator | paste - - | awk '{print $1"_"$2 " "$3+1}' > "$i.count.txt" ; done
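The `grep -B 1` pipeline reads each cluster's size from the line just before the next `>Cluster` header. The final cluster has no following header, so it is always reported as size 1. A sketch that counts member lines directly is shown below. `cluster_sizes` is a hypothetical helper name, and the toy `.clstr` content is invented to resemble CD-HIT output.

```shell
# cluster_sizes is a hypothetical helper: prints ">Cluster_N size" per cluster.
cluster_sizes() {
  awk '
    function emit() { if (name != "") print name " " n }
    /^>Cluster/ { emit(); name = $1 "_" $2; n = 0; next }
    NF          { n++ }                 # every non-empty line is one member
    END         { emit() }' "$1"
}

# Toy cluster file (made-up entries): cluster 0 has 2 members, cluster 1 has 3.
tmp=$(mktemp -d)
printf '>Cluster 0\n0\t120aa, >seqA... *\n1\t118aa, >seqB... at 98.31%%\n>Cluster 1\n0\t90aa, >seqC... *\n1\t88aa, >seqD... at 97.73%%\n2\t85aa, >seqE... at 95.29%%\n' > "$tmp/toy.clstr"
cluster_sizes "$tmp/toy.clstr"
```

On the toy file this should print `>Cluster_0 2` and `>Cluster_1 3`, in the same output format as the one-liner.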

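Change extension of fastq files in batch (Perl-based rename; the dots are escaped and the pattern is anchored so only the suffix matches)

rename 's/_fastp\.fastq\.gz$/.fq.gz/' *_fastp.fastq.gz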
