Random transformation of sequences, including deletions, additions, and mutations.
SeqBox variation
usage: SeqBox variation [-h] [-out OUT] [-od OUT_DIR] [-vi VIN] [-vr VREPLACE]
[-vd VDICT]
optional arguments:
-h, --help show this help message and exit
-out OUT, --out_file OUT
sequence out file with TSV format
-od OUT_DIR, --out_dir OUT_DIR
out direction
-vi VIN, --vinfile VIN
-vr VREPLACE, --vreplace VREPLACE
replace base number, like: 1,2,3
-vd VDICT, --vdict VDICT
variation dict, like: '1:A-2:T-3:C-4:G'
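The help text shows the variation dict as key:value pairs joined by hyphens. As a rough illustration only, a string in that shape could be turned into a Python dict as below. The helper name parse_vdict is hypothetical and not part of SeqBox, and the sketch takes the help text's example format at face value.

```python
def parse_vdict(text: str) -> dict:
    """Parse 'key:value' pairs joined by '-' into a dict (hypothetical helper)."""
    mapping = {}
    for pair in text.split("-"):
        # partition keeps an empty value (e.g. "0:") as "", i.e. a deletion
        key, _, value = pair.partition(":")
        mapping[key.strip()] = value.strip()
    return mapping


print(parse_vdict("1:A-2:T-3:C-4:G"))
# {'1': 'A', '2': 'T', '3': 'C', '4': 'G'}
```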
from seqbox import SEQ
seq = SEQ(name="test")
seq.name
#'test'
seq.variation(replaces="1,2,3", seq="ATCGTCGTAGTCGTAGCTAGTCGTAGTAGCTAGT")
#2024-05-21 16:44:48.188 | WARNING | seqbox.seq:variation:118 - not found params dict_base, use default dict_base
#'ATCGTCGTAGTCGTAGCTAGTCGTAGTAGCTAGT,ACGTCGTAGTCGTACTAGTCGTAGTAGCTAGT,ATCGTCGTAGTGTAGTAGTCGTAGTAGCTAT'
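The call above returns one variant per value in replaces. The following is a minimal sketch of one plausible way such variants could be produced, not SeqBox's actual implementation: it assumes that n randomly chosen positions are each replaced by a random value from the variation dict. The names vary, vary_many, and SKETCH_DICT are hypothetical.

```python
import random

# Hypothetical encoding for this sketch only: "" deletes the base,
# one letter substitutes it, two letters substitute it and add one base.
SKETCH_DICT = {"0": "", "1": "A", "2": "T", "3": "C", "4": "G", "5": "AA", "6": "TT"}


def vary(seq, n, dict_base=SKETCH_DICT, rng=None):
    """Replace n distinct, randomly chosen positions of seq with random dict values."""
    if rng is None:
        rng = random.Random()
    values = list(dict_base.values())
    bases = list(seq)
    for pos in rng.sample(range(len(seq)), n):
        bases[pos] = rng.choice(values)
    return "".join(bases)


def vary_many(seq, replaces, rng=None):
    """Build one variant per count in a string such as '1,2,3', joined by commas."""
    counts = [int(x) for x in replaces.split(",")]
    return ",".join(vary(seq, n, rng=rng) for n in counts)


print(vary_many("ATCGTCGTAGTCGTAGCTAGTCGTAGTAGCTAGT", "1,2,3", rng=random.Random(0)))
```

Under this assumption a substitution can pick the same base that was already there, so a variant may be identical to its input.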
The main parameters are the base variation length (-vr or --vreplace), the input file (-vi or --vinfile), and the variation encoding (-vd or --vdict).
SeqBox variation -vi test.tsv -out test_variation -vr 1,2,3
input: test.tsv
#seq
ACCATTAGCACCAACAGGCAAGCTCCTGCACGGTA
GTGCAGGCCCAACTTTCCCCACCTATAGGCTACGG
GACCGGGCGGGACTTTCGCCCAATCATCACATACC
AACCGGTAGTCGATGAGCGCTCATTAACACGAAGC
GTTCTGGTCATTTATCCTCCCTCAGGTACGGATTT
TTGCCGCTCAATTGAAAGGTACTGCCAGGAGTGTC
AGGCCAGAACGGATATACTAGTTGCTCCAACCTGA
ATTGACAGCAGGCGCAAGACATGCCCTAAGCCCTA
GTAACTATCCCGAGTCGACGCAGATTGTGCTTCGG
CGTAGCCTAGGCGTGGGATTATAACTCTCCGGTAA
output: test_variation.tsv
seq_raw seq_var1 seq_var2 seq_var3
ACCATTAGCACCAACAGGCAAGCTCCTGCACGGTA ACCATAGCACCAACAGGCAAGCTCCTGCACGGTA ACCATTAGCACCTACAGGCAGCTCCTGCACGGTA ACCATTAGCACCAACAGGCAAGCTCCCGCCGGA
GTGCAGGCCCAACTTTCCCCACCTATAGGCTACGG GTGCAGGCCCAACTTTCCCCACCTATAGGATACGG GTCAAGGCCCAACTTTCCCCACCTATAGGCTACGG GTGCAGCCAACTTTCCCACCTATAGGCTACGG
GACCGGGCGGGACTTTCGCCCAATCATCACATACC GACCGCGGCGGGACTTTCGCCCAATCATCACATACC GACCGGGCGGGACTTTCCCCAATGGATCACATACC GACCGGGCGGGACTTTCGGCCAATCATCACATACA
AACCGGTAGTCGATGAGCGCTCATTAACACGAAGC AACCGGTAGTCGATAGCGCTCATTAACACGAAGC AACCGGTAGTATGATGAGCGCTCATTAACACTAAGC AACCGGTGTCGATGAGCGCTTCATTAACACGAAGC
GTTCTGGTCATTTATCCTCCCTCAGGTACGGATTT GTTCTGGTCATTTATCCTCCCTCAGGTACGGATTA GTTCTGGTCATTTATCTCCCTCAGGTACGGATT GTTCTGGTCATTATCCTCTTGTTCAGGTACGGATTT
TTGCCGCTCAATTGAAAGGTACTGCCAGGAGTGTC TTGCCGCTCAATTGAAAGGTACTGCCAGGAGTGTC TTGCCGCTCAATTAAAGGTACTGCAGGAGTGTC TTGCCGCTCAATTGTGGAAGGTACTGCCAGGCGTGTC
AGGCCAGAACGGATATACTAGTTGCTCCAACCTGA AGGCCAGAACGGATATACTAGTTGCTCCAACCTTA AGGCCAGAACGCTATAAAACTAGTTGCTCCAACCTGA AGGCACAACGGATATACTAGTGCTCCAACCTGA
ATTGACAGCAGGCGCAAGACATGCCCTAAGCCCTA ATTGACAGCAGGCGCAAACACATGCCCTAAGCCCTA ATTGACAGCAGGCGCAAAACATGCCCTAAGCCTA ATTGATGAGCAGGCGCAATACATGCCCTAGCCCTA
GTAACTATCCCGAGTCGACGCAGATTGTGCTTCGG GTAACTATCCCGAGTCGACGCAGATTGTGCTCGG GTAACTATCCCGAGTCACGCAGATTGGAGCTTCGG GTAACTATCCCGATCGACGCAGATTGTCCCTTCGG
CGTAGCCTAGGCGTGGGATTATAACTCTCCGGTAA CGTAGCCTAGGCGTGGGATATAACTCTCCGGTAA CGTAAGCCTAGGCGTGGGATATAACTCTCCGGTAA CGTAGCCGAAGCGTGGGATATAACTCTCCGGTAA
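To inspect the output, the table can be read with Python's standard csv module. This sketch assumes the file is tab-separated with the header shown above (seq_raw, seq_var1, ...). The function name length_deltas is hypothetical. It reports how much each variant's length differs from the raw sequence, which gives a quick view of net deletions and additions.

```python
import csv
import io


def length_deltas(handle):
    """Return, per row, the length change of each seq_var* column relative to seq_raw."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        raw_len = len(row["seq_raw"])
        rows.append({
            name: len(value) - raw_len
            for name, value in row.items()
            if name.startswith("seq_var")
        })
    return rows


# First data row of the output above, assuming tab separators.
sample = (
    "seq_raw\tseq_var1\tseq_var2\tseq_var3\n"
    "ACCATTAGCACCAACAGGCAAGCTCCTGCACGGTA\t"
    "ACCATAGCACCAACAGGCAAGCTCCTGCACGGTA\t"
    "ACCATTAGCACCTACAGGCAGCTCCTGCACGGTA\t"
    "ACCATTAGCACCAACAGGCAAGCTCCCGCCGGA\n"
)
for deltas in length_deltas(io.StringIO(sample)):
    print(deltas)
```

For a real file, pass an open handle instead, for example open("test_variation.tsv", newline="").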
Default variation dict: deletions, mutations, and additions, with the three variation types in a 1:1:1 ratio. Customize it with -vd or --vdict, like: 0- ;00- ;1-A;2-T.
dict_var = {"0":"", "00":"","000":"","0000":"","00000":"","00001":"","00002":"","00003":"","00004":"","00005":"","00006":"","00007":"","00008":"","00009":"","000010":"","000011":"",
"0001":"A","001":"A","01":"A","1":"A","0002":"T","002":"T","02":"T","2":"T","0003":"C","003":"C","03":"C","3":"C","04":"G","004":"G","0004":"G","4":"G",
"5":"AA","6":"AT","7":"AC","8":"AG","9":"TA","10":"TT","11":"TC","12":"TG","13":"CA","14":"CT","15":"CC","16":"CG","17":"GA","18":"GT","19":"GC","20":"GG"}
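The 1:1:1 ratio can be checked by counting the values in dict_var. This sketch assumes that an empty value means a deletion, a single base means a mutation, and two bases mean an addition. That classification is an interpretation of the dict, not something stated by SeqBox, and the function name kind is hypothetical.

```python
from collections import Counter

dict_var = {"0":"", "00":"","000":"","0000":"","00000":"","00001":"","00002":"","00003":"","00004":"","00005":"","00006":"","00007":"","00008":"","00009":"","000010":"","000011":"",
"0001":"A","001":"A","01":"A","1":"A","0002":"T","002":"T","02":"T","2":"T","0003":"C","003":"C","03":"C","3":"C","04":"G","004":"G","0004":"G","4":"G",
"5":"AA","6":"AT","7":"AC","8":"AG","9":"TA","10":"TT","11":"TC","12":"TG","13":"CA","14":"CT","15":"CC","16":"CG","17":"GA","18":"GT","19":"GC","20":"GG"}


def kind(value):
    """Classify a dict value by its length (assumed meaning, see lead-in)."""
    return {0: "deletion", 1: "mutation"}.get(len(value), "addition")


counts = Counter(kind(v) for v in dict_var.values())
for name in ("deletion", "mutation", "addition"):
    print(name, counts[name])
```

Each class has 16 entries, so drawing a value uniformly at random picks each variation type with equal probability. The repeated keys that map to the same value (for example "0001", "001", "01", "1" for A) appear to exist only to balance these counts.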