added a recoding function for alignments. NT and AA alphabets can now be reduced down to fewer characters
JLSteenwyk committed Feb 28, 2024
1 parent f6e369a commit c45e3e1
Showing 16 changed files with 837 additions and 1 deletion.
116 changes: 116 additions & 0 deletions docs/usage/index.rst
@@ -138,6 +138,122 @@ Options: |br|

|
Alignment recoding
##################
Function names: alignment_recoding; aln_recoding; recode |br|
Command line interface: pk_alignment_recoding; pk_aln_recoding; pk_recode

Recode alignments using reduced character states.

Alignments can be recoded using established or
custom recoding schemes. Recoding schemes are
specified using the -c/--code argument. A custom
recoding scheme should be formatted as a two-column
file wherein the first column is the recoded
character and the second column is the character
in the alignment.

.. code-block:: shell

    phykit alignment_recoding <fasta> [-c/--code <code>]

Codes for which recoding scheme to use: |br|
**RY-nucleotide** |br|
R = purines (i.e., A and G) |br|
Y = pyrimidines (i.e., T and C) |br|

**SandR-6** |br|
0 = A, P, S, and T |br|
1 = D, E, N, and G |br|
2 = Q, K, and R |br|
3 = M, I, V, and L |br|
4 = W and C |br|
5 = F, Y, and H |br|

**KGB-6** |br|
0 = A, G, P, and S |br|
1 = D, E, N, Q, H, K, R, and T |br|
2 = M, I, and L |br|
3 = W |br|
4 = F and Y |br|
5 = C and V |br|

**Dayhoff-6** |br|
0 = A, G, P, S, and T |br|
1 = D, E, N, and Q |br|
2 = H, K, and R |br|
3 = I, L, M, and V |br|
4 = F, W, and Y |br|
5 = C |br|

**Dayhoff-9** |br|
0 = D, E, H, N, and Q |br|
1 = I, L, M, and V |br|
2 = F and Y |br|
3 = A, S, and T |br|
4 = K and R |br|
5 = G |br|
6 = P |br|
7 = C |br|
8 = W |br|

**Dayhoff-12** |br|
0 = D, E, and Q |br|
1 = M, L, I, and V |br|
2 = F and Y |br|
3 = K, H, and R |br|
4 = G |br|
5 = A |br|
6 = P |br|
7 = S |br|
8 = T |br|
9 = N |br|
A = W |br|
B = C |br|

**Dayhoff-15** |br|
0 = D, E, and Q |br|
1 = M and L |br|
2 = I and V |br|
3 = F and Y |br|
4 = G |br|
5 = A |br|
6 = P |br|
7 = S |br|
8 = T |br|
9 = N |br|
A = K |br|
B = H |br|
C = R |br|
D = W |br|
E = C |br|

**Dayhoff-18** |br|
0 = F and Y |br|
1 = M and L |br|
2 = I |br|
3 = V |br|
4 = G |br|
5 = A |br|
6 = P |br|
7 = S |br|
8 = T |br|
9 = D |br|
A = E |br|
B = Q |br|
C = N |br|
D = K |br|
E = H |br|
F = R |br|
G = W |br|
H = C |br|

Options: |br|
*<fasta>*: first argument after function name should be a fasta file |br|
*-c/\-\-code*: argument to specify the recoding scheme to use
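
The groupings listed above amount to a character-to-code lookup. The following is a minimal sketch of that idea, not PhyKIT's implementation: the dictionaries copy the RY-nucleotide and Dayhoff-6 groups from the documentation, while the `recode` function name and the choice to pass unlisted characters (such as gaps) through unchanged are assumptions made for illustration.

```python
# Illustrative only: groupings are copied from the documentation above;
# the function name and gap handling are assumptions, not PhyKIT's code.
RY_NUCLEOTIDE = {"R": "AG", "Y": "TC"}
DAYHOFF_6 = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}


def recode(sequence: str, scheme: dict[str, str]) -> str:
    """Replace each character with the code of the group that contains it."""
    table = str.maketrans(
        {char: code for code, chars in scheme.items() for char in chars}
    )
    # Characters absent from the scheme (gaps, ambiguity codes) pass through.
    return sequence.upper().translate(table)


print(recode("ACGT-A", RY_NUCLEOTIDE))  # RYRY-R
print(recode("MKVLAW", DAYHOFF_6))      # 323304
```

Building the translation table once per scheme keeps the per-sequence work to a single `str.translate` call, which matters for large alignments.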

|
Column score
############

141 changes: 141 additions & 0 deletions phykit/phykit.py
@@ -15,6 +15,7 @@
from .services.alignment import (
    AlignmentLength,
    AlignmentLengthNoGaps,
    AlignmentRecoding,
    ColumnScore,
    CreateConcatenationMatrix,
    DNAThreader,
@@ -144,6 +145,8 @@ def __init__(self):
- calculates alignment length
alignment_length_no_gaps (alias: aln_len_no_gaps; alng)
- calculates alignment length after removing sites with gaps
alignment_recoding (alias: aln_recoding; recode)
- recode alignments using reduced character schemes
column_score (alias: cs)
- calculate column score between a reference and query alignment
create_concatenation_matrix (alias: create_concat; cc)
@@ -265,6 +268,8 @@ def run_alias(self, command, argv):
            return self.alignment_length(argv)
        elif command in ["aln_len_no_gaps", "alng"]:
            return self.alignment_length_no_gaps(argv)
        elif command in ["aln_recoding", "recode"]:
            return self.alignment_recoding(argv)
        elif command == "cs":
            return self.column_score(argv)
        elif command in ["get_entry", "ge"]:
@@ -451,6 +456,142 @@ def alignment_length_no_gaps(argv):
        args = parser.parse_args(argv)
        AlignmentLengthNoGaps(args).run()

    @staticmethod
    def alignment_recoding(argv):
        parser = ArgumentParser(
            add_help=True,
            usage=SUPPRESS,
            formatter_class=RawDescriptionHelpFormatter,
            description=textwrap.dedent(
                f"""\
                {help_header}

                Recode alignments using reduced character states.

                Alignments can be recoded using established or
                custom recoding schemes. Recoding schemes are
                specified using the -c/--code argument. A custom
                recoding scheme should be formatted as a two-column
                file wherein the first column is the recoded
                character and the second column is the character
                in the alignment.

                Aliases:
                  alignment_recoding, aln_recoding, recode
                Command line interfaces:
                  pk_alignment_recoding, pk_aln_recoding, pk_recode

                Usage:
                phykit alignment_recoding <fasta> -c/--code <code>

                Options
                =====================================================
                <fasta>                     first argument after
                                            function name should be
                                            a fasta file

                -c/--code                   recoding scheme to use

                Codes for which recoding scheme to use
                =====================================================
                RY-nucleotide
                    R = purines (i.e., A and G)
                    Y = pyrimidines (i.e., T and C)

                SandR-6
                    0 = A, P, S, and T
                    1 = D, E, N, and G
                    2 = Q, K, and R
                    3 = M, I, V, and L
                    4 = W and C
                    5 = F, Y, and H

                KGB-6
                    0 = A, G, P, and S
                    1 = D, E, N, Q, H, K, R, and T
                    2 = M, I, and L
                    3 = W
                    4 = F and Y
                    5 = C and V

                Dayhoff-6
                    0 = A, G, P, S, and T
                    1 = D, E, N, and Q
                    2 = H, K, and R
                    3 = I, L, M, and V
                    4 = F, W, and Y
                    5 = C

                Dayhoff-9
                    0 = D, E, H, N, and Q
                    1 = I, L, M, and V
                    2 = F and Y
                    3 = A, S, and T
                    4 = K and R
                    5 = G
                    6 = P
                    7 = C
                    8 = W

                Dayhoff-12
                    0 = D, E, and Q
                    1 = M, L, I, and V
                    2 = F and Y
                    3 = K, H, and R
                    4 = G
                    5 = A
                    6 = P
                    7 = S
                    8 = T
                    9 = N
                    A = W
                    B = C

                Dayhoff-15
                    0 = D, E, and Q
                    1 = M and L
                    2 = I and V
                    3 = F and Y
                    4 = G
                    5 = A
                    6 = P
                    7 = S
                    8 = T
                    9 = N
                    A = K
                    B = H
                    C = R
                    D = W
                    E = C

                Dayhoff-18
                    0 = F and Y
                    1 = M and L
                    2 = I
                    3 = V
                    4 = G
                    5 = A
                    6 = P
                    7 = S
                    8 = T
                    9 = D
                    A = E
                    B = Q
                    C = N
                    D = K
                    E = H
                    F = R
                    G = W
                    H = C
                """  # noqa
            ),
        )

        parser.add_argument("alignment", type=str, help=SUPPRESS)
        parser.add_argument("-c", "--code", type=str, help=SUPPRESS)
        args = parser.parse_args(argv)
        AlignmentRecoding(args).run()
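
The two `add_argument` calls above can be exercised in isolation to see what the service object would receive. This is a standalone sketch using only the standard library; the `prog` string and the `input.fa` filename are placeholders chosen for illustration, and nothing here reflects how `AlignmentRecoding` itself consumes the arguments.

```python
from argparse import ArgumentParser

# Standalone parser mirroring the two arguments added in the diff.
# The prog string and file name below are hypothetical.
parser = ArgumentParser(prog="phykit alignment_recoding")
parser.add_argument("alignment", type=str)
parser.add_argument("-c", "--code", type=str)

args = parser.parse_args(["input.fa", "-c", "Dayhoff-6"])
print(args.alignment, args.code)  # input.fa Dayhoff-6

# -c/--code is not declared as required, so omitting it parses
# successfully and leaves the attribute set to None.
args_without_code = parser.parse_args(["input.fa"])
print(args_without_code.code)  # None
```

Since the parser accepts a missing `-c/--code`, whatever consumes `args.code` presumably needs to handle `None`, either by falling back to a default scheme or by reporting an error.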

    @staticmethod
    def column_score(argv):
        parser = ArgumentParser(
20 changes: 20 additions & 0 deletions phykit/recoding_tables/Dayhoff-12.txt
@@ -0,0 +1,20 @@
0 D
0 E
0 Q
1 M
1 L
1 I
1 V
2 F
2 Y
3 K
3 H
3 R
4 G
5 A
6 P
7 S
8 T
9 N
A W
B C
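
A table in this two-column layout (recoded character first, alignment character second) can be read into a lookup dictionary with a few lines of Python. The sketch below is an assumption about how such a file might be parsed, not the code added in this commit; `parse_recoding_table` is a hypothetical helper, and the excerpt reuses rows from the Dayhoff-12 table above.

```python
def parse_recoding_table(text: str) -> dict[str, str]:
    """Return {alignment character: recoded character} from two-column text."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        recoded, original = fields
        mapping[original] = recoded
    return mapping


# A few rows copied from the Dayhoff-12 table above.
dayhoff_12_excerpt = """\
0 D
0 E
0 Q
3 K
3 H
3 R
A W
B C
"""

mapping = parse_recoding_table(dayhoff_12_excerpt)
print(mapping["Q"], mapping["H"], mapping["W"])  # 0 3 A
```

Splitting on any whitespace means the same helper would accept either space- or tab-delimited files, which is convenient for user-supplied custom schemes.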
20 changes: 20 additions & 0 deletions phykit/recoding_tables/Dayhoff-15.txt
@@ -0,0 +1,20 @@
0 D
0 E
0 Q
1 M
1 L
2 I
2 V
3 F
3 Y
4 G
5 A
6 P
7 S
8 T
9 N
A K
B H
C R
D W
E C
20 changes: 20 additions & 0 deletions phykit/recoding_tables/Dayhoff-18.txt
@@ -0,0 +1,20 @@
0 F
0 Y
1 M
1 L
2 I
3 V
4 G
5 A
6 P
7 S
8 T
9 D
A E
B Q
C N
D K
E H
F R
G W
H C
20 changes: 20 additions & 0 deletions phykit/recoding_tables/Dayhoff-6.txt
@@ -0,0 +1,20 @@
0 A
0 G
0 P
0 S
0 T
1 D
1 E
1 N
1 Q
2 H
2 K
2 R
3 I
3 L
3 M
3 V
4 F
4 W
4 Y
5 C
20 changes: 20 additions & 0 deletions phykit/recoding_tables/Dayhoff-9.txt
@@ -0,0 +1,20 @@
0 D
0 E
0 H
0 N
0 Q
1 I
1 L
1 M
1 V
2 F
2 Y
3 A
3 S
3 T
4 K
4 R
5 G
6 P
7 C
8 W
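
Each table is 20 lines long, one per standard amino acid, so a quick consistency check is to confirm that a scheme assigns every residue to exactly one group. The following sketch does this for the Dayhoff-9 grouping shown above; `check_scheme` is a hypothetical helper written for illustration and is not part of the commit.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Groups copied from the Dayhoff-9 table above.
DAYHOFF_9 = {
    "0": "DEHNQ",
    "1": "ILMV",
    "2": "FY",
    "3": "AST",
    "4": "KR",
    "5": "G",
    "6": "P",
    "7": "C",
    "8": "W",
}


def check_scheme(scheme: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (residues assigned more than once, residues never assigned)."""
    counts = Counter(char for chars in scheme.values() for char in chars)
    duplicated = sorted(char for char, n in counts.items() if n > 1)
    missing = sorted(STANDARD_AMINO_ACIDS - set(counts))
    return duplicated, missing


print(check_scheme(DAYHOFF_9))  # ([], [])
```

The same check could be applied to a custom scheme before recoding, so that an accidentally omitted or doubly assigned residue is caught early.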