-
Notifications
You must be signed in to change notification settings - Fork 0
/
Notes
189 lines (152 loc) · 12 KB
/
Notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
Table of Contents:
Sections:
Coding Process:
i. Ant_Lymph
ii. Main
iii. Bcell_Selection
iv. Parameter_Clustering
v. Arg_Parser
Testing:
i. Ant_Lymph
ii. Main
Data:
i. Collection
Introduction/Objective:
Multiple disciplines such as biology, economics, and sociology use game theory to mathematically analyze and
predict the outcomes of conflict situations. Within a game, there are a known and finite number of players that
follow a given set of rules (Jarosz & Burczyński, 2011a). Within the genetics field specifically, game theory is
used to calculate the relative fitnesses of specific genotypes and phenotypes of a given population over a set
amount of time. The conflict situation of interest for geneticists is the selection of one genotype over
another. An individual has a genotype composed of AA, AB, or BB alleles representing a corresponding phenotype.
Each allele is given a calculated frequency: x and (1-x). In a simple game with 2 alleles A and B, a 2X2 matrix
can represent the conflict situation between two genotypes. For two players, traditional game concepts are
easily applied, however when multiplayer games are simulated, they can result in complicated dynamics
(Gokhale & Traulsen, 2014), thus emphasizing the need for computer generated game models. Computer generated
game theory is already used to understand cancer cell phenomena (Archetti & Pienta, 2019), to optimize cancer
treatments ( Staňková et al., 2019), and can be used to also model games within the adaptive immune system.
The clonal_selection() function represents the positive selection immunological process known as affinity
maturation where slight mutation rates gradually produce B-cells whose paratopes (lymphocyte binding site) have
a higher affinity to the antigen’s epitope (antigen binding site) and suppression() is representative of
negative selection B-cell suppression that occurs via self-antigen presentation to the B-cell which, if it
binds to the self antigen, induces apoptosis. A mutation in a single key position can increase affinity by more
than one order of magnitude against haptens, but for protein antigens more than one key position is required to
increase antibody affinity (Murugan et al., 2018).
The function then runs for a user-set number of iter = i iterations. After all iterations are complete, the
remaining individuals in the player n population are placed into result_population. From these parameters and
functions, the IMGAMO algorithm can currently be used to simulate positive and negative selection of clonal
B-cells (Jarosc & Burczyński, 2011b). The goal of this study is to include antigen-lymphocyte interactions prior
to and following the clonal selection of B-cells in the IMGAMO algorithm to provide a broader representation of
the adaptive immune system.
Coding Process:
Ant_Lymph
210130
The Antigen class organizes the properties and functionalities of a chosen antigen from the Immune Epitope
Database (IEDB). The Lymphocyte class has similar qualities, but the paratope is a randomly generated aa
sequence of the same length as the epitope.Experimental properties include epitope, pop, and n for both the
Antigen and Epitope classes. Each variable will be parameterized while the others remain constant to identify
patterns and significance of the variable’s dominance over the system.
210913
I realized that I needed a dictionary of populations to pass to Bcell_Selction.py instead of a single instance
of an object so I created a dictionary of N populations as specified in the arguments (terminal commands) to
be passed onto B-cell selection.
Main
210130
The Main class is an environment where all classes interact and perform their functionalities with one another.
The function 'binding' simulates the initial binding of the antigen and lymphocyte where the probability of
binding is calculated by the percent match of the amino acid binding sites.
210203
The function utilizes a for loop to iterate through each index of the string "a" and match it to the same index
of string "l". Each time both indices match, the "count" increases by 1. By dividing "count" by the length of
"a", the percent match is calculated and is stored as "pr_bind".A while loop is then utilized to run a randomly
chosen number from 0-1 and if that number is less than or equal to the probability of not binding, the counter
"bind_time" increases by 1 and the boolean "bound" = FALSE. Once the random number is greater than than the
probability of not binding, the "self.iteration_counts" dictionary is appended with the key "init_bind" and the
value "bind_time". Print statements were used to test the accuracy of each component of the function.
210323
Once I transferred the code into the PyCharm environment, I noticed that the binding function would continually
run for too long. I realized that if there was a 0% binding chance, the loop would continuously run (but not in
Jupyter Lab which was strange). I added a print statement as a return telling the user that there is no chance
of the sequences to bind.
210327
I identified that the lymph1.paratope property is not functioning properly in the Main class binding function.
The paratope property transfers to the Main file correctly, confirmed by a print statement, however it does not
get recognized by the binding function even though ant1.epitope functions correctly. To fix this, I called
instances of the Antigen and Lymphocyte classes in the Main file and both properties epitope and paratope ran
properly, confirmed by print statements. The next issue I encountered was the infinite while loop if there was
no probability of binding between the two strings. To rectify this I added a break statement directly after the
beginning of the while loop so that if count == 0, the loop will not run and the Init_binding_time = 0.
210930
Today I began the immune response algorithm that runs an agent-based game for each individual in each lymphocyte
population against each individual in the antigen population. To execute this, a random number between 0 and 1
is drawn. If the number is greater than the fitness of the lymphocyte population, then the antigen wins and visa
versa. If one population wins, 1 is subtracted from the number of individuals in the opposing population. Thus
far, response times for accounting for each individual and also each population are being recorded.
211115
Updated the PAM likelihood selection where probabilistic chance is added to the process to insure that the same
outcome doesn't occur everytime. I also fixed an issue that I was having in the immune response function where
populations of n=0 were not being deleted and all population n values were set to 0. I had to change the way I
iterated through the populations dictionary and selected the populations where n=0. This is now fixed.
Bcell_Selection
210411
I made the choice to set the default exchange factor at 10, but this will be manipulated in the future. Another
assumption that is made that each individual in the initial lyphocyte population will begin with exact copies
of the paratope. Populations will be represented by a single individual for simplicity. Since each population is
being compared and not individuals, the individuals themselves do not need to be represented. The number of
individuals in each Antigen/Lymphocyte population will be user set and will be used in the Immune_Response
functions.
210912
I implemented the usage of the PAM-250 matrix by using a Pandas dataframe. The matrix was imported as a txt file
located in the Resources directory. Columns and row names of the dataframe correspond to each respective
amino acid.
210921
Had to go back and rewrite a good chunk of the clonal selection function because the substitution likelihoods
were not accurate for each paratope. The substitution likelihoods from the final population were copying
themselves into the other populations' paratopes. This was recified by creating a self.likelihood dictionary
where the likelihoods are only identified once. Since the likelihoods are always the same for each amino acid,
this is plausable. I also decided to substitute with the second maximum value likelihood because the maximum
value for an amino acid is always itself.
210925
Today I picked up from where I left last time. I had most of the clonal selection algorithm completed except for
the final product where maximum likelihoods are calculated and then amino acids are substituted out for another.
I completed this and am going to implement a break in the loop where if maximum affinity is reached, selection
will cease.
Parameter_Clustering
211017
Today I created the unsupervised learning hyper-parameter clustering for data analysis. So far, the parameters
have been addressed to lists according to their respective post-game lymphocyte populations. From here, one-hot-
-encoding will be used to assign coordinates to each lymphocyte population in 5-dimensional space. A clustering
algorithm (probably k-means) will be used to generate clusters of populations to distinguish the lowest and top
performing hyper-parameter combinations.
211115
I set up the k-means clustering algorithm, although this may not be the correct algorithm to use based on the
data. Once I collect my data, I will attempt the k-means clustering algorithm, but I may have to try others.
Arg_Parser
210912
I created a new file for argument parsing to set the parameters for the antigen and lymphocyte properties such as
population number, population size, antigen division rate, and antigen epitope. This will allow for a seamless
user experience when running the scripts. The argparser package was used for this implementation. I ran into an
issue where the Ant_lymph.py file would not call properties from Arg_Parser.py since it was outside of the Model
package directory. To resolve this I moved Ant_Lymph.py outside of the directory and will most likely do the
same with the others.
Testing:
Ant_Lymph
210331
Print statements were initially used to check that the lengths of the epitope and the paratope were equal. Today
I created a function that tests that instead of me counting each index. This utilizes the unittest package
and the self.assertEqual function. If the lengths of both properties are not equal then it will return the
message "len(a) != len(l)." Initially I noticed that if the test passed, there was no notification of passing
so I added a print statement to affirm that the test passed. In the near future I will make a test.py folder
and import all relevant data from the other files.
Main
211017
The immune response algorithm provided an error where the lymphocyte populations with n=0 could not be deleted
due to improper calling of the population_dict dictionary keys. This was fixed.
Data:
Collection
211201
The simulation was run for all combinations of the following parameters:
pop_num = [100, 100], step=10
pop_size = [1000, 10000], step=100
exchange_iter = [100, 1000], step=10
response_time = [100, 1000], step=10
The dataset was stored as a numpy array in an .npz file with *kwds="data"