-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathFromDataSimulationToEvaluatingEnDED.RMD
762 lines (656 loc) · 39.5 KB
/
FromDataSimulationToEvaluatingEnDED.RMD
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
1_d# From data simulation to evaluating EnDED
Ina Maria Deutschmann, 12th May 2020
EnDED (Environmentally-Driven Edge Detection) is a program to detect indirect associations in a network.
This script will simulate abundance data without and with noise, construct networks, apply EnDED, and evaluate EnDED
# Data simulation
The following script was run with the version R-3.5.0
## Parameters for data simulation
```{r}
NUMBER_SIMULATIONS = 1000 # Number of simulation datasets to evaluate EnDED on
NUM_SPECIES = 50 # Number of species
NUM_SAMPLES = 100 # Number of samples
SAMPLE_RESOLUTION = 0.5 # Step size of time series
TIME_BEGIN = 0 # Start of time series
TIME_END = (NUM_SAMPLES - 1) * SAMPLE_RESOLUTION # End of time series
SELF_INTERACTION_SCORE = -0.5 # score in interaction matrix on diagonal (controls max population size that an environment can sustain)
PEP = 30 # Percentage of positive edges
INTERACTION_PROB = 0.01 # Interaction probability
ENV_INITIAL_ABUNDANCE = 0.001 # Initial abundance of ENV included in interaction matrix
ENV_OPT_ENV = 0.5 # Optimal environmental value of ENV included in interaction matrix
ENV_MAX_GRO = 0.8 # Maximum growth rate of ENV included in interaction matrix
ENV_ECO_TOL = 0.5 # Ecological tolerance of ENV included in interaction matrix
T_SIGNAL_PERIODICITY = 12 # Signal periodicity of seasonal environmental function
OMEGA_SIG_FREQ = -2*pi/T_SIGNAL_PERIODICITY # Signal frequency of seasonal environmental function
PATH = "1_datatables/" # Folder to store simulated files
```
## Dependencies
load required packages:
```{r}
library(deSolve)
library(tidyverse)
library(igraph)
library(devtools)
install_github("hallucigenia-sparsa/seqtime")
library(seqtime)
```
## Time
time series starts with TIME_BEGIN and ends with TIME_END, the time resolution gives the step count of the time series
```{r}
t <- seq(from=TIME_BEGIN, to = TIME_END, by=SAMPLE_RESOLUTION)
```
## Environmental parameter
For the environmental factor the periodic function sinus is used and rounded to 3 digits.
```{r}
ENV <- function(t, digits = 3) { round(sin(OMEGA_SIG_FREQ*t), digits = 3) }
```
## Simulating datasets
```{r}
for( simulation_number in c(1:NUMBER_SIMULATIONS))
{
print(paste("# Simulation number ", simulation_number, sep =""))
## Initial Abundance: To generate intial abundances we use a function of `seqtime` called `generateAbundances` with mode 5, which generates uneven initial abundances without introducing zeros. Mode 5 uses the `bstick` function from `vegan`. With the function 'count' you can control the total number of individuals, but we will keep it to 1 to get relative abundance.
y <- generateAbundances(NUM_SPECIES, count = 1, mode = 5)
y_ENV <- c(y,ENV_INITIAL_ABUNDANCE)
## Growth Rates
# The growth rate, g, of each species depends on the same environmental variable. The dependency is explained through the function:
# g = f(env, max, opt, tol) = ( max^2 ) x ( exp(-(opt-env)^2 / ( 2 x tol^2 ) ) )
# with
# env = the environmental value that the species growth rate depends on,
# max = the species specific maximum growth rate that determines the amplitude of the growth-rate curve
# opt = the species specific optimal environmental value that determines the peak of the growth-rate curve, and
# tol = the species specific ecological tolerance determines the environmental range in which the species is able to grow, which determines the length of the growth-rate curve.
### Species specific parameters: Each species is defined through their optimal environmental value, maximum growth rate and ecological tolerance. All values are drawn from a uniform distribution.
# optimal environmental value
OPT_ENV = round( runif(NUM_SPECIES, min = min(ENV(t)), max = max(ENV(t))), digits = 3 )
# maximum growth rate
MAX_GRO = round( runif(NUM_SPECIES, min = 0.3, max = 1), digits = 3 )
# ecological tolerance
ECO_TOL = round( runif(NUM_SPECIES, min = 0.3, max = 1), digits = 3 )
#### With ENV
# optimal environmental value
OPT_ENV = c(OPT_ENV, ENV_OPT_ENV)
# maximum growth rate
MAX_GRO = c(MAX_GRO, ENV_MAX_GRO)
# ecological tolerance
ECO_TOL = c(ECO_TOL, ENV_ECO_TOL)
### Growth Rate Function: Given the above parameters, the growth rate of each species is a function that depends on the environmental value.
growth_rate_function <- function(env,max=1,opt=0,tol=1) {
exp(-1*((opt-env)^2)/(2*(tol^2)))*(max^2)}
get_growth_rates <- function (t, n = NUM_SPECIES+1) {
g <- growth_rate_function(ENV(t),MAX_GRO[1],OPT_ENV[1],ECO_TOL[1])
for(i in c(2:n))
{
g <- rbind(g,growth_rate_function(ENV(t),MAX_GRO[i],OPT_ENV[i],ECO_TOL[i]))
}
return(g)
}
## Lotka-Volterra Model
# Below is an adjusted generalized Lotka-Volterra model with:
# - t: time point
# - y: vector with current values of state variables
# - b: vector of growth rates determined through species specific growth rate functions that depend on an environmental factor
# - I: Interaction Matrix
glv_ENV <-function(t, y, I){
b = get_growth_rates(t) # vector of growth rates
dydt <- y*(b+I %*% y)
list(dydt)
}
## Interaction Matrix
### Interaction between species:
### The growth rates of the species are depended on the environment, and here there are also pairwaise interactions between the species.
### These interactions are determined through the Klemm-Eguiluz algorithm to generate a modular and scale-free interaction matrix with
### diagonal elements set to zero. Afterwards these diagonal entries, which represent the caring capacity, are set to a negative value.
I_interaction = generateA(NUM_SPECIES, "klemm", pep=PEP, c=INTERACTION_PROB, d=0)
diag(I_interaction) <- SELF_INTERACTION_SCORE
### One seasonal abiotic and one abiotic environmental dependency within the interaction matrix: The environmental dependency is a entry
### in the matrix, it "interacts" positively with one half and negatively with the other half. The interaction is directional, thus the
### species to ENV score is 0.
# same bioticENV values as above
com1 <- floor(NUM_SPECIES/2)
com2 <- ceiling(NUM_SPECIES/2)
abioticENV = c(round( runif(com1, min = -0.8, max = -0.2), digits = 3),round( runif(com2, min = 0.2, max = 0.8), digits = 3))
I_interaction <- cbind(I_interaction,abioticENV)
I_interaction <- rbind(I_interaction,c(rep(0,(NUM_SPECIES)),SELF_INTERACTION_SCORE))
## Simulate Abundances of Species in time
try(
{
abundance_table <- lsoda(y=y_ENV, times=t, func=glv_ENV, parms=I_interaction)
}
,silent = TRUE
)
# Save data
ROOTPATH <- paste(PATH, "Simulation_", simulation_number, sep = "")
dir.create(ROOTPATH)
## Parameters
s <- paste("# Parameters for data simulation\n# Number of species\nNUM_SPECIES = ", NUM_SPECIES,
"\n# Number of samples\nNUM_SAMPLES = ", NUM_SAMPLES,
"\n# Step size of time series\nSAMPLE_RESOLUTION = ", SAMPLE_RESOLUTION,
"\n# Start of time series\nTIME_BEGIN = ", TIME_BEGIN,
"\n# End of time series = (NUM_SAMPLES - 1) * SAMPLE_RESOLUTION\nTIME_END = ", TIME_END,
"\n# Score in interaction matrix on diagonal (controls max population size of a species that an environment can sustain)\nSELF_INTERACTION_SCORE = ", SELF_INTERACTION_SCORE,
"\n# Percentage of positive edges in interaction matrix\nPEP = ", PEP,
"\n# Interaction probability\n\tINTERACTION_PROB = ", ENV_INITIAL_ABUNDANCE,
"\n\toptimal environmental value = ", ENV_OPT_ENV,
"\n\tmaximum growth rate = ", ENV_MAX_GRO,
"\n\tecological tolerance = ", ENV_ECO_TOL,
"\n# ENV-function\nENV <- function(t, digits = 3) { round(-1*sin(pi*t/6), digits = 3) }",
"\n# Initial Abundance\ny <- generateAbundances(NUM_SPECIES, count = 1, mode = 5)",
"\n# Growth Rate Function\ngrowth_rate_function <- function(env,max,opt,tol) { exp(-1*((opt-env)^2)/(2*(tol^2)))*(max^2) }",
"\n# Growth Rate Parameters in file\n'growth_rates_parameters.txt'",
"\n# Interaction Matrix in files\n'InteractionMatrix_*.txt",
"\n# Abundance table in file\n'species_abundances_*.txt",
sep = "")
fileConn <-file( paste( ROOTPATH, "/parameters.txt", sep = "" ))
writeLines(s, fileConn)
close(fileConn)
## Environmental parameter
E <- cbind("ENV_seasonal",t(ENV(t)))
colnames(E) <- c("Name", paste("Sample_",c(1:length(t)),sep=""))
filename <- paste( ROOTPATH, "/env_values.txt", sep = "" )
write.table(E,file = filename, col.names = TRUE,row.names = FALSE,
quote = FALSE, sep = '\t', dec = '.')
## Growth Rates
G <- cbind( c(paste("Species_",c(1:NUM_SPECIES),sep=""), "ENV_matrix"), MAX_GRO, OPT_ENV, ECO_TOL)
colnames(G) <- c("Name", "Max_Growth_Rate", "Optimal_environment", "Ecological_tolerance")
filename <- paste( ROOTPATH, "/growth_rates_parameters.txt", sep = "" )
write.table(G, file = filename, col.names = TRUE, row.names = FALSE,
quote = FALSE, sep = '\t', dec = '.')
## Interaction Matrix
I <- cbind(c(paste("Species_",c(1:NUM_SPECIES),sep=""),"ENV_matrix"),I_interaction)
colnames(I) <- c("Name",paste("Species_",c(1:NUM_SPECIES),sep=""),"ENV_matrix")
filename <- paste( ROOTPATH, "/InteractionMatrix.txt", sep = "" )
write.table(I,file=filename,sep="\t",quote=FALSE,na = "NA",row.names=FALSE,col.names=TRUE)
### Edgelist
node1 <- rep(paste("Species_",c(1:NUM_SPECIES),sep=""), NUM_SPECIES)
node2 <- rep(paste("Species_",c(1:NUM_SPECIES),sep=""), each = NUM_SPECIES)
interaction_score <- rep(0, length(node1))
I <- I_interaction[c(1:NUM_SPECIES), c(1:NUM_SPECIES)]
for(i in which(I != 0))
{
interaction_score[i] <- I[i]
}
edgelist <- data.frame(node1, node2, interaction_score)
edgelist <- edgelist[which(edgelist$interaction_score != 0),]
edgelist <- edgelist[-which(edgelist$node1 == edgelist$node2),]
filename <- paste( ROOTPATH, "/Edgelist.txt", sep = "" )
write.table(edgelist,file=filename,sep="\t",quote=FALSE,na = "NA",row.names=FALSE,col.names=TRUE)
n = NUM_SPECIES + 1
node1 <- rep(c(paste("Species_",c(1:NUM_SPECIES),sep=""), "ENV_matrix"), n)
node2 <- rep(c(paste("Species_",c(1:NUM_SPECIES),sep=""), "ENV_matrix"), each = n)
interaction_score <- rep(0, length(node1))
for(i in which(I_interaction != 0))
{
interaction_score[i] <- I_interaction[i]
}
edgelist <- data.frame(node1, node2, interaction_score)
edgelist <- edgelist[which(edgelist$interaction_score != 0),]
edgelist <- edgelist[-which(edgelist$node1 == edgelist$node2),]
filename <- paste( ROOTPATH, "/Edgelist_withENV.txt", sep = "" )
write.table(edgelist,file=filename,sep="\t",quote=FALSE,na = "NA",row.names=FALSE,col.names=TRUE)
## Abundances of Species in time
A <- cbind(c(paste("Species_",c(1:NUM_SPECIES),sep=""),"ENV_matrix"),
round(t(abundance_table[,c(2:(NUM_SPECIES+2))]),digits=3))
colnames(A) <- c("Name", paste("Sample_",c(1:length(t)),sep=""))
filename <- paste( ROOTPATH, "/species_abundances_with_ENV_matrix.txt", sep = "" )
write.table(A,file = filename,col.names = TRUE,row.names = FALSE,
quote = FALSE, sep = '\t', dec = '.')
## Save R session
#filename = paste( ROOTPATH, "/DataSimulationSession.RData", sep = "" )
#save.image( file = filename )
}
```
# Network construction with eLSA
##############################################
For each of the simulation datasets, the following command was run with the version: elsa/81a2ee0
```{bash}
echo "Network Construction for simulation dataset ${SLURM_ARRAY_TASK_ID}"
cd 1_datatables/Simulation_${SLURM_ARRAY_TASK_ID}/
## Combining species abundances with environmental factors and commentaing out header
less species_abundances_with_ENV_matrix.txt > SpENV.txt
tail -n +2 env_values.txt >> SpENV.txt
sed -i '1 s/^/#/' SpENV.txt
## Setting going to output folder and setting path to file
cd ../../2_network_eLSA
X=../1_datatables/Simulation_${SLURM_ARRAY_TASK_ID}/SpENV.txt
## Network construction with eLSA
MY_SAMPLE_COUNT="$(head -n 1 $X | awk '{$1=""}1' | awk '{print NF}')"
lsa_compute $X NW_eLSA_${SLURM_ARRAY_TASK_ID}.txt -r 1 -s $MY_SAMPLE_COUNT -d 0 -p mix -x 2000 -n percentileZ
## Filter eLSA output for significant associations: p and q value < 0.001
head -n 1 NW_eLSA_${SLURM_ARRAY_TASK_ID}.txt > NW_eLSA_QP-001_${SLURM_ARRAY_TASK_ID}.txt
awk '{ if (($21<0.001) && ($10<0.001)){print} }' NW_eLSA_${SLURM_ARRAY_TASK_ID}.txt >> NW_eLSA_QP-001_${SLURM_ARRAY_TASK_ID}.txt
```
# EnDED
EnDED is a C++ program, for running a C++ program we used version gnu7/7.3.0
```{bash}
## Parameters
METHODS=SP,OL,II,DPI
INPUT_NW=../2_network_eLSA/NW_eLSA_QP-001_${SLURM_ARRAY_TASK_ID}.txt
INPUT_II_DPI_ABUND=../1_datatables/Simulation_${SLURM_ARRAY_TASK_ID}/SpENV.txt
OUTPUT_TRIPLET=EnDED_NW_Triplet_${SLURM_ARRAY_TASK_ID}.txt
OUTPUT_NW=EnDED_NW_${SLURM_ARRAY_TASK_ID}.txt
## Running EnDED
./../../EnDED/build/EnDED --input_network_file $INPUT_NW --input_node_col 1,2 --methods ${METHODS} --SP_colnum_interaction_score 3 --OL_colnum_interactionlength_startX_startY 8,6,7 --II_DPI_abundance_file $INPUT_II_DPI_ABUND --output_network_file $OUTPUT_NW --output_triplet_info $OUTPUT_TRIPLET --do_pre_jointP_comp --II_permutation_iteration 1000
## Renaming log file to carry data simulation ID (otherwise it would have time stamp)
echo log_* > LOG_EnDED_NW_eLSA_QP-05_${SLURM_ARRAY_TASK_ID}.txt
less log_* >> LOG_EnDED_NW_eLSA_QP-05_${SLURM_ARRAY_TASK_ID}.txt
rm log_*
```
# Evaluation
The evaluation was done through exact matching, a code implemented in C++ for which we used version gnu7/7.3.0
```{bash}
## Setting input files
EnDED_NW=../3_EnDED/EnDED_NW_${SLURM_ARRAY_TASK_ID}.txt
Evaluation_Edgelist=../1_datatables/Simulation_${SLURM_ARRAY_TASK_ID}/Edgelist.txt
## evaluation (exact string matching)
../../network_evaluation/evaluate_nw --nw_file EnDED_NW_eLSA_QP-001.txt --evaluation_file ../../1_datatables/Edgelist_bioticENV.txt --col_x_y_nw 1-2 --col_x_y_evaluation 1-2 --only_evaluation_indicator --output_filename evaluated_network_eLSA_QP-001.txt
## clean up and renaming of logflie
rm evaluation_info.txt
cp logfile.txt LOG_Evaluation_eLSA_QP-001.txt
rm logfile.txt
```
## Overview table
The evaluation script added an extra column to the network file indicating if the edge was found in the true edgelist. Using the same list, and the evaluated network file, an overview table is constructed.
### Initialize Dataframe
```{r}
dt <- data.frame( ID = character(),
method = character(),
num_species = numeric(),
num_nodes = numeric(),
num_edges = numeric(),
num_possible_edges = numeric(),
TP = numeric(),
TN = numeric(),
FP = numeric(),
FN = numeric(),
TPR = numeric(),
TNR = numeric(),
PPV = numeric(),
ACC = numeric(),
ACCnoTN = numeric(),
classified_indirect = numeric(),
classified_not_indirect = numeric(),
classified_indirect_was_TP = numeric(),
classified_indirect_was_FP = numeric(),
classified_not_indirect_was_TP = numeric(),
classified_not_indirect_was_FP = numeric(),
not_classified_was_TP = numeric(),
not_classified_was_FP = numeric(),
listTP = character(),
listTN = character(),
stringsAsFactors = FALSE )
```
### Functions
```{r}
compute_values <- function( df, vector_found, trueinteractions, num_possible_edges ){
G <- graph_from_edgelist(as.matrix(df), directed = FALSE)
num_nodes <- vcount(G)
num_edges <- ecount(G)
TP <- sum(vector_found, na.rm = TRUE)
FP <- dim(df)[1] - TP
FN <- dim(trueinteractions)[1] - TP
TN <- num_possible_edges - TP - FP - FN
P <- TP + FN
N <- TN + FP
TPR <- TP / P
TNR <- TN / N
PPV <- TP / (TP + FP)
ACC <- (TP + TN) / (P + N)
ACCnoTN <- 0.5 * ( TPR + PPV )
return( c( num_nodes, num_edges, num_possible_edges, TP, TN, FP, FN, TPR, TNR, PPV, ACC, ACCnoTN ) )
}
count_classfied_edges <- function( nw, method, threshold = 0){
if( method == "beforeEnDED" ){
classified_indirect <- classified_not_indirect <- classified_indirect_was_TP <- classified_indirect_was_FP <- classified_not_indirect_was_TP <- classified_not_indirect_was_FP <- not_classified_was_TP <- not_classified_was_FP <- "NA"
listTP <- "NA"
listTN <- "NA"
}else if( method == "Combi_EnDED" ){
classified_indirect <- length(which(nw$COMBI_SP_II_DPI == "0"))
classified_not_indirect <- length(which(nw$COMBI_SP_II_DPI == "1"))
classified_indirect_was_TP <- length(which(nw$COMBI_SP_II_DPI == "0" & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$COMBI_SP_II_DPI == "0" & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$COMBI_SP_II_DPI == "1" & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$COMBI_SP_II_DPI == "1" & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$COMBI_SP_II_DPI) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$COMBI_SP_II_DPI) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$COMBI_SP_II_DPI == "0" & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$COMBI_SP_II_DPI == "1" & nw$found == "1"), collapse = ";")
}else if( method == "Combi_EnDED_elsa" ){
classified_indirect <- length(which(nw$COMBI_SP_OL_II_DPI == "0"))
classified_not_indirect <- length(which(nw$COMBI_SP_OL_II_DPI == "1"))
classified_indirect_was_TP <- length(which(nw$COMBI_SP_OL_II_DPI == "0" & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$COMBI_SP_OL_II_DPI == "0" & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$COMBI_SP_OL_II_DPI == "1" & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$COMBI_SP_OL_II_DPI == "1" & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$COMBI_SP_OL_II_DPI) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$COMBI_SP_OL_II_DPI) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$COMBI_SP_OL_II_DPI == "0" & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$COMBI_SP_OL_II_DPI == "1" & nw$found == "1"), collapse = ";")
}else if( method == "SignPattern_EnDED" ){
classified_indirect <- length(which(nw$SignPattern == "0"))
classified_not_indirect <- length(which(nw$SignPattern == "1"))
classified_indirect_was_TP <- length(which(nw$SignPattern == "0" & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$SignPattern == "0" & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$SignPattern == "1" & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$SignPattern == "1" & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$SignPattern) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$SignPattern) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$SignPattern == "0" & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$SignPattern == "1" & nw$found == "1"), collapse = ";")
}else if( method == "InteractionInformation_p05_EnDED" ){
classified_indirect <- length(which(nw$InterationInformation < 0 & nw$II_p_value < 0.05))
classified_not_indirect <- length(which(nw$InterationInformation > 0 | (nw$InterationInformation < 0 & nw$II_p_value >= 0.05)))
classified_indirect_was_TP <- length(which(nw$InterationInformation < 0 & nw$II_p_value < 0.05 & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$InterationInformation < 0 & nw$II_p_value < 0.05 & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which((nw$InterationInformation > 0 | (nw$InterationInformation < 0 & nw$II_p_value >= 0.05)) & nw$found == "1"))
classified_not_indirect_was_FP <- length(which((nw$InterationInformation > 0 | (nw$InterationInformation < 0 & nw$II_p_value >= 0.05)) & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$InterationInformation) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$InterationInformation) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$InterationInformation < 0 & nw$II_p_value < 0.05 & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which((nw$InterationInformation > 0 | (nw$InterationInformation < 0 & nw$II_p_value >= 0.05)) & nw$found == "1"), collapse = ";")
}else if( method == "InteractionInformation_noPval_EnDED" ){
classified_indirect <- length(which(nw$InterationInformation < 0))
classified_not_indirect <- length(which(nw$InterationInformation > 0))
classified_indirect_was_TP <- length(which(nw$InterationInformation < 0 & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$InterationInformation < 0 & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$InterationInformation > 0 & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$InterationInformation > 0 & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$InterationInformation) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$InterationInformation) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$InterationInformation < 0 & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$InterationInformation > 0 & nw$found == "1"), collapse = ";")
}else if( method == "DataProcessingInequality_EnDED" ){
classified_indirect <- length(which(nw$DataProcessingInequality_indirect == "0"))
classified_not_indirect <- length(which(nw$DataProcessingInequality_indirect == "1"))
classified_indirect_was_TP <- length(which(nw$DataProcessingInequality_indirect == "0" & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$DataProcessingInequality_indirect == "0" & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$DataProcessingInequality_indirect == "1" & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$DataProcessingInequality_indirect == "1" & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$DataProcessingInequality_indirect) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$DataProcessingInequality_indirect) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$DataProcessingInequality_indirect == "0" & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$DataProcessingInequality_indirect == "1" & nw$found == "1"), collapse = ";")
}else if( method == "Overlap" ){
classified_indirect <- length(which(nw$Overlap > threshold))
classified_not_indirect <- length(which(nw$Overlap <= threshold))
classified_indirect_was_TP <- length(which(nw$Overlap > threshold & nw$found == "1"))
classified_indirect_was_FP <- length(which(nw$Overlap > threshold & ( is.na(nw$found) | nw$found == "0" ) ) )
classified_not_indirect_was_TP <- length(which(nw$Overlap <= threshold & nw$found == "1"))
classified_not_indirect_was_FP <- length(which(nw$Overlap <= threshold & ( is.na(nw$found) | nw$found == "0" ) ) )
not_classified_was_TP <- length(which( is.na(nw$Overlap) & nw$found == "1"))
not_classified_was_FP <- length(which( is.na(nw$Overlap) & ( is.na(nw$found) | nw$found == "0" ) ) )
listTP <- paste(which(nw$Overlap > threshold & ( is.na(nw$found) | nw$found == "0" ) ), collapse = ";")
listTN <- paste(which(nw$Overlap <= threshold & nw$found == "1"), collapse = ";")
}
return( c(classified_indirect, classified_not_indirect, classified_indirect_was_TP, classified_indirect_was_FP,
classified_not_indirect_was_TP, classified_not_indirect_was_FP, not_classified_was_TP, not_classified_was_FP,
listTP, listTN) )
}
new_entry <- function(ID, method, num_species, num_possible_edges, df, vector_found, trueinteractions, nw, threshold = 0 ){
if( method == "Overlap" ){
return( c( ID, paste("Overlap_", threshold, "_EnDED", sep =""), num_species, compute_values(df, vector_found, trueinteractions, num_possible_edges), count_classfied_edges( nw, method, threshold ) ) )
}else{
return( c( ID, method, num_species, compute_values(df, vector_found, trueinteractions, num_possible_edges), count_classfied_edges( nw, method, threshold ) ) )
}
}
```
### creating overview table for simulated networks
```{r}
for( networkID in 1:1000 ){
# True interactions are direct, but used as indirect. Thus, multiple edges need to be removed
filename <- paste(PATH,"/1_datatables/Edgelist.txt",sep="")
trueinteractions <- read.table( filename, header = TRUE, stringsAsFactors = FALSE )[,-3] # contains multiple edges
for(i in c(1:dim(trueinteractions)[1]))
{
n <- as.character(trueinteractions$node1[i])
if( n < as.character(trueinteractions$node2[i]) )
{
trueinteractions$node1[i] <- as.character(trueinteractions$node2[i])
trueinteractions$node2[i] <- n
}
}
trueinteractions <- unique(trueinteractions)
# Read in network (after EnDED and evaluation) and remove environmental edges
filename <- paste(PATH, "/4_Evaluation/evaluated_NW_", networkID, ".txt", sep = "" )
nw <- read.table( filename, header = TRUE, stringsAsFactors = FALSE)
if( length(grep("ENV",nw$X)) > 0 ){
nw <- nw[-grep("ENV",nw$X),]
}
if( length(grep("ENV",nw$Y)) > 0 ){
nw <- nw[-grep("ENV",nw$Y),]
}
# Set parameters
ID <- networkID
num_species <- 50
num_possible_edges <- num_species * ( num_species - 1 ) / 2
# Before any method
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "beforeEnDED", num_species, num_possible_edges, nw[,c(1:2)], nw$found, trueinteractions, nw ))), names(dt)))
# EnDED
df <- nw[,c(1:2)]
v <- nw$found
if( length(which(nw$COMBI_SP_OL_II_DPI == "0")) > 0 ){ df <- nw[-which(nw$COMBI_SP_OL_II_DPI == "0"),c(1:2)]; v <- nw$found[-which(nw$COMBI_SP_OL_II_DPI == "0")]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "Combi_EnDED_elsa", num_species, num_possible_edges, df, v, trueinteractions, nw ))), names(dt)))
if( length(which(nw$SignPattern == "0")) > 0 ){ df <- nw[-which(nw$SignPattern == "0"),c(1:2)]; v <- nw$found[-which(nw$SignPattern == "0")]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "SignPattern_EnDED", num_species, num_possible_edges, df, v, trueinteractions, nw ))), names(dt)))
if( length(which(nw$InterationInformation < 0 & nw$II_p_value < 0.05)) > 0 ){ df <- nw[-which(nw$InterationInformation < 0 & nw$II_p_value < 0.05),c(1:2)]; v <- nw$found[-which(nw$InterationInformation < 0 & nw$II_p_value < 0.05)]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "InteractionInformation_p05_EnDED", num_species, num_possible_edges, df, v, trueinteractions, nw ))), names(dt)))
if( length(which(nw$InterationInformation < 0)) > 0 ){ df <- nw[-which(nw$InterationInformation < 0),c(1:2)]; v <- nw$found[-which(nw$InterationInformation < 0)]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "InteractionInformation_noPval_EnDED", num_species, num_possible_edges, df, v, trueinteractions, nw ))), names(dt)))
if( length(which(nw$DataProcessingInequality_indirect == "0")) > 0 ){ df <- nw[-which(nw$DataProcessingInequality_indirect == "0"),c(1:2)]; v <- nw$found[-which(nw$DataProcessingInequality_indirect == "0")]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "DataProcessingInequality_EnDED", num_species, num_possible_edges, df, v, trueinteractions, nw ))), names(dt)))
for( o in c(10, 20, 30, 40, 50, 60, 70, 80, 90) )
{
if( length(which(nw$Overlap > o)) > 0 ){ df <- nw[-which(nw$Overlap > o),c(1:2)]; v <- nw$found[-which(nw$Overlap > o)]; }else{ df = nw[,c(1:2)]; v <- nw$found; }
dt <- rbind(dt, setNames(as.data.frame(t(new_entry(ID, "Overlap", num_species, num_possible_edges, df, v, trueinteractions, nw, threshold = o ))), names(dt)))
}
# Save R session and overview table
filename <- paste(PATH,"/Evaluation_Overview.R", sep = "")
save.image(filename)
filename <- paste(PATH,"/Overview_Evaluation.tsv", sep = "")
write.table(dt, filename, col.names = TRUE, row.names = FALSE, sep = "\t", dec = ".", quote = FALSE)
}
```
# How many TP have methods in common
```{r}
dt_temp <- read.table(paste(PATH,"/Overview_Evaluation.tsv", sep=""), header = TRUE, sep = "\t")
```
# Example
```{r}
library(VennDiagram)
# Visualization: Venn Diagram
# SP, OL, II, DPI, COMBI
d1 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="SignPattern_EnDED")]), split = ";")[[1]]
d2 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="Overlap_60_EnDED")]), split = ";")[[1]]
d3 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="InteractionInformation_p05_EnDED")]), split = ";")[[1]]
d4 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="DataProcessingInequality_EnDED")]), split = ";")[[1]]
d5 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="Combi_EnDED_elsa")]), split = ";")[[1]]
input_data <- list(SP=d1, OL=d2, II=d3, DPI=d4, COMBI=d5);
# Creates output
output_image_file <- paste(PATH,"/venn_TP_SP_OL_II_DPI_COMBI.tiff",sep="");
# Configs for the diagram
title <- "Overlap of true positives between methods";
fill_color <- c("mediumorchid4","azure1","gray24","darkolivegreen3","orange"); # for 5 sets
# Invoke drawing of the venn Diagram
venn.diagram(input_data,fill=fill_color,filename=output_image_file,width=5000,height=3000,main=title);
# percent
output_image_file <- paste(PATH,"/venn_TP_SP_OL_II_DPI_percent.tiff",sep="");
venn.diagram(input_data,fill=fill_color,filename=output_image_file,width=5000,height=3000,main=title,print.mode='percent');
```
```{r}
# Heat-map
library(purrr)
library(RVenn)
library(ggplot2)
# SP, OL, II, DPI, COMBI
d1 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="SignPattern_EnDED")]), split = ";")[[1]]
d2 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="Overlap_60_EnDED")]), split = ";")[[1]]
d3 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="InteractionInformation_p05_EnDED")]), split = ";")[[1]]
d4 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="DataProcessingInequality_EnDED")]), split = ";")[[1]]
d5 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==1 & dt_temp$method=="Combi_EnDED_elsa")]), split = ";")[[1]]
input_data <- list(SP=d1, OL=d2, II=d3, DPI=d4, COMBI=d5);
toy = Venn(input_data)
setmap(toy, element_clustering = TRUE, set_clustering = TRUE)
```
# all
```{r}
dt_temp <- read.table(paste(PATH,"/Overview_Evaluation_ListTP_TN.tsv",sep=""), header = TRUE, sep = "\t")
# How much overlap is between methods regarding true positive and true negative?
dt_overlap <- data.frame( ID = character(),
# overlap of TP - only 1 method
TP_SP = numeric(),
TP_OL = numeric(),
TP_II = numeric(),
TP_DPI = numeric(),
# overlap of TP - two methods
TP_SP_OL = numeric(),
# overlap of TP - three methods
TP_SP_OL_II = numeric(),
TP_SP_OL_DPI = numeric(),
# overlap of TP - all five methods
TP_SP_OL_II_DPI_COMBI = numeric(),
# other
TP_other = numeric(),
stringsAsFactors = FALSE )
for(i in c(min(dt_temp$ID):max(dt_temp$ID)))
{
d1 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==i & dt_temp$method=="SignPattern_EnDED")]), split = ";")[[1]]
d2 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==i & dt_temp$method=="Overlap_60_EnDED")]), split = ";")[[1]]
d3 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==i & dt_temp$method=="InteractionInformation_p05_EnDED")]), split = ";")[[1]]
d4 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==i & dt_temp$method=="DataProcessingInequality_EnDED")]), split = ";")[[1]]
d5 <- strsplit(as.character(dt_temp$listTP[which(dt_temp$ID==i & dt_temp$method=="Combi_EnDED_elsa")]), split = ";")[[1]]
input_data <- list(SP=d1, OL=d2, II=d3, DPI=d4, COMBI=d5);
set_all = Venn(input_data)
# in SP but not in others
set_IN = d1
set_NOT_IN = unite(set_all, c("OL","II","DPI","COMBI"))
new_TP_SP = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in OL but not in others
set_IN = d2
set_NOT_IN = unite(set_all, c("SP","II","DPI","COMBI"))
new_TP_OL = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in II but not in others
set_IN = d3
set_NOT_IN = unite(set_all, c("SP","OL","DPI","COMBI"))
new_TP_II = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in DPI but not in others
set_IN = d4
set_NOT_IN = unite(set_all, c("SP","OL","II","COMBI"))
new_TP_DPI = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in SP and OL but not in others
set_IN = overlap(set_all, c("SP","OL"))
set_NOT_IN = unite(set_all, c("II","DPI","COMBI"))
new_TP_SP_OL = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in SP, OL, and II but not in others
set_IN = overlap(set_all, c("SP","OL","II"))
set_NOT_IN = unite(set_all, c("DPI","COMBI"))
new_TP_SP_OL_II = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in SP, OL, and DPI but not in others
set_IN = overlap(set_all, c("SP","OL","DPI"))
set_NOT_IN = unite(set_all, c("II","COMBI"))
new_TP_SP_OL_DPI = length(setdiff(set_IN, set_NOT_IN)) # in first but not in second
# in all
new_TP_SP_OL_II_DPI_COMBI = length(overlap(set_all, c("SP","OL","II","DPI","COMBI")))
# total TP?
total = length(unite(set_all, c("SP","OL","II","DPI","COMBI")))
# other TP?
otherTP = total - new_TP_SP - new_TP_OL - new_TP_II - new_TP_DPI - new_TP_SP_OL - new_TP_SP_OL_II - new_TP_SP_OL_DPI - new_TP_SP_OL_II_DPI_COMBI
new <- data.frame( ID = i,
# unique TP in one method
TP_SP = new_TP_SP/total*100,
TP_OL = new_TP_OL/total*100,
TP_II = new_TP_II/total*100,
TP_DPI = new_TP_DPI/total*100,
# overlap of TP - two methods
TP_SP_OL = new_TP_SP_OL/total*100,
# overlap of TP - three methods
TP_SP_OL_II = new_TP_SP_OL_II/total*100,
TP_SP_OL_DPI = new_TP_SP_OL_DPI/total*100,
# overlap of TP - all five methods
TP_SP_OL_II_DPI_COMBI = new_TP_SP_OL_II_DPI_COMBI/total*100,
TP_other = otherTP/total*100,
stringsAsFactors = FALSE )
dt_overlap <- rbind(dt_overlap, new)
}
for(i in colnames(dt_overlap)[-1])
{
print(i)
print(summary(dt_overlap[,i]))
}
write.table(dt_overlap, paste(PATH,"/Overview_ListTP_overlap.tsv",sep=""),
col.names = TRUE, row.names = FALSE, sep = "\t", quote = FALSE)
```
[1] "TP_SP"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.2093 0.3155 0.4533 3.6585
[1] "TP_OL"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.1099 0.2373 0.3405 1.9950
[1] "TP_II"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.6522 1.2707 1.4106 1.9751 6.0364
[1] "TP_DPI"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.1135 0.3371 0.4436 0.6326 2.5901
[1] "TP_SP_OL"
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.856 12.246 14.902 15.023 17.478 29.925
[1] "TP_SP_OL_II"
Min. 1st Qu. Median Mean 3rd Qu. Max.
19.10 29.54 32.64 32.84 36.15 49.55
[1] "TP_SP_OL_DPI"
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.561 7.076 8.899 9.115 10.804 22.071
[1] "TP_SP_OL_II_DPI_COMBI"
Min. 1st Qu. Median Mean 3rd Qu. Max.
22.39 32.14 35.58 35.50 38.60 48.57
[1] "TP_other"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4338 3.3449 4.8731 5.1148 6.5583 15.3563
# Generating a simulated dataset with noise
I already simulated the datasets above. Now I want to add external noise to them. For that I use the Poisson distribution. The parameter lamda is set to the abundance value. Thus, the new abundance with noise is randomly drawn from the Poisson distribution with lamda = abundance without noise. Then, the above scripts were applied to construct networks, apply EnDED, and evaluate EnDED.
## Parameters
```{r}
PATH <- "1_datatables"
INPUT <- paste(PATH, "SpENV.txt", sep = "")
OUTPUT <- paste(PATH, "SpENV_PoissonNoise.txt", sep = "")
COL_START <- 2
COL_END <- 101
ROW_START <- 1
ROW_END <- 51
```
## Read in abundance table
```{r}
abd <- read.table(INPUT, head = TRUE, tab = "\t", dec = ".", stringsAsFactors = FALSE)
```
## Add noise
```{r}
for(i in c(ROW_START, ROW_END))
{
for(j in c(COL_START, COL_END))
{
noise <- rpois(n=1, lambda=1000*abd[i,j])/1000
abd[i,j] <- noise
}
}
```
## Save abundance+noise table
```{r}
write.table(abd, OUTPUT, col.names = TRUE, row.names = FALSE, quote = FALSE, tab = "\t", dec = ".")
```
## Combine species abundance (with noise) and environmental variables
```{bash}
cp SpENV_doublePoissonNoise.txt temp.txt
head -n 1 SpENV.txt > SpENV_doublePoissonNoise.txt
less temp.txt >> SpENV_doublePoissonNoise.txt
rm temp.txt
```
# References
## R packages
- deSolve: Karline Soetaert, Thomas Petzoldt, R. Woodrow Setzer (2010). Solving Differential Equations in R: Package deSolve. Journal of Statistical Software, 33(9), 1--25. URL http://www.jstatsoft.org/v33/i09/ DOI 10.18637/jss.v033.i09
- seqtime: Karoline Faust, Franziska Bauchinger, Sophie de Buyl and Leo Lahti (NA). seqtime: Time Series Analysis of Sequencing Data. R package version 0.1.1. https://github.com/hallucigenia-sparsa/seqtime
- tidyverse: Hadley Wickham (2017). tidyverse: Easily Install and Load the 'Tidyverse'. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse
- igraph: Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org
## Network construction with eLSA
- Xia, L.C., Steele, J.A., Cram, J.A., Cardon, Z.G., Simmons, S.L., Vallino, J.J., Fuhrman, J.A., Sun, F.:Extended local similarity analysis (elsa) of microbial community and other time series data with replicates.BMC Systems Biology5(2), 15 (2011). doi:10.1186/1752-0509-5-S2-S1511.
- Xia, L.C., Ai, D., Cram, J., Fuhrman, J.A., Sun, F.: Efficient statistical significance approximation for localsimilarity analysis of high-throughput time series data. Bioinformatics29(2), 230???237 (2012).doi:10.1093/bioinformatics/bts66812.
## C++ code for evaluation with exact matching
- "network_evaluation.cpp"