# Densities
# Pipeline
Design for processing the gene / transposon files.
Input gene / transposon files, output density files.
Calculate multiple transposable element (TE) densities for each gene, sweeping over a range of window values.
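To make the window sweep concrete, here is a minimal sketch for a single gene. Everything specific in it is an assumption made for illustration, not something this design states: density is taken to be overlapping TE bases divided by window length, the window is placed upstream of the gene start, coordinates are half-open, and the names `overlap` and `window_densities` are hypothetical.

```python
def overlap(a_start, a_stop, b_start, b_stop):
    """Number of bases shared by two half-open intervals [start, stop)."""
    return max(0, min(a_stop, b_stop) - max(a_start, b_start))


def window_densities(gene_start, te_intervals, windows):
    """Map each window length to the TE-covered fraction of the upstream window.

    Assumes the TE intervals do not overlap each other, so overlaps can be summed.
    """
    densities = {}
    for window in windows:
        win_start, win_stop = gene_start - window, gene_start
        covered = sum(overlap(win_start, win_stop, s, e) for s, e in te_intervals)
        densities[window] = covered / window
    return densities


# Hypothetical gene starting at position 1000, with two TEs upstream.
tes = [(900, 950), (400, 600)]
result = window_densities(1000, tes, windows=[100, 500, 1000])
```

Each window is evaluated on its own, so the sweep is a plain loop and the per-window results do not depend on each other.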
## Steps
The pipeline is a generic split / apply / combine.
Each density is computed independently; however, the densities are accumulated (summed) at the end.
0. Input gene / TE files; input window length
1. Preprocess
   - chunkify gene / TE based on chromosome
   - list sub-gene / sub-TE pairs
2. Split wrt gene names / Merge wrt TE overlap
   - for each sub-gene / sub-TE pair
     - for each window
       - start workers
       - request overlap for each gene name (handling failed requests!)
       - merge worker results (sums of overlaps)
       - calculate densities, write to file
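
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: records are hypothetical `(chromosome, name, start, stop)` tuples, sub-gene / sub-TE pairs are formed per chromosome, threads from `concurrent.futures` stand in for the workers, a simple retry loop stands in for the handling of failed requests, and density is assumed to be summed overlap divided by window length. All function names are made up for the example, and writing to file is left out (the sketch returns a dict).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunk_by_chromosome(records):
    """Step 1: group (chromosome, name, start, stop) records by chromosome."""
    chunks = defaultdict(list)
    for chrom, name, start, stop in records:
        chunks[chrom].append((name, start, stop))
    return chunks


def gene_overlap(gene, tes, window):
    """Worker task: TE bases inside the window upstream of one gene."""
    name, start, _stop = gene
    lo, hi = start - window, start
    total = sum(max(0, min(hi, e) - max(lo, s)) for _n, s, e in tes)
    return name, total


def with_retry(func, *args, attempts=3):
    """Re-run a failed request a few times before giving up."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts - 1:
                raise


def run_pipeline(genes, tes, windows, max_workers=4):
    """Steps 1-2: chunk, fan out per gene, merge sums, compute densities."""
    gene_chunks = chunk_by_chromosome(genes)
    te_chunks = chunk_by_chromosome(tes)
    densities = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chrom, sub_genes in gene_chunks.items():  # sub-gene / sub-TE pairs
            sub_tes = te_chunks.get(chrom, [])
            for window in windows:
                futures = [
                    pool.submit(with_retry, gene_overlap, gene, sub_tes, window)
                    for gene in sub_genes
                ]
                sums = defaultdict(int)
                for future in futures:  # merge worker results (sum overlaps)
                    name, total = future.result()
                    sums[name] += total
                for name, total in sums.items():
                    densities[(name, window)] = total / window
    return densities


# Hypothetical input: two genes and three TEs on two chromosomes.
genes = [("chr1", "geneA", 1000, 2000), ("chr2", "geneB", 500, 900)]
tes = [("chr1", "te1", 900, 950), ("chr1", "te2", 400, 600), ("chr2", "te3", 450, 500)]
densities = run_pipeline(genes, tes, windows=[100, 500])
```

Because each gene's overlap request is independent, the fan-out can be swapped for processes or remote workers without touching the merge step, which only sums the returned overlaps per gene name.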