Skip to content

Latest commit

 

History

History
90 lines (75 loc) · 3.17 KB

README.md

File metadata and controls

90 lines (75 loc) · 3.17 KB

bitepi

bitepi is a python wrapper around the BitEpi project found at https://github.com/aehrc/BitEpi. It provides a pandas interface for identification of epistasis interactions. It exposes a single Epistasis class, through which the analysis can be performed by calling compute_epistasis.

Input is two arrays, which can be lists, numpy arrays or pandas dataframes.

sample_array contains mappings of sample names to case (1) or control (0). Note that the header is ignored for numpy arrays and pandas dataframes, and should not be present in python lists.

sample case/control
S1 0
S2 1
S3 1
S4 0
S5 0

genotype_array contains the genotypes of each sample at each SNP, with a 0, 1 and 2 representing 0|0, 0|1 and 1|1 respectively. Headers are used to match samples to the sample_array, but the first column's header is ignored.

SNP S1 S2 S3 S4 S5
snpA 0 0 2 1 0
snpB 2 1 2 1 2
snpC 0 1 1 2 1
snpD 0 2 2 2 1
snpE 1 1 1 0 1

The sets of samples do not need to match exactly, unless Epistasis is called with strict_intersect=True. If the sample sets do not match, analysis is done on the intersect.

The output will be a dictionary, with metric codes e.g. "IG.1" as the keys and pandas dataframes as the values.

import bitepi
sample_array = [
    ['S1', 0],
    ['S2', 1],
    ['S3', 1],
    ['S4', 0],
    ['S5', 0],
]
genotype_array = [
    ['SNP', 'S1', 'S2', 'S3', 'S4', 'S5'],
    ['snpA', 0, 0, 2, 1, 2],
    ['snpB', 2, 1, 2, 1, 2],
    ['snpC', 0, 0, 1, 1, 1],
    ['snpD', 2, 2, 1, 2, 1],
    ['snpE', 0, 1, 1, 1, 2],
]
epistasis = bitepi.Epistasis(
    genotype_array=genotype_array,
    sample_array=sample_array,
)
interactions = epistasis.compute_epistasis(
    sort=True,
    best_ig=True,
)['best_ig']

print(interactions)

This should return:

    SNP     SNP_P    PAIR_P  TRIPLET_P  QUADLET_P    SNP_IG   PAIR_IG  TRIPLET_IG  QUADLET_IG  PAIR TRIPLET_1 TRIPLET_2 QUADLET_1 QUADLET_2 QUADLET_3
0  snpA  1.707692  2.000000   2.109091        0.0  1.187692  0.266667    0.109091         0.0  snpE      snpB      snpE      snpA      snpA      snpA
1  snpB  1.642424  1.909091   2.109091        0.0  1.122424  0.266667    0.200000         0.0  snpC      snpC      snpE      snpA      snpA      snpA
2  snpC  1.641026  1.909091   2.109091        0.0  1.121026  0.266667    0.200000         0.0  snpB      snpD      snpE      snpA      snpA      snpA
3  snpD  1.642424  1.909091   2.109091        0.0  1.122424  0.175758    0.200000         0.0  snpE      snpC      snpE      snpA      snpA      snpA
4  snpE  1.733333  2.000000   2.109091        0.0  1.213333  0.266667    0.200000         0.0  snpA      snpC      snpD      snpA      snpA      snpA

For higher order interactions (p3, ig3, p4 and ig4) Epistasis may take several minutes to run, depending on the number of SNPs. If more information is required when running, the logging level can be increased to logging.INFO or logging.DEBUG. logging.DEBUG will provide the greatest detail, including logging from within the binary.

import logging
logging.root.setLevel(logging.DEBUG)