Examples

Example of computing GRM and eigenvectors from .bed files

In your own data analysis, you may need to computed the grm and eigenvectors yourself so here are the steps to do that so that you will just be able to run the above examples afterward. Great resrources are here and here.


# Starting with .bed files
# 1. Calculate the GRM (note you can filter for alleles with unacceptably low MAF's at this step if not already done in Plink)
gcta64 --bfile Example/geno --maf 0.05 --make-grm-bin --out Example/grm

# 2. Calculate eigenvectors
gcta64  --grm Example/grm --pca 20  --out Example/pcas

Output

A .csv with heritability estimate (h2), standard error (SE), phenotype used (Pheno), number of prinicpal components controlled for (PCs), list of covariates controled for separated by a "+" (Covariates), computational time (Time for analysis(s)), and peak memory (Memory Usage (Mb)) are also provided. See the results included in the Example folder.

Example

Included in this repo is an example dataset for users to practice with and check results to the included results file.

Data description

Example data is included from this paper which simulated 10,000 SNP's for 5,000 individuals with the following files

File Name	Brief description	File contents
pheno.phen	phenotype file	First two columns contain FID, IID, one phenotype column (though there can be more)
covar.txt	covariate file	First two columns contain FID, IID, one covariate column (though there can be more)
geno (.bed, .bim, .fam)	suite of files containing genotypes in PLINK format	For more info see PLINK
grm (.grm.id, .grm.bin, .grm.N.bin)	suite of files GRM's.	For more info see Making a GRM
pcas (.eigenval, .eigenvec, .log)	suite of files containing PC's.	For more info see GCTA PCA
AdjHE_results.csv	results from example listed below
Argfile.json	File containing all arguments to run AdjHE exampel
Batch_Arg_file.txt	File containing information how to create sets of batch files

It is to be carefully noted that the order of the individuals in all the files (phenotype, covariate, GRM) have to be the same. The example dataset includes simulated phenotypes from the linked paper with a true heritability of 0.8. Also note that the phenotypes in the .fam file are not used.

Running with included example data

First, change into the "Basu_herit" directory then run the following commands that specify the arguments

module load python
prefix='EXAMPLE/grm' #We partitioned the GRM into 200 parts.
id='/EXAMPLE/ukbiobank.grm.id'
pheno='Example/phen.pheno'
PC='Example/pcas.eigenvec'
covar='Example/covar.csv'
out='Example/results.csv'

python Estimate.py --prefix ${prefix} --PC ${PC} --npc 10  --covar ${covar} --pheno ${pheno} --mpheno 1 --out ${out}

This should result in estimates for heritability stored in a .csv with the estimated heritability. For this dataset, the simulated heritability was 80%. Notice that the estimate is sensitive to the number of Prinicipal components included in the model since the data was simulated to have population stratification. The covariates don't have much of an influence on the estimates since they were not included in the simulation of this dataset. Compare your results with the [results included in the Example folder](https://github.com/coffm049/Basu_herit/blob/master/Example/results.csv.

Running Example with argfile

python Estimate.py --argfile Example/Arg_file.json

Note that this is running the same example method as the previous example, only in this case, all of the arguments are contained within the Arg_file.txt file. This helps with reproducibility and creating batch scripts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly