-
Notifications
You must be signed in to change notification settings - Fork 0
Examples
In your own data analysis, you may need to computed the grm and eigenvectors yourself so here are the steps to do that so that you will just be able to run the above examples afterward. Great resrources are here and here.
# Starting with .bed files
# 1. Calculate the GRM (note you can filter for alleles with unacceptably low MAF's at this step if not already done in Plink)
gcta64 --bfile Example/geno --maf 0.05 --make-grm-bin --out Example/grm
# 2. Calculate eigenvectors
gcta64 --grm Example/grm --pca 20 --out Example/pcas
A .csv with heritability estimate (h2), standard error (SE), phenotype used (Pheno), number of prinicpal components controlled for (PCs), list of covariates controled for separated by a "+" (Covariates), computational time (Time for analysis(s)), and peak memory (Memory Usage (Mb)) are also provided. See the results included in the Example folder.
Included in this repo is an example dataset for users to practice with and check results to the included results file.
Example data is included from this paper which simulated 10,000 SNP's for 5,000 individuals with the following files
File Name | Brief description | File contents |
---|---|---|
pheno.phen | phenotype file | First two columns contain FID, IID, one phenotype column (though there can be more) |
covar.txt | covariate file | First two columns contain FID, IID, one covariate column (though there can be more) |
geno (.bed, .bim, .fam) | suite of files containing genotypes in PLINK format | For more info see PLINK |
grm (.grm.id, .grm.bin, .grm.N.bin) | suite of files GRM's. | For more info see Making a GRM |
pcas (.eigenval, .eigenvec, .log) | suite of files containing PC's. | For more info see GCTA PCA |
AdjHE_results.csv | results from example listed below | |
Argfile.json | File containing all arguments to run AdjHE exampel | |
Batch_Arg_file.txt | File containing information how to create sets of batch files |
It is to be carefully noted that the order of the individuals in all the files (phenotype, covariate, GRM) have to be the same. The example dataset includes simulated phenotypes from the linked paper with a true heritability of 0.8. Also note that the phenotypes in the .fam file are not used.
First, change into the "Basu_herit" directory then run the following commands that specify the arguments
module load python
prefix='EXAMPLE/grm' #We partitioned the GRM into 200 parts.
id='/EXAMPLE/ukbiobank.grm.id'
pheno='Example/phen.pheno'
PC='Example/pcas.eigenvec'
covar='Example/covar.csv'
out='Example/results.csv'
python Estimate.py --prefix ${prefix} --PC ${PC} --npc 10 --covar ${covar} --pheno ${pheno} --mpheno 1 --out ${out}
This should result in estimates for heritability stored in a .csv with the estimated heritability. For this dataset, the simulated heritability was 80%. Notice that the estimate is sensitive to the number of Prinicipal components included in the model since the data was simulated to have population stratification. The covariates don't have much of an influence on the estimates since they were not included in the simulation of this dataset. Compare your results with the [results included in the Example folder](https://github.com/coffm049/Basu_herit/blob/master/Example/results.csv.
python Estimate.py --argfile Example/Arg_file.json
Note that this is running the same example method as the previous example, only in this case, all of the arguments are contained within the Arg_file.txt file. This helps with reproducibility and creating batch scripts.