
FLsandell/BetaVulgaris_RandomForests

BetaVulgaris_RandomForests


About The Project

Analysis of genomic variants using random forests.

Getting Started

The code is available in three parts. The first script (RF.py) is an example of how the models were trained. The second script (RF_input_iterations.py) is an example of running the models with different train/test splits, and the final script (windows.py) shows how feature importances were summarized using sliding windows. The code was designed to identify genomic regions that are decisive for separating two phenotypic groups based on variant calls. The input file is a 0|1|2 matrix generated from a VCF file using vcftools with the "--012" flag, where 0 denotes a homozygous reference position, 1 a heterozygous position, and 2 a homozygous alternative position.
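As a rough illustration of the training and train/test split steps described above, here is a minimal sketch. It is not the code from RF.py or RF_input_iterations.py. The genotype matrix is simulated, and the function names (make_toy_genotypes, mean_importances), the number of trees, the split ratio, and the number of iterations are all assumptions chosen for the example. Real vcftools output also codes missing genotypes as -1, which this sketch does not handle.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_toy_genotypes(n_samples=60, n_variants=40, causal=(5, 6), seed=0):
    """Simulate a 0/1/2 genotype matrix and two phenotype groups (toy data only)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_samples, n_variants))
    y = np.repeat([0, 1], n_samples // 2)
    # Make the hypothetical "causal" variants track the group label:
    # genotype 0 in group 0, genotype 2 in group 1.
    for j in causal:
        X[:, j] = 2 * y
    return X, y


def mean_importances(X, y, n_iterations=5, test_size=0.25):
    """Train one forest per train/test split; return mean importances and accuracies."""
    importances, accuracies = [], []
    for seed in range(n_iterations):
        # A different random_state gives a different stratified split each time.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        model.fit(X_tr, y_tr)
        importances.append(model.feature_importances_)
        accuracies.append(model.score(X_te, y_te))
    return np.mean(importances, axis=0), accuracies


X, y = make_toy_genotypes()
importance, accuracies = mean_importances(X, y)
top_two = sorted(np.argsort(importance)[-2:].tolist())
print("most important variant columns:", top_two)
print("test accuracy per split:", accuracies)
```

Averaging the importances over several splits is one way to reduce the influence of a single lucky or unlucky partition. Whether the original scripts average, sum, or report each split separately is not stated here.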

The code can readily be modified to construct models that distinguish any two groups of phenotypes as long as their differentiation can be inferred from genomic variants.
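The sliding-window summary of feature importances mentioned above could look roughly like the following sketch. It is not the code from windows.py. The function name sliding_window_sums, the choice to sum (rather than average) importances, and the window and step sizes are assumptions made for illustration, and the positions and importances are made-up toy values.

```python
def sliding_window_sums(positions, importances, window_size, step):
    """Sum per-variant importances in windows [start, start + window_size).

    positions must be sorted in ascending order and belong to one chromosome.
    Returns a list of (window_start, window_end, summed_importance) tuples.
    """
    if not positions:
        return []
    windows = []
    start = 0
    last = positions[-1]
    # Slide until the window start passes the last variant position.
    while start <= last:
        end = start + window_size
        total = sum(
            imp for pos, imp in zip(positions, importances) if start <= pos < end
        )
        windows.append((start, end, total))
        start += step
    return windows


# Toy example: five variants with hypothetical importances.
positions = [100, 250, 900, 1100, 1900]
importances = [0.1, 0.2, 0.3, 0.3, 0.1]
for window in sliding_window_sums(positions, importances, window_size=1000, step=500):
    print(window)
```

Because the step is smaller than the window size, neighbouring windows overlap, so a cluster of informative variants shows up as a run of high-scoring windows rather than a single spike.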

Prerequisites

  • python3

The following Python modules:

  • pandas

  • numpy

  • scikit-learn (imported as sklearn)

  • matplotlib

About the ICB

If you are interested in our work, you can find more information here and on X (Twitter).

License

Copyright (c) 2024 Felix Sandell

Distributed under the MIT License.

Contact

Felix Leopold Sandell felix.sandell@boku.ac.at
