Imprecise probability and population genetics in OCaml using Owl and other libraries. Experimental work in progress.
(This page needs revision.)
Some of this will be reorganized in the future. Some of it was written when I was learning OCaml. (I am still learning OCaml.)
A few resources for learning about imprecise probability or population genetics are listed below.
The modules I consider most useful here make use of what I call "distlists", which represent sequences of sets of probability distributions. These are probabilities of frequencies of organism traits in a population of organisms. For example, these might be frequencies of (type) alleles at a genetic locus, with two (token) alleles for each organism, so that the total number of (token) alleles is 2N, where N is the number of organisms in the population. (For "type" and "token" see Type-token distinction; these terms come from philosophy but should be a bit clearer for novices than ways of saying the same thing that are common in population genetics.)
Sets of probability distributions are often known as "credal sets" in the imprecise probability literature, although sometimes that term is used more narrowly. (However, "credal" comes from "credence", i.e. probability as degree of belief as in the Bayesian tradition. I think of the probabilities here as objective probabilities, which are usually called "chances" in contemporary philosophy of probability.)
Credal sets here are implemented as regular OCaml lists of Owl row
vector matrices (i.e. matrices with dimension 1xn). Distlists are
implemented as lazy lists of credal sets, so a distlist is a lazy list
of regular lists of matrices. Currently, I'm using LazyList and List
from the Batteries Included library.
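As a rough illustration of the shape of these data structures, here is a minimal sketch that uses only the OCaml standard library. It is not the code in this repo: a float array stands in for an Owl 1xn matrix, a hand-rolled lazy list stands in for Batteries' LazyList, and the names dist, credal_set, distlist, iterate, take, and mix are all hypothetical.

```ocaml
(* Stand-ins, NOT the repo's types: a float array plays the role of an
   Owl 1xn row-vector matrix, and a hand-rolled lazy list plays the role
   of Batteries' LazyList. *)
type dist = float array              (* one probability distribution *)
type credal_set = dist list          (* a finite set of distributions *)
type 'a lazylist = Nil | Cons of 'a * 'a lazylist Lazy.t
type distlist = credal_set lazylist  (* one credal set per generation *)

(* Build an unbounded distlist by repeatedly applying [step]; each tail
   is computed only when it is forced. *)
let rec iterate (step : credal_set -> credal_set) (cs : credal_set) : distlist =
  Cons (cs, lazy (iterate step (step cs)))

(* Force and return the first [n] credal sets of a distlist. *)
let rec take n (dl : distlist) : credal_set list =
  match dl with
  | Cons (cs, rest) when n > 0 -> cs :: take (n - 1) (Lazy.force rest)
  | _ -> []

(* Toy update: average each distribution with the uniform distribution. *)
let mix (d : dist) : dist =
  let n = float_of_int (Array.length d) in
  Array.map (fun p -> (0.5 *. p) +. (0.5 /. n)) d

(* A distlist that starts from a single point-mass distribution. *)
let dl : distlist = iterate (List.map mix) [ [| 1.; 0. |] ]
let first3 = take 3 dl
```

Because the list is lazy, dl can be conceptually infinite; only the generations that take actually asks for are computed.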
Note that in OCaml, a file named xyz.ml typically defines a module
named Xyz.
Some of the following might be a little bit out of date at any particular time.
For modeling finite sets of transition probabilities generated by Wright-Fisher models with natural selection.
Uses distlists from Tranmats.
The credal sets here are supposed to represent collections of finite, discrete probability distributions. Typically, each credal set in a distlist will contain an exponentially increasing number of distributions.
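To see where the exponential growth comes from, here is a minimal standard-library sketch (not the Wrightfisher or Tranmats code). It assumes the textbook diploid Wright-Fisher model with viability selection: the allele frequency is adjusted by selection and the next generation's allele count is then binomially sampled. The names choose, select, tranmat, vec_mat, and step, and the fitness values, are hypothetical.

```ocaml
(* Binomial coefficient as a float, computed multiplicatively. *)
let choose n k =
  let acc = ref 1.0 in
  for j = 1 to k do
    acc := !acc *. float_of_int (n - k + j) /. float_of_int j
  done;
  !acc

(* Frequency of allele A after viability selection, given the fitnesses
   of genotypes AA, Aa, and aa and the current frequency p of A. *)
let select (w11, w12, w22) p =
  let q = 1. -. p in
  let num = (w11 *. p *. p) +. (w12 *. p *. q) in
  num /. ((w11 *. p *. p) +. (2. *. w12 *. p *. q) +. (w22 *. q *. q))

(* (2N+1) x (2N+1) transition matrix for N diploid organisms: row i is
   the distribution of next-generation counts given i copies of A now. *)
let tranmat n fits =
  let m = 2 * n in
  Array.init (m + 1) (fun i ->
    let p = select fits (float_of_int i /. float_of_int m) in
    Array.init (m + 1) (fun j ->
      choose m j *. (p ** float_of_int j) *. ((1. -. p) ** float_of_int (m - j))))

(* Row vector times matrix. *)
let vec_mat v a =
  Array.init (Array.length a.(0)) (fun j ->
    let s = ref 0. in
    Array.iteri (fun i vi -> s := !s +. (vi *. a.(i).(j))) v;
    !s)

(* Apply every matrix to every distribution. With k matrices the credal
   set grows by a factor of k per generation (duplicates are not removed). *)
let step mats cs =
  List.concat (List.map (fun d -> List.map (vec_mat d) mats) cs)

let mats = [ tranmat 2 (1.0, 0.9, 0.8); tranmat 2 (1.0, 1.0, 1.0) ]
let start = [ [| 0.; 0.; 1.; 0.; 0. |] ]   (* 2 copies of A out of 4 *)
let gen3 = step mats (step mats (step mats start))
```

With two transition matrices and one starting distribution, gen3 holds 2^3 = 8 distributions, which is why long distlists of this kind become expensive quickly.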
setchains.ml: Implementations of algorithms in ch. 2 of Markov Set-Chains, by Darald J. Hartfiel, Springer 1998
These model continuous sets of stochastic transition matrices that fall
within a matrix "interval", i.e. all stochastic matrices such that each
element x is s.t. l <= x <= h, where l and h are corresponding
elements of a low matrix L and a high matrix H.
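The membership condition for a matrix interval can be written down directly. The following is a minimal sketch using plain float arrays rather than Owl matrices; the names eps, row_is_stochastic, within, and in_interval, the tolerance, and the example matrices are hypothetical and are not taken from setchains.ml.

```ocaml
(* Tolerance for comparing row sums with 1 (an arbitrary choice). *)
let eps = 1e-9

(* A row is stochastic if its entries are nonnegative and sum to 1. *)
let row_is_stochastic row =
  Array.for_all (fun x -> x >= 0.) row
  && abs_float (Array.fold_left ( +. ) 0. row -. 1.) < eps

(* True if lo.(i).(j) <= a.(i).(j) <= hi.(i).(j) for every entry. *)
let within lo a hi =
  let ok = ref true in
  Array.iteri
    (fun i row ->
      Array.iteri
        (fun j x -> if x < lo.(i).(j) || x > hi.(i).(j) then ok := false)
        row)
    a;
  !ok

(* Membership in the interval [lo, hi]: every row is stochastic and
   every entry lies between the corresponding bounds. *)
let in_interval lo hi a =
  Array.for_all row_is_stochastic a && within lo a hi

let lo = [| [| 0.2; 0.5 |]; [| 0.1; 0.6 |] |]
let hi = [| [| 0.5; 0.8 |]; [| 0.4; 0.9 |] |]
let inside = [| [| 0.3; 0.7 |]; [| 0.25; 0.75 |] |]
let outside = [| [| 0.1; 0.9 |]; [| 0.25; 0.75 |] |]  (* 0.1 < 0.2 *)
```

Here in_interval lo hi inside holds, while outside fails because its first entry falls below the low bound even though its rows are stochastic.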
You can also start from an initial probability "interval", i.e. all
stochastic vectors such that each element x is s.t. l <= x <= h,
where l and h are corresponding elements of low and high vectors.
Note that the low and high vectors are not typically stochastic vectors,
nor are the low and high matrices typically transition matrices.
(Stochastic vectors here are row vectors, and it's each row of a matrix that
sums to 1, so multiplication of a vector and a matrix typically
happens with the vector on the left.)
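A small numerical example may make the row-vector convention concrete. This is a standard-library sketch with hypothetical names (vec_mat, sum) and made-up numbers; the repo itself uses Owl matrices for this.

```ocaml
(* Row vector times matrix: (v a).(j) = sum over i of v.(i) * a.(i).(j). *)
let vec_mat v a =
  Array.init (Array.length a.(0)) (fun j ->
    let s = ref 0. in
    Array.iteri (fun i vi -> s := !s +. (vi *. a.(i).(j))) v;
    !s)

let sum = Array.fold_left ( +. ) 0.

(* A transition matrix: each ROW sums to 1. *)
let a = [| [| 0.5; 0.5 |]; [| 0.25; 0.75 |] |]

(* A stochastic row vector, multiplied on the left of the matrix. *)
let v = [| 0.5; 0.5 |]
let next = vec_mat v a

(* Bounds of a vector interval containing v. Neither bound is itself a
   stochastic vector: the low one sums to 0.5, the high one to 1.5. *)
let lo = [| 0.25; 0.25 |]
let hi = [| 0.75; 0.75 |]
```

The product next is again a stochastic row vector, which is what makes repeated left-multiplication a sensible way to step a distribution forward.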
Uses distlists from Tranmats by way of Wrightfisher.
The credal sets in the distlists are all of length 2, each pair representing a tight estimate on the upper and lower bounds of a continuous "interval" of probability distributions.
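To give a feel for what a tight bound means here, the sketch below computes the exact largest and smallest values of one component of v A, where v ranges over a vector interval and A is a single fixed matrix. It uses a standard greedy argument for maximizing a linear function over such an interval and is not taken from this repo. It also ignores the harder case in which A itself ranges over a matrix interval. The names max_dot and min_dot and the numbers are hypothetical.

```ocaml
(* Maximize sum_i p.(i) * c.(i) over vectors p with lo <= p <= hi
   (elementwise) and sum p = 1. Assumes sum lo <= 1 <= sum hi, so that
   the interval is non-empty. *)
let max_dot lo hi c =
  let n = Array.length c in
  let p = Array.copy lo in
  (* Probability mass still to be distributed above the low bounds. *)
  let slack = ref (1. -. Array.fold_left ( +. ) 0. lo) in
  (* Visit indices from the largest coefficient to the smallest. *)
  let order = Array.init n (fun i -> i) in
  Array.sort (fun i j -> compare c.(j) c.(i)) order;
  Array.iter
    (fun i ->
      let add = min !slack (hi.(i) -. lo.(i)) in
      p.(i) <- p.(i) +. add;
      slack := !slack -. add)
    order;
  Array.fold_left ( +. ) 0. (Array.mapi (fun i pi -> pi *. c.(i)) p)

(* The minimum is the negated maximum of the negated coefficients. *)
let min_dot lo hi c = -.max_dot lo hi (Array.map (fun x -> -.x) c)

let lo = [| 0.25; 0.25 |]
let hi = [| 0.75; 0.75 |]

(* Column 0 of the matrix [| [|0.5; 0.5|]; [|0.25; 0.75|] |]. *)
let col0 = [| 0.5; 0.25 |]
let upper = max_dot lo hi col0
let lower = min_dot lo hi col0
```

For these numbers the first component of v A ranges exactly over [0.3125, 0.4375], so the pair (lower, upper) is tight for that component.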
Creates data files and PDF files from distlists.
miscellaneous code, potentially useful now or in the future, including:
Once you have installed OCaml 4.06.1 and the necessary libraries (a
big job), you should be able to build executable files for the
following by entering make in the root directory of the repo.
The exe files should show up in _build/default/src/bin.
(Note that you're not restricted to imprecise-probability models. You can use this program to create plots for standard, precise-probability Wright-Fisher diploid models.)
(I last compiled the executable using OCaml 4.06.1 and jbuilder 1.0+beta20. jbuilder has subsequently been renamed to dune, and there have been some significant changes in OCaml, dune, and the libraries on which my code depends, so you may have to install old versions of packages to get this to work without modification.)
Miscellaneous notes. Don't expect to find any systematic documentation here.
SIPTA: The Society for Imprecise Probability: Theories and Applications.
Imprecise Probabilities by Seamus Bradley in the Stanford Encyclopedia of Philosophy.
Terrence L. Fine's papers. (There are many other important writers on imprecise probability (IP), but most of the work is focused on IP as an extension of Bayesian probability. Fine and his collaborators have done the most work on objective imprecise probability, which is what particularly interests me.)
A few good books:
- A handbook-style introductory survey, Introduction to Imprecise Probabilities edited by Augustin et al. (Remember that when it comes to math, one person's introduction is another person's advanced textbook.)
- The perhaps easier Lower Previsions by Troffaes and de Cooman. (The same comment applies.)
- A philosophical classic, The Enterprise of Knowledge by Isaac Levi.
- The philosophical and mathematical classic of the field, Statistical Reasoning with Imprecise Probabilities by Peter Walley.
- Markov Set-Chains by Darald J. Hartfiel. A beautiful little book on methods for extending Markov chains for one kind of imprecise probability (or uncertainty, which is how Hartfiel frames it).
There's much more.
Where to start? This is a huge area in evolutionary biology.
Samir Okasha's article in the Stanford Encyclopedia of Philosophy provides an overview of basic ideas as well as history and philosophical issues.
My favorite textbooks include:
- Population Genetics: A Concise Guide by John H. Gillespie. A great way to get into the mindset of population genetics. The first and second editions are similar, but there is valuable material in each that's not in the other. Includes some topics that are rarely covered in introductory texts.
- Elements of Evolutionary Genetics by Charlesworth and Charlesworth. Yes, it's a thick book, but that's because the Charlesworths take care to explain ideas clearly and to explore many details and subtleties. Beautiful.
- Evolutionary Theory: Mathematical and Conceptual Foundations by Sean H. Rice. A philosophically sensitive book by a mathematical biologist. Rice provides a unique but illuminating conceptual organization for population genetics and related areas.