Skip to content

elsci-io/multinomial-selection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multinomial selection in Java Maven Central

Suppose you're picking 100 random balls from a bag. The probabilities of picking Red, Blue, Green are 0.95, 0.03, 0.02. What's the most probable combination? What are k most probable combinations? What if we had to pull from yet another bag of cubes of different colors - then what are possible combinations of balls and cubes that we can pull? This is a problem of Multinomial Selection and it shows up in different contexts. E.g. in Mass Spectrometry, given a Molecular Formula $C_{20}H_{28}Cl_4N_2O_4$, we need to know the most probable isotope combination.

How to use

This library is a generic implementation that doesn't depend on a particular application. The application-specific libraries can wrap this one (like isotope-distribution). Here's how to iterate over all possible combinations starting with the most probable:

// Alphabets represent an Element (a single Bag of balls) - they
// have Symbols (Red, Blue, Green balls) with different probabilities.
//
// In Chemistry terminology an Alphabet is an Element, while Symbol
// is a particular isotope (well, its probability): 
Alphabet c = new Alphabet("C", .99, .1);
Alphabet cl = new Alphabet("Cl", .75, .25);
// Once we have our "Bags", we want to tell how many Symbols (Balls)
// of each Alphabet (Bag) we want to pull (which the map represents):  
Map<Alphabet, Integer> wordSpec = new HashMap<>();
wordSpec.put(c, 15);
wordSpec.put(cl, 5);
// We have all the data, it's time to iterate over all possibilities,
// starting with the most probable:
Iterator<Word> it = WordIteratorFactory.create(new WordSpec(wordSpec));
while(it.hasNext()) {
    Word next = it.next();
    System.out.println(next.probability + ": " + next.symbols);
}

Maven coordinates

Use Maven to download the library (latest version: Maven Central):

<dependency>
    <groupId>io.elsci.multinomial-selection</groupId>
    <artifactId>multinomial-selection</artifactId>
    <version>LATEST VERSION</version>
</dependency>

Implementation notes

In mathematical terms all possible combinations and their probabilities can be generated using a multinomial:

$$ \text{Distribution of probabilities}=(a_{1}+a_{2}+...+a_{n})^A (b_{1}+b_{2}+...+b_{n})^B $$

where

  • $a1$, $a2$ are probabilities of 1st and 2nd elements of the first set (say.. probability of blue and red ball) and $b_1$, $b_2$ are probabilities of the elements of the 2nd set (say.. probabilities of blue and red cubes)
  • $A$ and $B$ is the number of repeats of these elements (taking 2 balls and 3 cubes).

Algorithm and its complexity

The implementation resembles (a little) the one described in Fast Exact Computation of the k Most Abundant Isotope Peaks with Layer-Ordered Heaps. Most likely we're somewhat slower.

The algorithm complexity is hard to calculate exactly, but the overall picture is linearithmic when $k$ is relatively small (which is our main use case). More precisely (and sorry for abusing $O$ here):

$$ O(k^2 \times a^2 \times s^2 \times (log(k \times s) + s)) $$

where each of the params are usually relatively small:

  • $k$ is number of words we select
  • $a$ is the number of alphabets
  • $s$ is the number of symbols in an alphabet (for simplicity sake let's say that each alphabet is of the same size)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages