Probabilistic modeling using PyStan with demonstrative case study experiments from Christopher Bishop's Model-based Machine Learning.
Important
The first run of each */infer.py file will be slow, since the probabilistic model is built afresh and stored as a pickle file. Subsequent runs load this cached pickle file instead of rebuilding the model.
When building a new model configuration, be sure to remove or relocate the corresponding *.pkl file(s) so that the stale cached model is not loaded instead.
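The caching behaviour described above can be sketched as a small load-or-build helper. This is a minimal illustration, not the repository's code: `load_or_build`, `cache_path`, and `build_fn` are hypothetical names, and the sketch assumes the built model object can be pickled.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the pickled object at cache_path, building and caching it if absent.

    Hypothetical helper for illustration; build_fn is any zero-argument
    callable that returns a picklable model object.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the expensive build step entirely.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: build once, then store the result for later runs.
    model = build_fn()
    with cache_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Because the helper only checks whether the file exists, it cannot tell that the model definition has changed, which is why a stale *.pkl file has to be removed or relocated by hand.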
Tip
A detailed description of the case study can be found in Chapter 2 of the MBML book.
- Candidates take a multiple-choice test comprising 5 choices per question, with exactly one right answer per question.
- Each question is associated with a set of one or more skills; this mapping forms part of the given dataset.
- Goal: Determine which skills each candidate has, and with what probability, given their answers in the test.
- Dataset: Skill ground truth and response data for 22 candidates, across 48 questions, to assess 7 skills, are available as CSV files in the data directory.
The following binary heatmap represents the skills, one or more, assessed by each of the 48 questions in the dataset.
The solution is implemented incrementally. A probabilistic model is formulated that makes the following initial set of assumptions about the data and problem:
- A candidate has either mastered each skill or not.
- Before seeing any test results, it is equally likely that a candidate does or doesn’t have any particular skill.
- If a candidate has all of the skills needed for a question, they still make a mistake about once in ten times -- a 90% right answer probability.
- If a candidate doesn’t have all the skills needed for a question, they will pick an answer at random. Hence, there’s a one in five chance that they get the question right -- a 20% right answer probability, assuming a uniform guessing distribution of course!
- Whether the candidate gets a question right depends only on what skills that candidate has, and not on anything else.
Note
Assumptions C and D (the third and fourth in the list above) each give rise to a model parameter, and these parameters can be tuned to better mimic a "realistic" scenario. Accordingly, going from solution step A through to step D below, the model complexity increases and the assumptions become more relaxed.
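As a minimal sketch of how assumptions C and D translate into two parameters, the function below returns the probability of a correct answer for one candidate and one question. The names `P_KNOW`, `P_GUESS`, and `p_correct` are illustrative and are not taken from the repository.

```python
P_KNOW = 0.9   # assumption C: has all required skills -> 90% chance of a right answer
P_GUESS = 0.2  # assumption D: missing a skill -> uniform guess over 5 choices


def p_correct(candidate_skills, required_skills, p_know=P_KNOW, p_guess=P_GUESS):
    """Probability that a candidate answers a question correctly.

    candidate_skills: iterable of skill ids the candidate has.
    required_skills: iterable of skill ids the question needs.
    """
    # The candidate "knows" the question only if every required skill is present.
    has_all = set(required_skills) <= set(candidate_skills)
    return p_know if has_all else p_guess
```

For example, `p_correct({1, 7}, [1, 7])` uses the know probability, while `p_correct({1}, [1, 7])` falls back to the guess probability because skill 7 is missing.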
- The non-vectorized models are primitive implementations based on small subsets of data.
- They capture all possible candidate response combinations.
- They provide a way to intuitively ensure that the model is foundationally right, and that the assumptions and inference workflows are valid.
Link to three-question model implementation.
- Modeled for 2 skills assessed through 3 questions.
- These are skills 1 and 7; and questions 1 through 3 on the skill-question heatmap.
- Factors and evaluates skill probabilities for all possible response combinations.
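For a model this small, the skill posteriors for every response combination can be checked by brute-force enumeration. The sketch below is not the repository's PyStan implementation; it assumes, purely for illustration, that the first question needs only the first skill, the second needs only the second skill, and the third needs both. The names `REQUIRED` and `skill_posteriors` are hypothetical.

```python
from itertools import product

P_PRIOR, P_KNOW, P_GUESS = 0.5, 0.9, 0.2

# Assumed layout (hypothetical): indices of the skills each question requires.
REQUIRED = [(0,), (1,), (0, 1)]
N_SKILLS = 2


def skill_posteriors(responses):
    """Return [P(skill_0 | responses), P(skill_1 | responses)] by exact enumeration."""
    joint = {}
    for skills in product([0, 1], repeat=N_SKILLS):
        # Prior over this skill combination.
        weight = 1.0
        for s in skills:
            weight *= P_PRIOR if s else 1.0 - P_PRIOR
        # Likelihood of the observed responses given this combination.
        for needed, correct in zip(REQUIRED, responses):
            p = P_KNOW if all(skills[i] for i in needed) else P_GUESS
            weight *= p if correct else 1.0 - p
        joint[skills] = weight
    evidence = sum(joint.values())
    return [
        sum(w for s, w in joint.items() if s[i]) / evidence
        for i in range(N_SKILLS)
    ]


# Evaluate all 2**3 response combinations.
for responses in product([False, True], repeat=len(REQUIRED)):
    print(responses, [round(p, 3) for p in skill_posteriors(responses)])
```

Enumeration scales as 2 to the power of the number of skills, so it is only useful as a sanity check on small subsets like this one.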
The following factor graph represents the model and message flow for the three-question scenario.
Link to four-question model implementation.
- Modeled for 2 skills assessed through 4 questions.
- These are skills 1 and 7; and questions 1 through 4 on the skill-question heatmap.
- Factors and evaluates skill probabilities for all possible response combinations.
The following factor graph represents the model and message flow for the four-question scenario.
Link to baseline implementation on complete dataset.
- This is the first realistic model that uses the complete dataset for inference.
- It still carries the original modeling assumptions.
- The implementation uses matrix operations for message passing and inference to manage larger datasets effectively and to optimize for a GPU.
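To illustrate the kind of matrix operation involved, the sketch below computes the right-answer probability for every candidate and question at once from two binary matrices. It uses NumPy rather than the repository's code, covers only the forward computation (not inference), and the name `p_correct_matrix` is hypothetical.

```python
import numpy as np


def p_correct_matrix(skills, requirements, p_know=0.9, p_guess=0.2):
    """Right-answer probability for every (candidate, question) pair.

    skills:       (n_candidates, n_skills) binary matrix, 1 = candidate has skill.
    requirements: (n_questions, n_skills) binary matrix, 1 = question needs skill.
    """
    skills = np.asarray(skills, dtype=int)
    requirements = np.asarray(requirements, dtype=int)
    # Number of required skills each candidate holds, per question.
    held = skills @ requirements.T
    # Number of skills each question requires.
    needed = requirements.sum(axis=1)
    # A candidate has all skills for a question when the two counts agree.
    has_all = held == needed
    return np.where(has_all, p_know, p_guess)
```

With the full dataset, `skills` would have shape (22, 7) and `requirements` shape (48, 7), giving a (22, 48) matrix of probabilities in a single product.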
The following three-feature heatmap represents the correct and incorrect responses of the 22 candidates to the 48 questions.
In the figure,
- White blocks represent questions answered correctly, whereas colored blocks represent incorrect responses.
- The colors tag each incorrect response with the skills required to answer that question.
- The heatmap helps to assess, visually and qualitatively, which skills each candidate likely lacks.
The inferred skill probabilities are compared against the ground truth data on skills possessed by each of the 22 candidates in the grayscale heatmap below.
Link to baseline applied on ancestrally sampled data.
- Ancestral sampling uses the model assumptions of underlying probability distributions of the factors to sample a synthetic dataset.
- In essence, the synthetic dataset closely represents the model assumptions.
- Hence, the results may be used to evaluate the model assumptions and inference method.
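Ancestral sampling for this model can be sketched in a few lines: sample the root variables (skills) from their prior, derive the has-all-skills node, then sample each response given its parents. This is an illustrative NumPy version under the baseline parameter values, not the repository's implementation; `ancestral_sample` and its arguments are hypothetical names.

```python
import numpy as np


def ancestral_sample(requirements, n_candidates, p_prior=0.5,
                     p_know=0.9, p_guess=0.2, seed=0):
    """Sample synthetic skills and responses that follow the model assumptions.

    requirements: (n_questions, n_skills) binary matrix, 1 = question needs skill.
    Returns boolean arrays: skills (n_candidates, n_skills) and
    responses (n_candidates, n_questions).
    """
    rng = np.random.default_rng(seed)
    requirements = np.asarray(requirements, dtype=int)
    n_skills = requirements.shape[1]
    # Step 1: sample the parentless nodes (skills) from their Bernoulli prior.
    skills = rng.random((n_candidates, n_skills)) < p_prior
    # Step 2: deterministic node, true when a candidate holds every required skill.
    has_all = skills.astype(int) @ requirements.T == requirements.sum(axis=1)
    # Step 3: sample each response given its parent node.
    p_correct = np.where(has_all, p_know, p_guess)
    responses = rng.random(p_correct.shape) < p_correct
    return skills, responses
```

The sampled `skills` act as the ground truth for the synthetic dataset, so inference run on `responses` can be scored against them directly.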
The inferred skill probabilities based on sampled response data are compared against the ground truth data on skills possessed by each of the 22 candidates in the grayscale heatmap below.
Comparing results from the models in steps B and C, we see that:
- The results obtained on the sampled data match the ground truth quite well, and much better than the baseline results on the real dataset.
- Since an idealistic dataset representing the model assumptions correctly gives rise to good inference, the inference methods and steps are valid and correct.
So the problem, in turn, lies with the modeling assumptions or the parameters therein; they do not represent the real dataset well. The next steps involve sampling on select portions of the factor graph while freezing the rest, to identify which parameters or assumptions give rise to inaccuracies, and then modifying them. This is referred to below as selective ancestral sampling.
Link to improved implementation on complete dataset.
- Selective ancestral sampling for analysis (code not included) suggests that the guess probability values are not very realistic; candidates are able to guess correct answers more often than once in five attempts.
- To improve the probabilistic model, the guess probabilities are no longer assumed to be constant.
- Instead, the guess probability for each question is inferred through message passing and belief propagation in the undirected factor graph.
- Specifically, assumption D is modified as follows:
If a candidate doesn’t have all the skills needed for a question, they will pick an answer at random. The probability of getting a question right, called the guess probability, is inferred from data.
The inferred skill probabilities, when applying learnt guess probabilities, are compared against baseline performance and the ground truth data on skills possessed by each of the 22 candidates in the grayscale heatmap below.
A substantial improvement in the inference is evident after learning the guess probabilities. Further improvements can be made, for instance, by learning the "know probabilities" and by diagnosing other assumptions of the model in a targeted way.
Note
In solution step D, the ground truth is not used for inference. Rather, all possible combinations of skill sets are generated and belief-propagated to infer the posterior for guess probabilities.
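The idea in the note above can be illustrated on a toy scale. The sketch below enumerates every skill combination, marginalizes the skills out of each candidate's likelihood, and evaluates the posterior of the guess probability on a grid. It makes two loud simplifications that are not in the original: a single guess probability is shared by all questions (the model above infers one per question), and brute-force enumeration with a grid approximation stands in for belief propagation. All names are hypothetical.

```python
from itertools import product

import numpy as np


def guess_posterior(responses, requirements, grid, p_prior=0.5, p_know=0.9):
    """Grid posterior over one shared guess probability, uniform prior on the grid.

    responses:    (n_candidates, n_questions) boolean, True = answered correctly.
    requirements: (n_questions, n_skills) binary, 1 = question needs skill.
    grid:         1-D array of candidate guess probabilities.
    """
    responses = np.asarray(responses, dtype=bool)
    requirements = np.asarray(requirements, dtype=int)
    n_skills = requirements.shape[1]
    # All 2**n_skills skill combinations; no ground truth is used.
    combos = np.array(list(product([0, 1], repeat=n_skills)))
    has_all = combos @ requirements.T == requirements.sum(axis=1)
    # Prior probability of each skill combination.
    prior = np.prod(np.where(combos == 1, p_prior, 1.0 - p_prior), axis=1)
    log_post = np.zeros(len(grid))
    for k, g in enumerate(grid):
        p = np.where(has_all, p_know, g)  # (combos, questions)
        # Likelihood of each candidate's responses under each combination.
        lik = np.where(responses[:, None, :], p[None], 1.0 - p[None]).prod(axis=2)
        # Marginalize the skills out, then combine candidates in log space.
        log_post[k] = np.log(lik @ prior).sum()
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```

A posterior mean can then be read off as `(grid * post).sum()`, for instance with `grid = np.linspace(0.01, 0.99, 99)`. With 7 skills the enumeration covers 128 combinations per candidate, which remains cheap.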