mcbizmod
mcbizmod is a package meant to help create advanced business cases that leverage distributions instead of fixed assumptions.
- Zuccarelli, Eugenio (ZuccarelliE@aetna.com)
- Goldberg, Eli (GoldbergE@cvshealth.com)
In your environment, run
pip install mcbizmod
To run unit tests:
python3 tests/<test_name>.py
Oftentimes when people make business cases, they make static or fixed assumptions about the behavior of each parameter. While convenient, this is a bad idea, because a single fixed number hides how uncertain that assumption really is. Here's where distributions come in.
Ultimately, each parameter isn't static; it's a distribution of possibilities for which we need to determine the fraction of times that we'll fail on a PLAN level and need to pay fees back.
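As a rough illustration of the idea above, the sketch below draws each parameter independently from its own distribution and counts how often a plan misses its target. It uses only the Python standard library and is not mcbizmod's API: the function name, the plan size, the savings target, and the distribution parameters are all made-up assumptions.

```python
import random


def estimate_failure_fraction(n_draws=10_000, seed=0):
    """Estimate how often a hypothetical plan misses its savings target."""
    rng = random.Random(seed)
    members = 10_000            # assumed plan size (held fixed for simplicity)
    target_savings = 150_000.0  # assumed plan-level savings guarantee, in dollars
    failures = 0
    for _ in range(n_draws):
        # Each parameter is drawn independently from its own distribution.
        engagement = rng.betavariate(2, 8)                  # fraction of members who engage
        savings_per_engaged = rng.lognormvariate(4.5, 0.5)  # dollars saved per engaged member
        total_savings = members * engagement * savings_per_engaged
        if total_savings < target_savings:
            failures += 1  # this simulated plan would owe fees back
    return failures / n_draws


failure_fraction = estimate_failure_fraction()
print(f"Estimated fraction of failing plans: {failure_fraction:.1%}")
```

A fixed-assumption business case would report one number for total savings; here the output is the share of simulated plans that fall short, which is the quantity that determines how often fees get paid back.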
In many real-world applications, we have to deal with complex probability distributions on complicated high-dimensional spaces. On rare occasions, it is possible to sample exactly from the distribution of interest, but typically exact sampling is difficult. Further, high-dimensional spaces are very large, and distributions on these spaces are hard to visualize, making it difficult to even guess where the regions of high probability are located. As a result, it may be challenging to even design a reasonable proposal distribution to use with importance sampling.
Markov chain Monte Carlo (MCMC) is a sampling technique that works well in many situations like this. However, this package is NOT a conditional distribution (or Gibbs) sampling tool! This is for two reasons:
- We usually don't have enough information to set an appropriate prior.
- Where we do, we could set that prior. However, we don't know the conditional effect on the distribution from the most critical variables, namely engagement and the treatment effect. Thus, it's likely better to assume absolute ignorance rather than create a partially broken conditional probability chain. This mimics typical business cases (which obviously don't use Markov-chain-based distribution sampling methods).
To make this simple task a bit easier, we've built a basic dataclass constructor to track the assumptions made and make your business cases repeatable, inspectable, and (more) accurate. Or at least, wrong in a quantifiable way :).
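To give a feel for what tracking assumptions in a dataclass can look like, here is a minimal sketch. The class names `Assumption` and `BusinessCase`, their fields, and the example values are hypothetical and chosen for illustration only; they are not taken from mcbizmod's actual interface.

```python
import random
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Assumption:
    """One business-case parameter recorded as a distribution (hypothetical schema)."""
    name: str
    distribution: str    # "lognormal" or "beta" in this sketch
    params: tuple        # positional arguments passed to the sampler
    rationale: str = ""  # where the numbers came from

    def sample(self, rng):
        if self.distribution == "lognormal":
            return rng.lognormvariate(*self.params)
        if self.distribution == "beta":
            return rng.betavariate(*self.params)
        raise ValueError(f"unsupported distribution: {self.distribution}")


@dataclass
class BusinessCase:
    """A named collection of assumptions that can be inspected and re-run."""
    name: str
    assumptions: list = field(default_factory=list)

    def draw(self, rng):
        """Return one independent draw per assumption, keyed by name."""
        values = {}
        for assumption in self.assumptions:
            values[assumption.name] = assumption.sample(rng)
        return values


case = BusinessCase(
    name="example_program",
    assumptions=[
        Assumption("engagement", "beta", (2, 8), "made-up illustration"),
        Assumption("savings_per_member", "lognormal", (4.5, 0.5), "made-up illustration"),
    ],
)
draw = case.draw(random.Random(42))  # seeded, so the draw is repeatable
record = asdict(case)                # plain dict of every assumption, for inspection
```

Because every assumption carries its distribution, parameters, and rationale, the `record` dictionary can be logged or reviewed alongside the results.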
- Pull your population directly from data whenever possible.
- Where possible, use the whole distribution.
  - Data is data. It's wild, but you benefit from using the whole distribution because IT'S data!
- When in doubt, use lognormal distributions to estimate.
  - Most biologic phenomena follow a lognormal distribution (this came from a colleague of mine: "lognormal and bioscience").
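The guidelines above can be sketched with the standard library alone. The example resamples observed values directly, so the whole empirical distribution is used, and falls back to a lognormal whose parameters are estimated from the logs of the same data. The helper names and the `observed_costs` values are made up for illustration and are not part of mcbizmod.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Return (mu, sigma) of the logged data; values must be strictly positive."""
    logs = []
    for value in values:
        logs.append(math.log(value))
    return statistics.mean(logs), statistics.stdev(logs)


def sample_from_data(values, rng):
    """Resample one observation, so the whole empirical distribution is used."""
    return rng.choice(values)


# Made-up observations, standing in for a population pulled from real data.
observed_costs = [120.0, 95.0, 310.0, 150.0, 80.0, 220.0, 175.0, 60.0]

rng = random.Random(7)
empirical_draw = sample_from_data(observed_costs, rng)

# When data is too sparse to resample, estimate with a lognormal instead.
mu, sigma = fit_lognormal(observed_costs)
lognormal_draw = rng.lognormvariate(mu, sigma)
```

Resampling never produces a value outside what was observed, while the lognormal fit can, which is one reason to prefer the data itself when there is enough of it.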
- Before contributing to this CVS Health sponsored project, you will need to sign the associated Contributor License Agreement.
- Follow the PEP-8 style guide for your code, specifically focused on the following:
  - Function names should reflect usage instead of implementation.
  - Function and variable names should be of the lowercase_with_underscore style.
  - Globally-used names should be of the UPPERCASE_WITH_UNDERSCORE style.
- Minimize dependencies.
- Where possible, avoid list comprehensions. They have a smell.
- Leverage the power of logging (although we don't, here).
- Most of all: Keep it simple and smartly objective.