Fit a Sample to a Distribution
Suppose you have a bunch of numbers (heights, failure times, whatever) that you believe are distributed according to a probability distribution, but you don't know the parameters of that distribution. How do you go about finding those parameters?
Here's one thing not to do (and that I, as a non-statistician scientist, did in the past without knowing better): bin the values into a histogram, estimate the error bar on each bin count as sqrt(n), use the parameterized PDF at each bin midpoint (scaled by the sample size and bin width) as a model prediction for the bin count, and do a least-squares fit to find the parameters that best fit the bin counts. Given enough data, this procedure will converge to the correct parameter values, but it is onerous, inefficient, dependent on binning choices, and mis-weights low-count bins.
The right thing to do is use maximum likelihood estimation to find the fit parameters directly from the sample data, without binning. Meta.Numerics can do this for you.
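For intuition about what "maximum likelihood without binning" means, here is a minimal sketch in plain C# for the Weibull distribution used below. It is not Meta.Numerics's implementation: the class `WeibullMle` is hypothetical, and it uses the standard profile-likelihood equations, solving for the shape k by bisection and then computing the scale λ in closed form. Every individual data value enters the likelihood directly; nothing is binned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Meta.Numerics): maximum likelihood fit of a
// two-parameter Weibull distribution, f(x) = (k/λ)(x/λ)^(k-1) exp(-(x/λ)^k).
public static class WeibullMle
{
    // Profile score for the shape k; its root is the maximum likelihood estimate of k:
    //   g(k) = Σ x^k ln x / Σ x^k - 1/k - mean(ln x)
    // Each x is divided by the sample maximum so (x/max)^k cannot overflow;
    // the common factor max^k cancels in the ratio.
    private static double ShapeScore(IReadOnlyList<double> x, double k, double max, double meanLog)
    {
        double num = 0.0, den = 0.0;
        foreach (double xi in x)
        {
            double w = Math.Pow(xi / max, k);
            num += w * Math.Log(xi);
            den += w;
        }
        return num / den - 1.0 / k - meanLog;
    }

    // Returns the best-fit (Scale, Shape). Needs strictly positive values, not all equal.
    public static (double Scale, double Shape) Fit(IReadOnlyList<double> x)
    {
        if (x.Count < 2 || x.Any(v => v <= 0.0) || x.Min() == x.Max())
            throw new ArgumentException("Need at least two distinct, strictly positive values.");
        double max = x.Max();
        double meanLog = x.Average(v => Math.Log(v));

        // g(k) increases with k, is negative for small k and positive for large k,
        // so bracket the root by doubling and then bisect.
        double lo = 1.0e-3, hi = 1.0;
        while (ShapeScore(x, hi, max, meanLog) < 0.0) { lo = hi; hi *= 2.0; }
        for (int i = 0; i < 200 && hi - lo > 1.0e-12 * hi; i++)
        {
            double mid = 0.5 * (lo + hi);
            if (ShapeScore(x, mid, max, meanLog) < 0.0) lo = mid; else hi = mid;
        }
        double k = 0.5 * (lo + hi);

        // For fixed k the scale has the closed form λ = (mean of x^k)^(1/k).
        double meanPow = x.Average(v => Math.Pow(v / max, k));
        return (max * Math.Pow(meanPow, 1.0 / k), k);
    }
}
```

Calling `WeibullMle.Fit(sample)` on a list of positive values returns a `(Scale, Shape)` tuple. The sketch returns only point estimates; it does not compute uncertainties or a goodness-of-fit test.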
Let's generate some synthetic Weibull-distributed data to work with:
```csharp
using System;
using System.Collections.Generic;
using Meta.Numerics.Distributions;

// Draw 500 values from a Weibull distribution with scale 3.0 and shape 1.5,
// using a fixed seed so the results are reproducible.
Random rng = new Random(7);
WeibullDistribution distribution = new WeibullDistribution(3.0, 1.5);
List<double> sample = new List<double>();
for (int i = 0; i < 500; i++) sample.Add(distribution.GetRandomValue(rng));
```
Here's some code that finds the best-fit Weibull and lognormal parameters for our data:
```csharp
using Meta.Numerics.Statistics;

// Maximum likelihood fit to a Weibull distribution.
WeibullFitResult weibull = sample.FitToWeibull();
Console.WriteLine($"Best fit scale: {weibull.Scale}");
Console.WriteLine($"Best fit shape: {weibull.Shape}");
Console.WriteLine($"Probability of fit: {weibull.GoodnessOfFit.Probability}");

// Maximum likelihood fit to a lognormal distribution, for comparison.
LognormalFitResult lognormal = sample.FitToLognormal();
Console.WriteLine($"Best fit mu: {lognormal.Mu}");
Console.WriteLine($"Best fit sigma: {lognormal.Sigma}");
Console.WriteLine($"Probability of fit: {lognormal.GoodnessOfFit.Probability}");
```
Notice that the Weibull fit parameters agree, within their uncertainties, with the parameters that were used to generate the data. Notice also that, even though the Weibull and lognormal distributions have similar shapes, the goodness-of-fit tests indicate that the Weibull distribution fits much better.
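Meta.Numerics has dedicated fit methods, like FitToWeibull and FitToLognormal, for many common distributions. Here is the list as of the writing of this tutorial: Bernoulli, Beta, exponential, Gamma, Gumbel, lognormal, normal, Rayleigh, Wald (also known as inverse Gaussian), Weibull.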
What if the distribution you want to fit isn't on that list? No problem. As long as you can write a factory method that, given a parameter dictionary, produces a ContinuousDistribution, Meta.Numerics can do maximum likelihood estimation to find the best-fit parameters. Pretend for a moment that there isn't a dedicated Weibull fit method. Here is the code that you would write to do a Weibull fit:
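```csharp
// The factory maps a dictionary of named parameter values to a distribution;
// the second argument supplies starting values for those named parameters.
var result = sample.MaximumLikelihoodFit(
    parameters => new WeibullDistribution(parameters["Scale"], parameters["Shape"]),
    new Dictionary<string, double>() { { "Scale", 1.0 }, { "Shape", 1.0 } }
);
foreach (Parameter parameter in result.Parameters) {
    Console.WriteLine($"{parameter.Name} = {parameter.Estimate}");
}
```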
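The factory idea can be illustrated in miniature without the library. The sketch below is a hypothetical one-parameter analogue (the `FactoryMle` class is not the Meta.Numerics API): the caller supplies a factory mapping a parameter value θ to a log-density, and the code maximizes the summed log-likelihood by golden-section search. It assumes a single parameter and a log-likelihood that is unimodal on the search interval; the library's method handles several named parameters and also reports uncertainties.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical one-parameter analogue of a factory-based maximum likelihood fit.
public static class FactoryMle
{
    // factory: maps a parameter value θ to the log-density x -> ln f(x; θ).
    // Searches [lo, hi] for the θ that maximizes Σ ln f(x_i; θ), assuming the
    // log-likelihood is unimodal on that interval.
    public static double Fit(
        IReadOnlyList<double> sample,
        Func<double, Func<double, double>> factory,
        double lo, double hi, double tol = 1.0e-10)
    {
        double LogLikelihood(double theta)
        {
            Func<double, double> logPdf = factory(theta);
            double sum = 0.0;
            foreach (double x in sample) sum += logPdf(x);
            return sum;
        }

        // Golden-section search for the maximum, keeping interior points c < d.
        double invPhi = (Math.Sqrt(5.0) - 1.0) / 2.0;
        double a = lo, b = hi;
        double c = b - invPhi * (b - a), d = a + invPhi * (b - a);
        double fc = LogLikelihood(c), fd = LogLikelihood(d);
        while (b - a > tol)
        {
            if (fc > fd) { b = d; d = c; fd = fc; c = b - invPhi * (b - a); fc = LogLikelihood(c); }
            else { a = c; c = d; fc = fd; d = a + invPhi * (b - a); fd = LogLikelihood(d); }
        }
        return 0.5 * (a + b);
    }
}
```

For example, an exponential distribution with rate λ has log-density ln λ - λx, so `FactoryMle.Fit(data, rate => x => Math.Log(rate) - rate * x, 1e-6, 100.0)` should recover the closed-form estimate 1 / (sample mean).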
The uncertainties reported with the best-fit parameters are estimates of the standard deviation of each parameter estimate: the spread the estimates would show if many independent samples of the same size were generated and fit.
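That definition can be checked directly by simulation. The hypothetical sketch below (plain C#, not library code) uses the exponential distribution because its rate has a closed-form maximum likelihood estimate, 1 / (sample mean). It repeatedly draws samples of size n, fits each one, and returns the standard deviation of the resulting estimates; for large n this should come out close to the asymptotic value λ/√n.

```csharp
using System;
using System.Linq;

// Hypothetical illustration: a fitted parameter's uncertainty is the standard
// deviation of its estimates across many independent samples of the same size.
public static class SamplingSpread
{
    // Draws `trials` samples of size n from an exponential distribution with the given
    // rate, fits each by maximum likelihood (rate estimate = n / Σx), and returns the
    // sample standard deviation of those estimates.
    public static double RateEstimateSpread(double rate, int n, int trials, int seed)
    {
        var rng = new Random(seed);
        var estimates = new double[trials];
        for (int t = 0; t < trials; t++)
        {
            double sum = 0.0;
            // Inverse-CDF sampling: x = -ln(1 - u) / rate, with u uniform on [0, 1).
            for (int i = 0; i < n; i++) sum += -Math.Log(1.0 - rng.NextDouble()) / rate;
            estimates[t] = n / sum;
        }
        double mean = estimates.Average();
        double variance = estimates.Sum(e => (e - mean) * (e - mean)) / (trials - 1);
        return Math.Sqrt(variance);
    }
}
```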
The goodness-of-fit tests (the GoodnessOfFit property of each fit result) are Kolmogorov-Smirnov tests of the best-fit distribution against the input data. They are not corrected for the fact that the distribution's parameters were estimated from that same data, so the P-values are likely to be inflated for small samples. If you get a barely acceptable P-value with a small sample, it's likely that the fitted distribution does not, in fact, accurately describe your data.
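For reference, the Kolmogorov-Smirnov statistic is D = sup |F_n(x) - F(x)|, the largest gap between the sample's empirical CDF F_n and the model CDF F. A minimal sketch of computing it follows; the `KolmogorovSmirnov` class is hypothetical and does not compute a P-value. When F comes from parameters fitted to the same sample, F is pulled toward the data, so D tends to be smaller than it would be for an independently specified F. That is why P-values taken from the standard Kolmogorov-Smirnov distribution come out too large.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: one-sample Kolmogorov-Smirnov statistic
// D = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample.
public static class KolmogorovSmirnov
{
    public static double Statistic(IEnumerable<double> sample, Func<double, double> cdf)
    {
        double[] sorted = sample.OrderBy(v => v).ToArray();
        int n = sorted.Length;
        if (n == 0) throw new ArgumentException("Sample must not be empty.");
        double d = 0.0;
        for (int i = 0; i < n; i++)
        {
            double f = cdf(sorted[i]);
            // F_n jumps from i/n to (i+1)/n at sorted[i]; check the gap on both sides.
            d = Math.Max(d, Math.Max((i + 1.0) / n - f, f - (double)i / n));
        }
        return d;
    }
}
```

For the sample {0.1, 0.5, 0.9} against the uniform CDF F(x) = x on [0, 1], the largest gap is 1/3 - 0.1 = 7/30.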
Because the best-fit parameters are maximum likelihood estimates, they are at least asymptotically unbiased and efficient.
In cases where the finite-sample bias of an estimator is known, we typically correct for it in our dedicated fit methods.
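As a textbook illustration of finite-sample bias (not a description of which corrections Meta.Numerics applies), consider the rate λ of an exponential distribution. Its maximum likelihood estimate n / Σx has expectation nλ / (n - 1), so it overestimates λ for small n, while (n - 1) / Σx is exactly unbiased. The hypothetical sketch below checks this by simulation.

```csharp
using System;

// Hypothetical illustration of finite-sample bias in a maximum likelihood estimate.
// For n exponential observations with rate λ, the MLE n / Σx has expectation
// nλ / (n - 1); multiplying it by (n - 1) / n removes the bias exactly.
public static class RateBias
{
    // Returns the average MLE and the average bias-corrected estimate over many samples.
    public static (double MeanMle, double MeanCorrected) Simulate(double rate, int n, int trials, int seed)
    {
        if (n < 2) throw new ArgumentException("n must be at least 2.");
        var rng = new Random(seed);
        double sumMle = 0.0, sumCorrected = 0.0;
        for (int t = 0; t < trials; t++)
        {
            double total = 0.0;
            // Inverse-CDF sampling: x = -ln(1 - u) / rate, with u uniform on [0, 1).
            for (int i = 0; i < n; i++) total += -Math.Log(1.0 - rng.NextDouble()) / rate;
            sumMle += n / total;
            sumCorrected += (n - 1) / total;
        }
        return (sumMle / trials, sumCorrected / trials);
    }
}
```

With n = 5 and λ = 1, the average maximum likelihood estimate should land near 1.25 while the corrected average lands near 1.0; as n grows, the two converge, which is what "asymptotically unbiased" means.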