

Accuracy and precision are not interchangeable: accuracy is how close a measured value is to the true value, while precision is how close repeated measurements are to one another.
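A small sketch of the difference, assuming two hypothetical weighing scales and a made-up true value of 100 g. The names `bias` and `spread` are illustrative stand-ins for (in)accuracy and (im)precision.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical true weight in grams

scale_a = [104.9, 105.1, 105.0, 104.8, 105.2]  # tightly grouped but off-target
scale_b = [97.0, 103.5, 99.0, 101.5, 99.0]     # centred on target but scattered

def bias(readings):
    """Distance of the average reading from the true value (lower = more accurate)."""
    return abs(mean(readings) - TRUE_VALUE)

def spread(readings):
    """Sample standard deviation of the readings (lower = more precise)."""
    return stdev(readings)

for name, readings in (("A", scale_a), ("B", scale_b)):
    print(name, round(bias(readings), 3), round(spread(readings), 3))
```

Scale A is precise but not accurate; scale B is accurate on average but not precise.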

Probability and likelihood are different concepts: probability is the chance of observing particular outcomes given a fixed distribution, while likelihood measures how well a candidate distribution (a choice of parameters) explains the outcomes that were actually observed.
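The distinction can be sketched with a coin-flip (binomial) model. The function name `binomial_pmf` and the numbers (7 heads in 10 flips) are illustrative assumptions, not part of the original notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability: the distribution is fixed (a fair coin), the outcome is the question.
prob_7_heads = binomial_pmf(7, 10, 0.5)

# Likelihood: the outcome is fixed (7 heads in 10 flips), the parameter p is the question.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binomial_pmf(7, 10, p))

print(round(prob_7_heads, 4))  # 0.1172
print(best_p)                  # 0.7
```

The same formula is used in both cases; only what is held fixed and what is varied changes.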

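DESCRIPTIVE STATISTICS: Summarize and describe the data in hand (typically a sample), without generalizing beyond it

INFERENTIAL STATISTICS: Use sample data to draw conclusions about the (larger) population the sample came from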

CENTRAL LIMIT THEOREM: The central limit theorem (CLT) states that, given a sufficiently large sample of independent observations, the sampling distribution of the mean approximates a normal distribution regardless of the variable's distribution in the population, provided that distribution has a finite variance.

CLT is vital for two reasons: the normality assumption and the precision of the estimates.

The normality assumption is vital for parametric hypothesis tests of the mean. Consequently, you might think that these tests are not valid when the data are non-normally distributed. However, if your sample size is large enough, CLT kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are non-normally distributed as long as your sample size is large enough.

The 'precision of estimates' property of CLT becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.
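Both properties can be observed in a short simulation. This is a sketch under assumed settings: a skewed exponential population with mean 1, 1000 repeated samples, and the helper name `sample_means`, all chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

def sample_means(sample_size, n_samples=1000):
    """Means of repeated samples drawn from a skewed exponential population (mean 1)."""
    return [
        sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

for n in (2, 30, 100):
    means = sample_means(n)
    # The centre stays near the population mean while the spread shrinks
    # roughly like 1 / sqrt(n) as the sample size grows.
    print(n, round(mean(means), 3), round(stdev(means), 3))
```

A histogram of `means` for the larger sample sizes would look close to a bell curve even though the underlying population is strongly right-skewed.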

Depending on your goal and the data, you select a test.

If the goal is to quantify an association between two variables, we check Pearson correlation for parametric data and Spearman correlation for non-parametric data. If the goal is to predict a target from one or more variables, we perform simple regression (one predictor) or multiple regression (more than one predictor) for parametric data. If we have to compare unpaired (independent) groups, we perform an unpaired t-test (2 groups) or one-way ANOVA (more than 2 groups) for parametric data, and the Mann-Whitney test (2 groups) or Kruskal-Wallis test (more than 2 groups) for non-parametric data.
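To illustrate why the choice between Pearson and Spearman matters, here is a minimal standard-library sketch. The helpers `pearson`, `ranks` and `spearman` are written for illustration only, and `ranks` assumes there are no tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1 to n; assumes no tied values, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but non-linear relationship

print(round(pearson(x, y), 3))   # below 1: the relationship is not linear
print(round(spearman(x, y), 3))  # 1.0: the relationship is perfectly monotonic
```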

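Parametric test:-

Assumption: Data follows a specific distribution, usually the normal distribution


Non-parametric test:-

No assumption about the specific distribution of the data (other assumptions, such as independent observations, still apply)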


HYPOTHESIS TESTS: Depending on datatypes and number of samples, hypothesis testing is carried out.

Traditional testing is called frequentist (non-Bayesian). It treats probability as how often an outcome happens over repeated runs (repeat sampling) of the experiment, which gives an objective view of whether an experiment is repeatable. Bayesian hypothesis testing starts from a prior, which encodes existing knowledge or belief about the hypothesis, and updates it with the observed data to give a posterior degree of belief in the result.
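The contrast can be sketched with a coin whose chance of heads is being estimated. This assumes a Beta prior updated by binomial data (a standard conjugate pair); the prior strength of 20 pseudo-flips and the helper names are illustrative choices.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

heads, tails = 7, 3

# Frequentist point estimate: uses the observed data only.
frequentist_estimate = heads / (heads + tails)

# Bayesian estimate: a prior belief that the coin is fair, worth 20 pseudo-flips
# (10 heads, 10 tails), is combined with the same data.
post_a, post_b = update_beta(10, 10, heads, tails)
bayesian_estimate = beta_mean(post_a, post_b)

print(frequentist_estimate)         # 0.7
print(round(bayesian_estimate, 4))  # 0.5667
```

With more data the prior matters less and the two estimates move closer together.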


Data can also be classified by sensitivity, for the purposes of privacy, security, risk management and regulatory compliance: public, internal, confidential and restricted.




Mode: Value that occurs most often in a dataset.

Median: Middle value when a dataset is ordered from least to greatest (the average of the two middle values when the number of values is even).
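A quick check of these definitions using Python's built-in `statistics` module; the dataset is made up for illustration.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]

print(mode(data))    # 3, the most frequent value
print(median(data))  # 4.0, the average of the two middle values (3 and 5)

# An outlier pulls the mean much more than the median.
with_outlier = data + [100]
print(median(with_outlier))          # 5
print(round(mean(with_outlier), 2))  # 18.57
```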



Read more on the first 4 moments - mean, variance, skewness, and kurtosis from this excellent blog post:

A violin plot shows the shape (density distribution) of the data, which a boxplot does not, so it is well suited to exploring skewed or multimodal data.


Variables that follow right-skewed or left-skewed distributions often benefit from power transformations. Parametric machine learning models like linear regression tend to behave best when real-valued input variables (and, strictly, the model residuals) are approximately Gaussian. Non-parametric models like kNN do not make this assumption, yet they are often more reliable and perform better when the input variables have Gaussian distributions. As such, variables with skewed (nearly Gaussian) distributions, or with different distributions altogether, may need transformation. Power transforms are a class of techniques that apply a power function (such as a root or reciprocal, with the logarithm as a limiting case) to make the probability distribution of a variable more Gaussian.

Gaussian (normal) distribution:

There are two popular approaches for automatic power transforms:

• Box-Cox Transform (requires strictly positive values)
• Yeo-Johnson Transform (also supports zero and negative values)

Both find a parameter (lambda) that best transforms a variable. For example, lambda = -1 is a reciprocal transform, lambda = 0 is a log transform, and lambda = 0.5 is a square root transform.
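The role of lambda can be seen in a hand-written Box-Cox function. This is a sketch for a single value and a given lambda; the name `box_cox` is illustrative, and it does not search for the best lambda as library implementations do.

```python
from math import log

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return log(x)            # the log transform is the limiting case
    return (x ** lam - 1) / lam  # shifted and scaled power of x

print(box_cox(4, 0.5))  # 2.0, i.e. 2 * (sqrt(4) - 1): a square root transform
print(box_cox(4, -1))   # 0.75, i.e. 1 - 1/4: a reciprocal transform
print(box_cox(1, 0))    # 0.0, i.e. log(1): a log transform
```

The subtraction of 1 and division by lambda only shift and rescale the result, so each lambda corresponds to the named transform up to a linear change. In practice, functions such as `scipy.stats.boxcox` or scikit-learn's `PowerTransformer` estimate lambda from the data.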

MEASURES OF DISPERSION: Range, quartile deviation and interquartile range (quartile deviation is half of the interquartile range), variance, standard deviation
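These measures can be computed with the standard library as a rough sketch. The dataset is made up, and `statistics.quantiles` is used with its default "exclusive" method, so other quartile conventions may give slightly different values.

```python
from statistics import quantiles, stdev, variance

data = [4, 7, 8, 10, 12, 15, 21]

data_range = max(data) - min(data)

q1, q2, q3 = quantiles(data, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                      # interquartile range
quartile_deviation = iqr / 2       # half of the interquartile range

sample_variance = variance(data)   # divides by n - 1
sample_sd = stdev(data)            # square root of the sample variance

print(data_range, iqr, quartile_deviation)
print(sample_variance, round(sample_sd, 3))
```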



✅ It is worth noting that the standard error of a sample mean estimates how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample measures how much individuals within the sample differ from the sample mean. The standard error equals the standard deviation divided by the square root of the sample size, so it shrinks as the sample grows while the standard deviation does not. Hence, standard error and standard deviation are different terms.
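A short sketch of the relationship between the two, using a made-up sample; the helper name `standard_error` is illustrative.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [10, 12, 14, 16, 18]

print(round(stdev(sample), 3))           # 3.162, spread of individual values
print(round(standard_error(sample), 3))  # 1.414, uncertainty of the sample mean
```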





Discriminative models learn the conditional probability distribution P(y | x) of the target given the inputs, while generative models learn the joint distribution P(x, y) of inputs and target.



Basic Statistics for Data Sciences






