From a80c4a173bfe1acfbe51018f5d1d36e06f4865b7 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 26 Apr 2024 15:09:45 +1000 Subject: [PATCH 01/10] hide yf.download output --- lectures/heavy_tails.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 639e6b1d..a0680009 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -4,7 +4,7 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.14.5 + jupytext_version: 1.16.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -169,7 +169,12 @@ This equates to daily returns if we set dividends aside. The code below produces the desired plot using Yahoo financial data via the `yfinance` library. ```{code-cell} ipython3 -s = yf.download('AMZN', '2015-1-1', '2022-7-1')['Adj Close'] +:tags: [hide-output] +data = yf.download('AMZN', '2015-1-1', '2022-7-1') +``` + +```{code-cell} ipython3 +s = data['Adj Close'] r = s.pct_change() fig, ax = plt.subplots() From 536a511292c0e26d5d493fc3741f67b4c8b21929 Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 26 Apr 2024 15:31:30 +1000 Subject: [PATCH 02/10] reformat display math, add mystnb figure tags --- lectures/heavy_tails.md | 145 +++++++++++++++++++++++++++++++++------- 1 file changed, 121 insertions(+), 24 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index a0680009..533c8cbf 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -61,10 +61,10 @@ To explain this concept, let's look first at examples. 
The classic example is the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), which has density $$ - f(x) = \frac{1}{\sqrt{2\pi}\sigma} - \exp\left( -\frac{(x-\mu)^2}{2 \sigma^2} \right) - \qquad - (-\infty < x < \infty) +f(x) = \frac{1}{\sqrt{2\pi}\sigma} +\exp\left( -\frac{(x-\mu)^2}{2 \sigma^2} \right) +\qquad +(-\infty < x < \infty) $$ @@ -78,6 +78,12 @@ We can see this when we plot the density and show a histogram of observations, as with the following code (which assumes $\mu=0$ and $\sigma=1$). ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of observations + name: hist-obs +--- fig, ax = plt.subplots() X = norm.rvs(size=1_000_000) ax.hist(X, bins=40, alpha=0.4, label='histogram', density=True) @@ -101,6 +107,12 @@ X.min(), X.max() Here's another view of draws from the same distribution: ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of observations + name: hist-obs2 +--- n = 2000 fig, ax = plt.subplots() data = norm.rvs(size=n) @@ -174,6 +186,12 @@ data = yf.download('AMZN', '2015-1-1', '2022-7-1') ``` ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Daily Amazon returns + name: dailyreturns-amzn +--- s = data['Adj Close'] r = s.pct_change() @@ -194,7 +212,18 @@ Several of observations are quite extreme. 
We get a similar picture if we look at other assets, such as Bitcoin ```{code-cell} ipython3 -s = yf.download('BTC-USD', '2015-1-1', '2022-7-1')['Adj Close'] +:tags: [hide-output] +data = yf.download('BTC-USD', '2015-1-1', '2022-7-1') +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Daily Bitcoin returns + name: dailyreturns-btc +--- +s = data['Adj Close'] r = s.pct_change() fig, ax = plt.subplots() @@ -211,6 +240,12 @@ The histogram also looks different to the histogram of the normal distribution: ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram (Normal vs Bitcoin returns) + name: hist-normal-btc +--- r = np.random.standard_t(df=5, size=1000) fig, ax = plt.subplots() @@ -274,10 +309,6 @@ like We return to these points [below](https://intro.quantecon.org/heavy_tails.html#why-do-heavy-tails-matter). - - - - ## Visual comparisons In this section, we will introduce important concepts such as the Pareto distribution, Counter CDFs, and Power laws, which aid in recognizing heavy-tailed distributions. @@ -300,6 +331,12 @@ distribution](https://en.wikipedia.org/wiki/Cauchy_distribution), which is heavy-tailed. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of Cauchy distribution + name: hist-cauchy +--- n = 120 np.random.seed(11) @@ -353,6 +390,12 @@ The exponential distribution is a light-tailed distribution. Here are some draws from the exponential distribution. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of Exponential distribution + name: hist-exponential +--- n = 120 np.random.seed(11) @@ -394,7 +437,9 @@ exponential random variable. In particular, if $X$ is exponentially distributed with rate parameter $\alpha$, then -$$ Y = \bar x \exp(X) $$ +$$ +Y = \bar x \exp(X) +$$ is Pareto-distributed with minimum $\bar x$ and tail index $\alpha$. @@ -402,6 +447,12 @@ Here are some draws from the Pareto distribution with tail index $1$ and minimum $1$. 
```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Histogram of Pareto distribution + name: hist-pareto +--- n = 120 np.random.seed(11) @@ -425,7 +476,9 @@ light and heavy tails is to look at the For a random variable $X$ with CDF $F$, the CCDF is the function -$$ G(x) := 1 - F(x) = \mathbb P\{X > x\} $$ +$$ +G(x) := 1 - F(x) = \mathbb P\{X > x\} +$$ (Some authors call $G$ the "survival" function.) @@ -433,13 +486,17 @@ The CCDF shows how fast the upper tail goes to zero as $x \to \infty$. If $X$ is exponentially distributed with rate parameter $\alpha$, then the CCDF is -$$ G_E(x) = \exp(- \alpha x) $$ +$$ +G_E(x) = \exp(- \alpha x) +$$ This function goes to zero relatively quickly as $x$ gets large. The standard Pareto distribution, where $\bar x = 1$, has CCDF -$$ G_P(x) = x^{- \alpha} $$ +$$ +G_P(x) = x^{- \alpha} +$$ This function goes to zero as $x \to \infty$, but much slower than $G_E$. @@ -505,13 +562,21 @@ The sample counterpart of the CCDF function is the **empirical CCDF**. Given a sample $x_1, \ldots, x_n$, the empirical CCDF is given by -$$ \hat G(x) = \frac{1}{n} \sum_{i=1}^n \mathbb 1\{x_i > x\} $$ +$$ +\hat G(x) = \frac{1}{n} \sum_{i=1}^n \mathbb 1\{x_i > x\} +$$ Thus, $\hat G(x)$ shows the fraction of the sample that exceeds $x$. Here's a figure containing some empirical CCDFs from simulated data. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Empirical CCDFs + name: ccdf-empirics +--- def eccdf(x, data): "Simple empirical CCDF function." return np.mean(data > x) @@ -690,7 +755,13 @@ def extract_wb(varlist=['NY.GDP.MKTP.CD'], Here is a plot of the firm size distribution for the largest 500 firms in 2020 taken from Forbes Global 2000. 
```{code-cell} ipython3 -:tags: [hide-input] +--- +tags: [hide-input] +mystnb: + figure: + caption: Firm size distribution + name: firm-size-dist +--- df_fs = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-global2000.csv') df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']] @@ -711,7 +782,13 @@ Here are plots of the city size distribution for the US and Brazil in 2023 from The size is measured by population. ```{code-cell} ipython3 -:tags: [hide-input] +--- +tags: [hide-input] +mystnb: + figure: + caption: City size distribution + name: city-size-dist +--- # import population data of cities in 2023 United States and 2023 Brazil from world population review df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_us.csv') @@ -732,7 +809,13 @@ Here is a plot of the upper tail (top 500) of the wealth distribution. The data is from the Forbes Billionaires list in 2020. ```{code-cell} ipython3 -:tags: [hide-input] +--- +tags: [hide-input] +mystnb: + figure: + caption: Wealth distribution (Forbes Billionaires in 2020) + name: wealth-dist +--- df_w = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-billionaires.csv') df_w = df_w[['country', 'realTimeWorth', 'realTimeRank']].dropna() @@ -782,7 +865,13 @@ df_gdp1.dropna(inplace=True) ``` ```{code-cell} ipython3 -:tags: [hide-input] +--- +tags: [hide-input] +mystnb: + figure: + caption: GDP per capita distribution + name: gdppc-dist +--- fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6)) @@ -828,6 +917,12 @@ Let's have a look at the behavior of the sample mean in this case, and see whether or not the LLN is still valid. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: LLN failure + name: fail-lln +--- from scipy.stats import cauchy np.random.seed(1234) @@ -887,7 +982,9 @@ portfolio is $\mu$ and the variance is $\sigma^2$. 
If instead the investor puts share $1/n$ of her wealth in each asset, then the portfolio payoff is -$$ Y_n = \sum_{i=1}^n \frac{X_i}{n} = \frac{1}{n} \sum_{i=1}^n X_i. $$ +$$ +Y_n = \sum_{i=1}^n \frac{X_i}{n} = \frac{1}{n} \sum_{i=1}^n X_i. +$$ Try computing the mean and variance. @@ -918,8 +1015,6 @@ For example, the heaviness of the tail of the income distribution helps determine {doc}`how much revenue a given tax policy will raise `. - - (cltail)= ## Classifying tail properties @@ -964,7 +1059,9 @@ For example, every random variable with bounded support is light-tailed. (Why?) As another example, if $X$ has the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution), with cdf $F(x) = 1 - \exp(-\lambda x)$ for some $\lambda > 0$, then its moment generating function is -$$ m(t) = \frac{\lambda}{\lambda - t} \quad \text{when } t < \lambda $$ +$$ +m(t) = \frac{\lambda}{\lambda - t} \quad \text{when } t < \lambda +$$ In particular, $m(t)$ is finite whenever $t < \lambda$, so $X$ is light-tailed. @@ -1023,7 +1120,7 @@ $$ But then $$ - \mathbb E X^r = r \int_0^\infty x^{r-1} \mathbb P\{ X > x \} dx +\mathbb E X^r = r \int_0^\infty x^{r-1} \mathbb P\{ X > x \} dx \geq r \int_0^{\bar x} x^{r-1} \mathbb P\{ X > x \} dx + r \int_{\bar x}^\infty x^{r-1} b x^{-\alpha} dx. @@ -1254,7 +1351,7 @@ assumption leads to a lower mean and greater dispersion. 
The [characteristic function](https://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29) of the Cauchy distribution is $$ - \phi(t) = \mathbb E e^{itX} = \int e^{i t x} f(x) dx = e^{-|t|} +\phi(t) = \mathbb E e^{itX} = \int e^{i t x} f(x) dx = e^{-|t|} $$ (lln_cch) Prove that the sample mean $\bar X_n$ of $n$ independent draws $X_1, \ldots, From b135dd83271d59f8bc136e1feb537b8feb2cc48e Mon Sep 17 00:00:00 2001 From: mmcky Date: Fri, 26 Apr 2024 15:50:12 +1000 Subject: [PATCH 03/10] review of figure titles --- lectures/heavy_tails.md | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 533c8cbf..42042630 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -334,8 +334,8 @@ heavy-tailed. --- mystnb: figure: - caption: Histogram of Cauchy distribution - name: hist-cauchy + caption: Draws from Normal and Cauchy distributions + name: draws-normal-cauchy --- n = 120 np.random.seed(11) @@ -393,8 +393,8 @@ Here are some draws from the exponential distribution. --- mystnb: figure: - caption: Histogram of Exponential distribution - name: hist-exponential + caption: Draws of Exponential distribution + name: draws-exponential --- n = 120 np.random.seed(11) @@ -450,8 +450,8 @@ $1$. --- mystnb: figure: - caption: Histogram of Pareto distribution - name: hist-pareto + caption: Draws from Pareto distribution + name: draws-pareto --- n = 120 np.random.seed(11) @@ -528,6 +528,12 @@ $$ Here's a plot that illustrates how $G_E$ goes to zero faster than $G_P$. ```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Pareto and exponential distribution comparison + name: compare-pareto-exponential +--- x = np.linspace(1.5, 100, 1000) fig, ax = plt.subplots() alpha = 1.0 @@ -541,6 +547,12 @@ Here's a log-log plot of the same functions, which makes visual comparison easier. 
```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Pareto and exponential distribution comparison (log-log) + name: compare-pareto-exponential-log-log +--- fig, ax = plt.subplots() alpha = 1.0 ax.loglog(x, np.exp(- alpha * x), label='exponential', alpha=0.8) From b607dd115a33cb5b0db86b7ef39dfea7f0c3b272 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 10:37:44 +1000 Subject: [PATCH 04/10] adjust line colour --- lectures/heavy_tails.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 42042630..29008d6f 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -182,6 +182,7 @@ The code below produces the desired plot using Yahoo financial data via the `yfi ```{code-cell} ipython3 :tags: [hide-output] + data = yf.download('AMZN', '2015-1-1', '2022-7-1') ``` @@ -213,6 +214,7 @@ We get a similar picture if we look at other assets, such as Bitcoin ```{code-cell} ipython3 :tags: [hide-output] + data = yf.download('BTC-USD', '2015-1-1', '2022-7-1') ``` @@ -254,7 +256,7 @@ ax.hist(r, bins=60, alpha=0.4, label='bitcoin returns', density=True) xmin, xmax = plt.xlim() x = np.linspace(xmin, xmax, 100) p = norm.pdf(x, np.mean(r), np.std(r)) -ax.plot(x, p, 'k', linewidth=2, label='normal distribution') +ax.plot(x, p, linewidth=2, label='normal distribution') ax.set_xlabel('returns', fontsize=12) ax.legend() @@ -768,12 +770,12 @@ Here is a plot of the firm size distribution for the largest 500 firms in 2020 t ```{code-cell} ipython3 --- -tags: [hide-input] mystnb: figure: caption: Firm size distribution name: firm-size-dist ---- +tags: [hide-input] +--- df_fs = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-global2000.csv') df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']] @@ -795,12 +797,12 @@ The size is measured by population. 
```{code-cell} ipython3 --- -tags: [hide-input] mystnb: figure: caption: City size distribution name: city-size-dist ---- +tags: [hide-input] +--- # import population data of cities in 2023 United States and 2023 Brazil from world population review df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_us.csv') @@ -822,12 +824,12 @@ The data is from the Forbes Billionaires list in 2020. ```{code-cell} ipython3 --- -tags: [hide-input] mystnb: figure: caption: Wealth distribution (Forbes Billionaires in 2020) name: wealth-dist ---- +tags: [hide-input] +--- df_w = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-billionaires.csv') df_w = df_w[['country', 'realTimeWorth', 'realTimeRank']].dropna() @@ -878,12 +880,12 @@ df_gdp1.dropna(inplace=True) ```{code-cell} ipython3 --- -tags: [hide-input] mystnb: figure: caption: GDP per capita distribution name: gdppc-dist ---- +tags: [hide-input] +--- fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6)) @@ -934,7 +936,7 @@ mystnb: figure: caption: LLN failure name: fail-lln ---- +--- from scipy.stats import cauchy np.random.seed(1234) From 959e37da26bbb83e9c439285f38e4db9e5acb280 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 11:10:22 +1000 Subject: [PATCH 05/10] reorganise code and add q-q plot --- lectures/heavy_tails.md | 55 ++++++++++++++++++++++++++++++++--------- 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 29008d6f..b3a90022 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -582,6 +582,12 @@ $$ Thus, $\hat G(x)$ shows the fraction of the sample that exceeds $x$. +```{code-cell} ipython3 +def eccdf(x, data): + "Simple empirical CCDF function." + return np.mean(data > x) +``` + Here's a figure containing some empirical CCDFs from simulated data. 
```{code-cell} ipython3 @@ -591,21 +597,20 @@ mystnb: caption: Empirical CCDFs name: ccdf-empirics --- -def eccdf(x, data): - "Simple empirical CCDF function." - return np.mean(data > x) - +# Parameters and grid x_grid = np.linspace(1, 1000, 1000) sample_size = 1000 np.random.seed(13) z = np.random.randn(sample_size) -data_1 = np.random.exponential(size=sample_size) -data_2 = np.exp(z) -data_3 = np.exp(np.random.exponential(size=sample_size)) +# Draws +data_exp = np.random.exponential(size=sample_size) +data_logn = np.exp(z) +data_pareto = np.exp(np.random.exponential(size=sample_size)) -data_list = [data_1, data_2, data_3] +data_list = [data_exp, data_logn, data_pareto] +# Build figure fig, axes = plt.subplots(3, 1, figsize=(6, 8)) axes = axes.flatten() labels = ['exponential', 'lognormal', 'Pareto'] @@ -630,6 +635,36 @@ approximately linear in a log-log plot. We will use this idea [below](https://intro.quantecon.org/heavy_tails.html#heavy-tails-in-economic-cross-sections) when we look at real data. ++++ + +#### Q-Q Plots + +We can also use a [qq plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) to do a visual comparison between two probability distributions. + +The [statsmodels](https://www.statsmodels.org/stable/index.html) package provides a convenient [qqplot](https://www.statsmodels.org/stable/generated/statsmodels.graphics.gofplots.qqplot.html) function that, by default, compares sample data to the quintiles of the normal distribution. 
+ +If the data is drawn from a Normal distribution, the plot would look like: + +```{code-cell} ipython3 +data_normal = np.random.normal(size=sample_size) +sm.qqplot(data_normal, line='45') +plt.show() +``` + +We can now compare this with the exponential, log-normal, and pareto distributions + +```{code-cell} ipython3 +# Build figure +fig, axes = plt.subplots(3, 1, figsize=(6, 8)) +axes = axes.flatten() +labels = ['exponential', 'lognormal', 'Pareto'] +for data, label, ax in zip(data_list, labels, axes): + sm.qqplot(data, line='45', ax=ax, ) + ax.set_title(label) +plt.tight_layout() +plt.show() +``` + ### Power laws @@ -776,7 +811,6 @@ mystnb: name: firm-size-dist tags: [hide-input] --- - df_fs = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-global2000.csv') df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']] fig, ax = plt.subplots(figsize=(6.4, 3.5)) @@ -803,7 +837,6 @@ mystnb: name: city-size-dist tags: [hide-input] --- - # import population data of cities in 2023 United States and 2023 Brazil from world population review df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_us.csv') df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_brazil.csv') @@ -830,7 +863,6 @@ mystnb: name: wealth-dist tags: [hide-input] --- - df_w = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-billionaires.csv') df_w = df_w[['country', 'realTimeWorth', 'realTimeRank']].dropna() df_w = df_w.astype({'realTimeRank': int}) @@ -886,7 +918,6 @@ mystnb: name: gdppc-dist tags: [hide-input] --- - fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6)) for name, ax in zip(variable_names, axes): From d6f575003fbc5cada05690a3f5e93ccfb1687ff5 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 11:28:03 +1000 Subject: [PATCH 06/10] add 
wiki links and idea on less formal discussion on heavy-tails --- lectures/heavy_tails.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index b3a90022..80c4f75a 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -1076,7 +1076,7 @@ left hand tails are very similar and we omit them to simplify the exposition. ### Light and heavy tails -A distribution $F$ with density $f$ on $\mathbb R_+$ is called **heavy-tailed** if +A distribution $F$ with density $f$ on $\mathbb R_+$ is called [heavy-tailed](https://en.wikipedia.org/wiki/Heavy-tailed_distribution) if ```{math} :label: defht @@ -1096,6 +1096,8 @@ $(0, \infty)$. The Pareto distribution is also heavy-tailed. +Less formally, a **heavy-tailed** distribution is one that is not exponentially bounded (i.e. the tails are heavier than the exponential distribution). + A distribution $F$ on $\mathbb R_+$ is called **light-tailed** if it is not heavy-tailed. A nonnegative random variable $X$ is **light-tailed** if its distribution $F$ is light-tailed. 
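PATCH 06/10 adds an informal reading of the definition: a heavy-tailed distribution is one that is not exponentially bounded. The sketch below shows what that means numerically. It uses the closed-form CCDFs $G_E(x) = \exp(-\alpha x)$ and $G_P(x) = x^{-\alpha}$ that appear in the lecture text touched by this series. If $\mathbb P\{X > x\} \le C e^{-tx}$ for some $t > 0$, then $e^{tx}\,\mathbb P\{X > x\}$ stays bounded. The choice $\alpha = 1$, the grid and the two values of $t$ are illustrative assumptions, not taken from the patches.

```python
import numpy as np

alpha = 1.0                       # illustrative tail parameter for both distributions
x = np.linspace(20, 500, 200)     # grid in the upper tail

def ccdf_exponential(x, alpha):
    "G_E(x) = exp(-alpha x)"
    return np.exp(-alpha * x)

def ccdf_pareto(x, alpha):
    "G_P(x) = x^(-alpha), standard Pareto with minimum 1"
    return x ** (-alpha)

results = {}
for t in (0.1, 0.5):
    # Exponentially bounded  <=>  exp(t x) * P{X > x} stays bounded for some t > 0
    exp_ratio = np.exp(t * x) * ccdf_exponential(x, alpha)
    pareto_ratio = np.exp(t * x) * ccdf_pareto(x, alpha)
    results[t] = (exp_ratio, pareto_ratio)
    print(f"t={t}: exponential -> {exp_ratio[-1]:.2e}, Pareto -> {pareto_ratio[-1]:.2e}")
```

For the exponential case, the product decays only because $t < \alpha$. One admissible $t > 0$ is enough for exponential boundedness. For the Pareto case no $t > 0$ works, since $e^{tx}$ eventually dominates any power $x^{\alpha}$.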
From 15f9e14d293d21ab094b28364800fb232a1ebf06 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 11:33:50 +1000 Subject: [PATCH 07/10] square q-q plots --- lectures/heavy_tails.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 80c4f75a..b01ebbea 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -655,7 +655,7 @@ We can now compare this with the exponential, log-normal, and pareto distributio ```{code-cell} ipython3 # Build figure -fig, axes = plt.subplots(3, 1, figsize=(6, 8)) +fig, axes = plt.subplots(3, 1, figsize=(8, 8)) axes = axes.flatten() labels = ['exponential', 'lognormal', 'Pareto'] for data, label, ax in zip(data_list, labels, axes): From 90bf2c26af2f1fbe2524d1528702bdec6ae79b69 Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 11:56:12 +1000 Subject: [PATCH 08/10] qq-plot multiples of 3 and horizontal --- lectures/heavy_tails.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index b01ebbea..a63bd34b 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -655,7 +655,7 @@ We can now compare this with the exponential, log-normal, and pareto distributio ```{code-cell} ipython3 # Build figure -fig, axes = plt.subplots(3, 1, figsize=(8, 8)) +fig, axes = plt.subplots(1, 3, figsize=(12, 4)) axes = axes.flatten() labels = ['exponential', 'lognormal', 'Pareto'] for data, label, ax in zip(data_list, labels, axes): @@ -665,7 +665,6 @@ plt.tight_layout() plt.show() ``` - ### Power laws From f32785b73f4e3377fb6557bd5ebcddc4b662c79a Mon Sep 17 00:00:00 2001 From: mmcky Date: Mon, 29 Apr 2024 14:28:13 +1000 Subject: [PATCH 09/10] follow wikipedia style for capitalization, Normal -> normal in text --- lectures/heavy_tails.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 
a63bd34b..0dd48009 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -245,7 +245,7 @@ distribution: --- mystnb: figure: - caption: Histogram (Normal vs Bitcoin returns) + caption: Histogram (normal vs bitcoin returns) name: hist-normal-btc --- r = np.random.standard_t(df=5, size=1000) @@ -336,7 +336,7 @@ heavy-tailed. --- mystnb: figure: - caption: Draws from Normal and Cauchy distributions + caption: Draws from normal and Cauchy distributions name: draws-normal-cauchy --- n = 120 @@ -395,7 +395,7 @@ Here are some draws from the exponential distribution. --- mystnb: figure: - caption: Draws of Exponential distribution + caption: Draws of exponential distribution name: draws-exponential --- n = 120 @@ -643,7 +643,7 @@ We can also use a [qq plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) to d The [statsmodels](https://www.statsmodels.org/stable/index.html) package provides a convenient [qqplot](https://www.statsmodels.org/stable/generated/statsmodels.graphics.gofplots.qqplot.html) function that, by default, compares sample data to the quintiles of the normal distribution. -If the data is drawn from a Normal distribution, the plot would look like: +If the data is drawn from a normal distribution, the plot would look like: ```{code-cell} ipython3 data_normal = np.random.normal(size=sample_size) @@ -651,7 +651,7 @@ sm.qqplot(data_normal, line='45') plt.show() ``` -We can now compare this with the exponential, log-normal, and pareto distributions +We can now compare this with the exponential, log-normal, and Pareto distributions ```{code-cell} ipython3 # Build figure @@ -858,7 +858,7 @@ The data is from the Forbes Billionaires list in 2020. 
--- mystnb: figure: - caption: Wealth distribution (Forbes Billionaires in 2020) + caption: Wealth distribution (Forbes billionaires in 2020) name: wealth-dist tags: [hide-input] --- From a44a5df4b131965fb7e83f184b8adf9ec53b79fa Mon Sep 17 00:00:00 2001 From: mmcky Date: Tue, 30 Apr 2024 10:34:43 +1000 Subject: [PATCH 10/10] remove bolding as not formal --- lectures/heavy_tails.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index 0dd48009..137af816 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -1095,7 +1095,7 @@ $(0, \infty)$. The Pareto distribution is also heavy-tailed. -Less formally, a **heavy-tailed** distribution is one that is not exponentially bounded (i.e. the tails are heavier than the exponential distribution). +Less formally, a heavy-tailed distribution is one that is not exponentially bounded (i.e. the tails are heavier than the exponential distribution). A distribution $F$ on $\mathbb R_+$ is called **light-tailed** if it is not heavy-tailed.