From 59b60e6bb8c635f104d67df72366eed049f4dde1 Mon Sep 17 00:00:00 2001
From: Christoph Molnar
Date: Fri, 2 Nov 2018 14:58:39 +0100
Subject: [PATCH] upper case titles

---
 manuscript/04.2-interpretable-linear.Rmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/manuscript/04.2-interpretable-linear.Rmd b/manuscript/04.2-interpretable-linear.Rmd
index 07eb3731e..8b016b7a4 100644
--- a/manuscript/04.2-interpretable-linear.Rmd
+++ b/manuscript/04.2-interpretable-linear.Rmd
@@ -131,7 +131,7 @@ If you think of the features as knobs that you can turn up or down, it is nice t
 On the bad side of things, the interpretation ignores the joint distribution with other features.
 Increasing one feature, but not changing others, might create unrealistic, or at least unlikely, data points.
 
-### Interpretation templates
+### Interpretation Templates
 The interpretation of the features in the linear model can be automated by using the following text templates.
 
 **Interpretation of a Numerical Feature**
 An increase of $x_{k}$ by one unit increases the expectation for $y$ by $\beta_{k}$ units, given all other features stay the same.
 
 **Interpretation of a Categorical Feature**
@@ -143,10 +143,10 @@ An increase of $x_{k}$ by one unit increases the expectation for $y$ by $\beta_k
 A change from $x_{k}$'s reference level to the other category increases the expectation for $y$ by $\beta_{k}$, given all other features stay the same.
 
-### Visual parameter interpretation
+### Visual Parameter Interpretation
 Different visualisations make the outcomes of the linear model quick and easy for humans to grasp.
 
-#### Weight plot
+#### Weight Plot
 The information of the weights table (weight estimates and variance) can be visualised in a weight plot (showing the results from the linear model fitted before):
 
 ```{r linear-weights-plot, fig.cap="Each row in the plot represents one feature weight. The weights are displayed as points and the 0.95 confidence intervals as lines around the points. A 0.95 confidence interval means that if the linear model were estimated 100 times on similar data, the confidence interval would cover the true weight in 95 out of 100 cases, under the linear model assumptions (linearity, normality, homoscedasticity, independence, fixed features, absence of multicollinearity)."}
@@ -255,7 +255,7 @@ You have a model to predict the value of a house and have features like number o
 House area and number of rooms are highly correlated: the bigger a house is, the more rooms it has.
 If you now include both features in a linear model, it might happen that the area of the house is the better predictor and gets a large positive weight.
 The number of rooms might then end up with a negative weight, because either, for a house of the same size, more rooms make it less valuable, or the linear equation becomes less stable when the correlation is too strong.
 
-### Do linear models create good explanations?
+### Do Linear Models Create Good Explanations?
 Judging by the attributes that constitute a good explanation, as [presented in this section](#good-explanation), linear models don't create the best explanations.
 They are contrastive, but the reference instance is a data point for which all continuous features are zero and the categorical features are at their reference levels.
 This is usually an artificial, meaningless instance that is unlikely to occur in your dataset.
 
 The linearity makes the explanations more general and simple.
 The linear nature of the model, I believe, is the main factor why people like linear models for explaining relationships.
 
-### Sparse linear models {#sparse-linear}
+### Sparse Linear Models {#sparse-linear}
 The examples of linear models that I chose all look nice and tidy, right?
 But in reality you might not have just a handful of features, but hundreds or thousands.
 And your normal linear models?
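
The interpretation templates quoted in the first hunks are plain text, but they can be filled in mechanically from a fitted model. Below is a minimal R sketch of that idea; `interpret_numerical`, the `mtcars` toy model, and the feature names are my own illustration and do not appear in the manuscript.

```r
# Hypothetical helper that fills the numerical-feature template from a fitted
# lm; the function name and the toy model are illustration only.
interpret_numerical <- function(model, feature) {
  beta     <- coef(model)[[feature]]        # estimated weight beta_k
  response <- all.vars(formula(model))[1]   # name of the target y
  sprintf(
    "An increase of %s by one unit increases the expectation for %s by %.2f units, given all other features stay the same.",
    feature, response, beta
  )
}

mod <- lm(mpg ~ cyl + wt, data = mtcars)    # toy model on a built-in dataset
interpret_numerical(mod, "cyl")
```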
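The diff shows only the header of the `linear-weights-plot` chunk, not its body. A plot matching the caption (point estimates with 0.95 confidence intervals per feature) could look roughly like the sketch below; the model and data are placeholders, not the chapter's actual example.

```r
# Minimal sketch of a weight plot, assuming a fitted lm object and ggplot2.
# This is not the book's actual chunk body, which the diff does not show.
library(ggplot2)

mod <- lm(mpg ~ cyl + hp + wt, data = mtcars)

ci <- confint(mod, level = 0.95)            # 0.95 confidence intervals
weights <- data.frame(
  feature = names(coef(mod)),
  weight  = coef(mod),
  lower   = ci[, 1],
  upper   = ci[, 2]
)
weights <- weights[weights$feature != "(Intercept)", ]  # intercept usually omitted

ggplot(weights, aes(x = weight, y = feature)) +
  geom_vline(xintercept = 0, linetype = "dashed") +     # zero-effect reference
  geom_point() +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2) +
  labs(x = "Weight estimate (0.95 CI)", y = NULL)
```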
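The house-price paragraph in the last hunk describes how strong correlation between features destabilises the weights. That effect is easy to reproduce with simulated data; all quantities below are invented for illustration.

```r
# Simulated correlated features: number of rooms is almost determined by area,
# and price depends on area alone. All numbers are made up.
set.seed(1)
n     <- 500
area  <- rnorm(n, mean = 120, sd = 30)           # living area
rooms <- round(area / 25 + rnorm(n, sd = 0.4))   # strongly correlated with area
price <- 2000 * area + rnorm(n, sd = 30000)      # price driven by area only

cor(area, rooms)                 # close to 1
coef(lm(price ~ rooms))          # alone, rooms gets a large positive weight
coef(lm(price ~ area + rooms))   # jointly, the rooms weight shrinks or flips sign
```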