diff --git a/manuscript/05.6-agnostic-permfeatimp.Rmd b/manuscript/05.6-agnostic-permfeatimp.Rmd
index 1bbd64160..4cf7e0536 100644
--- a/manuscript/05.6-agnostic-permfeatimp.Rmd
+++ b/manuscript/05.6-agnostic-permfeatimp.Rmd
@@ -238,8 +238,6 @@ Since another feature is chosen as the first split, the whole tree can be very d
 
 ### Disadvantages
 
-It is very **unclear whether you should use training or test data** to compute the feature importance.
-
 Permutation feature importance is **linked to the error of the model**.
 This is not inherently bad, but in some cases not what you need.
 In some cases, you might prefer to know how much the model's output varies for a feature without considering what it means for performance.
@@ -260,6 +258,7 @@ The permutation of features produces unlikely data instances when two or more fe
 When they are positively correlated (like height and weight of a person) and I shuffle one of the features, I create new instances that are unlikely or even physically impossible (2 meter person weighing 30 kg for example), yet I use these new instances to measure the importance.
 In other words, for the permutation feature importance of a correlated feature, we consider how much the model performance decreases when we exchange the feature with values we would never observe in reality.
 Check if the features are strongly correlated and be careful about the interpretation of the feature importance if they are.
+However, pairwise correlations might not be sufficient to reveal the problem.
 Another tricky thing: **Adding a correlated feature can decrease the importance of the associated feature** by splitting the importance between both features.
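
For context on the caveat the second hunk touches, here is a minimal sketch of permutation feature importance, not taken from the manuscript: all names (`permutation_importance`, `loss`, `n_repeats`) are illustrative. Shuffling one column while leaving its correlated partner fixed is exactly how the unrealistic rows described in the diff arise, e.g. a shuffled height paired with the original weight can yield a 2 m person weighing 30 kg.

```python
import numpy as np

# Hedged sketch under assumed names; `predict` is any callable X -> y_hat.
def permutation_importance(predict, X, y, loss, n_repeats=10, seed=0):
    """Mean error increase when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = loss(y, predict(X))
    importances = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y *and* to correlated
            # columns -- the source of the "impossible instances" problem.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(loss(y, predict(X_perm)))
        importances[j] = np.mean(errors) - baseline
    return importances

# Toy usage: two strongly correlated features (think height and weight).
rng = np.random.default_rng(1)
height = rng.normal(170, 10, size=500)
weight = 0.9 * height + rng.normal(0, 3, size=500)  # correlated with height
X = np.column_stack([height, weight])
y = 2 * height + weight + rng.normal(0, 1, size=500)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # simple linear fit
mse = lambda y_true, y_pred: np.mean((y_true - y_pred) ** 2)
print(permutation_importance(lambda X_: X_ @ beta, X, y, mse))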