Merge pull request #35 from sdgary56249128/patch-2
Update IsolationForest_example.md
htygithub authored Feb 16, 2020
2 parents f50809d + 1c946ca commit 72fb83d
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions Ensemble_methods/IsolationForest_example.md
@@ -1,4 +1,5 @@
# **IsolationForest example**

https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

This example shows how to use IsolationForest and what it does: an IsolationForest returns an anomaly score for each sample.
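
For reference, the anomaly score defined in the original Isolation Forest paper is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is a sample's average path length over the trees and c(n) is the average path length of an unsuccessful binary-search-tree lookup among n points. The sketch below only evaluates that formula for given path lengths; it builds no trees, and the helper names `average_path_length` and `anomaly_score` are hypothetical. scikit-learn's `score_samples` returns the negative of this score, so there lower values mean more abnormal.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # harmonic number H(n - 1) approximated by ln(n - 1) + gamma
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 mean 'isolated quickly'."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


if __name__ == "__main__":
    n = 256  # sub-sample size used to grow each tree
    for h in (2.0, average_path_length(n), 20.0):
        print(f"E[h(x)] = {h:6.2f} -> s = {anomaly_score(h, n):.3f}")
```

Short average paths (points separated after only a few random splits) push s toward 1, while E[h(x)] = c(n) gives exactly 0.5.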
@@ -14,16 +15,19 @@ IsolationForest is an unsupervised learning algorithm for anomaly detection
* numpy : creates the numeric arrays
* matplotlib.pyplot : draws the plots
* sklearn.ensemble import IsolationForest : imports the isolation forest algorithm

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
```

## (2) Generate training samples

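* np.random.RandomState(seed) : pseudo-random number generator; the same seed always produces the same numbers
* np.r_[] : concatenates arrays along the first axis
* rng.uniform() : draws random numbers uniformly from [low, high)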

```python
rng = np.random.RandomState(42)

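# Generate train data
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]  # join the +2 and -2 shifted copies into one (40, 2) array
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))  # 20 new abnormal points, drawn at random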
```
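
The helpers listed above can be checked in isolation. This small sketch (the seed 0 and the 20 x 2 array are arbitrary choices, not taken from the example) shows that equal seeds reproduce the same draws and that `np.r_` stacks the +2 and -2 shifted copies row-wise, which is how two clusters of 20 points become one (40, 2) array.

```python
import numpy as np

# Two generators built with the same seed produce identical draws
a = np.random.RandomState(42).uniform(low=-4, high=4, size=(3, 2))
b = np.random.RandomState(42).uniform(low=-4, high=4, size=(3, 2))
print(np.array_equal(a, b))  # True

# np.r_ stacks arrays along the first axis (rows)
X = 0.3 * np.random.RandomState(0).randn(20, 2)
stacked = np.r_[X + 2, X - 2]
print(stacked.shape)  # (40, 2)
print(np.array_equal(stacked, np.concatenate([X + 2, X - 2], axis=0)))  # True

# uniform draws stay inside the half-open interval [low, high)
print(a.min() >= -4 and a.max() < 4)  # True
```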

## (3) IsolationForest model

* IsolationForest(n_estimators=100, max_samples='auto', contamination='auto', max_features=1.0, bootstrap=False, n_jobs=None, behaviour='deprecated', random_state=None, verbose=0, warm_start=False) (signature as of scikit-learn 0.22; the behaviour parameter was removed in 0.24)

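1. n_estimators : number of trees in the forest
2. max_samples : number of samples drawn to build each tree ('auto' means min(256, n_samples))
3. contamination : expected proportion of outliers in the data; it sets the score threshold used by predict ('auto' uses the threshold from the original paper)
4. max_features : number (int) or fraction (float) of features drawn to build each tree
5. random_state : controls the pseudo-randomness; it plays the same role as a random seed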
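
To see what contamination actually controls, the sketch below (training data built like the example's two clusters; random_state=0 is an arbitrary choice) refits the model with a few values and reports offset_, the threshold subtracted from the raw scores, together with the share of training points predicted as outliers. With 'auto' the threshold is fixed at -0.5; with a float it is set to that percentile of the training scores, so roughly that fraction of the training data gets flagged.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Training data built like the example: two Gaussian clusters, 200 points in total
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

results = {}
for contamination in ("auto", 0.05, 0.2):
    clf = IsolationForest(n_estimators=100, max_samples="auto",
                          contamination=contamination, random_state=0)
    labels = clf.fit(X_train).predict(X_train)  # 1 = inlier, -1 = outlier
    share = np.mean(labels == -1)
    results[contamination] = (clf.offset_, share)
    print(f"contamination={contamination}: offset_={clf.offset_:.3f}, "
          f"flagged {share:.1%} of the training points")
```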

* fit() : fits the model to the data
* predict() : labels each sample as an inlier (1) or an outlier (-1)

```python
# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
```
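
predict only thresholds a continuous score. A minimal check of how the three scoring methods of a fitted IsolationForest relate, using data built like the example's training set and uniformly drawn outliers: decision_function is score_samples shifted by offset_, and predict returns -1 exactly where decision_function is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

raw = clf.score_samples(X_outliers)          # negated paper score: lower = more abnormal
shifted = clf.decision_function(X_outliers)  # raw - offset_: negative = outlier
labels = clf.predict(X_outliers)             # 1 = inlier, -1 = outlier

print(np.allclose(shifted, raw - clf.offset_))               # True
print(np.array_equal(labels, np.where(shifted < 0, -1, 1)))  # True
print(f"{np.mean(labels == -1):.0%} of the uniformly drawn points flagged")
```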

## (4) Plot the results

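* np.meshgrid() : returns coordinate matrices from coordinate vectors
@@ -62,6 +71,7 @@ y_pred_outliers = clf.predict(X_outliers)
* plt.contourf() : draws filled contours
* plt.scatter() : scatter plot of y versus x, with varying marker size and/or color

Finally, the code below plots all the points: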

```python
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
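# ... (collapsed in this diff view: Z is reshaped to the grid, then the contour plot
#      and the scatter plots b1, b2, c are drawn) ...
plt.legend([b1, b2, c],
           # ... (legend labels collapsed in this diff view) ...
           loc="upper left")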
plt.show()
```
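
Because the middle of the plotting code is collapsed in this diff view, here is a self-contained sketch of the same steps that runs without a display. The data-generation parameters (0.3 scale, 100 and 20 points per cluster) are taken to match the linked scikit-learn example; the Agg backend and the object-oriented ax calls are choices made here. It also spells out the grid mechanics: np.meshgrid builds two (50, 50) coordinate arrays, ravel and np.c_ turn them into 2,500 (x, y) rows for decision_function, and reshape folds the scores back onto the grid for contourf.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# 50 x 50 grid over [-5, 5]^2; xx and yy each have shape (50, 50)
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
# flatten the grid into 2500 (x, y) rows, score them, then restore the grid shape
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.decision_function(grid).reshape(xx.shape)

fig, ax = plt.subplots()
ax.set_title("IsolationForest")
ax.contourf(xx, yy, Z, cmap="Blues_r")  # darker = lower score = more abnormal
b1 = ax.scatter(X_train[:, 0], X_train[:, 1], c="white", s=20, edgecolors="k")
b2 = ax.scatter(X_test[:, 0], X_test[:, 1], c="green", s=20, edgecolors="k")
c = ax.scatter(X_outliers[:, 0], X_outliers[:, 1], c="red", s=20, edgecolors="k")
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
ax.legend([b1, b2, c],
          ["training observations", "new regular observations",
           "new abnormal observations"],
          loc="upper left")
# fig.savefig("isolation_forest.png")  # hypothetical file name, if an image is wanted
```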

![IsolationForest example result](https://github.com/sdgary56249128/machine-learning-python/blob/master/Ensemble_methods/sphx_glr_plot_isolation_forest_001.png)

## (5) Complete code

https://scikit-learn.org/stable/_downloads/a48f0894575e256740089d572cff3acd/plot_isolation_forest.py

```python
print(__doc__)

# ... (rest of the script collapsed in this diff view; full source at the URL above)
```