
2. Examples: anomaly detection


Anomaly detection

Fedot.Industrial offers several implemented approaches for anomaly detection tasks.

Task statement

The anomaly detection task generally focuses on one-class classification, identifying point anomalies (outlier detection), and changepoint detection, where the exact moment when data begins to exhibit abnormal behavior can be pinpointed.

Currently, Fedot.Industrial supports the following detectors (a short sketch after this list shows how these labels map to pipeline node lists):

  • 'stat_detector' - statistical detector
  • 'arima_detector' - ARIMA fault detector
  • 'iforest_detector' - isolation forest detector
  • 'conv_ae_detector' - convolutional autoencoder detector
  • 'lstm_ae_detector' - LSTM autoencoder detector
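
To see which pipeline nodes stand behind each label, the VALID_LINEAR_DETECTION_PIPELINE mapping used in the "Specific detector case" below can be inspected. This is a minimal sketch and assumes each of the labels above is a key of that mapping:

from fedot_ind.core.repository.constanst_repository import VALID_LINEAR_DETECTION_PIPELINE

# Print the node list behind each detector label
# (assumes every label listed above is a key of the mapping)
for label in ('stat_detector', 'arima_detector', 'iforest_detector',
              'conv_ae_detector', 'lstm_ae_detector'):
    print(label, '->', VALID_LINEAR_DETECTION_PIPELINE.get(label))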

Data source

The Waico development team has collected 34 datasets containing both point and group anomalies to test and compare anomaly detection algorithms. They plan to expand their SKAB (Skoltech Anomaly Benchmark) repository to 300 industrial datasets, making it one of the most comprehensive resources for anomaly detection.

For a randomly selected file from the SKAB datasets, the outlier and changepoint labels look as follows: a non-zero value at a point marks an anomaly at that point for the outlier detection task, or the beginning of a cluster of anomalies for the changepoint detection task.

(figure: outlier and changepoint labels for a sample SKAB file)
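
The label columns can also be inspected directly. This is a minimal sketch that reuses the same file as the basic case below and assumes the SKAB CSVs expose 'anomaly' and 'changepoint' as their last two columns:

import pandas as pd

# Load one SKAB file and count the labelled points of each kind
# ('anomaly' marks outliers, 'changepoint' marks the start of an anomaly cluster)
df = pd.read_csv('https://raw.githubusercontent.com/waico/SKAB/master/data/valve1/1.csv',
                 index_col='datetime', sep=';', parse_dates=True)
print(df[['anomaly', 'changepoint']].sum())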

Basic case

Code

import pandas as pd
from sklearn.model_selection import train_test_split

# Load one SKAB file (semicolon-separated, indexed by timestamp)
df = pd.read_csv('https://raw.githubusercontent.com/waico/SKAB/master/data/valve1/1.csv',
                 index_col='datetime', sep=';', parse_dates=True)

# Random 90/10 split; the last two columns hold the labels,
# and the second-to-last ('anomaly') is used as the target
train_data, test_data = train_test_split(df, train_size=0.9, shuffle=True)
train_data = train_data.iloc[:, :-2].values, train_data.iloc[:, -2].values
test_data = test_data.iloc[:, :-2].values, test_data.iloc[:, -2].values

To define an anomaly detector, we specify a set of parameters for the FedotIndustrial object:

from fedot_ind.api.main import FedotIndustrial

api_params = dict(
    problem='classification',  # general task (AD is treated as one-class classification in the core)
    industrial_strategy='anomaly_detection',  # selects the set of methods appropriate for the task
    industrial_task_params={
        'detection_window': 10,  # sliding detection window, analogous to forecast_length
        'data_type': 'time_series',  # input data type
    },
    metric='accuracy',  # metric passed to the Fedot API
    pop_size=10,  # initial population size for EvoOptimizer
    timeout=1,  # time budget for model design (in minutes)
    with_tuning=False,  # whether to apply hyperparameter tuning to the model
    n_jobs=2  # number of parallel jobs
)

detector = FedotIndustrial(**api_params)
detector.fit(train_data)
labels = detector.predict(test_data)

probs = detector.predict_proba(test_data)  # anomaly probabilities
metrics = detector.get_metrics(target=test_data[1],  # ground-truth 'anomaly' column
                               rounding_order=3,
                               metric_names=('nab', 'accuracy'))

result_dict = dict(industrial_model=detector, labels=labels, metrics=metrics)
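
The predictions can also be cross-checked with standard scikit-learn metrics. This is a minimal sketch that assumes the detector returns one binary label per test observation, aligned with test_data[1]:

import numpy as np
from sklearn.metrics import classification_report

# Compare predicted labels against the ground-truth 'anomaly' column
# (assumes `labels` holds one binary prediction per test observation)
print(classification_report(test_data[1], np.ravel(labels)))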

Specific detector case

To try out a particular implemented linear detection pipeline, the following steps can be taken:

from fedot_ind.core.architecture.pipelines.abstract_pipeline import AbstractPipeline
from fedot_ind.core.repository.constanst_repository import VALID_LINEAR_DETECTION_PIPELINE

pipeline_label = 'iforest_detector'  # any label from the list above
node_list = VALID_LINEAR_DETECTION_PIPELINE[pipeline_label]  # linear pipeline as a list of nodes
data_dict = dict(benchmark='valve1', dataset='1')  # SKAB benchmark folder and file name

result = AbstractPipeline(task='classification',
                          task_params=dict(industrial_strategy='anomaly_detection',
                                           detection_window=10)).evaluate_pipeline(node_list, data_dict)