This repository contains code and experiments related to time series classification using shapelets. The experiments focus on evaluating various machine learning models and quality measures on different time series datasets. Below, you will find an overview of the contents and how to navigate this repository.
Running demo.ipynb is a quick way to explore the project's capabilities. You can also experiment with our IPOL demo.
- Utils: The utils directory contains Python scripts with the main functions used in the experiments. It includes code for shapelet selection, shapelet transformation of datasets, and computation of quality measures. There is also a notebook, json_to_table.ipynb, that converts the .json results into LaTeX tables.
- Datasets: The datasets/raw_datasets directory contains the unprocessed time series datasets used in the experiments. The datasets/preprocessed_datasets directory contains the processed datasets, stored as NumPy files for easy loading.
- Jupyter Notebooks: The repository contains several Jupyter notebooks, each covering a specific part of the project. demo.ipynb gives a practical demonstration on a synthetic dataset and shows how the code works. main.ipynb tests conventional classification algorithms on shapelet-transformed datasets. sanity_checks.ipynb exercises the key functions and verifies their performance. The dataset preprocessing is documented in datasets/datasets_processing.ipynb. Lastly, shapelet_visualization.ipynb presents a selection of the most informative shapelets discovered across the datasets.
- Experiment Results: The results directory stores the results of the experiments: the classification accuracy on each dataset for each quality measure used for shapelet selection, and the selected shapelets. The results are organized by dataset name and quality measure. These files are generated by the main.ipynb notebook.
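To make the shapelet transformation mentioned above more concrete, here is a minimal sketch of the idea in plain Python. It is not the code from the utils directory: the function names `shapelet_distance` and `shapelet_transform` are hypothetical, and the sketch assumes a plain (non-normalized) Euclidean distance, whereas the repository's implementation may normalize subsequences or use another distance.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    window of the series that has the same length as the shapelet."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_squared = math.inf
    # Slide the shapelet over every possible start position.
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best_squared = min(best_squared, squared)
    return math.sqrt(best_squared)


def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector holding one distance per shapelet."""
    return [[shapelet_distance(series, s) for s in shapelets] for series in dataset]


# Toy data (made up for illustration): one series with a bump, one flat series.
dataset = [
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
shapelets = [[1.0, 2.0, 1.0], [0.0, 0.0]]

features = shapelet_transform(dataset, shapelets)
for row in features:
    print(row)
```

Each row of `features` can then be passed to any conventional classifier, which is the setting that main.ipynb evaluates.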
To reproduce or extend the experiments, follow these steps:
- Clone this repository to your local machine:

      git clone https://github.com/VictorBaillet/shapelets-time-series-classification-experiments.git
      cd shapelets-time-series-classification-experiments

- Install the required dependencies by creating a virtual environment and using pip:

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
      pip install -r requirements.txt

- Run the Jupyter notebooks to conduct experiments.
Here is a list of datasets used in the experiments:
- GunPointAgeSpan
- Synthetic
- ECG200
- ECG Five Days
- Two Lead ECG
- Mote Strain
- Sony Robot
- Beef
Refer to the dataset-specific documentation for more details. These datasets (except for the synthetic one) can be downloaded here.
The dataset_to_parameters dictionary in the code defines the experiment configurations, including shapelet length ranges and the number of shapelets or clusters to consider. You can customize these configurations to tailor the experiments to your specific needs.
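As an illustration of what such a configuration might look like, the sketch below shows one possible layout. Only the name `dataset_to_parameters` comes from the repository; the key names (`min_shapelet_length`, `max_shapelet_length`, `n_shapelets`), the values, and the helper `candidate_lengths` are assumptions made for this example, so check the actual dictionary in the code before editing it.

```python
# Hypothetical layout: the key names and values below are placeholders,
# not the settings used in the repository.
dataset_to_parameters = {
    "Synthetic": {
        "min_shapelet_length": 10,
        "max_shapelet_length": 20,
        "n_shapelets": 5,
    },
    "Beef": {
        "min_shapelet_length": 30,
        "max_shapelet_length": 60,
        "n_shapelets": 10,
    },
}


def candidate_lengths(params, step=1):
    """List the shapelet lengths to try, from the minimum to the maximum inclusive."""
    low = params["min_shapelet_length"]
    high = params["max_shapelet_length"]
    if low < 1 or high < low:
        raise ValueError("expected 1 <= min_shapelet_length <= max_shapelet_length")
    return list(range(low, high + 1, step))


lengths = candidate_lengths(dataset_to_parameters["Synthetic"], step=5)
print(lengths)
```

Widening the length range or raising the number of shapelets increases the number of candidates to evaluate, so it is usually worth changing one setting at a time.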
The experiment results, including classification accuracy and quality measure evaluations, are stored in JSON files within the experiments_results directory. You can analyze these results to gain insights into the performance of different models and quality measures on various datasets.
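A minimal sketch of such an analysis follows. It assumes a result file that maps each dataset name to a mapping from quality measure to accuracy; this schema, the placeholder names (`dataset_x`, `measure_a`, and so on), and the numbers are made up for illustration and may not match the files the notebooks produce. To work on a real file, replace `json.loads(example_json)` with `json.load` on the opened file.

```python
import json


def best_measure_per_dataset(results):
    """For each dataset, return the quality measure with the highest accuracy."""
    return {name: max(scores, key=scores.get) for name, scores in results.items()}


def mean_accuracy_per_measure(results):
    """Average the accuracy of each quality measure over all datasets."""
    collected = {}
    for scores in results.values():
        for measure, accuracy in scores.items():
            collected.setdefault(measure, []).append(accuracy)
    return {measure: sum(v) / len(v) for measure, v in collected.items()}


# Placeholder content standing in for a result file (values are invented).
example_json = """
{
  "dataset_x": {"measure_a": 0.75, "measure_b": 0.5},
  "dataset_y": {"measure_a": 0.25, "measure_b": 0.5}
}
"""

results = json.loads(example_json)
best = best_measure_per_dataset(results)
means = mean_accuracy_per_measure(results)
print(best)
print(means)
```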