marcodigennaro/windml


WindML

This repo provides a tutorial on predicting wind turbine performance from weather parameters.

Data are based on a subset of the ENGIE open dataset.

Getting Started

Prerequisites

  • Python ^3.8
  • Poetry

Installation

# Navigate to your local folder
cd /your/local/folder

# Clone the WindML repository
git clone https://github.com/marcodigennaro/windml  

# Enter the folder
cd windml/

# Install the package
poetry install

# Activate the environment
source .venv/bin/activate

# Start Jupyter Lab
jupyter-lab  

Run any of the Jupyter notebooks to visualize the data and run the ML algorithms.

Data

Data are available at this URL. Since the URL is not always reachable, a copy of the data is included in this package's data folder.

Content of the Jupyter Notebooks

  1. Scalability

    • Tests the memory and speed performance of four Python libraries. The results are shown below (measured on a 3.1 GHz dual-core Intel Core i5 with 8 GB of 2133 MHz memory).

      File          Size (MB)
      R80736.csv    51.41
      R80721.csv    51.20
      R80790.csv    51.03
      R80711.csv    51.68

      Library    Time (s)    Max memory usage (MB)
      pandas     6.13        735.23
      dask       18.38       603.20
      vaex       3.11        268.40
      modin      11.60       245.38
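A benchmark of this kind can be sketched with the standard library alone. The snippet below is a minimal illustration, not the notebook's actual code: the `benchmark` helper, the stand-in loader, and the synthetic CSV are all assumptions made for this example. Note that `tracemalloc` only reports allocations made through Python's memory allocator, so its figures may differ from process-level measurements such as those in the table above.

```python
import csv
import os
import tempfile
import time
import tracemalloc


def benchmark(load, path):
    """Return (elapsed seconds, peak traced memory in MB) for load(path)."""
    tracemalloc.start()
    start = time.perf_counter()
    load(path)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak / 1e6


def load_with_csv_module(path):
    """Stand-in loader; swap in e.g. pandas.read_csv to compare libraries."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


# Write a small synthetic CSV so the sketch runs without the real data files.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "synthetic.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["timestamp", "wind_speed", "power"])
    for i in range(10_000):
        writer.writerow([i, i % 25, (i % 25) ** 3])

elapsed, peak_mb = benchmark(load_with_csv_module, csv_path)
print(f"csv module: {elapsed:.3f} s, peak {peak_mb:.2f} MB")
```

Passing a different loader to `benchmark` (one per library) gives comparable timings as long as every loader reads the same file.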
  2. Time Series and Forecast: learning from the past

    • Calculates and visualises three quantities as functions of time (Average Energy, Produced Energy, and Capacity Factor)

    • Performs auto-regression analysis and regularized gradient boosting (XGBoost)
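To make two of these ideas concrete, here is a minimal sketch in plain Python. It assumes the usual definition of capacity factor (average power divided by rated power) and fits a first-order autoregressive model by ordinary least squares. The readings, the rated power, and both helper functions are hypothetical, and the notebook itself may well use a dedicated time-series library instead.

```python
def capacity_factor(mean_power_kw, rated_power_kw):
    """Ratio of the average power output to the turbine's rated power."""
    return mean_power_kw / rated_power_kw


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


# Hypothetical average power readings in kW (not taken from the dataset).
power_kw = [400.0, 500.0, 550.0, 575.0, 587.5, 593.75]
rated_kw = 2000.0  # assumed rated power, for illustration only

mean_power = sum(power_kw) / len(power_kw)
cf = capacity_factor(mean_power, rated_kw)

c, phi = fit_ar1(power_kw)
forecast = c + phi * power_kw[-1]  # one-step-ahead prediction
print(f"capacity factor: {cf:.3f}, c={c:.1f}, phi={phi:.2f}, next={forecast:.1f}")
```

An AR(1) model is the simplest case of autoregression; higher orders regress on more past values in the same way.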

  3. Data Analysis

    • Extract, transform, and load (ETL)
    • Exploratory Data Analysis (EDA)
    • Feature Selection Analysis

  4. Machine Learning

    • Fits several regression models: linear, polynomial, and kernel ridge regression
    • Builds a pipeline that includes a grid search for hyperparameter optimization
    • Compares how the models learn by plotting learning curves
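The pipeline-plus-grid-search step might look roughly like the following scikit-learn sketch. The data are synthetic (power growing with the cube of wind speed plus noise), and the step names, parameter grid, and scoring metric are assumptions chosen for illustration rather than the notebook's actual settings.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: power grows roughly with the cube of wind speed.
rng = np.random.default_rng(0)
wind_speed = rng.uniform(3.0, 12.0, size=200)
power = 0.5 * wind_speed**3 + rng.normal(0.0, 5.0, size=200)
X = wind_speed.reshape(-1, 1)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", KernelRidge(kernel="rbf")),
])

# The "model__" prefix routes each parameter to the pipeline step it belongs to.
param_grid = {
    "model__alpha": [0.01, 0.1, 1.0],
    "model__gamma": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, power)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the scaler inside the pipeline means the cross-validation folds never see statistics computed from their own held-out data.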

Author

Marco Di Gennaro

License

This project is licensed under the GPL v3 License. See the LICENSE.md file for details.

Acknowledgements

  • A previous analysis on this database can be found here
  • More on the XGBoost algorithm can be found here