This repository contains the GraphLeak dataset, a comprehensive dataset designed for locating and identifying leaks in water distribution networks (WDN). The dataset is intended to support researchers in developing and evaluating water leak detection models, particularly those utilizing deep learning techniques.
Note: Please refer to the corresponding folder in the folder list above for information about a specific publication. All of them use the same data generation and structure proposed in GraphLeak.
The management of water resources and the reduction of water losses due to leaks are crucial for human life and industrial processes. To improve the efficiency of leak detection algorithms, a realistic dataset with reliable values is essential. GraphLeak is a dataset created through realistic simulations using the EPANET-MATLAB toolkit. It includes various WDN scenarios and topologies, with each node representing a measurement point within the network.
- Dataset
- Water leak detection
- Deep learning
- EPANET simulation
Deep learning algorithms rely on high-quality data for accurate training and evaluation. GraphLeak provides a comprehensive dataset in tabular format, where each column represents a specific variable measured by individual sensors. The dataset includes information on pressure, flow, volume, label, and localization. The simulations are conducted using the EPANET WDN modeling software, and the datasets are exported to CSV (Comma-Separated Values) files.
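As a rough illustration of the tabular layout described above, the sketch below parses a small CSV sample into feature vectors plus a leak label and a leak location. The column names (pressure_1, flow_1, label, location) and all sample values are hypothetical and chosen only for this example. The actual GraphLeak files may use different headers and more columns.

```python
import csv
import io

# Hypothetical sample: column names and values are invented for illustration
# and are not taken from the real GraphLeak CSV files.
SAMPLE = """pressure_1,pressure_2,flow_1,label,location
52.1,49.8,3.2,0,0
51.7,44.3,4.9,1,2
"""

TARGET_COLUMNS = ("label", "location")


def load_rows(text):
    """Split each CSV row into (sensor features, leak label, leak location)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Every column that is not a target is treated as a sensor reading.
        features = [float(v) for k, v in row.items() if k not in TARGET_COLUMNS]
        rows.append((features, int(row["label"]), int(row["location"])))
    return rows


if __name__ == "__main__":
    for features, label, location in load_rows(SAMPLE):
        print(features, label, location)
```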
The results obtained by a Multi-layer Perceptron are evaluated with the main classification metrics derived from the confusion matrix: accuracy, precision, recall, and F1-score.
The Mean Absolute Percentage Error (MAPE) is used to analyze the error between predictions and the correct values.
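For reference, these metrics can be computed from raw predictions as in the minimal sketch below. It assumes a binary task in which label 1 means "leak", and it skips zero-valued targets in the MAPE to avoid division by zero. Both choices, and the function names, are assumptions made for this example, not a description of the evaluation code used in the publications.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary task (positive = leak, assumed)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; zero targets are skipped."""
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(mape([100.0, 200.0], [90.0, 220.0]))
```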
All the contents of GraphLeak are public and can be accessed here
- Python3
- All the libraries listed in requirements.txt
From the raw data, generate the dataset by running:
python3 main.py
Measurements Content - You can choose which measured values the dataset contains.
Pressure: True or False
Flow: True or False
Volume: True or False
Noise - To add Gaussian noise to the data, set Noise to True.
Noise: True
Noise specification - If noise is enabled, specify its configuration below:
mu: 0 (default mean)
sigma: 0.1 (default standard deviation)
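To make the role of mu and sigma concrete, the sketch below adds zero-mean Gaussian noise to a list of readings using only the standard library. It assumes the noise is additive and drawn independently per sample. The function name add_gaussian_noise and the seed parameter are hypothetical, and the way main.py applies noise internally may differ.

```python
import random


def add_gaussian_noise(values, mu=0.0, sigma=0.1, seed=None):
    """Return a copy of `values` with additive N(mu, sigma^2) noise.

    Assumption: noise is additive and independent for each sample.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [v + rng.gauss(mu, sigma) for v in values]


if __name__ == "__main__":
    pressures = [52.1, 51.7, 49.8]  # made-up readings for illustration
    print(add_gaussian_noise(pressures, mu=0.0, sigma=0.1, seed=42))
```

Passing a fixed seed keeps the noisy dataset reproducible between runs, which helps when comparing models trained on the same noisy data.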
Nodes Normalization - Set True (recommended) to normalize values between nodes.
Node_normalization: True
Data Normalization - Set True (recommended) to normalize values in the range 0 to 1.
Data_normalization: True
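Scaling values to the range 0 to 1 is commonly done with min-max normalization, sketched below. This is an assumption about what Data_normalization does, not a confirmed description of the GraphLeak code. The per-node helper shows one possible way to scale each node's series independently, and the node names are hypothetical.

```python
def min_max_scale(values):
    """Scale a sequence linearly so its minimum maps to 0 and maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map it to 0 to avoid dividing by 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_per_node(readings):
    """Apply min-max scaling to each node's series independently.

    `readings` maps a node name to its list of measurements.
    """
    return {node: min_max_scale(series) for node, series in readings.items()}


if __name__ == "__main__":
    readings = {"node_1": [2.0, 4.0, 6.0], "node_2": [10.0, 10.0, 10.0]}
    print(scale_per_node(readings))
```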
Lucas Roberto Tomazini; Weliton do Carmo Rodrigues; Rodrigo Pita Rolle; Alexandre da Silva Simões; Esther Luna Colombini; Eduardo Paciência Godoy;
Please cite one of the following papers if you use this code in your research:
@article{xx,
title={GraphLeak: A realistic dataset to detect and locate leaks in water distribution networks},
author={xx},
journal={xx},
volume={xx},
pages={xx},
year={xx},
publisher={xx}
}