Random Clusters Generator

Generate synthetic datasets with well-defined clusters drawn from a normal distribution. You choose how many significant columns form the clusters, and you can optionally add dummy columns, which carry no cluster structure and add noise to the dataset.

Key Features:

Defined Clusters: Create datasets with clearly defined clusters, ideal for applying clustering algorithms.
Normal Distribution: Values are drawn from a normal distribution, and the standard deviation controls how tightly each cluster is grouped.
Significant Columns Configuration: Customize the number of columns forming clusters, allowing you to adjust the complexity of your datasets.
Optional Dummy Columns: Add dummy columns to introduce variability and noise, mimicking the irrelevant features often found in real-world data.
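The following is a minimal sketch of the idea, not the project's actual implementation: each cluster gets a center in the significant columns, its instances are drawn from a normal distribution around that center, and dummy columns are filled with noise unrelated to the cluster. The function name `generate_clusters`, the uniform [0, 1) placement of centers, the uniform dummy values, and the trailing label column are all assumptions made for illustration. Note that random centers are not guaranteed to be well separated.

```python
import random


def generate_clusters(clusters_num, instances, significant_num, dummy_num,
                      standard_dev, seed=None):
    """Hypothetical sketch: Gaussian clusters plus uniform dummy columns.

    Returns a list of rows: significant values, then dummy values,
    then the integer cluster label (the label column is an assumption).
    """
    rng = random.Random(seed)  # seeded generator for reproducible datasets
    rows = []
    for label in range(clusters_num):
        # Assumption: each cluster center is drawn uniformly from [0, 1)
        # in every significant column.
        center = [rng.random() for _ in range(significant_num)]
        for _ in range(instances):
            # Significant columns: normal noise around the cluster center.
            significant = [rng.gauss(mu, standard_dev) for mu in center]
            # Dummy columns: uniform noise, independent of the cluster.
            dummy = [rng.random() for _ in range(dummy_num)]
            rows.append(significant + dummy + [label])
    return rows


rows = generate_clusters(clusters_num=4, instances=10, significant_num=3,
                         dummy_num=3, standard_dev=0.05, seed=42)
print(len(rows), len(rows[0]))  # 40 rows, 3 + 3 + 1 = 7 columns
```

With a fixed `seed` the output is reproducible, which helps when comparing several clustering algorithms on exactly the same data.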

Project Status

🚀 In Development | Production Ready

This project is in constant evolution to enhance and expand its functionalities. We welcome any community contributions to make it even more robust. While the current version is ready for deployment in production environments, we are committed to continuous improvement and optimization of the code. Feel free to explore, use, and contribute to the project. Refer to the Contribution section for more details on how to get involved in development. Your feedback and suggestions are valuable to us. Together, we can make this project even better!

Project Structure

This project consists of four main files, each serving a specific purpose:

main_generator.py: Generates a dataset from the parameters configured in the script. It creates customized datasets with defined clusters using a normal distribution; users can specify the number of significant columns forming the clusters and choose to include additional dummy columns for added variability.
main_generator_parameters.py: This file generates datasets based on parameters specified in a CSV file. Users can provide a CSV file containing configuration details, and the script will use this information to create datasets accordingly.
config_gen.py: This file generates a CSV file in the config folder containing various combinations of data parameters, including the number of clusters, the number of significant features, the number of dummy features, and standard deviation values.
add_dummy_columns.py: The purpose of this file is to add dummy columns to an existing CSV file. It takes a CSV file as input and appends additional columns with dummy data, introducing variability and noise to the dataset.
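A hypothetical sketch of what `add_dummy_columns.py` does, assuming the input CSV has a header row and that dummy values are uniform in [0, 1). The function name, the `dummy_0`, `dummy_1`, ... column names, and working on CSV text rather than a file path are illustrative choices, not the script's real interface.

```python
import csv
import io
import random


def add_dummy_columns(csv_text, dummy_num, seed=None):
    """Hypothetical sketch: append dummy_num noise columns to CSV text."""
    rng = random.Random(seed)
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumption: the first row is a header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [f"dummy_{i}" for i in range(dummy_num)])
    for row in reader:
        # Original values are kept unchanged; dummy values are uniform in [0, 1).
        writer.writerow(row + [f"{rng.random():.6f}" for _ in range(dummy_num)])
    return out.getvalue()


original = "x,y,label\n0.12,0.48,0\n0.91,0.77,1\n"
print(add_dummy_columns(original, dummy_num=2, seed=0))
```

Because the existing columns are copied verbatim, any cluster labels already present in the file stay intact.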


How to Use

Configuration Parameters in `main_generator.py`

DATA_PATH: The directory where the generated datasets are saved. Example: DATA_PATH = "data"
CLUSTERS_NUM: The number of clusters to be generated. It must not exceed SIGNIFICANT_NUM^2. Example: CLUSTERS_NUM = 4
INSTANCES: The number of instances per cluster in the generated datasets. Example: INSTANCES = 10
SIGNIFICANT_NUM: The number of significant columns forming the clusters. Together with CLUSTERS_NUM it must satisfy CLUSTERS_NUM <= SIGNIFICANT_NUM^2; with the example value, up to 9 clusters are allowed. Example: SIGNIFICANT_NUM = 3
DUMMY_NUM: The number of dummy columns to be included, adding variability and noise to the datasets. Example: DUMMY_NUM = 3
STANDARD_DEV: The standard deviation of the normal distribution used to generate the data; larger values make each cluster more spread out. Example: STANDARD_DEV = 0.05
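The constants above can be bundled and checked before any data is generated. This sketch uses a hypothetical `GeneratorConfig` dataclass (not part of the project) whose defaults are the example values listed above. It enforces the stated maximum for CLUSTERS_NUM and reports the size of the resulting table, assuming one row per instance per cluster and not counting any label column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical bundle of the main_generator.py constants (example defaults)."""
    data_path: str = "data"
    clusters_num: int = 4
    instances: int = 10
    significant_num: int = 3
    dummy_num: int = 3
    standard_dev: float = 0.05

    def validate(self):
        """Raise ValueError if the parameters cannot produce a valid dataset."""
        if min(self.clusters_num, self.instances, self.significant_num) < 1:
            raise ValueError("CLUSTERS_NUM, INSTANCES and SIGNIFICANT_NUM must be >= 1")
        if self.dummy_num < 0:
            raise ValueError("DUMMY_NUM must be >= 0")
        if self.standard_dev <= 0:
            raise ValueError("STANDARD_DEV must be positive")
        # Upper bound from the CLUSTERS_NUM description: at most SIGNIFICANT_NUM^2.
        if self.clusters_num > self.significant_num ** 2:
            raise ValueError(
                f"CLUSTERS_NUM={self.clusters_num} exceeds "
                f"SIGNIFICANT_NUM^2={self.significant_num ** 2}"
            )

    def expected_shape(self):
        """(rows, feature columns): one row per instance per cluster; label not counted."""
        return (self.clusters_num * self.instances,
                self.significant_num + self.dummy_num)


config = GeneratorConfig()
config.validate()
print(config.expected_shape())  # (40, 6)
```

Failing fast on an invalid combination avoids writing a partial dataset to DATA_PATH.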

Configuration Parameters in `main_generator_parameters.py`

The main_generator_parameters.py script generates datasets based on configuration parameters specified in a CSV file. The CSV file should have the following columns:

clusters_num: The number of clusters to be generated. It influences the structure of the datasets. Example: clusters_num = 4
significant_num: The number of significant columns forming the clusters. It must satisfy the condition clusters_num <= significant_num^2. Example: significant_num = 3
dummy_num: The number of dummy columns to be included, adding variability and noise to the datasets. Example: dummy_num = 3
standard_dev: The standard deviation of the normal distribution used to generate the data; larger values make each cluster more spread out. Example: standard_dev = 0.05

To use this script, create a CSV file with these columns and corresponding values, and then run the script by specifying the path to your CSV file.

Example CSV file:

```csv
clusters_num,significant_num,dummy_num,standard_dev
4,2,2,0.10
4,6,3,0.05
```
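Reading a configuration file like this might look like the sketch below, which uses Python's standard `csv.DictReader`. The function name `read_configs` is hypothetical, and the check applies the upper bound clusters_num <= significant_num^2, the reading consistent with both example rows (4 <= 2^2 and 4 <= 6^2).

```python
import csv
import io

EXAMPLE_CSV = """clusters_num,significant_num,dummy_num,standard_dev
4,2,2,0.10
4,6,3,0.05
"""


def read_configs(csv_text):
    """Hypothetical sketch: parse each CSV row into a typed parameter dict."""
    configs = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        cfg = {
            "clusters_num": int(row["clusters_num"]),
            "significant_num": int(row["significant_num"]),
            "dummy_num": int(row["dummy_num"]),
            "standard_dev": float(row["standard_dev"]),
        }
        # Assumed constraint: clusters_num <= significant_num^2.
        if cfg["clusters_num"] > cfg["significant_num"] ** 2:
            raise ValueError(f"line {line_no}: clusters_num exceeds significant_num^2")
        configs.append(cfg)
    return configs


for cfg in read_configs(EXAMPLE_CSV):
    print(cfg)
```

Each parsed row could then drive one dataset generation. To read from disk, open the file with `open(path, newline="")` and pass the file object to `csv.DictReader` instead of `io.StringIO`.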


Contribution

🎉 We welcome and encourage community contributions to enhance this project. Whether you want to report issues, propose new features, or submit improvements, your collaboration is valuable.

How to Contribute

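Fork the Repository:
- Fork the repository to your GitHub account.

Clone the Repository:
- Clone your fork to your local machine. The URL below points to the original repository; replace josemarialuna with your GitHub username to clone your fork instead.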

```shell
git clone https://github.com/josemarialuna/RandomClustersGenerator.git
cd RandomClustersGenerator
```


License

This project is licensed under the MIT License; see the LICENSE file for details.


Contact

José María Luna-Romera - Website
