Welcome to threat2yar, an advanced pipeline for automating the download, conversion, and validation of various threat data into YARA rules, enhancing cybersecurity analysis and research.
threat2yar encompasses four main scripts, each with a distinct role in processing threat data:
- Download Data (download.py): Retrieves threat data from specified sources.
- Generate YARA Rules (convert_yara.py): Transforms threat descriptions into YARA rules.
- Validate YARA Rules (validate_yara.py): Assesses rule quality and syntax.
- Generate Regex Rules (generate_regex.py): Utilizes string similarity and OpenAI to generate regular expressions from extracted strings.
Each component contributes to a robust process for thorough threat analysis.
- Configuration (config.py): Centralized settings for the pipeline.
- Common Functions (common.py): Shared utilities across scripts.
Script Name: download.py
- Automated download of threat data based on flexible URL patterns.
- Creation of placeholder files for missing data.
- Configurable local storage of downloaded data.
- Rate control and TOR integration for discreet downloads.
- Multi-threaded approach for efficient data retrieval.
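
The download behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in download.py: the function name `download_range`, the `{id}` placeholder in the URL pattern, the `.txt` and `.missing` file suffixes, and the injected `fetch` callable are all assumptions made for the example. TOR routing is left out, and rate control is reduced to a fixed pause before each request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Optional


def download_range(
    fetch: Callable[[str], Optional[bytes]],
    url_pattern: str,
    min_id: int,
    max_id: int,
    out_dir: Path,
    workers: int = 4,
    delay: float = 0.0,
) -> dict:
    """Fetch every ID in [min_id, max_id] and store the results in out_dir.

    `fetch` is a hypothetical callable that returns the body for a URL,
    or None when the source has no data for that ID.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    def task(item_id: int) -> tuple:
        time.sleep(delay)  # fixed pause standing in for real rate control
        data = fetch(url_pattern.format(id=item_id))
        if data is None:
            # An empty placeholder records that this ID was already tried.
            (out_dir / f"{item_id}.missing").touch()
            return item_id, "missing"
        (out_dir / f"{item_id}.txt").write_bytes(data)
        return item_id, "ok"

    # Worker threads overlap the time spent waiting on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, range(min_id, max_id + 1)))
```

Passing the fetcher in as an argument keeps the sketch independent of any particular HTTP client or proxy setup.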
Script Name: convert_yara.py
- Processes downloaded threat data.
- Utilizes OpenAI's GPT-3.5-turbo model for YARA rule generation.
- Systematic storage of generated YARA rules.
- Accurate and relevant YARA rule generation.
- Customizable settings for API interactions and data processing.
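
As a rough sketch of the conversion step, the example below reads one downloaded file, builds a prompt, and stores the returned rule. The prompt wording, the `.yar` suffix, and the function name `convert_file` are assumptions for illustration. The model call is represented by a caller-supplied `generate` function, so no particular OpenAI client interface is implied.

```python
from pathlib import Path
from typing import Callable

# Hypothetical prompt; the wording used by convert_yara.py may differ.
PROMPT_TEMPLATE = (
    "Write a single YARA rule that detects the threat described below. "
    "Return only the rule.\n\n{description}"
)


def convert_file(src: Path, rules_dir: Path, generate: Callable[[str], str]) -> Path:
    """Turn one threat description file into a stored YARA rule file.

    `generate` takes the prompt text and returns the model's reply.
    """
    description = src.read_text(encoding="utf-8")
    rule_text = generate(PROMPT_TEMPLATE.format(description=description))
    rules_dir.mkdir(parents=True, exist_ok=True)
    # Name the rule file after its source so the two stay paired.
    dest = rules_dir / (src.stem + ".yar")
    dest.write_text(rule_text.strip() + "\n", encoding="utf-8")
    return dest
```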
Script Name: validate_yara.py
- Syntax verification of YARA rules using the YARA binary.
- Uses OpenAI API for automated error correction.
- Sorts YARA rules into categories based on syntax, content, and complexity.
- Syntax validation and automatic correction process.
- Complexity analysis to filter out overly simplistic or convoluted rules.
- Organized categorization of rules for easy management.
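
One simple way to approximate the complexity filter is to count the string definitions in a rule and sort it into a bucket. The sketch below assumes that metric, along with hypothetical thresholds and category names; validate_yara.py may measure complexity differently, and the syntax check against the YARA binary is not shown.

```python
import re

# Matches a string definition such as `$a = "text"` or `$ = { 4D 5A }`
# at the start of a line; references like `$a and $b` do not match.
STRING_DEF = re.compile(r"^\s*\$\w*\s*=", re.MULTILINE)


def classify_rule(rule_text: str, min_strings: int = 2, max_strings: int = 20) -> str:
    """Return a category name based on how many strings the rule defines."""
    count = len(STRING_DEF.findall(rule_text))
    if count < min_strings:
        return "too_simple"
    if count > max_strings:
        return "too_complex"
    return "accepted"
```

The returned category name could then be used as the destination folder when sorting rule files.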
Script Name: generate_regex.py
- Extracts strings from YARA rules and categorizes them based on similarity.
- Utilizes OpenAI API to generate regular expressions from string clusters.
- Filters regex rules based on complexity and relevance.
- Efficient clustering of strings to optimize regex generation.
- Integration with OpenAI API for accurate regex generation.
- Automatic filtering of complex or irrelevant regex patterns.
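
The clustering step can be illustrated with a greedy grouping based on the standard library's `difflib.SequenceMatcher`. This is a sketch under assumptions: the similarity measure, the 0.7 threshold, and the function name `cluster_strings` are chosen for the example and are not taken from generate_regex.py.

```python
from difflib import SequenceMatcher
from typing import List


def cluster_strings(strings: List[str], threshold: float = 0.7) -> List[List[str]]:
    """Greedily group strings whose similarity to a cluster's first member
    meets the threshold; unmatched strings start a new cluster."""
    clusters: List[List[str]] = []
    for s in strings:
        for cluster in clusters:
            if SequenceMatcher(None, s, cluster[0]).ratio() >= threshold:
                cluster.append(s)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([s])
    return clusters
```

Each resulting cluster would then be a candidate input for a single regular expression, which keeps the number of API requests proportional to the number of clusters rather than the number of strings.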
cli.py simplifies pipeline control, allowing direct command-line management of all stages.
- Unified access point for all pipeline stages.
- Customization options for parameters such as API key, download URL, and file patterns.
- Ability to run specific stages or the entire pipeline seamlessly.
Execute specific stages or the entire process via CLI:
python cli.py --api-key [Your_API_Key] --download-url [URL_Pattern] --min-id [Min_ID] --max-id [Max_ID] --file-pattern [File_Pattern] --stage [Stage]
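
For orientation, the flags in the command above could be declared with `argparse` along these lines. The flag names come from the command itself; the integer types for the ID bounds and the absence of defaults are assumptions, and the accepted values for `--stage` are not listed here because the source does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags shown in the usage line."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("--api-key")
    parser.add_argument("--download-url")
    parser.add_argument("--min-id", type=int)  # assumed to be an integer
    parser.add_argument("--max-id", type=int)  # assumed to be an integer
    parser.add_argument("--file-pattern")
    parser.add_argument("--stage")
    return parser
```

Calling `build_parser().parse_args()` yields a namespace with attributes such as `min_id` and `download_url`, since `argparse` converts dashes in flag names to underscores.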