Enterprises deploy many point solutions to defend their networks. These point solutions provide a wealth of data about enterprise assets and networks, but analyzing that data is difficult because there is no common repository and the events arrive in different formats. The Cybersec Toolkit is a pipeline that ingests, correlates, and prepares cybersecurity data for analytics. It uses the Cloudera Data Platform to build a Security Data Lakehouse.
The Cybersec Toolkit ingests raw log events from a variety of sources, parses and normalizes them into a common schema, enriches them with reference data, scores them, profiles them, and streams them to Kafka and to a data lakehouse. You can integrate with orchestration, investigation, or ticketing platforms by running Flink SQL (SQL Stream Builder) queries on the triaged event topic. You can query the data lakehouse with SQL for visualizations and ad hoc queries, or with Spark for notebooks, investigations, and machine learning model training.
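As a rough illustration of the stage order described above (parse and normalize, then enrich, then score), the following POSIX shell sketch pushes two sample events through three filters. It is not the toolkit's implementation; the log format, the field names, the reference list of known-bad IPs, and the scoring weights are all hypothetical, and the profiling and streaming stages are omitted.

```shell
#!/bin/sh
# Toy sketch of the stage ORDER only: parse/normalize -> enrich -> score.
# Hypothetical throughout: log format, field names, reference data, scoring weights.

# parse + normalize: map a positional firewall line onto named fields (hypothetical schema)
parse_normalize() {
  awk '{ printf "timestamp=%s host=%s action=%s ip_src_addr=%s ip_dst_addr=%s\n", $1, $2, $3, $4, $5 }'
}

# enrich: append a field derived from reference data (a hypothetical list of known-bad IPs)
enrich() {
  awk -v bad="203.0.113.7 198.51.100.9" '
    BEGIN { n = split(bad, list, " "); for (i = 1; i <= n; i++) known[list[i]] = 1 }
    {
      dst = ""
      for (i = 1; i <= NF; i++) if ($i ~ /^ip_dst_addr=/) dst = substr($i, 13)
      print $0, "known_bad_dst=" ((dst in known) ? "true" : "false")
    }'
}

# score: apply simple additive rules to the enriched event (hypothetical weights)
score() {
  awk '{
    s = 0
    if ($0 ~ /action=DENY/) s += 10
    if ($0 ~ /known_bad_dst=true/) s += 50
    print $0, "score=" s
  }'
}

# Run two sample raw events through the stages in order
printf '%s\n' \
  '2024-01-01T00:00:00Z fw01 DENY 10.0.0.5 203.0.113.7' \
  '2024-01-01T00:00:01Z fw01 ALLOW 10.0.0.6 192.0.2.10' |
  parse_normalize | enrich | score
```

Because each stage reads events on standard input and writes them on standard output, stages can be added, removed, or reordered independently of one another.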
The Cybersec Toolkit is flexible and configurable, so ingestion can be changed with little or no code.
The Cybersec Toolkit includes a Cloudera Manager parcel and service for easier installation.
Artifacts are available for download on the releases page. You can also find less stable but more up-to-date artifacts by selecting one of the successful runs on this page and scrolling to the bottom of the selected run's page.
Alternatively, you can build the artifacts from source and find them after the build in the following directories:
To build from source:
git clone https://github.com/cloudera/cybersec.git
cd cybersec/flink-cyber
mvn clean install
To build without running the tests, run the following from the same cybersec/flink-cyber directory:
mvn clean install -DskipTests