API Reference

This is the class and function reference for the library used in this repository.

Class implementing the k-nearest neighbors (KNN) classifier

  • src.models.classification.KnnClassifier: This class defines the KNN classifier with the specified hyperparameters and creates a StandardScaler for data standardization. If _load_model is True, it loads a pre-trained model and scaler from the specified path.
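The standardize-then-classify flow described above can be sketched with scikit-learn. This is a minimal illustration, not the repository's code: the class name `KnnClassifierSketch`, the `n_neighbors` default, and the `fit`/`predict` methods are hypothetical, and model/scaler persistence (the `_load_model` path) is not shown.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


class KnnClassifierSketch:
    """Hypothetical stand-in for KnnClassifier: standardize, then run KNN."""

    def __init__(self, n_neighbors: int = 5):
        # The scaler and the model are created together, as described above.
        self.scaler = StandardScaler()
        self.model = KNeighborsClassifier(n_neighbors=n_neighbors)

    def fit(self, x: np.ndarray, y: np.ndarray) -> "KnnClassifierSketch":
        # Fit the scaler on training embeddings only, then train KNN on scaled data.
        self.model.fit(self.scaler.fit_transform(x), y)
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        # Reuse the training statistics so train and test share one scale.
        return self.model.predict(self.scaler.transform(x))
```

Fitting the scaler only on training data and reusing it at prediction time keeps distance computations consistent between the two phases, which matters because KNN is sensitive to feature scale.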

src.models.gnn: GNN embeddings

Classes implementing Graph Neural Networks (GNNs) that generate embeddings in a self-supervised way.

  • src.models.gnn.GCN: This class implements the GCN (Graph Convolutional Network) model with the specified hyperparameters.
  • src.models.gnn.GCN_GRU: This class implements the GCN-GRU model with the specified hyperparameters. It sets up the model and optimizer, and uses CUDA when it is available and requested.
  • src.models.gnn.IncrementalGcnGru: This class implements the Incremental GCN-GRU model with the specified hyperparameters. It sets up the model and optimizer, and uses CUDA when it is available and requested.
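The core of a GCN is graph propagation: each node's representation is replaced by a normalized average over itself and its neighbors, then passed through a learned linear map. The NumPy sketch below shows a two-layer forward pass only (no training, no GRU). It assumes the row normalization D^-1 (A + I) suggested by `src.utils._normalize`; the original GCN paper uses the symmetric form D^-1/2 (A + I) D^-1/2 instead. The function names and weight shapes are hypothetical.

```python
import numpy as np


def row_normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops and row-normalize: D^-1 (A + I)."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def gcn_forward(adj: np.ndarray, features: np.ndarray,
                w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Hypothetical two-layer GCN: ReLU(A_norm X W1), then A_norm H W2."""
    a_norm = row_normalized_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # first layer + ReLU
    return a_norm @ hidden @ w2  # one embedding row per node
```

In the GCN-GRU variants, per-snapshot outputs of this kind would presumably be fed to a recurrent unit across time; that part is not sketched here.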

src.models.nlp: NLP embeddings

Class implementing i-DarkVec to generate embeddings in a self-supervised way.


src.preprocessing: Preprocessing Functions

Preprocessing functions used to generate GNN embeddings

  • src.preprocessing.gnn.extract_single_snapshot: Extract and format a single snapshot from a DataFrame for a given day.
  • src.preprocessing.gnn.aggregate_edges: Aggregate edges in a DataFrame while counting packets and adding labels.
  • src.preprocessing.gnn.get_contacted_dst_ports: Get the total number of contacted destination ports per source IP.
  • src.preprocessing.gnn.get_stats_per_dst_port: Get general statistics of packets per destination port.
  • src.preprocessing.gnn.get_contacted_src_ips: Get the total number of contacted source IPs per destination port.
  • src.preprocessing.gnn.get_stats_per_src_ip: Get general statistics of packets per source IP per destination port.
  • src.preprocessing.gnn.get_contacted_dst_ips: Get the total number of contacted darknet IPs per source IP or destination port.
  • src.preprocessing.gnn.get_stats_per_dst_ip: Get general statistics of packets per destination IP per source IP or destination port.
  • src.preprocessing.gnn.get_packet_statistics: Get general packet statistics per source IP or destination port.
  • src.preprocessing.gnn.uniform_features: Uniformly format and index features DataFrame based on node lookup.
  • src.preprocessing.gnn.generate_adjacency_matrices: Generate adjacency matrices from a list of DataFrame files.
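A minimal sketch of the edge-aggregation and adjacency steps, assuming a bipartite graph between source IPs and destination ports built from one snapshot DataFrame. The column names `src_ip`, `dst_port`, and `label`, the node ordering, and both function names are hypothetical; the real `aggregate_edges` and `generate_adjacency_matrices` may use a different graph layout, and the latter works over a list of files rather than a single DataFrame.

```python
import pandas as pd
import scipy.sparse as sp


def aggregate_edges_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse packet rows into (src_ip, dst_port) edges with a packet count."""
    grouped = df.groupby(["src_ip", "dst_port"])
    edges = grouped.size().rename("pkts").to_frame()
    edges["label"] = grouped["label"].first()  # keep one label per edge
    return edges.reset_index()


def adjacency_sketch(edges: pd.DataFrame):
    """Build a symmetric bipartite adjacency matrix weighted by packet counts."""
    ips = sorted(edges["src_ip"].unique())
    ports = sorted(edges["dst_port"].unique())
    # Node lookup: source IPs get indices first, destination ports follow.
    lookup_ip = {ip: i for i, ip in enumerate(ips)}
    lookup_port = {p: len(ips) + j for j, p in enumerate(ports)}
    rows = edges["src_ip"].map(lookup_ip).to_numpy()
    cols = edges["dst_port"].map(lookup_port).to_numpy()
    weights = edges["pkts"].to_numpy(dtype=float)
    n = len(ips) + len(ports)
    adj = sp.coo_matrix((weights, (rows, cols)), shape=(n, n))
    # Mirror the edges so the graph is undirected; return the node order too.
    return (adj + adj.T).tocsr(), ips + ports
```

Returning the node order alongside the matrix plays the role of a node lookup, which features can then be aligned to (as `uniform_features` does in the original).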

Preprocessing functions used to generate NLP embeddings

  • src.preprocessing.nlp.drop_duplicates: Remove consecutive duplicate elements from a NumPy array.
  • src.preprocessing.nlp.split_array: Split a NumPy array into smaller sub-arrays of a specified step size.
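Both NLP helpers can be sketched in a few lines of NumPy. A plausible use is cleaning a sequence of source IPs and cutting it into fixed-length "sentences" for i-DarkVec; that pairing, the `_sketch` names, and the choice to keep a shorter final chunk are assumptions.

```python
import numpy as np


def drop_duplicates_sketch(arr: np.ndarray) -> np.ndarray:
    """Keep an element only when it differs from the one right before it."""
    if arr.size == 0:
        return arr
    keep = np.concatenate(([True], arr[1:] != arr[:-1]))
    return arr[keep]


def split_array_sketch(arr: np.ndarray, step: int) -> list:
    """Cut a 1-D array into consecutive chunks of `step` items; the last may be shorter."""
    return np.split(arr, np.arange(step, len(arr), step))
```

For example, `drop_duplicates_sketch(np.array([1, 1, 2, 2, 2, 3, 1]))` keeps the non-adjacent repeat of `1` and yields `[1, 2, 3, 1]`, which is different from a global de-duplication.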

Generic preprocessing functions

  • src.preprocessing.preprocessing.generate_negatives: Generate negative edges for self-supervised training.
  • src.preprocessing.preprocessing.get_self_supervised_edges: Get self-supervised edges for training.
  • src.preprocessing.preprocessing.load_single_file: Load and preprocess a single data file.
  • src.preprocessing.preprocessing.apply_packets_filter: Apply a packet count filter to a DataFrame.
  • src.preprocessing.preprocessing.apply_port_filter: Apply a port count filter to a DataFrame.
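A minimal sketch of two of the generic steps, under stated assumptions: negatives are drawn uniformly as undirected node pairs that are not positive edges (a common choice for self-supervised link prediction, not necessarily the repository's sampler), and the packet filter keeps rows whose hypothetical `src_ip` column reaches a minimum packet count. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def generate_negatives_sketch(pos_edges: np.ndarray, num_nodes: int,
                              num_neg: int, seed: int = 0) -> np.ndarray:
    """Uniformly sample undirected node pairs that are not positive edges."""
    rng = np.random.default_rng(seed)
    # Store pairs as (min, max) so (u, v) and (v, u) count as the same edge.
    positives = {(min(u, v), max(u, v)) for u, v in pos_edges.tolist() if u != v}
    available = num_nodes * (num_nodes - 1) // 2 - len(positives)
    if num_neg > available:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < num_neg:
        u, v = rng.integers(0, num_nodes, size=2).tolist()
        pair = (min(u, v), max(u, v))
        if u != v and pair not in positives:  # skip self-loops and real edges
            negatives.add(pair)
    return np.array(sorted(negatives))


def apply_packets_filter_sketch(df: pd.DataFrame, min_packets: int) -> pd.DataFrame:
    """Keep only rows whose source IP sent at least `min_packets` packets."""
    counts = df["src_ip"].map(df["src_ip"].value_counts())
    return df[counts >= min_packets]
```

The guard on `available` prevents the rejection-sampling loop from running forever on a graph that is too dense to supply the requested number of negatives.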

src.utils: Utility Functions

Generic utility functions

  • src.utils._normalize: Row-normalize a sparse matrix.
  • src.utils._sparse_mx_to_torch_sparse_tensor: Convert a scipy sparse matrix to a torch sparse tensor.
  • src.utils.get_set_diff: Compute the set difference between two arrays A and B.
  • src.utils.compute_accuracy: Compute accuracy between true and predicted labels.
  • src.utils.get_diagonal_features: Get a sparse diagonal feature matrix.
  • src.utils.initalize_output_folder: Initialize an output folder for experiment results.
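Several of these utilities are short enough to sketch with NumPy and SciPy. The versions below are hypothetical (`_sketch` names), and `get_diagonal_features` is assumed to return an identity matrix, the usual featureless input for a GCN. The torch conversion helper is omitted to avoid a PyTorch dependency.

```python
import numpy as np
import scipy.sparse as sp


def normalize_sketch(mx: sp.spmatrix) -> sp.csr_matrix:
    """Row-normalize a sparse matrix; all-zero rows stay zero."""
    rowsum = np.asarray(mx.sum(axis=1), dtype=float).flatten()
    with np.errstate(divide="ignore"):
        r_inv = np.power(rowsum, -1.0)
    r_inv[np.isinf(r_inv)] = 0.0  # avoid dividing empty rows by zero
    return (sp.diags(r_inv) @ mx).tocsr()


def get_set_diff_sketch(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sorted unique values of a that do not appear in b."""
    return np.setdiff1d(a, b)


def compute_accuracy_sketch(y_true, y_pred) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def get_diagonal_features_sketch(num_nodes: int) -> sp.csr_matrix:
    """Sparse identity matrix: one one-hot feature vector per node."""
    return sp.identity(num_nodes, format="csr")
```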