Packet Stream Data Collection

Deep Sequence Models for Packet Stream Analysis and Early Decisions, LCN 2022.

Minji Kim, Dongeun Lee, Kookjin Lee, Doowon Kim, Sangman Lee, Jinoh Kim

Packet stream analysis is essential for identifying attack connections early, while they are still in progress, enabling timely responses to protect system resources. However, several challenges stand in the way of effective analysis, including out-of-order packet sequences introduced by network dynamics and class imbalance, with only a small fraction of attack connections available to characterize. To overcome these challenges, we present two deep sequence models: (i) a bidirectional recurrent structure designed for resilience to out-of-order packets, and (ii) a pre-training-enabled sequence-to-sequence structure designed to better handle unbalanced class distributions using self-supervised learning. We evaluate the presented models on a real network dataset created from month-long traffic traces collected from backbone links, together with the associated intrusion log. The experimental results support the feasibility of the presented models, with F1 scores of up to 94.8% using only the first five packets (k=5), outperforming baseline deep learning models.
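The abstract reports F1 rather than accuracy, which matters under the class imbalance it describes. As a minimal sketch with made-up labels, a classifier that never flags an attack reaches 95% accuracy on a 95:5 benign/attack split, yet its F1 for the attack class is 0. The abstract does not state whether its F1 is computed for the attack class, macro-averaged, or weighted; the binary, attack-class version below (`f1_score` is a hypothetical helper, not the authors' evaluation code) is an assumption for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` (attack) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced example: 95 benign (0) and 5 attack (1) connections.
y_true = [0] * 95 + [1] * 5
always_benign = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_benign)) / len(y_true)
print(accuracy, f1_score(y_true, always_benign))  # 0.95 0.0
```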

Datasets

In this work, we construct packet stream data to support the early identification of network attacks by combining public traffic traces with the corresponding intrusion detection logs. Specifically, we use real network traces and intrusion logs collected from backbone links in Japan (MAWILab). Each traffic trace contains TCP/IP packet header information in a pcap file, while the associated intrusion log is provided as a comma-separated values (CSV) file with attack information inferred by multiple detectors. Each pcap file records 15 minutes of traffic collected on a specific day. We extract flow information from 25 days of network traffic collected in September 2020, excluding five days for which data were unavailable (the 3rd, 14th, 27th, 28th, and 29th).
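The README does not describe how flows are extracted from the pcap files. As a minimal sketch, assuming flows are keyed by a direction-independent 5-tuple and that the sender of a flow's first packet is the client, per-packet features (size, interarrival time, c2s) could be derived for the first K packets of each flow as follows. `Packet`, `flow_key`, and `first_k_features` are hypothetical names, and reading the pcap itself is out of scope here.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float    # capture timestamp in seconds
    src: str     # source IP address
    sport: int   # source port
    dst: str     # destination IP address
    dport: int   # destination port
    proto: int   # IP protocol number (6 = TCP)
    size: int    # packet length in bytes


def flow_key(p):
    """Direction-independent 5-tuple: both directions of a connection share one key."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (p.proto, *lo, *hi)


def first_k_features(packets, k):
    """Map flow key -> [(size, interarrival, c2s), ...] for the first k packets.

    Assumptions (not from the README): packets are given in capture order,
    the sender of a flow's first packet is the client, the first packet's
    interarrival time is 0, and flows with fewer than k packets are dropped.
    """
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)

    features = {}
    for key, pkts in flows.items():
        if len(pkts) < k:
            continue
        client = (pkts[0].src, pkts[0].sport)
        prev_ts = pkts[0].ts
        rows = []
        for p in pkts[:k]:
            c2s = int((p.src, p.sport) == client)  # 1 if sent client -> server
            rows.append((p.size, p.ts - prev_ts, c2s))
            prev_ts = p.ts
        features[key] = rows
    return features
```

Keeping capture order, rather than reordering by TCP sequence number, leaves any out-of-order arrivals in the sequence, which is the condition the bidirectional model described in the abstract is designed to tolerate.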

Each record in the dataset contains the packet size, the packet interarrival time, c2s (client-to-server direction), and the taxonomy label. K is the number of packets measured per flow (e.g., 0901_0930_K_3.csv contains the first 3 packets of each flow).

MAWILabSep2020/
   ├── 0901_0930_K_3.csv
   ├── 0901_0930_K_5.csv
   ├── 0901_0930_K_10.csv
   └── 0901_0930_K_20.csv
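The file names above follow a fixed pattern, so a loader can pick the file for a given K by parsing the name. A minimal sketch, assuming the pattern `<MMDD>_<MMDD>_K_<k>.csv` seen in the listing holds for every file; `parse_name`, `index_by_k`, and `load_rows` are hypothetical helpers, and because the exact CSV column names are not documented here, `load_rows` returns dicts keyed by whatever header row the file has.

```python
import csv
import re
from pathlib import Path

# Naming scheme from the listing: <MMDD start>_<MMDD end>_K_<k>.csv,
# e.g. 0901_0930_K_5.csv: the 0901-0930 window, first 5 packets per flow.
NAME_RE = re.compile(r"^(\d{4})_(\d{4})_K_(\d+)\.csv$")


def parse_name(filename):
    """Return (start, end, k) parsed from a dataset filename."""
    m = NAME_RE.match(Path(filename).name)
    if m is None:
        raise ValueError(f"unexpected dataset filename: {filename}")
    return m.group(1), m.group(2), int(m.group(3))


def index_by_k(paths):
    """Map each K to its file path, skipping files that do not match the scheme."""
    index = {}
    for p in paths:
        try:
            index[parse_name(p)[2]] = Path(p)
        except ValueError:
            continue
    return index


def load_rows(path):
    """Read one CSV file into a list of dicts keyed by its header row.

    Column names are not documented in this README, so inspect
    rows[0].keys() before relying on specific fields.
    """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `index_by_k(Path("MAWILabSep2020").glob("*.csv"))[5]` would select the K = 5 file, assuming the downloaded folder keeps the layout shown above.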

Due to file size limitations, the dataset is shared via a Google Drive link.

Issues

If you have questions about your rights to use the dataset, please contact dcs.tamuc@gmail.com.

Acknowledgements, Usage & License

This work was supported in part by the Texas A&M University Presidential GAR Initiative program.

If you find our work useful in your research, or if you use any part of this dataset or code, please consider citing our paper:
