-
Notifications
You must be signed in to change notification settings - Fork 12
/
README
105 lines (88 loc) · 4.91 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
/*
* Polygraph (release 0.1)
* Signature generation algorithms for polymorphic worms
*
* Copyright (c) 2004-2005, Intel Corporation
* All Rights Reserved
*
* This software is distributed under the terms of the Eclipse Public
* License, Version 1.0 which can be found in the file named LICENSE.
* ANY USE, REPRODUCTION OR DISTRIBUTION OF THIS SOFTWARE CONSTITUTES
* RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT
*/
This software is a test harness for the algorithms and techniques described in:
``Polygraph: Automatically generating signatures for polymorphic worms''
Citation:
James Newsome, Brad Karp, and Dawn Song.
Polygraph: Automatically generating signatures for polymorphic worms.
In Proceedings of the IEEE Symposium on Security and Privacy, May 2005.
The included software can be used to verify our results, and experiment
with variations of our techniques. It is NOT intended to be used as a
production system for generating worm signatures. We are aware of several
attacks against these signature generation algorithms. Some of these attacks
are described in the Polygraph paper itself, others in:
James Newsome, Brad Karp, and Dawn Song.
Paragraph: Thwarting Signature Learning by Training Maliciously.
In Proceedings of the International Symposium on Recent Advances in
Intrusion Detection (RAID), September 2006.
Other attacks are described in:
Roberto Perdisci, David Dagon, Wenke Lee, Prahlad Fogla, and Monirul Sharif.
Misleading Worm Signature Generators Using Deliberate Noise Injection.
In Proceedings of the IEEE Symposium of the IEEE Symposium on Security
and Privacy, May 2006.
This software has been tested on Redhat 9.3, using Python 2.3.4. We are
unable to provide help on running this software in other environments.
This software was written by James Newsome (jnewsome@ece.cmu.edu), in
collaboration with Brad Karp (brad.n.karp@intel.com) and
Dawn Song (dawnsong@cmu.edu).
***************************************************************************
Overview
***************************************************************************
This package contains the polygraph code, in the 'polygraph' directory,
and scripts to run the experiments in our paper, in the 'experiments' directory.
For instructions on installing the polygraph code and running the experiments,
see README.install.
The source code for polygraph is organized as follows:
bin # support scripts
pysary # python wrapper to sary suffix array library
pysubseq # Smith-Waterman sequence alignment
sig_gen # signature generation
sigprob # false positive rate estimation
sutil # efficient string processing
trace_crunching # pcap trace processing
util # miscellaneous
worm_gen # generators for (synthetic) polymorphic worms
***************************************************************************
Known Issues
***************************************************************************
The script for reconstructing the network streams from pcap trace files
(polygraph/bin/reconstruct_streams) can process multiple traces, but does
not reconstruct streams that span over multiple traces. That is, a stream
that starts in one pcap file and finishes in the next pcap file will be
assembled into two separate streams.
In our experiments, noise workloads are generates using samples from the
evaluation traces. That is, the streams that are used as noise in the
suspicious pool are also used in the evaluation to measure false positives,
possibly biasing our results to report slightly higher false positive rates.
***************************************************************************
Suggested Extensions
***************************************************************************
For token subsequence signatures, we have experimented with keeping track of
whether tokens always appear at a particular offset from each-other, and
incorporating that information into the signature. This option
(use_fixed_gaps in sig_gen.lcseq_tree.LCSeqTree) is currently broken, but
would be straight-forward to get working again. This strategy can make
the generated signatures more specific (reducing false positives), at the
risk of making them *too* specific (increasing false negatives).
***************************************************************************
Performance Improvements
***************************************************************************
Implement linear-time algorithm for finding what tokens are present in a
string. Suffix tree algorithm is implemented in polygraph/sutil/sutilc.c
(stree_find_tokens), but it is not currently being used, because naive
implementation is faster in practice. See Gusfield for how to achieve
linear time bound in the suffix tree implementation.
Implement linear-time algorithm for finding common substrings, as described
in Gusfield, chapter 9. Current implementation is O(Kn) time, where K is the
number of strings, and n is the total length of the strings. This doesn't
seem to be a major bottleneck though.