Introduction

Terminator is a spam filtering library written in C++, similar to the well-known SpamBayes and OSBF-Lua. It can be embedded into other spam filtering software or services as a machine learning module. Its advantages are:

Very high precision and recall, with the best results on all public spam filtering corpora.
It is fast and consumes only a few MB of memory.
It does not require hyper-parameter tuning.

Terminator can also be used for other binary text classification problems, especially those that need an adaptive model for online learning.

Terminator is not a complete end-to-end spam filtering solution. Instead, it focuses on the machine learning part and does not include blocklists/allowlists or DKIM. My paper, "[An Adaptive Fusion Algorithm for Spam Detection](http://csse.szu.edu.cn/staff/panwk/publications/Journal-IEEE-IS-14-AFSD.pdf)", describes the implementation in detail.

(Update, Jan 2023: The work on this library dates back to around 2010. It consistently achieved SOTA results on most online learning email filtering corpora: TREC, CEAS, and a private dataset from NetEase. I have not followed this area for a long time, so I may have missed recent research. In a batch learning context, I think the newest Transformer-based LLMs have great potential.)

Implementation

Terminator uses a fusion model that combines eight machine learning algorithms to boost spam filtering performance. The algorithms are listed below according to papers.

We used a novel adaptive model fusion technique. The weight of every single model is learned during the online learning process.
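
To make the idea of online-learned fusion weights concrete, here is a minimal sketch. It is not Terminator's actual code and not necessarily the algorithm from the paper: it assumes a plain online logistic regression over the base models' scores, and the class name `OnlineFusion`, its methods, and the learning rate are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fusion layer, NOT Terminator's actual implementation.
// Combines base-model spam scores with weights that are learned online
// by logistic-regression gradient steps.
class OnlineFusion {
 public:
  explicit OnlineFusion(std::size_t num_models, double learning_rate = 0.1)
      : weights_(num_models, 1.0 / num_models),
        bias_(0.0),
        learning_rate_(learning_rate) {}

  // Fuses per-model scores (each in [0, 1]) into one score in (0, 1).
  double Predict(const std::vector<double>& scores) const {
    assert(scores.size() == weights_.size());
    double z = bias_;
    for (std::size_t i = 0; i < scores.size(); ++i) {
      z += weights_[i] * Logit(scores[i]);
    }
    return 1.0 / (1.0 + std::exp(-z));
  }

  // One online update, applied once the true label is known.
  void Train(const std::vector<double>& scores, bool is_spam) {
    const double error = (is_spam ? 1.0 : 0.0) - Predict(scores);
    for (std::size_t i = 0; i < scores.size(); ++i) {
      weights_[i] += learning_rate_ * error * Logit(scores[i]);
    }
    bias_ += learning_rate_ * error;
  }

  const std::vector<double>& weights() const { return weights_; }

 private:
  // Log-odds of a probability, clamped to avoid infinities at 0 and 1.
  static double Logit(double p) {
    p = std::min(std::max(p, 1e-6), 1.0 - 1e-6);
    return std::log(p / (1.0 - p));
  }

  std::vector<double> weights_;
  double bias_;
  double learning_rate_;
};
```

In this sketch, a base model whose scores track the true labels gains weight over time, while a model that always outputs 0.5 contributes a log-odds of zero and its weight never moves.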

Installation & Usage

Step 1, Install Dependencies

The only dependency is [kyotocabinet](http://fallabs.com/kyotocabinet/) for persistence, which must be installed first.

Step 2, Install Terminator and Compile

git clone https://github.com/freiz/terminator.git
cd terminator
make

You can change the compiler in the Makefile; the build output is a static library that you can link against.

Step 3, Write an Example

#include <string>

#include "terminator.h"

int main() {
  // The first parameter is the path of the database file.
  // The second parameter is the size of the in-memory cache in bytes,
  // so 5 << 20 is about 5 MB of cache.
  Terminator* classifier = new Terminator("terminator.kch", 5 << 20);

  // Now you can write the main logic.
  // There are two public APIs: Train and Predict.

  // Placeholder message; replace it with the raw text of a real email.
  std::string email_content = "Subject: hello\n\nexample message body";

  // [Predict] takes the email content and returns a score ranging
  // from 0 (100% ham) to 1 (100% spam).
  // You can choose your own threshold to make the final decision.
  double score = classifier->Predict(email_content);

  // [Train] takes the email content and a flag.
  // Set the flag to true if the email is spam, false if it is ham.
  bool is_spam = true;
  classifier->Train(email_content, is_spam);

  delete classifier;
  return 0;
}
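
In an online learning setting, the two calls are usually combined into a predict-then-train loop: score each message, make a decision, then feed the true label back. The sketch below illustrates that pattern under stated assumptions. `ToyClassifier` is a hypothetical stand-in that only mimics the shape of the Predict and Train calls shown above, so the example compiles without the library; `RunOnline` and the threshold value are also illustrative and not part of Terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in with the same Predict/Train shape as the calls
// shown above. It only memorizes exact messages; it is not a real filter.
class ToyClassifier {
 public:
  double Predict(const std::string& email) const {
    std::map<std::string, bool>::const_iterator it = seen_.find(email);
    if (it == seen_.end()) return 0.5;  // never seen: undecided
    return it->second ? 1.0 : 0.0;
  }

  void Train(const std::string& email, bool is_spam) { seen_[email] = is_spam; }

 private:
  std::map<std::string, bool> seen_;
};

// Runs predict-then-train over a labeled stream of (email, is_spam) pairs
// and returns the number of misclassified messages.
template <typename Classifier>
std::size_t RunOnline(Classifier& classifier,
                      const std::vector<std::pair<std::string, bool> >& stream,
                      double threshold) {
  assert(threshold >= 0.0 && threshold <= 1.0);
  std::size_t errors = 0;
  for (std::size_t i = 0; i < stream.size(); ++i) {
    const std::string& email = stream[i].first;
    const bool is_spam = stream[i].second;
    // Decide first, using only what the model has learned so far.
    const bool predicted_spam = classifier.Predict(email) >= threshold;
    if (predicted_spam != is_spam) ++errors;
    // Then feed the true label back so the model adapts.
    classifier.Train(email, is_spam);
  }
  return errors;
}
```

Because `RunOnline` is a template, the same loop should work with any class that offers compatible `Predict` and `Train` members.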

Step 4, Play with Demo (Optional)

make run-demo

It will run a demo application to simulate spam filtering using the SpamAssassin corpus; you can also put another dataset (such as ceas08) under demo/corpus to check the experiment result.

Step 5, Compile and Link Your bits

Do not forget to link against the library kyotocabinet.

Experiment Result

Here I quote only a sample of the results, on the public corpus Trec05-p1:

Competitor	(1-ROCA)%, the smaller the better
bogofilter	0.048
spamprobe	0.059
spamassassin	0.059
terminator	0.0055
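
For readers unfamiliar with the metric, (1-ROCA)% is 100 times one minus the area under the ROC curve, so a value of 0.0055 corresponds to a ROC area of 0.999945. The sketch below shows one straightforward way to compute it from scores and gold labels, using the pairwise definition of ROC area. The function name is hypothetical, and this is not the evaluation code used to produce the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns (1 - ROCA)% for a set of spam scores and gold labels.
// ROCA (area under the ROC curve) equals the probability that a randomly
// chosen spam scores higher than a randomly chosen ham, with ties
// counting as half. This pairwise version is O(n^2), fine for a sketch.
double OneMinusRocaPercent(const std::vector<double>& scores,
                           const std::vector<bool>& is_spam) {
  assert(scores.size() == is_spam.size());
  double correct_pairs = 0.0;
  double total_pairs = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    if (!is_spam[i]) continue;  // i iterates over spam messages
    for (std::size_t j = 0; j < scores.size(); ++j) {
      if (is_spam[j]) continue;  // j iterates over ham messages
      total_pairs += 1.0;
      if (scores[i] > scores[j]) {
        correct_pairs += 1.0;
      } else if (scores[i] == scores[j]) {
        correct_pairs += 0.5;
      }
    }
  }
  assert(total_pairs > 0.0);  // needs at least one spam and one ham
  return 100.0 * (1.0 - correct_pairs / total_pairs);
}
```

A filter that ranks every spam above every ham gets 0; a filter that scores at random gets about 50.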

The paper "An Adaptive Fusion Algorithm for Spam Detection" contains a complete set of experiment results.
