New domain tracker (#892)
* feat: new domain tracker transformer
dmachard authored Dec 6, 2024
1 parent 0345d75 commit 2af81fc
Showing 13 changed files with 458 additions and 13 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -1,9 +1,9 @@
<p align="center">
<img src="https://goreportcard.com/badge/github.com/dmachard/go-dns-collector" alt="Go Report"/>
<img src="https://img.shields.io/badge/go%20version-min%201.21-green" alt="Go version"/>
-<img src="https://img.shields.io/badge/go%20tests-513-green" alt="Go tests"/>
+<img src="https://img.shields.io/badge/go%20tests-516-green" alt="Go tests"/>
<img src="https://img.shields.io/badge/go%20bench-21-green" alt="Go bench"/>
-<img src="https://img.shields.io/badge/go%20lines-32126-green" alt="Go lines"/>
+<img src="https://img.shields.io/badge/go%20lines-32515-green" alt="Go lines"/>
</p>

<p align="center">
@@ -76,6 +76,7 @@

- **[Transformers](./docs/transformers.md)**

- Detect [Newly Observed Domains](docs/transformers/transform_newdomaintracker.md)
- [Rewrite](docs/transformers/transform_rewrite.md) DNS messages or custom [Relabeling](docs/transformers/transform_relabeling.md) for JSON output
- Add additional [Tags](docs/transformers/transform_atags.md) in DNS messages
- Traffic [Filtering](docs/transformers/transform_trafficfiltering.md) and [Reducer](docs/transformers/transform_trafficreducer.md)
33 changes: 33 additions & 0 deletions docs/_examples/use-case-31.yml
@@ -0,0 +1,33 @@
global:
  trace:
    verbose: true

pipelines:
  - name: tap
    dnstap:
      listen-ip: 0.0.0.0
      listen-port: 6000
    transforms:
      normalize:
        qname-lowercase: true
        qname-replace-nonprintable: true
    routing-policy:
      forward: [ detect_new_domain ]
      dropped: [ ]

  - name: detect_new_domain
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_QUERY"
    transforms:
      new-domain-tracker:
        ttl: 3600
        cache-size: 1000
    routing-policy:
      forward: [ console ]
      dropped: [ ]

  - name: console
    stdout:
      mode: text
2 changes: 1 addition & 1 deletion docs/collectors/collector_dnsmessage.md
@@ -71,6 +71,6 @@ Finally a complete full example:
atags:
tags: [ "TXT:apple", "TXT:google" ]
routing-policy:
-dropped: [ outputfile ]
+forward: [ outputfile ]
default: [ console ]
```
1 change: 1 addition & 0 deletions docs/examples.md
@@ -7,6 +7,7 @@ You will find below some examples of configurations to manage your DNS logs.
- [x] [Advanced example with DNSmessage collector](./_examples/use-case-24.yml)
- [x] [How can I log only slow responses and errors?"](./_examples/use-case-25.yml)
- [x] [Filter DNStap messages where the response ip address is 0.0.0.0](./_examples/use-case-26.yml)
- [x] [Detect Newly Observed Domains](./_examples/use-case-31.yml)

- **Capture DNS traffic from incoming DNSTap streams**
- [x] [Read from UNIX DNSTap socket and forward it to TLS stream](./_examples/use-case-5.yml)
3 changes: 2 additions & 1 deletion docs/transformers.md
@@ -27,4 +27,5 @@ Transformers processing is currently in this order :
| [Traffic Prediction](transformers/transform_trafficprediction.md) | Features to train machine learning models |
| [Additional Tags](transformers/transform_atags.md) | Add additional tags |
| [JSON relabeling](transformers/transform_relabeling.md) | JSON relabeling to rename or remove keys |
-| [DNS message rewrite](transformers/transform_rewrite.md) | Rewrite value for DNS messages structure |
+| [DNS message rewrite](transformers/transform_rewrite.md) | Rewrite value for DNS messages structure |
+| [Newly Observed Domains](transformers/transform_newdomaintracker.md) | Detect Newly Observed Domains |
72 changes: 72 additions & 0 deletions docs/transformers/transform_newdomaintracker.md
@@ -0,0 +1,72 @@
# Transformer: New Domain Tracker

The **New Domain Tracker** transformer identifies domains that are newly observed within a configurable time window. It is particularly useful for detecting potentially malicious or suspicious domains in DNS traffic, such as those used for phishing, malware, or botnets.

## Features

- **Configurable Time Window**: Define how long a domain is considered new.
- **LRU-based Memory Management**: Ensures efficient memory usage with a finite cache size.
- **Persistence**: Optionally save the domain cache to disk for continuity after restarts.
- **Whitelist Support**: Exclude specific domains or patterns from detection.

## How It Works

1. When a DNS query is processed, the transformer checks if the queried domain exists in its cache.
2. If the domain is not in the cache or has not been seen within the specified TTL, it is marked as newly observed.
3. The domain is added to the cache with a timestamp of when it was last seen.
4. Whitelisted domains are ignored and never marked as new.

## Configuration:

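* `ttl` (integer)
> Time window in seconds during which a domain is remembered (default: 3600, i.e. 1 hour)
* `cache-size` (integer)
> Maximum number of domains to track (default: 100000)
* `white-domains-file` (string)
> Path to the whitelist file, one entry per line; each entry is a regular expression and may match only part of a domain name
* `persistence-file` (string)
> Path to the file where the domain cache is saved and from which it is reloaded on startup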

```yaml
transforms:
  new-domain-tracker:
    ttl: 3600
    cache-size: 100000
    white-domains-file: ""
    persistence-file: ""
```
## Cache
The New Domain Tracker uses an **LRU cache** to bound memory consumption. The `cache-size` parameter sets the maximum number of domains stored in the cache. Once the cache reaches this size, the least recently used entries are removed to make room for new ones.
The LRU cache keeps memory usage finite, but some domains may be forgotten if the cache size is too small.
## Whitelist
Example of configuration to load a whitelist of domains to ignore.
```yaml
transforms:
  new-domain-tracker:
    white-domains-file: /tmp/whitelist_domain.txt
```
Example of content for the file `/tmp/whitelist_domain.txt`

```
(mail|www).google.com
github.com
```
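
In `transformers/newdomaintracker.go`, each whitelist line is compiled with Go's `regexp` package and tested with `MatchString`, which succeeds when the pattern matches anywhere in the name. The sketch below shows what that implies for unescaped dots and missing anchors. The `matches` helper and the sample domains are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern, compiled as a Go regular expression,
// matches anywhere in domain (the behaviour of Regexp.MatchString).
func matches(pattern, domain string) bool {
	return regexp.MustCompile(pattern).MatchString(domain)
}

func main() {
	loose := `github.com`     // "." matches any character, no anchors
	strict := `^github\.com$` // literal dot, anchored at both ends

	for _, d := range []string{"github.com", "github.com.evil.example", "githubxcom.net"} {
		fmt.Printf("%-24s loose=%-5v strict=%v\n", d, matches(loose, d), matches(strict, d))
	}
}
```

The loose pattern whitelists all three names, while the anchored and escaped one accepts only `github.com`. Whether to anchor depends on how broad the exclusion is meant to be; a pattern such as `(^|\.)github\.com$` would also cover subdomains.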
## Persistence
To ensure continuity across application restarts, enable persistence by setting `persistence-file` to a file path.
The transformer saves the domain cache to this file and reloads it on startup.
```yaml
transforms:
  new-domain-tracker:
    persistence-file: /tmp/nod-state.json
```
2 changes: 1 addition & 1 deletion go.mod
@@ -25,6 +25,7 @@ require (
github.com/google/uuid v1.6.0
github.com/grafana/dskit v0.0.0-20240905221822-931a021fb06b
github.com/grafana/loki/v3 v3.2.1
+github.com/hashicorp/golang-lru v0.6.0
github.com/hashicorp/golang-lru/v2 v2.0.7
github.com/hpcloud/tail v1.0.0
github.com/influxdata/influxdb-client-go v1.4.0
@@ -92,7 +93,6 @@ require (
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
github.com/hashicorp/go-sockaddr v1.0.6 // indirect
github.com/hashicorp/go-uuid v1.0.3 // indirect
-github.com/hashicorp/golang-lru v0.6.0 // indirect
github.com/hashicorp/memberlist v0.5.0 // indirect
github.com/hashicorp/serf v0.10.1 // indirect
github.com/huandu/xstrings v1.3.3 // indirect
7 changes: 7 additions & 0 deletions pkgconfig/transformers.go
@@ -95,6 +95,13 @@ type ConfigTransformers struct {
		Enable      bool                   `yaml:"enable" default:"false"`
		Identifiers map[string]interface{} `yaml:"identifiers,flow"`
	} `yaml:"rewrite"`
	NewDomainTracker struct {
		Enable           bool   `yaml:"enable" default:"false"`
		TTL              int    `yaml:"ttl" default:"3600"`
		CacheSize        int    `yaml:"cache-size" default:"100000"`
		WhiteDomainsFile string `yaml:"white-domains-file" default:""`
		PersistenceFile  string `yaml:"persistence-file" default:""`
	} `yaml:"new-domain-tracker"`
}

func (c *ConfigTransformers) SetDefault() {
2 changes: 2 additions & 0 deletions tests/testsdata/newdomain_whitelist_regex.txt
@@ -0,0 +1,2 @@
.*\.google\.com
github\.com
204 changes: 204 additions & 0 deletions transformers/newdomaintracker.go
@@ -0,0 +1,204 @@
package transformers

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"regexp"
	"strings"
	"time"

	"github.com/dmachard/go-dnscollector/dnsutils"
	"github.com/dmachard/go-dnscollector/pkgconfig"
	"github.com/dmachard/go-logger"
	"github.com/hashicorp/golang-lru/v2/expirable"
)

type NewDomainTracker struct {
	ttl             time.Duration                    // Time window to consider a domain as "new"
	cache           *expirable.LRU[string, struct{}] // Expirable LRU Cache
	whitelist       map[string]*regexp.Regexp        // Whitelisted domains
	persistencePath string
	logInfo         func(msg string, v ...interface{})
	logError        func(msg string, v ...interface{})
}

func NewNewDomainTracker(ttl time.Duration, maxSize int, whitelist map[string]*regexp.Regexp, persistencePath string, logInfo, logError func(msg string, v ...interface{})) (*NewDomainTracker, error) {

	if ttl <= 0 {
		return nil, fmt.Errorf("invalid TTL value: %v", ttl)
	}

	cache := expirable.NewLRU[string, struct{}](maxSize, nil, ttl)

	tracker := &NewDomainTracker{
		ttl:             ttl,
		cache:           cache,
		whitelist:       whitelist,
		persistencePath: persistencePath,
		logInfo:         logInfo,
		logError:        logError,
	}
	// Load cache state from disk if persistence is enabled
	if persistencePath != "" {
		if err := tracker.loadCacheFromDisk(); err != nil {
			return nil, fmt.Errorf("failed to load cache state: %w", err)
		}
	}

	return tracker, nil
}

func (ndt *NewDomainTracker) isWhitelisted(domain string) bool {
	for _, d := range ndt.whitelist {
		if d.MatchString(domain) {
			return true
		}
	}
	return false
}

func (ndt *NewDomainTracker) IsNewDomain(domain string) bool {
	// Check if the domain is whitelisted
	if ndt.isWhitelisted(domain) {
		return false
	}

	// Check if the domain exists in the cache
	if _, exists := ndt.cache.Get(domain); exists {
		// Domain was recently seen, not new
		return false
	}

	// Otherwise, mark the domain as new
	ndt.cache.Add(domain, struct{}{})
	return true
}

// SaveCacheToDisk writes the cached domain names (keys only) to the persistence file as JSON
func (ndt *NewDomainTracker) SaveCacheToDisk() error {
	keys := ndt.cache.Keys()
	data, err := json.Marshal(keys)
	if err != nil {
		return err
	}

	return os.WriteFile(ndt.persistencePath, data, 0644)
}

// loadCacheFromDisk loads the cache state from a file
func (ndt *NewDomainTracker) loadCacheFromDisk() error {
	if ndt.persistencePath == "" {
		return errors.New("persistence filepath not set")
	}

	data, err := os.ReadFile(ndt.persistencePath)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // File does not exist, no previous state to load
		}
		return err
	}

	var keys []string
	if err := json.Unmarshal(data, &keys); err != nil {
		return err
	}

	// Re-adding the keys starts a fresh TTL for every reloaded domain
	for _, key := range keys {
		ndt.cache.Add(key, struct{}{})
	}

	return nil
}

// NewDomainTrackerTransform is the transformer for DNS messages
type NewDomainTrackerTransform struct {
	GenericTransformer
	domainTracker    *NewDomainTracker
	listDomainsRegex map[string]*regexp.Regexp
}

// NewNewDomainTrackerTransform creates a new instance of the transformer
func NewNewDomainTrackerTransform(config *pkgconfig.ConfigTransformers, logger *logger.Logger, name string, instance int, nextWorkers []chan dnsutils.DNSMessage) *NewDomainTrackerTransform {
	t := &NewDomainTrackerTransform{GenericTransformer: NewTransformer(config, logger, "new-domain-tracker", name, instance, nextWorkers)}
	t.listDomainsRegex = make(map[string]*regexp.Regexp)
	return t
}

// ReloadConfig reloads the configuration
func (t *NewDomainTrackerTransform) ReloadConfig(config *pkgconfig.ConfigTransformers) {
	t.GenericTransformer.ReloadConfig(config)
	// The tracker only exists once GetTransforms has run with the transform enabled.
	// Note: this updates the stored ttl field; the existing cache keeps the TTL it was created with.
	if t.domainTracker != nil {
		t.domainTracker.ttl = time.Duration(config.NewDomainTracker.TTL) * time.Second
	}
	t.LogInfo("new-domain-transformer configuration reloaded")
}

func (t *NewDomainTrackerTransform) GetTransforms() ([]Subtransform, error) {
	subtransforms := []Subtransform{}
	if t.config.NewDomainTracker.Enable {
		// init whitelist
		if err := t.LoadWhiteDomainsList(); err != nil {
			return nil, err
		}

		// Initialize the domain tracker
		ttl := time.Duration(t.config.NewDomainTracker.TTL) * time.Second
		maxSize := t.config.NewDomainTracker.CacheSize
		tracker, err := NewNewDomainTracker(ttl, maxSize, t.listDomainsRegex, t.config.NewDomainTracker.PersistenceFile, t.LogInfo, t.LogError)
		if err != nil {
			return nil, err
		}
		t.domainTracker = tracker

		subtransforms = append(subtransforms, Subtransform{name: "new-domain-tracker:detect", processFunc: t.trackNewDomain})
	}
	return subtransforms, nil
}

func (t *NewDomainTrackerTransform) LoadWhiteDomainsList() error {
	// before starting, reset the map
	for key := range t.listDomainsRegex {
		delete(t.listDomainsRegex, key)
	}

	if len(t.config.NewDomainTracker.WhiteDomainsFile) > 0 {
		file, err := os.Open(t.config.NewDomainTracker.WhiteDomainsFile)
		if err != nil {
			return fmt.Errorf("unable to open regex list file: %w", err)
		}
		// release the file handle once the list has been read
		defer file.Close()

		scanner := bufio.NewScanner(file)
		for scanner.Scan() {
			domain := strings.ToLower(scanner.Text())
			t.listDomainsRegex[domain] = regexp.MustCompile(domain)
		}
		t.LogInfo("loaded with %d domains in the whitelist", len(t.listDomainsRegex))
	}
	return nil
}

// trackNewDomain processes DNS messages and detects newly observed domains
func (t *NewDomainTrackerTransform) trackNewDomain(dm *dnsutils.DNSMessage) (int, error) {
	// Return an error if the cache is already full (checked before adding the new domain)
	if t.domainTracker.cache.Len() == t.config.NewDomainTracker.CacheSize {
		return ReturnError, fmt.Errorf("LRU cache is full. Consider increasing cache-size to avoid frequent evictions")
	}

	// Keep the message if the domain is newly observed, drop it otherwise
	if t.domainTracker.IsNewDomain(dm.DNS.Qname) {
		return ReturnKeep, nil
	}
	return ReturnDrop, nil
}

func (t *NewDomainTrackerTransform) Reset() {
	// Nothing to save if the tracker was never initialized or persistence is disabled
	if t.domainTracker == nil || len(t.domainTracker.persistencePath) == 0 {
		return
	}
	if err := t.domainTracker.SaveCacheToDisk(); err != nil {
		t.LogError("failed to save cache state: %v", err)
		return
	}
	t.LogInfo("cache content saved on disk with success")
}