use bloom filters for dictionaries (close #115)
dimus committed Mar 7, 2022
1 parent 27035b0 commit 3ab06f1
Showing 18 changed files with 227 additions and 23 deletions.
11 changes: 9 additions & 2 deletions go.mod
@@ -5,6 +5,7 @@ go 1.17
require (
github.com/abadojack/whatlanggo v1.0.1
github.com/aclements/perflock v0.0.0-20180319185109-8402f33a418d
github.com/devopsfaith/bloomfilter v1.4.0
github.com/gnames/bayes v0.4.0
github.com/gnames/gndoc v0.3.1
github.com/gnames/gner v0.1.4
@@ -16,6 +17,7 @@ require (
github.com/labstack/echo/v4 v4.6.3
github.com/maxbrunsfeld/counterfeiter/v6 v6.4.1
github.com/rendon/testcli v1.0.0
github.com/rs/zerolog v1.26.1
github.com/spf13/cobra v1.3.0
github.com/spf13/viper v1.10.1
github.com/stretchr/testify v1.7.0
@@ -41,13 +43,18 @@ require (
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/pelletier/go-toml v1.9.4 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rs/zerolog v1.26.1 // indirect
github.com/sirupsen/logrus v1.8.1 // indirect
github.com/spf13/afero v1.8.1 // indirect
github.com/spf13/cast v1.4.1 // indirect
github.com/spf13/jwalterweatherman v1.1.0 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/subosito/gotenv v1.2.0 // indirect
github.com/tmthrgd/atomics v0.0.0-20180217065130-6910de195248 // indirect
github.com/tmthrgd/go-bitset v0.0.0-20180828125936-62ad9ed7ff29 // indirect
github.com/tmthrgd/go-bitwise v0.0.0-20170218093117-01bef038b6bd // indirect
github.com/tmthrgd/go-byte-test v0.0.0-20170223110042-2eb5216b83f7 // indirect
github.com/tmthrgd/go-hex v0.0.0-20180828131331-d1fb3dbb16a1 // indirect
github.com/tmthrgd/go-memset v0.0.0-20180828131805-6f4e59bf1e1d // indirect
github.com/tmthrgd/go-popcount v0.0.0-20180111143836-3918361d3e97 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasttemplate v1.2.1 // indirect
golang.org/x/crypto v0.0.0-20220214200702-86341886e292 // indirect
189 changes: 168 additions & 21 deletions go.sum

Large diffs are not rendered by default.

File renamed without changes.
12 changes: 12 additions & 0 deletions tools/bloom/README.md
@@ -0,0 +1,12 @@
# bloom script

This script generates bloom filters that serve as a smaller and faster
substitute for dictionaries. The generated filters are compiled into the
binaries, while the dictionaries themselves are left out of the final
binaries.

## Usage

```bash
go run ./...
```
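
To illustrate the idea the README describes, here is a minimal sketch of a bloom filter that uses only the Go standard library. It is not the `github.com/devopsfaith/bloomfilter` implementation this commit depends on. The `tinyBloom` type, its sizing parameters, and the sample words are hypothetical and exist only for illustration. The sketch shows why a filter can stand in for a dictionary: a lookup for an added item never fails, and the only possible error is an occasional false positive.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tinyBloom is a hypothetical, illustration-only bloom filter.
type tinyBloom struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // total number of bits
	k    uint64   // number of probes per item
}

// newTinyBloom allocates a filter with m bits and k probes per item.
func newTinyBloom(m, k uint64) *tinyBloom {
	return &tinyBloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes from 64-bit FNV-1a.
// Probe i is computed as h1 + i*h2 (double hashing).
func hashes(s string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second hash
	h2 := h.Sum64() | 1   // keep the step odd so it is never zero
	return h1, h2
}

// Add sets the k bits that correspond to s.
func (b *tinyBloom) Add(s string) {
	h1, h2 := hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// Has reports whether s may be in the set. A false result is definitive;
// a true result can be a false positive.
func (b *tinyBloom) Has(s string) bool {
	h1, h2 := hashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newTinyBloom(1024, 4)
	for _, w := range []string{"Aus", "Bus", "Cus"} {
		f.Add(w)
	}
	fmt.Println(f.Has("Bus")) // true: added items are always found
}
```

The memory cost depends on the number of bits rather than on the length of the stored words, which is where the size advantage over an embedded dictionary comes from. The trade-off is that the original entries cannot be listed or recovered from the filter.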
38 changes: 38 additions & 0 deletions tools/bloom/main.go
@@ -0,0 +1,38 @@
package main

import (
	"sync"

	baseBloomfilter "github.com/devopsfaith/bloomfilter/bloomfilter"
	"github.com/rs/zerolog/log"
)

var dataPath = "../../io/dict/data"
var bloomPath = "../../io/dict/bloom"

type filter struct {
	name   string
	path   string
	filter *baseBloomfilter.Bloomfilter
	size   int
	mux    sync.Mutex
}

func main() {
	log.Info().Msg("Creating bloom filters")
	items := []string{
		"bad/uninomials",
		"bad/species",
		"common/eu",
		"ambig/genera",
		"ambig/genera_species",
		"ambig/species",
		"ambig/uninomials",
		"good/genera",
		"good/species",
		"good/uninomials",
	}
	for _, v := range items {
		createFilters(v)
	}
}
