Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

significant overhead while running the FST due to heap allocations #15

Open
thelink2012 opened this issue Jul 28, 2023 · 3 comments · May be fixed by #16
Open

significant overhead while running the FST due to heap allocations #15

thelink2012 opened this issue Jul 28, 2023 · 3 comments · May be fixed by #16

Comments

@thelink2012
Copy link

Hey, thanks for building vellum, it's awesome!

I have a service which uses vellum's FST for searching needles in a haystack, and while profiling it, I found significant overhead at the calls to IsMatchWithVal and AcceptWithVal, both of which are linked to decoderV1.stateAt -> runtime.newobject.

I took at look at the code and found it's due to stateAt heap allocating a fstStateV1 at every call to IsMatchWithVal and AcceptWithVal. In other words, heap allocation occurs for every character in the string being searched on.

I hacked a patch (see next comment) and found the service to reduce its response time by half. So I guess this is significant enough to discuss upstream.

@thelink2012
Copy link
Author

thelink2012 commented Jul 28, 2023

As follows is the patch I mentioned. It bypasses the multi-version decoder/state abstraction, but it makes the problem clear:

diff --git a/decoder_v1.go b/decoder_v1.go
index d56e61d..5befe50 100644
--- a/decoder_v1.go
+++ b/decoder_v1.go
@@ -69,6 +69,15 @@ func (d *decoderV1) stateAt(addr int, prealloc fstState) (fstState, error) {
 	return state, nil
 }
 
+func (d *decoderV1) stateAtNoAlloc(addr int) (fstStateV1, error) {
+	var state fstStateV1
+	err := state.at(d.data, addr)
+	if err != nil {
+		return state, err
+	}
+	return state, nil
+}
+
 type fstStateV1 struct {
 	data     []byte
 	top      int
diff --git a/encoding.go b/encoding.go
index 988d486..6418529 100644
--- a/encoding.go
+++ b/encoding.go
@@ -50,6 +50,7 @@ type decoder interface {
 	getRoot() int
 	getLen() int
 	stateAt(addr int, prealloc fstState) (fstState, error)
+	stateAtNoAlloc(addr int) (fstStateV1, error)
 }
 
 func loadDecoder(ver int, data []byte) (decoder, error) {
diff --git a/fst.go b/fst.go
index 3140042..ff2300d 100644
--- a/fst.go
+++ b/fst.go
@@ -160,7 +160,7 @@ func (f *FST) Accept(addr int, b byte) int {
 // IsMatchWithVal returns if this state is a matching state in this Automaton
 // and also returns the final output value for this state
 func (f *FST) IsMatchWithVal(addr int) (bool, uint64) {
-	s, err := f.decoder.stateAt(addr, nil)
+	s, err := f.decoder.stateAtNoAlloc(addr)
 	if err != nil {
 		return false, 0
 	}
@@ -170,7 +170,7 @@ func (f *FST) IsMatchWithVal(addr int) (bool, uint64) {
 // AcceptWithVal returns the next state for this Automaton on input of byte b
 // and also returns the output value for the transition
 func (f *FST) AcceptWithVal(addr int, b byte) (int, uint64) {
-	s, err := f.decoder.stateAt(addr, nil)
+	s, err := f.decoder.stateAtNoAlloc(addr)
 	if err != nil {
 		return noneAddr, 0
 	}
@@ -178,6 +178,35 @@ func (f *FST) AcceptWithVal(addr int, b byte) (int, uint64) {
 	return next, output
 }

What do you think we could do about it without going around the architecture?

My first thought was to add a FSTRunner that keeps a pre-allocated fstState, but it probably means duplicating every method in FST to be consistent. Something along the lines of:

+type FSTRunner struct {
+	fst *FST
+	prealloc fstState
+}
+
+func (f *FST) Runner() (*FSTRunner, error) {
+	return &FSTRunner { f, nil }, nil
+}
+
+func (r *FSTRunner) AcceptWithVal(addr int, b byte) (int, uint64) {
+	s, err := r.fst.decoder.stateAt(addr, r.prealloc)
+	r.prealloc = s
+	if err != nil {
+		return noneAddr, 0
+	}
+	_, next, output := s.TransitionFor(b)
+	return next, output
+}
+
+func (r *FSTRunner) IsMatchWithVal(addr int) (bool, uint64) {
+	s, err := r.fst.decoder.stateAt(addr, r.prealloc)
+	r.prealloc = s
+	if err != nil {
+		return false, 0
+	}
+	return s.Final(), s.FinalOutput()
+}

I've tested it and performance-wise it actually works a bit better than the first solution.

@abhinavdangeti
Copy link
Member

Thanks @thelink2012 .

I think what'd be better here is for you to raise this diff as a pull request instead. The team will get to reviewing it at the soonest possible.

@thelink2012 thelink2012 linked a pull request Jul 29, 2023 that will close this issue
@thelink2012
Copy link
Author

Sure @abhinavdangeti. After looking a bit more into the code I noticed Reader, that does exactly what the above FSTRunner would do. In PR #16 I just added more methods to that type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants