perf: cache stack hash to node in stacktraceTree #3827
Conversation
See the inline comment for more context. Profiling against my workload showed the compactor spending significant CPU time in `stacktraceTree.insert()`. Staring at the code, I realized it was performing a full stack walk to populate/resolve the leaf-most tree node representing the stack. Since many stacks are identical, we can benefit from caching the mapping from each unique stack back to its node index. The speedup is greater when many stacks repeat and when stacks are deep. My data set consists largely of JVM profiles, and Java is infamous for very deep stacks, so this yields a considerable speedup for me.
Can we add a benchmark to measure the difference?
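A micro-benchmark along these lines could quantify the effect: many deep, frequently repeated stacks, mimicking the compactor workload described above. This is a sketch only; newStacktraceTree and the package layout are assumptions based on the code shown elsewhere in this thread.

package symdb

import (
	"math/rand"
	"testing"
)

// Hypothetical micro-benchmark: insert deep, frequently repeated
// stacks so that the digest cache is exercised on the hot path.
func BenchmarkStacktraceTreeInsert(b *testing.B) {
	const (
		distinct = 1000 // number of unique stacks
		depth    = 128  // frames per stack (deep, JVM-like)
	)
	rng := rand.New(rand.NewSource(1))
	stacks := make([][]uint64, distinct)
	for i := range stacks {
		s := make([]uint64, depth)
		for j := range s {
			s[j] = rng.Uint64() % 4096 // bounded set of location IDs
		}
		stacks[i] = s
	}
	t := newStacktraceTree(0) // assumed constructor
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t.insert(stacks[i%distinct]) // repeats should hit the cache
	}
}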
Btw, there should not be fully identical samples in the pprof converted from JFR due to this: https://github.com/grafana/jfr-parser/blob/a8d22a1cd731f0ef8b48f84c5bab58532a9af541/pprof/pprof.go#L66 How do you ingest your data? Do you ingest JFR to Pyroscope?
For single pprof profiles, yes, there should be one entry for each distinct stack (assuming the pprof generator deduplicated, which the JFR converting code does). But the compactor will see multiple occurrences of the same stack from different source blocks, no? (I found this hot code when looking at compactor performance.)
// Many stacks are repeating. So we benefit from an optimization that
// can quickly map the input sequence back to a node without
// walking the tree. We simply cache a map of stack digest back to the
// node index. If there's a hit, we avoid a stack walk to resolve
// the leaf node. If not, we pay a penalty for computing the hash
// and performing a map lookup.
digest := hashLocations(refs)
existing, ok := t.hashedStacks[digest]
if ok {
	return existing
}
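For context (this part is not in the hunk above), the change presumably also populates the cache on a miss after the usual walk. A minimal sketch of the whole method under that assumption; walkInsert is a hypothetical stand-in for the pre-existing full stack walk:

// insert resolves the leaf node for a stack, consulting the digest
// cache first and filling it on a miss.
func (t *stacktraceTree) insert(refs []uint64) uint32 {
	digest := hashLocations(refs)
	if existing, ok := t.hashedStacks[digest]; ok {
		return existing // cache hit: no stack walk needed
	}
	// Cache miss: fall back to the full walk, then remember the result
	// so the next occurrence of this stack resolves in O(1).
	id := t.walkInsert(refs) // hypothetical: the original walk logic
	t.hashedStacks[digest] = id
	return id
}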
This is a valid approach, and your reasoning is correct, if I understand it correctly.
However, we already have a caching layer on top of trees (so-called chunks), and we never access a tree directly, bypassing the cache in the write path (valid for both ingesters and compactors):
pyroscope/pkg/phlaredb/symdb/partition_memory.go
Lines 63 to 113 in 26950d5
func (p *stacktraces) append(dst []uint32, s []*schemav1.Stacktrace) {
	if len(s) == 0 {
		return
	}
	var (
		id     uint32
		found  bool
		misses int
	)
	p.m.RLock()
	for i, x := range s {
		if dst[i], found = p.hashToIdx[hashLocations(x.LocationIDs)]; !found {
			misses++
		}
	}
	p.m.RUnlock()
	if misses == 0 {
		return
	}
	// NOTE(kolesnikovae):
	//
	// Maybe we don't need this map at all: tree insertion might be
	// done in a thread safe fashion, and optimised to the extent
	// that its performance is comparable with:
	//   map_read + r_(un)lock + map_overhead +
	//   miss_rate * (map_write + w_(un)lock)
	//
	// Instead of inserting stacks one by one, it is better to
	// build a tree, and merge it to the existing one.
	p.m.Lock()
	defer p.m.Unlock()
	for i, v := range dst[:len(s)] {
		if v != 0 {
			// Already resolved. ID 0 is reserved
			// as it is the tree root.
			continue
		}
		x := s[i].LocationIDs
		// Tree insertion is idempotent,
		// we don't need to check the map.
		id = p.tree.insert(x)
		h := hashLocations(x)
		p.hashToIdx[h] = id
		dst[i] = id
	}
}
(if we revive chunking, the cache is to be shared across chunks)
And it works in the same way: locations hash => stack trace ID (leaf node index) lookup:
- There is a chance of hash collisions, and we accept this risk (see the hashing sketch after this list).
- We eliminate an extra map lookup by relying on insertion idempotence: since the stack traces we want to insert might be added by another goroutine between the read and write locks, we may write some of them again. Given that there are very few such writes, this is more efficient than re-checking each stack trace after acquiring the write lock.
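For reference, hashLocations reduces the whole location-ID sequence to a single 64-bit digest, which is where the accepted collision risk above comes from. A plausible shape, assuming a streaming xxhash over the IDs (the real implementation may differ):

import (
	"encoding/binary"

	"github.com/cespare/xxhash/v2"
)

// hashLocations folds a stack's location IDs into one 64-bit digest.
// Two different stacks can, with low probability, share a digest.
func hashLocations(locs []uint64) uint64 {
	d := xxhash.New()
	var buf [8]byte
	for _, l := range locs {
		binary.LittleEndian.PutUint64(buf[:], l)
		_, _ = d.Write(buf[:]) // Write on an xxhash digest never errors
	}
	return d.Sum64()
}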
Therefore, I'm wondering if we benefit from adding a cache at the tree level. I can say for sure that it will increase memory consumption significantly, while I can't see how it improves performance.
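For a rough sense of scale (illustrative numbers, not a measurement): each cached map[uint64]uint32 entry costs on the order of tens of bytes once Go map overhead is included, so caching tens of millions of unique stacks would add hundreds of MB of heap on top of the tree itself.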
This is how it looks in a loaded cluster (half of a 10G link of ingress, samples over 6 hours):
[profile screenshot]
In compactors:
[profile screenshot]
Note that there's a known issue with the compaction process that makes the cache less helpful; we're aware of it and are actively working on a solution.
Thank you for the contribution @airbnb-gps! I believe the proposed optimisation has already been implemented; please check my comment.
I agree with your assessment. I still don't have a great way to reproduce my performance tests, so my methodology was flawed, leading to faulty conclusions on my part. Sorry for the noise!
No worries at all! Thank you for a good PR, and I hope you can contribute something else another time :) |