
perf: cache stack hash to node in stacktraceTree #3827

Closed
wants to merge 1 commit

Conversation

airbnb-gps

See the inline comment for more context.

Profiling against my workload showed the compactor spending significant CPU time in `stacktraceTree.insert()`.

Staring at the code, I realized it was performing a full stack walk to populate/resolve the leaf-most tree node representing the stack.

Since many stacks will be identical, we can benefit from caching the mapping of the unique stack back to the node index.

The speedup is greater when many stacks repeat and when stacks are deep. My data set consists largely of JVM profiles, and Java is infamous for very deep stacks, so this yields a considerable speedup for me.

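For illustration, a minimal self-contained sketch of the idea. `cachedTree`, `insertSlow`, and this particular `hashLocations` are hypothetical stand-ins, not the actual code; the real diff appears in the review comment below.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hashLocations digests a stack's location IDs. Illustrative stand-in;
// the codebase has its own hashLocations.
func hashLocations(refs []uint64) uint64 {
	h := fnv.New64a()
	var b [8]byte
	for _, r := range refs {
		binary.LittleEndian.PutUint64(b[:], r)
		h.Write(b[:])
	}
	return h.Sum64()
}

// cachedTree sketches the proposed change: a digest -> leaf-node-index
// map in front of the full tree walk.
type cachedTree struct {
	hashedStacks map[uint64]int32
}

func (t *cachedTree) insert(refs []uint64) int32 {
	digest := hashLocations(refs)
	if node, ok := t.hashedStacks[digest]; ok {
		return node // cache hit: no stack walk needed
	}
	node := t.insertSlow(refs) // hypothetical: the existing full-walk insertion
	t.hashedStacks[digest] = node
	return node
}

// insertSlow is a placeholder for the original tree walk, elided here.
func (t *cachedTree) insertSlow(refs []uint64) int32 { return int32(len(refs)) }

func main() {
	t := &cachedTree{hashedStacks: map[uint64]int32{}}
	fmt.Println(t.insert([]uint64{1, 2, 3})) // first call walks; repeats hit the cache
}
```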
@airbnb-gps airbnb-gps requested a review from a team as a code owner January 8, 2025 22:29

@korniltsev
Collaborator

Can we add a benchmark to measure the difference?
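For reference, a hedged sketch of what such a benchmark might look like. The package placement, the `newStacktraceTree` constructor, and the `insert` signature are assumptions based on the snippets in this thread, not a verified API.

```go
package symdb // assumption: the package that defines stacktraceTree

import "testing"

// Hypothetical benchmark sketch. The workload mimics deep, highly
// repetitive JVM-style stacks, where the proposed cache should help most.
func BenchmarkStacktraceTreeInsert(b *testing.B) {
	const depth, distinct = 128, 64 // deep stacks, few unique shapes
	stacks := make([][]uint64, distinct)
	for i := range stacks {
		s := make([]uint64, depth)
		for j := range s {
			s[j] = uint64(i*depth + j + 1) // non-zero location IDs
		}
		stacks[i] = s
	}
	t := newStacktraceTree(0) // assumed constructor
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t.insert(stacks[i%distinct]) // repeats should hit the proposed cache
	}
}
```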

@korniltsev
Collaborator

Btw, there should not be fully identical samples in the pprof converted from JFR due to this https://github.com/grafana/jfr-parser/blob/a8d22a1cd731f0ef8b48f84c5bab58532a9af541/pprof/pprof.go#L66

How do you ingest your data? Do you ingest JFR into Pyroscope?

@airbnb-gps
Author

For single pprof profiles, yes, there should be one entry for each distinct stack (assuming the pprof generator deduplicates, which the JFR conversion code does).

But for the compactor, it will see multiple occurrences of the same stack from different source blocks, no?

(I found this hot code when looking at compactor performance.)

Comment on lines +52 to +63
```go
// Many stacks are repeating. So we benefit from an optimization that
// can quickly map the input sequence back to a node without
// walking the tree. We simply cache a map of stack digest back to the
// node index. If there's a hit, we avoid a stack walk to resolve
// the leaf node. If not, we pay a penalty for computing the hash
// and performing a map lookup.
digest := hashLocations(refs)
existing, ok := t.hashedStacks[digest]
if ok {
	return existing
}
```

@kolesnikovae
Collaborator
Jan 9, 2025
This is a valid approach, and, if I understand it correctly, your reasoning is sound.

However, we already have a caching layer on top of trees (so-called chunks), and we never access a tree directly, bypassing the cache in the write path (valid for both ingesters and compactors):

```go
func (p *stacktraces) append(dst []uint32, s []*schemav1.Stacktrace) {
	if len(s) == 0 {
		return
	}
	var (
		id     uint32
		found  bool
		misses int
	)
	p.m.RLock()
	for i, x := range s {
		if dst[i], found = p.hashToIdx[hashLocations(x.LocationIDs)]; !found {
			misses++
		}
	}
	p.m.RUnlock()
	if misses == 0 {
		return
	}
	// NOTE(kolesnikovae):
	//
	// Maybe we don't need this map at all: tree insertion might be
	// done in a thread safe fashion, and optimised to the extent
	// that its performance is comparable with:
	//   map_read + r_(un)lock + map_overhead +
	//   miss_rate * (map_write + w_(un)lock)
	//
	// Instead of inserting stacks one by one, it is better to
	// build a tree, and merge it to the existing one.
	p.m.Lock()
	defer p.m.Unlock()
	for i, v := range dst[:len(s)] {
		if v != 0 {
			// Already resolved. ID 0 is reserved
			// as it is the tree root.
			continue
		}
		x := s[i].LocationIDs
		// Tree insertion is idempotent,
		// we don't need to check the map.
		id = p.tree.insert(x)
		h := hashLocations(x)
		p.hashToIdx[h] = id
		dst[i] = id
	}
}
```

(if we revive chunking, the cache is to be shared across chunks)

And it works in the same way: locations hash => stack trace ID (leaf node index) lookup:

- There is a chance of hash collisions, and we accept this risk.
- We eliminate an extra map lookup by relying on insertion idempotence: since the stack traces we want to insert might be added between mutex locks, we may write them again. Given that there are very few such writes, this approach is more efficient than checking each stack trace after acquiring the write lock. (See the toy sketch after this list.)
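A toy, self-contained illustration of the idempotence point (not the pyroscope implementation): inserting the same stack twice resolves to the same leaf, so a duplicate write that slips in between the read and write locks is harmless.

```go
package main

import "fmt"

// Toy prefix tree over location IDs; illustrative only.
type node struct {
	children map[uint64]int32
}

type toyTree struct{ nodes []node }

func newToyTree() *toyTree {
	// Index 0 is reserved for the root, matching the convention above.
	return &toyTree{nodes: []node{{children: map[uint64]int32{}}}}
}

// insert is idempotent: the same stack always resolves to the same leaf.
func (t *toyTree) insert(refs []uint64) int32 {
	var cur int32 // start at the root
	for _, loc := range refs {
		next, ok := t.nodes[cur].children[loc]
		if !ok {
			next = int32(len(t.nodes))
			t.nodes = append(t.nodes, node{children: map[uint64]int32{}})
			t.nodes[cur].children[loc] = next
		}
		cur = next
	}
	return cur
}

func main() {
	t := newToyTree()
	stack := []uint64{10, 20, 30}
	// A racy duplicate insert between RUnlock and Lock would just
	// repeat the walk and return the same leaf ID.
	fmt.Println(t.insert(stack) == t.insert(stack)) // true
}
```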

Therefore, I'm wondering whether we benefit from adding a cache at the tree level. I can say for sure that it will increase memory consumption significantly, and I can't see how it would improve performance.

This is how it looks in a loaded cluster (half of a 10G link of ingress, sampled over 6 hours):

In ingesters: [profiling screenshot]

In compactors: [profiling screenshot]

Note that there's a certain issue with the compaction process, which makes the cache less helpful. We're aware of it and have a solution to the problem – something we're actively working on.

@kolesnikovae
Collaborator

Thank you for the contribution @airbnb-gps! I believe the proposed optimisation has already been implemented – please check my comment.

@airbnb-gps
Author

I agree with your assessment.

I still don't have a great way to reproduce my performance tests, so my methodology was flawed, leading to faulty conclusions on my part. Sorry for the noise!

@airbnb-gps airbnb-gps closed this Jan 9, 2025
@kolesnikovae
Collaborator

No worries at all! Thank you for a good PR, and I hope you can contribute something else another time :)
