Add logic to access strings used by remote config bpf probe #2983

Merged (9 commits) on Dec 3, 2024
19 changes: 19 additions & 0 deletions ddtrace/tracer/remote_config.go
@@ -8,6 +8,7 @@ package tracer
import (
	"encoding/json"
	"fmt"
	"io"
	"regexp"
	"strings"
	"sync"
@@ -298,13 +299,31 @@ func initalizeDynamicInstrumentationRemoteConfigState() {
			time.Sleep(time.Second * 5)
			diRCState.Lock()
			for _, v := range diRCState.state {
				accessStringsToMitigatePageFault(v.runtimeID, v.configPath, v.configContent)
				passProbeConfiguration(v.runtimeID, v.configPath, v.configContent)
			}
			diRCState.Unlock()
		}
	}()
}

// accessStringsToMitigatePageFault iterates over each string to trigger a page fault,
// ensuring it is loaded into RAM or listed in the translation lookaside buffer.
// This is done by writing the string to io.Discard.
//
// This function addresses an issue with the bpf program that hooks the
// `passProbeConfiguration()` function from system-probe. The bpf program fails
// to read strings if a page fault occurs because the `bpf_probe_read()` helper
// disables paging (uprobe bpf programs can't sleep). Consequently, page faults
// cause `bpf_probe_read()` to return an error and not read any data.
// By preloading the strings, we mitigate this issue, enhancing the reliability
// of the Go Dynamic Instrumentation product.
func accessStringsToMitigatePageFault(strs ...string) {
grantseltzer marked this conversation as resolved.
Member

How can we guarantee the Go compiler won't optimize this function away, since it has no effect?

Member Author

@grantseltzer grantseltzer Nov 21, 2024

I'm less worried about the function itself being optimized away (according to the disassembler, it isn't) and more about the actual lines that access the variables. I observed the behavior this PR is trying to mitigate over a long period across different services (though admittedly in only one environment), and in fact we still have occasional misses (which typically seem to be followed by hits).

As a result, I changed the access to write the strings to /dev/null, and this eliminated the misses in my manual testing. It also simplifies the code, so I've pushed this change.
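
The /dev/null approach described above might look roughly like the following. This is a minimal sketch, not the code that was pushed: `writeStringsToDevNull` is a hypothetical helper name, and the byte-count return value is added only so the behavior can be checked.

```go
package main

import (
	"fmt"
	"os"
)

// writeStringsToDevNull is a hypothetical helper (not the PR's code). It
// writes each string to the null device and returns the total number of
// bytes the writes reported.
func writeStringsToDevNull(strs ...string) (int, error) {
	f, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	total := 0
	for _, s := range strs {
		// Each call hands the string's bytes to the operating system.
		n, err := f.WriteString(s)
		total += n
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

func main() {
	n, err := writeStringsToDevNull("runtime-id", "config/path", `{"probe":"example"}`)
	fmt.Println(n, err)
}
```

Opening the device on every call keeps the sketch short; code that runs on a timer would more likely open it once and reuse the handle.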

Member Author

OK, I've further updated this to just use io.Discard.

Contributor

@eliottness eliottness Nov 25, 2024

Could runtime.KeepAlive be enough here? 🤔
The doc is a little bit weird, but its only goal is to force the compiler not to optimize anything away, because at this stage it can't know what is going to happen on the other side, which is the runtime package here.

Member Author

@grantseltzer grantseltzer Nov 25, 2024

@eliottness I wasn't aware of this function. Looking at the code, it just performs a simple println() on the passed value, which is what we're doing, except we won't print the config to our customers' service logs.

The theory we're accepting is that the string addresses need to be in the core-specific TLB before the hooked function is called. Printing to stdout, or with io.Discard, achieves this.

Contributor

> I wasn't aware of this function. Looking at the code, it just performs a simple println() on the passed value, which is what we're doing, except we won't print the config to our customers' service logs.

@grantseltzer I probably explained my suggestion badly. Yes, technically there is a call to println in there, but it's never reached because the condition of the if statement always ends up being false. The compiler, however, cannot know that the variable will stay false, so it cannot optimize the if statement away. In the end, it runs just a few instructions, whereas io.Discard.Write([]byte(str)) copies the string and allocates it on the heap to send it to the io package. In the PR description you were talking about the "critical path for Go DI", so I thought you would welcome a performance boost, but now that I think of it, it's probably unnecessary. Feel free to take or discard (pun intended) my comment.
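
The mechanism described in this comment can be sketched as follows. This is a hypothetical illustration: `alwaysFalse` and `keepAliveLike` are invented names, and the body mirrors the shape described in the thread rather than quoting the runtime's source. Note that nothing on the executed path reads the bytes of the string.

```go
package main

import (
	"fmt"
	"runtime"
)

// alwaysFalse is a hypothetical package-level variable that is never set to
// true. Because it is a variable rather than a constant, the compiler
// generally cannot prove that the branch below is dead.
var alwaysFalse bool

// keepAliveLike imitates the pattern discussed above: the value is kept
// reachable up to this call, but println is never actually executed.
func keepAliveLike(x any) {
	if alwaysFalse {
		println(x)
	}
}

func main() {
	s := "config content"
	keepAliveLike(s)     // hand-written imitation of the pattern
	runtime.KeepAlive(s) // the real standard-library function
	fmt.Println("println branch taken:", alwaysFalse)
}
```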

Member Author

@eliottness Ahh, I see. Let me test whether this resolves the issue; if it does, I'd rather use it instead. But yeah, performance isn't the main concern, considering this gets called once every 5 seconds.

Member Author

This won't work. The purpose of the print to io.Discard is to trigger the actual memory to be paged in. If the println call is skipped, that won't happen.

	for i := range strs {
		io.WriteString(io.Discard, strs[i])
	}
}

// startRemoteConfig starts the remote config client.
// It registers the APM_TRACING product with a callback,
// and the LIVE_DEBUGGING product without a callback.
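
Taken together, the two hunks can be exercised in a standalone program along the following lines. This is a sketch under stated assumptions: `probeConfig`, `rcState`, `preload`, and `flushOnce` are hypothetical stand-ins for the tracer's internal types and functions, the `pass` callback stands in for the uprobe-hooked `passProbeConfiguration`, and the 5-second loop is reduced to a single iteration so the result is deterministic.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// probeConfig and rcState are hypothetical stand-ins for the tracer's
// internal types; only the three fields used in the diff are modeled.
type probeConfig struct {
	runtimeID, configPath, configContent string
}

type rcState struct {
	sync.Mutex
	state map[string]probeConfig
}

// preload mirrors the helper added in the PR: it hands every string to
// io.WriteString with io.Discard as the destination.
func preload(strs ...string) {
	for i := range strs {
		io.WriteString(io.Discard, strs[i])
	}
}

// flushOnce performs one iteration of the polling loop. While holding the
// lock, it preloads each config's strings and then hands them to pass.
// It returns the number of configs passed.
func flushOnce(s *rcState, pass func(runtimeID, configPath, configContent string)) int {
	s.Lock()
	defer s.Unlock()
	n := 0
	for _, v := range s.state {
		preload(v.runtimeID, v.configPath, v.configContent)
		pass(v.runtimeID, v.configPath, v.configContent)
		n++
	}
	return n
}

func main() {
	s := &rcState{state: map[string]probeConfig{
		"probe-1": {"runtime-id", "config/path", `{"example":true}`},
	}}
	n := flushOnce(s, func(id, path, content string) {
		fmt.Println("passing", id, path, len(content))
	})
	fmt.Println("configs passed:", n)
}
```

Calling the preload step immediately before the hooked function, inside the same critical section, keeps the gap between the two as small as possible, which matches the ordering in the diff.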