Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logic to access strings used by remote config bpf probe #2983

Merged
merged 9 commits into from
Dec 3, 2024

Conversation

grantseltzer
Copy link
Member

What does this PR do?

This adds logic to the remote-config callback logic, which is used by the Go dynamic instrumentation product, to access the probe definition strings before they are passed to the hooked callback. This is to ensure that the strings are in memory. If they are not in memory, or not in the same thread page table, page faults can occur which causes bpf_probe_read() calls to fail. This is in a critical path for Go DI so this change will help mitigate that as much as possible.

Motivation

Prevent page faults from occurring when passing remote config probe definitions into a callback which is hooked by Go DI.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

@grantseltzer grantseltzer requested a review from a team as a code owner November 20, 2024 17:02
@grantseltzer grantseltzer force-pushed the grantseltzer/DEBUG-3142-keep-rc-probe-defs-in-memory branch from 301778d to 7f3379a Compare November 20, 2024 17:10
@pr-commenter
Copy link

pr-commenter bot commented Nov 20, 2024

Benchmarks

Benchmark execution time: 2024-12-03 09:12:13

Comparing candidate commit a81c6ba in PR branch grantseltzer/DEBUG-3142-keep-rc-probe-defs-in-memory with baseline commit 5dcc739 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 0 unstable metrics.

passProbeConfiguration(v.runtimeID, v.configPath, v.configContent)
}
diRCState.Unlock()
}
}()
}

func accessStringsToMitigatePageFault(strs ...string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we guarantee the Go compiler won't optimize this function away, since it has no effect?

Copy link
Member Author

@grantseltzer grantseltzer Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less worried about the function itself being optimized away (it doesn't seem to do it according to the disassembler), and more about the actual lines that access the variables. I observed the behavior this PR is trying to mitigate over a large amount of time across different services (though admittedly only on 1 environment), and in fact, we still have occasional misses (seem to typically be followed by hits).

As a result I changed the access to writing the strings to /dev/null and in fact this removed any amount of misses in my manual testing. It also simplifies the code. So i've pushed this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok i've further updated this to just use io.Discard

Copy link
Contributor

@eliottness eliottness Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could runtime.KeepAlive be enough here? 🤔
The doc is a little bit wierd but it's goal is only to force the compiler to not optimize anything away because it can't know at this stage what is going to happen on the other side, which is the runtime package here

Copy link
Member Author

@grantseltzer grantseltzer Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eliottness I wasn't aware of this function. Looking at the code it just performs a simple println() on the passed value which is what we're doing except we won't print the config to our customers service logs.

The theory we're accepting is that the string addresses need to be in the core-specific TLB before the hooked function is called. Printing to stdout, or with io.Discard, achieves this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of this function. Looking at the code it just performs a simple println() on the passed value which is what we're doing except we won't print the config to our customers service logs.

@grantseltzer I probably badly explained what I was suggesting here: Yes technically there is a call to println in there but it's never reach because the if statement end up being always false. Except the compiler cannot know if the variable will stay false so it cannot optimize the if statement away. In the end, it ends up running just a few instructions where as io.Discard([]byte(str)) does a copy of the string and allocates it on the heap to send it to the io package. In the PR description you where talking about the "critical path for Go DI" so I thought you would welcome a performance boost but now that I think of it it's probably unnecessary. Feel free to take or discard (pun intended) my comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eliottness Ahh, I see. Let me test to make sure this works at resolving this issue, and would rather use this instead. But yea performance isn't the focus of concern considering it gets called once every 5 seconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work, the purpose of the print to io.Discard is that it triggers the actual memory to be paged in. If the println call is skipped, that won't happen.

This adds logic to the remote-config callback logic, which is used by
the Go dynamic instrumentation product, to access the probe definition
strings before they are passed to the hooked callback. This is to
ensure that the strings are in memory. If they are not in memory, or
not in the same thread page table, page faults can occur which causes
bpf_probe_read() calls to fail. This is in a critical path for Go DI
so this change will help mitigate that as much as possible.

Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
@grantseltzer grantseltzer force-pushed the grantseltzer/DEBUG-3142-keep-rc-probe-defs-in-memory branch from b14c870 to 12b145c Compare November 27, 2024 18:35
Copy link
Member

@darccio darccio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for APM

@grantseltzer grantseltzer removed the request for review from felixge November 28, 2024 18:50
@grantseltzer
Copy link
Member Author

/merge

@dd-devflow
Copy link

dd-devflow bot commented Nov 28, 2024

Devflow running: /merge

View all feedbacks in Devflow UI.


2024-11-28 18:50:57 UTC ❌ MergeQueue

You are not allowed to use the merge queue towards main. User is not part of the authorized users or teams.
The authorized teams (including their child teams) are: apm-ecosystems, apm-go, asm-go, profiling-go.
The authorized users are: dbenamydd.

@grantseltzer
Copy link
Member Author

@darccio Can I ask you to merge?

@darccio darccio merged commit 076462a into main Dec 3, 2024
160 checks passed
@darccio darccio deleted the grantseltzer/DEBUG-3142-keep-rc-probe-defs-in-memory branch December 3, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants