-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate OpenTelemetry tracing in Cody Webview #6100
Conversation
|
37b3804
to
ac6d9c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good start, feedback inline.
Instead of adding lots of low quality spans, let's concentrate on a good chat latency: from hitting submit to when we paint the first token. This is interesting because:
- Core product scenario.
- Has an extension-side metric we can compare and contrast to learn a bit about webview latency specifically.
- Easy, in the sense that it starts and ends in the webview.
- ...but still interesting. These "how long does it take to run postMessage" don't reflect user latency.
5deba34
to
6c5c8c2
Compare
|
||
// Determines if a span group is complete for export(Currently only for chat interactions) | ||
// TODO: Extend with YAML configuration to support all span groups, not just chat interactions. | ||
private isSpanGroupComplete(spanGroup: Set<ReadableSpan>): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO here but this looks okayish for now
1131458
to
8fbcd4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is shaping up very nicely.
The conditions for what state we're in look arduous, any ideas about abstractions there so it is simpler?
This is still not user latency in the sense that HumanMessageEditor onSubmitClick is probably when we could start measuring something. Looks like we have already undergone some state changes by the time we start a span in what you have here?
const { auth } = await currentResolvedConfig() | ||
if (!auth.accessToken) { | ||
logError('TraceSender', 'Cannot send trace data: not authenticated') | ||
throw new Error('Not authenticated') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we require this?
What's the plan for performance around login, which we will probably want to do soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we require this?
const traceUrl = new URL('/-/debug/otlp/v1/traces', auth.serverEndpoint).toString()
const response = await fetch(traceUrl, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...(auth.accessToken ? { Authorization: `token ${auth.accessToken}` } : {}),
},
body: spanData,
})
When we send the telemetry data to the Otel Collector endpoint it does require the auth token so that's why.
What's the plan for performance around login, which we will probably want to do soon.
I gotta think through this to be honest, I can't tell you much right now unless I explore the code first
vscode/webviews/chat/Transcript.tsx
Outdated
timeToFirstTokenSpan.current = undefined | ||
hasRecordedFirstToken.current = true | ||
|
||
// Also set on parent span for backwards compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards compatibility with what? This is a new trace...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it was just a bad LLM comment so I removed it, NOTE: its still a child of the parent trace of chat-interaction.
private agentIDE?: CodyIDE | ||
private extensionAgentVersion?: string | ||
constructor() { | ||
if (!WebviewOpenTelemetryService.instance) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design of this singleton and guards could be better, I think, although these things are always messy...
Why not have the singleton, initialize it, and make it convenient to use. Then the constructor doesn't need conditional side effects...
If people want to new up other ones, is there any harm in that? Why have configure
exit early?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have the singleton, initialize it, and make it convenient to use. Then the constructor doesn't need conditional side effects...
Great question: I actually spent some time with this.
So for this code
Lines 159 to 173 in c497269
const webviewTelemetryService = useMemo(() => { | |
const service = WebviewOpenTelemetryService.getInstance() | |
return service | |
}, []) | |
useEffect(() => { | |
if (config) { | |
webviewTelemetryService.configure({ | |
isTracingEnabled: true, | |
debugVerbose: true, | |
agentIDE: config.clientCapabilities.agentIDE, | |
extensionAgentVersion: config.clientCapabilities.agentExtensionVersion, | |
}) | |
} | |
}, [config, webviewTelemetryService]) | |
I had to wait for config to fully load and only then do i get the value of config.clientCapabilities.agentIDE
and config.clientCapabilities.agentExtensionVersion
because they are needed to figure out which OS and IDe the request is coming from: so I had to make use of useEffect for that to load fully and only then I could do the configuration(although Ideally I would have just done the initialization once and be done with it). This seemed like a good enough way to pass config values in the configuration.
Why have configure exit early?
Once the configuration is loaded and set up, I want it to be globally accessible for the webview. This approach ensures that subsequent changes—such as those triggered by useEffect—don’t introduce unnecessary complications that make debugging issues (like traces not rendering correctly) more difficult. By enforcing a singleton pattern and using a fixed configuration after initialization, we can guarantee the stability of the trace pipeline. This way, you only need to focus on potential errors in trace handling, rather than worrying about misconfigurations or unstable configurations.
} | ||
} | ||
|
||
public dispose(): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See you did guard some methods, but not this one. Is it simpler and better to just leave the global alone in these instance methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out! I've added a guard clause to the dispose method to ensure it only affects the global instance when appropriate. This should help maintain simplicity and prevent any unintended modifications to the global state.
Let me know if you have any other suggestions!
aaf2e18
to
5616fc4
Compare
Yes, I simplified the code a little bit by separating it out into functions and hopefully that's better.
Fixed that. |
9416e1c
to
d3a12dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some feedback inline. Let's land this and iterate on any niggling details.
vscode/webviews/chat/Transcript.tsx
Outdated
|
||
const onEditSubmit = useCallback( | ||
(editorValue: SerializedPromptEditorValue, intentFromSubmit?: ChatMessage['intent']): void => { | ||
startSpanAndSubmit('edit', editorValue, intentFromSubmit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this before, but you're not including the editor serialization, etc. I think we should start the span when the user hits enter or clicks the button. (Even that is late... ideally we'd use this and count the time from the actual input to when we get the event: https://developer.mozilla.org/en-US/docs/Web/API/PerformanceEventTiming but I'm fine if we don't do that yet, it might create headaches for Cody Web. But we should start as soon as we get the key press/click event.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, so I added it like this
const onUserAction = (action: 'edit' | 'submit', intentFromSubmit?: ChatMessage['intent']) => {
// Start the span as soon as the user initiates the action
const startMark = performance.mark('startSubmit')
const spanManager = new SpanManager('cody-webview')
const span = spanManager.startSpan('chat-interaction', {
attributes: {
sampled: true,
'render.state': 'started',
'startSubmit.mark': startMark.startTime,
},
})
if (!span) {
throw new Error('Failed to start span for chat interaction')
}
const spanContext = trace.setSpan(context.active(), span)
setActiveChatContext(spanContext)
// Serialize the editor value after starting the span
const editorValue = humanEditorRef.current?.getSerializedValue()
if (!editorValue) {
console.error('Failed to serialize editor value')
return
}
const commonProps = {
editorValue,
intent: intentFromSubmit || intentResults.current?.intent,
intentScores: intentFromSubmit ? undefined : intentResults.current?.allScores,
manuallySelectedIntent: !!intentFromSubmit,
}
if (action === 'edit') {
editHumanMessage({
messageIndexInTranscript: humanMessage.index,
...commonProps,
})
} else {
submitHumanMessage({
...commonProps,
})
}
}
so that serialization is within span. Hope that's better than before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to make more changes in followup PR too.
Kudos for writing a detailed test plan. It may make sense to unit test the span construction. It's fine to do that in a follow-up PR. The whole mechanism will be continuously observed by the alerts, so an integration test is less critical for this. |
e2505cb
to
810662a
Compare
810662a
to
5bde0b6
Compare
Revert of #6100 #6100 introduces Cody Web regression because it pulls into the bundle zlib, and apparently, the zlib browser version doesn't work in the web-worker context. Hence, we have a runtime error. I suggest we revert changes from #6100 and investigate more into what exactly is happening in web-worker Cody's Web bundle with the new open telemetry reporter. ``` Uncaught TypeError: Cannot read properties of undefined (reading 'prototype') at ../../../cody/web/dist/agent.worker-Cnq9eXnn.mjs (50be7d08-d01b-472b-8932-6af884d43a68:488831:61) at __init (50be7d08-d01b-472b-8932-6af884d43a68:11:56) at 50be7d08-d01b-472b-8932-6af884d43a68:501380:1 ``` ## Test plan - Check that Cody Web demo works (meaning has no runtime error on initial run)
Enhance the chat application by integrating OpenTelemetry for tracing and context management. This update includes:
ChatController
from the webviewThese changes aim to improve observability and debugging capabilities by providing detailed trace information for chat interactions and operations
Test plan
sg start otel
http://localhost:16686
to see if Jaegar is runningchat-interaction
this is a trace coming from the webviewHere is a video
https://www.loom.com/share/557b0ea9dffd4561810f8d67879f7dfb
Changelog