Integrate OpenTelemetry tracing in Cody Webview #6100

Merged 1 commit into main from trace-export on Nov 21, 2024

Conversation

@arafatkatze (Contributor) commented Nov 8, 2024

Enhance the chat application by integrating OpenTelemetry for tracing and context management. This update includes:

  • Added OpenTelemetry dependencies for context and tracing.
  • Implemented tracing spans for chat interactions and feedback submission in the webview.
  • Managed the active chat context using OpenTelemetry's context API.
  • Introduced a telemetry service for managing tracing configuration.
  • Added a new class in ChatController for exporting trace data from the webview.

These changes aim to improve observability and debugging capabilities by providing detailed trace information for chat interactions and operations.

Test plan

  • Run a Sourcegraph instance locally
  • Run sg start otel
  • Run the debugger for VS Code Cody locally on this branch
  • Perform some chat operations
  • Go to http://localhost:16686 to see if Jaeger is running
  • Select Cody-Client as the service
  • See a trace with the title chat-interaction; this is a trace coming from the webview
[Screenshot: Jaeger UI showing the chat-interaction trace from the webview]

Here is a video
https://www.loom.com/share/557b0ea9dffd4561810f8d67879f7dfb

Changelog

github-actions bot commented Nov 8, 2024

‼️ Hey @sourcegraph/cody-security, please review this PR carefully as it introduces the usage of an unsafe_ function or abuses PromptString.

@arafatkatze arafatkatze closed this Nov 9, 2024
@arafatkatze arafatkatze reopened this Nov 9, 2024
@arafatkatze arafatkatze changed the title Trace export Integrate OpenTelemetry tracing in Cody Webview Nov 13, 2024
@dominiccooney (Contributor) left a comment

A good start, feedback inline.

Instead of adding lots of low-quality spans, let's concentrate on one good chat-latency span: from hitting submit to when we paint the first token. This is interesting because:

  • Core product scenario.
  • Has an extension-side metric we can compare and contrast with, to learn a bit about webview latency specifically.
  • Easy, in the sense that it starts and ends in the webview.
  • ...but still interesting. Spans that only measure "how long does it take to run postMessage" don't reflect user latency.
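
A minimal sketch of such a submit-to-first-paint span using @opentelemetry/api; this illustrates the measurement being asked for, not the exact code the PR ends up with (see the SpanManager snippet later in the thread):

    import { trace } from '@opentelemetry/api'

    // Assumes a tracer provider has already been registered for the webview.
    const tracer = trace.getTracer('cody-webview')

    // Start the span the moment the user hits submit.
    const chatSpan = tracer.startSpan('chat-interaction', {
        attributes: { 'chat.trigger': 'submit' },
    })

    // End it when the first assistant token is painted.
    function onFirstTokenPainted(): void {
        chatSpan.addEvent('first-token-painted')
        chatSpan.end()
    }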

Review threads (outdated, resolved):
  • vscode/src/chat/chat-view/ChatController.ts (5)
  • vscode/webviews/App.tsx (2)
  • vscode/webviews/Chat.tsx (3)
@arafatkatze arafatkatze force-pushed the trace-export branch 4 times, most recently from 5deba34 to 6c5c8c2 on November 16, 2024 02:39
@arafatkatze arafatkatze marked this pull request as ready for review November 19, 2024 05:01

// Determines if a span group is complete for export (currently only for chat interactions).
// TODO: Extend with YAML configuration to support all span groups, not just chat interactions.
private isSpanGroupComplete(spanGroup: Set<ReadableSpan>): boolean {
@arafatkatze (Contributor, Author):

Added a TODO here, but this looks okay-ish for now.
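
For context, a minimal sketch of what such a completeness check might look like, under the assumption that a group counts as complete once its root chat-interaction span is present and every span in the group has ended; the actual criteria in the PR may differ:

    import type { ReadableSpan } from '@opentelemetry/sdk-trace-base'

    // Assumed rule: the group is exportable when the chat-interaction root span exists
    // and no span in the group is still open.
    function isSpanGroupComplete(spanGroup: Set<ReadableSpan>): boolean {
        const spans = [...spanGroup]
        const hasRoot = spans.some(s => s.name === 'chat-interaction' && !s.parentSpanId)
        const allEnded = spans.every(s => s.ended)
        return hasRoot && allEnded
    }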

@arafatkatze arafatkatze force-pushed the trace-export branch 2 times, most recently from 1131458 to 8fbcd4b on November 19, 2024 05:30
@dominiccooney (Contributor) left a comment

This is shaping up very nicely.

The conditions for what state we're in look arduous, any ideas about abstractions there so it is simpler?

This is still not user latency in the sense that HumanMessageEditor onSubmitClick is probably when we could start measuring something. Looks like we have already undergone some state changes by the time we start a span in what you have here?

Review threads (outdated, resolved): vscode/src/services/open-telemetry/CodyTraceExportWeb.ts (2)
const { auth } = await currentResolvedConfig()
if (!auth.accessToken) {
    logError('TraceSender', 'Cannot send trace data: not authenticated')
    throw new Error('Not authenticated')
@dominiccooney (Contributor):

Why do we require this?

What's the plan for performance around login, which we will probably want to do soon.

@arafatkatze (Contributor, Author):

Why do we require this?

    const traceUrl = new URL('/-/debug/otlp/v1/traces', auth.serverEndpoint).toString()
    const response = await fetch(traceUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            ...(auth.accessToken ? { Authorization: `token ${auth.accessToken}` } : {}),
        },
        body: spanData,
    })

When we send the telemetry data to the OTel Collector endpoint, it requires the auth token, so that's why.

What's the plan for performance around login, which we will probably want to do soon.

I've got to think this through, to be honest; I can't tell you much right now unless I explore the code first.

Review threads (outdated, resolved): vscode/webviews/chat/Transcript.tsx (2)
timeToFirstTokenSpan.current = undefined
hasRecordedFirstToken.current = true

// Also set on parent span for backwards compatibility
@dominiccooney (Contributor):

Backwards compatibility with what? This is a new trace...

@arafatkatze (Contributor, Author):

Sorry, it was just a bad LLM comment, so I removed it. Note: it's still a child of the parent chat-interaction trace.
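
For reference, a minimal sketch of how first-token recording around those refs might work, wrapped in a hypothetical React hook; names other than the two refs quoted above are assumptions:

    import { useCallback, useRef } from 'react'
    import type { Span } from '@opentelemetry/api'

    // Hypothetical hook mirroring the refs in the quoted snippet above.
    function useFirstTokenTiming() {
        const timeToFirstTokenSpan = useRef<Span | undefined>(undefined)
        const hasRecordedFirstToken = useRef(false)

        // Call when the first assistant token renders: ends the child span exactly once.
        const recordFirstToken = useCallback(() => {
            if (!timeToFirstTokenSpan.current || hasRecordedFirstToken.current) {
                return
            }
            timeToFirstTokenSpan.current.addEvent('first-token-rendered')
            timeToFirstTokenSpan.current.end()
            timeToFirstTokenSpan.current = undefined
            hasRecordedFirstToken.current = true
        }, [])

        return { timeToFirstTokenSpan, recordFirstToken }
    }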

private agentIDE?: CodyIDE
private extensionAgentVersion?: string
constructor() {
    if (!WebviewOpenTelemetryService.instance) {
@dominiccooney (Contributor):

The design of this singleton and guards could be better, I think, although these things are always messy...

Why not have the singleton, initialize it, and make it convenient to use. Then the constructor doesn't need conditional side effects...

If people want to new up other ones, is there any harm in that? Why have configure exit early?

@arafatkatze (Contributor, Author) commented Nov 20, 2024:

Why not have the singleton, initialize it, and make it convenient to use. Then the constructor doesn't need conditional side effects...

Great question: I actually spent some time on this.
So, for this code:

    const webviewTelemetryService = useMemo(() => {
        const service = WebviewOpenTelemetryService.getInstance()
        return service
    }, [])

    useEffect(() => {
        if (config) {
            webviewTelemetryService.configure({
                isTracingEnabled: true,
                debugVerbose: true,
                agentIDE: config.clientCapabilities.agentIDE,
                extensionAgentVersion: config.clientCapabilities.agentExtensionVersion,
            })
        }
    }, [config, webviewTelemetryService])

I had to wait for config to fully load before I could read config.clientCapabilities.agentIDE and config.clientCapabilities.agentExtensionVersion, which are needed to figure out which OS and IDE the request is coming from. So I had to use useEffect to wait for config to load fully, and only then do the configuration (although ideally I would have just done the initialization once and been done with it). This seemed like a good enough way to pass the config values into the configuration.

Why have configure exit early?

Once the configuration is loaded and set up, I want it to be globally accessible for the webview. This approach ensures that subsequent changes—such as those triggered by useEffect—don’t introduce unnecessary complications that make debugging issues (like traces not rendering correctly) more difficult. By enforcing a singleton pattern and using a fixed configuration after initialization, we can guarantee the stability of the trace pipeline. This way, you only need to focus on potential errors in trace handling, rather than worrying about misconfigurations or unstable configurations.
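
A minimal sketch of the pattern being described, with the internals (the isConfigured flag and the option shape) assumed rather than taken from the PR:

    class WebviewOpenTelemetryService {
        private static instance: WebviewOpenTelemetryService | undefined
        private isConfigured = false

        public static getInstance(): WebviewOpenTelemetryService {
            if (!WebviewOpenTelemetryService.instance) {
                WebviewOpenTelemetryService.instance = new WebviewOpenTelemetryService()
            }
            return WebviewOpenTelemetryService.instance
        }

        // Exit early once configured, so later calls (e.g. from useEffect re-runs)
        // can't destabilize an already-working trace pipeline.
        public configure(options: { isTracingEnabled: boolean; agentIDE?: string }): void {
            if (this.isConfigured) {
                return
            }
            this.isConfigured = true
            // ...set up the tracer provider, exporter, and context manager here.
        }
    }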

}
}

public dispose(): void {
@dominiccooney (Contributor):

I see you did guard some methods, but not this one. Is it simpler and better to just leave the global alone in these instance methods?

@arafatkatze (Contributor, Author):

Thanks for pointing that out! I've added a guard clause to the dispose method to ensure it only affects the global instance when appropriate. This should help maintain simplicity and prevent any unintended modifications to the global state.
Let me know if you have any other suggestions!
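
A minimal sketch of what that guard could look like, continuing the illustrative class from the earlier sketch (the real implementation may differ):

    public dispose(): void {
        // Only tear down global state if this object is the registered global instance.
        if (WebviewOpenTelemetryService.instance !== this) {
            return
        }
        // ...flush pending spans and shut down the tracer provider here.
        WebviewOpenTelemetryService.instance = undefined
        this.isConfigured = false
    }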

@arafatkatze (Contributor, Author) commented Nov 20, 2024

The conditions for what state we're in look arduous, any ideas about abstractions there so it is simpler?

Yes, I simplified the code a little bit by separating it out into functions and hopefully that's better.

This is still not user latency in the sense that HumanMessageEditor onSubmitClick is probably when we could start measuring something. Looks like we have already undergone some state changes by the time we start a span in what you have here?

Fixed that.

@arafatkatze arafatkatze force-pushed the trace-export branch 2 times, most recently from 9416e1c to d3a12dc on November 20, 2024 06:09
@dominiccooney (Contributor) left a comment

Some feedback inline. Let's land this and iterate on any niggling details.

Review threads (outdated, resolved): vscode/webviews/Chat.tsx, vscode/webviews/chat/Transcript.tsx

const onEditSubmit = useCallback(
    (editorValue: SerializedPromptEditorValue, intentFromSubmit?: ChatMessage['intent']): void => {
        startSpanAndSubmit('edit', editorValue, intentFromSubmit)
@dominiccooney (Contributor):

I mentioned this before, but you're not including the editor serialization, etc. I think we should start the span when the user hits enter or clicks the button. (Even that is late... ideally we'd use this and count the time from the actual input to when we get the event: https://developer.mozilla.org/en-US/docs/Web/API/PerformanceEventTiming but I'm fine if we don't do that yet, it might create headaches for Cody Web. But we should start as soon as we get the key press/click event.)
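
For reference, a minimal sketch of how the PerformanceEventTiming API mentioned above could be wired up; this is an assumption about one possible approach, not something the PR implements:

    // Observe input events and record how long they waited before our handler ran.
    const observer = new PerformanceObserver(list => {
        for (const entry of list.getEntries() as PerformanceEventTiming[]) {
            if (entry.name === 'click' || entry.name === 'keydown') {
                // Delay between the user's actual input and the start of event processing.
                const inputDelayMs = entry.processingStart - entry.startTime
                console.debug(`input delay for ${entry.name}: ${inputDelayMs}ms`)
            }
        }
    })
    // 16ms is the minimum durationThreshold the Event Timing API allows.
    observer.observe({ type: 'event', durationThreshold: 16 })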

@arafatkatze (Contributor, Author):

Fair point, so I added it like this:


    const onUserAction = (action: 'edit' | 'submit', intentFromSubmit?: ChatMessage['intent']) => {
        // Start the span as soon as the user initiates the action
        const startMark = performance.mark('startSubmit')
        const spanManager = new SpanManager('cody-webview')
        const span = spanManager.startSpan('chat-interaction', {
            attributes: {
                sampled: true,
                'render.state': 'started',
                'startSubmit.mark': startMark.startTime,
            },
        })

        if (!span) {
            throw new Error('Failed to start span for chat interaction')
        }

        const spanContext = trace.setSpan(context.active(), span)
        setActiveChatContext(spanContext)

        // Serialize the editor value after starting the span
        const editorValue = humanEditorRef.current?.getSerializedValue()
        if (!editorValue) {
            console.error('Failed to serialize editor value')
            return
        }

        const commonProps = {
            editorValue,
            intent: intentFromSubmit || intentResults.current?.intent,
            intentScores: intentFromSubmit ? undefined : intentResults.current?.allScores,
            manuallySelectedIntent: !!intentFromSubmit,
        }

        if (action === 'edit') {
            editHumanMessage({
                messageIndexInTranscript: humanMessage.index,
                ...commonProps,
            })
        } else {
            submitHumanMessage({
                ...commonProps,
            })
        }
    }

so that serialization is within the span. Hope that's better than before?

@arafatkatze (Contributor, Author):

Happy to make more changes in a follow-up PR too.

Review threads (all resolved): vscode/webviews/chat/Transcript.tsx (4, of which 2 outdated)
@dominiccooney (Contributor) commented:

Kudos for writing a detailed test plan.

It may make sense to unit test the span construction. It's fine to do that in a follow-up PR. The whole mechanism will be continuously observed by the alerts, so an integration test is less critical for this.
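
A minimal sketch of what such a unit test could look like, assuming @opentelemetry/sdk-trace-base 1.x and an illustrative span name; an actual test would exercise the PR's real SpanManager and exporter classes:

    import {
        BasicTracerProvider,
        InMemorySpanExporter,
        SimpleSpanProcessor,
    } from '@opentelemetry/sdk-trace-base'

    // Capture spans in memory instead of sending them anywhere.
    const exporter = new InMemorySpanExporter()
    const provider = new BasicTracerProvider()
    provider.addSpanProcessor(new SimpleSpanProcessor(exporter))

    const tracer = provider.getTracer('cody-webview-test')
    const span = tracer.startSpan('chat-interaction')
    span.setAttribute('render.state', 'started')
    span.end()

    // Assert on the constructed span rather than on wire output.
    const finished = exporter.getFinishedSpans()
    console.assert(finished.length === 1)
    console.assert(finished[0].name === 'chat-interaction')
    console.assert(finished[0].attributes['render.state'] === 'started')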

@arafatkatze arafatkatze merged commit f747f6e into main Nov 21, 2024
18 of 20 checks passed
@arafatkatze arafatkatze deleted the trace-export branch November 21, 2024 00:43
vovakulikov added a commit that referenced this pull request Nov 22, 2024
vovakulikov added a commit that referenced this pull request Nov 22, 2024
Revert of #6100

#6100 introduces a Cody Web regression because it pulls zlib into the bundle, and apparently the browser version of zlib doesn't work in the web-worker context. Hence, we have a runtime error. I suggest we revert the changes from #6100 and investigate further into what exactly is happening in the web-worker Cody Web bundle with the new OpenTelemetry reporter.

```
Uncaught TypeError: Cannot read properties of undefined (reading 'prototype')
    at ../../../cody/web/dist/agent.worker-Cnq9eXnn.mjs (50be7d08-d01b-472b-8932-6af884d43a68:488831:61)
    at __init (50be7d08-d01b-472b-8932-6af884d43a68:11:56)
    at 50be7d08-d01b-472b-8932-6af884d43a68:501380:1
```


## Test plan
- Check that the Cody Web demo works (meaning no runtime error on the initial run)