feat(otel): Add OpenTelemetry GenAI instrumentation to Copilot Chat #3917
Draft
Phase 0 complete:
- spec.md: Full spec with decisions, GenAI semconv, dual-write, eval signals, lessons from Gemini CLI + Claude Code
- plan.md: E2E demo plan (chat ext + eval repo + Azure backend)
- src/platform/otel/: IOTelService, config, attributes, metrics, events, message formatters, NodeOTelService, file exporters
- package.json: Added @opentelemetry/* dependencies

OTel opt-in behind OTEL_EXPORTER_OTLP_ENDPOINT env var.
- Register IOTelService in DI (NodeOTelService when enabled, NoopOTelService when disabled)
- Add OTelContrib lifecycle contribution for OTel init/shutdown
- Add `chat {model}` inference span in ChatMLFetcherImpl._doFetchAndStreamChat()
- Add `execute_tool {name}` span in ToolsService.invokeTool()
- Add `invoke_agent {participant}` parent span in ToolCallingLoop.run()
- Record gen_ai.client.operation.duration, tool call count/duration, agent metrics
- Thread IOTelService through all ToolCallingLoop subclasses
- Update test files with NoopOTelService
- Zero overhead when OTel is disabled (noop providers, no dynamic imports)
- Add `embeddings {model}` span in RemoteEmbeddingsComputer.computeEmbeddings()
- Add VS Code settings under github.copilot.chat.otel.* in package.json
(enabled, exporterType, otlpEndpoint, captureContent, outfile)
- Wire VS Code settings into resolveOTelConfig in services.ts
- Add unit tests for:
- resolveOTelConfig: env precedence, kill switch, all config paths (16 tests)
- NoopOTelService: zero-overhead noop behavior (8 tests)
- GenAiMetrics: metric recording with correct attributes (7 tests)
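The instrumentation pattern described above — spans created through `IOTelService`, with a noop service when OTel is disabled — could be sketched roughly like this. This is a hedged illustration, not the PR's actual code; the interface shape, `NOOP_SPAN`, and `fetchChat` are all assumptions.

```typescript
// Hedged sketch (not the PR's actual code): the shape of an IOTelService-style
// API with a noop fallback, so call sites pay ~zero cost when OTel is disabled.
interface SpanLike {
  setAttribute(key: string, value: string | number): void;
  end(): void;
}

interface OTelServiceLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Noop path: a single shared span object, no per-call allocations.
const NOOP_SPAN: SpanLike = { setAttribute() {}, end() {} };
const noopOTelService: OTelServiceLike = {
  startActiveSpan(_name, fn) {
    return fn(NOOP_SPAN);
  },
};

// A call site like _doFetchAndStreamChat might then wrap the request roughly
// like this (attribute names follow the GenAI semconv; the rest is assumed):
function fetchChat(otel: OTelServiceLike, model: string): string {
  return otel.startActiveSpan(`chat ${model}`, (span) => {
    span.setAttribute("gen_ai.operation.name", "chat");
    span.setAttribute("gen_ai.request.model", model);
    const response = "…"; // actual LLM call elided
    span.end();
    return response;
  });
}
```

With this shape, tests and the disabled path inject `noopOTelService` and the call sites stay unchanged.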
…porters
- messageFormatters: 18 tests covering toInputMessages, toOutputMessages, toSystemInstructions, toToolDefinitions (edge cases, empty inputs, invalid JSON)
- genAiEvents: 9 tests covering all 4 event emitters, content capture on/off
- fileExporters: 5 tests covering write/read round-trip for span, log, metric exporters plus aggregation temporality

Total OTel test suite: 63 tests across 6 files
Add gen_ai.client.token.usage (input/output) and copilot_chat.time_to_first_token histogram metrics at the fetchMany success path where token counts and TTFT are available from the processSuccessfulResponse result.
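Recording the two token-usage histogram points at the success path might look roughly like this. A hedged sketch: `Histogram` stands in for the `@opentelemetry/api` histogram type, and the function and parameter names are illustrative assumptions.

```typescript
// Illustrative sketch of recording token usage at the fetchMany success path.
// Per the GenAI semconv, input and output tokens are separate points on the
// same histogram, distinguished by the gen_ai.token.type attribute.
interface Histogram {
  record(value: number, attributes: Record<string, string>): void;
}

function recordTokenUsage(
  tokenUsage: Histogram,
  model: string,
  promptTokens: number,
  completionTokens: number,
): void {
  tokenUsage.record(promptTokens, {
    "gen_ai.request.model": model,
    "gen_ai.token.type": "input",
  });
  tokenUsage.record(completionTokens, {
    "gen_ai.request.model": model,
    "gen_ai.token.type": "output",
  });
}
```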
… token usage
Wire emitInferenceDetailsEvent into fetchMany success path where full token usage (prompt_tokens, completion_tokens), resolved model, request ID, and finish reasons are available from processSuccessfulResponse. This follows the OTel GenAI spec pattern:
- Spans: timing + hierarchy + error tracking
- Events: full request/response details including token counts

The data mirrors what RequestLogger captures for chat-export-logs.json.
Per the OTel GenAI agent spans spec, add gen_ai.usage.input_tokens and gen_ai.usage.output_tokens as Recommended attributes on the invoke_agent span. Tokens are accumulated across all LLM turns by listening to onDidReceiveResponse events during the agent loop, then set on the span before it ends. Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
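The accumulation described above could be modeled as a small helper that a response-event listener feeds each turn, with the totals applied to the span at the end. A hedged sketch under assumed names; the real code hangs off `onDidReceiveResponse` in the agent loop.

```typescript
// Hedged sketch: sum per-turn token usage across the agent loop, then emit the
// totals as invoke_agent span attributes. Event/field names are assumptions.
interface ResponseUsage { promptTokens: number; completionTokens: number }

class AgentTokenAccumulator {
  private inputTokens = 0;
  private outputTokens = 0;

  // Called from an onDidReceiveResponse-style listener for each LLM turn.
  onResponse(usage: ResponseUsage): void {
    this.inputTokens += usage.promptTokens;
    this.outputTokens += usage.completionTokens;
  }

  // Called once, just before the invoke_agent span ends.
  attributes(): Record<string, number> {
    return {
      "gen_ai.usage.input_tokens": this.inputTokens,
      "gen_ai.usage.output_tokens": this.outputTokens,
    };
  }
}
```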
Defer the `chat {model}` span completion from _doFetchAndStreamChat to
fetchMany where processSuccessfulResponse has extracted token counts.
The chat span now carries:
- gen_ai.usage.input_tokens (prompt_tokens)
- gen_ai.usage.output_tokens (completion_tokens)
- gen_ai.response.model (resolved model)
The span handle is returned from _doFetchAndStreamChat via the result
object so fetchMany can set attributes and end it after tokens are known.
This matches the chat-export-logs.json pattern where each request entry
carries full usage data alongside the response.
…gs/results)
- Chat spans: add copilot.debug_name attribute for identifying orphan spans
- Chat spans: capture gen_ai.input.messages and gen_ai.output.messages when captureContent enabled
- Tool spans: capture gen_ai.tool.call.arguments and gen_ai.tool.call.result when captureContent enabled
- Extension chat endpoint: capture input/output messages when captureContent enabled
- Add CopilotAttr.DEBUG_NAME constant
…itTestingServices)
- Change gen_ai.provider.name from 'openai' to 'github' for CAPI models
- Rename CopilotAttr to CopilotChatAttr, prefix values with copilot_chat.*
- Add GITHUB to GenAiProviderName enum
- Replace copilot.debug_name with gen_ai.agent.name on chat spans
- Add gen_ai.request.temperature, gen_ai.request.top_p to chat spans
- Add gen_ai.response.id, gen_ai.response.finish_reasons on success
- Add gen_ai.usage.cache_read.input_tokens from cached_tokens
- Add copilot_chat.request.max_prompt_tokens and copilot_chat.time_to_first_token
- Add gen_ai.tool.description to execute_tool spans
- Fix gen_ai.tool.call.id to read chatStreamToolCallId (was reading nonexistent prop)
- Fix tool result capture to handle PromptTsxPart and DataPart (not just TextPart)
- Add gen_ai.input.messages and gen_ai.output.messages to invoke_agent span (opt-in)
- Move gen_ai.tool.definitions from chat spans to invoke_agent span (opt-in)
- Add gen_ai.system_instructions to chat spans (opt-in)
- Fix error.type raw strings to use StdAttr.ERROR_TYPE constant
- Centralize hardcoded copilot.turn_count and copilot.endpoint_type into CopilotChatAttr
- Add COPILOT_OTEL_CAPTURE_CONTENT=true to launch.json for testing
- Document span hierarchy fixes needed in plan.md
…ation
- Add TraceContext type and getActiveTraceContext() to IOTelService
- Add storeTraceContext/getStoredTraceContext for cross-boundary propagation
- Add parentTraceContext option to SpanOptions for explicit parent linking
- Implement in NodeOTelService using OTel remote span context
- Capture trace context when execute_tool runSubagent fires (keyed by toolCallId)
- Restore parent context in subagent invoke_agent span (via subAgentInvocationId)
- Auto-cleanup stored contexts after 5 minutes to prevent memory leaks
- Update test mocks with new IOTelService methods
- Update plan.md with investigation findings
The previous implementation stored trace context keyed by chatStreamToolCallId (the model-assigned tool call ID) but looked it up by subAgentInvocationId (VS Code's internal invocation.callId UUID). These are different IDs that don't match across the IPC boundary.

Fix: key by chatRequestId on the store side (available on invocation options), and look up by parentRequestId on the subagent side (same value, available on ChatRequest). Both reference the parent agent's request ID.

Verified: 21-span trace with the subagent correctly nested under the parent agent.
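The store/lookup scheme described above — keyed by the same request ID on both sides, single-use on retrieval, with a time-based cleanup — could be sketched like this. Names and shapes are illustrative assumptions, not the PR's actual code.

```typescript
// Hedged sketch: trace-context handoff across the IPC boundary, keyed by the
// parent agent's chatRequestId (the same value the subagent sees as
// parentRequestId). Entries are single-use and expire after 5 minutes.
interface TraceContext { traceId: string; spanId: string }

class TraceContextStore {
  private readonly entries = new Map<string, { ctx: TraceContext; at: number }>();
  constructor(private readonly ttlMs = 5 * 60 * 1000) {}

  // Store side: called when the execute_tool span fires runSubagent.
  store(chatRequestId: string, ctx: TraceContext): void {
    this.entries.set(chatRequestId, { ctx, at: Date.now() });
  }

  // Subagent side: parentRequestId carries the same value as chatRequestId.
  retrieve(parentRequestId: string): TraceContext | undefined {
    const entry = this.entries.get(parentRequestId);
    if (!entry) return undefined;
    this.entries.delete(parentRequestId); // single-use semantics
    if (Date.now() - entry.at > this.ttlMs) return undefined; // expired
    return entry.ctx;
  }
}
```

The single-use delete doubles as leak prevention for the happy path; the TTL check only matters for contexts that were stored but never retrieved.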
…YOK chat
- Set gen_ai.request.model on invoke_agent span from endpoint
- Track gen_ai.response.model from last LLM response resolvedModel
- Add copilot_chat.request.max_prompt_tokens to BYOK chat spans
- Document upstream gaps in plan.md (BYOK token usage, programmatic tool IDs)
Tests verify:
- storeTraceContext/getStoredTraceContext round-trip and single-use semantics
- getActiveTraceContext returns context inside startActiveSpan
- parentTraceContext makes child span inherit traceId from parent
- Independent spans get different traceIds without parentTraceContext
- Full subagent flow: store context in tool call, retrieve in subagent
…rphan spans
- Set gen_ai.response.finish_reasons on BYOK chat success
- Set copilot_chat.time_to_first_token on BYOK chat success
- Document Gap 4: duplicate orphan spans from CopilotLanguageModelWrapper
- Identify all orphan span categories (title, progressMessages, promptCategorization, wrapper)
…sage data
The copilotLanguageModelWrapper orphan spans are the actual CAPI HTTP handlers, not duplicates. They contain real token usage, cache read tokens, resolved model names, and temperature — all missing from the consumer-side extChatEndpoint spans due to VS Code LM API limitations.

Updated plan.md with:
- Side-by-side attribute comparison table
- Three fix approaches (context propagation, span suppression, enrichment)
- Recommendation: Option 1 (propagate trace context through IPC)
…spans
- Pass _otelTraceContext through modelOptions alongside _capturingTokenCorrelationId
- Inject IOTelService into CopilotLanguageModelWrapper
- Wrap makeRequest in startActiveSpan with parentTraceContext when available

This creates a byok-provider bridge span that makes chatMLFetcher's chat span a child of the original invoke_agent trace, bringing real token usage data into the agent trace hierarchy.
…ified working
Verified: 63-span trace with Azure BYOK (gpt-5) correctly shows:
- byok-provider bridge spans linking wrapper chat spans into agent trace
- Real token usage (in:21458 out:1730 cache:19072) visible on wrapper chat spans
- hasCtx:true on all extChatEndpoint spans confirming context capture
- Two subagent invoke_agent spans correctly nested under main agent
- Zero orphan copilotLanguageModelWrapper spans
…ext propagation
Add runWithTraceContext() to IOTelService — sets parent trace context without creating a visible span. The wrapper's chat spans now appear directly as children of invoke_agent, eliminating the noisy byok-provider intermediary span.

Before: invoke_agent → byok-provider → chat (wrapper)
After: invoke_agent → chat (wrapper)
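The behavior of a runWithTraceContext() helper — set the ambient parent for spans started inside the callback, without emitting a span of its own — can be modeled with a plain variable. A hedged sketch only: the real implementation would go through the OTel context API, and every name here is an assumption.

```typescript
// Hedged sketch of the runWithTraceContext() idea: spans created inside the
// callback inherit the given parent, and no intermediary "bridge" span is
// emitted. Modeled with a module-level variable instead of the OTel context API.
interface TraceContext { traceId: string; spanId: string }

let ambientParent: TraceContext | undefined;

function runWithTraceContext<T>(parent: TraceContext, fn: () => T): T {
  const previous = ambientParent;
  ambientParent = parent;
  try {
    return fn(); // spans started here see `parent`; nothing extra is emitted
  } finally {
    ambientParent = previous; // restore on exit, even on throw
  }
}

// A child span started inside the callback picks up the parent's traceId.
function startChildSpan(name: string): { name: string; traceId?: string } {
  return { name, traceId: ambientParent?.traceId };
}
```

The try/finally restore is what keeps nested or failing callbacks from leaking a stale parent into unrelated spans.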
The extChatEndpoint no longer creates its own chat span. The wrapper's chatMLFetcher span (via CopilotLanguageModelWrapper) is the single source of truth with full token usage, cache data, and resolved model.

Before: invoke_agent → chat (empty, extChatEndpoint) + chat (rich, wrapper)
After: invoke_agent → chat (rich, wrapper only)
…c, Gemini) The previous commit removed the extChatEndpoint chat span, which was correct for Azure/OpenAI BYOK (served by CopilotLanguageModelWrapper via chatMLFetcher). But Anthropic and Gemini BYOK providers call their native SDKs directly, bypassing CopilotLanguageModelWrapper — so they need the consumer-side span. Now: always create a chat span in extChatEndpoint with basic metadata (model, provider, response.id, finish_reasons). For wrapper-based providers, the chatMLFetcher also creates a richer sibling span with token usage.
Only create the extChatEndpoint chat span for non-wrapper providers (Anthropic, Gemini) that need it as their only span. Wrapper-based providers (Azure, OpenAI, OpenRouter, Ollama, xAI) get a single rich span from chatMLFetcher via CopilotLanguageModelWrapper. Result: 1 chat span per LLM call for all provider types.
…vider
Move chat span creation into AnthropicLMProvider where actual API response data (token usage, cache reads) is available. The span is linked to the agent trace via runWithTraceContext and enriched with:
- gen_ai.usage.input_tokens / output_tokens
- gen_ai.usage.cache_read.input_tokens
- gen_ai.response.model / response.id / finish_reasons

Remove consumer-side extChatEndpoint span for all vendors (nonWrapperVendors now empty) since both wrapper-based and Anthropic providers create their own spans with full data. Next: apply same pattern to Gemini provider.
- Add OTel chat span with full usage data to GeminiNativeBYOKLMProvider
- Remove all consumer-side span code from extChatEndpoint (dead code)
- Each provider now owns its chat span with real API response data:
  * CAPI: chatMLFetcher
  * OpenAI-compat BYOK: CopilotLanguageModelWrapper → chatMLFetcher
  * Anthropic: AnthropicLMProvider
  * Gemini: GeminiNativeBYOKLMProvider
- Fix Gemini test to pass IOTelService
Add to both providers:
- copilot_chat.request.max_prompt_tokens (model.maxInputTokens)
- server.address (api.anthropic.com / generativelanguage.googleapis.com)
- gen_ai.conversation.id (requestId)
- copilot_chat.time_to_first_token (result.ttft)

Now matches CAPI chat span attribute parity.
Extract hostname from urlOrRequestMetadata when it's a URL string and set as server.address on the chat span. Works for both CAPI and CopilotLanguageModelWrapper (Azure BYOK) paths.
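The hostname extraction described above amounts to a guarded `URL` parse. A hedged sketch under assumed names; the real parameter may also be a request-metadata object, which is simply skipped.

```typescript
// Illustrative sketch: derive a server.address value from urlOrRequestMetadata
// when it is a URL string; non-strings and unparseable strings yield undefined.
function serverAddressFrom(urlOrRequestMetadata: string | object): string | undefined {
  if (typeof urlOrRequestMetadata !== "string") return undefined;
  try {
    return new URL(urlOrRequestMetadata).hostname;
  } catch {
    return undefined; // not a parseable URL
  }
}
```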
…at spans
- gen_ai.request.max_tokens from model.maxOutputTokens
- gen_ai.output.messages (opt-in) from response text
- Closes remaining attribute gaps vs CAPI/Azure BYOK spans
When model responds with tool calls instead of text, the output_messages attribute was empty. Now captures both text parts and tool call parts in the output_messages, matching the OTel GenAI output messages schema. Also: Azure BYOK invoke_agent zero tokens is a known upstream gap — extChatEndpoint returns hardcoded usage:0 since VS Code LM API doesn't expose actual usage from the provider side.
… BYOK spans
Same fix as CAPI — when the model responds with tool calls, include them in gen_ai.output.messages alongside text parts. All three provider paths (CAPI, Anthropic, Gemini) now consistently capture both text and tool call parts in output messages.
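Building an output-messages entry that keeps both text and tool-call parts could look roughly like this. A hedged sketch loosely following the OTel GenAI output-messages schema; the input part shapes are assumptions about the provider responses, not verified API types.

```typescript
// Illustrative sketch: map a provider response's parts into one
// gen_ai.output.messages entry, preserving tool calls alongside text.
type ResponsePart =
  | { kind: "text"; text: string }
  | { kind: "toolCall"; id: string; name: string; args: string };

function toOutputMessage(parts: ResponsePart[]) {
  return {
    role: "assistant",
    parts: parts.map((p) =>
      p.kind === "text"
        ? { type: "text", content: p.text }
        : { type: "tool_call", id: p.id, name: p.name, arguments: p.args },
    ),
  };
}
```

The earlier bug was effectively dropping the `toolCall` branch, so tool-only responses produced an empty parts array.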
… spans
- gen_ai.input.messages (opt-in) captured from provider messages parameter
- gen_ai.agent.name set to AnthropicBYOK / GeminiBYOK for identification

Closes the last attribute gaps vs CAPI/Azure BYOK chat spans.
- Map enum role values to names (1→user, 2→assistant, 3→system)
- Extract text from LanguageModelTextPart content arrays instead of showing '[complex]' for all messages
- Use OTel GenAI input messages schema with role + parts format
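The mapping above could be sketched as follows. A hedged illustration: the enum values and part shape mirror the commit message, not verified VS Code API types, and the fallback string is kept from the prior behavior.

```typescript
// Illustrative sketch: map a role enum + content parts into one
// gen_ai.input.messages entry (role + parts), extracting text where possible.
const ROLE_NAMES: Record<number, string> = { 1: "user", 2: "assistant", 3: "system" };

interface TextPartLike { value: string } // stand-in for LanguageModelTextPart

function toInputMessage(role: number, content: Array<TextPartLike | object>) {
  const text = content
    .filter((p): p is TextPartLike => typeof (p as TextPartLike).value === "string")
    .map((p) => p.value)
    .join("");
  return {
    role: ROLE_NAMES[role] ?? "unknown",
    // Fall back to the old '[complex]' marker only when no text was found.
    parts: [{ type: "text", content: text || "[complex]" }],
  };
}
```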
Coverage matrix showing:
- Anthropic/Gemini BYOK missing: operation.duration, token.usage, time_to_first_token metrics, and inference.details event
- CAPI and Azure BYOK (via wrapper) fully covered
- Tool/agent/session metrics covered across all providers
- 4 tasks (M1-M4) to close the gap
… providers
Both providers now record:
- gen_ai.client.operation.duration histogram
- gen_ai.client.token.usage histograms (input + output)
- copilot_chat.time_to_first_token histogram
- gen_ai.client.inference.operation.details log event

All metrics/events now have full parity across CAPI, Azure BYOK, Anthropic BYOK, and Gemini BYOK.
… v2)
The OTel SDK v2 changed the LoggerProvider constructor option from 'logRecordProcessors' to 'processors'. The old key was silently ignored, causing all log records to be dropped. This is why logs never appeared in Loki despite traces working fine.
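The one-line fix, per the option rename the commit describes, looks roughly like this. A hedged sketch with the SDK types stubbed out; the real code would import `LoggerProvider` and a processor from `@opentelemetry/sdk-logs`.

```typescript
// Sketch of the fix: OTel JS SDK 2.x reads `processors` from the
// LoggerProvider constructor config. Unknown keys are silently ignored, so
// the 1.x spelling `logRecordProcessors` drops every log record.
type LogRecordProcessor = unknown; // stand-in for the @opentelemetry/sdk-logs type

// Before (silently ignored by SDK 2.x):
//   new LoggerProvider({ logRecordProcessors: [processor] });
// After:
//   new LoggerProvider({ processors: [processor] });
function loggerProviderConfig(processor: LogRecordProcessor) {
  return { processors: [processor] };
}
```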
Summary
Adds opt-in OpenTelemetry instrumentation to Copilot Chat following the OTel GenAI semantic conventions. Emits traces, metrics, and events for LLM calls, tool executions, agent orchestration, and embeddings. Existing telemetry (`ITelemetryService`) is unchanged.

What's included
Phase 0 — Foundation
- `IOTelService` interface + `OTelServiceImpl` (Node) with DI registration
- `NoopOtelService` for disabled/test/web paths
- Config resolution (precedence: `COPILOT_OTEL_*` > `OTEL_*` > VS Code settings > defaults)
- GenAI semconv attribute constants (`genAiAttributes.ts`)
- Metrics (`gen_ai.client.token.usage`, `gen_ai.client.operation.duration`, `copilot_chat.*`)
- Events (`gen_ai.client.inference.operation.details`, session/tool/agent events)

Phase 1 — Wiring into chat extension
- Inference spans (`chat {model}`) in `chatMLFetcher.ts` — model, tokens, TTFT, finish reasons
- Tool spans (`execute_tool {name}`) in `toolsService.ts` — tool name/type/id, args/results (opt-in)
- Agent spans (`invoke_agent {participant}`) in `toolCallingLoop.ts` — parent span for full hierarchy
- Embedding spans (`embeddings {model}`) in `remoteEmbeddingsComputer.ts`
- Content capture opt-in (`COPILOT_OTEL_CAPTURE_CONTENT=true`)

Activation
Off by default. Enable via env vars:
Respects `telemetry.telemetryLevel` — globally disabled when telemetry is off.

Span hierarchy (Agent mode)
Testing
Risk
No changes to existing `ITelemetryService` code paths.