feat(otel): Add OpenTelemetry GenAI instrumentation to Copilot Chat#3917

Draft
zhichli wants to merge 47 commits into main from zhichli/otel

Conversation


@zhichli zhichli commented Feb 21, 2026

Summary

Adds opt-in OpenTelemetry instrumentation to Copilot Chat following the OTel GenAI semantic conventions. Emits traces, metrics, and events for LLM calls, tool executions, agent orchestration, and embeddings. Existing telemetry (ITelemetryService) is unchanged.

What's included

Phase 0 — Foundation

  • IOTelService interface + OTelServiceImpl (Node) with DI registration
  • NoopOtelService for disabled/test/web paths
  • Config resolver with layered env precedence (COPILOT_OTEL_* > OTEL_* > VS Code settings > defaults)
  • GenAI semantic convention constants (genAiAttributes.ts)
  • Message formatters for OTel GenAI JSON schema
  • Metric instruments (gen_ai.client.token.usage, gen_ai.client.operation.duration, copilot_chat.*)
  • Event emitters (gen_ai.client.inference.operation.details, session/tool/agent events)
  • File exporters (JSON-lines fallback for CI/offline)
  • OTLP HTTP + gRPC + console exporter support
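The layered precedence the config resolver implements can be sketched as follows; the helper name and parameters here are illustrative, not the PR's actual API:

```typescript
// Sketch of the layered precedence described above:
// COPILOT_OTEL_* > OTEL_* > VS Code settings > defaults.
// `resolveSetting` and its parameter shapes are assumptions for illustration.
type Env = Record<string, string | undefined>;

function resolveSetting(
	env: Env,
	copilotKey: string,                  // e.g. "COPILOT_OTEL_ENABLED"
	otelKey: string,                     // e.g. the generic "OTEL_*" equivalent
	vscodeSetting: string | undefined,   // value from VS Code settings, if set
	defaultValue: string,
): string {
	// Nullish coalescing walks the layers in priority order.
	return env[copilotKey] ?? env[otelKey] ?? vscodeSetting ?? defaultValue;
}
```

Each layer only applies when every higher-priority layer is unset, so a stray `OTEL_*` variable in the environment cannot override an explicit `COPILOT_OTEL_*` one.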

Phase 1 — Wiring into chat extension

  • Inference spans (chat {model}) in chatMLFetcher.ts — model, tokens, TTFT, finish reasons
  • Tool spans (execute_tool {name}) in toolsService.ts — tool name/type/id, args/results (opt-in)
  • Agent spans (invoke_agent {participant}) in toolCallingLoop.ts — parent span for full hierarchy
  • Embeddings spans (embeddings {model}) in remoteEmbeddingsComputer.ts
  • Content capture — full messages, responses, system instructions, tool definitions (opt-in via COPILOT_OTEL_CAPTURE_CONTENT=true)
  • Metrics recording at all instrumentation points
  • Diagnostic exporter logs first successful export for easy verification

Activation

Off by default. Enable via env vars:

COPILOT_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Respects telemetry.telemetryLevel — globally disabled when telemetry is off.

Span hierarchy (Agent mode)

invoke_agent copilot                    [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  ├── execute_tool readFile             [INTERNAL]
  ├── execute_tool runCommand           [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  └── ...

Testing

  • 63 unit tests across 6 test files covering config, formatters, events, metrics, file exporters, noop service
  • Verified E2E with local Jaeger (OTLP HTTP on :4318)

Risk

  • Bundle size: OTel deps added (~200KB budget)
  • Zero overhead when disabled (noop providers)
  • No changes to existing ITelemetryService code paths
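The "zero overhead when disabled" claim typically rests on a noop implementation that hands out shared, do-nothing objects. A minimal sketch (the real IOTelService surface is larger; these names are assumptions):

```typescript
// Illustrative sketch of the noop pattern behind "zero overhead when disabled".
// SpanLike and NoopOTelServiceSketch are assumptions, not the PR's real types.
interface SpanLike {
	setAttribute(key: string, value: unknown): void;
	end(): void;
}

// A single shared span object: disabled paths allocate nothing per call.
const NOOP_SPAN: SpanLike = { setAttribute() { }, end() { } };

class NoopOTelServiceSketch {
	startSpan(_name: string): SpanLike {
		return NOOP_SPAN;
	}
	recordTokenUsage(_input: number, _output: number): void {
		// intentionally empty
	}
}
```

Because callers always receive a valid span handle, instrumentation sites never need `if (enabled)` branches.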

@zhichli zhichli requested a review from rebornix February 21, 2026 22:28
@zhichli zhichli force-pushed the zhichli/otel branch 2 times, most recently from 43dcf31 to 973e678 on February 25, 2026 at 23:15
Phase 0 complete:
- spec.md: Full spec with decisions, GenAI semconv, dual-write, eval signals,
  lessons from Gemini CLI + Claude Code
- plan.md: E2E demo plan (chat ext + eval repo + Azure backend)
- src/platform/otel/: IOTelService, config, attributes, metrics, events,
  message formatters, NodeOTelService, file exporters
- package.json: Added @opentelemetry/* dependencies

OTel opt-in behind OTEL_EXPORTER_OTLP_ENDPOINT env var.
- Register IOTelService in DI (NodeOTelService when enabled, NoopOTelService when disabled)
- Add OTelContrib lifecycle contribution for OTel init/shutdown
- Add `chat {model}` inference span in ChatMLFetcherImpl._doFetchAndStreamChat()
- Add `execute_tool {name}` span in ToolsService.invokeTool()
- Add `invoke_agent {participant}` parent span in ToolCallingLoop.run()
- Record gen_ai.client.operation.duration, tool call count/duration, agent metrics
- Thread IOTelService through all ToolCallingLoop subclasses
- Update test files with NoopOTelService
- Zero overhead when OTel is disabled (noop providers, no dynamic imports)
- Add `embeddings {model}` span in RemoteEmbeddingsComputer.computeEmbeddings()
- Add VS Code settings under github.copilot.chat.otel.* in package.json
  (enabled, exporterType, otlpEndpoint, captureContent, outfile)
- Wire VS Code settings into resolveOTelConfig in services.ts
- Add unit tests for:
  - resolveOTelConfig: env precedence, kill switch, all config paths (16 tests)
  - NoopOTelService: zero-overhead noop behavior (8 tests)
  - GenAiMetrics: metric recording with correct attributes (7 tests)
…porters

- messageFormatters: 18 tests covering toInputMessages, toOutputMessages,
  toSystemInstructions, toToolDefinitions (edge cases, empty inputs, invalid JSON)
- genAiEvents: 9 tests covering all 4 event emitters, content capture on/off
- fileExporters: 5 tests covering write/read round-trip for span, log, metric
  exporters plus aggregation temporality

Total OTel test suite: 63 tests across 6 files

Add gen_ai.client.token.usage (input/output) and copilot_chat.time_to_first_token
histogram metrics at the fetchMany success path where token counts and TTFT
are available from the processSuccessfulResponse result.
… token usage

Wire emitInferenceDetailsEvent into fetchMany success path where full
token usage (prompt_tokens, completion_tokens), resolved model, request ID,
and finish reasons are available from processSuccessfulResponse.

This follows the OTel GenAI spec pattern:
- Spans: timing + hierarchy + error tracking
- Events: full request/response details including token counts

The data mirrors what RequestLogger captures for chat-export-logs.json.

Per the OTel GenAI agent spans spec, add gen_ai.usage.input_tokens and
gen_ai.usage.output_tokens as Recommended attributes on the invoke_agent span.

Tokens are accumulated across all LLM turns by listening to onDidReceiveResponse
events during the agent loop, then set on the span before it ends.

Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
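The accumulation step above can be sketched as follows; the listener wiring and field names are illustrative, not the PR's actual code:

```typescript
// Sketch: accumulate gen_ai.usage.* across all LLM turns of an agent loop,
// then emit the totals as span attributes before invoke_agent ends.
// TurnUsage and the class name are assumptions for illustration.
interface TurnUsage { inputTokens: number; outputTokens: number; }

class AgentTokenAccumulator {
	private input = 0;
	private output = 0;

	// Called once per onDidReceiveResponse event during the loop.
	onDidReceiveResponse(usage: TurnUsage): void {
		this.input += usage.inputTokens;
		this.output += usage.outputTokens;
	}

	// Attributes to set on the invoke_agent span just before span.end().
	finalAttributes(): Record<string, number> {
		return {
			'gen_ai.usage.input_tokens': this.input,
			'gen_ai.usage.output_tokens': this.output,
		};
	}
}
```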

Defer the `chat {model}` span completion from _doFetchAndStreamChat to
fetchMany where processSuccessfulResponse has extracted token counts.

The chat span now carries:
- gen_ai.usage.input_tokens (prompt_tokens)
- gen_ai.usage.output_tokens (completion_tokens)
- gen_ai.response.model (resolved model)

The span handle is returned from _doFetchAndStreamChat via the result
object so fetchMany can set attributes and end it after tokens are known.

This matches the chat-export-logs.json pattern where each request entry
carries full usage data alongside the response.
…gs/results)

- Chat spans: add copilot.debug_name attribute for identifying orphan spans
- Chat spans: capture gen_ai.input.messages and gen_ai.output.messages when captureContent enabled
- Tool spans: capture gen_ai.tool.call.arguments and gen_ai.tool.call.result when captureContent enabled
- Extension chat endpoint: capture input/output messages when captureContent enabled
- Add CopilotAttr.DEBUG_NAME constant
- Change gen_ai.provider.name from 'openai' to 'github' for CAPI models
- Rename CopilotAttr to CopilotChatAttr, prefix values with copilot_chat.*
- Add GITHUB to GenAiProviderName enum
- Replace copilot.debug_name with gen_ai.agent.name on chat spans
- Add gen_ai.request.temperature, gen_ai.request.top_p to chat spans
- Add gen_ai.response.id, gen_ai.response.finish_reasons on success
- Add gen_ai.usage.cache_read.input_tokens from cached_tokens
- Add copilot_chat.request.max_prompt_tokens and copilot_chat.time_to_first_token
- Add gen_ai.tool.description to execute_tool spans
- Fix gen_ai.tool.call.id to read chatStreamToolCallId (was reading nonexistent prop)
- Fix tool result capture to handle PromptTsxPart and DataPart (not just TextPart)
- Add gen_ai.input.messages and gen_ai.output.messages to invoke_agent span (opt-in)
- Move gen_ai.tool.definitions from chat spans to invoke_agent span (opt-in)
- Add gen_ai.system_instructions to chat spans (opt-in)
- Fix error.type raw strings to use StdAttr.ERROR_TYPE constant
- Centralize hardcoded copilot.turn_count and copilot.endpoint_type into CopilotChatAttr
- Add COPILOT_OTEL_CAPTURE_CONTENT=true to launch.json for testing
- Document span hierarchy fixes needed in plan.md
…ation

- Add TraceContext type and getActiveTraceContext() to IOTelService
- Add storeTraceContext/getStoredTraceContext for cross-boundary propagation
- Add parentTraceContext option to SpanOptions for explicit parent linking
- Implement in NodeOTelService using OTel remote span context
- Capture trace context when execute_tool runSubagent fires (keyed by toolCallId)
- Restore parent context in subagent invoke_agent span (via subAgentInvocationId)
- Auto-cleanup stored contexts after 5 minutes to prevent memory leaks
- Update test mocks with new IOTelService methods
- Update plan.md with investigation findings
The previous implementation stored trace context keyed by chatStreamToolCallId
(model-assigned tool call ID), but looked it up by subAgentInvocationId
(VS Code internal invocation.callId UUID). These are different IDs that don't
match across the IPC boundary.

Fix: key by chatRequestId on store side (available on invocation options),
and look up by parentRequestId on subagent side (same value, available on
ChatRequest). Both reference the parent agent's request ID.

Verified: 21-span trace with subagent correctly nested under parent agent.
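The store/lookup pattern this fix relies on, keyed by the shared request ID, with single-use retrieval and the time-based cleanup mentioned earlier, can be sketched as follows (the map-based implementation and names are assumptions; only the 5-minute TTL and single-use semantics come from the description above):

```typescript
// Sketch: store trace context keyed by chatRequestId on the tool-call side;
// retrieve once by parentRequestId (the same value) on the subagent side.
// TraceContext and TraceContextStore are illustrative names.
interface TraceContext { traceId: string; spanId: string; }

class TraceContextStore {
	private readonly entries = new Map<string, { ctx: TraceContext; storedAt: number }>();

	constructor(private readonly ttlMs = 5 * 60 * 1000) { }

	store(requestId: string, ctx: TraceContext, now = Date.now()): void {
		this.entries.set(requestId, { ctx, storedAt: now });
	}

	// Single-use: a lookup removes the entry; stale entries return undefined.
	retrieve(requestId: string, now = Date.now()): TraceContext | undefined {
		const entry = this.entries.get(requestId);
		if (!entry) { return undefined; }
		this.entries.delete(requestId);
		return now - entry.storedAt > this.ttlMs ? undefined : entry.ctx;
	}
}
```

The bug described above was purely a key mismatch: both sides must derive the key from the same ID, which survives the IPC boundary.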
…YOK chat

- Set gen_ai.request.model on invoke_agent span from endpoint
- Track gen_ai.response.model from last LLM response resolvedModel
- Add copilot_chat.request.max_prompt_tokens to BYOK chat spans
- Document upstream gaps in plan.md (BYOK token usage, programmatic tool IDs)

Tests verify:
- storeTraceContext/getStoredTraceContext round-trip and single-use semantics
- getActiveTraceContext returns context inside startActiveSpan
- parentTraceContext makes child span inherit traceId from parent
- Independent spans get different traceIds without parentTraceContext
- Full subagent flow: store context in tool call, retrieve in subagent
…rphan spans

- Set gen_ai.response.finish_reasons on BYOK chat success
- Set copilot_chat.time_to_first_token on BYOK chat success
- Document Gap 4: duplicate orphan spans from CopilotLanguageModelWrapper
- Identify all orphan span categories (title, progressMessages, promptCategorization, wrapper)
…sage data

The copilotLanguageModelWrapper orphan spans are the actual CAPI HTTP
handlers, not duplicates. They contain real token usage, cache read tokens,
resolved model names, and temperature — all missing from the consumer-side
extChatEndpoint spans due to VS Code LM API limitations.

Updated plan.md with:
- Side-by-side attribute comparison table
- Three fix approaches (context propagation, span suppression, enrichment)
- Recommendation: Option 1 (propagate trace context through IPC)
@zhichli zhichli removed the request for review from rebornix February 26, 2026 20:14
…spans

- Pass _otelTraceContext through modelOptions alongside _capturingTokenCorrelationId
- Inject IOTelService into CopilotLanguageModelWrapper
- Wrap makeRequest in startActiveSpan with parentTraceContext when available
- This creates a byok-provider bridge span that makes chatMLFetcher's chat span
  a child of the original invoke_agent trace, bringing real token usage data
  into the agent trace hierarchy
…ified working

Verified: 63-span trace with Azure BYOK (gpt-5) correctly shows:
- byok-provider bridge spans linking wrapper chat spans into agent trace
- Real token usage (in:21458 out:1730 cache:19072) visible on wrapper chat spans
- hasCtx:true on all extChatEndpoint spans confirming context capture
- Two subagent invoke_agent spans correctly nested under main agent
- Zero orphan copilotLanguageModelWrapper spans
…ext propagation

Add runWithTraceContext() to IOTelService — sets parent trace context
without creating a visible span. The wrapper's chat spans now appear
directly as children of invoke_agent, eliminating the noisy
byok-provider intermediary span.

Before: invoke_agent → byok-provider → chat (wrapper)
After:  invoke_agent → chat (wrapper)
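The runWithTraceContext idea — making a parent context active for the duration of a callback without emitting a span of its own — can be sketched synchronously; the real implementation would use the OTel context API, so this stack-based version is purely illustrative:

```typescript
// Sketch: activate a parent trace context around a callback without creating
// an intermediary span. Real code would use @opentelemetry/api context;
// this synchronous stack is an illustration of the idea only.
interface TraceContext { traceId: string; spanId: string; }

const contextStack: TraceContext[] = [];

function runWithTraceContextSketch<T>(ctx: TraceContext, fn: () => T): T {
	contextStack.push(ctx);
	try {
		return fn();
	} finally {
		contextStack.pop();   // always restore, even if fn throws
	}
}

function activeTraceContext(): TraceContext | undefined {
	return contextStack[contextStack.length - 1];
}
```

Spans started inside the callback pick up the pushed context as their parent, which is what collapses invoke_agent → byok-provider → chat into invoke_agent → chat.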

The extChatEndpoint no longer creates its own chat span. The wrapper's
chatMLFetcher span (via CopilotLanguageModelWrapper) is the single source
of truth with full token usage, cache data, and resolved model.

Before: invoke_agent → chat (empty, extChatEndpoint) + chat (rich, wrapper)
After:  invoke_agent → chat (rich, wrapper only)
…c, Gemini)

The previous commit removed the extChatEndpoint chat span, which was correct
for Azure/OpenAI BYOK (served by CopilotLanguageModelWrapper via chatMLFetcher).
But Anthropic and Gemini BYOK providers call their native SDKs directly,
bypassing CopilotLanguageModelWrapper — so they need the consumer-side span.

Now: always create a chat span in extChatEndpoint with basic metadata
(model, provider, response.id, finish_reasons). For wrapper-based providers,
the chatMLFetcher also creates a richer sibling span with token usage.
Only create the extChatEndpoint chat span for non-wrapper providers
(Anthropic, Gemini) that need it as their only span. Wrapper-based
providers (Azure, OpenAI, OpenRouter, Ollama, xAI) get a single rich
span from chatMLFetcher via CopilotLanguageModelWrapper.

Result: 1 chat span per LLM call for all provider types.
…vider

Move chat span creation into AnthropicLMProvider where actual API response
data (token usage, cache reads) is available. The span is linked to the
agent trace via runWithTraceContext and enriched with:
- gen_ai.usage.input_tokens / output_tokens
- gen_ai.usage.cache_read.input_tokens
- gen_ai.response.model / response.id / finish_reasons

Remove consumer-side extChatEndpoint span for all vendors (nonWrapperVendors
now empty) since both wrapper-based and Anthropic providers create their
own spans with full data.

Next: apply same pattern to Gemini provider.

- Add OTel chat span with full usage data to GeminiNativeBYOKLMProvider
- Remove all consumer-side span code from extChatEndpoint (dead code)
- Each provider now owns its chat span with real API response data:
  * CAPI: chatMLFetcher
  * OpenAI-compat BYOK: CopilotLanguageModelWrapper → chatMLFetcher
  * Anthropic: AnthropicLMProvider
  * Gemini: GeminiNativeBYOKLMProvider
- Fix Gemini test to pass IOTelService
Add to both providers:
- copilot_chat.request.max_prompt_tokens (model.maxInputTokens)
- server.address (api.anthropic.com / generativelanguage.googleapis.com)
- gen_ai.conversation.id (requestId)
- copilot_chat.time_to_first_token (result.ttft)

Now matches CAPI chat span attribute parity.
Extract hostname from urlOrRequestMetadata when it's a URL string
and set as server.address on the chat span. Works for both CAPI
and CopilotLanguageModelWrapper (Azure BYOK) paths.
…at spans

- gen_ai.request.max_tokens from model.maxOutputTokens
- gen_ai.output.messages (opt-in) from response text
- Closes remaining attribute gaps vs CAPI/Azure BYOK spans
When model responds with tool calls instead of text, the output_messages
attribute was empty. Now captures both text parts and tool call parts
in the output_messages, matching the OTel GenAI output messages schema.

Also: Azure BYOK invoke_agent zero tokens is a known upstream gap —
extChatEndpoint returns hardcoded usage:0 since VS Code LM API doesn't
expose actual usage from the provider side.
… BYOK spans

Same fix as CAPI — when model responds with tool calls, include them
in gen_ai.output.messages alongside text parts. All three provider
paths (CAPI, Anthropic, Gemini) now consistently capture both text
and tool call parts in output messages.
… spans

- gen_ai.input.messages (opt-in) captured from provider messages parameter
- gen_ai.agent.name set to AnthropicBYOK / GeminiBYOK for identification

Closes the last attribute gaps vs CAPI/Azure BYOK chat spans.
- Map enum role values to names (1→user, 2→assistant, 3→system)
- Extract text from LanguageModelTextPart content arrays instead of
  showing '[complex]' for all messages
- Use OTel GenAI input messages schema with role + parts format
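The role mapping described above, sketched as a small helper; the numeric values follow the commit message's 1→user, 2→assistant, 3→system mapping, and the function name and fallback are illustrative:

```typescript
// Sketch: map VS Code LM API numeric role values to OTel GenAI role names,
// per the 1→user, 2→assistant, 3→system mapping noted above.
// `roleName` and its unknown-role fallback are assumptions for illustration.
function roleName(role: number): string {
	switch (role) {
		case 1: return 'user';
		case 2: return 'assistant';
		case 3: return 'system';
		default: return `unknown(${role})`;
	}
}
```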

Coverage matrix showing:
- Anthropic/Gemini BYOK missing: operation.duration, token.usage,
  time_to_first_token metrics, and inference.details event
- CAPI and Azure BYOK (via wrapper) fully covered
- Tool/agent/session metrics covered across all providers
- 4 tasks (M1-M4) to close the gap
… providers

Both providers now record:
- gen_ai.client.operation.duration histogram
- gen_ai.client.token.usage histograms (input + output)
- copilot_chat.time_to_first_token histogram
- gen_ai.client.inference.operation.details log event

All metrics/events now have full parity across CAPI, Azure BYOK,
Anthropic BYOK, and Gemini BYOK.
… v2)

The OTel SDK v2 changed the LoggerProvider constructor option from
'logRecordProcessors' to 'processors'. The old key was silently
ignored, causing all log records to be dropped.

This is why logs never appeared in Loki despite traces working fine.
