feat(otel): Add OpenTelemetry GenAI instrumentation to Copilot Chat#3917

Draft
zhichli wants to merge 47 commits into main from zhichli/otel

Conversation


@zhichli zhichli commented Feb 21, 2026

Summary

Adds opt-in OpenTelemetry instrumentation to Copilot Chat following the OTel GenAI semantic conventions. Emits traces, metrics, and events for LLM calls, tool executions, agent orchestration, and embeddings. Existing telemetry (ITelemetryService) is unchanged.

What's included

Phase 0 — Foundation

  • IOTelService interface + OTelServiceImpl (Node) with DI registration
  • NoopOtelService for disabled/test/web paths
  • Config resolver with layered env precedence (COPILOT_OTEL_* > OTEL_* > VS Code settings > defaults)
  • GenAI semantic convention constants (genAiAttributes.ts)
  • Message formatters for OTel GenAI JSON schema
  • Metric instruments (gen_ai.client.token.usage, gen_ai.client.operation.duration, copilot_chat.*)
  • Event emitters (gen_ai.client.inference.operation.details, session/tool/agent events)
  • File exporters (JSON-lines fallback for CI/offline)
  • OTLP HTTP + gRPC + console exporter support
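The layered precedence the config resolver implements can be sketched as follows; the helper name and parameters here are illustrative, not the PR's actual API:

```typescript
// Sketch of the layered precedence described above:
// COPILOT_OTEL_* > OTEL_* > VS Code settings > defaults.
// `resolveSetting` and its parameter shapes are assumptions for illustration.
type Env = Record<string, string | undefined>;

function resolveSetting(
	env: Env,
	copilotKey: string,                  // e.g. "COPILOT_OTEL_ENABLED"
	otelKey: string,                     // e.g. the generic "OTEL_*" equivalent
	vscodeSetting: string | undefined,   // value from VS Code settings, if set
	defaultValue: string,
): string {
	// Nullish coalescing walks the layers in priority order.
	return env[copilotKey] ?? env[otelKey] ?? vscodeSetting ?? defaultValue;
}
```

Each layer only applies when every higher-priority layer is unset, so a stray `OTEL_*` variable in the environment cannot override an explicit `COPILOT_OTEL_*` one.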

Phase 1 — Wiring into chat extension

  • Inference spans (chat {model}) in chatMLFetcher.ts — model, tokens, TTFT, finish reasons
  • Tool spans (execute_tool {name}) in toolsService.ts — tool name/type/id, args/results (opt-in)
  • Agent spans (invoke_agent {participant}) in toolCallingLoop.ts — parent span for full hierarchy
  • Embeddings spans (embeddings {model}) in remoteEmbeddingsComputer.ts
  • Content capture — full messages, responses, system instructions, tool definitions (opt-in via COPILOT_OTEL_CAPTURE_CONTENT=true)
  • Metrics recording at all instrumentation points
  • Diagnostic exporter logs first successful export for easy verification

Activation

Off by default. Enable via env vars:

COPILOT_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Respects telemetry.telemetryLevel — globally disabled when telemetry is off.

Span hierarchy (Agent mode)

invoke_agent copilot                    [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  ├── execute_tool readFile             [INTERNAL]
  ├── execute_tool runCommand           [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  └── ...

Testing

  • 63 unit tests across 6 test files covering config, formatters, events, metrics, file exporters, noop service
  • Verified E2E with local Jaeger (OTLP HTTP on :4318)

Risk

  • Bundle size: OTel deps added (~200KB budget)
  • Zero overhead when disabled (noop providers)
  • No changes to existing ITelemetryService code paths
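The "zero overhead when disabled" claim typically rests on a noop implementation that hands out shared, do-nothing objects. A minimal sketch (the real IOTelService surface is larger; these names are assumptions):

```typescript
// Illustrative sketch of the noop pattern behind "zero overhead when disabled".
// SpanLike and NoopOTelServiceSketch are assumptions, not the PR's real types.
interface SpanLike {
	setAttribute(key: string, value: unknown): void;
	end(): void;
}

// A single shared span object: disabled paths allocate nothing per call.
const NOOP_SPAN: SpanLike = { setAttribute() { }, end() { } };

class NoopOTelServiceSketch {
	startSpan(_name: string): SpanLike {
		return NOOP_SPAN;
	}
	recordTokenUsage(_input: number, _output: number): void {
		// intentionally empty
	}
}
```

Because callers always receive a valid span handle, instrumentation sites never need `if (enabled)` branches.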

@zhichli zhichli requested a review from rebornix February 21, 2026 22:28
@zhichli zhichli force-pushed the zhichli/otel branch 2 times, most recently from 43dcf31 to 973e678 on February 25, 2026 at 23:15
Phase 0 complete:
- spec.md: Full spec with decisions, GenAI semconv, dual-write, eval signals,
  lessons from Gemini CLI + Claude Code
- plan.md: E2E demo plan (chat ext + eval repo + Azure backend)
- src/platform/otel/: IOTelService, config, attributes, metrics, events,
  message formatters, NodeOTelService, file exporters
- package.json: Added @opentelemetry/* dependencies

OTel opt-in behind OTEL_EXPORTER_OTLP_ENDPOINT env var.
- Register IOTelService in DI (NodeOTelService when enabled, NoopOTelService when disabled)
- Add OTelContrib lifecycle contribution for OTel init/shutdown
- Add `chat {model}` inference span in ChatMLFetcherImpl._doFetchAndStreamChat()
- Add `execute_tool {name}` span in ToolsService.invokeTool()
- Add `invoke_agent {participant}` parent span in ToolCallingLoop.run()
- Record gen_ai.client.operation.duration, tool call count/duration, agent metrics
- Thread IOTelService through all ToolCallingLoop subclasses
- Update test files with NoopOTelService
- Zero overhead when OTel is disabled (noop providers, no dynamic imports)
- Add `embeddings {model}` span in RemoteEmbeddingsComputer.computeEmbeddings()
- Add VS Code settings under github.copilot.chat.otel.* in package.json
  (enabled, exporterType, otlpEndpoint, captureContent, outfile)
- Wire VS Code settings into resolveOTelConfig in services.ts
- Add unit tests for:
  - resolveOTelConfig: env precedence, kill switch, all config paths (16 tests)
  - NoopOTelService: zero-overhead noop behavior (8 tests)
  - GenAiMetrics: metric recording with correct attributes (7 tests)
…porters

- messageFormatters: 18 tests covering toInputMessages, toOutputMessages,
  toSystemInstructions, toToolDefinitions (edge cases, empty inputs, invalid JSON)
- genAiEvents: 9 tests covering all 4 event emitters, content capture on/off
- fileExporters: 5 tests covering write/read round-trip for span, log, metric
  exporters plus aggregation temporality

Total OTel test suite: 63 tests across 6 files

Add gen_ai.client.token.usage (input/output) and copilot_chat.time_to_first_token
histogram metrics at the fetchMany success path where token counts and TTFT
are available from the processSuccessfulResponse result.
… token usage

Wire emitInferenceDetailsEvent into fetchMany success path where full
token usage (prompt_tokens, completion_tokens), resolved model, request ID,
and finish reasons are available from processSuccessfulResponse.

This follows the OTel GenAI spec pattern:
- Spans: timing + hierarchy + error tracking
- Events: full request/response details including token counts

The data mirrors what RequestLogger captures for chat-export-logs.json.

Per the OTel GenAI agent spans spec, add gen_ai.usage.input_tokens and
gen_ai.usage.output_tokens as Recommended attributes on the invoke_agent span.

Tokens are accumulated across all LLM turns by listening to onDidReceiveResponse
events during the agent loop, then set on the span before it ends.

Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
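The accumulation step above can be sketched as follows; the listener wiring and field names are illustrative, not the PR's actual code:

```typescript
// Sketch: accumulate gen_ai.usage.* across all LLM turns of an agent loop,
// then emit the totals as span attributes before invoke_agent ends.
// TurnUsage and the class name are assumptions for illustration.
interface TurnUsage { inputTokens: number; outputTokens: number; }

class AgentTokenAccumulator {
	private input = 0;
	private output = 0;

	// Called once per onDidReceiveResponse event during the loop.
	onDidReceiveResponse(usage: TurnUsage): void {
		this.input += usage.inputTokens;
		this.output += usage.outputTokens;
	}

	// Attributes to set on the invoke_agent span just before span.end().
	finalAttributes(): Record<string, number> {
		return {
			'gen_ai.usage.input_tokens': this.input,
			'gen_ai.usage.output_tokens': this.output,
		};
	}
}
```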

Defer the `chat {model}` span completion from _doFetchAndStreamChat to
fetchMany where processSuccessfulResponse has extracted token counts.

The chat span now carries:
- gen_ai.usage.input_tokens (prompt_tokens)
- gen_ai.usage.output_tokens (completion_tokens)
- gen_ai.response.model (resolved model)

The span handle is returned from _doFetchAndStreamChat via the result
object so fetchMany can set attributes and end it after tokens are known.

This matches the chat-export-logs.json pattern where each request entry
carries full usage data alongside the response.
…gs/results)

- Chat spans: add copilot.debug_name attribute for identifying orphan spans
- Chat spans: capture gen_ai.input.messages and gen_ai.output.messages when captureContent enabled
- Tool spans: capture gen_ai.tool.call.arguments and gen_ai.tool.call.result when captureContent enabled
- Extension chat endpoint: capture input/output messages when captureContent enabled
- Add CopilotAttr.DEBUG_NAME constant
- Change gen_ai.provider.name from 'openai' to 'github' for CAPI models
- Rename CopilotAttr to CopilotChatAttr, prefix values with copilot_chat.*
- Add GITHUB to GenAiProviderName enum
- Replace copilot.debug_name with gen_ai.agent.name on chat spans
- Add gen_ai.request.temperature, gen_ai.request.top_p to chat spans
- Add gen_ai.response.id, gen_ai.response.finish_reasons on success
- Add gen_ai.usage.cache_read.input_tokens from cached_tokens
- Add copilot_chat.request.max_prompt_tokens and copilot_chat.time_to_first_token
- Add gen_ai.tool.description to execute_tool spans
- Fix gen_ai.tool.call.id to read chatStreamToolCallId (was reading nonexistent prop)
- Fix tool result capture to handle PromptTsxPart and DataPart (not just TextPart)
- Add gen_ai.input.messages and gen_ai.output.messages to invoke_agent span (opt-in)
- Move gen_ai.tool.definitions from chat spans to invoke_agent span (opt-in)
- Add gen_ai.system_instructions to chat spans (opt-in)
- Fix error.type raw strings to use StdAttr.ERROR_TYPE constant
- Centralize hardcoded copilot.turn_count and copilot.endpoint_type into CopilotChatAttr
- Add COPILOT_OTEL_CAPTURE_CONTENT=true to launch.json for testing
- Document span hierarchy fixes needed in plan.md
…ation

- Add TraceContext type and getActiveTraceContext() to IOTelService
- Add storeTraceContext/getStoredTraceContext for cross-boundary propagation
- Add parentTraceContext option to SpanOptions for explicit parent linking
- Implement in NodeOTelService using OTel remote span context
- Capture trace context when execute_tool runSubagent fires (keyed by toolCallId)
- Restore parent context in subagent invoke_agent span (via subAgentInvocationId)
- Auto-cleanup stored contexts after 5 minutes to prevent memory leaks
- Update test mocks with new IOTelService methods
- Update plan.md with investigation findings
The previous implementation stored trace context keyed by chatStreamToolCallId
(model-assigned tool call ID), but looked it up by subAgentInvocationId
(VS Code internal invocation.callId UUID). These are different IDs that don't
match across the IPC boundary.

Fix: key by chatRequestId on store side (available on invocation options),
and look up by parentRequestId on subagent side (same value, available on
ChatRequest). Both reference the parent agent's request ID.

Verified: 21-span trace with subagent correctly nested under parent agent.
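The store/lookup pattern this fix relies on, keyed by the shared request ID, with single-use retrieval and the time-based cleanup mentioned earlier, can be sketched as follows (the map-based implementation and names are assumptions; only the 5-minute TTL and single-use semantics come from the description above):

```typescript
// Sketch: store trace context keyed by chatRequestId on the tool-call side;
// retrieve once by parentRequestId (the same value) on the subagent side.
// TraceContext and TraceContextStore are illustrative names.
interface TraceContext { traceId: string; spanId: string; }

class TraceContextStore {
	private readonly entries = new Map<string, { ctx: TraceContext; storedAt: number }>();

	constructor(private readonly ttlMs = 5 * 60 * 1000) { }

	store(requestId: string, ctx: TraceContext, now = Date.now()): void {
		this.entries.set(requestId, { ctx, storedAt: now });
	}

	// Single-use: a lookup removes the entry; stale entries return undefined.
	retrieve(requestId: string, now = Date.now()): TraceContext | undefined {
		const entry = this.entries.get(requestId);
		if (!entry) { return undefined; }
		this.entries.delete(requestId);
		return now - entry.storedAt > this.ttlMs ? undefined : entry.ctx;
	}
}
```

The bug described above was purely a key mismatch: both sides must derive the key from the same ID, which survives the IPC boundary.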
…YOK chat

- Set gen_ai.request.model on invoke_agent span from endpoint
- Track gen_ai.response.model from last LLM response resolvedModel
- Add copilot_chat.request.max_prompt_tokens to BYOK chat spans
- Document upstream gaps in plan.md (BYOK token usage, programmatic tool IDs)

Tests verify:
- storeTraceContext/getStoredTraceContext round-trip and single-use semantics
- getActiveTraceContext returns context inside startActiveSpan
- parentTraceContext makes child span inherit traceId from parent
- Independent spans get different traceIds without parentTraceContext
- Full subagent flow: store context in tool call, retrieve in subagent
…rphan spans

- Set gen_ai.response.finish_reasons on BYOK chat success
- Set copilot_chat.time_to_first_token on BYOK chat success
- Document Gap 4: duplicate orphan spans from CopilotLanguageModelWrapper
- Identify all orphan span categories (title, progressMessages, promptCategorization, wrapper)
…sage data

The copilotLanguageModelWrapper orphan spans are the actual CAPI HTTP
handlers, not duplicates. They contain real token usage, cache read tokens,
resolved model names, and temperature — all missing from the consumer-side
extChatEndpoint spans due to VS Code LM API limitations.

Updated plan.md with:
- Side-by-side attribute comparison table
- Three fix approaches (context propagation, span suppression, enrichment)
- Recommendation: Option 1 (propagate trace context through IPC)
@zhichli zhichli removed the request for review from rebornix February 26, 2026 20:14
…spans

- Pass _otelTraceContext through modelOptions alongside _capturingTokenCorrelationId
- Inject IOTelService into CopilotLanguageModelWrapper
- Wrap makeRequest in startActiveSpan with parentTraceContext when available
- This creates a byok-provider bridge span that makes chatMLFetcher's chat span
  a child of the original invoke_agent trace, bringing real token usage data
  into the agent trace hierarchy
…ified working

Verified: 63-span trace with Azure BYOK (gpt-5) correctly shows:
- byok-provider bridge spans linking wrapper chat spans into agent trace
- Real token usage (in:21458 out:1730 cache:19072) visible on wrapper chat spans
- hasCtx:true on all extChatEndpoint spans confirming context capture
- Two subagent invoke_agent spans correctly nested under main agent
- Zero orphan copilotLanguageModelWrapper spans
…ext propagation

Add runWithTraceContext() to IOTelService — sets parent trace context
without creating a visible span. The wrapper's chat spans now appear
directly as children of invoke_agent, eliminating the noisy
byok-provider intermediary span.

Before: invoke_agent → byok-provider → chat (wrapper)
After:  invoke_agent → chat (wrapper)
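The runWithTraceContext idea — making a parent context active for the duration of a callback without emitting a span of its own — can be sketched synchronously; the real implementation would use the OTel context API, so this stack-based version is purely illustrative:

```typescript
// Sketch: activate a parent trace context around a callback without creating
// an intermediary span. Real code would use @opentelemetry/api context;
// this synchronous stack is an illustration of the idea only.
interface TraceContext { traceId: string; spanId: string; }

const contextStack: TraceContext[] = [];

function runWithTraceContextSketch<T>(ctx: TraceContext, fn: () => T): T {
	contextStack.push(ctx);
	try {
		return fn();
	} finally {
		contextStack.pop();   // always restore, even if fn throws
	}
}

function activeTraceContext(): TraceContext | undefined {
	return contextStack[contextStack.length - 1];
}
```

Spans started inside the callback pick up the pushed context as their parent, which is what collapses invoke_agent → byok-provider → chat into invoke_agent → chat.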

The extChatEndpoint no longer creates its own chat span. The wrapper's
chatMLFetcher span (via CopilotLanguageModelWrapper) is the single source
of truth with full token usage, cache data, and resolved model.

Before: invoke_agent → chat (empty, extChatEndpoint) + chat (rich, wrapper)
After:  invoke_agent → chat (rich, wrapper only)
…c, Gemini)

The previous commit removed the extChatEndpoint chat span, which was correct
for Azure/OpenAI BYOK (served by CopilotLanguageModelWrapper via chatMLFetcher).
But Anthropic and Gemini BYOK providers call their native SDKs directly,
bypassing CopilotLanguageModelWrapper — so they need the consumer-side span.

Now: always create a chat span in extChatEndpoint with basic metadata
(model, provider, response.id, finish_reasons). For wrapper-based providers,
the chatMLFetcher also creates a richer sibling span with token usage.
Only create the extChatEndpoint chat span for non-wrapper providers
(Anthropic, Gemini) that need it as their only span. Wrapper-based
providers (Azure, OpenAI, OpenRouter, Ollama, xAI) get a single rich
span from chatMLFetcher via CopilotLanguageModelWrapper.

Result: 1 chat span per LLM call for all provider types.
…vider

Move chat span creation into AnthropicLMProvider where actual API response
data (token usage, cache reads) is available. The span is linked to the
agent trace via runWithTraceContext and enriched with:
- gen_ai.usage.input_tokens / output_tokens
- gen_ai.usage.cache_read.input_tokens
- gen_ai.response.model / response.id / finish_reasons

Remove consumer-side extChatEndpoint span for all vendors (nonWrapperVendors
now empty) since both wrapper-based and Anthropic providers create their
own spans with full data.

Next: apply same pattern to Gemini provider.

- Add OTel chat span with full usage data to GeminiNativeBYOKLMProvider
- Remove all consumer-side span code from extChatEndpoint (dead code)
- Each provider now owns its chat span with real API response data:
  * CAPI: chatMLFetcher
  * OpenAI-compat BYOK: CopilotLanguageModelWrapper → chatMLFetcher
  * Anthropic: AnthropicLMProvider
  * Gemini: GeminiNativeBYOKLMProvider
- Fix Gemini test to pass IOTelService
Add to both providers:
- copilot_chat.request.max_prompt_tokens (model.maxInputTokens)
- server.address (api.anthropic.com / generativelanguage.googleapis.com)
- gen_ai.conversation.id (requestId)
- copilot_chat.time_to_first_token (result.ttft)

Now matches CAPI chat span attribute parity.
Extract hostname from urlOrRequestMetadata when it's a URL string
and set as server.address on the chat span. Works for both CAPI
and CopilotLanguageModelWrapper (Azure BYOK) paths.
…at spans

- gen_ai.request.max_tokens from model.maxOutputTokens
- gen_ai.output.messages (opt-in) from response text
- Closes remaining attribute gaps vs CAPI/Azure BYOK spans
When model responds with tool calls instead of text, the output_messages
attribute was empty. Now captures both text parts and tool call parts
in the output_messages, matching the OTel GenAI output messages schema.

Also: Azure BYOK invoke_agent zero tokens is a known upstream gap —
extChatEndpoint returns hardcoded usage:0 since VS Code LM API doesn't
expose actual usage from the provider side.
… BYOK spans

Same fix as CAPI — when model responds with tool calls, include them
in gen_ai.output.messages alongside text parts. All three provider
paths (CAPI, Anthropic, Gemini) now consistently capture both text
and tool call parts in output messages.
… spans

- gen_ai.input.messages (opt-in) captured from provider messages parameter
- gen_ai.agent.name set to AnthropicBYOK / GeminiBYOK for identification

Closes the last attribute gaps vs CAPI/Azure BYOK chat spans.
- Map enum role values to names (1→user, 2→assistant, 3→system)
- Extract text from LanguageModelTextPart content arrays instead of
  showing '[complex]' for all messages
- Use OTel GenAI input messages schema with role + parts format
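The role mapping described above, sketched as a small helper; the numeric values follow the commit message's 1→user, 2→assistant, 3→system mapping, and the function name and fallback are illustrative:

```typescript
// Sketch: map VS Code LM API numeric role values to OTel GenAI role names,
// per the 1→user, 2→assistant, 3→system mapping noted above.
// `roleName` and its unknown-role fallback are assumptions for illustration.
function roleName(role: number): string {
	switch (role) {
		case 1: return 'user';
		case 2: return 'assistant';
		case 3: return 'system';
		default: return `unknown(${role})`;
	}
}
```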

Coverage matrix showing:
- Anthropic/Gemini BYOK missing: operation.duration, token.usage,
  time_to_first_token metrics, and inference.details event
- CAPI and Azure BYOK (via wrapper) fully covered
- Tool/agent/session metrics covered across all providers
- 4 tasks (M1-M4) to close the gap
… providers

Both providers now record:
- gen_ai.client.operation.duration histogram
- gen_ai.client.token.usage histograms (input + output)
- copilot_chat.time_to_first_token histogram
- gen_ai.client.inference.operation.details log event

All metrics/events now have full parity across CAPI, Azure BYOK,
Anthropic BYOK, and Gemini BYOK.
… v2)

The OTel SDK v2 changed the LoggerProvider constructor option from
'logRecordProcessors' to 'processors'. The old key was silently
ignored, causing all log records to be dropped.

This is why logs never appeared in Loki despite traces working fine.
