
Derive portable artifact roots from versionControlProvenance at emit-finalize #2973

Merged

michaelcfanning merged 3 commits into dev from ai-emit-vcp-rebase
Jun 6, 2026

michaelcfanning commented Jun 5, 2026

Summary

Replaces the caller-supplied --srcroot rewrite at emit-finalize with a VCP-driven rebasing visitor. At finalize, EmitFinalizeRebaseVisitor deconstructs absolute local file paths into relative URIs plus portable, per-repository uriBaseIds derived from run.versionControlProvenance. The shipped SARIF carries no machine-specific path.

Portable-root derivation supports github.com and dev.azure.com in this release:

  • github.com → clickable commit permalink https://github.com/<owner>/<repo>/blob/<revisionId>/.
  • dev.azure.com → commit-less repository root https://dev.azure.com/<org>/<project>/_git/<repo>/. ADO per-file web URLs are query-based (?path=...&version=GC<sha>) and cannot serve as a SARIF uriBaseId prefix (RFC 3986 relative resolution drops the query), so commit pinning rides on versionControlProvenance.revisionId, which finalize preserves and GHAzDO ingestion reads.
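A minimal Python sketch of the two derivation shapes follows. It is illustrative only, not the SDK's C# EmitFinalizeRebaseVisitor: `derive_portable_root` and `normalize_repo_name` are hypothetical names, and reading "dotted" as the `.` and `..` segments is an assumption.

```python
from urllib.parse import urlsplit


def normalize_repo_name(segment: str) -> str:
    """Shared guard (sketch): strip one trailing .git; reject empty, dot, or multi-segment names."""
    name = segment[:-4] if segment.endswith(".git") else segment
    if name in ("", ".", "..") or "/" in name:
        raise ValueError("repository name must be a single, non-dot path segment")
    return name


def derive_portable_root(repository_uri: str, revision_id: str) -> str:
    """Map a provenance repositoryUri + revisionId to a portable uriBaseId root (hypothetical helper)."""
    parts = urlsplit(repository_uri)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]

    if host == "github.com":
        # https://github.com/<owner>/<repo> -> clickable commit permalink ending in a slash.
        if len(segments) != 2:
            raise ValueError("expected https://github.com/<owner>/<repo>")
        owner, repo = segments[0], normalize_repo_name(segments[1])
        return f"https://github.com/{owner}/{repo}/blob/{revision_id}/"

    if host == "dev.azure.com":
        # https://dev.azure.com/<org>/<project>/_git/<repo> -> commit-less repository root.
        if len(segments) != 4 or segments[2] != "_git":
            raise ValueError("expected https://dev.azure.com/<org>/<project>/_git/<repo>")
        org, project, _, repo = segments
        return f"https://dev.azure.com/{org}/{project}/_git/{normalize_repo_name(repo)}/"

    raise ValueError(f"portable roots for host '{host}' are a planned follow-up")
```

Because both roots end in a slash, a relative artifact URI such as `src/a.cs` resolves beneath them rather than replacing the last segment.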

Follow-up to #2971 (get-schema + emit-run rename + receipt-time SRCROOT-exists check). Fast-follow scope tracked in #2972.

Contract (hard-enforced at finalize; FAILURE before any file is written)

  • Each run declares ≥1 versionControlProvenance entry whose mappedTo carries only a uriBaseId, binding a declared originalUriBaseIds root.
  • repositoryUri must be an absolute http(s) URI at the host's default port; revisionId non-empty.
  • A repositoryUri carrying credentials (user:password@), a query, or a fragment fails closed, and all diagnostics are sanitized so a secret can never surface in an error message.
  • Host derivation: github.com (<owner>/<repo>) and dev.azure.com (<org>/<project>/_git/<repo>); a .git suffix is stripped and empty / dotted / multi-segment repo names are rejected through a single shared guard. Other hosts fail with a clear, planned-follow-up message.
  • Base-id minting: one repo → bare SRCROOT; multiple → SRCROOT_<REPO-LEAF>, with an ordinal suffix (_2, _3) on collision. Mixed github + ADO repos in one run each get a distinct base. Artifact locations are attributed to the owning repo by longest-prefix match on the mapped local root.
  • An absolute local path that no declared root resolves fails closed (warn-and-ship would leak the machine path) — confirmed with @michaelcfanning.
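The fail-closed URI checks and the sanitized diagnostics can be sketched in Python. `validate_repository_uri` and `sanitize` are hypothetical; treating an explicit `:443` (or `:80`) as the default port is an assumption, and the sanitizer deliberately echoes nothing for a string it cannot parse as http(s), since that is exactly where a secret could hide.

```python
from typing import List
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def sanitize(uri: str) -> str:
    """Render a URI for diagnostics without userinfo, port, query, or fragment."""
    try:
        parts = urlsplit(uri)
    except ValueError:
        return "<unparseable URI>"
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        # Don't echo anything we can't positively strip credentials from.
        return "<non-http(s) URI>"
    return f"{parts.scheme}://{parts.hostname}{parts.path}"


def validate_repository_uri(uri: str) -> List[str]:
    """Return sanitized diagnostics; an empty list means the URI satisfies the sketched contract."""
    shown = sanitize(uri)
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return [f"repositoryUri is not a valid URI: {shown}"]
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return [f"repositoryUri must be an absolute http(s) URI: {shown}"]
    errors = []
    if parts.username is not None or parts.password is not None:
        errors.append(f"repositoryUri must not carry credentials: {shown}")
    if parts.query:
        errors.append(f"repositoryUri must not carry a query: {shown}")
    if parts.fragment:
        errors.append(f"repositoryUri must not carry a fragment: {shown}")
    if port is not None and port != DEFAULT_PORTS[parts.scheme]:
        errors.append(f"repositoryUri must use the default port: {shown}")
    return errors
```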

emit-run stamping

TryStampVcp sets versionControlProvenance[0].mappedTo.uriBaseId = "SRCROOT" when (a) the run declares originalUriBaseIds["SRCROOT"] and (b) the entry has no caller-supplied mappedTo. Multi-entry provenance is left untouched; a caller-authored mappedTo is never overridden. emit-run stays host-agnostic.
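The stamping rule reduces to three guards. A Python sketch over a dict-shaped run (hypothetical `try_stamp_vcp`, not the C# method):

```python
def try_stamp_vcp(run: dict) -> bool:
    """Bind a sole versionControlProvenance entry to SRCROOT; return True if a stamp was applied."""
    bases = run.get("originalUriBaseIds") or {}
    provenance = run.get("versionControlProvenance") or []
    if "SRCROOT" not in bases:
        return False  # (a) the run must declare the SRCROOT root
    if len(provenance) != 1:
        return False  # multi-entry (or absent) provenance is left untouched
    entry = provenance[0]
    if "mappedTo" in entry:
        return False  # (b) never override a caller-authored mappedTo
    entry["mappedTo"] = {"uriBaseId": "SRCROOT"}
    return True
```

Nothing here inspects the repository host, which matches the statement that emit-run stays host-agnostic.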

Generator reconstructs provenance (no hand-patching)

CweGenerateSample.ps1 previously resolved repositoryUri live from the origin remote but hardcoded revisionId/branch to a frozen github sha. Copying the taxonomy into another repo (e.g. Azure DevOps) and running it produced an incoherent versionControlProvenance — a repositoryUri for repo X paired with a commit that exists only in github sarif-sdk — that had to be hand-patched before publishing.

The script now resolves the whole triple atomically, never per-field:

  • Default → full live reconstruction from the git working tree (origin remote, HEAD, abbrev branch), with actionable failures for a missing remote, an unborn HEAD, or a detached HEAD.
  • -Deterministic → the canonical fixture pin (the v4.5.0 commit 84f83c81…), so the checked-in fixtures regenerate byte-identically on any machine, commit, or fork. The byte-gate test passes this.
  • -RevisionId / -Branch layer onto live mode (explicit anchoring / detached HEAD). The -GHAzDO variant stamps BUILD_REPOSITORY_URI / BUILD_SOURCEVERSION / BUILD_SOURCEBRANCH from the same resolved values so AdoPipelineContext's field-by-field agreement check passes.
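The "never per-field" rule can be sketched as a resolver that returns the whole triple from exactly one source. This Python version is hypothetical: the pin values are placeholders because the PR abbreviates the real commit, `None` stands in for a git lookup failing (for example, `git rev-parse --abbrev-ref HEAD` returning `HEAD` on a detached checkout), and rejecting overrides under -Deterministic is an assumption.

```python
from typing import NamedTuple, Optional


class Provenance(NamedTuple):
    repository_uri: str
    revision_id: str
    branch: str


# Placeholder pin: illustrative values only, not the real v4.5.0 commit or branch.
CANONICAL_PIN = Provenance("https://github.com/microsoft/sarif-sdk", "<v4.5.0-commit>", "<pinned-branch>")


def resolve_provenance(
    deterministic: bool,
    origin: Optional[str],
    head: Optional[str],
    branch_name: Optional[str],
    revision_override: Optional[str] = None,
    branch_override: Optional[str] = None,
) -> Provenance:
    """Return the whole (repositoryUri, revisionId, branch) triple from exactly one source."""
    if deterministic:
        if revision_override or branch_override:
            # Assumption: overrides only layer onto live mode, so refuse to mix them into the pin.
            raise ValueError("-RevisionId/-Branch apply to live mode, not -Deterministic")
        return CANONICAL_PIN

    # Live mode: every field comes from the working tree or an explicit override, never the pin.
    if not origin:
        raise RuntimeError("no 'origin' remote; add one or pass -Deterministic")
    revision = revision_override or head
    if not revision:
        raise RuntimeError("HEAD is unborn; commit first or pass -Deterministic")
    branch = branch_override or branch_name
    if not branch:
        raise RuntimeError("HEAD is detached; pass -Branch or -Deterministic")
    return Provenance(origin, revision, branch)
```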

A new live-coherence [Fact] proves default mode stamps this repo's real HEAD (not the pin) and restores the fixture afterward, so the working tree stays clean.

Genchi genbutsu (end-to-end proof)

Copied the Taxonomies directory into a live Azure DevOps repository and regenerated with no patch:

  • SRCROOT anchored at the ADO repository root; revisionId auto-resolved to the ADO HEAD; branch refs/heads/main.
  • Validation 0 errors / 0 warnings / 0 notes for both variants; 0 machine-path leaks.
  • GHAzDO ingestion of the -GHAzDO variant returned HTTP 200.

Validation

  • Full Test.UnitTests.Sarif.Multitool.Library suite: 319 passed, 1 skipped (incl. 8 new ADO / credential / mixed-host visitor [Fact]s).
  • Cwe byte-identical fixture gate + live-coherence test: green.
  • ReleaseHistory style test: green.
  • Release build with EnforceCodeStyleInBuild (IDE0005-as-error, the CI gate): clean.

Deferred to #2972

Receipt-time VCP/mappedTo validation at emit-run; non-cloud ADO shapes (<org>.visualstudio.com, collection/virtual-directory) and GHE; submodule / multi-repo collision-suffix determinism; SSH→HTTPS normalization; test-design smell review.

michaelcfanning and others added 3 commits June 5, 2026 07:44
…finalize

Replace the caller-supplied `--srcroot` rewrite with an EmitFinalizeRebaseVisitor
that deconstructs absolute local file paths into relative URIs plus portable,
per-repository uriBaseIds anchored at GitHub blob permalinks derived from
run.versionControlProvenance. The shipped SARIF carries no machine-specific path.

emit-finalize now hard-requires each run to declare at least one
versionControlProvenance entry whose mappedTo.uriBaseId (and only a uriBaseId)
binds a declared originalUriBaseIds root with an absolute http(s) repositoryUri
and a non-empty revisionId. A single repository collapses to the bare SRCROOT
base; multiple repositories each receive SRCROOT_<REPO-LEAF>, disambiguated by an
ordinal suffix on collision. Portable derivation is github.com-only in this
release (default port; owner/repo path); other hosts fail with a clear, planned
follow-up message. A local path that no declared root resolves fails finalize
rather than shipping a machine-specific path. Rebasing runs after enrichment
reads sources from the local file:// bases and before serialization; on failure
the verb writes each error to stderr and returns FAILURE before any file is
written.
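Base-id minting and longest-prefix attribution, sketched in Python under stated assumptions: POSIX-style local roots, repo leaves upper-cased verbatim, and relative URIs percent-encoded with `urllib.parse.quote`. `mint_base_ids` and `rebase` are hypothetical names, not the visitor's API.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote


def mint_base_ids(repo_leaves: List[str]) -> List[str]:
    """One repository -> SRCROOT; several -> SRCROOT_<LEAF>, with _2, _3 and so on for collisions."""
    if len(repo_leaves) == 1:
        return ["SRCROOT"]
    used, ids = set(), []
    for leaf in repo_leaves:
        base = "SRCROOT_" + leaf.upper()  # assumption: leaf is upper-cased verbatim
        candidate, n = base, 1
        while candidate in used:  # loop so a literal leaf like "a_2" cannot collide either
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        ids.append(candidate)
    return ids


def rebase(abs_path: str, roots: Dict[str, str]) -> Tuple[str, str]:
    """Return (relative URI, uriBaseId) using the longest declared local root containing abs_path."""
    best_id, best_prefix = None, ""
    for base_id, root in roots.items():
        prefix = root.rstrip("/") + "/"  # compare whole path segments only
        if abs_path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_id, best_prefix = base_id, prefix
    if best_id is None:
        # Fail closed: shipping the absolute path would leak the machine layout.
        raise ValueError("no declared root resolves this local path")
    return quote(abs_path[len(best_prefix):]), best_id
```

The segment-aware prefix keeps a root of `/work/outer` from claiming `/work/outerx/a.cs`, and a nested root wins over its parent because its prefix is longer.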

emit-run's auto-stamping now binds the single source-repo entry to the source
root: it sets versionControlProvenance[0].mappedTo.uriBaseId = "SRCROOT" when the
run declares originalUriBaseIds["SRCROOT"] and the caller supplied no mappedTo,
staying hands-off for multi-entry provenance and never overriding a caller's
binding. emit-run remains host-agnostic; only finalize is github-only.

Rewrite CweGenerateSample.ps1 to drop --srcroot and anchor the fixture's
revisionId at a real, immutable sarif-sdk commit (the v4.5.0 release tag) so the
derived permalink resolves to a real blob; regenerate both Cwe sample fixtures.

Fast-follow tracked in #2972: receipt-time validation, non-github hosts,
multi-repo/submodule fixtures, and a test-design smell review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ruct sample provenance

emit-finalize now derives a portable per-repository root for dev.azure.com
repositories alongside github.com. The github host keeps its clickable
/blob/<revisionId>/ permalink; the Azure DevOps host resolves to the
commit-less dev.azure.com/<org>/<project>/_git/<repo>/ repository root, because
ADO per-file web URLs are query-based (?path=...&version=GC<sha>) and cannot
serve as a SARIF uriBaseId prefix (RFC 3986 relative resolution drops the
query). Commit pinning therefore rides on versionControlProvenance.revisionId,
which finalize preserves and GHAzDO ingestion reads.
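Python's `urllib.parse.urljoin`, which implements RFC 3986 reference resolution, makes the problem concrete. The URLs are made-up examples.

```python
from urllib.parse import urljoin

# An ADO per-file web URL: the file and the commit live in the query string.
ado_file_url = "https://dev.azure.com/org/proj/_git/repo?path=/src/a.cs&version=GCabc123"
# The commit-less, slash-terminated repository root used as the uriBaseId instead.
ado_repo_root = "https://dev.azure.com/org/proj/_git/repo/"

# Resolving a relative artifact URI takes the reference's (empty) query, not the base's.
print(urljoin(ado_file_url, "src/b.cs"))   # https://dev.azure.com/org/proj/_git/src/b.cs
print(urljoin(ado_repo_root, "src/b.cs"))  # https://dev.azure.com/org/proj/_git/repo/src/b.cs
```

Against the query-based URL, both the `version=GC...` pin and the last path segment (`repo`) are lost; against the repository root, the artifact lands where expected.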

The host dispatch is hardened: a repositoryUri carrying credentials, a query,
or a fragment fails closed, and diagnostics are sanitized so a secret can never
surface in an error message. Repository-name segments are normalized through a
single guard that strips a .git suffix and rejects empty / dotted / multi-segment
names. github and ADO roots can be minted in the same run, each getting a
distinct SRCROOT_<REPO> base.

CweGenerateSample.ps1 previously resolved repositoryUri live from the origin
remote but hardcoded revisionId/branch to a frozen github sha. Copying the
taxonomy into another repository (e.g. Azure DevOps) and running it produced an
incoherent versionControlProvenance — a repositoryUri for repo X paired with a
commit that exists only in github sarif-sdk — that had to be hand-patched before
it could be published. The script now resolves the whole triple atomically:
the live git working tree by default (full reconstruction wherever the taxonomy
is run), or the canonical fixture pin under -Deterministic. The two are never
mixed. -RevisionId / -Branch layer onto live mode for detached-HEAD or explicit
anchoring; the -GHAzDO variant stamps BUILD_REPOSITORY_URI / BUILD_SOURCEVERSION
/ BUILD_SOURCEBRANCH from the same resolved values so AdoPipelineContext's
field-by-field agreement check passes. The byte-gate test passes -Deterministic
so the checked-in fixtures regenerate identically; a new live-coherence test
proves default mode stamps the real HEAD and restores the fixture afterward.

Verified end to end by copying the taxonomy into a live Azure DevOps repository
and regenerating with no patch: SRCROOT anchored at the ADO repository root,
revisionId auto-resolved to the ADO HEAD, validation 0/0/0, and GHAzDO ingestion
returned HTTP 200.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A CI / pipeline checkout (GitHub Actions, Azure DevOps) lands on a detached
HEAD, where `git rev-parse --abbrev-ref HEAD` returns `HEAD` and names no
branch. Default-mode provenance reconstruction therefore threw, which broke the
live-coherence test on the hosted runner and — more importantly — would break
any consumer that copied this taxonomy into their own pipeline and ran it
unattended, the exact "must hand-patch to get a valid SARIF" failure the
reconstruction work set out to remove.

When git cannot name the branch, fall back to the ref the CI system publishes
for the same checkout: BUILD_SOURCEVERSION's companion BUILD_SOURCEBRANCH on
Azure DevOps, GITHUB_REF on GitHub Actions. That ref points at the very commit
HEAD is parked on, so it is still coherent live provenance — not the
canonical-pin mixing the atomic resolver forbids. The vars are read before
Set-AdoEnv scrubs them for the multitool subprocess, so the parent value is
intact. Only when no CI ref exists either does the script fail, directing the
caller to -Branch or -Deterministic.
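The fallback order can be sketched in Python. `resolve_branch` is hypothetical; `env` is meant to be a snapshot such as `dict(os.environ)` taken before Set-AdoEnv scrubs the variables, and both the precedence of an explicit -Branch and the `refs/heads/` normalization of a git-named branch are assumptions.

```python
from typing import Mapping, Optional

# Checking order between the two CI variables is an assumption; a given checkout normally sets one.
CI_REF_VARS = ("BUILD_SOURCEBRANCH", "GITHUB_REF")


def resolve_branch(abbrev_ref: Optional[str], env: Mapping[str, str],
                   branch_override: Optional[str] = None) -> str:
    """Name the branch for live provenance, falling back to the CI-published ref on a detached HEAD."""
    if branch_override:
        return branch_override  # explicit -Branch wins (assumed precedence)
    if abbrev_ref and abbrev_ref != "HEAD":
        return "refs/heads/" + abbrev_ref  # assumption: normalized to a full ref
    for var in CI_REF_VARS:
        ref = env.get(var)
        if ref:
            return ref  # same commit HEAD is parked on, so still coherent live provenance
    raise RuntimeError("HEAD is detached and no CI ref is set; pass -Branch or -Deterministic")
```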

Verified by detaching HEAD locally with GITHUB_REF set: default mode resolves
revisionId to the live HEAD and branch to the CI ref, validation 0/0/0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
michaelcfanning marked this pull request as ready for review June 6, 2026 16:01
michaelcfanning requested a review from cfaucon as a code owner June 6, 2026 16:01
michaelcfanning merged commit 49208ee into dev Jun 6, 2026
6 checks passed
michaelcfanning added a commit that referenced this pull request Jun 6, 2026
* emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample (#2929)

* emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample

Adds AdoPipelineContext, which detects an Azure DevOps pipeline
execution context from the standard predefined environment variables
and stamps run.automationDetails so producers that run inside ADO
pipelines automatically satisfy GHAzDO1019 and GHAzDO1020 with no
additional CLI flags.

- TryDetect is three-state (None / Partial / Complete). Partial fails
  loudly with a per-variable diagnostic before any file-system side
  effects so a misconfigured pipeline never emits a half-stamped SARIF.
- ApplyTo writes the canonical
  azuredevops/pipeline/build/<org>/<projectId>/<buildDefId>/<phaseId>/<branchRef>/<buildId>
  id and the four azuredevops/pipeline/build/* property keys ADO
  Advanced Security ingestion validates.
- Composes with the existing --automation-guid / --automation-correlation-guid
  flags; never overwrites a producer-supplied guid/correlationGuid.
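Three-state detection and the canonical id can be sketched in Python. The component-to-variable mapping (SYSTEM_COLLECTIONURI, SYSTEM_TEAMPROJECTID, BUILD_BUILDID and so on) is a hypothetical reading of ADO's predefined variables, not AdoPipelineContext's actual list; the BUILD_DEFINITIONID / SYSTEM_DEFINITIONID must-match pair is the one the pipeline diagnostic quoted in #2940 reports.

```python
from typing import List, Mapping, Tuple
from urllib.parse import urlsplit

# Hypothetical component -> predefined-variable mapping; the SDK's actual list may differ.
COMPONENTS = {
    "org": "SYSTEM_COLLECTIONURI",  # e.g. https://dev.azure.com/<org>/
    "projectId": "SYSTEM_TEAMPROJECTID",
    "buildDefId": "BUILD_DEFINITIONID",
    "phaseId": "SYSTEM_PHASEID",
    "branchRef": "BUILD_SOURCEBRANCH",
    "buildId": "BUILD_BUILDID",
}
# Pairs that name the same identifier and must agree when both are present.
MUST_MATCH = [("BUILD_DEFINITIONID", "SYSTEM_DEFINITIONID")]


def detect(env: Mapping[str, str]) -> Tuple[str, List[str]]:
    """Three-state detection: ('None' | 'Partial' | 'Complete', problems). Runs before any side effect."""
    if not any(env.get(var) for var in COMPONENTS.values()):
        return "None", []
    problems = [f"{var} is not set" for var in COMPONENTS.values() if not env.get(var)]
    for a, b in MUST_MATCH:
        if env.get(a) and env.get(b) and env[a] != env[b]:
            problems.append(f"{a}='{env[a]}' disagrees with {b}='{env[b]}'")
    return ("Partial" if problems else "Complete"), problems


def automation_id(env: Mapping[str, str]) -> str:
    """Build the canonical id; call only after detect(env) returned 'Complete'."""
    org = urlsplit(env["SYSTEM_COLLECTIONURI"]).path.strip("/").split("/")[0]
    rest = [env[COMPONENTS[key]] for key in ("projectId", "buildDefId", "phaseId", "branchRef", "buildId")]
    return "/".join(["azuredevops/pipeline/build", org, *rest])
```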

CweGenerateSample.ps1 grows a -GHAzDO switch that produces the new
CweGHAzDoSample.sarif fixture alongside the existing CweSample.sarif.
The script populates the ADO env vars for the duration of emit-init-run
so AdoPipelineContext stamps automationDetails, then patches
tool.driver.fullName post-finalize so GHAzDO1018 passes. Default-mode
runs explicitly clear those same env vars so a developer shell with
TF_BUILD=True can never drift the AI-shape fixture.

CweGHAzDoSample.sarif validates with zero errors, zero warnings, and
zero notes under --rule-kind Sarif;AI;GHAzDO. CweGeneratedSampleTests
covers both fixtures with byte-identical regression gates as separate
[Fact]s sharing one private helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim ReleaseHistory bullets + add copilot-instructions.md

The two bullets I just added for env-driven ADO stamping and the
GHAzDO sample fixture were PR-description-sized, not release-note-sized.
Trimmed both to match the style of their neighbors (single self-contained
sentence + concrete names + minimal facts a downstream consumer needs).
The full narrative — three-state detection prose, env-var precedence
table, composition guarantees — already lives on PR #2929 where it
belongs.

Adds .github/copilot-instructions.md so future agents in this repo see
the release-notes-vs-PR-description distinction up front, plus the
house idioms that come up repeatedly in code review (no [Theory],
GHAzDO casing, AI ruleId convention, sample-fixture convention,
side-effects-after-detection, internals-via-InternalsVisibleTo).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Port SARIF AI generation guidance from ai-plugins to sarif-sdk (#2930)

Make sarif-sdk the single source of truth for the SARIF spec markdown,
the AI-generated-findings profile, and the agent skills that emit and
validate AI SARIF.

Adds:
- docs/spec/sarif-v2.1.0-spec.md
  Convenience markdown rendering of the OASIS SARIF 2.1.0 specification
  (Plus Errata 01). The OASIS-published document is canonical; IPR notice
  preserved at top of file.
- docs/ai/generating-sarif.md
  Normative guidance for representing AI/LLM-produced security findings
  as first-class SARIF: ai/origin declaration, tool identity, result
  structure, exploitability and attacker-position vocabulary, evidence
  model, redaction, notification taxonomy (AI/EXEC/*, AI/CFG/*), and
  the full AI rule-pack appendix. Includes a Mermaid object-model
  diagram in the appendix.
- docs/ai/example.sarif
  Comprehensive reference SARIF log that conforms to the AI profile.
  Passes `dotnet sarif validate --rule-kind 'Sarif;AI'` cleanly.
- skills/emit-sarif-findings/SKILL.md
  Agent-operating procedure for emitting AI SARIF using the
  Sarif.Multitool emit verbs (emit-init-run, add-result,
  add-notification, emit-finalize --validate). Multitool-only;
  cross-references docs/ai/generating-sarif.md as the normative source.
- skills/validate-sarif-findings/SKILL.md
  Agent-operating procedure for validating AI SARIF. Uses
  `--rule-kind 'Sarif;AI'` against the multitool's AI rule pack
  (AI1003-AI2019) plus the standard SARIF rules in one pass.

Updates:
- README.md adds a short pointer section to the new spec, guidance,
  and skills directories.
- docs/multitool-usage.md gains a 'Modes' table entry for each of the
  new emit verbs (emit-init-run, add-result, add-notification,
  emit-finalize) plus a worked example.

Verification gates run before commit:
- `dotnet sarif validate docs/ai/example.sarif --rule-kind 'Sarif;AI'`
  reports 0 errors.
- End-to-end smoke test (init -> add-result -> finalize --validate)
  produces a SARIF file with 1 result, 1 rule (CWE-78 enriched from
  the embedded MITRE CWE taxonomy).
- All skill command snippets match actual --help output for the
  relevant verb at Sarif.Multitool 5.0.0.

Companion work (separate PR in microsoft/ai-plugins):
- Delete plugins/sarif/ entirely; the canonical home is now this
  repository.
- Retool Swallowtail (and other AI-detector plugins in ai-plugins)
  to invoke Sarif.Multitool emit verbs directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PublishSampleToGhazdo.ps1 + clone-aware CweGenerateSample.ps1 (#2931)

`CweGenerateSample.ps1` now derives `--vcp-repositoryuri` and the
`emit-finalize --srcroot` prefix from `git -C $repoRoot remote get-url
origin`, falling back to `https://github.com/microsoft/sarif-sdk` when
origin is unset. On the canonical microsoft/sarif-sdk clone the generated
fixtures (CweSample.sarif, CweGHAzDoSample.sarif) are byte-identical to
the previous hardcoded form. GitHub origins get a `<repo>/blob/main/`
SRCROOT prefix; other hosts (including ADO) get the bare repo URL with
a trailing slash.
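The clone-aware derivation reads as a small pure function. A Python sketch with a hypothetical name; it assumes an HTTPS origin URL (SSH remotes are not handled) and that a `.git` suffix is dropped.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

FALLBACK_ORIGIN = "https://github.com/microsoft/sarif-sdk"


def clone_aware_roots(origin_url: Optional[str]) -> Tuple[str, str]:
    """Return (--vcp-repositoryuri value, --srcroot prefix) for a clone's origin URL."""
    repo = (origin_url or FALLBACK_ORIGIN).rstrip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # assumption: a .git suffix is dropped
    if urlsplit(repo).hostname == "github.com":
        return repo, f"{repo}/blob/main/"
    return repo, repo + "/"  # other hosts, including ADO: bare repo URL plus trailing slash
```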

Adds `src/Sarif/Taxonomies/PublishSampleToGhazdo.ps1` -- POSTs a gzipped
SARIF to the GHAzDO SARIFs ingestion endpoint
(`/{org}/{project}/_apis/alert/repositories/{repo}/sarifs?api-version=7.2-preview.1`
on advsec.dev.azure.com, with dev.azure.com as a fallback). Target org/project/repo
are parsed from `runs[0].versionControlProvenance[0].repositoryUri`; the PAT is read
from the ADO_PAT environment variable.
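Endpoint construction can be sketched in Python. The names are hypothetical, the sketch does not send the request, and PAT authentication and exact body framing are left out; only the URL shape, the repositoryUri source, and gzip compression come from the description.

```python
import gzip
import json
from urllib.parse import urlsplit


def ingestion_url(sarif_log: dict, host: str = "advsec.dev.azure.com") -> str:
    """Build the GHAzDO SARIF ingestion endpoint from runs[0].versionControlProvenance[0].repositoryUri."""
    repo_uri = sarif_log["runs"][0]["versionControlProvenance"][0]["repositoryUri"]
    segments = [s for s in urlsplit(repo_uri).path.split("/") if s]
    if len(segments) != 4 or segments[2] != "_git":
        raise ValueError("expected https://dev.azure.com/<org>/<project>/_git/<repo>")
    org, project, _, repo = segments
    return (f"https://{host}/{org}/{project}/_apis/alert/repositories/{repo}"
            "/sarifs?api-version=7.2-preview.1")


def gzip_sarif(sarif_log: dict) -> bytes:
    """Gzip the serialized log for upload (serialization details are an assumption)."""
    return gzip.compress(json.dumps(sarif_log).encode("utf-8"))
```

Passing `host="dev.azure.com"` models the fallback host.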

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Scrub Microsoft-internal references from AI guidance port (#2932)

Public-OSS hygiene pass on the SARIF AI guidance and skills.
Descriptor ids that are already shipped in the SDK (AI/EXEC/ALAS-SIGNAL
in AI2018.ProvideExecutionSignalArtifact and AI1014's AI/EXEC/* and
AI/CFG/* prefixes) are kept as-is so the docs match the current SDK
implementation.

Changes:
- Drop ALAS expansion and neutralize the signal-payload schema
  (descriptor id kept; no payload schema was ever enforced by the SDK).
- Replace ProjectApi with FastAPI (five sites) in API-handler examples.
- Replace 'Geneva cluster' with 'telemetry cluster' in a deployment
  example.
- Replace example rule id SWT-CPP-001 with ACME-CPP-001.
- Replace author: mikefan with sarif-sdk-maintainers in both skill
  frontmatters.
- Soften a reference to an unpublished companion remediation guidance
  document.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add multitool add-reporting-descriptor verb

Appends a fully-formed SARIF reportingDescriptor JSON object — supplied
via --input <path> or stdin — to the staged event log produced by
emit-init-run.

Two targets:
* Default → run.tool.driver.notifications[]. AI producers routinely emit
  notification descriptors (progress, telemetry, config errors). No id
  convention is enforced; notifications use opaque ids.
* --rules → run.tool.driver.rules[]. Gated against
  AIRuleIdConvention.IsNovel so only novel-finding descriptors (NOVEL- ids)
  are accepted. Taxonomy-mapped rule descriptors (e.g., CWE-89) come
  from the taxonomy enricher at finalize time, not from this verb.

Each descriptor id may appear at most once per event log. The verb scans
the existing event log on receipt and rejects duplicates against either
a prior add-reporting-descriptor event of the same target OR a
descriptor pre-populated on the run-header. A --force escape hatch is
acknowledged in error text but intentionally out of v1 scope.
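The receipt-time acceptance checks can be sketched in Python. `check_descriptor` is hypothetical, a plain `NOVEL-` prefix test stands in for AIRuleIdConvention.IsNovel (the real convention may be stricter), and the caller decides which ids count as `existing_ids` (header-pre-populated descriptors plus prior descriptor events).

```python
from typing import Optional, Set


def check_descriptor(descriptor: object, to_rules: bool, existing_ids: Set[str]) -> Optional[str]:
    """Return an error message, or None if the descriptor may be appended to the event log."""
    if not isinstance(descriptor, dict):
        return "descriptor must be a JSON object"
    rule_id = descriptor.get("id")
    if not isinstance(rule_id, str) or not rule_id:
        return "descriptor 'id' must be a non-empty string"
    if to_rules and not rule_id.startswith("NOVEL-"):
        # Stand-in for AIRuleIdConvention.IsNovel; taxonomy ids (e.g. CWE-89) come from enrichment.
        return f"'{rule_id}' is not a NOVEL- id; taxonomy rules are added at finalize"
    if rule_id in existing_ids:
        return f"descriptor id '{rule_id}' already exists in this event log"
    return None
```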

Event-log plumbing:
* Adds SarifEventKinds.RuleDescriptor ("rule-descriptor") and
  SarifEventKinds.NotificationDescriptor ("notification-descriptor"),
  threaded through SarifEventLogReader's kind allow-list.
* SarifEventReplayer buffers descriptor events and merges them into the
  target list BEFORE RegisterDescriptorsFromResults runs. This ordering
  matters: auto-registration synthesizes minimal descriptors only for
  ruleIds that aren't already represented, so an explicit NOVEL-
  descriptor pre-empts the minimal one. Header pre-populated descriptors
  are preserved by reference; the verb's emit-time dedup blocks
  id collisions between header and events.
* New event kinds are additive within CurrentSchemaVersion = 1; older
  readers skip unknown kinds harmlessly, matching the forward-compatible
  shape used when Notification / Invocation kinds were added.
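The ordering guarantee is easiest to see in a sketch. This Python function is hypothetical and covers rules only: explicit descriptors land first, so auto-registration synthesizes a minimal `{"id": ...}` entry only for ruleIds still unrepresented.

```python
from typing import Dict, List


def replay_rules(header_rules: List[Dict], descriptor_events: List[Dict], results: List[Dict]) -> List[Dict]:
    """Merge explicit descriptors first, then synthesize minimal ones only for unrepresented ruleIds."""
    rules = list(header_rules)       # header descriptors kept (same dict objects)
    rules.extend(descriptor_events)  # explicit rule-descriptor events merged BEFORE auto-registration
    known = {rule["id"] for rule in rules}
    for result in results:
        rule_id = result.get("ruleId")
        if rule_id and rule_id not in known:
            rules.append({"id": rule_id})  # minimal synthesized descriptor
            known.add(rule_id)
    return rules
```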

Tests:
* 16 [Fact] tests on AddReportingDescriptorCommand covering both happy
  paths (notifications default, --rules), id validation (missing/empty/
  non-string), the NOVEL gate (taxonomy id rejection on --rules path
  only), rich payload round-trip (messageStrings, defaultConfiguration,
  helpUri, properties — including a date-shaped property string to guard
  against Json.NET DateTime coercion), duplicate detection within and
  across targets, duplicate detection against header-pre-populated
  descriptors for both target arrays, missing-wip-file path, and two
  malformed-input cases (bad JSON, non-object root).
* 3 [Fact] tests on SarifEventReplayer covering: rule-descriptor events
  populating rules and pre-empting auto-registration, notification-
  descriptor events populating notifications, and the
  header-pre-populated + events merge semantics.

No [Theory]/[InlineData] — repeated scenarios use shared private
helpers (SeedRunHeader) per house style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Strip editorial prefixes from AI notification taxonomy (#2934)

Notification descriptor ids now name the concern only — `DECISION`,
`RULED-OUT`, `DATA-ACCESS-DENIED`, `ALAS-SIGNAL`, `TOOL-UNAVAILABLE`,
etc. The previous `AI/EXEC/*` and `AI/CFG/*` prefixes repeated context
the surrounding SARIF already carries: the array
(`toolExecutionNotifications` vs `toolConfigurationNotifications`)
encodes the kind, and `tool.driver.name` encodes the emitter. The
same id MAY now legally appear in both arrays. Suffixing `EXEC` or
`CFG` on every id is like suffixing `Class` on every C# class — the
surrounding context already says what kind of thing it is.

Placement is selected at authoring time: `add-notification` defaults
to `toolExecutionNotifications`; `add-notification --config` (`-c`)
routes to `toolConfigurationNotifications`. The event-log kind
`SarifEventKinds.Notification` splits into `ExecutionNotification`
(`"execution-notification"`) and `ConfigurationNotification`
(`"configuration-notification"`); the replayer routes each to the
matching invocation array.
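Authoring-time selection and replay-time routing, sketched in Python with hypothetical function names; the kind strings and array names are the ones listed in the preceding paragraph.

```python
from typing import Dict

# Event-log kind -> invocation array.
ROUTES = {
    "execution-notification": "toolExecutionNotifications",
    "configuration-notification": "toolConfigurationNotifications",
}


def event_kind(config: bool) -> str:
    """add-notification defaults to execution; --config (-c) selects configuration."""
    return "configuration-notification" if config else "execution-notification"


def route(invocation: Dict, kind: str, notification: Dict) -> None:
    """Append a replayed notification to the array its event kind names."""
    if kind not in ROUTES:
        raise ValueError(f"not a notification event kind: {kind}")
    invocation.setdefault(ROUTES[kind], []).append(notification)
```

Because placement comes from the kind, the same descriptor id can legally appear in both arrays.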

`AI1014.ExecutionNotificationPlacement` is deleted. Its sole purpose
was enforcing prefix-vs-array consistency, which is structurally
meaningless under the new convention (the array IS the kind).
`AI2018` retains its semantic; the literal id it checks changes from
`AI/EXEC/ALAS-SIGNAL` to `ALAS-SIGNAL`.

BRK by the letter of v4.6.3 (AI1014 was added there), but AI rules
adoption is low and v5.x is the right place for refinement over
back-compat.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Generalize ALAS-SIGNAL notification id to LEARNING-SIGNAL (#2935)

ALAS named a specific consumer (an internal learning system). Under
the convention shipped in #2934, notification ids name the concern,
not the consumer. LEARNING-SIGNAL describes what the signal is,
independent of who reads it.

While here, rename the AI2018 rule from ProvideExecutionSignalArtifact
to ProvideLearningSignalArtifact for consistency: the class checks
the LEARNING-SIGNAL id, the "Execution" qualifier was redundant under
the new convention (placement is encoded by the array, not the id),
and downstream learning systems aren't necessarily reading only the
execution-side array.

Affects: AI2018 rule class + file + RuleId const + 3 resource
keys/messages, the AI2018 row in docs and skills tables, and the
UNRELEASED BRK bullet for #2934 (whose own "ALAS-SIGNAL example"
becomes "LEARNING-SIGNAL", and which now documents the
class-and-id rename together).

BRK on the just-merged BRK (both still UNRELEASED) — favored over
shipping the consumer name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory UNRELEASED section as v5.0.0 (#2937)

build.props was already bumped to <VersionPrefix>5.0.0</VersionPrefix>
in #2924 (the SHA-1 BRK). This finishes the v5.0.0 cut by replacing
the UNRELEASED placeholder header in ReleaseHistory.md with the
canonical version banner (Sdk / Driver / Converters / Multitool /
Multitool Library nuget links), matching the v4.6.4 format.

Picked up by #2936 (the dev to main promotion PR).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim and split the over-descriptive v5.0.0 notification-taxonomy BRK (#2938)

Per release-notes house style, bullets are one or two self-contained
sentences; PR-description prose belongs in the PR. The original bullet
was ~3x the length of its neighbors and re-litigated the motivation.

Split into two tighter bullets:

  1. Convention change + routing mechanism (id-prefix strip, new
     --config switch, event-kind split).
  2. Rule-table changes (AI1014 removal, AI2018 rename).

Drops the "prefixes were redundant because..." explanation, the wire
value parentheticals (`"execution-notification"` etc.), and the
"ALAS named a specific consumer" parenthetical. The change itself
is visible in the renames; the reader doesn't need the rationale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CweGenerateSample.ps1 -GHAzDO crash inside real ADO pipelines (#2940)

The mseng microsoft.sarif-sdk pipeline broke on the first build of main
after the v5.x promotion (build 31555367). Symptom:

  CweGenerateSample.ps1 (args: -Configuration Release -GHAzDO) exited with code 1.
  ADO pipeline context is partially configured. Either populate every
  required variable or clear them all.
  Problems:
    BUILD_DEFINITIONID='1234' disagrees with SYSTEM_DEFINITIONID='9978'
    (both name the same pipeline identifier and must match)

Root cause: the deterministic-fixture env override in CweGenerateSample.ps1
stamps BUILD_DEFINITIONID=1234 for byte-stable output, but does not also
override SYSTEM_DEFINITIONID. ADO agents inject both. The verb's
must-match cross-check in AdoPipelineContext.TryDetect (correctly) refuses
to proceed when the two disagree.

Fix the script (not the verb): add SYSTEM_DEFINITIONID alongside
BUILD_DEFINITIONID in the script's ordered env-override hashtable, plus
SYSTEM_JOBID / SYSTEM_JOBNAME alongside SYSTEM_PHASEID / SYSTEM_PHASENAME
for symmetric hygiene (those pairs are exempt from must-match, but the
default-mode cleanup loop iterates the same hashtable and so benefits from
covering the agent's full fallback set). The fixture SARIF
bytes do not change — the primary env vars were already set and are the
ones the verb actually reads.

Regression gate: new
  CweGHAzDoSample_RegenerationSucceeds_WhenAmbientAdoFallbackEnvVarsConflict
[Fact] explicitly seeds SYSTEM_DEFINITIONID / SYSTEM_JOBID / SYSTEM_JOBNAME
with values that disagree with the script's deterministic primaries
before invoking the script. Without the script fix it fails the same way
the mseng build did; with the fix it passes byte-identical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refresh v5.0.0 release-history layout + add prefix legend (#2939)

Three changes, all in ReleaseHistory.md:

1. Add a prefix legend at the top of the file. Codifies the six prefixes
   (DEP / BRK / BUG / NEW / PRF / FUN) and the 'BRK leads each section'
   rule. Footnote notes that older sections may predate the convention.

2. Reorder the v5.0.0 section so all BRK bullets lead (BRK -> NEW -> BUG).
   Pure line shuffling; relative order preserved within each group.

3. Normalize the lone 'BUGFIX:' bullet in v4.6.4 to 'BUG:' (matches the
   legend's canonical form). The deep-history 'BUGFIX, BRK:' entry in
   the v1.x section is left alone — that's immutable shipped state.

No code or schema changes; ReleaseHistory.md only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace emit-init-run flags with SARIF Run JSON contract

A consumer agent reports that the existing 14 typed CLI flags can't express
multiple versionControlProvenance entries (one of which carries a
properties bag documenting skills in play). Modeling every field as a
flag explodes the surface; the peer emit verbs (add-result,
add-notification, add-reporting-descriptor) already accept fully-formed
SARIF JSON via --input or stdin for exactly this reason, with a
documented rationale that applies more strongly to the run header than
to a single result.

Replace the v5.0.0 flag surface on emit-init-run with the same
input/stdin payload contract. EmitInitRunOptions shrinks to three
properties (OutputFilePath, InputFilePath, ForceOverwrite). The
SarifEventReplayer's documented partial-Run shape (tool, language,
columnKind, defaultEncoding, defaultSourceLanguage, originalUriBaseIds,
versionControlProvenance, automationDetails, baselineGuid,
redactionTokens, ...) is now reachable end-to-end through the verb.

Receipt-time validators (no filesystem side-effects on rejection):
required non-empty-string tool.driver.name; https-only
tool.driver.informationUri and versionControlProvenance[].repositoryUri;
https-or-file originalUriBaseIds["SRCROOT"].uri; canonical 8-4-4-4-12
automationDetails.guid/correlationGuid; exact-match ai/origin in
{generated, annotated, synthesized}; SARIF-log-document rejection;
parent-shape JSON-object enforcement at every nested accessor so a
JValue indexer never throws into the broad catch. ADO stamping is now
JToken-direct so producer-supplied SARIF fields outside the SDK typed
Run model survive the wip-line append; the existing typed-Run
materialization at emit-finalize is the documented boundary at which
non-typed fields are dropped, consistent with every other SDK
round-trip.
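A subset of these receipt-time checks can be sketched in Python: the driver name, https-only URIs, the https-or-file SRCROOT, canonical GUIDs, and an object-shape check before each nested read. The names are hypothetical, and the sketch omits the ai/origin and whole-log-rejection checks.

```python
import re
from typing import Any, Dict, List, Optional
from urllib.parse import urlsplit

GUID = re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")


def _uri_scheme(value: Any) -> Optional[str]:
    """Scheme of an absolute URI string, or None if value isn't one."""
    if not isinstance(value, str):
        return None
    try:
        parts = urlsplit(value)
    except ValueError:
        return None
    return parts.scheme if parts.scheme and (parts.netloc or parts.scheme == "file") else None


def _object(parent: Dict, key: str, path: str, errors: List[str]) -> Optional[Dict]:
    """Fetch parent[key] only if it is a JSON object; record an error if present with another shape."""
    value = parent.get(key)
    if value is not None and not isinstance(value, dict):
        errors.append(f"{path} must be a JSON object")
        return None
    return value


def validate_run_header(run: Any) -> List[str]:
    if not isinstance(run, dict):
        return ["payload must be a JSON object"]
    errors: List[str] = []
    tool = _object(run, "tool", "tool", errors) or {}
    driver = _object(tool, "driver", "tool.driver", errors) or {}
    name = driver.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("tool.driver.name must be a non-empty string")
    if "informationUri" in driver and _uri_scheme(driver["informationUri"]) != "https":
        errors.append("tool.driver.informationUri must be https")
    provenance = run.get("versionControlProvenance", [])
    if not isinstance(provenance, list):
        errors.append("versionControlProvenance must be an array")
        provenance = []
    for i, entry in enumerate(provenance):
        if not isinstance(entry, dict) or _uri_scheme(entry.get("repositoryUri")) != "https":
            errors.append(f"versionControlProvenance[{i}].repositoryUri must be https")
    bases = _object(run, "originalUriBaseIds", "originalUriBaseIds", errors) or {}
    srcroot = _object(bases, "SRCROOT", "originalUriBaseIds.SRCROOT", errors)
    if srcroot is not None and _uri_scheme(srcroot.get("uri")) not in ("https", "file"):
        errors.append("originalUriBaseIds.SRCROOT.uri must be https or file")
    details = _object(run, "automationDetails", "automationDetails", errors) or {}
    for key in ("guid", "correlationGuid"):
        if key in details and not (isinstance(details[key], str) and GUID.fullmatch(details[key])):
            errors.append(f"automationDetails.{key} must be a canonical 8-4-4-4-12 GUID")
    return errors
```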

AdoPipelineContext.ApplyTo(Run) becomes
bool TryApplyTo(Run, out string error). It stamps automationDetails.id
and the four azuredevops/pipeline/build/* properties only when absent
and fails-with-diagnostic on per-field conflict. The previous
unconditional-overwrite contract was inert in v5.0.0 (the flag surface
couldn't supply those fields) but became a footgun once JSON input
could.
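The fill-only-when-absent contract can be sketched in Python. `try_apply_to` is hypothetical; placing the four property keys in `automationDetails.properties`, and treating an equal existing value as agreement rather than conflict, are assumptions.

```python
from typing import Dict, Optional, Tuple


def try_apply_to(run: Dict, automation_id: str, properties: Dict[str, str]) -> Tuple[bool, Optional[str]]:
    """Stamp automationDetails fields only when absent; report the first per-field conflict instead."""
    details = run.get("automationDetails")
    details = details if isinstance(details, dict) else {}
    existing_props = details.get("properties")
    existing_props = existing_props if isinstance(existing_props, dict) else {}

    # Check everything first so a conflict leaves the run untouched.
    if details.get("id") not in (None, automation_id):
        return False, "automationDetails.id conflicts with the detected pipeline id"
    for key, value in properties.items():
        if existing_props.get(key) not in (None, value):
            return False, f"property '{key}' conflicts with the detected pipeline value"

    # No conflicts: fill only the absent fields.
    details.setdefault("id", automation_id)
    for key, value in properties.items():
        existing_props.setdefault(key, value)
    details["properties"] = existing_props
    run["automationDetails"] = details
    return True, None
```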

CweGenerateSample.ps1 rewrites its emit-init-run call to construct a
PowerShell hashtable -> ConvertTo-Json -Depth 32 -Compress -> stdin
pipe. Both CweSample.sarif and CweGHAzDoSample.sarif regenerate
byte-identically (verified by CweGeneratedSampleTests, which gates the
fixtures' SHA-256 hashes).

skills/emit-sarif-findings/SKILL.md Step 1 is rewritten to show the
JSON construction; the inputs table picks up the multi-VCP and
properties-bag annotations; the package constraint bumps to
Sarif.Multitool >= 5.1.0. docs/multitool-usage.md's flag example is
replaced with the stdin form.

ReleaseHistory.md gets a new v5.1.0 UNRELEASED section with three
bullets: BRK on the flag-surface removal, BRK on
AdoPipelineContext.ApplyTo, NEW on the JSON-payload contract.

Verification:
- dotnet build src/Sarif.Sdk.sln: 0 warnings, 0 errors.
- Test.UnitTests.Sarif.Multitool.Library: 217 passed, 1 skipped.
- Test.UnitTests.Sarif: 896 passed, 3 skipped.
- Test.UnitTests.Sarif.Driver: 140 passed, 1 skipped.
- CweGeneratedSampleTests (3): pass; both fixtures byte-identical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory v5.0.1 section with nuget links

src/build.props was bumped to <VersionPrefix>5.0.1</VersionPrefix>
in be6fb706 (the emit-init-run JSON-contract change). This finishes
the v5.0.1 cut by replacing the UNRELEASED placeholder header in
ReleaseHistory.md with the canonical version banner (Sdk / Driver /
Converters / Multitool / Multitool Library nuget links), matching the
v5.0.0 format. Folded into #2942 so main is shippable the moment the
promotion lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim v5.0.1 release-notes bullets to neighbor density

The three bullets were 4-6 sentences each, embedding validator catalogs
and finalize-time round-trip prose that belong in the PR description,
not in ReleaseHistory.md. Repo style explicitly calibrates entries against
their neighbors and asks for a trim or split when a bullet runs roughly 3x
their length. The BRK and NEW bullets here now sit at roughly the same
density as the v5.0.0 rename bullets above them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add multitool add-invocation verb

Mirrors add-result / add-notification / add-reporting-descriptor: takes a
fully-formed SARIF Invocation JSON object via --input <path> or stdin and
appends it to the staged event log as a SarifEventKinds.Invocation event.

SarifEventReplayer strips run.invocations[] carried on the run header, so
this verb is the only path producers have to populate the array. The verb
imposes no schema beyond "must be a JSON object" (SARIF makes every field on
Invocation optional); full-log shape validation lives in emit-finalize --validate.

AddInvocationOptions / AddInvocationCommand follow the established pattern.
Program.cs registers and dispatches the new verb. SKILL.md, docs/multitool-usage.md,
and ReleaseHistory.md updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop unused System.Text using in EmitInitRunCommandTests

CI's BuildAndTest.ps1 invokes dotnet build with --no-incremental and
/p:EnforceCodeStyleInBuild=true, which surfaces IDE0005 (unused using)
as an error. Local Debug + default incremental builds skipped the check
and let the unused System.Text directive ride into be6fb706.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NOVEL ruleId example and clarify enricher ownership of region.snippet

Two corrections to docs/ai/generating-sarif.md, flagged by an external
AI-authoring feedback session that retargeted against Sarif.Multitool@5.0.1:

1. The Novel-findings subsection used the slash form ('NOVEL/<sub-id>' on
   result.ruleId, bare 'NOVEL' on the descriptor), which AddResultCommand
   and AddReportingDescriptorCommand reject at receipt. The canonical form
   (per docs/AI-RuleId-Convention.md and AIRuleIdConvention.s_novelPrefix)
   is the dash-flat 'NOVEL-<sub-id>'; descriptor.id and result.ruleId are
   byte-identical. The obsolete 'ruleIndex required for NOVEL' paragraph is
   removed: each NOVEL- descriptor now has a unique id, so the SARIF 3.19.23
   non-unique-id workaround no longer applies.

2. The Code Context subsection told AI tools to populate region.snippet
   and contextRegion.snippet on every finding. emit-finalize already runs
   InsertOptionalDataVisitor with RegionSnippets | ContextRegionSnippets |
   ComprehensiveRegionProperties | Hashes, reading the file from disk and
   filling these fields itself. The producer SHOULD emit region.startLine
   and region.endLine; the enricher owns everything else. Pre-populating
   wastes tokens and risks drifting from the consumer's view of the file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore RuleKind.Ado as [Obsolete] alias for RuleKind.GHAzDO (#2945)

#2928 renamed RuleKind.Ado to RuleKind.GHAzDO without leaving a
back-compat alias. Restore Ado as an [Obsolete] alias resolving to
the same underlying value (4), so pre-rename source still compiles
and '--rule-kind ado' continues to bind on the multitool CLI via
the existing case-insensitive enum parser. The [Obsolete] warning
steers new callers off the deprecated spelling without breaking
them. Two new [Fact] tests pin the alias contract (same value;
case-insensitive parse of 'ado' resolves to GHAzDO).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trigger Validate workflow on dev-targeted pushes and PRs

PRs targeting dev were silently skipping the build-and-test /
check-format / build-multitool-for-npm jobs because the workflow
filter was scoped to main only. Adding dev to both the push and
pull_request branch lists so CI gates dev work the same way it
gates main work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate v5.0.1 release-notes for actual ship

v5.0.1 was tagged but never published to NuGet, so the bullet that lived under an unreleased v5.0.2 header folds back into v5.0.1, which is the version that will actually ship from this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make autocrlf-sensitive tests deterministic across line-ending configurations (#2859)

On Windows agents with `core.autocrlf=input` (LF on disk, CRLF Environment.NewLine), several tests compared values normalized to `Environment.NewLine` against C# verbatim string literals whose embedded newlines were LF on disk. They passed on default Windows (`autocrlf=true`: CRLF on disk = CRLF NewLine) and on Linux/Mac (`autocrlf=input`: LF on disk = LF NewLine) but failed on the cross-grained combo that no CI configuration exercises.

Two principled moves:

1. Boundary normalization in `TestAssetResourceExtractor.GetResourceText` —
   canonicalize the read text to `Environment.NewLine` (`\r\n` -> `\n` ->
   `Environment.NewLine`). This is the single point where text resources
   enter the test harness, so every consumer (`FileDiffingUnitTests`,
   `InsertOptionalDataVisitorTests`, and ad-hoc callers) inherits the
   normalization for free.

2. Rewrite the affected literal-string assertions to express newlines
   explicitly with `Environment.NewLine` rather than rely on the source
   file's on-disk line endings: `string.Join(Environment.NewLine, ...)`
   for multi-line bodies, `$@"...{Environment.NewLine}..."` for short
   ones. Touches `StackTests`, `WebRequestTests`, `WebResponseTests`,
   `AndroidStudioConverterTests`, `FortifyUtilitiesTests`.

`InsertOptionalDataVisitor.txt` is embedded as a resource and hashed by a
`[Trait(TestTraits.WindowsOnly, "true")]` test where the on-disk hash must
remain stable, so `.gitattributes` pins that file to `eol=crlf`.
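The boundary normalization in step 1 is a two-pass replace. A minimal Python sketch of the same idea, assuming `newline` stands in for `Environment.NewLine` (this is an illustration, not the C# `GetResourceText` code):

```python
def normalize_newlines(text: str, newline: str) -> str:
    """Canonicalize every CRLF or LF line ending in `text` to `newline`."""
    # Collapse CRLF to LF first, so expanding LF afterwards can never turn an
    # existing CRLF into CR CR LF when `newline` is itself CRLF.
    lf_only = text.replace("\r\n", "\n")
    return lf_only.replace("\n", newline)


# A resource with CRLF on disk and one with LF on disk now compare equal.
assert normalize_newlines("one\r\ntwo\r\n", "\r\n") == normalize_newlines("one\ntwo\n", "\r\n")
```

Ordering matters: reversing the two passes would double the carriage returns on any file that already had CRLF endings.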

Co-authored-by: Michael C. Fanning <mikefan@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance (#2948)

* Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance

The partition-by-language model, built on the hypothesis that on receipt of a new log file we could replace AI results for one language while retaining results for another, proved insufficient as a baseline-updating strategy. Remove the MUST that an AI run set 'run.defaultSourceLanguage' and partition by a '(repository, branch, language)' tuple, along with the 'Run partitioning by language' section in 'docs/ai/generating-sarif.md' and the AI1015 row in the rule table.

'defaultSourceLanguage' remains an accepted optional SARIF Run field for viewer rendering; we simply no longer mandate it. No code references existed for AI1015, so this is a pure doc walkback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reclassify AI1015 drop as BUG and trim bullet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix SARIF1012 NRE when result.ruleId does not resolve (#2944) (#2949)

`SARIF1012.MessageArgumentsMustBeConsistentWithRule` previously threw a
`NullReferenceException` when `result.message.id` was set but
`result.ruleId` did not resolve to a rule in `tool.driver.rules[]` (no
match by id, ruleIndex, or hierarchical base). The null-guard at
lines 52-53 inspected `currentRules` (the collection) rather than the
resolved `rule` instance, and used a short-circuiting null-conditional
check that silently fell through to the diagnostic-emit branch's
indexer access.

Replace the guard with an explicit three-prong check:

    rule == null
    || rule.MessageStrings == null
    || !rule.MessageStrings.ContainsKey(result.Message.Id)

All three cases now emit the existing `MessageIdMustExist`
diagnostic with the unresolved rule id (or `null`).
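A Python sketch of the three-prong guard, assuming rules are plain dicts shaped like SARIF `reportingDescriptor` JSON (`messageStrings` keyed by message id). The helper name is hypothetical; the point is that each prong short-circuits before any dictionary lookup, so an unresolved rule can no longer reach the indexer:

```python
from typing import Optional


def message_id_is_missing(rule: Optional[dict], message_id: str) -> bool:
    """Return True when the MessageIdMustExist diagnostic should fire."""
    return (
        rule is None                                  # ruleId did not resolve
        or rule.get("messageStrings") is None         # rule has no messageStrings
        or message_id not in rule["messageStrings"]   # id absent from the map
    )


# The regression case: a result pointing at a nonexistent rule.
assert message_id_is_missing(None, "AnyId")
```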

Regression fixture: adds a 4th result to
`SARIF1012.MessageArgumentsMustBeConsistentWithRule_Invalid.sarif`
that references a non-existent `NoSuchRule` with `message.id: AnyId`,
plus the corresponding expected diagnostic at `startLine: 53`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace Generate-CweTaxonomy.ps1 with cross-platform Python regenerator (#2921) (#2950)

The PowerShell-only 'scripts/Generate-CweTaxonomy.ps1' assumed a Windows /
pwsh environment to regenerate the CWE taxonomy assets, which is friction
for non-Windows contributors and the original complaint behind #2921.

Replace it with a single 'scripts/generate_cwe_taxonomy.py' (Python 3,
stdlib only) that runs identically on Linux, macOS, and Windows. The
script:

* Downloads 'cwec_latest.xml.zip' via 'urllib.request' into a temporary
  staging directory.
* Extracts via 'zipfile' and locates the embedded 'cwec_v*.xml'.
* Parses every weakness (handling the CWE XML namespace), buckets by
  Status, sorts by numeric ID, derives the SARIF2012-conformant
  Pascal-case identifier (preferring the parenthesized common name when
  present), resolves the View-1000 ChildOf Primary parent, and emits
  the four-section help markdown (Description / Extended Description /
  Common Consequences / Potential Mitigations).
* Writes 'CweTaxonomy.sarif' and 'CweTaxonomy.brief.md' in place under
  'src/Sarif/Taxonomies/' with UTF-8 (no BOM) and LF line endings so the
  embedded resources hash identically regardless of host OS.
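The UTF-8 (no BOM) plus LF property is easy to lose in Python on Windows, where text mode translates `\n` to `\r\n` by default. A minimal standard-library sketch of one way to pin it; `write_artifact` is a hypothetical helper, not necessarily how `generate_cwe_taxonomy.py` is structured:

```python
import tempfile
from pathlib import Path


def write_artifact(path: Path, text: str) -> None:
    """Write `text` as UTF-8 without a BOM, forcing LF line endings."""
    # Normalize any CRLF already present in the generated text.
    text = text.replace("\r\n", "\n")
    # "utf-8" (unlike "utf-8-sig") never writes a BOM; newline="\n" disables
    # the platform's "\n" -> os.linesep translation on write.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


# Usage: the bytes on disk are identical on every host OS.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CweTaxonomy.brief.md"
    write_artifact(target, "# CWE\r\nbody\n")
    assert target.read_bytes() == b"# CWE\nbody\n"
```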

Default invocation requires no arguments and downloads from MITRE:

    python3 scripts/generate_cwe_taxonomy.py

For offline / testing scenarios, '--xml path/to/cwec_v*.xml' bypasses
the download; '--source-url' overrides the MITRE URL; '--output-dir'
overrides the artifact destination.

'.gitattributes' retains the LF pinning on the two generated artifacts.
'src/Sarif/Taxonomies/CweReadme.md' Regeneration section is rewritten
to show the single Python invocation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Narrow SARIF1001 case-fold relaxation to AI-origin notification descriptors (#2951) (#2955)

Initial revision of this fix universally relaxed SARIF1001's id/name comparison
to 'StringComparison.Ordinal' on the grounds that SARIF v2.1.0 § 3.49.7 only
forbids strict-identical pairs. Per maintainer review, that read overshoots:
the case-fold comparison is an authorial SHOULD layered above the spec MUST,
and SHOULD layers are precisely what validators add value with. Removing the
SHOULD globally weakens the typo-catcher for hand-authored descriptors.

The narrower carve-out: AI notification taxonomies (issue #2952) deliberately
pair a SCREAMING-CAPS opaque id with the corresponding PascalCase end-user
name (e.g. 'DECISION' / 'Decision'). That convention is machine-coordinated
and specific to AI emitters; it does not apply to hand-authored rule and
taxon descriptors.

The cut is the intersection of two existing context signals:

  1. 'IsAIOriginRun()' -- the run carries 'properties["ai/origin"]', the
     same gate used by SARIF2002, SARIF2009 (the literal peer rule for
     identifier conventions), SARIF2014, and SARIF2015.

  2. 'Context.CurrentReportingDescriptorKind == Notification' -- the
     descriptor was reached via 'tool.driver.notifications[]', not
     'rules[]' or 'taxa[]'. AI rule ids are constrained by AI1012 to
     'BASE/sub-id' or 'NOVEL-<sub-id>' forms whose hyphens / slashes
     cannot case-fold-collide with any PascalCase name, so extending
     the carve-out to rules would be a no-op for AI and a regression
     for hand-authored taxonomies.

Strict-identical comparisons remain unconditional everywhere (spec MUST).
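The carve-out reduces to a three-step decision. A hypothetical Python rendering, where the two booleans stand in for `IsAIOriginRun()` and `CurrentReportingDescriptorKind == Notification`, and `str.casefold()` approximates the C# case-insensitive comparison:

```python
def sarif1001_violation(
    descriptor_id: str, name: str, ai_origin_run: bool, is_notification: bool
) -> bool:
    """Return True when SARIF1001 should flag an id/name pair."""
    # Spec MUST: strict-identical pairs are flagged everywhere.
    if descriptor_id == name:
        return True
    # Carve-out: AI-origin notification descriptors may pair a SCREAMING-CAPS
    # id with its PascalCase name.
    if ai_origin_run and is_notification:
        return False
    # SHOULD layer: case-fold collisions are flagged everywhere else.
    return descriptor_id.casefold() == name.casefold()
```

The Invalid fixture's three runs map onto the first, the fall-through, and the non-AI branches respectively.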

The Invalid functional fixture now spans three runs covering each
boundary of the carve-out:

  run[0] non-AI     rules          RULE0001/RULE0001   (spec MUST)
                                   RULE0002/RULE0002   (spec MUST)
  run[1] AI-origin  notifications  STRICT/STRICT       (spec MUST under AI)
                    rules          DECISION/Decision   (rules-not-exempt)
  run[2] non-AI     notifications  DECISION/Decision   (non-AI-not-exempt)

The Valid functional fixture pairs a non-AI tool with hand-authored rules
and an AI-origin tool whose notifications carry 'DECISION/Decision' and
'RULE-COVERAGE-GAP/RuleCoverageGap' -- the latter pair documents the
taxonomy convention even though hyphens already keep it case-fold-distinct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add GHAzDO1021 (ProvideShortBranchNameInVcp) (#2954) (#2958)

Add a GHAzDO validation rule for `run.versionControlProvenance[].branch` values that start with `refs/<class>/`, using `^refs/[^/]+/` to detect full-ref branch names and recommending the stripped short form.

The AdvSec Service silently drops VCP entries whose branch is not a short branch name, so the rule is scoped only to versionControlProvenance[].branch. It deliberately does not flag the full-ref branch segment embedded in run.automationDetails.id, which remains part of the existing GHAzDO1020 contract.

Add valid and invalid ValidateCommand fixtures covering short branch names and refs/heads, refs/tags, and refs/pull full refs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enrich versionControlProvenance from CI pipeline env (ADO + GitHub Actions) (#2957) (#2959)

EmitInitRunCommand currently stamps automationDetails.id and the four
`azuredevops/pipeline/build/*` property keys when `TF_BUILD=True` is
detected, but it does not lift anything into `versionControlProvenance`
- even though `BUILD_SOURCEBRANCH` is already in hand. AdvSec ingestion
relies on VCP for branch + revision attribution, and this gap means the
data lands only when a producer hand-constructs the entry into the
run-header JSON. The same gap exists for AI scanners running under
GitHub Actions.

Extend AdoPipelineContext to read two optional environment variables,
`BUILD_REPOSITORY_URI` and `BUILD_SOURCEVERSION`, and derive a short
branch name from the existing `BUILD_SOURCEBRANCH` by stripping any
leading `refs/<class>/` segment (so `refs/heads/main` becomes
`main`; `refs/pull/42/merge` becomes `42/merge`; `refs/tags/v1`
becomes `v1`). The two new env vars are optional: their absence does not
degrade detection from Complete to Partial, but a malformed value does
(the URI must be absolute http(s); the revision must match `^[0-9a-fA-F]{7,40}$`).
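A Python sketch of the "absence is fine, malformed presence is not" rule for the two optional variables. This is an illustration rather than the C# `AdoPipelineContext` code; `urlsplit` stands in for `Uri.TryCreate`, and the function names are hypothetical:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# 7 to 40 hex digits; fullmatch() anchors both ends, like ^...$ in the commit.
_REVISION_RE = re.compile(r"[0-9a-fA-F]{7,40}")


def optional_repo_uri_ok(value: Optional[str]) -> bool:
    """Absent is fine; present must be an absolute http(s) URI with a host."""
    if value is None:
        return True
    parts = urlsplit(value)
    return parts.scheme.lower() in ("http", "https") and bool(parts.netloc)


def optional_revision_ok(value: Optional[str]) -> bool:
    """Absent is fine; present must match the hex revision pattern."""
    return value is None or _REVISION_RE.fullmatch(value) is not None
```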

Add a parallel `GitHubActionsContext` for `GITHUB_ACTIONS=true`
that mirrors the same shape but is VCP-scoped (no pipeline-identity
contract today): `GITHUB_SERVER_URL` + `GITHUB_REPOSITORY` compose
the repository URI, `GITHUB_SHA` supplies the revision (same hex
regex), and `GITHUB_REF_NAME` is preferred over `GITHUB_REF`
(stripping the same `refs/<class>/` prefix). Custom GHES servers
compose correctly via trailing-slash normalization.

EmitInitRunCommand grows a `TryStampVcp` JObject-direct stamper that
mirrors the existing `TryStampAdoContext` shape and operates on three
input shapes:

1. `versionControlProvenance` absent or empty array -> synthesize a
   new entry only when `repositoryUri` was detected (anchor field).
   Branch/revision without a repository URI is informationally thin
   and the synthesized entry would not bind to a repo for ingestion.
2. `versionControlProvenance` contains exactly one entry -> enrich
   missing fields; fail with a per-field conflict diagnostic when any
   supplied field disagrees with the detected pipeline value. Repository
   URI equality treats scheme/host case-insensitively (RFC 3986) via
   `Uri.TryCreate` round-trip; branch and revision are byte-wise.
3. `versionControlProvenance` contains multiple entries -> leave
   untouched. The caller has declared a multi-repo shape and we refuse
   to guess which entry names the pipeline's source repo.
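The three input shapes reduce to a small decision procedure. A Python sketch over a plain-dict run, where `try_stamp_vcp` is a hypothetical stand-in for the C# `TryStampVcp`; for brevity it compares every field byte-wise, whereas the commit compares repository URIs with case-insensitive scheme and host:

```python
from typing import Optional


def try_stamp_vcp(run: dict, repo_uri: Optional[str], revision: Optional[str],
                  branch: Optional[str]) -> Optional[str]:
    """Stamp detected VCP fields onto `run` in place.

    Returns None on success or an error string on conflict; on conflict
    the run is left untouched (probe before write).
    """
    detected = {"repositoryUri": repo_uri, "revisionId": revision, "branch": branch}
    detected = {k: v for k, v in detected.items() if v is not None}
    entries = run.get("versionControlProvenance") or []

    if not entries:
        # Shape 1: synthesize only when the anchor field was detected.
        if "repositoryUri" in detected:
            run["versionControlProvenance"] = [detected]
        return None
    if len(entries) > 1:
        # Shape 3: multi-repo run; refuse to guess which entry to enrich.
        return None

    entry = entries[0]
    # Shape 2, probe pass: report any disagreement before writing anything.
    for key, value in detected.items():
        if key in entry and entry[key] != value:
            return f"{key}: supplied '{entry[key]}' conflicts with detected '{value}'"
    # Shape 2, write pass: fill only the fields the producer left out.
    for key, value in detected.items():
        entry.setdefault(key, value)
    return None
```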

A `TryResolveVcpFields` orchestrator layers the two sources before
stamping: ADO is the higher-priority source per the documented
"env-takes-priority" rule, GHA fills any gap where ADO is silent, and
fields populated on both sources MUST agree or the verb aborts with a
diagnostic naming both sources. The stamper itself is source-agnostic -
it takes the resolved (repositoryUri, revisionId, branch) triple and
does not know which env produced each field.
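A Python sketch of the layering rule, assuming each source has already been reduced to a `(repositoryUri, revisionId, branch)` triple with `None` for silent fields. The names are hypothetical and the comparison is byte-wise for brevity:

```python
from typing import Optional, Tuple

Triple = Tuple[Optional[str], Optional[str], Optional[str]]
_FIELDS = ("repositoryUri", "revisionId", "branch")


def resolve_vcp_fields(ado: Triple, gha: Triple) -> Tuple[Optional[Triple], Optional[str]]:
    """Layer ADO (higher priority) over GitHub Actions, field by field.

    Returns (triple, None) on success, or (None, error) when both sources
    supply a field and disagree.
    """
    resolved = []
    for name, ado_value, gha_value in zip(_FIELDS, ado, gha):
        if ado_value is not None and gha_value is not None and ado_value != gha_value:
            return None, f"{name}: ADO '{ado_value}' disagrees with GitHub Actions '{gha_value}'"
        # Prefer ADO; GitHub Actions fills any gap (when both are set they agree).
        resolved.append(ado_value if ado_value is not None else gha_value)
    return (resolved[0], resolved[1], resolved[2]), None
```

Keeping this merge separate from the stamper is what lets the stamper stay source-agnostic.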

Probe-before-write semantics on the single-entry path leave the JObject
unchanged when a conflict is detected, matching the existing
`TryStampAdoContext` contract - a half-stamped VCP is worse than a
clean refusal.

Why no disk-git fallback?
The verb deliberately does not shell out to `git.exe` to recover this
data from the working tree. The two CI envs cover both surfaces an AI
scanner lands in, `add-result`'s producer-supplied JSON is the
universal escape hatch for everything else, and adding a soft runtime
dependency on `git.exe` (with its own failure modes around shallow
clones, detached HEADs, and non-existent branches) would make the
verb's output depend on disk state. Producers running locally outside
CI either set the env vars themselves or populate VCP directly in the
input JSON.

CI fixture isolation
`CweGenerateSample.ps1`'s env-var map gains the two new optional
ADO vars (`BUILD_REPOSITORY_URI` set to the resolved git remote URL;
`BUILD_SOURCEVERSION` set to the same zero-SHA placeholder the
hardcoded VCP entry carries) so the -GHAzDO variant stamps a fully
populated ADO env shape and AdoPipelineContext detects no conflict
against the supplied VCP. The same map also scrubs
`GITHUB_ACTIONS` / `GITHUB_SERVER_URL` / `GITHUB_REPOSITORY` /
`GITHUB_SHA` / `GITHUB_REF_NAME` / `GITHUB_REF` so that ambient
GitHub Actions env on the macOS CI runner (which sets a real
`GITHUB_SHA`) cannot trip GitHubActionsContext into reporting a
revisionId that conflicts with the zero-SHA placeholder. Without this
scrubbing, both `CweSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput`
and `CweGHAzDoSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput`
break under macos-latest. The same property is now gated by
`CweGHAzDoSample_RegenerationSucceeds_WhenAmbientGitHubActionsEnvVarsConflict`,
the GHA-side parallel of the existing ambient-ADO regression test.

Closes #2957.

Tests: 7 new EmitInitRunCommandTests covering GHA-only stamping, ADO+GHA
agreement, the gap-fill path, cross-source disagreement on each field,
GHA partial-env refusal, and producer-supplied conflicts under GHA. 11
new GitHubActionsContextTests covering detection states, malformed
inputs, REF_NAME vs REF precedence, and URI normalization. 1 new
CweGeneratedSampleTests fact gating ambient-GHA-env isolation. The
original 11 ADO VCP EmitInitRunCommandTests and 9 AdoPipelineContextTests
still green. 263/264 in Test.UnitTests.Sarif.Multitool.Library and
4/4 CweGeneratedSampleTests pass locally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory v5.0.2 section with nuget links and bump VersionPrefix (#2960)

Promotes the v5.0.2 work to ship-ready:

- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2,
  `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links
  for Sdk / Driver / Converters / Multitool / Multitool Library
  (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended
  Sarif.Multitool minimum from 5.0.1 to 5.0.2. v5.0.2 is the first
  release where `emit-init-run` enriches versionControlProvenance
  from the CI pipeline environment (Azure DevOps + GitHub Actions),
  which the skill's required commit-sha / branch / repo-uri inputs
  are stamped from automatically.

Six v5.0.2 bullets ship:
* BRK: scripts/Generate-CweTaxonomy.ps1 -> scripts/generate_cwe_taxonomy.py
  (#2921 / #2950).
* NEW: `emit-init-run` enriches `versionControlProvenance` from
  ADO + GitHub Actions env (#2957 / #2959).
* NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` (#2954 / #2958).
* BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
* BUG: SARIF1012 NRE on unresolved ruleId (#2944 / #2949).
* BUG: SARIF1001 case-fold relaxation for AI notification descriptors
  (#2951 / #2955).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert GHAzDO1021 + short-form branch normalization (false premise) (#2962)

The GHAzDO1021 rule (``ProvideShortBranchNameInVcp``) and the related
short-form branch-name normalization in the ADO and GHA env enrichers
were built on a false premise. The original observation -- that the
GHAzDO/AdvSec ingestion service silently dropped ``run.versionControlProvenance[]``
entries whose ``branch`` started with ``refs/<class>/`` -- turned out
to be job-processing latency misread as validation failure. A product
engineer from the AdvSec team has since confirmed that ingestion
accepts both short (``main``) and long (``refs/heads/main``) shapes,
so there is nothing to warn about and nothing to normalize.

This change reverts the lot before v5.0.2 ships:

* Delete ``GHAzDO1021.ProvideShortBranchNameInVcp`` rule plus its
  four Valid/Invalid Inputs and ExpectedOutputs fixtures, the resx +
  Designer entries, and the ``RuleId`` constant.
* ``AdoPipelineContext``: drop ``s_branchRefPrefixRegex``,
  ``BranchShortName``, and ``NormalizeBranchRef``. ``TryDetect``
  passes ``BUILD_SOURCEBRANCH`` through verbatim; ``BranchRef`` is
  the sole branch property, used directly when stamping VCP.
* ``GitHubActionsContext``: drop the ``GITHUB_REF_NAME`` preference
  entirely (the runner always sets both env vars, so the change is
  invisible in production, but it guarantees the property always carries
  the long form). Rename ``BranchShortName`` -> ``BranchRef``; the new
  ``TryReadOptionalBranchRef`` is a pass-through.
* ``EmitInitRunCommand``: rename ``vcpBranchShortName`` -> ``vcpBranch``;
  ``TryResolveVcpFields`` / ``TryStampVcp`` parameter renames; doc
  comments updated.
* Tests: ``ValidateCommandTests`` drops the two GHAzDO1021_* methods;
  ``AdoPipelineContextTests`` renames the four "BranchShortName_Strips*"
  tests to "Passes*Through" and asserts only ``BranchRef`` (long form);
  ``GitHubActionsContextTests`` switches to ``GITHUB_REF`` setup, drops
  the two RefName-preference tests in favour of three pass-through
  tests; ``EmitInitRunCommandTests`` updates four env setups and six
  assertions to the long form.
* ``CweGenerateSample.ps1`` supplies ``branch = 'refs/heads/main'`` so
  the GHAzDO-variant sample agrees with the env-derived long form on
  cross-source check; ``CweSample.sarif`` and ``CweGHAzDoSample.sarif``
  regenerated.
* ``ReleaseHistory.md`` v5.0.2 UNRELEASED: drop the GHAzDO1021 NEW
  bullet; soften the VCP-enrichment bullet wording to reflect
  pass-through semantics (no short-form derivation).

No version bump: v5.0.2 is unreleased (the dev->main promote PR is
open and blocked); this lands as part of the same release window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* AI fingerprints rule split + authored-region reconciliation + release-note guards (#2966)

* Split AI fingerprints rule into AI1007 (Error) + AI2011 partial (Warning)

The single Note-level AI2011 DoNotPersistFingerprints rule flagged both
result.fingerprints and result.partialFingerprints with one message. Per the
AI rule-id band convention (AI1xxx = MUST/SHALL = Error; AI2xxx = SHOULD =
Warning/Note), persisted fingerprints are a hard MUST-NOT while partial
fingerprints are advisory. Split accordingly:

- AI1007 DoNotPersistFingerprints (Error): result.fingerprints only.
- AI2011 DoNotPersistPartialFingerprints (Warning): result.partialFingerprints only.

RuleId.AIDoNotPersistFingerprints now maps to AI1007; new
RuleId.AIDoNotPersistPartialFingerprints maps to AI2011. Resources, tests,
docs, and ReleaseHistory updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: moniker release-note rule ids + region rationale

- Rewrite the v5.0.3 BRK entry in quoted-moniker form
  (`AI2011.DoNotPersistFingerprints`) and drop implementation detail.
- Extend the fingerprinting Rationale to credit region coordinates and
  region snippets (core + context) as matching signal.
- Add ReleaseHistoryStyleTests.ReleaseHistory_RuleIdsUseQuotedMonikers:
  every rule id must be introduced by its `Id.Name` moniker at least once
  per entry. Remediate all pre-existing debt file-wide with era-correct
  names (ids get renumbered: old AI1007 = ProvideExploitability, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reconcile authored regions via OverwriteExistingData (#1784)

FileRegionsCache previously discarded any divergence between an authored
region coordinate and the value computed from the source text (a no-op
Assert), silently shipping regions that point at the wrong span. This
makes OptionallyEmittedData.OverwriteExistingData govern that
reconciliation:

  * UNSET (default, 'validate me'): a divergent authored coordinate
    (startLine/startColumn/endLine/endColumn/charOffset/charLength), or a
    character span past end-of-file, throws ArgumentException (paramName
    'inputRegion') instead of being trusted blindly.
  * SET ('trust me / recompute'): the divergent authored coordinate is
    overwritten with the computed value. This also makes the flag honest:
    previously it only affected snippets/hashes/contents and never touched
    region coordinates.

Absent (0) coordinates are still computed and filled exactly as before, so
callers that omit coordinates are unaffected.
FileRegionsCache.PopulateTextRegionProperties gains an overwriteExistingData
parameter (default false); InsertOptionalDataVisitor threads the existing
OverwriteExistingData flag into it. emit-finalize does not set
OverwriteExistingData, so it validates authored regions for free.
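The reconciliation rules read as a small decision table per coordinate. A hypothetical Python sketch over plain dicts: `ValueError` stands in for C#'s `ArgumentException`, the past-end-of-file span check is omitted, and "absent" follows the commit's 0-means-unset convention:

```python
from typing import Dict

COORDINATES = ("startLine", "startColumn", "endLine", "endColumn", "charOffset", "charLength")


def reconcile_region(authored: Dict[str, int], computed: Dict[str, int],
                     overwrite_existing_data: bool) -> Dict[str, int]:
    """Reconcile authored region coordinates against values computed from source."""
    result = dict(authored)
    for key in COORDINATES:
        authored_value = authored.get(key, 0)
        computed_value = computed[key]
        if authored_value == 0:
            # Absent: always filled, regardless of the flag.
            result[key] = computed_value
        elif authored_value != computed_value:
            if not overwrite_existing_data:
                # UNSET ("validate me"): a divergent coordinate is an error.
                raise ValueError(
                    f"inputRegion: authored {key}={authored_value}, computed {computed_value}")
            # SET ("trust me / recompute"): the computed value wins.
            result[key] = computed_value
    return result
```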

Closes #1784

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add SDK-owned AI emit-profile input schemas (result + run-header) with C# drift tests (#2967)

Add two Draft 2020-12 JSON input schemas the get-schema verb serves for AI
SARIF producers, under Sarif.Multitool.Library/GetSchema:

* ai-result.schema.json - overlay on the SARIF result that pins the AI
  profile's ruleId grammar (taxonomy sub-id form plus the exclusive NOVEL-
  escape via a load-bearing if/then/else), requires message.markdown and a
  non-empty locations array with region.startLine >= 1, and bounces the
  index/identity state the multitool assigns at finalize.
* ai-run-header.schema.json - overlay on the SARIF run that requires
  tool.driver.name, a non-empty versionControlProvenance, and
  properties[ai/origin], canonicalizes URI schemes and GUIDs, and bounces
  the header fields the replayer ignores.

The schemas carry only concise, consumer-facing description annotations; the
xUnit drift tests (AIInputSchemaDriftTests, one Fact per contract clause) are
the single source of truth pinning each schema verdict to its C# rule. The
tests validate with JsonSchema.Net (added to Directory.Packages.props) because
the in-repo Microsoft.Json.Schema validator predates Draft 2020-12 and would
silently no-op the load-bearing if/then/else routing.

Also remove AI1011.RedactedRunMarker (the rule, its RuleId constant, resource
strings, tests, and doc rows) - it is out of scope for the AI emit profile.
Recorded as a BRK entry under v5.0.3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add AI emit-profile input schemas for the invocation and reporting-descriptor verbs (#2968)

* Add AI emit-profile input schemas for notification, invocation, and descriptor verbs

Author SDK-owned Draft 2020-12 input schemas for the remaining AI emit
verbs, each pinned to its C# receipt contract by an xUnit drift fact:

- ai-notification.schema.json / ai-invocation.schema.json: any JSON object
  (the verbs validate only "payload is an object" at receipt; richer
  cross-document checks run at emit-finalize --validate).
- ai-reporting-descriptor.schema.json: object requiring a non-empty
  string id (minLength 1 matches the verb's IsNullOrEmpty gate; a
  whitespace-only id is accepted).
- ai-rule-descriptor.schema.json: the --rules path, additionally gating id
  on the NOVEL- prefix (IsNovel), prefix-only by design.

All four are standalone (no SARIF base $ref) because the verbs deliberately
do not enforce the base required fields (notification.message,
invocation.executionSuccessful); $ref'ing the base would over-constrain.

Wire the four JSONs into the test csproj copy ItemGroup; add four drift
facts (16 total, green). Add a NEW ReleaseHistory entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor AI reporting-descriptor schema into base + novel overlay

Collapse the two add-reporting-descriptor input schemas to a single base
contract plus a minimal overlay, matching the verb's actual shape: there
is ONE reportingDescriptor object; the --rules flag only chooses its
storage target (rules[] vs notifications[]).

- ai-reporting-descriptor.schema.json: the general descriptor contract
  (object + non-empty string id, mirroring the verb's IsNullOrEmpty gate).
- ai-novel-rule-descriptor.schema.json: a minimal overlay that $refs the
  base and adds the single --rules-path tightening: id must start with
  NOVEL- (AIRuleIdConvention.IsNovel, prefix-only). Composition reads "a
  novel rule descriptor IS a reporting descriptor whose id starts with
  NOVEL-", keeping one source of truth for the descriptor shape.
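The base-plus-overlay composition maps onto two predicates, the second calling the first the way the overlay `$ref`s the base. A hypothetical Python rendering of the two contracts (the schemas themselves are JSON; these function names are illustrative only):

```python
from typing import Any


def is_reporting_descriptor(payload: Any) -> bool:
    """Base contract: a JSON object whose id is a non-empty string.

    Like minLength 1 (and IsNullOrEmpty), a whitespace-only id passes.
    """
    return (isinstance(payload, dict)
            and isinstance(payload.get("id"), str)
            and len(payload["id"]) > 0)


def is_novel_rule_descriptor(payload: Any) -> bool:
    """Overlay for the --rules path: the base contract plus the NOVEL- prefix."""
    return is_reporting_descriptor(payload) and payload["id"].startswith("NOVEL-")
```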

Removes the earlier ai-rule-descriptor.schema.json. Drift tests stay at
16 facts; the overlay's inherited non-empty-id rejects confirm its $ref
resolves at runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim v5.0.3 NEW entry under the 300-char terseness cap

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove add-notification; notifications travel inline on atomic add-invocation

SARIF has no run-level notifications array, so a flat serial event log cannot
route a streamed notification to one of several parallel invocations. Make
add-invocation the sole carrier of notifications: the producer holds per-process
state and emits ONE fully-formed invocation (with inline
toolExecutionNotifications / toolConfigurationNotifications) when the process
finishes. Parallelism is then free and the replayer is deterministic.

- Remove add-notification verb + AddNotificationCommand/Options + schema/tests.
- Remove execution-notification / configuration-notification event kinds; drop
  the replayer notification buffers, synthetic-invocation attach, and clock reads.
- add-invocation now requires executionSuccessful (bool) + non-whitespace
  commandLine, and auto-stamps endTimeUtc + any unset inline notification timeUtc
  using a single wall-clock now (producer values preserved).
- Enrich ai-invocation.schema.json to a rich overlay; update drift tests.
- Migrate CweGenerateSample.ps1 to emit the notification inline on an
  add-invocation payload; regenerate CweSample.sarif / CweGHAzDoSample.sarif.
- Update doc-comments, CweReadme.md, and ReleaseHistory.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix ReleaseHistory terseness gate; apply PR review trims

- ReleaseHistory.md: trim the add-notification-removal BRK entry under the
  300-char terseness gate (ReleaseHistoryStyleTests was red on all 3 platforms);
  reduce the NEW schema entry to the terse one-sentence form and drop the
  removed add-notification verb from it (PR review).
- ai-reporting-descriptor.schema.json: trim the description and drop the NOVEL-
  rule sentence (that gate lives in the ai-novel-rule-descriptor overlay) (PR review).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Require workingDirectory on the AI invocation profile

Per PR review: the add-invocation overlay now requires a workingDirectory
artifactLocation (with a non-whitespace uri), in addition to executionSuccessful
and commandLine. The AI producer knows its working directory and it anchors the
relative paths in the scan, so the profile surfaces and requires it even though
SARIF makes it optional. endTimeUtc stays auto-stamped (not required); we keep
startTimeUtc/exitCode permitted-but-optional via the base $ref.

- ai-invocation.schema.json: add workingDirectory to required + a property
  override requiring a non-empty uri; trim the description.
- AddInvocationCommand: extend the receipt gate to reject a missing/empty
  workingDirectory.uri; update doc-comment.
- AIInputSchemaDriftTests + AddInvocationCommandTests: pin the new accept/reject
  cases (missing-workingDirectory, workingDirectory-without-uri, empty uri).
- CweGenerateSample.ps1 + regenerated CweSample/CweGHAzDoSample fixtures: the
  sample invocation now carries workingDirectory.
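
The receipt gate described above can be approximated in Python as follows. `invocation_receipt_errors` is a hypothetical name, and this is only a sketch of the rules stated here (boolean executionSuccessful, non-whitespace commandLine, non-whitespace workingDirectory.uri), not the C# `AddInvocationCommand` implementation.

```python
from typing import Any, Dict, List


def invocation_receipt_errors(invocation: Dict[str, Any]) -> List[str]:
    """Hypothetical mirror of the add-invocation receipt gate."""
    errors: List[str] = []

    # executionSuccessful must be a real JSON boolean, not 0/1 or "true".
    if not isinstance(invocation.get("executionSuccessful"), bool):
        errors.append("executionSuccessful must be a boolean")

    command_line = invocation.get("commandLine")
    if not isinstance(command_line, str) or not command_line.strip():
        errors.append("commandLine must contain a non-whitespace character")

    # workingDirectory is an artifactLocation; only its uri is required here.
    working_directory = invocation.get("workingDirectory")
    uri = working_directory.get("uri") if isinstance(working_directory, dict) else None
    if not isinstance(uri, str) or not uri.strip():
        errors.append("workingDirectory.uri must contain a non-whitespace character")

    return errors
```

The three workingDirectory cases pinned by the tests (missing, present without uri, empty uri) all fall into the same final branch.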

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make notification timeUtc producer-owned; fix invocation workingDirectory artifact handling

Notification timestamps are now producer-owned and required: the add-invocation
verb never auto-stamps notification timeUtc. It is required by the input schema
overlay and enforced by the C# receipt gate. Only endTimeUtc is auto-stamped,
because the receipt time approximates the process end.
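
A minimal Python sketch of the producer-owned timeUtc rule follows: rather than stamping a missing notification time, the gate reports where it is missing. The function name and the path format it returns are illustrative assumptions.

```python
from typing import Any, Dict, List

NOTIFICATION_ARRAYS = ("toolExecutionNotifications", "toolConfigurationNotifications")


def missing_notification_times(invocation: Dict[str, Any]) -> List[str]:
    """Return paths such as 'toolExecutionNotifications[0].timeUtc' that lack a value."""
    missing: List[str] = []
    for array_name in NOTIFICATION_ARRAYS:
        for index, notification in enumerate(invocation.get(array_name) or []):
            time_utc = notification.get("timeUtc")
            # Producer-owned: a missing or blank value is rejected, never stamped.
            if not isinstance(time_utc, str) or not time_utc.strip():
                missing.append(f"{array_name}[{index}].timeUtc")
    return missing
```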

Sample now carries a real workingDirectory ({uri, uriBaseId}) plus an
arguments[] array to exercise rich invocation structure.

Two directory-related bugs fixed (the second was masked by the first):
- FileRegionsCache.GetHashData returns null (not the sha-256 of the empty
  string) for a path resolving to a directory or missing file.
- AddFileReferencesVisitor no longer promotes an invocation's workingDirectory
  into run.artifacts -- it is process context, not a scanned artifact. This
  removes a location-only artifact (SARIF2004) that surfaced once the bogus
  empty hash was gone.
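
The GetHashData fix amounts to hashing only regular files. Here is a Python sketch of that behavior, assuming a hypothetical `get_sha256_or_none` helper rather than the actual `FileRegionsCache` code:

```python
import hashlib
import os
from typing import Optional


def get_sha256_or_none(path: str) -> Optional[str]:
    """Hash a regular file; return None for a directory or a missing path."""
    # Without this guard a caller could record the digest of zero bytes
    # (e3b0c442...) for something that is not a readable file.
    if not os.path.isfile(path):
        return None
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Returning None instead of the empty-string digest keeps an artifacts-table entry from carrying a hash that describes nothing.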

Regenerated CweSample.sarif and CweGHAzDoSample.sarif: 0 errors/warnings/notes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten AI ruleId grammar to CWE-only with lowercase-kebab sub-ids

Per maintainer decision, AI result.ruleId is now CWE-only:
  CWE-<number>/<sub-id>  or  NOVEL-<sub-id>
where <sub-id> is lowercase-alphanumeric kebab (single hyphens, no
leading/trailing/consecutive hyphen). CVE, OWASP, and mixed-case
sub-ids are no longer accepted.
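
The grammar can be expressed as a single regular expression. The Python sketch below is illustrative, not the pattern shipped in ai-result.schema.json. It assumes the CWE number is any run of ASCII digits. The two alternatives start with disjoint literal prefixes, which is why a plain anyOf routes exactly like the earlier if/then/else.

```python
import re

# lowercase alphanumeric kebab: single hyphens, none leading/trailing/consecutive
SUB_ID = r"[a-z0-9]+(?:-[a-z0-9]+)*"
AI_RULE_ID = re.compile(rf"(?:CWE-[0-9]+/{SUB_ID}|NOVEL-{SUB_ID})")


def is_acceptable_rule_id(rule_id: str) -> bool:
    # fullmatch: the entire ruleId must fit one alternative.
    return AI_RULE_ID.fullmatch(rule_id) is not None
```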

Because CWE- and NOVEL- prefixes are now disjoint, the result schema's
if/then/else ruleId routing is exactly equivalent to a plain anyOf, so
it is flattened and the now-false IsNotFlattenableToAnyOf drift test is
removed. The descriptor authoring gate stays prefix-only by design.

Surfaces synced: AIRuleIdConvention(.Exception), ai-result.schema.json,
convention + drift tests (case-for-case), docs, stale prose, and a
breaking ReleaseHistory entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: tighten novel-descriptor gate, scrub stale prose

Resolve the unresolved review threads on the AI emit-schemas work:

- Tighten the add-reporting-descriptor --rules gate from a NOVEL- prefix
  check to the full novel grammar (IsNovel + IsAcceptable), so a rule
  descriptor id is byte-identical to the result ruleId that references it.
  Mirror the tightening in ai-novel-rule-descriptor.schema.json and in the
  drift / command tests.
- Emit "true"/"false" via a ternary instead of ToString().ToLowerInvariant().
- Drop redundant minLength:1 where paired with pattern \S in
  ai-invocation.schema.json; correct the workingDirectory uri prose to
  non-whitespace.
- Add the section 3.49.3 spec ref to the novel-descriptor id description.
- Scrub historical add-notification removal narration from
  CweGenerateSample.ps1 and the AddInvocationCommand / SarifEventKinds doc
  comments; restate as positive invariants.
- Remove an obvious one-line comment in SarifEventReplayer.
- Add a BRK ReleaseHistory bullet for the gate tightening.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify emit doc comments to describe current design

Rewrite the invocation/notification doc comments and inline comments across
AddInvocationCommand, SarifEventKinds, SarifEventReplayer, and
AddReportingDescriptorCommand to state the current, proper design plainly.
Remove reverse-mirror justifications (why notifications live inline rather
than as a standalone event, why an object carries its properties) and the
process-completion-proxy/wall-of-text framing; keep only the load-bearing
invariants a reader needs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Clarify emit/AI comments and docs; scrub stale add-notification references

Editorial pass over the AI emit-profile surface. The changes are
comment- and documentation-only: no program behavior, output strings,
or public signatures change.

Comments:
- Strip defensive, reverse-mirror, and over-descriptive prose from the
  Emit verb/option doc comments and SarifEventKinds / SarifEventReplayer
  so each comment plainly states the current design rather than
  narrating obvious code, restating what was just said, or justifying
  why a past mistake was fixed.
- Remove complement-stating clauses (what isn't accepted) from the AI
  rule-id convention docs once the positive rule is stated.
- Preserve load-bearing intent: the add-invocation receipt contract
  (producer-supplied timeUtc on inline notifications, auto-stamped
  endTimeUtc), the FileRegionsCache reconciliation pointer, and the
  emit-init-run ADO-seeds / GHA-fills-gaps VCP precedence rule.

Docs and skills:
- Drop the removed add-notification verb from multitool-usage.md and
  rewrite the routing guidance in generating-sarif.md and the
  emit/validate skills to the current model: notifications travel
  inline on the add-invocation payload, with placement selected
  structurally by the toolExecutionNotifications /
  toolConfigurationNotifications array.

Other:
- Normalize CWE placeholder ids to CWE-NNNN (four digits is the catalog
  maximum, CWE-1434) wherever a placeholder, not a concrete example,
  is meant.
- Add .github/prompts/comment-editor.prompt.md, a reusable adversarial
  comment-editor prompt codifying this editorial standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Co-locate endTimeUtc SARIF formatting in StampEndTimeUtcIfOmitted

StampEndTimeUtcIfOmitted now takes the receipt DateTime and performs the
SARIF date-string formatting itself, rather than receiving a
pre-formatted string from Run(). The format construction lives next to
the stamp, so the SARIF-friendly representation is verifiable at the
point of use. Output is unchanged.

The producer-supplied endTimeUtc remains a pass-through string: payloads
are read with DateParseHandling.None to preserve ISO-8601 text exactly,
so only the value the verb itself authors is formatted here.
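
The same separation can be sketched in Python. The UTC, millisecond-precision, trailing-"Z" shape is an assumption about what "SARIF-friendly" means here, and both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def format_sarif_date_time(moment: datetime) -> str:
    # Assumed shape: UTC, millisecond precision, trailing "Z".
    if moment.tzinfo is None:
        raise ValueError("receipt time must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def stamp_end_time_utc_if_omitted(invocation: Dict[str, Any], receipt: datetime) -> None:
    # A producer-supplied string passes through untouched; only a value the
    # verb authors itself is formatted here.
    if invocation.get("endTimeUtc") is None:
        invocation["endTimeUtc"] = format_sarif_date_time(receipt)
```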

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim over-length add-reporting-descriptor release note under 300 chars

ReleaseHistory_CurrentSectionEntriesAreTerse flagged the
add-reporting-descriptor --rules bullet at 389 chars. Elide the
rejection-mechanics list (bare NOVEL-, slashes, uppercase tails,
trailing hyphens) — that detail lives in the API docs and PR — and keep
the consumer-facing essentials: the full NOVEL- grammar gate and that
the descriptor id is byte-identical to the ruleId that references it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Correct workingDirectory artifacts rationale: no added data, not role

The prior wording justified keeping an invocation's workingDirectory out
of run.artifacts on the grounds that it is "process context, not a
scanned artifact." That is wrong: SARIF defines a directory artifact
role (ArtifactRoles.Directory, 3.24.6), so a directory is a legitimate
artifact. The real reason is that the location is bare -- no hashes,
contents, or length -- so an artifacts-table entry would carry nothing,
which is equally true for process context and scanned artifacts. Reword
the code comment and the release note accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim add-reporting-descriptor note to the grammar-gate essential

Drop the byte-identical-ruleId follow-on; the grammar-gate change is the
consumer-facing fact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add get-schema verb and align AI emit verb/schema names (#2971)

* Add get-schema verb and align AI emit verb/schema names

Introduce `multitool get-schema <verb>`, which streams the embedded JSON
Schema that validates a named emit verb's input to stdout or `--output`,
with `--list` to enumerate the servable verbs. The schemas served are the
same bytes the emit verbs validate against, so a producer can fetch the
contract for the exact verb it is about to call.

Bring the AI emit surface into verb<->schema naming parity:

- Rename `emit-init-run` to `emit-run` (schema `ai-run`).
- Split `add-reporting-descriptor` into `add-notification-reporting-descriptor`
  and `add-rule-reporting-descriptor`, removing the `--rules` flag; the verb
  now selects the target array and id gate. Shared body lives in the internal
  `ReportingDescriptorEmitter`.
- Rename the input schemas to mirror their verbs: `ai-run`, `ai-result`,
  `ai-invocation`, `ai-notification-reporting-descriptor`,
  `ai-rule-reporting-descriptor`. The rule schema is now standalone (no
  cross-schema `allOf`).

`emit-finalize` is reserved in the get-schema catalog with no schema this
release; its whole-log `ai-log` schema is the fast-follow tracked in #2970.

Tests: a new in-sync test pins every catalog verb to bytes identical across
the embedded resource, the on-disk `GetSchema/` file, and what the verb
writes, and asserts catalog<->embedded-manifest parity so the surface cannot
silently drift. The drift and descriptor tests are updated for the renames
and split. Cwe sample fixtures regenerate byte-identical.
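
The catalog and its parity check can be modeled in Python as follows. The verb-to-schema pairs come from this commit message. The `.schema.json` resource naming and both helper names are assumptions. The verb that consumes `ai-result` is not named here, so it is omitted from the sketch.

```python
from typing import Dict, List, Optional, Set

# Verb -> input-schema stem. emit-finalize is reserved with no schema this release.
SCHEMA_CATALOG: Dict[str, Optional[str]] = {
    "emit-run": "ai-run",
    "add-invocation": "ai-invocation",
    "add-notification-reporting-descriptor": "ai-notification-reporting-descriptor",
    "add-rule-reporting-descriptor": "ai-rule-reporting-descriptor",
    "emit-finalize": None,
}


def servable_verbs() -> List[str]:
    """Verbs that currently have a schema to serve."""
    return sorted(verb for verb, stem in SCHEMA_CATALOG.items() if stem is not None)


def catalog_manifest_drift(embedded_resources: Set[str]) -> Set[str]:
    """Names present on only one side of catalog <-> embedded manifest."""
    expected = {f"{stem}.schema.json" for stem in SCHEMA_CATALOG.values() if stem is not None}
    return expected ^ embedded_resources
```

An empty symmetric difference is the "cannot silently drift" condition: a schema added on either side without the other shows up immediately.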

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #2971 review: emit options base, SRCROOT disk check, verb doc-comment wording, SKILL descriptor verbs

- Factor a shared EmitInputOptionsBase (OutputFilePath + InputFilePath) and
  inherit it from the five emit verb option classes; EmitRunOptions keeps only
  --force-overwrite.
- emit-run now rejects a file: SRCROOT whose path does not resolve to an existing
  directory on disk at receipt, so finalize can enrich against an observable
  checkout. Adds passing/failing receipt tests.
- Correct the command doc-comment summaries to 'Implements <verb>' (the dotnet
  tool command is 'sarif', not 'multitool').
- SKILL.md: reference add-notification-reporting-descriptor and
  add-rule-reporting-descriptor in the verb list and add a descriptor step;
  note the file: SRCROOT disk-existence requirement.
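
The receipt-time SRCROOT check reduces to converting a file: URI to a local path and testing for a directory. Here is a Python sketch with a hypothetical `file_srcroot_error` helper; UNC hosts are deliberately out of scope.

```python
import os
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_srcroot_error(srcroot_uri: str) -> Optional[str]:
    """Return an error message if a file: SRCROOT is not an existing directory."""
    parsed = urlparse(srcroot_uri)
    if parsed.scheme != "file":
        return None  # only file: roots are checked against the local disk
    # Sketch limitation: a UNC host in parsed.netloc is not handled here.
    local_path = url2pathname(parsed.path)
    if not os.path.isdir(local_path):
        return "SRCROOT does not resolve to an existing directory on disk"
    return None
```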

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Derive portable artifact roots from versionControlProvenance at emit-finalize (#2973)

* Derive portable artifact roots from versionControlProvenance at emit-finalize

Replace the caller-supplied `--srcroot` rewrite with an EmitFinalizeRebaseVisitor
that deconstructs absolute local file paths into relative URIs plus portable,
per-repository uriBaseIds anchored at GitHub blob permalinks derived from
run.versionControlProvenance. The shipped SARIF carries no machine-specific path.
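
Two pieces of this can be sketched in Python: deriving the GitHub blob-permalink root from repositoryUri plus revisionId, and rebasing an absolute local path onto its declared root. The helper names are hypothetical. Reading "dotted" names as "." and ".." is an interpretation, and path handling assumes POSIX-style paths. Error messages never echo the URI, so a credential in it cannot leak.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import quote, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _checked_segment(segment: str) -> str:
    # One shared guard: reject empty or dots-only names ("." or "..").
    if not segment or set(segment) == {"."}:
        raise ValueError("repositoryUri has an empty or dotted owner/repo segment")
    return segment


def github_blob_root(repository_uri: str, revision_id: str) -> str:
    """Derive https://github.com/<owner>/<repo>/blob/<revisionId>/ (sketch)."""
    parts = urlsplit(repository_uri)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError("repositoryUri must be an absolute http(s) URI")
    if parts.username or parts.password or parts.query or parts.fragment:
        raise ValueError("repositoryUri must not carry credentials, a query, or a fragment")
    if parts.port not in (None, DEFAULT_PORTS[parts.scheme]):
        raise ValueError("repositoryUri must use the host's default port")
    if parts.hostname != "github.com":
        raise ValueError("only github.com repositories are supported in this sketch")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError("expected a github.com/<owner>/<repo> path")
    owner, repo = segments
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    owner, repo = _checked_segment(owner), _checked_segment(repo)
    if not revision_id.strip():
        raise ValueError("revisionId must be non-empty")
    return f"https://github.com/{owner}/{repo}/blob/{revision_id}/"


def rebase_local_path(local_path: str, local_root: str) -> Optional[str]:
    """Return local_path relative to local_root as a URI reference, or None."""
    path, root = PurePosixPath(local_path), PurePosixPath(local_root)
    try:
        relative = path.relative_to(root)
    except ValueError:
        return None  # no declared root resolves this path: finalize must fail
    if ".." in relative.parts:
        return None  # lexical containment only; refuse to escape the root
    return quote(relative.as_posix())
```

The rebased result pairs the relative URI with the root's uriBaseId, and the base ID's originalUriBaseIds entry points at the derived permalink rather than at the local file:// path.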

emit-finalize now hard-requires each run to declare at least one
versionControlProvenance entry whose mappedTo.uriBaseId (and only a uriBaseId)
binds a declared originalUriBaseIds root with an absolute http(s) repositoryUri
and a non-empty revisionId. A single repository collapses to the bare SRCROOT
base; multiple repositories each receive SRCROOT_<REPO-LEAF>, disambiguated by an
ordinal suffix on collision. Portable derivation is github.com-only in this
release (default port; owner/repo path); other hosts fail with a clear, planned
follow-up message. A local path that no declared root resolves fails finalize
rather than shipping a machine-specific path. Rebasing runs after enrichment
reads sources from the local file:// bases and before serialization; on failure
the verb writes each…
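
The base-ID naming rule in this commit message (a bare SRCROOT for a single repository, otherwise SRCROOT_<REPO-LEAF> with an ordinal suffix on collision) can be sketched in Python. The upper-casing, the replacement of other characters with "_", and the "_2", "_3" suffix format are all assumptions, as are the helper names.

```python
import re
from typing import List
from urllib.parse import urlsplit


def repo_leaf(repository_uri: str) -> str:
    """Last path segment of a repository URI, minus any .git suffix."""
    leaf = urlsplit(repository_uri).path.rstrip("/").rsplit("/", 1)[-1]
    return leaf[: -len(".git")] if leaf.endswith(".git") else leaf


def assign_base_ids(repository_uris: List[str]) -> List[str]:
    """Assign one uriBaseId per versionControlProvenance entry, in order."""
    if len(repository_uris) == 1:
        return ["SRCROOT"]  # a single repository collapses to the bare base
    assigned: List[str] = []
    used = set()
    for uri in repository_uris:
        # Assumed normalization: upper-case; characters outside [A-Z0-9_] become "_".
        stem = "SRCROOT_" + re.sub(r"[^A-Z0-9_]", "_", repo_leaf(uri).upper())
        candidate, ordinal = stem, 1
        while candidate in used:  # collision: try _2, _3, ...
            ordinal += 1
            candidate = f"{stem}_{ordinal}"
        used.add(candidate)
        assigned.append(candidate)
    return assigned
```

Checking candidates against the set of names already used, instead of only counting stems, also avoids a clash when a repository's leaf happens to end in an ordinal-like suffix.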