fix(canonical-family): migrate-uppercase tool + preflight guard for stale lowercase rows #131
Closed
sroussey wants to merge 2 commits into
…amily canonical rows

The June 4-8 lowercase-fold window in normalizeFamilyName left rows whose normalized_name is lowercase. The subsequent revert to UPPER means the resolver's exact-match lookup (findByResolverAndName) misses them on every ingest, silently double-minting canonical family ids and orphaning identity links, memberships, and operator-installed aliases on the lowercase row.

`sec canonical sponsor-family migrate-uppercase` and the matching `underwriter-family` command UPPER-fold every offender in a single transaction. Dry-run by default; --apply writes. Scope to one resolver_version with --resolver-version.

Concurrency: SQLite uses BEGIN EXCLUSIVE so the writer lock is held from the planning SELECT through the UPDATE and the residual-row recount. Postgres uses SERIALIZABLE for the same guarantee. The repository fallback (InMemoryTabularStorage under TestingDI) mirrors the algorithm shape so tests exercise the same find / collision / apply logic.

Collisions: when a lowercase row already has a same-resolver-version UPPER sibling, the UPPER-fold would violate the (resolver_version, normalized_name) natural key. We refuse to write, dump the lower_id, upper_id, normalized name, plus membership / link / alias counts as TSV on stderr, and exit 2 so an operator can merge them via `canonical <kind> alias` before retrying.
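The find / collision / apply shape described in this commit message can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `FamilyRow` shape and the `planUppercaseMigration` and `applyPlan` names are hypothetical, and the transaction handling, dependent-row counts, and TSV output are left out.

```typescript
// Hypothetical row shape: only the columns named in the commit message.
interface FamilyRow {
  id: number;
  resolverVersion: string;
  normalizedName: string;
}

interface Collision {
  lowerId: number;
  upperId: number;
  normalizedName: string;
}

interface MigrationPlan {
  updates: { id: number; from: string; to: string }[];
  collisions: Collision[];
}

// Natural key (resolver_version, normalized_name) flattened into one string.
const naturalKey = (version: string, name: string): string => `${version}\u0000${name}`;

// Plan the UPPER-fold without writing anything (the dry-run half).
function planUppercaseMigration(
  rows: readonly FamilyRow[],
  resolverVersion?: string, // optional scope, like --resolver-version
): MigrationPlan {
  const inScope = rows.filter(
    (r) => resolverVersion === undefined || r.resolverVersion === resolverVersion,
  );

  // Index the rows that are already UPPER by their natural key.
  const upperByKey = new Map<string, FamilyRow>();
  for (const r of inScope) {
    if (r.normalizedName === r.normalizedName.toUpperCase()) {
      upperByKey.set(naturalKey(r.resolverVersion, r.normalizedName), r);
    }
  }

  const plan: MigrationPlan = { updates: [], collisions: [] };
  for (const r of inScope) {
    const folded = r.normalizedName.toUpperCase();
    if (folded === r.normalizedName) continue; // already UPPER, nothing to do
    const key = naturalKey(r.resolverVersion, folded);
    const sibling = upperByKey.get(key);
    if (sibling) {
      // Folding this row would duplicate the natural key.
      plan.collisions.push({ lowerId: r.id, upperId: sibling.id, normalizedName: folded });
    } else {
      plan.updates.push({ id: r.id, from: r.normalizedName, to: folded });
      // Assumption: a later row folding to the same key is also reported as a collision.
      upperByKey.set(key, r);
    }
  }
  return plan;
}

// Apply the plan; returns a process-style exit code (2 = refused on collisions).
function applyPlan(rows: FamilyRow[], plan: MigrationPlan, apply: boolean): number {
  if (plan.collisions.length > 0) return 2; // refuse to write anything
  if (!apply) return 0; // dry-run: report only
  const byId = new Map(rows.map((r) => [r.id, r] as const));
  for (const u of plan.updates) {
    const row = byId.get(u.id);
    if (row) row.normalizedName = u.to;
  }
  return 0;
}
```

In a real backend the plan and the apply step would have to run inside the same exclusive or serializable transaction, as the commit message describes, so that a concurrent ingest cannot insert a new row between the two.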
…ical-family rows

Adds a cheap two-COUNT preflight that runs at the tail of the program's preAction hook, after DI is fully initialised. The check classifies the target subcommand into one of three buckets:

* Safety-critical (`resolve`, `canonical … alias`, `spac by-family`, `underwriter by-family`): paths where family resolution round-trip correctness is load-bearing. If any lowercase rows remain in canonical_sponsor_family / canonical_underwriter_family, the guard throws SecCliConfigurationError so the command never runs against a silently-double-minting DB. SecCliConfigurationError already has a quiet exit path in src/sec.ts (no stack trace).
* Exempt (`init`, the `db` subcommand group, and `migrate-uppercase` itself): chicken-and-egg, since these are the commands needed to set up or repair the DB.
* Everything else: console.warn (yellow when --color is on) with the same body, no exit. The operator sees the warning every invocation until the migration is applied.

`--allow-stale` is added to the global options as an explicit override for the safety-critical throw. A read failure (e.g. tables not yet created) is treated as "no stale rows" so a first-run `db setup` is not blocked by a self-check that has nothing to read.
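The three-bucket classification can be sketched as a pure function over the subcommand path. This is a hedged illustration rather than the PR's implementation: `classifyCommand`, `evaluateGuard`, `safeCount`, and the idea of passing the command path as an array of tokens are all assumptions made for the example, and the behaviour of `--allow-stale` (downgrading the throw to a warning) is a guess that the commit message does not confirm.

```typescript
type GuardBucket = "safety-critical" | "exempt" | "warn";

// Classify a subcommand path such as ["canonical", "sponsor-family", "alias"].
// The path-as-tokens representation is an assumption for this sketch.
function classifyCommand(path: readonly string[]): GuardBucket {
  const first = path[0];
  const last = path[path.length - 1];
  // Exempt: commands needed to set up or repair the DB.
  if (first === "init" || first === "db" || last === "migrate-uppercase") return "exempt";
  // Safety-critical: family resolution must round-trip correctly.
  if (first === "resolve") return "safety-critical";
  if (first === "canonical" && last === "alias") return "safety-critical";
  if ((first === "spac" || first === "underwriter") && path[1] === "by-family") {
    return "safety-critical";
  }
  return "warn";
}

type GuardOutcome =
  | { action: "run" }
  | { action: "warn"; message: string }
  | { action: "throw"; message: string };

// Decide what the preflight should do given the stale-row count.
function evaluateGuard(
  path: readonly string[],
  staleCount: number,
  allowStale: boolean,
): GuardOutcome {
  if (staleCount === 0) return { action: "run" };
  const bucket = classifyCommand(path);
  if (bucket === "exempt") return { action: "run" };
  const message = `${staleCount} stale lowercase canonical-family row(s); run migrate-uppercase`;
  if (bucket === "safety-critical" && !allowStale) return { action: "throw", message };
  // Assumption: with --allow-stale the throw is downgraded to a warning.
  return { action: "warn", message };
}

// A read failure (e.g. tables not yet created) counts as "no stale rows".
function safeCount(read: () => number): number {
  try {
    return read();
  } catch {
    return 0;
  }
}
```

Keeping the decision separate from the side effects (warning, throwing) makes each bucket easy to cover in a unit test without a database.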
Summary
The June 8 revert to UPPER normalization (commit 330523d) leaves any operator DB with rows minted during the lowercase window (June 4-8 for sponsor-family, entire history for underwriter-family) unreachable by the resolver. Next ingest silently double-mints canonical ids and orphans identity links, memberships, and operator-installed aliases.
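To make the failure mode concrete, here is a small sketch of how an exact-match lookup misses a stale lowercase row and mints a duplicate. The `CanonicalRow` shape, `normalizeUpper`, and `resolveOrMint` are hypothetical stand-ins for illustration only; the real `normalizeFamilyName` and resolver are not shown in this PR description and may differ.

```typescript
// Hypothetical row shape for the illustration.
interface CanonicalRow {
  id: number;
  resolverVersion: string;
  normalizedName: string;
}

// Stand-in for the post-revert normalizer: assumed to be a plain UPPER fold.
const normalizeUpper = (raw: string): string => raw.trim().toUpperCase();

// Exact-match lookup on (resolverVersion, normalizedName); mint a new id on a miss.
function resolveOrMint(rows: CanonicalRow[], resolverVersion: string, rawName: string): number {
  const wanted = normalizeUpper(rawName);
  const hit = rows.find(
    (r) => r.resolverVersion === resolverVersion && r.normalizedName === wanted,
  );
  if (hit) return hit.id;
  const id = rows.reduce((max, r) => Math.max(max, r.id), 0) + 1;
  rows.push({ id, resolverVersion, normalizedName: wanted });
  return id;
}

// A row minted during the lowercase window.
const staleRows: CanonicalRow[] = [
  { id: 1, resolverVersion: "v1", normalizedName: "acme capital" },
];

// The lookup key is now "ACME CAPITAL", so row 1 is never matched and a
// second canonical id is minted for the same family.
const mintedId = resolveOrMint(staleRows, "v1", "Acme Capital");
```

After this call the table holds two rows for the same family, and anything attached to id 1 (links, memberships, aliases) is no longer reachable through resolution.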
This PR adds:
* `sec canonical {sponsor,underwriter}-family migrate-uppercase [--apply]`. Dry-run by default. SQLite `BEGIN EXCLUSIVE` / Postgres `SERIALIZABLE` for concurrent-ingest safety. Refuses (exit 2) when collisions would force a merge; prints dependent-row counts so the operator can run `sec canonical … alias` first.
* Preflight guard for stale lowercase rows: `console.warn` for most subcommands; throws `SecCliConfigurationError` on safety-critical paths (`resolve`, `canonical … alias`, `… by-family`) unless `--allow-stale` is passed.

Test plan
* `bun test src/commands/canonicalFamilyMigrate.test.ts`: seed lowercase, `--apply`, assert UPPER; seed collision, assert exit 2 + no writes; dry-run; `--resolver-version` scope.
* `bun test src/cli/StaleCanonicalGuard.test.ts`: warn path, throw path, `--allow-stale` bypass.
* `bun scripts/test.ts`: full suite green.

Out of scope
canonical_*_family.normalized_name.

Generated by Claude Code