Add offering terms, underwriters, and use-of-proceeds extraction #128
Merged
…e Sponsor section
Also updates the existing storage e2e dead-letter assertion to account for the three new SECTION_NOT_FOUND entries (offering-terms / underwriters / use-of-proceeds) produced for fixtures lacking those headings. https://claude.ai/code/session_01X8jhKh5Tnkz3st8bS1scd6
…egistry expectations
The four new segmenter sections (The Offering / Underwriting / Use of Proceeds / The Sponsor) are now legitimately resolved from the real S-1 golden fixtures, and the underwriter-family resolver kind adds a registered component.
…us; harden offering figures
Code review (xhigh) findings:
- dropPrevious, computeResolverCoverage, and 'sec resolve' branched person-vs-else=company, so the family resolver kinds (sponsor-family, underwriter-family) fell through to the company tier: drop-previous would destructively purge company canonical/identity-link data, coverage reported company numbers, and 'sec resolve --kind underwriter-family' wrote mislabeled company links. Add FAMILY_RESOLVER_IDS/isFamilyResolverId and refuse these kinds explicitly.
- IssuerTickerRepo.history now breaks filing_date ties deterministically (primary first, then ticker) so same-date unit/share/warrant symbols order stably across backends.
- Round share/unit counts before writing integer-typed columns so a fractional model figure no longer dead-letters the whole offering section.
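
The fall-through described in the first finding can be sketched as follows. This is a minimal illustration, not the PR's code: FAMILY_RESOLVER_IDS and isFamilyResolverId are named in the commit message, but the contents of RESOLVER_IDS and the tierFor helper are assumptions made for the example.

```typescript
// Hypothetical full list of resolver kinds; the PR only names the two family kinds.
const RESOLVER_IDS = ["person", "company", "sponsor-family", "underwriter-family"] as const;
type ResolverId = (typeof RESOLVER_IDS)[number];

const FAMILY_RESOLVER_IDS = ["sponsor-family", "underwriter-family"] as const;
type FamilyResolverId = (typeof FAMILY_RESOLVER_IDS)[number];

// Type guard: true only for family-tier resolver kinds.
function isFamilyResolverId(id: string): id is FamilyResolverId {
  return (FAMILY_RESOLVER_IDS as readonly string[]).indexOf(id) !== -1;
}

// Hypothetical helper: choose the tier to operate on, refusing family kinds
// up front instead of letting them fall through to the "company" branch.
function tierFor(id: ResolverId): "person" | "company" {
  if (isFamilyResolverId(id)) {
    throw new Error(`resolver kind "${id}" is family-tier and is not supported here`);
  }
  return id === "person" ? "person" : "company";
}
```

Checking the family kinds before the person-vs-else branch means a newly added resolver kind fails loudly rather than silently operating on company data.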
Pull request overview
Extends the S‑1 extractor and storage layer to capture IPO/SPAC deal details (offering terms, tickers, underwriters, and use‑of‑proceeds) and introduces an underwriter-family canonicalization tier parallel to sponsor-family, with shared “family resolver” base logic and consolidated section dead-letter handling.
Changes:
- Added new S‑1 section segmentation + extractors for offering terms, underwriters, and use of proceeds, plus persistence to new storage tables.
- Implemented underwriter-family canonical repos/aliases/memberships/linking and added CLI commands for alias management and issuer lookup by family.
- Refactored sponsor/underwriter family canonicalization into shared base repos/resolver utilities and unified S‑1 section processing via runSection().
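
One way a shared runSection() helper could consolidate dead-letter handling is sketched below. Only the name runSection() and the SECTION_NOT_FOUND code come from this PR; the signature, the DeadLetter shape, the SECTION_FAILED code, and the synchronous style are assumptions made to keep the example small (the real helper is likely asynchronous).

```typescript
// Hypothetical dead-letter record; the real schema is not shown in this PR page.
interface DeadLetter {
  section: string;
  code: string;
  message: string;
}

// Run one section: extract, persist, and record a dead letter on any failure.
// Returns true when the section was persisted.
function runSection<T>(
  section: string,
  text: string | undefined,
  extract: (text: string) => T,
  persist: (value: T) => void,
  deadLetters: DeadLetter[],
): boolean {
  if (text === undefined) {
    // The segmenter found no matching heading for this section.
    deadLetters.push({ section, code: "SECTION_NOT_FOUND", message: `no heading matched for ${section}` });
    return false;
  }
  try {
    persist(extract(text));
    return true;
  } catch (err) {
    // "SECTION_FAILED" is an illustrative code, not one taken from the PR.
    const message = err instanceof Error ? err.message : String(err);
    deadLetters.push({ section, code: "SECTION_FAILED", message });
    return false;
  }
}
```

With this shape, each section's failure is isolated: a missing heading or a throwing extractor adds one dead-letter entry and lets the remaining sections proceed.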
Reviewed changes
Copilot reviewed 60 out of 60 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/storage/versioning/componentRegistry.test.ts | Updates component registry expectations for the new resolver kind. |
| src/storage/versioning/ceremonies.ts | Prevents unsafe fallthrough in dropPrevious and explicitly refuses unsupported family resolver purges. |
| src/storage/versioning/bootstrapComponentVersions.test.ts | Adds bootstrap seed test for underwriter-family resolver version. |
| src/storage/use-of-proceeds/UseOfProceedsSchema.ts | Introduces TypeBox schema + DI token for use-of-proceeds storage. |
| src/storage/use-of-proceeds/UseOfProceedsRepo.ts | Adds repository wrapper for use-of-proceeds persistence and queries. |
| src/storage/use-of-proceeds/UseOfProceedsRepo.test.ts | Unit tests for UseOfProceedsRepo save/query/clear. |
| src/storage/offering/SpacUnitTermsSchema.ts | Adds storage schema/token for SPAC unit terms. |
| src/storage/offering/SpacUnitTermsRepo.ts | Adds repo for SPAC unit terms read/write. |
| src/storage/offering/SpacUnitTermsRepo.test.ts | Unit tests for SPAC unit terms persistence. |
| src/storage/offering/OfferingTermsSchema.ts | Adds storage schema/token for equity offering terms. |
| src/storage/offering/OfferingTermsRepo.ts | Adds repo for equity offering terms read/write. |
| src/storage/offering/OfferingTermsRepo.test.ts | Unit tests for equity offering terms persistence. |
| src/storage/offering/IssuerTickerSchema.ts | Adds point-in-time issuer ticker series schema/token. |
| src/storage/offering/IssuerTickerRepo.ts | Adds ticker series persistence + deterministic history ordering. |
| src/storage/offering/IssuerTickerRepo.test.ts | Unit tests for ticker history ordering and clear behavior. |
| src/storage/canonical/UnderwriterLinkSchema.ts | Adds issuer↔underwriter link schema (company + family tier). |
| src/storage/canonical/UnderwriterLinkRepo.ts | Adds repo for underwriter links + issuer lookup by family. |
| src/storage/canonical/UnderwriterLinkRepo.test.ts | Unit tests for underwriter link save/clear/list. |
| src/storage/canonical/UnderwriterFamilyMembershipSchema.ts | Adds membership schema/token for underwriter family membership. |
| src/storage/canonical/UnderwriterFamilyMembershipRepo.ts | Implements underwriter membership repo via shared base. |
| src/storage/canonical/UnderwriterFamilyMembershipRepo.test.ts | Unit tests for membership recording + family listing. |
| src/storage/canonical/SponsorFamilyMembershipRepo.ts | Refactors sponsor membership repo to use shared FamilyMembershipRepo. |
| src/storage/canonical/FamilyMembershipRepo.ts | Adds shared membership repo base for sponsor/underwriter families. |
| src/storage/canonical/CanonicalUnderwriterFamilySchema.ts | Adds canonical underwriter family schema/token. |
| src/storage/canonical/CanonicalUnderwriterFamilyRepo.ts | Adds repo for canonical underwriter families. |
| src/storage/canonical/CanonicalUnderwriterFamilyRepo.test.ts | Unit tests for find-by-(resolver,name) underwriter families. |
| src/storage/canonical/CanonicalUnderwriterFamilyAliasRepo.ts | Adds underwriter family alias repo based on shared alias base. |
| src/storage/canonical/CanonicalUnderwriterFamilyAliasRepo.test.ts | Unit tests for underwriter family alias semantics. |
| src/storage/canonical/CanonicalSponsorFamilyAliasRepo.ts | Refactors sponsor family alias repo to shared alias base. |
| src/storage/canonical/CanonicalFamilyAliasRepo.ts | Adds shared single-hop alias repo base for family tiers. |
| src/storage/canonical/CanonicalAliasSchemas.ts | Adds underwriter-family alias schema + DI token. |
| src/sec/html/parseEdgarHtml.golden.test.ts | Updates golden segmentation expectations for new S‑1 sections. |
| src/sec/forms/registration-statements/s1/useOfProceedsSchema.ts | Adds structured-output schema for use-of-proceeds extractor. |
| src/sec/forms/registration-statements/s1/underwriterSchema.ts | Adds structured-output schema for underwriter extractor. |
| src/sec/forms/registration-statements/s1/sectionExtractors.ts | Adds extractOfferingTerms, extractUnderwriters, extractUseOfProceeds. |
| src/sec/forms/registration-statements/s1/sectionExtractors.test.ts | Adds unit tests for the new extractor functions. |
| src/sec/forms/registration-statements/s1/offeringTermsSchema.ts | Adds structured-output schema for offering terms + tickers. |
| src/sec/forms/registration-statements/s1/DocumentTreeSegmenter.test.ts | Tests tree segmentation for new offering-related headings. |
| src/sec/forms/registration-statements/s1/DocumentSegmenter.ts | Adds S‑1 section IDs + heading patterns for new sections. |
| src/sec/forms/registration-statements/Form_S_1.storage.useofproceeds.test.ts | Integration test for writing use-of-proceeds rows. |
| src/sec/forms/registration-statements/Form_S_1.storage.underwriters.test.ts | Integration test for underwriter extraction + family resolution + linking. |
| src/sec/forms/registration-statements/Form_S_1.storage.ts | Major refactor: adds new persistence, new resolvers, and runSection() ceremony. |
| src/sec/forms/registration-statements/Form_S_1.storage.test.ts | Updates expectations for newly dead-lettered sections when headings absent. |
| src/sec/forms/registration-statements/Form_S_1.storage.offering.test.ts | Integration tests for equity vs SPAC offering terms and ticker persistence. |
| src/resolver/UnderwriterFamilyResolver.ts | Adds resolver that maps underwriter common name → canonical family id. |
| src/resolver/UnderwriterFamilyResolver.test.ts | Unit tests for underwriter family normalization, reuse, and alias following. |
| src/resolver/SponsorFamilyResolver.ts | Refactors sponsor resolver to use shared FamilyResolver. |
| src/resolver/resolverIds.ts | Adds underwriter-family and introduces isFamilyResolverId helper. |
| src/resolver/resolverIds.test.ts | Updates resolver ID assertions. |
| src/resolver/FamilyResolver.ts | Introduces shared mutexed “family name → canonical id” resolver core. |
| src/config/TestingDI.ts | Registers new storages/tokens for in-memory tests. |
| src/config/setupAllDatabases.ts | Adds database setup for new schemas/tables. |
| src/config/DefaultDI.ts | Registers new storages/tokens for default (SQLite) DI. |
| src/commands/underwriterFamily.ts | Adds CLI commands and query to list IPO issuers by underwriter family. |
| src/commands/underwriterFamily.test.ts | Unit tests for underwriter-family issuer query and alias unioning. |
| src/commands/index.ts | Wires in the new underwriter-family CLI command group. |
| src/cli/queries/ResolverCoverage.ts | Refuses coverage for family-tier resolver kinds. |
| src/cli/queries/ResolverCoverage.test.ts | Adds test asserting family resolver kinds are rejected for coverage. |
| src/cli/groups/resolve.ts | Refuses sec resolve for family-tier resolver kinds. |
| CLAUDE.md | Documents new S‑1 offering sections and underwriter-family management commands. |
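
The deterministic history ordering noted for IssuerTickerRepo.ts (filing_date ties broken by primary first, then ticker) can be illustrated with a plain comparator. This is a sketch under assumptions: the row shape, the is_primary field name, and the ascending date order are hypothetical; only filing_date and the tie-break rule come from the PR.

```typescript
// Hypothetical row shape; only filing_date is named in the PR.
interface IssuerTickerRow {
  cik: number;
  filing_date: string; // ISO date (YYYY-MM-DD), so string comparison matches date order
  ticker: string;
  is_primary: boolean;
}

// Total order: filing_date ascending, then primary rows first, then ticker ascending.
function compareTickerRows(a: IssuerTickerRow, b: IssuerTickerRow): number {
  if (a.filing_date !== b.filing_date) return a.filing_date < b.filing_date ? -1 : 1;
  if (a.is_primary !== b.is_primary) return a.is_primary ? -1 : 1;
  if (a.ticker !== b.ticker) return a.ticker < b.ticker ? -1 : 1;
  return 0;
}

// Sort a copy so the caller's array is left untouched.
function sortHistory(rows: readonly IssuerTickerRow[]): IssuerTickerRow[] {
  return rows.slice().sort(compareTickerRows);
}
```

Because the comparator never returns 0 for rows with distinct tickers on the same date, the result does not depend on the insertion order or on the storage backend's default ordering.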
…rect CLAUDE.md versioning docs
Addresses Copilot review on PR #128:
- role_detail is now an enum-constrained nullable column (UNDERWRITER_ROLES | null) instead of a free-form string, so typos/free-form values can't be persisted.
- CLAUDE.md no longer advertises 'sec version coverage/drop-previous resolver underwriter-family' (and resolve) as supported; those intentionally refuse family-tier kinds and are documented as deferred.
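
The enum-constrained nullable role_detail can be pictured with a small validator. This is a sketch only: UNDERWRITER_ROLES is named in the commit message, but its members here are hypothetical placeholders, parseRoleDetail is an invented helper, and throwing on invalid input is just one possible rejection policy.

```typescript
// Hypothetical role values; the PR names UNDERWRITER_ROLES but not its members.
const UNDERWRITER_ROLES = ["lead", "co-manager", "bookrunner"] as const;
type UnderwriterRole = (typeof UNDERWRITER_ROLES)[number];

// Accept a known role or null; reject anything else before it reaches storage.
function parseRoleDetail(raw: unknown): UnderwriterRole | null {
  if (raw === null || raw === undefined) return null;
  if (typeof raw === "string") {
    for (const role of UNDERWRITER_ROLES) {
      if (role === raw) return role;
    }
  }
  throw new Error(`invalid role_detail: ${String(raw)}`);
}
```

Compared with a free-form string column, a misspelled or improvised role fails at write time instead of producing a value that later queries cannot group on.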
Summary
Extends S-1 form extraction to capture offering details (equity terms, SPAC units, tickers) and underwriter information, plus use-of-proceeds narrative. Introduces a new underwriter-family resolver tier parallel to the existing sponsor-family canonical structure, and refactors shared family-resolver logic into reusable base classes.

Key Changes

New extraction sections:
- extractOfferingTerms(): parses "The Offering" section for equity IPO terms (shares, price, proceeds) or SPAC unit composition
- extractUnderwriters(): parses "Underwriting" section for underwriter legal names, roles, and common names
- extractUseOfProceeds(): parses "Use of Proceeds" section for narrative line items and amounts
- Offering terms are written to OfferingTermsRepo (equity) or SpacUnitTermsRepo (SPAC units) based on issuer SIC code
- Tickers are written to IssuerTickerRepo as a point-in-time series keyed by filing date

Underwriter family canonical tier:
- CanonicalUnderwriterFamilyRepo, CanonicalUnderwriterFamilyAliasRepo, UnderwriterFamilyMembershipRepo, and UnderwriterLinkRepo mirror the sponsor-family structure
- UnderwriterFamilyResolver resolves underwriter legal entities to canonical family IDs
- sec canonical underwriter-family alias|alias-remove|alias-list and sec underwriter by-family for family management and IPO issuer lookup

Refactored shared logic:
- FamilyResolver base class with normalizeFamilyName() used by both sponsor and underwriter resolvers
- CanonicalFamilyAliasRepo base class for single-hop alias resolution (add/remove/resolve/list/listByTarget/listOrphans)
- FamilyMembershipRepo base class for co-occurrence membership queries
- SponsorFamilyMembershipRepo and SponsorFamilyAliasRepo now extend these bases

Section processing refactor:
- runSection() helper consolidates dead-letter ceremony for all seven S-1 sections (entity + derived)

Storage schemas and repos:
- OfferingTermsSchema / OfferingTermsRepo: one row per non-SPAC filing with equity terms
- SpacUnitTermsSchema / SpacUnitTermsRepo: one row per SPAC filing with unit composition
- IssuerTickerSchema / IssuerTickerRepo: longitudinal ticker series by CIK + filing date
- UseOfProceedsSchema / UseOfProceedsRepo: narrative line items with amounts
- UnderwriterLinkSchema / UnderwriterLinkRepo: issuer-to-underwriter-family relationships with roles

Tests:
- Form_S_1.storage.offering.test.ts: offering terms and ticker persistence for equity and SPAC filings
- Form_S_1.storage.underwriters.test.ts: underwriter extraction and family resolution
- Form_S_1.storage.useofproceeds.test.ts: use-of-proceeds line item extraction
- sectionExtractors.test.ts: unit tests for each new extractor function

Dependency injection:
- Registers the new storages/tokens in DefaultDI.ts and TestingDI.ts
- Adds underwriter-family to RESOLVER_IDS and component registry
- Updates setupAllDatabases.ts to initialize new schemas

https://claude.ai/code/session_01X8jhKh5Tnkz3st8bS1scd6
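
The single-hop alias resolution attributed to CanonicalFamilyAliasRepo can be sketched with an in-memory stand-in. Only the method names add/remove/resolve/listByTarget come from the description above; the class name, the string IDs, and the specific rules used to keep aliases one hop deep are assumptions for illustration.

```typescript
// Hypothetical in-memory stand-in for a family alias repo.
class InMemoryFamilyAliasRepo {
  private readonly targetByAlias: Record<string, string> = {};

  private has(id: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.targetByAlias, id);
  }

  // Register aliasId -> targetId, refusing anything that would create a chain.
  add(aliasId: string, targetId: string): void {
    if (aliasId === targetId) throw new Error("a family cannot alias itself");
    if (this.has(targetId)) throw new Error("target is itself an alias; chains are not allowed");
    if (this.listByTarget(aliasId).length > 0) throw new Error("other aliases already point at this family");
    this.targetByAlias[aliasId] = targetId;
  }

  // Returns true when an alias was actually removed.
  remove(aliasId: string): boolean {
    if (!this.has(aliasId)) return false;
    delete this.targetByAlias[aliasId];
    return true;
  }

  // Follow at most one hop; unaliased IDs resolve to themselves.
  resolve(id: string): string {
    const target = this.targetByAlias[id];
    return this.has(id) && target !== undefined ? target : id;
  }

  // All aliases pointing at targetId, in stable sorted order.
  listByTarget(targetId: string): string[] {
    return Object.keys(this.targetByAlias)
      .filter((alias) => this.targetByAlias[alias] === targetId)
      .sort();
  }
}
```

Rejecting chains at write time keeps resolve() a constant-time lookup and rules out alias cycles, at the cost of requiring callers to re-point existing aliases before merging two families.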