
Add offering terms, underwriters, and use-of-proceeds extraction #128

Merged
sroussey merged 28 commits into main from claude/great-thompson-ytNPX on Jun 6, 2026

@sroussey (Contributor) commented Jun 6, 2026

Summary

Extends S-1 form extraction to capture offering details (equity terms, SPAC units, tickers) and underwriter information, plus use-of-proceeds narrative. Introduces a new underwriter-family resolver tier parallel to the existing sponsor-family canonical structure, and refactors shared family-resolver logic into reusable base classes.

Key Changes

New extraction sections:

  • extractOfferingTerms() — parses "The Offering" section for equity IPO terms (shares, price, proceeds) or SPAC unit composition
  • extractUnderwriters() — parses "Underwriting" section for underwriter legal names, roles, and common names
  • extractUseOfProceeds() — parses "Use of Proceeds" section for narrative line items and amounts
  • Routes offering data to OfferingTermsRepo (equity) or SpacUnitTermsRepo (SPAC units) based on issuer SIC code
  • Stores exact ticker symbols in IssuerTickerRepo as a point-in-time series keyed by filing date
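
The routing and ticker-storage rules above can be sketched as follows. This is an illustrative sketch, not the PR's code: the table names, the dedupe rule, and the use of SIC 6770 (the SEC's "Blank Checks" code, under which SPACs typically register) as the SPAC test are assumptions.

```typescript
// Hypothetical routing sketch; the real repos and their row shapes are not shown in the PR.
// SIC 6770 is the SEC "Blank Checks" code that SPACs typically file under.
const SPAC_SIC_CODE = "6770";

type OfferingTable = "offering_terms" | "spac_unit_terms";

function offeringTableFor(sicCode: string | null): OfferingTable {
  // Assumption: a missing SIC code falls back to the equity table.
  return sicCode === SPAC_SIC_CODE ? "spac_unit_terms" : "offering_terms";
}

// Point-in-time ticker series: each filing contributes rows keyed by CIK + filing date,
// so symbols from earlier filings are kept rather than overwritten.
interface TickerRow {
  cik: string;
  filingDate: string; // ISO yyyy-mm-dd
  ticker: string;
}

function tickerRowsFor(cik: string, filingDate: string, tickers: string[]): TickerRow[] {
  // Keep symbols exactly as extracted (no case folding); drop blanks and exact duplicates.
  const seen = new Set<string>();
  const rows: TickerRow[] = [];
  for (const raw of tickers) {
    const ticker = raw.trim();
    if (ticker === "" || seen.has(ticker)) continue;
    seen.add(ticker);
    rows.push({ cik, filingDate, ticker });
  }
  return rows;
}
```

Keying on filing date rather than overwriting one "current ticker" per CIK is what lets a SPAC's unit, share, and warrant symbols from successive amendments coexist in the series.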

Underwriter family canonical tier:

  • New CanonicalUnderwriterFamilyRepo, CanonicalUnderwriterFamilyAliasRepo, UnderwriterFamilyMembershipRepo, and UnderwriterLinkRepo mirror the sponsor-family structure
  • UnderwriterFamilyResolver resolves underwriter legal entities to canonical family IDs
  • CLI commands 'sec canonical underwriter-family alias|alias-remove|alias-list' (family alias management) and 'sec underwriter by-family' (IPO issuer lookup by family)
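
A minimal in-memory sketch of what "resolve an underwriter common name to a canonical family ID" involves, assuming a find-or-create keyed on a normalized name and serialized through a promise-chain mutex (the file table describes FamilyResolver as a mutexed "family name → canonical id" core). The normalization rules and the class name FamilyResolverSketch are hypothetical, not the PR's normalizeFamilyName().

```typescript
// Hypothetical normalization: lowercase, treat "." and "," as separators,
// and strip trailing corporate-form tokens such as "& Co. LLC".
const CORPORATE_SUFFIXES = new Set(["llc", "inc", "corp", "corporation", "lp", "ltd", "co", "&"]);

function normalizeFamilyName(name: string): string {
  const tokens = name
    .toLowerCase()
    .replace(/[.,]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
  while (tokens.length > 1 && CORPORATE_SUFFIXES.has(tokens[tokens.length - 1])) {
    tokens.pop();
  }
  return tokens.join(" ");
}

// Hypothetical resolver: find-or-create a family id per normalized name.
class FamilyResolverSketch {
  private readonly idsByName = new Map<string, number>();
  private nextId = 1;
  private queue: Promise<unknown> = Promise.resolve();

  // Serialize calls through a promise chain; with an async backing store this keeps
  // two concurrent callers from both inserting a row for the same family.
  resolve(commonName: string): Promise<number> {
    const run = this.queue.then(() => this.findOrCreate(commonName));
    this.queue = run.catch(() => undefined); // a failed call must not block later ones
    return run;
  }

  private findOrCreate(commonName: string): number {
    const key = normalizeFamilyName(commonName);
    const existing = this.idsByName.get(key);
    if (existing !== undefined) return existing;
    const id = this.nextId++;
    this.idsByName.set(key, id);
    return id;
  }
}
```

With these rules, "Goldman Sachs & Co. LLC" and "Goldman, Sachs & Co." both normalize to "goldman sachs" and therefore share one family ID.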

Refactored shared logic:

  • Extracted FamilyResolver base class with normalizeFamilyName() used by both sponsor and underwriter resolvers
  • Extracted CanonicalFamilyAliasRepo base class for single-hop alias resolution (add/remove/resolve/list/listByTarget/listOrphans)
  • Extracted FamilyMembershipRepo base class for co-occurrence membership queries
  • SponsorFamilyMembershipRepo and SponsorFamilyAliasRepo now extend these bases
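
Single-hop alias resolution means resolve() follows at most one alias mapping. The in-memory sketch below mirrors the listed operations (add/remove/resolve/list/listByTarget/listOrphans). Two choices here are assumptions rather than the PR's documented behavior: add() rejects anything that would create a chain, and an "orphan" is taken to be an alias whose target family no longer exists.

```typescript
// Hypothetical in-memory model of a single-hop family alias table.
class SingleHopAliasMap {
  private readonly targetByAlias = new Map<number, number>();

  add(aliasId: number, targetId: number): void {
    if (aliasId === targetId) throw new Error("a family cannot alias itself");
    // Assumption: refuse chains so resolve() never needs more than one hop.
    if (this.targetByAlias.has(targetId)) throw new Error(`target ${targetId} is itself an alias`);
    if (this.listByTarget(aliasId).length > 0) throw new Error(`${aliasId} is already an alias target`);
    this.targetByAlias.set(aliasId, targetId);
  }

  remove(aliasId: number): boolean {
    return this.targetByAlias.delete(aliasId);
  }

  // One hop only: an unaliased id resolves to itself.
  resolve(familyId: number): number {
    return this.targetByAlias.get(familyId) ?? familyId;
  }

  list(): Array<[number, number]> {
    return [...this.targetByAlias.entries()];
  }

  listByTarget(targetId: number): number[] {
    return this.list().filter(([, t]) => t === targetId).map(([a]) => a);
  }

  // Assumed meaning: aliases pointing at family ids absent from the given set.
  listOrphans(existingIds: ReadonlySet<number>): number[] {
    return this.list().filter(([, t]) => !existingIds.has(t)).map(([a]) => a);
  }
}
```

Refusing chains at write time keeps reads cheap and avoids cycle detection in resolve(); the cost is that re-pointing a family requires removing and re-adding its aliases.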

Section processing refactor:

  • Unified runSection() helper consolidates dead-letter ceremony for all seven S-1 sections (entity + derived)
  • Applies the confidence floor and handles empty, low-confidence, and invalid-output results consistently
  • Eliminates repetitive try-catch and dead-letter boilerplate across management, beneficial-ownership, related-party, offering-terms, underwriters, use-of-proceeds, and sponsor sections
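
The consolidation above can be pictured as one wrapper that every section goes through. This is a synchronous sketch under stated assumptions: the real extractors are presumably async, and the dead-letter reason names other than SECTION_NOT_FOUND (which appears in the commit log), the SectionSpec shape, and the floor value are hypothetical.

```typescript
// Hypothetical reason codes; only SECTION_NOT_FOUND is named in the PR.
type DeadLetterReason =
  | "SECTION_NOT_FOUND"
  | "EMPTY_OUTPUT"
  | "LOW_CONFIDENCE"
  | "INVALID_OUTPUT"
  | "EXCEPTION";

interface DeadLetter { section: string; reason: DeadLetterReason; detail?: string }

interface Extraction<T> { value: T | null; confidence: number }

interface SectionSpec<T> {
  name: string;
  text: string | null; // null when the segmenter found no matching heading
  extract: (text: string) => Extraction<T>;
  validate: (value: T) => boolean;
  persist: (value: T) => void;
}

// Runs one section; returns true if persisted, otherwise records exactly one dead letter.
function runSection<T>(spec: SectionSpec<T>, confidenceFloor: number, deadLetters: DeadLetter[]): boolean {
  const fail = (reason: DeadLetterReason, detail?: string): false => {
    deadLetters.push({ section: spec.name, reason, detail });
    return false;
  };
  const text = spec.text;
  if (text === null) return fail("SECTION_NOT_FOUND");
  try {
    const { value, confidence } = spec.extract(text);
    if (value === null) return fail("EMPTY_OUTPUT");
    if (confidence < confidenceFloor) return fail("LOW_CONFIDENCE", `confidence ${confidence}`);
    if (!spec.validate(value)) return fail("INVALID_OUTPUT");
    spec.persist(value);
    return true;
  } catch (err) {
    // Extraction and persistence errors both land here instead of aborting the filing.
    return fail("EXCEPTION", String(err));
  }
}
```

Each section then supplies only its extractor, validator, and persister, which is where the per-section try/catch boilerplate disappears.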

Storage schemas and repos:

  • OfferingTermsSchema / OfferingTermsRepo — one row per non-SPAC filing with equity terms
  • SpacUnitTermsSchema / SpacUnitTermsRepo — one row per SPAC filing with unit composition
  • IssuerTickerSchema / IssuerTickerRepo — longitudinal ticker series by CIK + filing date
  • UseOfProceedsSchema / UseOfProceedsRepo — narrative line items with amounts
  • UnderwriterLinkSchema / UnderwriterLinkRepo — issuer-to-underwriter-family relationships with roles

Tests:

  • Form_S_1.storage.offering.test.ts — offering terms and ticker persistence for equity and SPAC filings
  • Form_S_1.storage.underwriters.test.ts — underwriter extraction and family resolution
  • Form_S_1.storage.useofproceeds.test.ts — use-of-proceeds line item extraction
  • sectionExtractors.test.ts — unit tests for each new extractor function
  • Repo and resolver unit tests for all new canonical/storage classes

Dependency injection:

  • Registered new repos in DefaultDI.ts and TestingDI.ts
  • Added underwriter-family to RESOLVER_IDS and component registry
  • Updated setupAllDatabases.ts to initialize new schemas

https://claude.ai/code/session_01X8jhKh5Tnkz3st8bS1scd6

claude added 27 commits June 6, 2026 03:41
…e Sponsor section

Also updates the existing storage e2e dead-letter assertion to account for the
three new SECTION_NOT_FOUND entries (offering-terms / underwriters /
use-of-proceeds) produced for fixtures lacking those headings.

https://claude.ai/code/session_01X8jhKh5Tnkz3st8bS1scd6
…egistry expectations

The four new segmenter sections (The Offering / Underwriting / Use of Proceeds /
The Sponsor) are now legitimately resolved from the real S-1 golden fixtures, and
the underwriter-family resolver kind adds a registered component.
…us; harden offering figures

Code review (xhigh) findings:
- dropPrevious, computeResolverCoverage, and 'sec resolve' branched on person vs.
  everything else (treated as company), so the family resolver kinds (sponsor-family,
  underwriter-family) fell through to the company tier: drop-previous would
  destructively purge company canonical/identity-link data, coverage reported company
  numbers, and 'sec resolve --kind underwriter-family' wrote mislabeled company links.
  Add FAMILY_RESOLVER_IDS/isFamilyResolverId and refuse these kinds explicitly.
- IssuerTickerRepo.history now breaks filing_date ties deterministically
  (primary first, then ticker) so same-date unit/share/warrant symbols order
  stably across backends.
- Round share/unit counts before writing integer-typed columns so a fractional
  model figure no longer dead-letters the whole offering section.
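
The three fixes above can be sketched as small helpers. This is illustrative only: the FAMILY_RESOLVER_IDS contents follow the two kinds named in the message, while the row shape, the ascending filing-date order, and the helper names assertPersonOrCompany and toIntegerCount are assumptions.

```typescript
// Hypothetical guard: family-tier kinds must be refused, not treated as "company".
const FAMILY_RESOLVER_IDS = ["sponsor-family", "underwriter-family"] as const;
type FamilyResolverId = (typeof FAMILY_RESOLVER_IDS)[number];

function isFamilyResolverId(id: string): id is FamilyResolverId {
  return (FAMILY_RESOLVER_IDS as readonly string[]).includes(id);
}

function assertPersonOrCompany(kind: string): void {
  if (isFamilyResolverId(kind)) {
    throw new Error(`resolver kind '${kind}' is family-tier and is not supported here`);
  }
}

interface TickerHistoryRow { filingDate: string; ticker: string; isPrimary: boolean }

// Assumed ascending by filing date; ties: primary symbol first, then ticker by code unit.
// Plain < comparison (not localeCompare) keeps the order identical on every backend.
function compareTickerHistory(a: TickerHistoryRow, b: TickerHistoryRow): number {
  if (a.filingDate !== b.filingDate) return a.filingDate < b.filingDate ? -1 : 1;
  if (a.isPrimary !== b.isPrimary) return a.isPrimary ? -1 : 1;
  return a.ticker < b.ticker ? -1 : a.ticker > b.ticker ? 1 : 0;
}

// Round model-produced counts for integer columns instead of failing the section.
function toIntegerCount(value: number | null): number | null {
  if (value === null || !Number.isFinite(value)) return null;
  return Math.round(value);
}
```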

Copilot AI left a comment

Pull request overview

Extends the S‑1 extractor and storage layer to capture IPO/SPAC deal details (offering terms, tickers, underwriters, and use‑of‑proceeds) and introduces an underwriter-family canonicalization tier parallel to sponsor-family, with shared “family resolver” base logic and consolidated section dead-letter handling.

Changes:

  • Added new S‑1 section segmentation + extractors for offering terms, underwriters, and use of proceeds, plus persistence to new storage tables.
  • Implemented underwriter-family canonical repos/aliases/memberships/linking and added CLI commands for alias management and issuer lookup by family.
  • Refactored sponsor/underwriter family canonicalization into shared base repos/resolver utilities and unified S‑1 section processing via runSection().

Reviewed changes

Copilot reviewed 60 out of 60 changed files in this pull request and generated 2 comments.

File Description
src/storage/versioning/componentRegistry.test.ts Updates component registry expectations for the new resolver kind.
src/storage/versioning/ceremonies.ts Prevents unsafe fallthrough in dropPrevious and explicitly refuses unsupported family resolver purges.
src/storage/versioning/bootstrapComponentVersions.test.ts Adds bootstrap seed test for underwriter-family resolver version.
src/storage/use-of-proceeds/UseOfProceedsSchema.ts Introduces TypeBox schema + DI token for use-of-proceeds storage.
src/storage/use-of-proceeds/UseOfProceedsRepo.ts Adds repository wrapper for use-of-proceeds persistence and queries.
src/storage/use-of-proceeds/UseOfProceedsRepo.test.ts Unit tests for UseOfProceedsRepo save/query/clear.
src/storage/offering/SpacUnitTermsSchema.ts Adds storage schema/token for SPAC unit terms.
src/storage/offering/SpacUnitTermsRepo.ts Adds repo for SPAC unit terms read/write.
src/storage/offering/SpacUnitTermsRepo.test.ts Unit tests for SPAC unit terms persistence.
src/storage/offering/OfferingTermsSchema.ts Adds storage schema/token for equity offering terms.
src/storage/offering/OfferingTermsRepo.ts Adds repo for equity offering terms read/write.
src/storage/offering/OfferingTermsRepo.test.ts Unit tests for equity offering terms persistence.
src/storage/offering/IssuerTickerSchema.ts Adds point-in-time issuer ticker series schema/token.
src/storage/offering/IssuerTickerRepo.ts Adds ticker series persistence + deterministic history ordering.
src/storage/offering/IssuerTickerRepo.test.ts Unit tests for ticker history ordering and clear behavior.
src/storage/canonical/UnderwriterLinkSchema.ts Adds issuer↔underwriter link schema (company + family tier).
src/storage/canonical/UnderwriterLinkRepo.ts Adds repo for underwriter links + issuer lookup by family.
src/storage/canonical/UnderwriterLinkRepo.test.ts Unit tests for underwriter link save/clear/list.
src/storage/canonical/UnderwriterFamilyMembershipSchema.ts Adds membership schema/token for underwriter family membership.
src/storage/canonical/UnderwriterFamilyMembershipRepo.ts Implements underwriter membership repo via shared base.
src/storage/canonical/UnderwriterFamilyMembershipRepo.test.ts Unit tests for membership recording + family listing.
src/storage/canonical/SponsorFamilyMembershipRepo.ts Refactors sponsor membership repo to use shared FamilyMembershipRepo.
src/storage/canonical/FamilyMembershipRepo.ts Adds shared membership repo base for sponsor/underwriter families.
src/storage/canonical/CanonicalUnderwriterFamilySchema.ts Adds canonical underwriter family schema/token.
src/storage/canonical/CanonicalUnderwriterFamilyRepo.ts Adds repo for canonical underwriter families.
src/storage/canonical/CanonicalUnderwriterFamilyRepo.test.ts Unit tests for find-by-(resolver,name) underwriter families.
src/storage/canonical/CanonicalUnderwriterFamilyAliasRepo.ts Adds underwriter family alias repo based on shared alias base.
src/storage/canonical/CanonicalUnderwriterFamilyAliasRepo.test.ts Unit tests for underwriter family alias semantics.
src/storage/canonical/CanonicalSponsorFamilyAliasRepo.ts Refactors sponsor family alias repo to shared alias base.
src/storage/canonical/CanonicalFamilyAliasRepo.ts Adds shared single-hop alias repo base for family tiers.
src/storage/canonical/CanonicalAliasSchemas.ts Adds underwriter-family alias schema + DI token.
src/sec/html/parseEdgarHtml.golden.test.ts Updates golden segmentation expectations for new S‑1 sections.
src/sec/forms/registration-statements/s1/useOfProceedsSchema.ts Adds structured-output schema for use-of-proceeds extractor.
src/sec/forms/registration-statements/s1/underwriterSchema.ts Adds structured-output schema for underwriter extractor.
src/sec/forms/registration-statements/s1/sectionExtractors.ts Adds extractOfferingTerms, extractUnderwriters, extractUseOfProceeds.
src/sec/forms/registration-statements/s1/sectionExtractors.test.ts Adds unit tests for the new extractor functions.
src/sec/forms/registration-statements/s1/offeringTermsSchema.ts Adds structured-output schema for offering terms + tickers.
src/sec/forms/registration-statements/s1/DocumentTreeSegmenter.test.ts Tests tree segmentation for new offering-related headings.
src/sec/forms/registration-statements/s1/DocumentSegmenter.ts Adds S‑1 section IDs + heading patterns for new sections.
src/sec/forms/registration-statements/Form_S_1.storage.useofproceeds.test.ts Integration test for writing use-of-proceeds rows.
src/sec/forms/registration-statements/Form_S_1.storage.underwriters.test.ts Integration test for underwriter extraction + family resolution + linking.
src/sec/forms/registration-statements/Form_S_1.storage.ts Major refactor: adds new persistence, new resolvers, and runSection() ceremony.
src/sec/forms/registration-statements/Form_S_1.storage.test.ts Updates expectations for newly dead-lettered sections when headings absent.
src/sec/forms/registration-statements/Form_S_1.storage.offering.test.ts Integration tests for equity vs SPAC offering terms and ticker persistence.
src/resolver/UnderwriterFamilyResolver.ts Adds resolver that maps underwriter common name → canonical family id.
src/resolver/UnderwriterFamilyResolver.test.ts Unit tests for underwriter family normalization, reuse, and alias following.
src/resolver/SponsorFamilyResolver.ts Refactors sponsor resolver to use shared FamilyResolver.
src/resolver/resolverIds.ts Adds underwriter-family and introduces isFamilyResolverId helper.
src/resolver/resolverIds.test.ts Updates resolver ID assertions.
src/resolver/FamilyResolver.ts Introduces shared mutexed “family name → canonical id” resolver core.
src/config/TestingDI.ts Registers new storages/tokens for in-memory tests.
src/config/setupAllDatabases.ts Adds database setup for new schemas/tables.
src/config/DefaultDI.ts Registers new storages/tokens for default (SQLite) DI.
src/commands/underwriterFamily.ts Adds CLI commands and query to list IPO issuers by underwriter family.
src/commands/underwriterFamily.test.ts Unit tests for underwriter-family issuer query and alias unioning.
src/commands/index.ts Wires in the new underwriter-family CLI command group.
src/cli/queries/ResolverCoverage.ts Refuses coverage for family-tier resolver kinds.
src/cli/queries/ResolverCoverage.test.ts Adds test asserting family resolver kinds are rejected for coverage.
src/cli/groups/resolve.ts Refuses sec resolve for family-tier resolver kinds.
CLAUDE.md Documents new S‑1 offering sections and underwriter-family management commands.

Comment thread: src/storage/canonical/UnderwriterLinkSchema.ts
Comment thread: CLAUDE.md (outdated)
…rect CLAUDE.md versioning docs

Addresses Copilot review on PR #128:
- role_detail now an enum-constrained nullable column (UNDERWRITER_ROLES | null)
  instead of free-form string, so typos/free-form values can't be persisted.
- CLAUDE.md no longer advertises 'sec version coverage/drop-previous resolver
  underwriter-family' (and resolve) as supported — those intentionally refuse
  family-tier kinds; documented as deferred.
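
The role_detail constraint can be illustrated as a closed set of literals plus null. The PR names UNDERWRITER_ROLES but does not list its members, so the values below are hypothetical. This sketch also coerces unrecognized strings to null rather than rejecting the row; the actual schema may reject instead. In a TypeBox schema the same constraint could be written as a union of Type.Literal values and Type.Null().

```typescript
// Hypothetical members; the real UNDERWRITER_ROLES list is not shown in the PR.
const UNDERWRITER_ROLES = ["lead_bookrunner", "bookrunner", "co_manager", "representative"] as const;
type UnderwriterRole = (typeof UNDERWRITER_ROLES)[number];

function isUnderwriterRole(value: unknown): value is UnderwriterRole {
  return typeof value === "string" && (UNDERWRITER_ROLES as readonly string[]).includes(value);
}

// Map free-form model output onto the column domain: a known role, or null.
function toRoleDetail(raw: unknown): UnderwriterRole | null {
  if (typeof raw !== "string") return null;
  const candidate = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  return isUnderwriterRole(candidate) ? candidate : null;
}
```

Constraining the column to the enum means a typo such as "lead bookrnner" can only ever be stored as null, never as a new pseudo-role.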
sroussey merged commit 960d707 into main on Jun 6, 2026
1 check passed
sroussey deleted the claude/great-thompson-ytNPX branch on June 6, 2026 at 17:43