Skip to content

Harden codegen contracts and stabilize examples#3943

Merged
raphael merged 21 commits into
v3from
codegen-hardening
Jun 13, 2026
Merged

Harden codegen contracts and stabilize examples#3943
raphael merged 21 commits into
v3from
codegen-hardening

Conversation

@raphael

@raphael raphael commented Jun 13, 2026

Copy link
Copy Markdown
Member

Summary

This PR hardens Goa code generation around three contracts that were previously implicit:

  • Code generators fail fast on internal invariant violations instead of silently producing partial or ambiguous output.
  • The evaluated design tree is read-only after a single explicit normalization pass, with purity tests covering service, transport, OpenAPI, JSON-RPC, SSE, WebSocket, validation, and gRPC generation.
  • Generated examples are anchored to design identities instead of depending on traversal order or prior PRNG consumption.

The implementation also removes several duplicate or fragile generator paths. JSON-RPC is now modeled as a real HTTP transport variant instead of relying on template-source surgery, gRPC protobuf wrapping uses explicit transform hooks, and HTTP/gRPC type file emission is consolidated behind shared constructors.

Why

Several generator behaviors were correct only because mutation order happened to line up across generators. That made output stability hard to reason about and made downstream changes risky: one generator could mutate shared expressions and affect another generator's examples, validations, or type names.

This PR makes those ownership boundaries explicit:

  • codegen.NormalizeRoot is the only sanctioned post-finalize design mutation.
  • expr.EffectiveValidation computes validation without mutating attributes or alias chains.
  • expr.ExampleGenerator supports derived, rebased, and field-scoped streams so examples are a pure function of the design element they document.
  • OpenAPI body, parameter, schema, and response examples use those scoped streams instead of the root stream.

Notable Fixes

  • Preserves field descriptions for primitive alias schemas while keeping alias validation and anchored examples.
  • Avoids generating unused projected view wrappers for synthetic method result types introduced by normalization.
  • Adds nil guards in generated pointer conversion helpers where transport boundary data can legitimately be absent.
  • Keeps JSON-RPC SSE server streams behind one generated base type and removes duplicated SSE-specific server generation.

Review Notes

The diff is test-heavy. Non-test Go source is smaller overall, while the branch adds coverage and golden fixtures for the new contracts.

Expected generated output changes are limited to:

  • design-anchored example value re-keying in CLI/OpenAPI output,
  • validated bug fixes such as nil guards in pointer conversion helpers,
  • OpenAPI schema-description fixes where the generated schema previously lost authored descriptions.

Test Plan

  • go test ./...
  • make lint
  • Regenerated and compared large downstream consumers against a baseline generated from the branch parent.
  • Verified an example perturbation run that consumes extra PRNG draws before generation still produces byte-identical generated output.
  • Restored all downstream module files and generated files after validation; no downstream generated artifacts are included in this PR.

raphael added 21 commits June 11, 2026 17:43
Pins every file the transport and example generators produce for a design
covering plain JSON-RPC methods, required/optional request IDs, custom
errors with JSON-RPC code mappings, WebSocket streaming, SSE streaming,
mixed HTTP+JSON-RPC transports, and a plain HTTP service. A manifest
golden catches files appearing or disappearing.
Code generation runs only after design validation, so any internal error
past that point is a bug in Goa. Convert the silent failure paths into
panics matching the established 'panic(err) // bug' convention:

- The five transform-error sites that printed to stdout and emitted
  files with empty constructor bodies now panic.
- rpcTag panics on a missing rpc:tag instead of emitting field number 0,
  which produced invalid .proto output rejected by protoc far from the
  cause.
- The gRPC message collector panics on a lookup miss instead of leaving
  a nil message that NPEs at template render time.
- extractHeaders/extractCookies assert the primitive-payload mapping
  invariant instead of silently substituting the whole payload.
- collectAttributes panics on a missing parent attribute instead of
  leaving nil slots, and unknown default-value slice types panic instead
  of emitting possibly non-compiling literals.
- The unreachable string-concat type-name fallback in HTTP request
  builders is gone.

The generated goa binary now recovers panics at its single generation
boundary and prints a 'this is a bug in Goa, please report it' message
with the stack trace; design errors keep their [file:line] eval path.

Close the validation gap the rpcTag panic exposed: custom object error
types are rendered as protobuf messages but their fields were never
tag-validated, so untagged fields silently generated broken .proto
files. gRPC endpoint validation now checks them, and validateMessage no
longer stops tag-checking after the first matched attribute. Test DSLs
that relied on the lax validation now declare Field tags.
- Delete ConvertFile, a ~190-line verbatim dead copy of ConvertFiles, and
  factor ConvertFiles' duplicated collection/grouping loops into
  relevantTypeMaps/groupByConvertPath. A type used as both payload and
  result no longer generates duplicate numbered conversion functions
  (ConvertToFoo2); the convert test pinned that bug and now expects a
  single function.
- Remove the dead union-interface machinery: AttributeContext.IsInterface
  was never set, so its branches in the shared and protobuf transforms
  were unreachable, as was UnionValTypeName. generateHelper now works on
  a Dup'd context instead of mutating the caller's context with deferred
  restores.
- Fold transformAttributeHelpers into collectHelpers behind a topLevel
  flag; drop generateHelper's redundant seen check and
  transformPrimitive's redundant compatibility check.
- Collapse the 12 identical typed-default cases into one reflection loop
  and remove transformUnion's guards for states the preceding length and
  compatibility checks already exclude.
- Table-ize the four identical credential-scheme cases in BuildSchemeData.
- Collapse the no-op package-prefix branch in goTypeDefWithPkgOverride,
  inline the goTypeDef pass-through, and drop GoTypeDefWithTargetPkg in
  favor of the package-private implementation.
- Drop AttributeTags' ignored first parameter (and attributeTags' unused
  parent param in http typedef).
- collectUnionTypes/buildUnionTypeData absorb their view-mode twins
  behind a view flag; buildTypeInits/buildProjections collapse into
  buildViewConversions with a direction parameter.
- MethodData.IsJSONRPCWebSocket is now the single owner of the
  'JSON-RPC method streams over WebSocket' fact; endpoint, service, and
  example generators consume it instead of re-deriving it three
  different ways, and isJSONRPCSSE is gone.
- Unique is implemented via PeekUnique so the probing logic exists once.

Generated output is byte-identical; net -116 lines.
HTTP (net -375 LOC):
- One element walker replaces the extractPathParams/extractQueryParams/
  extractHeaders/extractCookies quartet; the four extractors are thin
  adapters over it.
- One side-parameterized types file builder replaces the mirrored
  server_types.go/client_types.go, with a single local dedup mechanism.
  This fixes the bug where a request body seen on an earlier endpoint
  skipped the later endpoint's WebSocket payload types (no golden design
  exercises the combination so output is unchanged).
- InitArgData construction reuses the embedded AttributeData via small
  per-kind clone helpers instead of six field-by-field copy loops.
- Sweep: always-true guards, finalize-guaranteed nil guards, dead
  Element.AttributeName, RunHTTPDSL wrapper (26 callers now call
  expr.RunDSL), per-endpoint template re-parsing (requestInitTemplate's
  FuncMap was entirely dead), triplicated client-endpoint-init section
  literals, hand-rolled hasSSE loops and duplicated dict helper.

gRPC:
- One side-parameterized types file builder replaces the mirrored
  server_types.go/client_types.go; shared usesAnyType and
  initArgsFromMetadata helpers replace four copies each of the structpb
  detection and metadata-arg loops.
- protobuf_transform: dead targetPtr/sourcePtr params and unreachable
  template branches removed; dead TargetWrapperRefs and package-guess
  heuristics deleted; the live protoc oneof wrapper naming contract is
  one documented helper (protocOneofWrapperRef); checkNil's subsumed
  isRef clause removed; unAlias lives next to getPrimitive.
- buildInitData drops its ignored parameter; duplicate nested meta
  condition removed.

transformPrimitive no longer returns an always-nil error. Generated
output is byte-identical throughout. CLIArgs construction stays in
analyze: building it in client_cli would reorder Example() calls and
churn example values everywhere.
- codegen/cli owns the shared flag construction (MakeFlags/FlagArgData),
  endpoint-parser file assembly (EndpointParserFile), and payload-builder
  file assembly (PayloadBuildersFile); http and grpc client_cli map their
  transport data and delegate. The HTTP parse-endpoint section keeps its
  name and template source (jsonrpc still post-processes it until the
  Phase 3 de-surgery).
- codegen/example owns RootPath and APIPkg, replacing five copies of the
  rootPath/apiPkg boilerplate. The two sites that intentionally use a
  different Unique suffix pattern keep their behavior.
- Location.PackageNameOrDefault replaces the pkgWithDefault duplicate in
  both transports (14 call sites).

Generated output is byte-identical.
…ints

JSON-RPC stops editing HTTP's un-rendered template text with
strings.Replace/regexp. Each former surgery site is now a structural
mechanism in the HTTP templates/data, keyed on a new endpoint-level
EndpointData.IsJSONRPC fact (the method-level flag is wrong for methods
exposed over both HTTP and JSON-RPC — caught by the kitchen-sink Mixed
service):

- request_decoder.go.tpl grows IsJSONRPC branches for the RawRequest
  param and body rewind; the rotted 'return nil,' replacement is gone.
- request encoders are always named for JSON-RPC endpoints and a new
  jsonrpc_request_envelope partial renders the envelope (ID handling
  from Payload.IDAttribute, no-payload case included), replacing the
  regex injection, newJSONRPCBody, the minimal-encoder template, and
  the mid-generation RequestEncoder mutation. The remaining structural
  response-decoder swap is drift-guarded with a count assertion.
- parse_endpoint.go.tpl branches on commandData.JSONRPC for the
  ConnConfigurer/ConnConfigureFunc variation.
- Example CLI templates take FuncSuffix/VarPrefix data; example server
  templates absorb the JSON-RPC blocks behind JSONRPCServices (always
  set, typed nil when absent) and the jsonrpc-owned template copies and
  mount-splice consts are deleted.

Only golden change: EncodePingRequest relocated to endpoint order in
calc client encode_decode (identical body).
http/codegen.ServicesData carries an explicit jsonrpc flag set by the
new NewJSONRPCServicesData constructor; the file constructors JSON-RPC
reuses (types, paths, CLI, example CLI, encode/decode) derive output
paths, header titles, and CLI import paths from it. The jsonrpc
pass-through wrappers (server_types, client_types, paths, client_cli,
example_cli, updateHeader) are deleted and the generators call the http
constructors directly with the JSON-RPC data.

Two latent bugs fixed by construction, visible in the kitchen-sink
goldens: generated JSON-RPC client/types.go headers no longer claim to
be HTTP, and the JSON-RPC CLI no longer imports nonexistent gen/http
client packages for JSON-RPC-only services (updateHeader only rewrote
paths containing 'gen/http').

JSON-RPC endpoints no longer generate dead response/error encoders:
responses are encoded inline via jsonrpc.MakeSuccessResponse, so the
forced emission in http server.go (and jsonrpc's compensating
error-encoder section filter) is gone; the dead Encode*Response
functions disappear from generated jsonrpc servers.

Misc: RunJSONRPCDSL deleted, hasJSONRPCSSE drops its unused params and
guards, the two byte-identical server templates are read from http's
template FS via the new ReadTemplate accessor, and SSE path derivation
uses Service.PathName.
JSON-RPC SSE services generated two parallel stream implementations:
a per-endpoint stream in server/stream.go and a service-level stream in
server/sse.go used only to send protocol-level errors, each with its own
copy of the event writer, header setup, and error sending. One
unexported generated base (sseServerStream, new
sse_server_stream_base.go.tpl) now owns the event writer and
send/sendError; per-endpoint streams embed it keeping their public
Send/SendAndClose/SendError signatures; protocol-level errors
instantiate the base directly. The dead service-level Send/SendError,
its dedupe machinery, and the unused decoder field are gone, and the
SSE endpoint-error path in the handler routes through the per-endpoint
SendError's design-driven error mapping instead of a hardcoded switch
plus fallback.

Everything renders into gen/jsonrpc/<svc>/server/sse.go via ServerFiles;
client/stream.go renders only from ClientFiles (it was generated twice,
once from the misnamed SSEServerFiles which is now deleted along with
its generator wiring).
wrapAttr marks every synthesized wrapper message with grpc:wrapped meta
at creation. Transforms and helper collection unwrap deterministically
via isWrappedAttr/unwrapAttr on the proto side of each pair instead of
calling IsCompatible, treating failure as 'must be wrapped', unwrapping
an attribute literally named field, and retrying — eight retry sites
plus two unconditional unwraps converted. IsCompatible remains at each
site as a pure assertion whose failure now means a real bug, and
unwrapAttr panics when asked to unwrap a non-wrapper, closing the
latent hole where a user type with a genuine 'field' attribute could be
silently mis-unwrapped.

The marker is invisible to type hashing and .proto rendering; generated
output is byte-identical. New table-driven tests pin the marker at each
nesting level and the unwrap panic.
The gRPC transform was a ~1,040-line drifted fork of the shared Go
transform engine. The engine now exposes explicit extension points
(TransformAttrs.Hooks): UnwrapPair/WrapDirective for synthetic wrapper
messages, FieldPairAttrs for per-field normalization, ConvertPrimitive
for cast/Any conversions, full TransformArray/Map/Union renderer
overrides (union parent-message context threaded by parameter, killing
the fork's mutable message state, dupTransformAttrs and
isUnionMessage), HelperNameAttrs, GuardCondition, ZeroTypeName,
ObjectDeref, and an InlineCompositeElems policy flag. Nil hooks are
bit-identical to the previous engine behavior for the HTTP, JSON-RPC,
and service generators.

grpc/codegen now implements those hooks in proto_hooks.go and deletes
its copies of transformAttribute, transformObject,
transformAttributeHelpers, collectHelpers, walkMatches,
transformHelperName and the transformAttrs state struct — 1,040 lines
of drift-prone duplication down to 460 lines of genuinely
proto-specific code.

Two fork bugs die with the drift, both producing non-compiling Go:
bytes fields with defaults compared a slice against a zero value
(slices only compare to nil — the single golden hunk), and aliased
array defaults rendered untyped element literals. Verified end to end
with a repro design (nested map-of-array, primitive unions both
directions, Any, aliased defaults, stream envelopes) whose generated
code now compiles.
Validation codegen no longer mutates the design tree. The new
expr.EffectiveValidation computes the validation that effectively
applies to an attribute — its own validation merged with the user type
alias chain's, in the exact order the old in-place flattening produced —
and returns a fully detached expression that shares no memory with the
design.

This deletes flattenValidations and both write-backs in validationCode,
which could leave att.Validation aliased to a user type's own
ValidationExpr so that later merges contaminated the shared validation
seen by every other consumer. hasValidations is now a side-effect-free
predicate.

New chained-alias golden fixtures pin the merge order and purity tests
assert the design tree is deep-equal before and after generation, with
zero churn in existing goldens.
HTTP: ServerTypeNames/ClientTypeNames are now write-once analysis
state (map[string]struct{}) — no file generator mutates them, and a new
regression test renders the full file set twice from one ServicesData
and asserts byte-identical output. PathInit ClientArgs get fresh
InitArgData/AttributeData copies so the client request-builder's
renames can no longer leak into ServerArgs (the latent
signature-vs-body mismatch when Unique() renamed). The redundant
DupAtt before makeHTTPType in buildResponses is gone.

gRPC: analyze no longer writes the protobuf-shaped request, streaming
request, response, and error messages back onto the design expressions;
the shaped attributes are locals threaded explicitly through the
convert/stream/error builders. The .proto generator and CLI read only
ServiceData, so nothing outside analyze depended on the rewrites. A
purity test snapshots the endpoint expressions and asserts analyze
leaves them untouched.

Generated output is byte-identical.
The protobuf-equivalent fix for HTTP: analyze no longer writes
alias-flattened/tagged bodies back onto e.Body, response bodies, or
error response bodies. Shaped bodies live in an unexported per-service
cache keyed by expression pointer, so every consumer (payload/response/
error builders, the body-attribute collection passes, websocket) shares
one shaped instance and the call-order-dependent example randomizer
stays byte-stable. The discarded makeHTTPType(e.StreamingBody) no-op is
gone; streaming bodies keep their never-flattened shipped behavior
(flattening them is a separate open decision).

openapi/v3 stops projecting static views into the design's response
body (the dup only copied the type, then wrote the projection into the
original attribute); projection now happens on a detached copy.

This fixes a real production divergence: OpenAPI specs generated after
the HTTP transport pass lost named alias schemas (v2 dropped
definitions and inlined enums, v3 dropped alias descriptions) relative
to what the OpenAPI golden tests — which run on pristine roots — pin. A
new order-independence test generates specs before and after building
transport data and asserts identity (4 of 7 cases failed pre-fix), and
a purity test snapshots body pointers, types, and struct:tag meta
across nine DSL shapes.

Two remaining design-tree mutations are documented for the final
normalization phase: service wrapObject (to be sanctioned in
NormalizeRoot) and json_schema's in-place projected-type prefixing.
codegen.NormalizeRoot is now the only sanctioned design mutation after
eval finalization: it owns the wrapping of anonymous object method
payloads/results in named user types (moved out of service analyze,
which now panics if it meets an unwrapped raw object), replaying the
exact per-service scope seeding so generated names are identical.
generator.Generate runs it once after plugin preparation; test harnesses
apply it in their shared helpers.

Two more mutations die: openapi's ResultTypeRefWithPrefix renamed
projected result types in place (now renames a copy, with an
instance-keyed registry preserving the sticky first-prefix-wins refs),
and makeHTTPTypeRecursive could rewrite the attribute type of the
process-global expr.Empty — found by the new purity test, not by
inspection.

The lasting guarantee is codegen/generator/purity_test.go: for designs
covering alias chains, views, WebSocket, SSE, JSON-RPC mixed transport,
and gRPC unions/streaming, it deep-snapshots every expression reachable
from the root, runs every generator (Service, Transport, OpenAPI,
Example), and asserts the design is bit-identical. expr's
GeneratedResultTypes append and the example randomizer cache remain
documented process-global exclusions.

End-to-end: goa gen + goa example output is byte-identical to HEAD~1
across HTTP/gRPC/streaming/viewed-collections including the
name-collision case.
Example values were drawn from a single sequential PRNG seeded by the
API name, so every value depended on how many draws happened earlier in
the generation run. Any change to generator internals — or to an
unrelated part of the design — shifted all downstream examples, which
is why regenerating with a new goa version churned examples even though
the designs were untouched.

Examples are now anchored to design identities. ExampleGenerator gains
Derived (path-relative streams), Rebased (absolute identity streams),
and Field (the identity of one element extracted from a payload or
result). User types draw from a stream seeded by their type ID no
matter where they are reached from, object fields derive from their
field name, array elements and map entries from their index, and union
members from their name. The transport walkers anchor param, header,
cookie, metadata, and security element examples to the payload field
they project, so a standalone element example now equals the value of
the same field in the composite payload example.

Generators built around caller-supplied Randomizers cannot re-seed and
keep the sequential behavior. The one-time golden re-keying this causes
is the last one: new unit tests pin order independence, draw-count
independence, field locality, and element/composite example agreement.
The JSON schema walk drew every node's example from the shared root
stream, so schema examples — including the tool schemas goa-ai builds
through this package — depended on generation order. The walk now
threads an example generator anchored at each design boundary: type
definitions rebase on the user type ID, object properties derive from
the property name, array items from the element index, map entries and
union members likewise. A schema's composite example now equals the
expr-level example of the same type, and exported signatures are
unchanged.
Path, query, header, cookie, and SSE Last-Event-ID parameter examples
drew from the shared root stream. They now anchor to the payload field
they project via ExampleGenerator.Field, matching the transport
walkers, so parameter examples agree with the composite payload example
and survive generator reorderings.
The v3 schemafier carried one example stream for the whole walk so
inline schema examples (notably the OpenAPI 3.2 event content schemas)
depended on generation order. The schemafier now forks a derived stream
per object property, array item, map entry, and union member, and
rebases on the user type identity at type boundaries — same anchoring
contract as the shared JSON schema package and the expr layer.
Anonymous body types (inline arrays, maps, primitives) have no type
identity, so their schema and content examples drew from the shared
sequential stream and changed whenever generation order did. Both the
schema walk and the media-type example initialization now rebase on a
service.endpoint.role identity, which also makes the schema-level and
content-level examples of the same body agree.

With this, a consumer-level perturbation test (consuming PRNG draws
before generation) leaves an entire real-world gen tree — transport
code, OpenAPI 3.0/3.2 documents, and goa-ai tool schemas — byte
identical: every generated example is now a pure function of the
design.
Keep normalized synthetic method result wrappers out of projected view output, and preserve primitive alias schema descriptions while retaining anchored examples.
@raphael raphael merged commit 11527bf into v3 Jun 13, 2026
5 checks passed
@raphael raphael deleted the codegen-hardening branch June 13, 2026 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant