Summary
Decouple signature replication from the promotion pipeline by moving it to a dedicated periodic Prow job. This eliminates rate limit contention between cosign (which makes untracked HTTP requests during signing) and signature copy operations.
Background
PR #1713 split signing and replication into separate pipeline phases. However, both still run within the same promotion job and share the same rate limit budget. Since cosign's HTTP requests bypass our rate limiter entirely, the effective rate limits during signing are unreliable. A dedicated replication job would fully decouple the two workloads.
Proposal
- Add a kpromo cip replicate-signatures subcommand
- It reads the existing promotion manifests to discover which images should exist in which registries
- For each image, it checks whether the signature tag exists in the primary registry and copies it to any mirror registries where it is missing
- The operation is fully idempotent; re-running when everything is already replicated is a series of fast existence checks
- Set up a Prow periodic job running every 30 minutes
Progress
- Add kpromo cip replicate-signatures subcommand (Add kpromo cip replicate-signatures subcommand #1715)
- Release v4.3.0 with the new subcommand (Release k-sigs/promo-tools@v4.3.0 #1722)
- Merge ci-k8sio-image-signature-replication periodic Prow job (Add periodic signature replication job for image promoter kubernetes/test-infra#36516)
- Soak and verify ci-k8sio-image-signature-replication job reliability (sig-release-releng-informing, sig-k8s-infra-k8sio)
- Remove the inline replicate phase from PromoteImages() in promoter/image/promoter.go (pipeline goes from 8 phases to 7: setup → plan → provenance → validate → promote → sign → attest)
- Release v4.4.0 and roll out to Prow jobs
- Pin the Prow jobs back to v4.4.0; they have been using the latest staging image via kubernetes/test-infra#36525 (Use staging latest image for image promotion jobs) and kubernetes/test-infra#36521 (Use staging latest image for signature replication job)
Until the final rollout, both the inline phase and the periodic job run in parallel, which is safe since replication is idempotent. This overlap period validates the periodic job before the inline phase is removed from production.
Trade-offs
Benefits:
- Signing and replication have completely independent rate budgets
- Replication failures don't block or fail the promotion job
- Self-healing: missed or failed replications are caught on the next run
- Replication concurrency can be tuned independently
Costs:
- Consistency window: mirrors may lack signatures for up to one period (30 min), which is acceptable since mirrors are already eventually consistent
- Additional Prow job to maintain