
Fix/applyset prune uid precondition #1040

Open
WHOIM1205 wants to merge 1 commit into kubernetes-sigs:main from WHOIM1205:fix/applyset-prune-uid-precondition

Conversation

@WHOIM1205

applyset: prevent stale deletes during prune using UID preconditions

Summary

This PR fixes a time-of-check to time-of-use (TOCTOU) race in the ApplySet prune logic.

Between the LIST and DELETE steps, a resource can be deleted and recreated with the same name but a different UID. Without a UID precondition on DELETE, the newly recreated object could be incorrectly deleted.

This change adds a UID precondition to prune DELETE calls and correctly handles 409 Conflict responses.


Problem

The prune flow:

  1. LIST candidate resources.
  2. DELETE each resource by name.

If a resource is recreated between these steps, the DELETE may remove the replacement object because it only matches by name.

When a UID precondition is used, the API server returns a 409 Conflict if the UID no longer matches. This is expected and safe behavior, but it must not cause prune to fail.
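
To make the race concrete, here is a minimal, self-contained sketch. It does not use the real Kubernetes client: `object`, `store`, `deleteByName`, and `simulateRace` are hypothetical names, and the in-memory map only stands in for the API server. It replays the sequence described above and shows that a name-only DELETE removes the replacement object.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes resource.
type object struct {
	Name string
	UID  string
}

// store is a hypothetical in-memory stand-in for the API server, keyed by name.
type store map[string]object

// deleteByName removes whatever object currently holds the name,
// regardless of which UID was observed at LIST time.
func (s store) deleteByName(name string) bool {
	if _, ok := s[name]; !ok {
		return false
	}
	delete(s, name)
	return true
}

// simulateRace replays: LIST, delete+recreate by another actor, DELETE by name.
// It returns the UID seen at LIST time and whether the replacement survived.
func simulateRace() (listedUID string, replacementSurvived bool) {
	s := store{"orphan-cm": {Name: "orphan-cm", UID: "old-uid"}}

	// Step 1: LIST captures the prune candidate.
	listed := s["orphan-cm"]

	// Meanwhile, another actor deletes and recreates the object under the same name.
	delete(s, "orphan-cm")
	s["orphan-cm"] = object{Name: "orphan-cm", UID: "new-uid"}

	// Step 2: DELETE by name cannot tell old from new, so the replacement goes.
	s.deleteByName(listed.Name)

	_, replacementSurvived = s["orphan-cm"]
	return listed.UID, replacementSurvived
}

func main() {
	uid, survived := simulateRace()
	fmt.Printf("listed UID=%s, replacement survived=%v\n", uid, survived)
}
```

In this sketch the object observed at LIST time had UID `old-uid`, yet the object that ends up deleted is the one with `new-uid`.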


Changes

applyset.go

In the prune method DELETE section:

  • Capture UID from the listed object.
  • Pass it via DeleteOptions.Preconditions.UID.
  • Treat both IsNotFound and IsConflict as safe-to-ignore.
  • Only append to Pruned results when err == nil (actual delete occurred).

This ensures:

  • Stale deletes are prevented.
  • Recreated resources are not deleted.
  • Prune does not fail on expected 409 Conflict.
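
The behavior listed above can be sketched without the Kubernetes libraries. This is an illustration under stated assumptions, not the code in applyset.go: `fakeServer`, `deleteWithUID`, `candidate`, `prune`, `errNotFound`, and `errConflict` are hypothetical stand-ins for the API server, `DeleteOptions.Preconditions.UID`, and the `IsNotFound` / `IsConflict` checks. The loop is sequential for brevity, whereas the reviewed code guards its results with a mutex.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound and errConflict are hypothetical stand-ins for the API server's
// 404 Not Found and 409 Conflict responses.
var (
	errNotFound = errors.New("not found")
	errConflict = errors.New("conflict: UID precondition failed")
)

// fakeServer is a hypothetical in-memory stand-in mapping object name to live UID.
type fakeServer map[string]string

// deleteWithUID deletes name only if its live UID equals wantUID.
func (f fakeServer) deleteWithUID(name, wantUID string) error {
	liveUID, ok := f[name]
	if !ok {
		return errNotFound
	}
	if liveUID != wantUID {
		return errConflict
	}
	delete(f, name)
	return nil
}

// candidate is what the LIST step recorded for one prune target.
type candidate struct {
	Name string
	UID  string
}

// prune deletes each candidate with a UID precondition. Only successful
// deletes are reported; not-found and conflict are skipped without error.
func prune(f fakeServer, candidates []candidate) ([]string, error) {
	var pruned []string
	for _, c := range candidates {
		err := f.deleteWithUID(c.Name, c.UID)
		if err != nil {
			if errors.Is(err, errNotFound) || errors.Is(err, errConflict) {
				continue // already gone, or recreated with a new UID
			}
			return pruned, fmt.Errorf("prune %s: %w", c.Name, err)
		}
		pruned = append(pruned, c.Name)
	}
	return pruned, nil
}

func main() {
	server := fakeServer{"stale-cm": "uid-1", "recreated-cm": "uid-new"}
	listed := []candidate{
		{Name: "stale-cm", UID: "uid-1"},       // unchanged since LIST
		{Name: "recreated-cm", UID: "uid-old"}, // recreated after LIST
		{Name: "gone-cm", UID: "uid-3"},        // deleted after LIST
	}
	pruned, err := prune(server, listed)
	fmt.Println(pruned, err, len(server))
}
```

Only `stale-cm` is reported as pruned here; `recreated-cm` stays in the fake server, and neither the conflict nor the not-found case surfaces as an error.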

applyset_test.go

Added TestPrune_UIDPrecondition with two subtests:

| Subtest | Scenario | Expected Result |
| --- | --- | --- |
| uid matches - resource pruned | Normal prune | 1 pruned, no error |
| uid mismatch - resource not pruned | DELETE returns 409 Conflict | 0 pruned, no error, resource still exists |

The test verifies that:

  • UID match behaves normally.
  • UID mismatch safely skips delete.
  • No error is returned on 409.
  • The recreated resource remains intact.

Verification

Screenshot 2026-02-11 090315

Run:

go test ./pkg/controller/instance/applyset/... -v -count=1

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Feb 11, 2026
@k8s-ci-robot
Contributor

Hi @WHOIM1205. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/L (denotes a PR that changes 100-499 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Feb 11, 2026
@WHOIM1205 force-pushed the fix/applyset-prune-uid-precondition branch from 23aa286 to 5c92048 on February 11, 2026 03:42
@WHOIM1205
Author

/assign @a-hilaly
Closed the previous PR after cleanup. This PR contains the corrected fix per review (409 handling + updated test).

Member

@a-hilaly a-hilaly left a comment


Two tiny suggestions, thanks @WHOIM1205 !

Comment on lines +661 to +668
    // Simulate: between LIST and DELETE, the resource was recreated with a new UID.
    // The fake client has the recreated object (new UID), so when DELETE comes in
    // with precondition for the old UID, it should fail with 409 Conflict.
    recreated := newConfigMap("orphan-cm", "default")
    recreated.SetLabels(map[string]string{
        ApplysetPartOfLabel: applySetID,
    })
    recreated.SetUID(types.UID("new-uid"))
Member


looks like this resource isn't used

@a-hilaly
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label on Feb 11, 2026
@a-hilaly
Member

/retest

@WHOIM1205 force-pushed the fix/applyset-prune-uid-precondition branch from 5c92048 to ef0e8fd on February 11, 2026 19:53
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: WHOIM1205
Once this PR has been reviewed and has the lgtm label, please ask for approval from a-hilaly. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@WHOIM1205
Author

@a-hilaly
addressed the review comments:

  • Added a V(2) debug log for 409 Conflict (UID mismatch) cases.
  • Fixed the test to properly use the recreated object when simulating UID mismatch.

Tests passing locally.

Comment on lines 559 to 576
    if err == nil {
        mu.Lock()
        results = append(results, PruneResultItem{Object: c.obj})
        mu.Unlock()

        a.log.V(2).Info("pruned resource",
            "name", c.obj.GetName(),
            "namespace", c.obj.GetNamespace(),
            "gvr", c.gvr.String(),
        )
    }
    if apierrors.IsConflict(err) {
        a.log.V(2).Info("skipped prune due to UID mismatch (resource recreated)",
            "name", c.obj.GetName(),
            "namespace", c.obj.GetNamespace(),
            "gvr", c.gvr.String(),
        )
    }
Member


nit:

Suggested change
-   if err == nil {
-       mu.Lock()
-       results = append(results, PruneResultItem{Object: c.obj})
-       mu.Unlock()
-
-       a.log.V(2).Info("pruned resource",
-           "name", c.obj.GetName(),
-           "namespace", c.obj.GetNamespace(),
-           "gvr", c.gvr.String(),
-       )
-   }
-   if apierrors.IsConflict(err) {
-       a.log.V(2).Info("skipped prune due to UID mismatch (resource recreated)",
-           "name", c.obj.GetName(),
-           "namespace", c.obj.GetNamespace(),
-           "gvr", c.gvr.String(),
-       )
-   }
+   if apierrors.IsConflict(err) {
+       a.log.V(2).Info("skipped prune due to UID mismatch (resource recreated)",
+           "name", c.obj.GetName(),
+           "namespace", c.obj.GetNamespace(),
+           "gvr", c.gvr.String(),
+       )
+   }
+   if err == nil {
+       mu.Lock()
+       results = append(results, PruneResultItem{Object: c.obj})
+       mu.Unlock()
+       a.log.V(2).Info("pruned resource",
+           "name", c.obj.GetName(),
+           "namespace", c.obj.GetNamespace(),
+           "gvr", c.gvr.String(),
+       )
+   }

@a-hilaly
Member

@WHOIM1205 you haven't addressed all the provided feedback

…race

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
@WHOIM1205 force-pushed the fix/applyset-prune-uid-precondition branch from ef0e8fd to 0705e25 on February 12, 2026 19:44
@WHOIM1205
Author

WHOIM1205 commented Feb 12, 2026

> @WHOIM1205 you haven't addressed all the provided feedback

Refactored the post-delete handling into a switch so the conflict, success, and error paths are clearly mutually exclusive. No behavior change, just cleaner structure per feedback. All tests passing.
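
For readers following the thread, one way such a switch could look is sketched below. This is not the PR's actual code: `outcome`, `classify`, and the sentinel errors are hypothetical, with `errNotFound` and `errConflict` standing in for what `apierrors.IsNotFound` and `apierrors.IsConflict` would detect. The point is only that each DELETE result maps to exactly one branch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for API server responses.
var (
	errNotFound = errors.New("not found")
	errConflict = errors.New("conflict")
	errTimeout  = errors.New("timeout") // any other, unexpected failure
)

// outcome names the three mutually exclusive results of a prune DELETE.
type outcome string

const (
	outcomePruned  outcome = "pruned"  // delete happened; record it in the results
	outcomeSkipped outcome = "skipped" // object gone or recreated; not an error
	outcomeFailed  outcome = "failed"  // real error; surface it to the caller
)

// classify maps the error returned by DELETE to exactly one outcome.
func classify(err error) outcome {
	switch {
	case err == nil:
		return outcomePruned
	case errors.Is(err, errConflict), errors.Is(err, errNotFound):
		return outcomeSkipped
	default:
		return outcomeFailed
	}
}

func main() {
	for _, err := range []error{nil, errConflict, errNotFound, errTimeout} {
		fmt.Printf("%v -> %s\n", err, classify(err))
	}
}
```

Under this reading, an item is appended to the prune results only in the `err == nil` branch; conflict and not-found are skipped, and anything else is treated as a failure.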

@WHOIM1205
Author

@a-hilaly, @jakobmoellerdev is there anything I can change in this PR?

    }

-   if err != nil && !apierrors.IsNotFound(err) {
+   switch {
Member


This switch is a bit hard to understand. My understanding is that if err != nil and it is a not-found or conflict error, we no longer append the item to the prune result. Is that correct?

@jakobmoellerdev
Member

@WHOIM1205 friendly reminder

@a-hilaly
Member

friendly reminder @WHOIM1205


Labels

  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • ok-to-test (indicates a non-member PR verified by an org member that is safe to test)
  • size/L (denotes a PR that changes 100-499 lines, ignoring generated files)

4 participants