Skip to content

Fix subscriber crash on Workflow ADD events (nil pointer dereference)#5406

Open
WHOIM1205 wants to merge 2 commits intolitmuschaos:masterfrom
WHOIM1205:fix/nil-pointer-workflow-handler
Open

Fix subscriber crash on Workflow ADD events (nil pointer dereference)#5406
WHOIM1205 wants to merge 2 commits intolitmuschaos:masterfrom
WHOIM1205:fix/nil-pointer-workflow-handler

Conversation

@WHOIM1205
Copy link

What this PR does

Fixes a nil pointer dereference in the ChaosCenter subscriber that caused the subscriber to crash during informer re-list on startup or restart when existing Workflow CRs were present.

Why this change is needed

Kubernetes informers deliver ADD events for all existing resources during initial List or after a restart.
WorkflowEventHandler was assuming oldObj was always non-nil and accessed oldObj.Status.Nodes, which is invalid for ADD events and resulted in a panic.

This caused the subscriber to enter a crash loop in clusters with active or completed chaos workflows.

What changed

  • Guarded access to oldObj using:
    • eventType == "UPDATE"
    • oldObj != nil
  • Preserved existing Pending → Running transition logic for UPDATE events
  • For ADD events, use the current chaos experiment status directly

Impact

  • Subscriber no longer crashes on startup or restart
  • Informer cache warms correctly
  • Chaos experiment updates resume immediately
  • No behavior change for UPDATE events

Risk

Low. The fix is a minimal defensive guard and does not alter business logic for valid update paths.

Test case:

  • Call WorkflowEventHandler(nil, workflowObj, "ADD", startTime)
  • Workflow contains a ChaosEngine node in Running phase
  • ChaosData is present (cd != nil)

Assertions:

  • Function does not panic
  • Returned workflow details use cd.ExperimentStatus
  • No access to old workflow state is attempted

This test validates correct handling of informer re-list behavior and prevents regression.

@WHOIM1205
Copy link
Author

Hi @ispeakc0de
This PR fixes a critical crash in the subscriber caused by informer ADD events on restart.

I’d appreciate a review when you have time.
Happy to make any changes or add tests if needed.

Thanks!

@ispeakc0de
Copy link
Member

@WHOIM1205, Changes looks good. Can you fix the PR checks?

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
@WHOIM1205 WHOIM1205 force-pushed the fix/nil-pointer-workflow-handler branch from d782c8e to 58c6c2e Compare January 21, 2026 11:30
@WHOIM1205
Copy link
Author

@WHOIM1205, Changes looks good. Can you fix the PR checks?

Thanks for the review

The failing CI was due to a goimports ordering issue in a test file.
I’ve fixed it and pushed the update. All checks should be passing now.

Please let me know if anything else is needed. Thanks!

@WHOIM1205
Copy link
Author

Hi @Ispeakcode, thanks for the review!

All PR checks are passing now. Could you please take another look and let me know if this can be merged, or if anything else is needed from my side?

Thanks!

@WHOIM1205
Copy link
Author

hey @ispeakc0de is there anything which i can change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants