fix(graphql): prevent memory leak and deadlock in subscription resolvers #5397
Open
Sanchit2662 wants to merge 1 commit into litmuschaos:master from
- Add proper cleanup in GetInfraEvents to remove channels on disconnect
- Use non-blocking sends in SendInfraEvent to prevent mutex deadlock
- Add mutex protection to map deletes in GetPodLog, GetKubeObject, GetKubeNamespace

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Hi @PriteshKiri, @amityt, @SarthakJain26, whenever you get a chance, I’d really appreciate a review. Thanks!
Summary
This PR fixes a critical concurrency issue in the ChaosCenter GraphQL subscription layer that could lead to unbounded memory growth and a process-wide deadlock under normal UI usage.
Specifically, GetInfraEvents subscriptions were leaking channels after client disconnects, and SendInfraEvent could block indefinitely while holding a shared mutex. Over time, this caused the GraphQL server to become unresponsive with no crash logs or clear error signals.

The fix ensures proper subscription cleanup, prevents blocking sends, and hardens related cleanup paths against concurrent map access.
Fix
1. Proper subscription cleanup on disconnect
Channels are now removed from the publisher slice when the subscription context is cancelled.
2. Non-blocking event delivery to prevent deadlocks
Event publishing no longer blocks on slow or disconnected subscribers.
This ensures one stalled subscription cannot block the entire system.
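A non-blocking send in Go is a `select` with a `default` case. The sketch below assumes the same hypothetical `Publisher` model as above (names are illustrative, not the actual `SendInfraEvent` code). Even though the mutex is held during the loop, no send can block, so the lock is always released promptly. A subscriber whose buffer is full simply misses that event.

```go
package main

import (
	"fmt"
	"sync"
)

// InfraEvent is a hypothetical stand-in for the GraphQL event payload.
type InfraEvent struct{ Name string }

// Publisher is a hypothetical simplification: a mutex guarding
// projectID -> subscriber channels.
type Publisher struct {
	mu          sync.Mutex
	subscribers map[string][]chan *InfraEvent
}

// Publish delivers ev to every subscriber of projectID without ever blocking
// while the mutex is held. It returns how many subscribers were skipped.
func (p *Publisher) Publish(projectID string, ev *InfraEvent) (dropped int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, ch := range p.subscribers[projectID] {
		select {
		case ch <- ev: // delivered: buffer had room or a receiver was waiting
		default: // would block: skip this subscriber instead of stalling everyone
			dropped++
		}
	}
	return dropped
}

func main() {
	p := &Publisher{subscribers: map[string][]chan *InfraEvent{}}
	stalled := make(chan *InfraEvent, 1) // a client that stopped reading
	live := make(chan *InfraEvent, 1)    // a client that keeps up
	p.subscribers["project-1"] = []chan *InfraEvent{stalled, live}

	fmt.Println(p.Publish("project-1", &InfraEvent{Name: "a"})) // 0: both buffers had room
	<-live                                                      // the live client consumes "a"
	fmt.Println(p.Publish("project-1", &InfraEvent{Name: "b"})) // 1: stalled's buffer is full
}
```

The trade-off is liveness over guaranteed delivery: a slow client can miss events, while every other caller of the mutex keeps making progress. A larger per-subscriber buffer would reduce how often events are dropped.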
3. Thread-safe cleanup in related subscriptions
Cleanup paths in GetPodLog, GetKubeObject, and GetKubeNamespace now properly guard map deletes with the shared mutex, preventing concurrent map access panics.