feat: add scheduled workflow history cleaner #1122

Open
youngxpepp wants to merge 1 commit into conductor-oss:main from youngxpepp:feat/workflow-history-cleaner

Conversation

@youngxpepp

Summary

Closes #1081.

Adds WorkflowHistoryCleaner, a Spring @Scheduled component that periodically deletes terminal workflows (COMPLETED/FAILED/TERMINATED/TIMED_OUT) older than the configured retention threshold.

  • Pages candidates with IndexDAO#searchArchivableWorkflows and delegates the actual removal to ExecutionDAOFacade#removeWorkflow, reusing the existing execution-store / index ordering guarantees.
  • Single-instance execution is enforced through the pluggable com.netflix.conductor.core.sync.Lock abstraction with non-blocking tryLock semantics — contenders exit immediately instead of queueing on the lock.
  • Cooperative shutdown via LifecycleAwareComponent: any in-flight run checks isRunning() between iterations and at workflow boundaries.
  • Safety rails: bounded iterations per day, fixed-size LRU of recently-processed ids (mitigates async index-delete lag), and configurable batch pauses.
  • Disabled by default to keep existing deployments unaffected; opt in with conductor.workflow-history-cleanup.enabled=true.
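The non-blocking lock and cooperative shutdown behavior described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CleanerRunSketch`, `runOnce`, and `stop` are hypothetical names, `java.util.concurrent.locks.ReentrantLock` stands in for Conductor's pluggable `Lock` abstraction, and an `AtomicBoolean` stands in for `LifecycleAwareComponent#isRunning()`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: single-instance run guarded by a non-blocking lock.
class CleanerRunSketch {
    // Stand-in for the distributed lock (assumption: the PR uses Conductor's Lock instead).
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the lifecycle flag checked between workflows.
    private final AtomicBoolean running = new AtomicBoolean(true);

    /** Returns the number of ids handled, or -1 if another run already holds the lock. */
    int runOnce(List<String> candidateIds) {
        // tryLock() returns immediately; a contender exits instead of queueing.
        if (!lock.tryLock()) {
            return -1;
        }
        try {
            int handled = 0;
            for (String id : candidateIds) {
                // Cooperative shutdown: stop at the next workflow boundary.
                if (!running.get()) {
                    break;
                }
                // A real implementation would remove the workflow here.
                handled++;
            }
            return handled;
        } finally {
            // Always release the lock, even if removal throws.
            lock.unlock();
        }
    }

    /** Signals any in-flight or future run to stop at the next boundary. */
    void stop() {
        running.set(false);
    }
}
```

Releasing the lock in `finally` keeps a failed run from blocking later runs until the lease expires, which is presumably why a lock TTL is still configured as a backstop.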

Rolling indices (conductor_task_log_*, conductor_event_*, conductor_message_*) are out of scope — manage their retention through index-level lifecycle policies (e.g. OpenSearch ISM).

Key Properties

| Property | Default | Description |
| --- | --- | --- |
| `conductor.workflow-history-cleanup.enabled` | `false` | Master switch |
| `conductor.workflow-history-cleanup.cron` | `0 0 * * * *` | Schedule (top of every hour) |
| `conductor.workflow-history-cleanup.zone` | `UTC` | Cron time zone |
| `conductor.workflow-history-cleanup.retention-days` | `30` | Lower bound of the deletion window |
| `conductor.workflow-history-cleanup.catch-up-days` | `7` | Number of day-buckets walked per run |
| `conductor.workflow-history-cleanup.max-iterations-per-day` | `200` | Safety cap on the iteration loop |
| `conductor.workflow-history-cleanup.processed-cache-size` | `5000` | LRU size for recently-processed ids |
| `conductor.workflow-history-cleanup.batch-pause` | `500ms` | Pause between iterations |
| `conductor.workflow-history-cleanup.index-refresh-wait` | `2s` | Wait when the same ids keep coming back |
| `conductor.workflow-history-cleanup.lock-id` | `workflow-history-cleanup` | Lock identifier |
| `conductor.workflow-history-cleanup.lock-lease-time` | `2h` | Lock TTL |
| `conductor.workflow-history-cleanup.workflow-index-name` | (derived from index-prefix) | Override only if needed |
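For readers unfamiliar with the fixed-size LRU that `processed-cache-size` bounds, one common way to build such a cache in plain Java is an access-ordered `LinkedHashMap`. The sketch below is illustrative only: `ProcessedIdCache` and its methods are hypothetical names, and the PR's actual data structure may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: bounded LRU of recently processed workflow ids.
class ProcessedIdCache {
    private final Map<String, Boolean> ids;

    ProcessedIdCache(final int maxSize) {
        // accessOrder=true keeps the least recently used entry at the head.
        this.ids = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used id once the cap is exceeded.
                return size() > maxSize;
            }
        };
    }

    /** Records the id; returns true only if it was not already cached. */
    boolean markProcessed(String id) {
        return ids.put(id, Boolean.TRUE) == null;
    }

    /** Checks membership without changing the recency order. */
    boolean contains(String id) {
        return ids.containsKey(id);
    }

    int size() {
        return ids.size();
    }
}
```

A cleaner could skip any id for which `markProcessed` returns `false`, so a workflow that the index still returns because of asynchronous delete lag is not removed twice.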

SchedulerConfiguration thread pool size bumped from 3 to 4 to accommodate the new scheduled job.

Test plan

  • ./gradlew :conductor-core:test — 18 new test cases in TestWorkflowHistoryCleaner cover lock contention, async index-delete lag, mid-run shutdown, exception handling, iteration caps, and null-result edge cases.
  • ./gradlew :conductor-core:spotlessCheck
  • ./gradlew :conductor-core:compileJava :conductor-core:compileTestJava
  • Manual smoke test against a real Conductor deployment with OpenSearch (reviewer to validate end-to-end behavior)

🤖 Generated with Claude Code

Adds WorkflowHistoryCleaner, a Spring @Scheduled component that periodically
deletes terminal workflows older than the configured retention threshold.
The batch pages over candidates with IndexDAO#searchArchivableWorkflows and
delegates the actual removal to ExecutionDAOFacade#removeWorkflow, so it
reuses the existing execution-store / index ordering guarantees.

Single-instance execution is enforced through the pluggable
com.netflix.conductor.core.sync.Lock abstraction with non-blocking tryLock
semantics. Cooperative shutdown is handled via LifecycleAwareComponent.
The component is disabled by default; opt in with
conductor.workflow-history-cleanup.enabled=true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@youngxpepp
Author

@akhilpathivada Could you take a look when you get a chance? This PR addresses #1081 — based on your earlier comment on the issue, I'd appreciate your review on the initial scope (global TTL only; field-value-based retention left as a follow-up).


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add periodic batch job to clean up old terminal workflow history

1 participant