feat: Support Spark Streaming trigger modes (Trigger.ProcessingTime) for continuous per-model processing #28
Merged
…for continuous per-model processing

Add processing_time trigger mode that continuously loops models: discover files -> process batches -> sleep -> repeat. Fast models are no longer blocked by slow ones during long dbt runs.

New files:
- trigger_config.py: TriggerConfig dataclass + interval parser
- test_trigger_config.py: 27 parsing/validation tests
- test_trigger_lifecycle.py: 11 lifecycle/shutdown tests

Modified:
- constants.py: trigger-related constants (30-day timeout)
- impl.py: wait_for_next_cycle(), parse_trigger(), signal handlers
- incremental.sql: trigger config, dynamic loop cap, inter-cycle sleep
- README.md: new Streaming / Continuous Processing section

Closes #27

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dbt's Jinja sandbox blocks range() calls larger than 100,000. The previous value of 999,999,999 caused 'Range too big' errors. 99,999 cycles at 10s intervals = ~11.5 days, well within the 30-day timeout for processing_time models.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
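The arithmetic behind the new cap can be checked with a few lines of Python. This is only a sketch: the helper name is hypothetical, and it assumes each loop iteration sleeps for one full interval, so the result is a lower bound that ignores the time spent processing batches.

```python
SECONDS_PER_DAY = 86_400
# Limit enforced by Jinja's sandboxed range() (jinja2.sandbox.MAX_RANGE).
JINJA_SANDBOX_MAX_RANGE = 100_000


def min_loop_runtime_days(cycles: int, interval_seconds: float) -> float:
    """Lower bound on loop runtime: counts only the sleep between cycles."""
    return cycles * interval_seconds / SECONDS_PER_DAY


loop_cap = 99_999
# The cap has to stay within the sandbox limit, or range() raises "Range too big".
assert loop_cap <= JINJA_SANDBOX_MAX_RANGE

print(round(min_loop_runtime_days(loop_cap, 10), 2))  # 11.57
```

With longer intervals the same cap lasts proportionally longer, so whether the loop cap or the job timeout is reached first depends on the configured interval.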
Summary
Adds processing_time trigger mode that continuously loops models: discover files → process batches → sleep → repeat. Fast models are no longer blocked by slow ones during long dbt run sessions.

Closes #27
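The discover → process → sleep → repeat cycle from the summary can be pictured in plain Python. This is a minimal sketch of the control flow only, not the adapter's Jinja implementation: run_cycles and its callable parameters are hypothetical names, and the wait callable stands in for a function that returns False when the loop should stop.

```python
from typing import Callable, List


def run_cycles(
    discover: Callable[[], List[str]],
    process: Callable[[str], None],
    wait_for_next_cycle: Callable[[], bool],
    max_cycles: int,
) -> int:
    """Run up to max_cycles cycles and return the number of batches processed."""
    processed = 0
    for _ in range(max_cycles):
        # Discover whatever is new, then process it batch by batch.
        for batch in discover():
            process(batch)
            processed += 1
        # Sleep until the next cycle; a False return means "stop looping".
        if not wait_for_next_cycle():
            break
    return processed


# Simulated run: three discovery rounds, then the wait callable asks to stop.
pending = [["a.json", "b.json"], [], ["c.json"]]
seen: List[str] = []
waits = iter([True, True, False])

total = run_cycles(
    discover=lambda: pending.pop(0) if pending else [],
    process=seen.append,
    wait_for_next_cycle=lambda: next(waits),
    max_cycles=10,
)
print(total, seen)  # 3 ['a.json', 'b.json', 'c.json']
```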
Changes
- trigger_config.py: TriggerConfig frozen dataclass + parse_trigger_config() with interval parser (10s, 1 minute, etc.)
- constants.py: DEFAULT_PROCESSING_TIME_TIMEOUT_SECONDS (30 days)
- impl.py: wait_for_next_cycle() (interruptible sleep via threading.Event), parse_trigger(), reset_cycle_count(), get_processing_time_timeout(), SIGTERM/SIGINT signal handlers
- incremental.sql: trigger config, dynamic loop cap (999M for processing_time), inter-cycle sleep + re-discover, auto 30-day timeout, cache clearing between cycles
- README.md: new Streaming / Continuous Processing section
- test_trigger_config.py: 27 parsing/validation tests
- test_trigger_lifecycle.py: 11 lifecycle/shutdown tests

Usage
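The interval strings mentioned in the change list (10s, 1 minute) hint at the accepted format. As a rough illustration, a parser for such strings paired with a frozen dataclass might look like the following. Everything here is an assumption: TriggerConfigSketch, parse_interval_sketch, and the unit table are hypothetical and are not taken from trigger_config.py.

```python
import re
from dataclasses import dataclass

# Hypothetical unit table; the real parser may accept a different set of units.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
}
_INTERVAL_RE = re.compile(r"^\s*(\d+)\s*([a-zA-Z]+)\s*$")


@dataclass(frozen=True)
class TriggerConfigSketch:
    """Immutable trigger settings (illustrative stand-in, not the real TriggerConfig)."""
    mode: str
    interval_seconds: int


def parse_interval_sketch(text: str) -> int:
    """Convert strings such as '10s' or '1 minute' into a number of seconds."""
    match = _INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown interval unit: {unit!r}")
    if value <= 0:
        raise ValueError("interval must be positive")
    return value * _UNIT_SECONDS[unit]


config = TriggerConfigSketch(
    mode="processing_time",
    interval_seconds=parse_interval_sketch("1 minute"),
)
print(config.interval_seconds)  # 60
```

Freezing the dataclass means a parsed configuration cannot be mutated by accident once a model's loop has started.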
Design Decisions
- processing_time uses a very large loop cap (999,999,999 as first written; a follow-up commit lowered it to 99,999 to stay under the Jinja sandbox's range() limit). When files are exhausted, Jinja calls adapter.wait_for_next_cycle(), which sleeps and returns True/False.
- Table materialization does not support processing_time (table does a full DELETE each time, which is incompatible with continuous processing).
- threading.Event for graceful shutdown: thread-safe, interruptible sleep via event.wait(timeout=interval).
- Auto 30-day timeout for processing_time models (user can override via job_timeout_seconds).

Testing
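The interruptible sleep described under Design Decisions could be exercised roughly as follows. This is a minimal sketch, not the adapter's code or its test suite: CycleWaiter and request_shutdown are hypothetical stand-ins for wait_for_next_cycle() and for whatever the SIGTERM/SIGINT handlers call.

```python
import threading
import time


class CycleWaiter:
    """Sleeps between cycles but wakes immediately when shutdown is requested."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval_seconds = interval_seconds
        self._shutdown = threading.Event()

    def request_shutdown(self) -> None:
        # A signal handler could call this; setting an Event is thread-safe.
        self._shutdown.set()

    def wait_for_next_cycle(self) -> bool:
        """Return True to continue looping, False if shutdown was requested."""
        # Event.wait returns True once the flag is set, False on timeout.
        return not self._shutdown.wait(timeout=self.interval_seconds)


# Normal case: the interval elapses and the loop should continue.
waiter = CycleWaiter(interval_seconds=0.05)
assert waiter.wait_for_next_cycle() is True

# Shutdown case: a 30-second sleep is cut short by a request from another thread.
slow = CycleWaiter(interval_seconds=30.0)
threading.Timer(0.1, slow.request_shutdown).start()
start = time.monotonic()
keep_going = slow.wait_for_next_cycle()
elapsed = time.monotonic() - start

assert keep_going is False
assert elapsed < 5.0  # woke up long before the 30-second interval ended
```

Compared with time.sleep(), waiting on the Event lets a shutdown request end the sleep at once instead of holding the process until the interval expires.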