Add Daily Dedupe Digest workflow #48244
Adds .github/workflows/dedupe-digest.yml: a scheduled daily workflow (08:00 UTC, plus workflow_dispatch) that maintains a rolling "[Dedupe Digest] YYYY-MM-DD" issue assigned to niels9001 (configurable via env) so duplicates can be reviewed manually in one place.

Each run:
- Aggregates carry-overs from all open digests via stable HTML-comment markers <!-- candidate:NEW=N ORIG=O -->. Drops candidates whose new or original issue is closed, or whose new issue already bears the 'duplicate' label.
- Discovers fresh candidates by asking gpt-4o-mini to compare each new issue against a pool of recently-updated open issues (biased toward candidates sharing a Product-* label when present).
- Categorizes candidates as AI-flagged (already labeled 'duplicate' by automatic-issue-deduplication.yml), needs-review, or low-confidence.
- Creates the new digest, then comments "Superseded by #N" and closes ALL prior open digests.

Bootstraps the dedupe-digest label idempotently on first run.

Augments (does not replace) the existing automatic-issue-deduplication workflow; coexists with Fabric Bot's Resolution-Duplicate auto-close (which only fires on human /dup, not the AI-applied 'duplicate' label).

Sanitizes attacker-controlled text (issue titles, model-produced reasons) to prevent injection of fake carry-over markers into future digests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜 action log, 👼 SARIF report, or 📝 job summary for details.

Unrecognized words (2): dedup

These words are not needed and should be removed: Dedup DWRITE LWIN nonstd VCENTER VREDRAW

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands... in a clone of the git@github.com:niels9001/PowerToys.git repository:

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/cfb6f7e75bbfc89c71eaa30366d0c166f1bd9c8c/apply.pl' |
perl - 'https://github.com/microsoft/PowerToys/actions/runs/26762891586/attempts/1' &&
git commit -m 'Update check-spelling metadata'

Warnings (count: 2)
If the flagged items are 🤯 false positives

If items relate to a ...

- binary file (or some other file you wouldn't want to check at all).
  Please add a file path to the excludes.txt file matching the containing file.
  File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.
  ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).
- well-formed pattern.
  If you can write a pattern that would match it, try adding it to the patterns.txt file.
  Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.
  Note that patterns can't match multiline strings.
  cancel-in-progress: false

env:
  DIGEST_ASSIGNEE: niels9001

async function askModelForDupes(newIssue, pool) {
  const token = process.env.GITHUB_TOKEN;
  if (!token) {
    console.log('GITHUB_TOKEN is not set; skipping AI dedup.');
Summary
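Adds .github/workflows/dedupe-digest.yml, a daily scheduled workflow (08:00 UTC + workflow_dispatch) that maintains exactly one open [Dedupe Digest] YYYY-MM-DD issue assigned to @niels9001. Goal: give triagers a single place to manually review duplicate candidates each day.

Behavior

Each run:
- Finds open digests by the dedupe-digest label AND the title prefix.
- Aggregates carry-overs from those digests via stable HTML-comment markers <!-- candidate:NEW=N ORIG=O -->. Drops candidates where:
  - the new or original issue is closed, or
  - the new issue already bears the duplicate label.
- Discovers fresh candidates by asking gpt-4o-mini to compare each new issue against a pool of recently-updated open issues, biased toward candidates sharing a Product-* label when one exists.
- Categorizes candidates as:
  - AI-flagged: automatic-issue-deduplication.yml already applied the duplicate label
  - needs-review: no duplicate label yet
  - low-confidence
- Creates the new digest (titled [Dedupe Digest] YYYY-MM-DD, labeled dedupe-digest, assigned to the configured user).
- Closes ALL prior open digests with a Superseded by #N comment.

If no carry-overs and no new candidates: skips creation; an open prior digest (if any) stays open until manually closed.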
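
The carry-over step described above can be sketched roughly as follows. This is an illustrative sketch, not the workflow's actual code: the helper names (parseCarryOvers, shouldKeep), the in-memory issues map, and its { state, labels } shape are all assumptions made for the example. Only the marker format and the two drop conditions come from the description.

```javascript
// Hypothetical helpers: names and data shapes are assumptions, not the PR's code.
// The marker format <!-- candidate:NEW=N ORIG=O --> is taken from the PR description.
const MARKER_RE = /<!--\s*candidate:NEW=(\d+)\s+ORIG=(\d+)\s*-->/g;

// Extract { newNum, origNum } pairs from the body of a prior digest issue.
function parseCarryOvers(body) {
  const found = [];
  for (const m of String(body || '').matchAll(MARKER_RE)) {
    found.push({ newNum: Number(m[1]), origNum: Number(m[2]) });
  }
  return found;
}

// issues: Map of issue number -> { state: 'open' | 'closed', labels: string[] } (assumed shape).
// Keeps a carry-over only if both issues are still open and the new issue
// does not already carry the 'duplicate' label.
function shouldKeep(candidate, issues) {
  const newer = issues.get(candidate.newNum);
  const original = issues.get(candidate.origNum);
  if (!newer || !original) return false; // unknown issue: dropped here by assumption
  if (newer.state !== 'open' || original.state !== 'open') return false;
  return !newer.labels.includes('duplicate');
}

// Example usage with made-up issue numbers.
const body = 'row <!-- candidate:NEW=12 ORIG=7 -->\nrow <!-- candidate:NEW=30 ORIG=7 -->';
const issues = new Map([
  [7, { state: 'open', labels: [] }],
  [12, { state: 'open', labels: [] }],
  [30, { state: 'open', labels: ['duplicate'] }],
]);
const kept = parseCarryOvers(body).filter((c) => shouldKeep(c, issues));
console.log(kept);
```

Keeping the marker in an HTML comment means it survives edits to the visible digest text, which is presumably why the description calls the markers "stable".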
Coexistence with existing automation
- Augments (does not replace) automatic-issue-deduplication.yml. That workflow keeps applying the duplicate label per-issue; the digest just surfaces what it did (plus medium/low-confidence candidates that the per-issue action didn't flag).
- Coexists with Fabric Bot's auto-close: resourceManagement.yml fires only on Resolution-Duplicate (set by human /dup), not on the duplicate label applied by the AI action. The digest does not touch either label.
- Bootstraps the dedupe-digest label idempotently on first run.

Configuration (workflow env)
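
The diff excerpt earlier in this thread shows DIGEST_ASSIGNEE: niels9001 in the workflow env block. A script step might read it along these lines. This is a sketch under assumptions: the resolveAssignee helper, the leading-@ stripping, and the fallback default are illustrative and not confirmed by the PR.

```javascript
// Hypothetical helper: reads the assignee from an env-like object.
// Assumption: an unset or blank value falls back to the default seen in the workflow env.
function resolveAssignee(env) {
  const raw = String(env.DIGEST_ASSIGNEE || '').trim().replace(/^@/, '');
  return raw || 'niels9001';
}

// In a workflow script this would typically be called with process.env.
console.log(resolveAssignee({ DIGEST_ASSIGNEE: ' @octocat ' }));
console.log(resolveAssignee({}));
```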
Safety
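
The PR description says attacker-controlled text (issue titles, model-produced reasons) is sanitized so it cannot inject fake carry-over markers into future digests. One way this could be done is sketched below. The sanitizeForDigest name, the 200-character cap, and the choice to escape every angle bracket are assumptions for illustration; the workflow's actual sanitizer may differ.

```javascript
// Hypothetical sanitizer: neutralizes HTML comment syntax in untrusted text
// before it is written into a digest body.
function sanitizeForDigest(text, maxLen = 200) {
  return String(text ?? '')
    .replace(/[\r\n]+/g, ' ') // keep each entry on a single line
    .slice(0, maxLen)         // cap length before escaping (assumed limit)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')    // no '<' left, so no '<!--' can open a comment
    .replace(/>/g, '&gt;');
}

// A malicious title that tries to smuggle in a carry-over marker.
const title = 'Crash <!-- candidate:NEW=1 ORIG=2 -->';
console.log(sanitizeForDigest(title));
```

Escaping all angle brackets is broader than strictly necessary, but it avoids reasoning about partial sequences such as a comment opener split across truncation boundaries.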
Companion PR
The PR auto-labeler is in a separate PR for focused review.