
[perf-improver] perf: fast path in HumanReadableDurationFormatter.Render for sub-hour durations #8861

Open
Evangelink wants to merge 1 commit into main from perf-assist/duration-formatter-fast-path-fb8645e5dcd1ce84

Conversation

@Evangelink
Member

🤖 This is an automated contribution from Perf Improver.

Goal and Rationale

HumanReadableDurationFormatter.Render is called multiple times per terminal progress-frame render tick — once per visible test-worker line for the "duration unchanged" fast-check, and once more inside AppendTestWorkerProgress/AppendTestWorkerDetail when the line needs a full re-render. For a 4-assembly run refreshing at ~5 fps, this adds up to ~40+ calls per second over the lifetime of the test run.

Each call currently allocates:

  1. A new StringBuilder()
  2. One or two intermediate strings via GetFormattedPart (e.g. "5s", "59s", " 05s")
  3. The final stringBuilder.ToString() result

That is 3–4 heap allocations per call — all for a tiny string like "(5s)" or "(2m 30s)".

Approach

On .NET 8+, use string.Create(IFormatProvider, Span<char>, ref DefaultInterpolatedStringHandler) with a stackalloc buffer. This overload uses the span as a scratch buffer and produces the final heap string in a single allocation — no StringBuilder, no intermediate GetFormattedPart strings.

The fast path activates when:

  • Days == 0 && Hours == 0 (covers virtually all test runs)
  • showMilliseconds == false (the default for all progress-frame callers)

Both conditions are true for every caller in AnsiTerminalTestProgressFrame and SimpleTerminalBase.

The slow path (days, hours, or showMilliseconds=true) is unchanged; it is rarely reached and is not in the render hot path.
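The approach above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `TryRenderFast` and the null-return fallback convention are hypothetical, and the output format (seconds zero-padded only when a minutes part precedes them) is an assumption inferred from the examples "(5s)", "(2m 30s)" and " 05s" in this description. The `#if NET8_0_OR_GREATER` guard is omitted for brevity.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not the PR's code): returns null when the duration is
// not eligible, so the caller can fall back to the StringBuilder slow path.
static string? TryRenderFast(TimeSpan duration, bool wrapInParentheses = true, bool showMilliseconds = false)
{
    if (duration.Days != 0 || duration.Hours != 0 || showMilliseconds)
    {
        return null;
    }

    int minutes = duration.Minutes;
    int seconds = duration.Seconds;

    // Stack-allocated scratch space for the interpolated string handler.
    // The longest output here, "(59m 59s)", is 9 characters, so 16 leaves headroom.
    Span<char> scratch = stackalloc char[16];

    // Assumed format: seconds are zero-padded only when minutes are shown.
    if (minutes == 0)
    {
        return wrapInParentheses
            ? string.Create(CultureInfo.InvariantCulture, scratch, $"({seconds}s)")
            : string.Create(CultureInfo.InvariantCulture, scratch, $"{seconds}s");
    }

    return wrapInParentheses
        ? string.Create(CultureInfo.InvariantCulture, scratch, $"({minutes}m {seconds:00}s)")
        : string.Create(CultureInfo.InvariantCulture, scratch, $"{minutes}m {seconds:00}s");
}

Console.WriteLine(TryRenderFast(TimeSpan.FromSeconds(5)));                             // (5s)
Console.WriteLine(TryRenderFast(TimeSpan.FromSeconds(150)));                           // (2m 30s)
Console.WriteLine(TryRenderFast(TimeSpan.FromSeconds(125), wrapInParentheses: false)); // 2m 05s
Console.WriteLine(TryRenderFast(TimeSpan.FromHours(2)) ?? "(not eligible, use slow path)");
```

The sketch splits the four cases with an `if` plus two ternaries rather than one nested ternary; either shape produces the same four strings.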

Performance Evidence

| Scenario | Before | After |
| --- | --- | --- |
| Allocations per Render call (typical: < 1 min) | 3–4 (StringBuilder + 1–2 GetFormattedPart strings + result) | 1 (result only) |
| Allocations per Render call (> 1 hour) | 3–4 | 3–4 (unchanged, slow path) |
| string.Create heap allocations | n/a | 1 (final string only; scratch buffer is on stack) |

Methodology: code inspection + allocation analysis. HumanReadableDurationFormatter.Render is called ~5× per render frame; at 5 fps over a 5-minute run that is ~7 500 calls, saving ~15 000–22 500 small string allocations.
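Since the figures above come from code inspection, one way to check them empirically is to compare bytes allocated on the current thread for each strategy. The sketch below is only an illustration under stated assumptions: `ViaStringBuilder` is a hypothetical approximation of the pre-change path (the real GetFormattedPart may differ), `ViaInterpolation` is a plain string-interpolation baseline, and `ViaStringCreate` mirrors the proposed fast path. `GC.GetAllocatedBytesForCurrentThread` is a real .NET API; all other names are made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical approximation of the pre-change path: a StringBuilder plus
// intermediate part strings. The real implementation may differ.
static string ViaStringBuilder(int minutes, int seconds)
{
    var sb = new StringBuilder();
    sb.Append('(');
    if (minutes > 0)
    {
        sb.Append(minutes.ToString(CultureInfo.InvariantCulture) + "m ");
        sb.Append(seconds.ToString("00", CultureInfo.InvariantCulture) + "s");
    }
    else
    {
        sb.Append(seconds.ToString(CultureInfo.InvariantCulture) + "s");
    }
    sb.Append(')');
    return sb.ToString();
}

// Baseline: ordinary string interpolation, no explicit scratch buffer.
static string ViaInterpolation(int minutes, int seconds)
    => minutes > 0 ? $"({minutes}m {seconds:00}s)" : $"({seconds}s)";

// Mirrors the proposed fast path: string.Create with a stackalloc scratch buffer.
static string ViaStringCreate(int minutes, int seconds)
{
    Span<char> scratch = stackalloc char[16];
    return minutes > 0
        ? string.Create(CultureInfo.InvariantCulture, scratch, $"({minutes}m {seconds:00}s)")
        : string.Create(CultureInfo.InvariantCulture, scratch, $"({seconds}s)");
}

// Returns the bytes allocated on this thread while calling render `calls` times.
static long MeasureBytes(Func<int, int, string> render, int calls)
{
    render(2, 30); // warm-up so one-time allocations are not counted
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < calls; i++)
    {
        render(i % 60, i % 60);
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int Calls = 7_500; // matches the call-count estimate above
long builderBytes = MeasureBytes(ViaStringBuilder, Calls);
long interpolationBytes = MeasureBytes(ViaInterpolation, Calls);
long createBytes = MeasureBytes(ViaStringCreate, Calls);

Console.WriteLine($"StringBuilder:        {builderBytes} bytes");
Console.WriteLine($"String interpolation: {interpolationBytes} bytes");
Console.WriteLine($"string.Create:        {createBytes} bytes");
```

This measures allocated bytes only, not elapsed time; a timing comparison would need a proper benchmark harness.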

The change is #if NET8_0_OR_GREATER-guarded, so netstandard2.0 behaviour is completely unchanged.

Trade-offs

  • The nested ternary (wrapInParentheses ? (minutes == 0 ? ... : ...) : (minutes == 0 ? ... : ...)) is slightly dense but self-contained. The logic is simple and the four resulting strings are easy to validate visually.
  • No behaviour change for showMilliseconds=true, durations with hours, or durations with days.
  • netstandard2.0 uses the existing slow path as before.

Test Status

  • Microsoft.Testing.Platform.UnitTests (net8.0): 1086 passed, 0 failed, 3 skipped
  • Build (all TFMs: net8.0, net9.0, netstandard2.0): 0 warnings, 0 errors

Reproducibility

./build.sh
artifacts/bin/Microsoft.Testing.Platform.UnitTests/Debug/net8.0/Microsoft.Testing.Platform.UnitTests

Generated by Perf Improver · sonnet46 8.8M

Add this agentic workflow to your repo

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/perf-improver.md@main

… durations

On .NET 8+, add a fast path that uses string.Create with a stackalloc
buffer for the most common case (duration < 1 hour, no milliseconds).

Before: each Render call allocated a StringBuilder + 1-2 intermediate
strings from GetFormattedPart + the final result string (3-4 allocations).

After (fast path): only the final result string is allocated (1 allocation).

This method is called on every progress-frame render tick (roughly 5 times
per frame) to format durations for each visible test-worker line. The
savings accumulate quickly during long-running parallel test runs.

All progress-frame callers (AnsiTerminalTestProgressFrame, SimpleTerminalBase)
use the default parameters (wrapInParentheses=true, showMilliseconds=false),
so they all benefit from the fast path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 5, 2026 14:40
@Evangelink added the labels area/performance (Runtime / build performance / efficiency) and type/automation (Created or maintained by an agentic workflow) Jun 5, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes HumanReadableDurationFormatter.Render in Microsoft.Testing.Platform’s terminal output path by adding a .NET 8+ fast path for the common “sub-hour, no milliseconds” case, reducing per-call allocations during frequent progress-frame rendering.

Changes:

  • Added a #if NET8_0_OR_GREATER fast path using string.Create with a stackalloc scratch buffer for durations where Days == 0 && Hours == 0 and showMilliseconds == false.
  • Kept the existing StringBuilder-based formatting as the unchanged slow path for longer durations or when milliseconds are requested.
| File | Description |
| --- | --- |
| src/Platform/Microsoft.Testing.Platform/OutputDevice/Terminal/HumanReadableDurationFormatter.cs | Adds a .NET 8+ allocation-reducing formatting fast path for the most common progress-duration rendering scenario. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 0

@Evangelink Evangelink marked this pull request as ready for review June 5, 2026 15:01
int minutes = duration.Value.Minutes;
return wrapInParentheses
? (minutes == 0
? string.Create(CultureInfo.InvariantCulture, stackalloc char[8], $"({seconds}s)")
Member


I don't think this is going to be significantly faster than string interpolation. It also seems to add complexity without any proven gains.

