Read HVM_THREADS at startup: runtime worker count for the C runtime by SiliconState · Pull Request #441 · HigherOrderCO/HVM2

SiliconState · 2026-06-10T20:36:15Z

Problem

The parallel C runtime fixes its worker count at compile time (TPC = 1 << TPC_L2). There is no way to choose the thread count at run time, so:

You cannot measure parallel scaling (speedup vs N threads) — the single most useful number for a runtime whose headline feature is automatic parallelism.
Users on shared machines or CI runners can't cap how many cores the runtime uses.
Benchmarking and regression tooling can't pin a thread count for reproducibility.

Design

An environment variable HVM_THREADS, read by the C runtime at startup. Minimal and fully backward-compatible: unset (or invalid) means exactly the current compile-time TPC behavior, and output stays byte-for-byte identical.

HVM_THREADS=N → use clamp(N, 1, TPC) active workers. Capping at the compiled TPC keeps the existing fixed-size thread pool / per-thread net partitioning intact; a follow-up could allow growing past TPC.
The Rust interpreter (hvm run) ignores it.

Patch (one file: `src/hvm.c`, +37/−7)

Because run.c does #include "hvm.c" and gen-c embeds the hvm.c source text, this single file covers both hvm run-c and standalone gen-c programs.

New globals next to the TPC define: hvm_tpc (the active worker count) and hvm_tpc_from_env. hvm_tpc_init() reads getenv("HVM_THREADS") with strtol, rejects unset, empty, trailing-garbage and < 1 values, and clamps the result to [1, TPC]. It is called at the top of hvm_c(), before any thread exists.
The active count replaces TPC at the four places that mean "worker count": the sync_threads() spin-barrier target, the evaluator() idle-counter init and halt check, and the normalize() spawn/join loops. Per-thread memory partitioning keeps the compiled TPC stride, so idle workers simply never start.
Steal-ring fix for non-power-of-two counts: (tm->tid - 1) % TPC relies on unsigned wrap and is only correct for power-of-two counts; it becomes (tm->tid + hvm_tpc - 1) % hvm_tpc (verified at 3/5/7 threads).
When the variable is set and valid, an extra "- THREADS: N" stats line is printed after the MIPS line, so tools can confirm the count took effect. When it is unset, output is unchanged.

Measurements

Setup: 16 logical cores, WSL2, parallel_sum at depth 18. ITRS = 5,898,185 at every thread count, so each run does the same work and the differences reflect true scaling:

HVM_THREADS	TIME	MIPS
1	0.16s	35.8
2	0.11s	54.0
4	0.07s	87.0
8	0.05s	120.8
16	0.06s	100.6
32 (clamps to 16)	0.05s	110.2

Re-verified on this branch with examples/sum_rec at 1/3/7/16 threads: ITRS is identical at every count, the "- THREADS: N" readback is present when the variable is set, and output is unchanged when it is unset.

cargo test gives identical results before and after the patch. The two snapshot failures on my machine are pre-existing and also occur on pristine main: newer rustc versions print thread IDs in panic messages, which changes the snapshot text.

Why this matters downstream

This unlocks a true parallel-efficiency curve for any benchmarking tool: run the same program with HVM_THREADS in {1,2,4,8,…} and compare TIME at identical ITRS. A companion Bend PR adds a --threads N convenience flag that just sets this variable on the spawned hvm process.

The parallel C runtime fixed its worker count at compile time (TPC = 1 << TPC_L2), so parallel scaling could not be measured and cores could not be capped on shared machines. This patch reads the HVM_THREADS environment variable once at startup and clamps it to [1, TPC]; the thread pool, spin barrier, idle halt check and work-stealing ring all use the active count, while per-thread memory partitioning keeps the compiled TPC stride, so idle workers simply never start. Unset or invalid values keep the compiled TPC and byte-identical output. When the variable is set and valid, a '- THREADS: N' stats line is printed so tools can confirm the count took effect. The steal ring uses (tid + n - 1) % n instead of the (tid - 1) % TPC unsigned-wrap trick, which only works for power-of-two counts. Measured on 16 logical cores (parallel_sum, ITRS 5898185 at every count): 1T 35.8 MIPS, 2T 54.0, 4T 87.0, 8T 120.8; HVM_THREADS=32 clamps to 16. Covers both hvm run-c and standalone gen-c output.

This was referenced Jun 10, 2026

Add --threads N to run-c/run-cu (sets HVM_THREADS on the spawned hvm) HigherOrderCO/Bend#763 (Open)

Add --profile-json to run/run-c/run-cu: machine-readable run output HigherOrderCO/Bend#764 (Open)

SiliconState force-pushed the pr/hvm-threads branch from ab5bdcd to a6c01d7 on June 11, 2026 01:02

SiliconState wants to merge 1 commit into HigherOrderCO:main from SiliconState:pr/hvm-threads