Releases: microsoft/bocpy
v0.9.0 - Main Pinned Cowns
Main-pinned cowns — a new PinnedCown subclass holds its
value as a plain PyObject * on the main interpreter, never
round-tripped through XIData. Behaviors whose request set contains
any pinned cown are routed by the scheduler to a single-consumer
main-thread queue and drained by the new pump entry point
(or implicitly by wait, which auto-pumps when pinned cowns
exist). Designed for objects that cannot survive cross-interpreter
shipping — pyglet shapes, Tk widgets, GPU contexts, open file
handles, ctypes pointers. The companion examples/boids.py
rewrite demonstrates the coarse-grained pinned-dispatch pattern:
per-cell physics stays on workers, and one @when(PinnedCown)
per frame batches the write-back into main-thread matrices.
Also in this release: quiesce, a non-tearing-down
checkpoint primitive.
New Features
- quiesce(timeout=None, *, stats=False, noticeboard=False):
blocks until every in-flight behavior completes, without tearing
down workers or the noticeboard thread. Implemented via a new
terminator_seed_inc peer of terminator_seed_dec
(Pyrona-style seed-up / seed-down pairing) so quiescence becomes
a checkpoint rather than a shutdown. Useful for parallel-search
patterns that need to inspect a best-so-far cown between rounds
and for tests that must read a worker-produced send queue
before its producer interpreter is destroyed. The stats and
noticeboard flags mirror wait: returns None by
default, a per-worker stats list[dict] when stats=True,
a noticeboard dict[str, Any] when noticeboard=True, or a
WaitResult when both are set. Raises TimeoutError
if quiescence is not reached within timeout. Exported from
bocpy.__all__.
- PinnedCown(Cown[T]): a cown whose value lives
permanently on the main interpreter. Constructible only from the
main interpreter (raises RuntimeError from workers);
the value is never picklable, never reified twice, and never
reconstructed in a worker. The capsule handle remains a
first-class cross-interpreter shareable: workers may hold it,
embed it in a regular Cown value graph, and place it in
noticeboard entries, but only the main thread may acquire the
value. See the new pinned_cowns page for the full
contract and the coarse-grained-dispatch pattern.
- pump(deadline_ms=None, max_behaviors=None, raise_on_error=False):
drains the main-thread queue of behaviors whose request sets
contain a PinnedCown. Call from your event loop's
idle / on-tick hook (pyglet schedule_interval, Tk after,
asyncio task, …); script-mode programs need not call it
explicitly because wait pumps internally. Non-preemptive:
deadline_ms gates starting the next behavior, not
interrupting one already running. Body exceptions default to
landing on the result cown's .exception;
raise_on_error=True re-raises the first body exception after
drain. Returns a new PumpResult NamedTuple
(executed, deadline_reached, raised).
- set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None):
configures the pinned-queue starvation watchdog. Both thresholds
gate on queue-non-empty time, not raw last-pump time, so
programs running only unpinned work never trip them. Default is
warn-only; users opt into fail-fast via an explicit raise_ms
so interactive debugger sessions are not wedged by a breakpoint.
- set_wait_pump_poll(ms=50): sets the poll cadence for
wait's auto-pump loop. Re-read every iteration so a
concurrent call updates the active wait immediately.
- bocpy.PumpResult: three-field NamedTuple returned by
pump. executed counts pinned behaviors whose lifecycle
completed (including acquire-failure paths whose MCS chain still
drained). deadline_reached is True only when the
deadline_ms budget tripped before the queue drained.
raised counts only body exceptions captured to a result cown
(cleanup-path failures use PyErr_WriteUnraisable and do not
count). Exported from bocpy.__all__.
- Coarse-grained pinned-dispatch examples/boids.py: the
per-cell send("update") / main-thread receive("update")
barrier is replaced by per-cell physics on workers plus one
pinned @when per frame that captures every per-cell result
cown together with the two main-thread PinnedCown matrices
and performs the batched write-back. Same visual output, fully
worker-parallel per-cell work, single main-thread touchpoint.
Public C ABI
- bocpy_main_interpid(): new static inline helper in
<bocpy/bocpy.h> returning
PyInterpreterState_GetID(PyInterpreterState_Main()) pre-typed as
int_least64_t to match bocpy_interpid for owner-field equality checks.
Safe to call from a worker sub-interpreter for diagnostic /
assert use. Additive: existing consumers recompile unchanged;
BOCPY_ABI is unchanged at 1. The
templates/c_abi_consumer bocpy~= pin moves to
~=0.9 to signal the new ABI surface it was authored against.
Improvements
- @when loop-variable snapshot via default arg: the
transpiler now accepts def b(c, i=i) as an explicit
loop-snapshot idiom in addition to the existing implicit form
(just reference the loop variable in the body). Trailing
positional parameters beyond the cown count are also
auto-captured by name (def b(c, factor) captures
factor).
- @when alias decorators: the transpiler now recognises
from bocpy import when as boc_when and import bocpy [as alias]
followed by @bocpy.when(...) or
@alias.when(...), provided the aliasing import is at module
level. Previously only the bare @when form was detected.
- Behaviors.start() compiles the export module on main:
the transpiler's rewritten module is now also instantiated as an
in-memory types.ModuleType on the main thread (plus a
linecache entry for traceback fidelity) so pump can
resolve __behavior__N the same way workers do via their
bootstrap.
- Scheduler-owned behavior pre-header: bq_node and the
new pinned OR-fold byte moved out of the opaque
BOCBehavior into a scheduler-owned boc_behavior_prehdr_t
allocated immediately before each behavior (CPython
_PyGC_Head style). boc_sched.c no longer needs any
knowledge of BOCBehavior's internal layout; layout drift
between the scheduler and its users is impossible by
construction.
- terminator_wait_pumpable: new entry in
boc_terminator.{c,h} lets the auto-pump loop wake on either
count-zero or main-pinned-depth-becoming-non-zero, both wired
through the existing single condition variable. Single-pumper
enforcement on free-threaded builds (Py_GIL_DISABLED) lives
alongside via a MAIN_PUMP_THREAD CAS that raises
RuntimeError if a second thread tries to pump
concurrently, cleared on every exit path including
BaseException.
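The def b(c, i=i) form in the first item builds on an ordinary Python idiom: a default argument is evaluated once, when the function is defined, so it freezes the loop variable's value at that iteration. The sketch below shows only that underlying idiom with plain functions. It does not use bocpy and says nothing about how the transpiler handles the capture internally.

```python
def make_late_bound():
    # Each closure looks up `i` when it is called, after the loop has finished.
    return [lambda: i for i in range(3)]


def make_snapshotted():
    # `i=i` evaluates the default at definition time, freezing the current value.
    return [lambda i=i: i for i in range(3)]


late = [f() for f in make_late_bound()]
snap = [f() for f in make_snapshotted()]
```

Here late holds the final loop value three times, while snap holds one value per iteration.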
Bug Fixes
- CWE-401 (inheriting INCREF leak in cown_decref_inline):
CownCapsule_reduce packs an encoded XIData payload by
taking an inheriting COWN_INCREF per embedded
CownCapsule, normally balanced when the bytes are
unpickled inside a worker. On the orphan-death path (the
consumer side never deserialised the payload) the matching
COWN_DECREFs never fired and every embedded cown leaked.
cown_decref_inline now feeds the encoded bytes through
pickle.loads and immediately drops the result, which lets
CPython's GC fire the matching COWN_DECREFs recursively.
Gated on the pickled flag so native XIData round-trips
(e.g. Matrix) skip the work entirely.
- Main-pump behavior reference leak: both
_core_main_pump_bounded and _core_main_pump_drain_all
popped a BehaviorCapsule from MAIN_PINNED_QUEUE but
never released the strong reference the capsule held on the
underlying BOCBehavior. Each pinned behavior leaked
one reference until the runtime was torn down. The pump
helpers now BEHAVIOR_DECREF the behavior immediately after
the worker-equivalent cleanup runs.
- MSVC <stdatomic.h> compatibility: Microsoft's
<stdatomic.h> (used by CPython's headers on Windows) does
not expose the unsigned atomic_uint_least64_t or
atomic_uintptr_t forms that the pinned-pump bookkeeping
used. MAIN_PINNED_DEPTH, MAIN_PINNED_NONEMPTY_SINCE_NS,
LAST_PUMP_NS, WATCHDOG_WARN_MS, WATCHDOG_LAST_WARN_NS,
WATCHDOG_ON_STARVE and MAIN_PUMP_THREAD are now
atomic_int_least64_t / atomic_intptr_t. Depth never
goes negative; pointer bits round-trip losslessly through the
signed atomic boundary.
- CPython 3.10/3.11 PyErr_SetRaisedException polyfill:
added to include/bocpy/xidata.h alongside the existing
PyErr_GetRaisedException polyfill so the public C ABI's
exception-stash pattern compiles on Python versions before
3.12. BOCPY_ABI is unchanged.
- Portable boc_max_align_t: added to boc_compat.h as
a union of the most-strictly-aligned fundamental types
(long long, long double, void *, function pointer).
MSVC exposes the C11 max_align_t only under /std:c11,
which the CPython build does not pass; the
boc_behavior_prehdr_t size assertion now uses
alignof(boc_max_align_t) so the alignment contract holds on
every supported toolchain.
- PEP 678 add_note 3.10 fallback: the new
Behaviors.quiesce exception-context shim attaches a note
describing the seed-inc / seed-dec balance on failure. CPython
3.10 predates BaseException.add_note; the shim now
writes to BaseException.__notes__ directly when add_note
is missing.
- Transpiler except ... as X mis-classification:
ExceptHandler binds X on the handler node
itself rather than via Name`Stor...
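The add_note fallback in the PEP 678 item above can be sketched as a small helper. This is an illustration of the general technique, not the shim bocpy ships: attach_note is a hypothetical name, and the note text is invented for the example.

```python
def attach_note(exc: BaseException, note: str) -> None:
    """Attach a PEP 678 note, falling back to __notes__ where add_note is absent."""
    add_note = getattr(exc, "add_note", None)
    if add_note is not None:
        # Python 3.11+: use the official API.
        add_note(note)
        return
    # Python 3.10: emulate add_note by maintaining the __notes__ list directly.
    notes = getattr(exc, "__notes__", None)
    if notes is None:
        notes = []
        exc.__notes__ = notes
    notes.append(note)


err = RuntimeError("quiesce failed")
attach_note(err, "hypothetical note: seed-inc / seed-dec imbalance")
```

On 3.11 and later the traceback machinery prints the note; on 3.10 the list is simply carried on the exception for callers that inspect it.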
v0.8.0 - Matrix/Vector methods and optimisation
Vector-oriented Matrix API — six new methods (vecdot,
cross, normalize, perpendicular, angle,
magnitude_squared), two new read-only properties (size,
length), and a unified in_place= keyword on every unary
method round out Matrix as a first-class vector and
batch-of-vectors type — plus an internal X-macro template refactor
of every _math.c op family that restores the compiler's
auto-vectoriser. 44 of 71 benched rows improved by ≥10%, with
representative wins of −50% to −88% on aggregates, broadcast
arithmetic, and normalize. The _math extension now ships
with -O3 (Linux/macOS) / /O2 (Windows) so end users pick
up the wins by default.
New Features
- Vector-oriented Matrix methods: six new methods designed
for the Nx2 / 2xN / Nx3 / 3xN vector and
batch-of-vectors shapes that show up in examples/boids.py and
similar simulation code:
  - magnitude_squared(axis=None): squared L2 norm without the
  sqrt step. Cheaper than magnitude() and safe for
  sub-normal thresholding.
  - vecdot(other, axis=None): axis-aware inner product matching
  numpy.linalg.vecdot. Not equivalent to numpy.dot;
  use @ for matrix multiplication. Same-shape, row-broadcast
  (1xN vs MxN), and column-broadcast (Mx1 vs MxN)
  operands are all supported.
  - cross(other, axis=None): 2D scalar z-component or 3D cross
  product. Five shape paths share one method: 1x2 / 2x1
  returns a float; 1x3 / 3x1 returns a same-orientation
  Matrix; Nx2 / 2xN batches collect per-vector
  scalars; Nx3 / 3xN batches return same-shape Matrix
  results. axis= disambiguates the square 2x2 / 3x3
  shapes (default per-row).
  - normalize(axis=None, in_place=False): divide every element
  by its magnitude. Zero-magnitude rows / columns are returned as
  exact zeros (no NaN, no division by zero). axis= selects
  per-row, per-column, or total normalisation.
  - perpendicular(axis=None, in_place=False): rotate every 2D
  vector 90° counter-clockwise: (x, y) -> (-y, x). Accepts a
  single 2D vector, an Nx2 row batch, or a 2xN column
  batch.
  - angle(axis=None): polar angle atan2(y, x) of every 2D
  vector. Returns a float for a single 2D vector input,
  otherwise a Matrix of per-vector angles.
- Matrix.size property: total element count
(rows * columns). Matches numpy.ndarray.size.
- Matrix.length property: Frobenius (L2) magnitude as a
read-only @property so vector-like code reads naturally
(direction.length, velocity.length) without the
parentheses of a method call. Equivalent to magnitude() with
no axis argument.
- in_place= keyword on every unary Matrix method:
transpose, ceil, floor, round, negate,
abs, plus the new normalize and perpendicular all
accept in_place=True to mutate self and return it.
Replaces the older transpose_in_place() method (see
Breaking Changes below).
- axis= keyword on aggregate methods: sum, mean,
min, max, magnitude, and the new magnitude_squared
now share a tri-state axis= argument (None / 0 / 1)
decoded through a single classifier. Negative axes (-1 /
-2) accepted for NumPy parity.
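The per-vector semantics described above (zero-magnitude vectors normalising to exact zeros, the 90° counter-clockwise rotation, and the 2D cross product as a scalar z-component) can be written out for a single vector in plain Python. The functions below are a hypothetical list-based sketch of the documented maths only. They are not the Matrix API and do not model axis= or in_place=.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def magnitude_squared(v: Vec) -> float:
    # Squared L2 norm: skips the sqrt step entirely.
    return sum(x * x for x in v)


def normalize(v: Vec) -> List[float]:
    mag = math.sqrt(magnitude_squared(v))
    if mag == 0.0:
        # Zero-magnitude input yields exact zeros rather than NaN.
        return [0.0 for _ in v]
    return [x / mag for x in v]


def perpendicular(v: Vec) -> List[float]:
    # Rotate 90 degrees counter-clockwise: (x, y) -> (-y, x).
    x, y = v
    return [-y, x]


def cross2d(a: Vec, b: Vec) -> float:
    # Scalar z-component of the 3D cross product of two 2D vectors.
    return a[0] * b[1] - a[1] * b[0]


unit = normalize([3.0, 4.0])
zero = normalize([0.0, 0.0])
rotated = perpendicular([2.0, 3.0])
z = cross2d([1.0, 0.0], [0.0, 1.0])
```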
Improvements
- Auto-vectorised _math.c op kernels: the binary,
aggregate, unary, and two-operand-aggregate op families inside
_math.c are now stamped from per-family descriptor tables,
one kernel per (op, shape) combination. Each per-element body is
literally substituted into its own monomorphic inner loop,
restoring the precondition for GCC's / Clang's auto-vectoriser.
Representative wins (lower is better):

| Bench row | 0.7.0 (ns) | 0.8.0 (ns) | Δ |
| --- | --- | --- | --- |
| mean() shape=(1000, 100) | 44179.6 | 9001.6 | −79.6% |
| mean(1) shape=(1000, 100) | 51699.4 | 7058.5 | −86.3% |
| max(1) shape=(1000, 100) | 97184.2 | 11322.7 | −88.3% |
| magnitude() shape=(1000, 3) | 1098.2 | 306.8 | −72.1% |
| add col-bcast shape=(1000, 100) | 37823.4 | 20172.5 | −46.7% |
| div same-shape shape=(1000, 100) | 80134.2 | 45458.9 | −43.3% |
| normalize() shape=(1000, 3) axis=None | 3644.6 | 1775.5 | −51.3% |

Four rows in code paths untouched by the refactor regressed by
5–15% from layout drift (_math.so .text grew +125% from
kernel specialisation); none are on a hot path. No behavioural
change; test_matrix.py passes unchanged.
- -O3 / /O2 on bocpy._math: the math extension now
sets per-platform extra_compile_args in setup.py
(-O3 -fno-plt on Linux/macOS, /O2 on Windows) so end-user
wheels and editable installs both pick up the auto-vectoriser
wins above. Other bocpy extensions are unaffected. The SBOM
hash for _math.*.so will drift accordingly; see the
sbom page for the auditor-facing note.
Breaking Changes
- Matrix.transpose_in_place() removed: superseded by
Matrix.transpose(in_place=True), which returns self and
so composes the same way every other unary method does.
Migration is mechanical: replace m.transpose_in_place() with
m.transpose(in_place=True).
Documentation
- New Matrix API entries in the api page for size, length,
magnitude_squared, vecdot, cross, normalize,
perpendicular, and angle, plus updated in_place=
keyword signatures on the existing unary methods.
Tests
- 234 new test cases for the new Matrix methods and
properties (1571 → 1805 passed). Coverage includes a stub-guard
test that greps __init__.pyi for every new C-level name and
in-cown coverage exercising each new method inside @when.
- Portable overflow regex + cross 2x3/3x2 contract pinning:
the cross-product test for the doubly-valid 2x3 / 3x2
shapes now pins the 2D-batch interpretation explicitly, locking
the documented behaviour.
Internal
- scripts/bench_matrix.py: bench harness used to gate the
refactor: --json append mode, --report-median per-row
merge, 200 ms warmup, batch-size auto-tuning.
- scripts/validate_wheel.py + scripts/_vendored_warehouse_wheel.py:
stdlib-only wheel RECORD validator and a vendored slice of
Warehouse's wheel parser; used by the PR gate to catch RECORD
regressions before PyPI does.
CI / build
- cibuildwheel v3.4.0 → v3.4.1 and clang-format-action
pin normalised to the underlying commit SHA (Dependabot's
preferred format). Both pins move in lock-step with the
github-actions Dependabot group.
- idna 3.16 → 3.17 in ci/constraints-docs.txt. Five
other Dependabot proposals (docutils 0.23, ruamel-yaml
0.19, sphinx-tabs 3.4.7+, sphinx-toolbox 4.2, and
standard-imghdr 3.13) require Python ≥3.11 and so cannot
enter a universal lock that still includes Python 3.10; a
comment above requires-python = ">=3.10" in
pyproject.toml lists them for the post-3.10-EOL bump.
- flake8 extend-exclude for .copilot/, build/,
sphinx/build/, and the scratch .env* venvs so the walker
no longer trips on generated or vendored Python files.
0.7.0 - SBOM and Dependency Auditing
Cown-lifecycle correctness fixes — three use-after-free paths in the
CownCapsule pickle / acquire / noticeboard machinery now hold the
inner BOCCown alive across the writer's wrapper drop — plus
supply-chain hardening: pinned and hash-verified Python dependencies,
SHA-pinned GitHub Actions, dependabot coverage, vulnerability scanning,
and PEP 770 SBOMs embedded in every wheel.
New Features
- PEP 770 SBOMs in every wheel: every wheel built by
.github/workflows/build_wheels.yml now embeds a
CycloneDX 1.6 (https://cyclonedx.org/specification/overview/)
JSON SBOM under <dist>-<version>.dist-info/sboms/bocpy.cdx.json.
Generation runs inside cibuildwheel's repair step on every platform
(Linux auditwheel, macOS delocate, Windows direct injection)
via the new stdlib-only scripts/build_sbom.py. The
inject subcommand rewrites the wheel's RECORD atomically
(temp file + rename).
- SBOM verification in CI: the new verify_sboms job in
build_wheels.yml re-downloads the extracted SBOM artifact and
runs two checks: scripts/validate_sbom.py (stdlib-only
structural validator pinning bocpy's wire format) and
grype (https://github.com/anchore/grype), a third-party SBOM
scanner, with --fail-on high. A separate sboms artifact is
also uploaded by the merge job for downstream consumers.
- bocpy.__version__: a runtime version attribute derived
from importlib.metadata.version("bocpy"), with a
PackageNotFoundError fallback. Exported from bocpy.__all__
and documented in __init__.pyi. pyproject.toml remains the
single source of truth for the version.
- New documentation: sbom walk-through covering the
embedded SBOM format, extraction recipes, and verification commands.
- wait(noticeboard=True) final-state capture: wait
now accepts a noticeboard keyword that returns the final
noticeboard contents as a plain dict at shutdown (after the
noticeboard thread exits, before the entries are freed). Useful
for surfacing an early-stopping result, last error, or aggregated
counter that a behavior deposited just before the runtime
quiesced, replacing the older send / receive handshake
that earlier examples used. Combined with stats=True it
returns a new WaitResult NamedTuple (also exported
from bocpy.__all__) carrying both snapshots. The
examples/prime_factor.py example was migrated to the new
pattern.
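The __version__ item above describes a common pattern: read the version from installed distribution metadata and fall back when the package is not installed (for example, when running from a source checkout). A minimal sketch follows. resolve_version and the "0+unknown" fallback string are assumptions made for illustration, since the notes do not say what value bocpy falls back to.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0+unknown") -> str:
    """Return the installed distribution's version, or `fallback` if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed metadata for this name; the fallback value is an assumption.
        return fallback


missing = resolve_version("surely-not-an-installed-distribution-xyz")
```

Because the value comes from the installed metadata, the version declared in pyproject.toml stays the only place it is written down.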
Bug Fixes
- Cown-in-cown use-after-free: a Cown embedded inside
another cown's value, a message-queue payload, or a noticeboard
snapshot was previously freed when the writer's local wrapper
dropped, because pickle bytes carry no refcount on their own.
CownCapsule_reduce now takes an inheriting COWN_INCREF that
_cown_capsule_from_pointer_inheriting consumes on unpickle, so
the inner BOCCown survives until the consumer drops its
decoded wrapper. Affects every cross-cown reference shape; see
the new TestCownInCown class for the full container-shape fuzz.
- Acquire-failure poisoned-state: when pickle.loads failed
partway through cown_acquire, the cown was left in a
half-acquired state with the encoded bytes still in place. A retry
would re-run pickle against bytes whose embedded inherited refs
had already been partially consumed by pickle's error path,
risking dereferences of freed BOCCown* pointers. The cown's
xidata is now recycled on the failure path and a guard at the
top of cown_acquire rejects any future acquire with a
deterministic RuntimeError; the worker recovery arm surfaces
it on the failing behavior's result cown.
- Noticeboard hidden-cown audit: when a noticeboard value
reached a Cown via a route the pin walker cannot see (custom
__reduce__ / __getstate__, copyreg.dispatch_table,
closure capture, module-level cache), the borrowing reconstructor
produced a token whose inner BOCCown was not held alive by
the entry's pin set, leaving the next reader to UAF after the
writer's wrapper dropped. A per-thread borrowing context
(BOC_NB_CTX) now audits every CownCapsule_reduce against
the caller's pin set during the noticeboard write pickle and
fails the whole notice_write / notice_update closed if
any cown is unaccounted for.
- UnicodeDecodeError on non-UTF-8 Windows locales:
Behaviors.start read worker.py with open(path), which
picks up locale.getpreferredencoding(False). On cp1252
(English Windows) the UTF-8 em-dashes in the worker source were
silently mojibake-d; on cp949 (Korean Windows) the read failed
with UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2
and bocpy could not start at all (reported in issue #14,
https://github.com/microsoft/bocpy/issues/14, by
@Forthoney, https://github.com/Forthoney). Fixed by passing
encoding="utf-8" explicitly in Behaviors.start, and the
same fix was applied to every other open() site in the repo
that reads or writes text known to contain non-ASCII bytes
(sphinx/source/conf.py, examples/sketches.py x2,
export_module.py).
- Silent worker-startup failures: Behaviors.start_workers
ran interpreters.create() and interpreters.run_string()
on the worker thread without a try/except, so a failure in either
killed the thread without ever replying on boc_behavior. The
parent's bounded receive() then timed out with no diagnostic.
Both calls are now wrapped, and every failure path sends a
formatted traceback over boc_behavior so the parent sees a
structured error instead of a timeout.
- Silent worker bootstrap import failures: the generated
bootstrap script that loads the user module into each worker
sub-interpreter is now wrapped in a top-level try/except. Any
BaseException is formatted with the user module name and sent
over boc_behavior (falls back to sys.stderr if the
message-queue send itself raises), then re-raised so
run_string reports it as well. Module-import failures that
previously surfaced only as a worker-startup timeout now arrive
as a proper traceback.
- boc_sched_worker_pop_slow skipped popped_local: the
slow-path pending-fallback and WSQ-dequeue branches returned
work without bumping popped_local (the fast path always
did), so the documented producer/consumer identity in
boc_sched_stats_t was violated whenever the fairness
arm fired or a worker entered the slow path directly. Both
branches now increment popped_local and reset the batch
budget, matching the fast path. The header's reconciliation
paragraph was also tightened to a "near-identity" that explicitly
accounts for fairness-token pops (which are re-enqueued via raw
boc_wsq_enqueue rather than boc_sched_dispatch, leaving
consumer-side counters without a matching producer-side bump).
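The locale-encoding fix in the UnicodeDecodeError item above comes down to never letting open() pick the platform default for a file known to be UTF-8. The sketch below writes a small file containing an em-dash, reads it back with an explicit encoding, and then decodes the same bytes as cp1252 to reproduce the mojibake case. The file name and contents are illustrative only.

```python
import os
import tempfile

# U+2014 (em-dash) encodes to the three bytes E2 80 94 in UTF-8.
source_text = "# worker bootstrap \u2014 non-ASCII comment\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "worker.py")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(source_text)
    # Explicit encoding: the result no longer depends on the machine's locale.
    with open(path, encoding="utf-8") as fh:
        round_tripped = fh.read()
    with open(path, "rb") as fh:
        raw = fh.read()

# What a cp1252 default would have produced: three unrelated characters
# in place of the single em-dash, with no error raised.
mojibake = raw.decode("cp1252")
```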
Supply Chain
- Hashed and pinned Python dependencies: every CI dependency is
resolved into a ci/constraints-<extra>.txt file via
uv pip compile --universal --generate-hashes and installed with
pip install --require-hashes. Covers the test, linting,
docs, and new audit extras. bocpy itself is then
installed via pip install -e . --no-deps so an editable build
cannot smuggle in an unpinned transitive dependency.
- Vulnerability scanning: new audit job in pr_gate.yml
runs pip-audit --strict against every constraints file on every
PR. pip-audit itself is pinned via ci/constraints-audit.txt
and self-checked. A new .github/workflows/nightly_audit.yml
re-runs the audit nightly against main.
- SHA-pinned GitHub Actions: every uses: line in
.github/workflows/ is now pinned to a full 40-char commit SHA
with a trailing # vX.Y.Z comment.
- Dependabot coverage: new .github/dependabot.yml covers
three ecosystems (pip rooted at /ci, github-actions
rooted at /, pip rooted at
/templates/c_abi_consumer), grouped weekly per ecosystem.
- Downstream template pinned: templates/c_abi_consumer
pins bocpy~=MAJOR.MINOR as both a build requirement and a
runtime dependency. The finalize-pr skill bumps it in
lock-step with the root version.
- New SUPPLY_CHAIN.md: top-level policy doc describing
everything above with the exact regeneration commands.
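The SHA-pinning rule above (a full 40-character commit SHA followed by a trailing version comment) is easy to check mechanically. The checker below is a hypothetical sketch, not a script from the repository. It is deliberately strict, so local actions such as uses: ./path would also be reported, and the SHA in the sample is a placeholder rather than a real commit.

```python
import re
from typing import List

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:")
# owner/repo[/path]@<40 lowercase hex chars>  # vX[.Y[.Z]]
PINNED_USES = re.compile(
    r"^\s*(?:-\s*)?uses:\s*[\w.-]+/[\w./-]+@[0-9a-f]{40}\s+#\s*v\d+(?:\.\d+){0,2}\s*$"
)


def unpinned_uses(workflow_text: str) -> List[str]:
    """Return every `uses:` line that is not SHA-pinned with a version comment."""
    return [
        line.strip()
        for line in workflow_text.splitlines()
        if USES_LINE.match(line) and not PINNED_USES.match(line)
    ]


sample = """\
steps:
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # v4.2.2
  - uses: actions/setup-python@v5
"""
offenders = unpinned_uses(sample)
```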
Documentation
- Cown pickle-leak note: Cown now documents that
pickle.dumps on a cown produces bytes that carry one strong
reference per embedded cown; orphan bytes (never unpickled in the
producing process) leak one strong ref per byte string. The bocpy
runtime never produces orphan bytes; the leak surface only
applies to third-party code that calls pickle.dumps(cown)
directly.
- Noticeboard cown-lifetime guarantee: notice_write and
notice_update now document that values may embed
Cown references and that the noticeboard keeps each
embedded cown alive for as long as the entry remains. The new
paragraph in the noticeboard page mirrors this guarantee for
readers.
- Noticeboard final-state capture guide: the noticeboard page
gained a "Reading the Final State at Shutdown" section covering
the wait(noticeboard=True) contract, the combined
wait(stats=True, noticeboard=True) form returning
WaitResult, the empty-dict fallbacks for the
never-started and never-written cases, and the recommendation
to use snap.get(key) since wait quiesces as soon as
every behavior ...
v0.6.0 - C ABI
Public C ABI for downstream extensions, enabling C-level participation
in behavior-oriented concurrency across worker sub-interpreters.
New Features
- Decorator composition with @when: decorators stacked below
@when are now preserved on the generated behavior function and
compose with the behavior body on the worker. Decorators placed
above @when raise a SyntaxError at transpile time with
actionable guidance. async def functions with @when are
also explicitly rejected.
- Public C ABI (<bocpy/bocpy.h>): downstream C extensions can
now link against bocpy to register custom Python types as
cross-interpreter shareable so Cown can carry instances of
them across worker interpreters. The header is C-only and
version-gated via the BOCPY_ABI macro, which is bumped on any
incompatible change to bocpy.h or xidata.h. Wheels remain
CPython-version-tagged so a runtime ABI mismatch cannot occur.
- bocpy.get_include() / bocpy.get_sources(): Python-level
helpers that downstream setup.py files use to locate the bocpy
headers and the small set of C sources that must be compiled into
the consuming extension.
- templates/c_abi_consumer/: a ready-to-copy template for
building a C extension against the bocpy ABI, including a
setup.py, a probe extension exercising the public surface, and
a pytest suite (test_public_c_abi.py) that validates the ABI
end-to-end.
- C source reorganisation: the per-subsystem translation units
introduced in 0.5.0 have been renamed with a boc_ prefix
(boc_compat.[ch], boc_sched.[ch], boc_tags.[ch],
boc_terminator.[ch], boc_noticeboard.[ch], boc_cown.h)
to give the public ABI a stable, namespaced identity. xidata.h
has moved under include/bocpy/ alongside bocpy.h.
Documentation
- New c_abi, messaging, and noticeboard pages
in the Sphinx site; the API reference has been expanded to cover
the public ABI surface.
Breaking Changes
- noticeboard_version removed: the global monotonic version
counter introduced in 0.4.0 has been removed. It exposed an
implementation detail of the snapshot cache that did not survive
the C ABI review and had no use case that was not better served
by notice_sync plus an explicit noticeboard() read.
v0.5.0
Highlights
This release delivers a Verona-RT-style work-stealing scheduler, a global noticeboard (shared key-value store), removal of the central scheduler thread in favour of direct dispatch, and a major C source refactor into per-subsystem translation units with a portable atomics layer.
New Features
- Work-stealing scheduler: the single behavior queue is replaced with a distributed scheduler. Each worker owns an MPMC behavior queue, pops locally first, and steals from peers when idle. Idle workers park on per-worker condition variables and are signalled directly by producer/victim.
- Per-worker fairness tokens: a token node advances through each worker's queue so long-running behaviors cannot monopolise dispatch slots; also drives cooperative shutdown.
- Noticeboard: a shared key-value store (up to 64 keys) readable/writable without acquiring cowns. Writes are non-blocking; reads return a cached per-behavior snapshot. Includes notice_write, notice_read, notice_update, notice_delete, notice_sync, noticeboard_version, and the REMOVED sentinel.
- Distributed scheduler: two-phase locking, request linking, and dispatch run directly on the caller's thread in C; cown release runs on the executing worker. MCS-style intrusive linked list per cown for zero-bounce handoff.
- Cown.exception property: indicates whether the held value is from an unhandled exception.
- compat.h / compat.c portability layer: uniform BOCMutex, BOCCond, boc_atomic_*_explicit, monotonic-time, and sleep primitives across MSVC, pthreads, and C11 <threads.h>.
- xidata.h cross-interpreter shim: centralised _PyXIData_* / _PyCrossInterpreterData_* version ladders for CPython 3.12–3.15 (including free-threaded builds).
- fanout_benchmark example: fan-out/fan-in benchmark exercising scheduler throughput under heavy producer load.
- Prime factor example (examples/prime_factor.py): parallel factorisation via Pollard's rho with noticeboard-coordinated early termination.
- Benchmark harness (examples/benchmark.py): micro-benchmarks for scheduling throughput, message-queue latency, and noticeboard contention.
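The "pops locally first, and steals from peers when idle" policy in the work-stealing item above can be illustrated with a single-threaded toy. ToyWorker and next_task are hypothetical names. The sketch leaves out everything that makes the real scheduler hard (concurrent MPMC queues, parking on condition variables, fairness tokens), and which end of a victim's queue gets stolen from is an arbitrary choice here.

```python
from collections import deque
from typing import Deque, List, Optional


class ToyWorker:
    """A worker with its own queue of task labels (no threads involved)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: Deque[str] = deque()


def next_task(me: ToyWorker, workers: List[ToyWorker]) -> Optional[str]:
    # 1. Prefer local work.
    if me.queue:
        return me.queue.popleft()
    # 2. Otherwise try to steal one task from the first peer that has any.
    for victim in workers:
        if victim is not me and victim.queue:
            return victim.queue.popleft()
    # 3. Nothing anywhere: a real worker would park here.
    return None


a, b = ToyWorker("a"), ToyWorker("b")
a.queue.extend(["a1", "a2"])
pool = [a, b]

first = next_task(a, pool)   # local pop
stolen = next_task(b, pool)  # b is idle, so it steals from a
idle = next_task(b, pool)    # every queue is now empty
```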
Bug Fixes
- Transpiler aliased imports: visit_Import / visit_ImportFrom now track alias names (import X as Y), preventing spurious "name not found" errors and duplicate whencall injection.
- Global variable capture: @when closure capture falls back to frame.f_globals when a name is not in any local scope, fixing NameError for module-level variables.
Improvements
- In-memory transpiled-module loading: workers exec the transpiled source from a string literal instead of writing to disk, eliminating filesystem round-trips and leftover .py files.
- Nested @when capture: the transpiler recurses into nested @when-decorated functions when computing outer captures, so child behaviors can close over the outer frame.
- C extension split: _core.c reduced from ~5,000 to ~3,500 lines by extracting sched.{c,h}, noticeboard.{c,h}, terminator.{c,h}, tags.{c,h}, cown.h, compat.{c,h}, and xidata.h.
- Direct dispatch on cown release: behavior_release_all hands resolved successors directly to workers via boc_sched_dispatch, removing one queue hop per handoff.
- Cooperative worker shutdown: boc_sched_worker_request_stop_all / boc_sched_unpause_all provide a clean stop/drain protocol.
- Matrix docstrings: all Matrix C methods now carry built-in docstrings.
- Examples package relocated: moved to top-level examples/ directory (still importable as bocpy.examples).
- Filtered PyPI README: setup.py strips <!-- pypi-skip-start --> regions before publishing.
- Documentation refresh: expanded coverage of noticeboard, distributed scheduler, and new APIs.
Internal Test Modules (opt-in via BOCPY_BUILD_INTERNAL_TESTS=1)
- _internal_test_atomics: correctness tests for compat.h typed-atomics.
- _internal_test_bq: torture tests for the MPMC behavior queue.
- _internal_test_wsq: tests for work-stealing primitives (fast pop, slow pop, steal, park/unpark).
Test Suite
- test_noticeboard.py: snapshot semantics, notice_update atomicity, REMOVED, notice_sync, version monotonicity.
- test_scheduler_integration.py, test_scheduler_stats.py, test_scheduler_steal.py: end-to-end and per-primitive scheduler tests.
- test_compat_atomics.py: portable atomics smoke tests.
- test_stop_retry_composition.py: stop() / start() / wait() retry composition.
- test_scheduling_stress.py: expanded with fan-out, work-stealing, and shutdown stress scenarios.
- test_transpiler.py: AST extraction, capture rewriting, aliased imports, module export.
Full changelog: v0.3.1...v0.5.0
v0.3.1
CownCapsule serialization support for nested cowns.
Bug Fixes
- Removed the ownership check in _cown_shared that prevented a
CownCapsule from being serialized to XIData when it was the value
of another Cown. The check was unnecessary: _cown_shared only
stores a pointer and ownership is enforced at acquire time.
Improvements
- Added CownCapsule.__reduce__ with COWN_INCREF pinning so that a
CownCapsule embedded in a container (dict, list, etc.) can survive
the pickle round-trip used by object_to_xidata. A module-level
reconstructor (_cown_capsule_from_pointer) inherits the pin without
a redundant COWN_INCREF, and validates the process ID on unpickle to
guard against cross-process misuse.
v0.3.0
Improvements
- Added CownCapsule.disown(): abandons a cown's value without
serializing it and resets ownership to NO_OWNER. Used during worker
cleanup to safely discard orphan cowns before the owning interpreter
is destroyed, preventing dangling Python object references.
- Rewrote receive to use a two-phase spin-then-park strategy for
single-tag untimed receives. Phase 1 spins for BOC_SPIN_COUNT
iterations; Phase 2 parks the thread on a per-queue condvar, eliminating
busy-wait CPU burn. Timed receives and multi-tag receives use
spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
- Added platform-abstracted condvar primitives (BOCParkMutex /
BOCParkCond) with implementations for Windows (SRWLOCK /
CONDITION_VARIABLE), macOS (pthreads), and Linux (C11 threads).
- Each BOCQueue now carries a waiters counter, park_mutex, and
park_cond. Producers signal parked receivers after enqueue;
drain and set_tags broadcast to wake all parked threads.
- Replaced the fixed thrd_sleep in send with a sched_yield /
SwitchToThread, reducing send-side latency.
- Refactored the monolithic _core_receive into receive_single_tag
and receive_multi_tag, each with its own backoff/parking logic.
- Moved the BOC_QUEUE_DISABLED check earlier in get_queue_for_tag
so callers skip disabled queues instead of returning NULL after
tag resolution.
- Added Windows-compatible atomic_load_explicit /
atomic_fetch_add_explicit / atomic_fetch_sub_explicit macros
using InterlockedExchangeAdd64.
- Declared Py_mod_gil = Py_MOD_GIL_NOT_USED in both _core and
_math C extensions so that importing bocpy on a free-threaded
Python build (3.13t+) does not re-enable the GIL.
- Replaced PyDict_GetItem (borrowed reference) with
PyDict_GetItemRef (strong reference) in BOCRecycleQueue_recycle
on Python 3.13+, improving forward-compatibility with free-threaded
builds.
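The "exponential sleep (1 µs → 1 ms cap)" schedule mentioned for timed and multi-tag receives can be sketched as a delay generator. The notes give only the start and cap, so the doubling factor below is an assumption, and backoff_delays_us is a hypothetical name. Delays are kept as integer microseconds to avoid floating-point noise.

```python
from typing import Iterator


def backoff_delays_us(attempts: int,
                      start_us: int = 1,
                      cap_us: int = 1000) -> Iterator[int]:
    """Yield one sleep duration per retry, doubling until the cap is reached."""
    delay = start_us
    for _ in range(attempts):
        yield delay
        # Assumed growth factor of 2; the cap bounds worst-case wake latency.
        delay = min(delay * 2, cap_us)


schedule = list(backoff_delays_us(12))
```

With these defaults the cap is reached on the eleventh retry, after which every sleep is 1 ms.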
Bug Fixes
- Fixed a deadlock when the same cown is passed multiple times to @when
(e.g. @when(c, c)). Duplicate requests for the same cown caused the
MCS-queue-based two-phase locking to spin-wait on itself. Requests are
now deduplicated by target cown in Behavior.__init__, with
compensating resolve_one calls to maintain the behavior count
invariant.
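Deduplicating requests by target, as the fix above describes, amounts to keeping the first occurrence of each object by identity while preserving order. The helper below is a hypothetical sketch of that step alone. It also returns how many duplicates were dropped, on the assumption that a caller would need that number for any compensating bookkeeping. It does not reproduce Behavior.__init__ or resolve_one.

```python
from typing import Any, List, Sequence, Tuple


def dedupe_by_identity(requests: Sequence[Any]) -> Tuple[List[Any], int]:
    """Keep the first occurrence of each object (by identity), in order."""
    seen = set()
    unique: List[Any] = []
    for obj in requests:
        # Identity, not equality: two equal-valued cowns are still distinct targets.
        if id(obj) not in seen:
            seen.add(id(obj))
            unique.append(obj)
    return unique, len(requests) - len(unique)


c, d = object(), object()
unique, dropped = dedupe_by_identity([c, d, c, c])
```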
Tests
- TestLostWakeStress: single-producer random delays, bursty producer,
and repeated single-message wake to detect lost-wake races.
- TestMultiTagBackoff: multi-tag receive correctness (second-tag hit,
delayed arrival, per-tag FIFO ordering, timeout, and interleaved
producers).
- TestTimeoutAccuracy: lower-bound / upper-bound wall-clock checks and
zero-timeout immediacy.
- Added tests for duplicate cowns in @when: same cown twice, thrice,
non-adjacent duplicates, duplicates within a group, and mutation
aliasing semantics.
CI
- Added a free-threaded CI job that tests against Python 3.13t and
3.14t on Linux, with explicit assertions that the GIL remains disabled
after import.
Full Changelog: v0.2.2...v0.3.0
v0.2.2
Improvements
- Added an ASAN/UBSAN CI job that builds CPython 3.14.2 from source with AddressSanitizer and UndefinedBehaviorSanitizer, then runs the full test suite against instrumented builds of bocpy.
- Updated GitHub Actions to latest versions (actions/checkout@v6, actions/setup-python@v5).
Bug Fixes
- Fixed a false positive warning message for deallocation of xidata on the main
interpreter after module shutdown.
- Changed the clear logic when recycling
v0.2.0
Bugfix release including some minor improvements.
Improvements
- Examples are now included in the package, with script entrypoints for each.
- The drain low-level API function is now exposed at the package level
- wait() will now acquire frame-local Cown objects before shutting down the workers
Dev Tools
- Added an internal cown and behavior reference tracking utility
Bug Fixes
- Fixed a reference counting bug with cown lists
- Fixed an issue where the boids example did not run on Windows due to a font
setting.
v0.1.0 - Initial Release
Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>