Releases: microsoft/bocpy

v0.9.0 - Main Pinned Cowns

06 Jun 11:21
6b9e1af

Main-pinned cowns — a new PinnedCown subclass holds its
value as a plain PyObject * on the main interpreter, never
round-tripped through XIData. Behaviors whose request set contains
any pinned cown are routed by the scheduler to a single-consumer
main-thread queue and drained by the new pump entry point
(or implicitly by wait, which auto-pumps when pinned cowns
exist). Designed for objects that cannot survive cross-interpreter
shipping — pyglet shapes, Tk widgets, GPU contexts, open file
handles, ctypes pointers. The companion examples/boids.py
rewrite demonstrates the coarse-grained pinned-dispatch pattern:
per-cell physics stays on workers, and one @when(PinnedCown)
per frame batches the write-back into main-thread matrices.
Also in this release: quiesce, a non-tearing-down
checkpoint primitive.
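
The routing rule described above can be pictured with a small pure-Python model. This is only a sketch under stated assumptions: `ModelCown`, `route`, and `schedule` are hypothetical names invented for illustration, and bocpy's scheduler is implemented in C rather than like this.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCown:
    """Hypothetical stand-in for a cown; `pinned` marks a main-pinned one."""
    name: str
    pinned: bool = False


def route(request_set):
    """Return "main" if ANY requested cown is pinned, otherwise "worker"."""
    return "main" if any(c.pinned for c in request_set) else "worker"


main_queue = deque()    # models the single-consumer main-thread queue
worker_queue = deque()  # models ordinary worker dispatch


def schedule(label, request_set):
    """Place a behavior label on the queue chosen by route()."""
    target = main_queue if route(request_set) == "main" else worker_queue
    target.append(label)


cell = ModelCown("cell")
window = ModelCown("window", pinned=True)

schedule("physics", [cell])       # no pinned cown: stays on workers
schedule("draw", [cell, window])  # one pinned cown is enough: goes to main
```

In this model a single pinned cown in the request set is enough to move the whole behavior to the main-thread queue, which is why the boids rewrite keeps per-cell physics free of pinned cowns.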

New Features

  • quiesce(timeout=None, *, stats=False, noticeboard=False)
    blocks until every in-flight behavior completes, without tearing
    down workers or the noticeboard thread. Implemented via a new
    terminator_seed_inc peer of terminator_seed_dec
    (Pyrona-style seed-up / seed-down pairing) so quiescence becomes
    a checkpoint rather than a shutdown. Useful for parallel-search
    patterns that need to inspect a best-so-far cown between rounds
    and for tests that must read a worker-produced send queue
    before its producer interpreter is destroyed. The stats and
    noticeboard flags mirror wait: returns None by
    default, a per-worker stats list[dict] when stats=True,
    a noticeboard dict[str, Any] when noticeboard=True, or a
    WaitResult when both are set. Raises TimeoutError
    if quiescence is not reached within timeout. Exported from
    bocpy.__all__.
  • PinnedCown(Cown[T]) — a cown whose value lives
    permanently on the main interpreter. Constructible only from the
    main interpreter (raises RuntimeError from workers);
    the value is never picklable, never reified twice, and never
    reconstructed in a worker. The capsule handle remains a
    first-class cross-interpreter shareable — workers may hold it,
    embed it in a regular Cown value graph, and place it in
    noticeboard entries, but only the main thread may acquire the
    value. See the new pinned_cowns page for the full
    contract and the coarse-grained-dispatch pattern.
  • pump(deadline_ms=None, max_behaviors=None, raise_on_error=False)
    — drains the main-thread queue of behaviors whose request sets
    contain a PinnedCown. Call from your event loop's
    idle / on-tick hook (pyglet schedule_interval, Tk after,
    asyncio task, …); script-mode programs need not call it
    explicitly because wait pumps internally. Non-preemptive:
    deadline_ms gates starting the next behavior, not
    interrupting one already running. Body exceptions default to
    landing on the result cown's .exception;
    raise_on_error=True re-raises the first body exception after
    drain. Returns a new PumpResult NamedTuple
    (executed, deadline_reached, raised).
  • set_pump_watchdog(warn_ms=1000, raise_ms=None, on_starve=None)
    — configure the pinned-queue starvation watchdog. Both thresholds
    gate on queue-non-empty time, not raw last-pump time, so
    programs running only unpinned work never trip them. Default is
    warn-only; users opt into fail-fast via an explicit raise_ms
    so interactive debugger sessions are not wedged by a breakpoint.
  • set_wait_pump_poll(ms=50) — set the poll cadence for
    wait's auto-pump loop. Re-read every iteration so a
    concurrent call updates the active wait immediately.
  • bocpy.PumpResult — three-field NamedTuple returned by
    pump. executed counts pinned behaviors whose lifecycle
    completed (including acquire-failure paths whose MCS chain still
    drained). deadline_reached is True only when the
    deadline_ms budget tripped before the queue drained.
    raised counts only body exceptions captured to a result cown
    (cleanup-path failures use PyErr_WriteUnraisable and do not
    count). Exported from bocpy.__all__.
  • Coarse-grained pinned-dispatch examples/boids.py — the
    per-cell send("update") / main-thread receive("update")
    barrier is replaced by per-cell physics on workers plus one
    pinned @when per frame that captures every per-cell result
    cown together with the two main-thread PinnedCown matrices
    and performs the batched write-back. Same visual output, fully
    worker-parallel per-cell work, single main-thread touchpoint.
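
The non-preemptive deadline rule for pump is easy to misread, so here is a pure-Python model of it. It is a sketch, not bocpy's implementation: `model_pump` and `FakeClock` are hypothetical, and only the three PumpResult field names and the deadline_ms / max_behaviors parameter names are taken from the notes above.

```python
import time
from collections import deque
from typing import NamedTuple


class PumpResult(NamedTuple):
    # Stand-in with the field names listed in the release notes.
    executed: int
    deadline_reached: bool
    raised: int


class FakeClock:
    """Hypothetical manual clock (seconds) so the demo is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def model_pump(queue, deadline_ms=None, max_behaviors=None, clock=time.monotonic):
    """Drain `queue`; the deadline only gates STARTING the next behavior."""
    start = clock()
    executed = raised = 0
    deadline_reached = False
    while queue:
        if max_behaviors is not None and executed >= max_behaviors:
            break
        if deadline_ms is not None and (clock() - start) * 1000.0 >= deadline_ms:
            deadline_reached = True  # budget tripped before the queue drained
            break
        body = queue.popleft()
        try:
            body()  # never interrupted once started
        except Exception:
            raised += 1  # the model only counts body exceptions
        executed += 1
    return PumpResult(executed, deadline_reached, raised)


clock = FakeClock()


def slow():
    clock.now += 0.030  # pretend the body takes 30 ms


def failing():
    raise ValueError("boom")


queue = deque([slow, failing, slow, slow])
result = model_pump(queue, deadline_ms=50, clock=clock)
```

With a 50 ms budget the third behavior still starts at the 30 ms mark and finishes at 60 ms; only the fourth is held back, so `result` is `PumpResult(executed=3, deadline_reached=True, raised=1)` and one behavior stays queued for the next pump.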

Public C ABI

  • bocpy_main_interpid() — new static inline helper in
    <bocpy/bocpy.h> returning
    PyInterpreterState_GetID(PyInterpreterState_Main()) pre-typed as
    int_least64_t to match bocpy_interpid for owner-field equality checks.
    Safe to call from a worker sub-interpreter for diagnostic /
    assert use. Additive — existing consumers recompile unchanged;
    BOCPY_ABI is unchanged at 1. The
    templates/c_abi_consumer bocpy~= pin moves to
    ~=0.9 to signal the new ABI surface it was authored against.

Improvements

  • @when loop-variable snapshot via default arg — the
    transpiler now accepts def b(c, i=i) as an explicit
    loop-snapshot idiom in addition to the existing implicit form
    (just reference the loop variable in the body). Trailing
    positional parameters beyond the cown count are also
    auto-captured by name (def b(c, factor) captures
    factor).
  • @when alias decorators — the transpiler now recognises
    from bocpy import when as boc_when and import bocpy [as alias] followed by @bocpy.when(...) or
    @alias.when(...), provided the aliasing import is at module
    level. Previously only the bare @when form was detected.
  • Behaviors.start() compiles the export module on main —
    the transpiler's rewritten module is now also instantiated as an
    in-memory types.ModuleType on the main thread (plus a
    linecache entry for traceback fidelity) so pump can
    resolve __behavior__N the same way workers do via their
    bootstrap.
  • Scheduler-owned behavior pre-header — bq_node and the
    new pinned OR-fold byte moved out of the opaque
    BOCBehavior into a scheduler-owned boc_behavior_prehdr_t
    allocated immediately before each behavior (CPython
    _PyGC_Head style). boc_sched.c no longer needs any
    knowledge of BOCBehavior's internal layout; layout drift
    between the scheduler and its users is impossible by
    construction.
  • terminator_wait_pumpable — new entry in
    boc_terminator.{c,h} lets the auto-pump loop wake on either
    count-zero or main-pinned-depth-becoming-non-zero, both wired
    through the existing single condition variable. Single-pumper
    enforcement on free-threaded builds (Py_GIL_DISABLED) lives
    alongside via a MAIN_PUMP_THREAD CAS that raises
    RuntimeError if a second thread tries to pump
    concurrently, cleared on every exit path including
    BaseException.

Bug Fixes

  • CWE-401: inheriting INCREF leak in cown_decref_inline —
    CownCapsule_reduce packs an encoded XIData payload by
    taking an inheriting COWN_INCREF per embedded
    CownCapsule, normally balanced when the bytes are
    unpickled inside a worker. On the orphan-death path (the
    consumer side never deserialised the payload) the matching
    COWN_DECREFs never fired and every embedded cown leaked.
    cown_decref_inline now feeds the encoded bytes through
    pickle.loads and immediately drops the result, which lets
    CPython's GC fire the matching COWN_DECREFs recursively.
    Gated on the pickled flag so native XIData round-trips
    (e.g. Matrix) skip the work entirely.
  • Main-pump behavior reference leak — both
    _core_main_pump_bounded and _core_main_pump_drain_all
    popped a BehaviorCapsule from MAIN_PINNED_QUEUE but
    never released the strong reference the capsule held on the
    underlying BOCBehavior. Each pinned behavior leaked
    one reference until the runtime was torn down. The pump
    helpers now BEHAVIOR_DECREF the behavior immediately after
    the worker-equivalent cleanup runs.
  • MSVC <stdatomic.h> compatibility — Microsoft's
    <stdatomic.h> (used by CPython's headers on Windows) does
    not expose the unsigned atomic_uint_least64_t or
    atomic_uintptr_t forms that the pinned-pump bookkeeping
    used. MAIN_PINNED_DEPTH, MAIN_PINNED_NONEMPTY_SINCE_NS,
    LAST_PUMP_NS, WATCHDOG_WARN_MS, WATCHDOG_LAST_WARN_NS,
    WATCHDOG_ON_STARVE and MAIN_PUMP_THREAD are now
    atomic_int_least64_t / atomic_intptr_t. Depth never
    goes negative; pointer bits round-trip losslessly through the
    signed atomic boundary.
  • CPython 3.10/3.11 PyErr_SetRaisedException polyfill —
    added to include/bocpy/xidata.h alongside the existing
    PyErr_GetRaisedException polyfill so the public C ABI's
    exception-stash pattern compiles on Python versions before
    3.12. BOCPY_ABI is unchanged.
  • Portable boc_max_align_t — added to boc_compat.h as
    a union of the most-strictly-aligned fundamental types
    (long long, long double, void *, function pointer).
    MSVC exposes the C11 max_align_t only under /std:c11,
    which the CPython build does not pass; the
    boc_behavior_prehdr_t size assertion now uses
    alignof(boc_max_align_t) so the alignment contract holds on
    every supported toolchain.
  • PEP 678 add_note 3.10 fallback — the new
    Behaviors.quiesce exception-context shim attaches a note
    describing the seed-inc / seed-dec balance on failure. CPython
    3.10 predates BaseException.add_note; the shim now
    writes to BaseException.__notes__ directly when add_note
    is missing.
  • Transpiler except ... as X mis-classification —
    ExceptHandler binds X on the handler node
    itself rather than via Name `Stor...
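
The PEP 678 fallback listed among the bug fixes above can be sketched in a few lines. This is an assumption-laden illustration: `attach_note` is a hypothetical helper and the note text is invented, so it should not be read as the actual Behaviors.quiesce shim.

```python
def attach_note(exc, note):
    """Attach a PEP 678 note, falling back to __notes__ where add_note is absent."""
    add_note = getattr(exc, "add_note", None)
    if callable(add_note):
        add_note(note)  # CPython 3.11+
    else:
        # CPython 3.10: maintain the __notes__ list by hand.
        notes = getattr(exc, "__notes__", None)
        if notes is None:
            exc.__notes__ = [note]
        else:
            notes.append(note)
    return exc


err = attach_note(RuntimeError("quiesce failed"), "illustrative note text")
attach_note(err, "second note")
```

Either branch leaves the notes in `err.__notes__`, so code that inspects the exception does not need to know which interpreter version attached them.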

v0.8.0 - Matrix/Vector methods and optimisation

03 Jun 14:26
283081f

Vector-oriented Matrix API — six new methods (vecdot,
cross, normalize, perpendicular, angle,
magnitude_squared), two new read-only properties (size,
length), and a unified in_place= keyword on every unary
method round out Matrix as a first-class vector and
batch-of-vectors type — plus an internal X-macro template refactor
of every _math.c op family that restores the compiler's
auto-vectoriser. 44 of 71 benched rows improved by ≥10%, with
representative wins of −50% to −88% on aggregates, broadcast
arithmetic, and normalize. The _math extension now ships
with -O3 (Linux/macOS) / /O2 (Windows) so end users pick
up the wins by default.

New Features

  • Vector-oriented Matrix methods — six new methods designed
    for the Nx2 / 2xN / Nx3 / 3xN vector and
    batch-of-vectors shapes that show up in examples/boids.py and
    similar simulation code:

    • magnitude_squared(axis=None) — squared L2 norm without the
      sqrt step. Cheaper than magnitude() and safe for
      sub-normal thresholding.
    • vecdot(other, axis=None) — axis-aware inner product matching
      numpy.linalg.vecdot. Not equivalent to numpy.dot;
      use @ for matrix multiplication. Same-shape, row-broadcast
      (1xN vs MxN), and column-broadcast (Mx1 vs MxN)
      operands are all supported.
    • cross(other, axis=None) — 2D scalar z-component or 3D cross
      product. Five shape paths share one method: 1x2 / 2x1
      returns a float; 1x3 / 3x1 returns a same-orientation
      Matrix; Nx2 / 2xN batches collect per-vector
      scalars; Nx3 / 3xN batches return same-shape Matrix
      results. axis= disambiguates the square 2x2 / 3x3
      shapes (default per-row).
    • normalize(axis=None, in_place=False) — divide every element
      by its magnitude. Zero-magnitude rows / columns are returned as
      exact zeros (no NaN, no division by zero). axis= selects
      per-row, per-column, or total normalisation.
    • perpendicular(axis=None, in_place=False) — rotate every 2D
      vector 90° counter-clockwise: (x, y) -> (-y, x). Accepts a
      single 2D vector, an Nx2 row batch, or a 2xN column
      batch.
    • angle(axis=None) — polar angle atan2(y, x) of every 2D
      vector. Returns a float for a single 2D vector input,
      otherwise a Matrix of per-vector angles.
  • Matrix.size property — total element count
    (rows * columns). Matches numpy.ndarray.size.

  • Matrix.length property — Frobenius (L2) magnitude as a
    read-only @property so vector-like code reads naturally
    (direction.length, velocity.length) without the
    parentheses of a method call. Equivalent to magnitude() with
    no axis argument.

  • in_place= keyword on every unary Matrix method —
    transpose, ceil, floor, round, negate,
    abs, plus the new normalize and perpendicular all
    accept in_place=True to mutate self and return it.
    Replaces the older transpose_in_place() method (see
    Breaking Changes below).

  • axis= keyword on aggregate methods — sum, mean,
    min, max, magnitude, and the new magnitude_squared
    now share a tri-state axis= argument (None / 0 / 1)
    decoded through a single classifier. Negative axes (-1 /
    -2) accepted for NumPy parity.
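
For readers who want the single-vector semantics spelled out, the following pure-Python reference mirrors the definitions given in the bullets above. It is a sketch on plain tuples, not the Matrix API: the function names are reused for readability only, and batch shapes and axis= handling are deliberately left out.

```python
import math


def magnitude_squared(v):
    """Squared L2 norm, skipping the sqrt step."""
    return sum(c * c for c in v)


def normalize(v):
    """Divide by the magnitude; a zero vector maps to exact zeros."""
    mag = math.sqrt(magnitude_squared(v))
    if mag == 0.0:
        return tuple(0.0 for _ in v)
    return tuple(c / mag for c in v)


def perpendicular(v):
    """Rotate a 2D vector 90 degrees counter-clockwise: (x, y) -> (-y, x)."""
    x, y = v
    return (-y, x)


def cross2d(a, b):
    """Scalar z-component of the cross product of two 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def angle(v):
    """Polar angle atan2(y, x) of a 2D vector."""
    return math.atan2(v[1], v[0])


unit = normalize((3.0, 4.0))
zero = normalize((0.0, 0.0))
left = perpendicular((1.0, 0.0))
```

A vector and its `perpendicular` always have a zero dot product, which is a quick sanity check when porting steering code such as the boids example.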

Improvements

  • Auto-vectorised _math.c op kernels — the binary,
    aggregate, unary, and two-operand-aggregate op families inside
    _math.c are now stamped from per-family descriptor tables,
    one kernel per (op, shape) combination. Each per-element body is
    literally substituted into its own monomorphic inner loop,
    restoring the precondition for GCC's / Clang's auto-vectoriser.
    Representative wins (lower is better):

    Bench row              Shape         0.7.0 (ns)   0.8.0 (ns)   Δ
    mean()                 (1000, 100)      44179.6       9001.6   −79.6%
    mean(1)                (1000, 100)      51699.4       7058.5   −86.3%
    max(1)                 (1000, 100)      97184.2      11322.7   −88.3%
    magnitude()            (1000, 3)         1098.2        306.8   −72.1%
    add col-bcast          (1000, 100)      37823.4      20172.5   −46.7%
    div same-shape         (1000, 100)      80134.2      45458.9   −43.3%
    normalize() axis=None  (1000, 3)         3644.6       1775.5   −51.3%

    Four rows in code paths untouched by the refactor regressed by
    5–15% from layout drift (_math.so .text grew +125% from
    kernel specialisation); none are on a hot path. No behavioural
    change; test_matrix.py passes unchanged.

  • -O3 / /O2 on bocpy._math — the math extension now
    sets per-platform extra_compile_args in setup.py
    (-O3 -fno-plt on Linux/macOS, /O2 on Windows) so end-user
    wheels and editable installs both pick up the auto-vectoriser
    wins above. Other bocpy extensions are unaffected. The SBOM
    hash for _math.*.so will drift accordingly — see
    the sbom page for the auditor-facing note.
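
The Δ column in the benchmark rows above is the relative change in time per call, (new - old) / old. As a worked check, the snippet below recomputes it from the published nanosecond figures; the `rows` dictionary and `pct_change` helper are illustrative names, not part of scripts/bench_matrix.py.

```python
# (0.7.0 ns, 0.8.0 ns) pairs copied from the representative rows above.
rows = {
    "mean()": (44179.6, 9001.6),
    "mean(1)": (51699.4, 7058.5),
    "max(1)": (97184.2, 11322.7),
    "magnitude()": (1098.2, 306.8),
    "add col-bcast": (37823.4, 20172.5),
    "div same-shape": (80134.2, 45458.9),
    "normalize()": (3644.6, 1775.5),
}


def pct_change(old_ns, new_ns):
    """Relative change in percent; negative means the new build is faster."""
    return (new_ns - old_ns) / old_ns * 100.0


deltas = {name: round(pct_change(old, new), 1) for name, (old, new) in rows.items()}
```

The recomputed values match the table to one decimal place, for example -79.6 for `mean()` and -88.3 for `max(1)`.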

Breaking Changes

  • Matrix.transpose_in_place() removed — superseded by
    Matrix.transpose(in_place=True), which returns self and
    so composes the same way every other unary method does.
    Migration is mechanical: replace m.transpose_in_place() with
    m.transpose(in_place=True).

Documentation

  • New Matrix API entries in the api page for size, length,
    magnitude_squared, vecdot, cross, normalize,
    perpendicular, and angle, plus updated in_place=
    keyword signatures on the existing unary methods.

Tests

  • 234 new test cases for the new Matrix methods and
    properties (1571 → 1805 passed). Coverage includes a stub-guard
    test that greps __init__.pyi for every new C-level name and
    in-cown coverage exercising each new method inside @when.
  • Portable overflow regex + cross 2x3/3x2 contract pinning —
    the cross-product test for the doubly-valid 2x3 / 3x2
    shapes now pins the 2D-batch interpretation explicitly, locking
    the documented behaviour.

Internal

  • scripts/bench_matrix.py — bench harness used to gate the
    refactor: --json append mode, --report-median per-row
    merge, 200 ms warmup, batch-size auto-tuning.
  • scripts/validate_wheel.py +
    scripts/_vendored_warehouse_wheel.py — stdlib-only wheel
    RECORD validator and a vendored slice of Warehouse's wheel
    parser; used by the PR gate to catch RECORD regressions
    before PyPI does.

CI / build

  • cibuildwheel v3.4.0 → v3.4.1 and clang-format-action
    pin normalised to the underlying commit SHA (Dependabot's
    preferred format). Both pins move in lock-step with the
    github-actions Dependabot group.
  • idna 3.16 → 3.17 in ci/constraints-docs.txt. Five
    other Dependabot proposals (docutils 0.23, ruamel-yaml
    0.19, sphinx-tabs 3.4.7+, sphinx-toolbox 4.2, and
    standard-imghdr 3.13) require Python ≥3.11 and so cannot
    enter a universal lock that still includes Python 3.10; a
    comment above requires-python = ">=3.10" in
    pyproject.toml lists them for the post-3.10-EOL bump.
  • flake8 extend-exclude for .copilot/, build/,
    sphinx/build/, and the scratch .env* venvs so the walker
    no longer trips on generated or vendored Python files.

0.7.0 - SBOM and Dependency Auditing

02 Jun 11:11
41f14e8

Cown-lifecycle correctness fixes — three use-after-free paths in the
CownCapsule pickle / acquire / noticeboard machinery now hold the
inner BOCCown alive across the writer's wrapper drop — plus
supply-chain hardening: pinned and hash-verified Python dependencies,
SHA-pinned GitHub Actions, dependabot coverage, vulnerability scanning,
and PEP 770 SBOMs embedded in every wheel.

New Features

  • PEP 770 SBOMs in every wheel — every wheel built by
    .github/workflows/build_wheels.yml now embeds a
    CycloneDX 1.6 (https://cyclonedx.org/specification/overview/)
    JSON SBOM under <dist>-<version>.dist-info/sboms/bocpy.cdx.json.
    Generation runs inside cibuildwheel's repair step on every platform
    (Linux auditwheel, macOS delocate, Windows direct injection)
    via the new stdlib-only scripts/build_sbom.py. The
    inject subcommand rewrites the wheel's RECORD atomically
    (temp file + rename).
  • SBOM verification in CI — the new verify_sboms job in
    build_wheels.yml re-downloads the extracted SBOM artifact and
    runs two checks: scripts/validate_sbom.py (stdlib-only
    structural validator pinning bocpy's wire format) and
    grype (https://github.com/anchore/grype, a third-party SBOM
    scanner) with --fail-on high. A separate sboms artifact is
    also uploaded by the merge job for downstream consumers.
  • bocpy.__version__ — a runtime version attribute derived
    from importlib.metadata.version("bocpy"), with a
    PackageNotFoundError fallback. Exported from bocpy.__all__
    and documented in __init__.pyi. pyproject.toml remains the
    single source of truth for the version.
  • New documentation — an sbom walk-through page covering the
    embedded SBOM format, extraction recipes, and verification commands.
  • wait(noticeboard=True) final-state capture — wait
    now accepts a noticeboard keyword that returns the final
    noticeboard contents as a plain dict at shutdown (after the
    noticeboard thread exits, before the entries are freed). Useful
    for surfacing an early-stopping result, last error, or aggregated
    counter that a behavior deposited just before the runtime
    quiesced, replacing the older send / receive handshake
    that earlier examples used. Combined with stats=True it
    returns a new WaitResult NamedTuple (also exported
    from bocpy.__all__) carrying both snapshots. The
    examples/prime_factor.py example was migrated to the new
    pattern.
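
The bocpy.__version__ bullet above describes a common packaging pattern: read the installed distribution's version at runtime and fall back when the metadata is missing. A minimal sketch follows. The `resolve_version` helper and its fallback string are assumptions for illustration, because the notes do not say what value bocpy falls back to.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name, fallback="0+unknown"):
    """Return the installed distribution's version, or `fallback` if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution has this name,
        # for example when running from an uninstalled source tree.
        return fallback


# A name that should not be installed anywhere, to exercise the fallback path.
missing = resolve_version("surely-not-an-installed-distribution-xyz")
```

Deriving the attribute this way keeps pyproject.toml as the only place the version is written down.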

Bug Fixes

  • Cown-in-cown use-after-free — a Cown embedded inside
    another cown's value, a message-queue payload, or a noticeboard
    snapshot was previously freed when the writer's local wrapper
    dropped, because pickle bytes carry no refcount on their own.
    CownCapsule_reduce now takes an inheriting COWN_INCREF that
    _cown_capsule_from_pointer_inheriting consumes on unpickle, so
    the inner BOCCown survives until the consumer drops its
    decoded wrapper. Affects every cross-cown reference shape — see
    the new TestCownInCown class for the full container-shape fuzz.
  • Acquire-failure poisoned-state — when pickle.loads failed
    partway through cown_acquire, the cown was left in a
    half-acquired state with the encoded bytes still in place. A retry
    would re-run pickle against bytes whose embedded inherited refs
    had already been partially consumed by pickle's error path,
    risking dereferences of freed BOCCown* pointers. The cown's
    xidata is now recycled on the failure path and a guard at the
    top of cown_acquire rejects any future acquire with a
    deterministic RuntimeError; the worker recovery arm surfaces
    it on the failing behavior's result cown.
  • Noticeboard hidden-cown audit — when a noticeboard value
    reached a Cown via a route the pin walker cannot see — custom
    __reduce__ / __getstate__, copyreg.dispatch_table,
    closure capture, module-level cache — the borrowing reconstructor
    produced a token whose inner BOCCown was not held alive by
    the entry's pin set, leaving the next reader to UAF after the
    writer's wrapper dropped. A per-thread borrowing context
    (BOC_NB_CTX) now audits every CownCapsule_reduce against
    the caller's pin set during the noticeboard write pickle and
    fails the whole notice_write / notice_update closed if
    any cown is unaccounted for.
  • UnicodeDecodeError on non-UTF-8 Windows locales —
    Behaviors.start read worker.py with open(path), which
    picks up locale.getpreferredencoding(False). On cp1252
    (English Windows) the UTF-8 em-dashes in the worker source were
    silently mojibake-d; on cp949 (Korean Windows) the read failed
    with UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2
    and bocpy could not start at all. Reported in
    #14 (https://github.com/microsoft/bocpy/issues/14) by
    @Forthoney (https://github.com/Forthoney). Fixed by passing
    encoding="utf-8" explicitly in Behaviors.start, and the
    same fix was applied to every other open() site in the repo
    that reads or writes text known to contain non-ASCII bytes
    (sphinx/source/conf.py, examples/sketches.py x2,
    export_module.py).
  • Silent worker-startup failures — Behaviors.start_workers
    ran interpreters.create() and interpreters.run_string()
    on the worker thread without a try/except, so a failure in either
    killed the thread without ever replying on boc_behavior. The
    parent's bounded receive() then timed out with no diagnostic.
    Both calls are now wrapped, and every failure path sends a
    formatted traceback over boc_behavior so the parent sees a
    structured error instead of a timeout.
  • Silent worker bootstrap import failures — the generated
    bootstrap script that loads the user module into each worker
    sub-interpreter is now wrapped in a top-level try/except. Any
    BaseException is formatted with the user module name and sent
    over boc_behavior (falls back to sys.stderr if the
    message-queue send itself raises), then re-raised so
    run_string reports it as well. Module-import failures that
    previously surfaced only as a worker-startup timeout now arrive
    as a proper traceback.
  • boc_sched_worker_pop_slow skipped popped_local — the
    slow-path pending-fallback and WSQ-dequeue branches returned
    work without bumping popped_local (the fast path always
    did), so the documented producer/consumer identity in
    boc_sched_stats_t was violated whenever the fairness
    arm fired or a worker entered the slow path directly. Both
    branches now increment popped_local and reset the batch
    budget, matching the fast path. The header's reconciliation
    paragraph was also tightened to a "near-identity" that explicitly
    accounts for fairness-token pops (which are re-enqueued via raw
    boc_wsq_enqueue rather than boc_sched_dispatch, leaving
    consumer-side counters without a matching producer-side bump).
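
The locale bug fixed above is easy to reproduce without bocpy. The sketch below writes a UTF-8 file containing an em-dash (spelled as an escape) to a temporary directory, then reads it back with an explicit encoding and with the two Windows code pages named in the notes. The file name is made up, and the cp949 outcome is recorded rather than asserted.

```python
import os
import tempfile

source = "# worker \u2014 bootstrap\n"  # \u2014 is the em-dash, 3 bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "worker_demo.py")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)

    # Explicit encoding: same result on every locale.
    with open(path, encoding="utf-8") as f:
        explicit = f.read()

    # What a cp1252 default would silently produce: three wrong characters.
    with open(path, encoding="cp1252") as f:
        mojibake = f.read()

    # What a cp949 default does; the release notes report a decode error.
    try:
        with open(path, encoding="cp949") as f:
            f.read()
        cp949_failed = False
    except UnicodeDecodeError:
        cp949_failed = True
```

Passing `encoding="utf-8"` explicitly is what makes the read independent of `locale.getpreferredencoding(False)`.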

Supply Chain

  • Hashed and pinned Python dependencies — every CI dependency is
    resolved into a ci/constraints-<extra>.txt file via
    uv pip compile --universal --generate-hashes and installed with
    pip install --require-hashes. Covers the test, linting,
    docs, and new audit extras. bocpy itself is then
    installed via pip install -e . --no-deps so an editable build
    cannot smuggle in an unpinned transitive dependency.
  • Vulnerability scanning — new audit job in pr_gate.yml
    runs pip-audit --strict against every constraints file on every
    PR. pip-audit itself is pinned via ci/constraints-audit.txt
    and self-checked. A new .github/workflows/nightly_audit.yml
    re-runs the audit nightly against main.
  • SHA-pinned GitHub Actions — every uses: line in
    .github/workflows/ is now pinned to a full 40-char commit SHA
    with a trailing # vX.Y.Z comment.
  • Dependabot coverage — new .github/dependabot.yml covers
    three ecosystems (pip rooted at /ci, github-actions
    rooted at /, pip rooted at
    /templates/c_abi_consumer), grouped weekly per ecosystem.
  • Downstream template pinned — templates/c_abi_consumer
    pins bocpy~=MAJOR.MINOR as both a build requirement and a
    runtime dependency. The finalize-pr skill bumps it in
    lock-step with the root version.
  • New SUPPLY_CHAIN.md — top-level policy doc describing
    everything above with the exact regeneration commands.
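
The SHA-pinning convention above (a full 40-character commit SHA followed by a trailing # vX.Y.Z comment) lends itself to a mechanical check. The regular expression below is a sketch of such a lint, not a script from the bocpy repository, and the SHA in the example is made up.

```python
import re

# uses: owner/repo[/path]@<40 lowercase hex chars>  # vX.Y.Z
PINNED_USES = re.compile(
    r"^\s*(?:-\s*)?uses:\s*[\w./-]+@[0-9a-f]{40}\s+#\s*v\d+\.\d+\.\d+\s*$"
)


def is_sha_pinned(line):
    """True if a workflow `uses:` line follows the SHA-plus-comment form."""
    return bool(PINNED_USES.match(line))


fake_sha = "0123456789abcdef0123456789abcdef01234567"  # illustrative only
good = f"      - uses: actions/checkout@{fake_sha} # v4.2.2"
tag_only = "      - uses: actions/checkout@v4"
no_comment = f"      - uses: actions/checkout@{fake_sha}"

results = [is_sha_pinned(line) for line in (good, tag_only, no_comment)]
```

Only the first line passes. A tag reference or a bare SHA without the version comment is rejected.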

Documentation

  • Cown pickle-leak note — Cown now documents that
    pickle.dumps on a cown produces bytes that carry one strong
    reference per embedded cown; orphan bytes (never unpickled in the
    producing process) leak one strong ref per byte string. The bocpy
    runtime never produces orphan bytes; the leak surface only
    applies to third-party code that calls pickle.dumps(cown)
    directly.
  • Noticeboard cown-lifetime guarantee — notice_write and
    notice_update now document that values may embed
    Cown references and that the noticeboard keeps each
    embedded cown alive for as long as the entry remains. The new
    paragraph in the noticeboard page mirrors this guarantee for
    readers.
  • Noticeboard final-state capture guide — the noticeboard page
    gained a "Reading the Final State at Shutdown" section covering
    the wait(noticeboard=True) contract, the combined
    wait(stats=True, noticeboard=True) form returning
    WaitResult, the empty-dict fallbacks for the
    never-started and never-written cases, and the recommendation
    to use snap.get(key) since wait quiesces as soon as
    every behavior ...

v0.6.0 - C ABI

15 May 19:50
06fc4fe

Public C ABI for downstream extensions, enabling C-level participation
in behavior-oriented concurrency across worker sub-interpreters.

New Features

  • Decorator composition with @when — decorators stacked below
    @when are now preserved on the generated behavior function and
    compose with the behavior body on the worker. Decorators placed
    above @when raise a SyntaxError at transpile time with
    actionable guidance. async def functions with @when are
    also explicitly rejected.
  • Public C ABI (<bocpy/bocpy.h>) — downstream C extensions can
    now link against bocpy to register custom Python types as
    cross-interpreter shareable so Cown can carry instances of
    them across worker interpreters. The header is C-only, version-gated
    via the BOCPY_ABI macro, and bumped on any incompatible change
    to bocpy.h or xidata.h. Wheels remain CPython-version-tagged
    so a runtime ABI mismatch cannot occur.
  • bocpy.get_include() / bocpy.get_sources() — Python-level
    helpers that downstream setup.py files use to locate the bocpy
    headers and the small set of C sources that must be compiled into
    the consuming extension.
  • templates/c_abi_consumer/ — a ready-to-copy template for
    building a C extension against the bocpy ABI, including a
    setup.py, a probe extension exercising the public surface, and
    a pytest suite (test_public_c_abi.py) that validates the ABI
    end-to-end.
  • C source reorganisation — the per-subsystem translation units
    introduced in 0.5.0 have been renamed with a boc_ prefix
    (boc_compat.[ch], boc_sched.[ch], boc_tags.[ch],
    boc_terminator.[ch], boc_noticeboard.[ch], boc_cown.h)
    to give the public ABI a stable, namespaced identity. xidata.h
    has moved under include/bocpy/ alongside bocpy.h.

Documentation

  • New c_abi, messaging, and noticeboard pages
    in the Sphinx site; the API reference has been expanded to cover
    the public ABI surface.

Breaking Changes

  • noticeboard_version removed — the global monotonic version
    counter introduced in 0.4.0 has been removed. It exposed an
    implementation detail of the snapshot cache that did not survive
    the C ABI review and had no use case that was not better served
    by notice_sync plus an explicit noticeboard() read.

v0.5.0

05 May 10:28
d9116c2

Highlights

This release delivers a Verona-RT-style work-stealing scheduler, a global noticeboard (shared key-value store), removal of the central scheduler thread in favour of direct dispatch, and a major C source refactor into per-subsystem translation units with a portable atomics layer.


New Features

  • Work-stealing scheduler — the single behavior queue is replaced with a distributed scheduler. Each worker owns an MPMC behavior queue, pops locally first, and steals from peers when idle. Idle workers park on per-worker condition variables and are signalled directly by producer/victim.
  • Per-worker fairness tokens — a token node advances through each worker's queue so long-running behaviors cannot monopolise dispatch slots; also drives cooperative shutdown.
  • Noticeboard — a shared key-value store (up to 64 keys) readable/writable without acquiring cowns. Writes are non-blocking; reads return a cached per-behavior snapshot. Includes notice_write, notice_read, notice_update, notice_delete, notice_sync, noticeboard_version, and the REMOVED sentinel.
  • Distributed scheduler — two-phase locking, request linking, and dispatch run directly on the caller's thread in C; cown release runs on the executing worker. MCS-style intrusive linked list per cown for zero-bounce handoff.
  • Cown.exception property — indicates whether the held value is from an unhandled exception.
  • compat.h / compat.c portability layer — uniform BOCMutex, BOCCond, boc_atomic_*_explicit, monotonic-time, and sleep primitives across MSVC, pthreads, and C11 <threads.h>.
  • xidata.h cross-interpreter shim — centralised _PyXIData_* / _PyCrossInterpreterData_* version ladders for CPython 3.12–3.15 (including free-threaded builds).
  • fanout_benchmark example — fan-out/fan-in benchmark exercising scheduler throughput under heavy producer load.
  • Prime factor example (examples/prime_factor.py) — parallel factorisation via Pollard's rho with noticeboard-coordinated early termination.
  • Benchmark harness (examples/benchmark.py) — micro-benchmarks for scheduling throughput, message-queue latency, and noticeboard contention.
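
The "pop locally first, steal when idle" policy from the work-stealing bullet above can be modelled in a few lines. This single-threaded sketch is only meant to show the order of preference. `ModelWorker` and `next_behavior` are hypothetical, the victim scan order is an assumption, and the real scheduler uses concurrent queues and condition variables in C.

```python
from collections import deque


class ModelWorker:
    """Hypothetical worker owning one behavior queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()


def next_behavior(worker, workers):
    """Return (behavior, source_name), or (None, None) if the worker should park."""
    if worker.queue:  # 1. prefer local work
        return worker.queue.popleft(), worker.name
    for victim in workers:  # 2. otherwise steal from the first busy peer
        if victim is not worker and victim.queue:
            return victim.queue.popleft(), victim.name
    return None, None  # 3. nothing anywhere: the worker would park


a, b = ModelWorker("a"), ModelWorker("b")
workers = [a, b]
a.queue.extend(["a1", "a2"])

stolen = next_behavior(b, workers)  # b is idle, so it steals from a
local = next_behavior(a, workers)   # a still finds local work
idle = next_behavior(b, workers)    # every queue is empty now
```

In the model, `stolen` is `("a1", "a")`, `local` is `("a2", "a")`, and `idle` is `(None, None)`, the point at which a real worker would park until signalled.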

Bug Fixes

  • Transpiler aliased imports — visit_Import / visit_ImportFrom now track alias names (import X as Y), preventing spurious "name not found" errors and duplicate whencall injection.
  • Global variable capture — @when closure capture falls back to frame.f_globals when a name is not in any local scope, fixing NameError for module-level variables.

Improvements

  • In-memory transpiled-module loading — workers exec the transpiled source from a string literal instead of writing to disk, eliminating filesystem round-trips and leftover .py files.
  • Nested @when capture — the transpiler recurses into nested @when-decorated functions when computing outer captures, so child behaviors can close over the outer frame.
  • C extension split — _core.c reduced from ~5,000 to ~3,500 lines by extracting sched.{c,h}, noticeboard.{c,h}, terminator.{c,h}, tags.{c,h}, cown.h, compat.{c,h}, and xidata.h.
  • Direct dispatch on cown release — behavior_release_all hands resolved successors directly to workers via boc_sched_dispatch, removing one queue hop per handoff.
  • Cooperative worker shutdown — boc_sched_worker_request_stop_all / boc_sched_unpause_all provide a clean stop/drain protocol.
  • Matrix docstrings — all Matrix C methods now carry built-in docstrings.
  • Examples package relocated — moved to top-level examples/ directory (still importable as bocpy.examples).
  • Filtered PyPI README — setup.py strips <!-- pypi-skip-start --> regions before publishing.
  • Documentation refresh — expanded coverage of noticeboard, distributed scheduler, and new APIs.
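
Loading a module from a source string, as the in-memory loading item describes, can be sketched as follows. This is an illustration only: load_module_from_source and the pseudo-filename format are hypothetical, and how bocpy's workers actually receive and name the transpiled source is not shown in these notes.

```python
import sys
import types


def load_module_from_source(name, source, register=False):
    """Create a module object and execute `source` in its namespace.

    Nothing is written to disk; the pseudo-filename only labels tracebacks.
    """
    module = types.ModuleType(name)
    code = compile(source, f"<in-memory {name}>", "exec")
    if register:
        sys.modules[name] = module   # lets a later `import name` find it
    exec(code, module.__dict__)
    return module
```

For example, load_module_from_source("demo", "X = 21\n") returns a module whose X attribute is 21, and no demo.py file is created.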

Internal Test Modules (opt-in via BOCPY_BUILD_INTERNAL_TESTS=1)

  • _internal_test_atomics — correctness tests for compat.h typed-atomics.
  • _internal_test_bq — torture tests for the MPMC behavior queue.
  • _internal_test_wsq — tests for work-stealing primitives (fast pop, slow pop, steal, park/unpark).

Test Suite

  • test_noticeboard.py — snapshot semantics, notice_update atomicity, REMOVED, notice_sync, version monotonicity.
  • test_scheduler_integration.py, test_scheduler_stats.py, test_scheduler_steal.py — end-to-end and per-primitive scheduler tests.
  • test_compat_atomics.py — portable atomics smoke tests.
  • test_stop_retry_composition.py — stop()/start()/wait() retry composition.
  • test_scheduling_stress.py — expanded with fan-out, work-stealing, and shutdown stress scenarios.
  • test_transpiler.py — AST extraction, capture rewriting, aliased imports, module export.

Full changelog: v0.3.1...v0.5.0

v0.3.1

07 Apr 12:50
5eaf8fc

CownCapsule serialization support for nested cowns.

Bug Fixes

  • Removed the ownership check in _cown_shared that prevented a
    CownCapsule from being serialized to XIData when it was the value
    of another Cown. The check was unnecessary — _cown_shared only
    stores a pointer and ownership is enforced at acquire time.

Improvements

  • Added CownCapsule.__reduce__ with COWN_INCREF pinning so that a
    CownCapsule embedded in a container (dict, list, etc.) can survive
    the pickle round-trip used by object_to_xidata. A module-level
    reconstructor (_cown_capsule_from_pointer) inherits the pin without
    a redundant COWN_INCREF, and validates the process ID on unpickle to
    guard against cross-process misuse.

v0.3.0

01 Apr 22:42
7e52702

Improvements

  • Added CownCapsule.disown() — abandons a cown's value without
    serializing it and resets ownership to NO_OWNER. Used during worker
    cleanup to safely discard orphan cowns before the owning interpreter
    is destroyed, preventing dangling Python object references.
  • Rewrote receive to use a two-phase spin-then-park strategy for
    single-tag untimed receives. Phase 1 spins for BOC_SPIN_COUNT
    iterations; Phase 2 parks the thread on a per-queue condvar, eliminating
    busy-wait CPU burn. Timed receives and multi-tag receives use
    spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
  • Added platform-abstracted condvar primitives (BOCParkMutex /
    BOCParkCond) with implementations for Windows (SRWLOCK /
    CONDITION_VARIABLE), macOS (pthreads), and Linux (C11 threads).
  • Each BOCQueue now carries a waiters counter, park_mutex, and
    park_cond. Producers signal parked receivers after enqueue;
    drain and set_tags broadcast to wake all parked threads.
  • Replaced the fixed thrd_sleep in send with a sched_yield /
    SwitchToThread call, reducing send-side latency.
  • Refactored the monolithic _core_receive into receive_single_tag
    and receive_multi_tag, each with its own backoff/parking logic.
  • Moved the BOC_QUEUE_DISABLED check earlier in get_queue_for_tag
    so callers skip disabled queues instead of returning NULL after
    tag resolution.
  • Added Windows-compatible atomic_load_explicit /
    atomic_fetch_add_explicit / atomic_fetch_sub_explicit macros
    using InterlockedExchangeAdd64.
  • Declared Py_mod_gil = Py_MOD_GIL_NOT_USED in both _core and
    _math C extensions so that importing bocpy on a free-threaded
    Python build (3.13t+) does not re-enable the GIL.
  • Replaced PyDict_GetItem (borrowed reference) with
    PyDict_GetItemRef (strong reference) in BOCRecycleQueue_recycle
    on Python 3.13+, improving forward-compatibility with free-threaded
    builds.
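
The spin-then-park receive described in this list can be approximated in Python. This is a rough sketch and not the C implementation: ParkingQueue is a hypothetical name, SPIN_COUNT is an assumed value (the notes name BOC_SPIN_COUNT but not its size), and a threading.Condition stands in for the park_mutex / park_cond pair.

```python
import collections
import threading

SPIN_COUNT = 1000  # assumed value, for illustration only


class ParkingQueue:
    def __init__(self):
        self._items = collections.deque()
        self._cond = threading.Condition()   # plays park_mutex + park_cond
        self._waiters = 0                    # receivers currently parked

    def send(self, item):
        with self._cond:
            self._items.append(item)
            if self._waiters:                # signal only if someone is parked
                self._cond.notify()

    def receive(self):
        # Phase 1: spin, polling the queue without blocking.
        for _ in range(SPIN_COUNT):
            try:
                return self._items.popleft() # deque pops are thread-safe
            except IndexError:
                pass
        # Phase 2: park until a producer signals.
        with self._cond:
            self._waiters += 1
            try:
                while True:
                    try:
                        return self._items.popleft()
                    except IndexError:
                        # Producers append under the same lock, so no item
                        # can arrive between this check and the wait.
                        self._cond.wait()
            finally:
                self._waiters -= 1
```

The parked receiver re-checks the queue after every wake because a spinning receiver may have taken the item first.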

Bug Fixes

  • Fixed a deadlock when the same cown is passed multiple times to @when
    (e.g. @when(c, c)). Duplicate requests for the same cown caused the
    MCS-queue-based two-phase locking to spin-wait on itself. Requests are
    now deduplicated by target cown in Behavior.__init__, with
    compensating resolve_one calls to maintain the behavior count
    invariant.
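
The deduplication fix can be pictured with a toy model. This is a sketch under assumptions, not bocpy's Behavior class: it assumes the count starts at the number of arguments, and the ToyBehavior class with its requests and count attributes is hypothetical.

```python
class ToyBehavior:
    """Toy behavior that becomes runnable when `count` reaches zero."""

    def __init__(self, cowns):
        # The count is sized for every argument, duplicates included.
        self.count = len(cowns)
        self.requests = []
        seen = set()
        duplicates = 0
        for cown in cowns:
            if id(cown) in seen:
                duplicates += 1      # no second request for the same cown
            else:
                seen.add(id(cown))
                self.requests.append(cown)
        # Compensate: a dropped duplicate would otherwise never be resolved,
        # leaving the count permanently above zero.
        for _ in range(duplicates):
            self.resolve_one()

    def resolve_one(self):
        """Record one resolved request; return True when ready to run."""
        self.count -= 1
        return self.count == 0
```

With arguments (c, d, c), only two requests are enqueued and the count drops to two, so acquiring c and d once each is enough to make the behavior runnable.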

Tests

  • TestLostWakeStress: single-producer random delays, bursty producer,
    and repeated single-message wake to detect lost-wake races.
  • TestMultiTagBackoff: multi-tag receive correctness — second-tag hit,
    delayed arrival, per-tag FIFO ordering, timeout, and interleaved
    producers.
  • TestTimeoutAccuracy: lower-bound / upper-bound wall-clock checks and
    zero-timeout immediacy.
  • Added tests for duplicate cowns in @when: same cown twice, thrice,
    non-adjacent duplicates, duplicates within a group, and mutation
    aliasing semantics.

CI

  • Added a free-threaded CI job that tests against Python 3.13t and
    3.14t on Linux, with explicit assertions that the GIL remains disabled
    after import.
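
A check of this kind can be written with sys._is_gil_enabled(), which CPython provides from 3.13 onward. The helper names below are hypothetical, and this sketch is not the project's actual CI assertion.

```python
import importlib
import sys


def gil_enabled():
    """Return True if the GIL is active in the running interpreter."""
    check = getattr(sys, "_is_gil_enabled", None)   # absent before Python 3.13
    return True if check is None else bool(check())


def import_keeps_gil_disabled(module_name):
    """Import a module and report whether the GIL state survived the import.

    Only a transition from disabled to enabled counts as a failure, so the
    check passes trivially on builds where the GIL was already on.
    """
    before = gil_enabled()
    importlib.import_module(module_name)
    after = gil_enabled()
    return before or not after
```

On a free-threaded build, a test could then assert import_keeps_gil_disabled("bocpy").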

Full Changelog: v0.2.2...v0.3.0

v0.2.2

18 Mar 13:12
81ede4b

Improvements

  • Added an ASAN/UBSAN CI job that builds CPython 3.14.2 from source with AddressSanitizer and UndefinedBehaviorSanitizer, then runs the full test suite against instrumented builds of bocpy.
  • Updated GitHub Actions to latest versions (actions/checkout@v6, actions/setup-python@v5).

Bug Fixes

  • Fixed a false-positive warning message for deallocation of xidata on the main
    interpreter after module shutdown.
  • Changed the clear logic when recycling

v0.2.0

06 Mar 01:23
cc32479

Bugfix release including some minor improvements.

Improvements

  • Examples are now included in the package, with script entrypoints for each.
  • The drain low-level API function is now exposed at the package level.
  • wait() will now acquire frame-local Cown objects before shutting down the workers.

Dev Tools

  • Added an internal cown and behavior reference tracking utility.

Bug Fixes

  • Fixed a reference counting bug with cown lists.
  • Fixed an issue where the boids example did not run on Windows due to a font
    setting.

v0.1.0 - Initial Release

02 Mar 02:29
d5d5eb2

Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>