Skip to content

[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059

Draft
kg wants to merge 5 commits into
dotnet:mainfrom
kg:wasm-spill-refs
Draft

[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059
kg wants to merge 5 commits into
dotnet:mainfrom
kg:wasm-spill-refs

Conversation

@kg
Copy link
Copy Markdown
Member

@kg kg commented Jun 5, 2026

No description provided.

Copilot AI review requested due to automatic review settings June 5, 2026 22:58
@kg kg added arch-wasm WebAssembly architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jun 5, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Wasm-specific mechanism intended to make ref/byref values GC-visible across call sites by injecting a new IR node (GT_WASM_SPILL_REF) and a new Wasm phase (WasmSpillRefs) that inserts these nodes and allocates pinned stack spill slots used during Wasm codegen.

Changes:

  • Add GT_WASM_SPILL_REF node kind and operand iteration support.
  • Add Compiler::WasmSpillRefs phase to insert spill nodes around calls and allocate spill locals.
  • Extend Wasm codegen/regalloc to track a spill index and (temporarily) force-enregister a scratch “splash zone” local.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/coreclr/jit/regallocwasm.cpp Forces the “splash zone” local to be treated as a reg candidate.
src/coreclr/jit/gtlist.h Adds new Wasm node WASM_SPILL_REF.
src/coreclr/jit/gentree.cpp Updates operand-edge iterator to treat GT_WASM_SPILL_REF as unary.
src/coreclr/jit/fgwasm.cpp Implements Compiler::WasmSpillRefs to insert spill nodes and allocate spill locals.
src/coreclr/jit/compphases.h Adds PHASE_WASM_SPILL_REFS.
src/coreclr/jit/compmemkind.h Adds WasmSpillRefs memory kind.
src/coreclr/jit/compiler.h Adds m_wasmSpillSlots field and WasmSpillRefs declaration.
src/coreclr/jit/compiler.cpp Wires WasmSpillRefs into the Wasm compilation pipeline.
src/coreclr/jit/codegenwasm.cpp Emits Wasm for GT_WASM_SPILL_REF and resets spill index at calls.
src/coreclr/jit/codegenlinear.cpp Resets spill index at block boundaries on Wasm.
src/coreclr/jit/codegen.h Adds wasmSpillRefIndex state.

Comment thread src/coreclr/jit/fgwasm.cpp
Comment thread src/coreclr/jit/fgwasm.cpp Outdated
Comment thread src/coreclr/jit/fgwasm.cpp Outdated
Comment thread src/coreclr/jit/fgwasm.cpp Outdated
Comment on lines +1736 to +1740
varDsc->lvHasExplicitInit = true;
varDsc->lvImplicitlyReferenced = true;
// If we don't make this var tracked, regalloc will crash when allocating a register for it
varDsc->lvTracked = true;
m_wasmSpillSlots->at(0) = varNum;
Comment thread src/coreclr/jit/regallocwasm.cpp Outdated
Comment on lines +193 to +197
// HACK: Ensure that we always enregister the splash zone, even if we are not enregistering other locals
if (m_compiler->m_wasmSpillSlots && m_compiler->m_wasmSpillSlots->size() && m_compiler->m_wasmSpillSlots->at(0) == lclNum)
{
varIsRegCandidate = true;
}
Copy link
Copy Markdown
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good.

I'm curious how this intersects/overlaps with LSRA's spill temp mechanism. Not saying we should use that here, but it likely serves a similar purpose.

Comment thread src/coreclr/jit/fgwasm.cpp Outdated
if (!op->TypeIs(TYP_REF, TYP_BYREF))
return GenTree::VisitResult::Continue;

for (size_t i = defs.size(); i > 0; i--)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth commenting what this is doing (removing active defs once we find their use, and keeping the defs collection compact).

Comment thread src/coreclr/jit/gtlist.h

GTNODE(WASM_JEXCEPT , GenTree ,0,0,GTK_LEAF|GTK_NOVALUE|DBK_NOTHIR) // Special jump for Wasm exception handling
GTNODE(WASM_THROW_REF , GenTree ,0,0,GTK_LEAF|GTK_NOVALUE|DBK_NOTHIR) // Wasm rethrow host exception (exception is an implicit operand)
GTNODE(WASM_SPILL_REF , GenTreeOp ,0,0,GTK_UNOP|DBK_NOTHIR)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a new node instead of just using normal stores?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STORE_LCL_VAR doesn't return a value, does it? For this to work, the value has to 'flow through' the spill into the call consuming it

Comment thread src/coreclr/jit/regallocwasm.cpp Outdated
varIsRegCandidate = false;
}

// HACK: Ensure that we always enregister the splash zone, even if we are not enregistering other locals
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be enregistered?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to change the shape of the generated code to have the splash zone live in memory, and it would be much worse CQ

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand – if we were doing similar transformation in any other place in the JIT we would just create new locals and store to them. Like gtSplitTree or async do already. What makes that less desirable and why can't it be handled as well as this can automatically by the wasm backend?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context: https://gist.github.com/kg/cba44f4907f5320966058446fa01c25f

Essentially, the stack shape required to do a memory store on wasm is
[dst, value]

so if I have a list of call args like
[a, b, c, d]
I can't insert a node after b that does a memory store for a spill without first inserting a LCL_ADDR node before b.

If I have a local that's guaranteed to be enregistered, I can insert a tee.local opcode inline that will make a copy of b, and then I can do anything I want as long as I leave b on the stack afterwards (which is easy thanks to the local). By default in debug we don't enregister anything, so I need this hack to ensure I have the ability to tee.local.

Let me know if that doesn't make sense and I can explain in more detail.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT is quite aggressive in rewriting nested calls to use locals. For your example C# code from the gist I see this rewrite happen in morph:

               [000011] -ACXG+-----                         └──▌  CALL      ref    Program:SpillTest(System.String,int):System.String
               [000020] DACXG+----- arg1 setup                 ├──▌  STORE_LCL_VAR ref    V04 tmp1         
               [000007] -ACXG+-----                              └──▌  CALL      ref    Program:Glue(System.String,System.String):System.String
               [000018] DACXG+----- arg2 setup                      ├──▌  STORE_LCL_VAR ref    V03 loc0         
               [000006] --CXG+-----                                   └──▌  CALL      ref    System.Int32:ToString():System.String:this
               [000017] -----+----- wasm sp $0                           ├──▌  LCL_VAR   int    V00 SP           
               [000005] -----+----- this $1                              └──▌  LCL_ADDR  int    V02 arg1         [+0]
               [000016] -----+----- wasm sp $0                      ├──▌  LCL_VAR   int    V00 SP           
               [000004] -----+----- arg1 $1                         ├──▌  LCL_VAR   ref    V01 arg0         
               [000019] -----+----- arg2 $2                         └──▌  LCL_VAR   ref    V03 loc0         
               [000015] -----+----- wasm sp $0                 ├──▌  LCL_VAR   int    V00 SP           
               [000021] -----+----- arg1 $1                    ├──▌  LCL_VAR   ref    V04 tmp1         
               [000010] ----G+----- arg2 $2                    └──▌  SUB       int   
               [000008] ----G+-----                               ├──▌  LCL_VAR   int   (AX) V02 arg1         
               [000009] -----+-----                               └──▌  CNS_INT   int    1

I would not expect any LIR edges need to be live across a call here, so I would not expect the handling in the PR to even do anything. Am I wrong?

How does the WASM backend handle this actual case with locals live across the calls instead of LIR edges? The gist mentions that the design with the splash zone is for code size reasons, but I think LIR edges live across a call are going to be a drop in the bucket compared to more usual use of locals.

I am not a fan of the handling for LIR edges deviating from the general handling of locals if it can be avoided. That introduces spurious regressions when you make innocent changes in other phases in the JIT (introduce a local for something unrelated and suddenly you have a regression because the backend does not handle it well). I would much rather see something like this rewritten to use locals and then having the backend optimize the use of short-lived locals to use pinning if that is a valuable code size reduction.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your scenario with locals live across the calls, the locals will live in linear memory on the stack since as GC refs we can't persistently home them in Wasm locals, I think. So they won't need spilling.

For the test I'm playing with here, in a debug build I see stuff like this:

N003 (???,???) [000012] --CXG+-----                   t12 = *  CALL r2r_ind ref    System.Int32:ToString():System.String:this
                                                            /--*  t32    int
                                                            +--*  t12    ref
               [000116] sACXG------                         *  STOREIND  ref
N009 (???,???) [000034] DACXG+-----                   t34 =    LCL_ADDR  int    V07 tmp3         [+0]
N005 (???,???) [000030] -----+-----                   t30 =    PHYSREG   int    $0
N006 (???,???) [000014] -----+-----                   t14 =    LCL_VAR   ref    V05 tmp1
                                                            /--*  t14    ref
               [000101] -----------                  t101 = *  WASM_SPILL_REF ref
N007 (???,???) [000033] -----+-----                   t33 =    LCL_VAR   ref    V06 tmp2
                                                            /--*  t33    ref
               [000102] -----------                  t102 = *  WASM_SPILL_REF ref
               [000063] DA--G------                   t63 =    LCL_ADDR  int    V13 tmp9         [+0]
N001 (???,???) [000061] H----------                   t61 =    CNS_INT(h) int    0x4200E0 ftn
                                                            /--*  t61    int
N002 (???,???) [000062] n---G------                   t62 = *  IND       int
                                                            /--*  t63    int
                                                            +--*  t62    int
               [000117] sA--G------                         *  STOREIND  int
               [000065] -----------                   t65 =    LCL_VAR   int    V13 tmp9
               [000064] -----------                   t64 =    LCL_VAR   int    V13 tmp9
                                                            /--*  t64    int
               [000066] ---XG------                   t66 = *  IND       int
                                                            /--*  t30    int    wasm sp $0
                                                            +--*  t101   ref    arg1 $1
                                                            +--*  t102   ref    arg2 $2
                                                            +--*  t65    int    wasm pep $3
                                                            +--*  t66    int    control expr
N008 (???,???) [000015] -ACXG+-----                   t15 = *  CALL r2r_ind ref    Program:Glue(System.String,System.String):System.String

It's possible that in release optimized builds the scenarios where we need to spill are really rare. Because we need these spills for correctness, we can't rely on Release mode behavior though, unless we could turn this specific optimization (always using temp locals) on for all configurations.

Comment thread src/coreclr/jit/fgwasm.cpp Outdated
@kg kg force-pushed the wasm-spill-refs branch from b614c8f to 9a2f3da Compare June 7, 2026 17:07
Copilot AI review requested due to automatic review settings June 8, 2026 18:24
@kg
Copy link
Copy Markdown
Member Author

kg commented Jun 8, 2026

Still working on this, going to try moving to STORE_LCL_VAR/LCL_VAR like Jakob suggested.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment on lines +1796 to +1800
varDsc->lvType = TYP_BYREF;
varDsc->lvPinned = true;
varDsc->lvImplicitlyReferenced = true;
varDsc->lvMustInit = true;
lvaSetVarDoNotEnregister(varNum, DoNotEnregisterReason::WasmGCVisibility);
Comment on lines +1702 to +1713
GenTreeUnOp* spill = gtNewOperNode(GT_WASM_SPILL_REF, def->TypeGet(), def);
LIR::Use use;
noway_assert(LIR::AsRange(block).TryGetUse(def, &use));
use.ReplaceWith(spill);
LIR::AsRange(block).InsertAfter(def, spill);
if (def->gtLIRFlags & LIR::Flags::MultiplyUsed)
{
JITDUMP("Transferring multiply-used flag from [%06u] to [%06u] for spill\n", Compiler::dspTreeID(def), Compiler::dspTreeID(spill));
def->gtLIRFlags &= ~LIR::Flags::MultiplyUsed;
spill->gtLIRFlags |= LIR::Flags::MultiplyUsed;
}
anyChanges = true;
Comment on lines +1780 to +1784
const unsigned varNum = lvaGrabTemp(false DEBUGARG("WasmSpillRefs splash zone"));
LclVarDsc* const varDsc = lvaGetDesc(varNum);
// HACK: Make this TYP_I_IMPL because if we make it a REF or BYREF that may block enregistration
varDsc->lvType = TYP_I_IMPL;
varDsc->lvHasExplicitInit = true;
Comment on lines +961 to +966
GetEmitter()->emitIns_I(INS_local_get, EA_PTRSIZE, GetFramePointerRegIndex());
m_compiler->lvaFrameAddress(spillTargetVar, &FPBased);
// TODO-WASM: Emit this offset as the memarg of the store, below.
GetEmitter()->emitIns_S(INS_I_const, EA_PTRSIZE, spillTargetVar, 0);
GetEmitter()->emitIns(INS_I_add);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-wasm WebAssembly architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants