[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059
[Wasm RyuJIT] Spill live ref/byref values to pinned stack slots at calls#129059kg wants to merge 5 commits into
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR introduces a Wasm-specific mechanism intended to make ref/byref values GC-visible across call sites by injecting a new IR node (GT_WASM_SPILL_REF) and a new Wasm phase (WasmSpillRefs) that inserts these nodes and allocates pinned stack spill slots used during Wasm codegen.
Changes:
- Add
GT_WASM_SPILL_REFnode kind and operand iteration support. - Add
Compiler::WasmSpillRefsphase to insert spill nodes around calls and allocate spill locals. - Extend Wasm codegen/regalloc to track a spill index and (temporarily) force-enregister a scratch “splash zone” local.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/regallocwasm.cpp | Forces the “splash zone” local to be treated as a reg candidate. |
| src/coreclr/jit/gtlist.h | Adds new Wasm node WASM_SPILL_REF. |
| src/coreclr/jit/gentree.cpp | Updates operand-edge iterator to treat GT_WASM_SPILL_REF as unary. |
| src/coreclr/jit/fgwasm.cpp | Implements Compiler::WasmSpillRefs to insert spill nodes and allocate spill locals. |
| src/coreclr/jit/compphases.h | Adds PHASE_WASM_SPILL_REFS. |
| src/coreclr/jit/compmemkind.h | Adds WasmSpillRefs memory kind. |
| src/coreclr/jit/compiler.h | Adds m_wasmSpillSlots field and WasmSpillRefs declaration. |
| src/coreclr/jit/compiler.cpp | Wires WasmSpillRefs into the Wasm compilation pipeline. |
| src/coreclr/jit/codegenwasm.cpp | Emits Wasm for GT_WASM_SPILL_REF and resets spill index at calls. |
| src/coreclr/jit/codegenlinear.cpp | Resets spill index at block boundaries on Wasm. |
| src/coreclr/jit/codegen.h | Adds wasmSpillRefIndex state. |
| varDsc->lvHasExplicitInit = true; | ||
| varDsc->lvImplicitlyReferenced = true; | ||
| // If we don't make this var tracked, regalloc will crash when allocating a register for it | ||
| varDsc->lvTracked = true; | ||
| m_wasmSpillSlots->at(0) = varNum; |
| // HACK: Ensure that we always enregister the splash zone, even if we are not enregistering other locals | ||
| if (m_compiler->m_wasmSpillSlots && m_compiler->m_wasmSpillSlots->size() && m_compiler->m_wasmSpillSlots->at(0) == lclNum) | ||
| { | ||
| varIsRegCandidate = true; | ||
| } |
AndyAyersMS
left a comment
There was a problem hiding this comment.
Generally looks good.
I'm curious how this intersects/overlaps with LSRA's spill temp mechanism. Not saying we should use that here, but it likely serves a similar purpose.
| if (!op->TypeIs(TYP_REF, TYP_BYREF)) | ||
| return GenTree::VisitResult::Continue; | ||
|
|
||
| for (size_t i = defs.size(); i > 0; i--) |
There was a problem hiding this comment.
Probably worth commenting what this is doing (removing active defs once we find their use, and keeping the defs collection compact).
|
|
||
| GTNODE(WASM_JEXCEPT , GenTree ,0,0,GTK_LEAF|GTK_NOVALUE|DBK_NOTHIR) // Special jump for Wasm exception handling | ||
| GTNODE(WASM_THROW_REF , GenTree ,0,0,GTK_LEAF|GTK_NOVALUE|DBK_NOTHIR) // Wasm rethrow host exception (exception is an implicit operand) | ||
| GTNODE(WASM_SPILL_REF , GenTreeOp ,0,0,GTK_UNOP|DBK_NOTHIR) |
There was a problem hiding this comment.
Why a new node instead of just using normal stores?
There was a problem hiding this comment.
STORE_LCL_VAR doesn't return a value, does it? For this to work, the value has to 'flow through' the spill into the call consuming it
| varIsRegCandidate = false; | ||
| } | ||
|
|
||
| // HACK: Ensure that we always enregister the splash zone, even if we are not enregistering other locals |
There was a problem hiding this comment.
Why do these need to be enregistered?
There was a problem hiding this comment.
We would have to change the shape of the generated code to have the splash zone live in memory, and it would be much worse CQ
There was a problem hiding this comment.
Not sure I understand – if we were doing similar transformation in any other place in the JIT we would just create new locals and store to them. Like gtSplitTree or async do already. What makes that less desirable and why can't it be handled as well as this can automatically by the wasm backend?
There was a problem hiding this comment.
For context: https://gist.github.com/kg/cba44f4907f5320966058446fa01c25f
Essentially, the stack shape required to do a memory store on wasm is
[dst, value]
so if I have a list of call args like
[a, b, c, d]
I can't insert a node after b that does a memory store for a spill without first inserting a LCL_ADDR node before b.
If I have a local that's guaranteed to be enregistered, I can insert a tee.local opcode inline that will make a copy of b, and then I can do anything I want as long as I leave b on the stack afterwards (which is easy thanks to the local). By default in debug we don't enregister anything, so I need this hack to ensure I have the ability to tee.local.
Let me know if that doesn't make sense and I can explain in more detail.
There was a problem hiding this comment.
The JIT is quite aggressive in rewriting nested calls to use locals. For your example C# code from the gist I see this rewrite happen in morph:
[000011] -ACXG+----- └──▌ CALL ref Program:SpillTest(System.String,int):System.String
[000020] DACXG+----- arg1 setup ├──▌ STORE_LCL_VAR ref V04 tmp1
[000007] -ACXG+----- │ └──▌ CALL ref Program:Glue(System.String,System.String):System.String
[000018] DACXG+----- arg2 setup │ ├──▌ STORE_LCL_VAR ref V03 loc0
[000006] --CXG+----- │ │ └──▌ CALL ref System.Int32:ToString():System.String:this
[000017] -----+----- wasm sp $0 │ │ ├──▌ LCL_VAR int V00 SP
[000005] -----+----- this $1 │ │ └──▌ LCL_ADDR int V02 arg1 [+0]
[000016] -----+----- wasm sp $0 │ ├──▌ LCL_VAR int V00 SP
[000004] -----+----- arg1 $1 │ ├──▌ LCL_VAR ref V01 arg0
[000019] -----+----- arg2 $2 │ └──▌ LCL_VAR ref V03 loc0
[000015] -----+----- wasm sp $0 ├──▌ LCL_VAR int V00 SP
[000021] -----+----- arg1 $1 ├──▌ LCL_VAR ref V04 tmp1
[000010] ----G+----- arg2 $2 └──▌ SUB int
[000008] ----G+----- ├──▌ LCL_VAR int (AX) V02 arg1
[000009] -----+----- └──▌ CNS_INT int 1I would not expect any LIR edges need to be live across a call here, so I would not expect the handling in the PR to even do anything. Am I wrong?
How does the WASM backend handle this actual case with locals live across the calls instead of LIR edges? The gist mentions that the design with the splash zone is for code size reasons, but I think LIR edges live across a call are going to be a drop in the bucket compared to more usual use of locals.
I am not a fan of the handling for LIR edges deviating from the general handling of locals if it can be avoided. That introduces spurious regressions when you make innocent changes in other phases in the JIT (introduce a local for something unrelated and suddenly you have a regression because the backend does not handle it well). I would much rather see something like this rewritten to use locals and then having the backend optimize the use of short-lived locals to use pinning if that is a valuable code size reduction.
There was a problem hiding this comment.
For your scenario with locals live across the calls, the locals will live in linear memory on the stack since as GC refs we can't persistently home them in Wasm locals, I think. So they won't need spilling.
For the test I'm playing with here, in a debug build I see stuff like this:
N003 (???,???) [000012] --CXG+----- t12 = * CALL r2r_ind ref System.Int32:ToString():System.String:this
/--* t32 int
+--* t12 ref
[000116] sACXG------ * STOREIND ref
N009 (???,???) [000034] DACXG+----- t34 = LCL_ADDR int V07 tmp3 [+0]
N005 (???,???) [000030] -----+----- t30 = PHYSREG int $0
N006 (???,???) [000014] -----+----- t14 = LCL_VAR ref V05 tmp1
/--* t14 ref
[000101] ----------- t101 = * WASM_SPILL_REF ref
N007 (???,???) [000033] -----+----- t33 = LCL_VAR ref V06 tmp2
/--* t33 ref
[000102] ----------- t102 = * WASM_SPILL_REF ref
[000063] DA--G------ t63 = LCL_ADDR int V13 tmp9 [+0]
N001 (???,???) [000061] H---------- t61 = CNS_INT(h) int 0x4200E0 ftn
/--* t61 int
N002 (???,???) [000062] n---G------ t62 = * IND int
/--* t63 int
+--* t62 int
[000117] sA--G------ * STOREIND int
[000065] ----------- t65 = LCL_VAR int V13 tmp9
[000064] ----------- t64 = LCL_VAR int V13 tmp9
/--* t64 int
[000066] ---XG------ t66 = * IND int
/--* t30 int wasm sp $0
+--* t101 ref arg1 $1
+--* t102 ref arg2 $2
+--* t65 int wasm pep $3
+--* t66 int control expr
N008 (???,???) [000015] -ACXG+----- t15 = * CALL r2r_ind ref Program:Glue(System.String,System.String):System.String
It's possible that in release optimized builds the scenarios where we need to spill are really rare. Because we need these spills for correctness, we can't rely on Release mode behavior though, unless we could turn this specific optimization (always using temp locals) on for all configurations.
|
Still working on this, going to try moving to STORE_LCL_VAR/LCL_VAR like Jakob suggested. |
| varDsc->lvType = TYP_BYREF; | ||
| varDsc->lvPinned = true; | ||
| varDsc->lvImplicitlyReferenced = true; | ||
| varDsc->lvMustInit = true; | ||
| lvaSetVarDoNotEnregister(varNum, DoNotEnregisterReason::WasmGCVisibility); |
| GenTreeUnOp* spill = gtNewOperNode(GT_WASM_SPILL_REF, def->TypeGet(), def); | ||
| LIR::Use use; | ||
| noway_assert(LIR::AsRange(block).TryGetUse(def, &use)); | ||
| use.ReplaceWith(spill); | ||
| LIR::AsRange(block).InsertAfter(def, spill); | ||
| if (def->gtLIRFlags & LIR::Flags::MultiplyUsed) | ||
| { | ||
| JITDUMP("Transferring multiply-used flag from [%06u] to [%06u] for spill\n", Compiler::dspTreeID(def), Compiler::dspTreeID(spill)); | ||
| def->gtLIRFlags &= ~LIR::Flags::MultiplyUsed; | ||
| spill->gtLIRFlags |= LIR::Flags::MultiplyUsed; | ||
| } | ||
| anyChanges = true; |
| const unsigned varNum = lvaGrabTemp(false DEBUGARG("WasmSpillRefs splash zone")); | ||
| LclVarDsc* const varDsc = lvaGetDesc(varNum); | ||
| // HACK: Make this TYP_I_IMPL because if we make it a REF or BYREF that may block enregistration | ||
| varDsc->lvType = TYP_I_IMPL; | ||
| varDsc->lvHasExplicitInit = true; |
| GetEmitter()->emitIns_I(INS_local_get, EA_PTRSIZE, GetFramePointerRegIndex()); | ||
| m_compiler->lvaFrameAddress(spillTargetVar, &FPBased); | ||
| // TODO-WASM: Emit this offset as the memarg of the store, below. | ||
| GetEmitter()->emitIns_S(INS_I_const, EA_PTRSIZE, spillTargetVar, 0); | ||
| GetEmitter()->emitIns(INS_I_add); | ||
|
|
No description provided.