[feat] support opd rl by hjh0119 · Pull Request #9641 · modelscope/ms-swift

hjh0119 · 2026-06-25T02:56:53Z

No description provided.

gemini-code-assist

Code Review

This pull request implements Megatron On-Policy Distillation as RL (OPD-RL) by integrating teacher KL as a GRPO advantage across local and Ray-based GKD and GRPO trainers. It also introduces OpenEnvScheduler and OpenEnvWrapper to support multi-turn rollouts in OpenEnv environments. The review feedback highlights several critical issues, including missing imports and potential AttributeErrors in gkd_helpers.py, a rank-guarding mismatch in teacher_mixin.py that could cause runtime failures, and synchronous blocking calls in OpenEnvScheduler that should be run in separate threads to avoid blocking the asyncio event loop. Additionally, defensive checks are recommended to prevent potential IndexError, StopIteration, and type promotion issues.
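The review's point about synchronous calls in OpenEnvScheduler blocking the asyncio event loop can be illustrated with a minimal sketch. Every name here is hypothetical and is not the PR's code. `BlockingEnv` stands in for an environment client whose `step()` blocks (for example, on a network request), and `rollout_many` is an illustrative driver. The technique is `asyncio.to_thread` (Python 3.9+), which runs each blocking call in a worker thread so the event loop can keep scheduling other rollouts.

```python
import asyncio
import time


class BlockingEnv:
    """Hypothetical stand-in for an environment client with a synchronous step()."""

    def __init__(self, env_id: int, latency: float = 0.05):
        self.env_id = env_id
        self.latency = latency

    def step(self, action: str) -> dict:
        time.sleep(self.latency)  # simulates a blocking network/IO call
        return {'env_id': self.env_id, 'observation': f'obs for {action}', 'done': True}


async def run_turn(env: BlockingEnv, action: str) -> dict:
    # Offload the blocking call to a worker thread so the event loop is not stalled.
    return await asyncio.to_thread(env.step, action)


async def rollout_many(n: int) -> list:
    envs = [BlockingEnv(i) for i in range(n)]
    # All turns are scheduled concurrently; gather() returns results in input order.
    return await asyncio.gather(*(run_turn(env, f'action-{env.env_id}') for env in envs))


if __name__ == '__main__':
    start = time.perf_counter()
    results = asyncio.run(rollout_many(8))
    elapsed = time.perf_counter() - start
    # Steps overlap in the default thread pool, so this is typically well under 8 * latency.
    print(len(results), f'{elapsed:.2f}s')
```

Calling `env.step(...)` directly inside the coroutine would serialize all eight steps and freeze every other task on the loop while each one sleeps; the thread offload keeps the scheduler's coroutines responsive without rewriting the environment client as async.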


gemini-code-assist · 2026-06-25T02:59:09Z

+            # Teacher returned logprobs for the full sequence (prompt + response + end tokens).
+            # Locate the response portion by matching token IDs from the end.
+            rti = response_token_ids[i]
+            if isinstance(rti[0], list):


If rti is empty (e.g., if the model generated an empty response), accessing rti[0] will raise an IndexError. We should check if rti is non-empty before accessing its first element.

Suggested change

-            if isinstance(rti[0], list):
+            if rti and isinstance(rti[0], list):
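A minimal sketch of the guarded lookup, under stated assumptions: `flatten_response_ids` and `locate_response` are hypothetical helpers, not functions from the PR. It assumes a nested `rti` means per-turn token lists that should be concatenated, and that the response is found by scanning the full teacher sequence from the end for the last contiguous match, which skips trailing end tokens and any identical run earlier in the prompt.

```python
from typing import List, Optional, Sequence, Union

TokenIds = Union[List[int], List[List[int]]]


def flatten_response_ids(rti: TokenIds) -> List[int]:
    """Flatten per-turn token lists into one list; check emptiness before rti[0]."""
    if rti and isinstance(rti[0], list):
        return [tok for turn in rti for tok in turn]
    return list(rti)


def locate_response(full_ids: Sequence[int], response_ids: Sequence[int]) -> Optional[int]:
    """Return the start index of the last occurrence of response_ids in full_ids.

    Returns None when the response is empty, longer than the full sequence,
    or not found, so callers can skip the sample instead of crashing.
    """
    n, m = len(full_ids), len(response_ids)
    if m == 0 or m > n:
        return None
    # Scan candidate start positions from the end of the sequence backwards.
    for start in range(n - m, -1, -1):
        if list(full_ids[start:start + m]) == list(response_ids):
            return start
    return None
```

With a start index in hand, the teacher's response logprobs would be `teacher_lps[start:start + len(response)]`; returning `None` for an empty response pushes the decision about how to handle it to the caller rather than raising `IndexError`.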

gemini-code-assist · 2026-06-25T02:59:09Z

+            tid, info = next(iter(pos_lp.items()))
+            lps.append([info['logprob']])
+            ixs.append([int(tid)])


If pos_lp is empty, next(iter(pos_lp.items())) will raise a StopIteration exception. We should handle the empty case defensively to avoid crashes.

Suggested change

-            tid, info = next(iter(pos_lp.items()))
-            lps.append([info['logprob']])
-            ixs.append([int(tid)])
+            if pos_lp:
+                tid, info = next(iter(pos_lp.items()))
+                lps.append([info['logprob']])
+                ixs.append([int(tid)])
+            else:
+                lps.append([0.0])
+                ixs.append([0])
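The suggested fix can be wrapped in a self-contained helper for testing. `top1_from_positions` is a hypothetical name, and the input shape is an assumption inferred from the hunk: one dict per position mapping a token id (possibly a string, e.g. after JSON round-tripping, hence `int(tid)`) to an info dict with a `'logprob'` key.

```python
from typing import Dict, List, Tuple, Union


def top1_from_positions(
    positions: List[Dict[Union[str, int], dict]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Collect the first (token_id, logprob) entry per position.

    Empty positions get a placeholder of logprob 0.0 and token id 0,
    mirroring the suggested change instead of raising StopIteration.
    """
    lps: List[List[float]] = []
    ixs: List[List[int]] = []
    for pos_lp in positions:
        if pos_lp:
            tid, info = next(iter(pos_lp.items()))
            lps.append([info['logprob']])
            ixs.append([int(tid)])
        else:
            lps.append([0.0])
            ixs.append([0])
    return lps, ixs
```

One design caveat: a logprob of 0.0 corresponds to probability 1, so placeholder positions are presumably best excluded through the completion mask rather than treated as real teacher signal.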

gemini-code-assist · 2026-06-25T02:59:09Z

+    """
+    d = teacher_per_token_logps - policy_per_token_logps
+    per_token = torch.exp(d) - d - 1
+    return per_token * completion_mask


To prevent potential type promotion or precision mismatch errors (especially when using bfloat16 or float16 for logprobs and bool/long for the completion mask), we should explicitly cast completion_mask to the dtype of d before multiplication.

Suggested change

-    return per_token * completion_mask
+    return per_token * completion_mask.to(d.dtype)
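The quantity in this hunk, with `d = teacher_per_token_logps - policy_per_token_logps`, has the form of the non-negative "k3" KL estimator. The short derivation that follows assumes the response tokens are sampled from the policy and that both distributions have full support over the vocabulary (true for softmax outputs). Under those assumptions its per-token expectation is the reverse KL from policy to teacher. Conditioning on the prefix is dropped inside the sums for brevity.

```latex
\begin{aligned}
d_t &= \log p_T(y_t \mid x, y_{<t}) - \log \pi_\theta(y_t \mid x, y_{<t}), \\
k_t &= e^{d_t} - d_t - 1 \;\ge\; 0 \qquad (\text{since } e^{u} \ge 1 + u), \\
\mathbb{E}_{y_t \sim \pi_\theta}\big[e^{d_t}\big] &= \sum_{y} \pi_\theta(y)\,\frac{p_T(y)}{\pi_\theta(y)} = 1, \\
\mathbb{E}_{y_t \sim \pi_\theta}\big[-d_t\big] &= \sum_{y} \pi_\theta(y)\,\log\frac{\pi_\theta(y)}{p_T(y)} = \mathrm{KL}\big(\pi_\theta \,\|\, p_T\big), \\
\Rightarrow\quad \mathbb{E}_{y_t \sim \pi_\theta}[k_t] &= 1 + \mathrm{KL}\big(\pi_\theta \,\|\, p_T\big) - 1 = \mathrm{KL}\big(\pi_\theta \,\|\, p_T\big).
\end{aligned}
```

Multiplying by `completion_mask` zeroes prompt and padding positions; the `.to(d.dtype)` cast makes the result dtype explicit rather than leaving it to PyTorch's type-promotion rules.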

hjh0119 added 2 commits June 23, 2026 21:53

init · ead42fc
support num_generation=1 · 7ad3148

gemini-code-assist Bot reviewed Jun 25, 2026


hjh0119 added 5 commits June 25, 2026 10:59

clean · b94f148
Merge branch 'main' into opd-rl · a8f112c
remove teacher mixin · 15d7963
update · cf4528f
Merge branch 'main' into opd-rl · 5f849bf

hjh0119 wants to merge 7 commits into modelscope:main from hjh0119:opd-rl
