ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

This repository contains the implementation of ReNIO.

Installation

conda env create -f environment.yml
conda activate opsd

pip install flash-attn==2.8.3 --no-build-isolation

Data

For the math task, the data can be downloaded from here. For the coding task, the data can be downloaded from here; we sample 30k code-domain examples from it.

Please put the training data in data/.
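Before launching training, you can quickly confirm that a data file is where the scripts expect it. Below is a minimal sketch that assumes each training file is JSONL with one example per line, as the `.jsonl` extension of the coding-task path suggests. `check_jsonl` is a hypothetical helper and is not part of this repository. The math-task file name is not given here, so pass that path as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report how many non-empty
# lines a JSONL training file contains, assuming one training example per line.
set -euo pipefail

check_jsonl() {
  local path="$1"
  if [[ ! -f "${path}" ]]; then
    echo "missing: ${path}" >&2
    return 1
  fi
  local n
  # Count lines that are not empty or whitespace-only. grep -c still prints 0
  # when nothing matches but exits non-zero, hence "|| true".
  n="$(grep -c -v '^[[:space:]]*$' "${path}" || true)"
  echo "${path}: ${n} records"
}

# Default to the coding-task file used in the training example below;
# pass another path (e.g. your math-task file) as the first argument.
check_jsonl "${1:-data/openthoughts/openthoughts_coding_30k.jsonl}" || true
```

The coding data is a 30k-example sample, so that file should report roughly 30k records.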

Overview

Training

We provide training shell scripts in scripts/. Before using them, set model_name_or_path to your local model path.

GRPO

See scripts/run_grpo.sh.

OPD

See scripts/run_opd_1b.sh.

OPSD

See scripts/run_opsd_1b.sh, scripts/run_opsd_4b.sh, and scripts/run_opsd_8b.sh.

To enable ReNIO, run

CLIP=2.5 \
IMP=0.8 \
RENIO=True \
bash scripts/run_opsd_1b.sh

for math-task OPSD training on Qwen3-1.7B. Use

DATA="data/openthoughts/openthoughts_coding_30k.jsonl" \
TASK="coding" \
CLIP=2.5 \
IMP=0.8 \
RENIO=True \
bash scripts/run_opsd_1b.sh

for coding-task training.

Here, RENIO=True enables ReNIO during training. CLIP sets the clip range for the student-teacher log ratio, and IMP sets the threshold for key-token selection.
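Because these settings are passed as environment variables, comparing several of them is easy to script. Below is a minimal sketch that assumes scripts/run_opsd_1b.sh reads CLIP, IMP, and RENIO from the environment, as in the invocations above. The sweep is not part of the repository. The grid values are illustrative and were chosen around the example settings (CLIP=2.5, IMP=0.8); they are not tuned recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical sweep over ReNIO's CLIP / IMP settings for math-task OPSD training.
# Assumes scripts/run_opsd_1b.sh picks up CLIP, IMP and RENIO from the environment.
set -euo pipefail

# Illustrative grids around the example values (CLIP=2.5, IMP=0.8); not tuned.
CLIP_VALUES=(2.0 2.5 3.0)
IMP_VALUES=(0.7 0.8 0.9)

# Print each configuration; launch it only when DRY_RUN is set to something other than 1.
run_sweep() {
  local clip imp
  for clip in "${CLIP_VALUES[@]}"; do
    for imp in "${IMP_VALUES[@]}"; do
      echo "CLIP=${clip} IMP=${imp} RENIO=True bash scripts/run_opsd_1b.sh"
      if [[ "${DRY_RUN:-1}" != "1" ]]; then
        # The VAR=value prefix scopes each setting to this single run.
        CLIP="${clip}" IMP="${imp}" RENIO=True bash scripts/run_opsd_1b.sh
      fi
    done
  done
}

run_sweep
```

By default the script only prints the nine commands. Set DRY_RUN=0 to run them one after another. Because of `set -e`, a failed run stops the sweep. For the coding task, set DATA and TASK when invoking the sweep, for example `DATA=... TASK=coding DRY_RUN=0 bash sweep.sh` (the file name is hypothetical). Each training run inherits those variables from the sweep's environment.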

Evaluation

Math Task

See eval/run_eval.sh.

Coding Task

See eval/run_eval_code.sh.

Acknowledgements

Our implementation builds on OPSD.
