
MMLU-Pro incorrect prompt format #1265

Description

@RasmusHoier

Describe the bug

As far as I can tell, LightEval's MMLU-Pro task prompts models incorrectly.
MMLU-Pro questions have 10 options, but the prompt only lists four answer letters (link):

Answer the following multiple choice question. The last line of your response should be of the following format: 'Answer: $LETTER' (without quotes) where LETTER is one of ABCD. Think step by step before answering.
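To make the mismatch concrete, here is a minimal sketch of building the instruction so that the allowed letters come from the number of options instead of being hardcoded as ABCD. It assumes the letter set is simply the first N uppercase letters, where N is the number of options. `build_instruction` is a hypothetical helper, not part of LightEval's API. The instruction wording is copied from the prompt quoted above.

```python
import string


def build_instruction(num_choices: int) -> str:
    """Hypothetical helper (not LightEval's API): returns the quoted
    instruction with the letter list derived from the option count."""
    if not 1 <= num_choices <= 26:
        raise ValueError("num_choices must be between 1 and 26")
    # First N uppercase letters, e.g. 10 options -> "ABCDEFGHIJ".
    letters = string.ascii_uppercase[:num_choices]
    return (
        "Answer the following multiple choice question. The last line of your "
        "response should be of the following format: 'Answer: $LETTER' (without "
        f"quotes) where LETTER is one of {letters}. Think step by step before answering."
    )


if __name__ == "__main__":
    print(build_instruction(10))
```

For a 10-option MMLU-Pro question, `build_instruction(10)` lists `ABCDEFGHIJ` instead of `ABCD`, so the instruction no longer contradicts the options the model actually sees.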

NOTE: It would also be worth checking that the format of the few-shot examples is consistent with the official harness. The pipeline is a bit hard to trace, but it looks to me like the few-shot formatting also differs from how the official harness does it.

To Reproduce

I don't have a clean MWE for this. I was benchmarking DeepSeek-V4-Flash served via vLLM with both the official TIGER AI Lab harness and LightEval, and with reasoning disabled the LightEval run scored 15% lower than the official harness. Interestingly, with reasoning enabled the model seems able to cope with the quirks of the LightEval implementation.

Expected behavior

The model should receive the same prompts in LightEval as in the original TIGER AI Lab harness. As it stands, the task is harder in LightEval than in the official harness.

Version info

I used LightEval v0.13.
