Bug: error raised when trying to consume the eos token in final state #227

@RobinPicard

Describe the issue as clearly as possible:

When the guide's current state is the final state, an error is raised when calling guide.advance with the eos token as an argument. This stands in contradiction to the transitions map that always indicates that the eos token can be consumed when in the final state (it leads to the same current state).

This is a serious problem when doing batch generation with a regex that has several final states (for instance [1]{1,3}). In this case, if one of the generations reaches an early final state, the logits processor stops advancing through the guide (to avoid the bug), but the model may keep generating tokens that would lead to a later final state. Since the current state is no longer updated, the output is typically too long and does not respect the regex.
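The batch-generation failure described above can be illustrated without outlines-core at all. The following is a minimal sketch in plain Python, not the actual logits processor: the transitions are copied from the Index printed in the error output of this report, while `INITIAL_STATE`, the `accepts` helper, and the `freeze_on_final` flag are hypothetical names introduced only for illustration. It assumes that state 24 is the initial state (it is the only one without an EOS transition) and that a state is final when it has an EOS transition.

```python
EOS = 50256

# Transitions copied from the Index printed in this report.
TRANSITIONS = {
    16: {50256: 16},
    12: {16: 16, 50256: 12},
    28: {1157: 16, 16: 12, 50256: 28},
    24: {1157: 12, 16243: 16, 16: 28},
}
INITIAL_STATE = 24  # assumption: the only state without an EOS transition


def accepts(tokens, freeze_on_final):
    """Replay `tokens`, checking each one against the tracked state.

    With freeze_on_final=True the tracked state stops being updated once it
    is final, which mimics a processor that stops advancing the guide.
    """
    state = INITIAL_STATE
    for tok in tokens:
        if tok not in TRANSITIONS[state]:
            return False  # the mask for the tracked state forbids this token
        is_final = EOS in TRANSITIONS[state]
        if not (freeze_on_final and is_final):
            state = TRANSITIONS[state][tok]
    return True


five_ones = [16] * 5 + [EOS]
print(accepts(five_ones, freeze_on_final=False))  # False: the fourth 16 is rejected
print(accepts(five_ones, freeze_on_final=True))   # True: the stale state 28 keeps allowing 16
```

With the frozen state, the mask of state 28 is reused for every step, so a sequence of five tokens with ID 16 passes even though the regex allows at most three.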

Steps/code to reproduce the bug:

import outlines
import transformers
from outlines.types import Regex
from outlines_core import Guide

model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini"),
    transformers.AutoTokenizer.from_pretrained("erwanf/gpt2-mini"),
)

generator = outlines.Generator(model, Regex(r"[1]{1,3}"))
index = generator.logits_processor.index
print(index)
guide = Guide(index)

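# Advance three times with token 16; per the transitions printed below,
# this goes 24 -> 28 -> 12 -> 16.
for i in range(3):
    guide.advance(token_id=16, return_tokens=False)
# Consuming the eos token (50256) from final state 16 raises the ValueError.
guide.advance(token_id=50256, return_tokens=False)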

Expected result:

No error raised

Error message:

Index object with transitions:
16 -> {
    50256: 16,
}
12 -> {
    16: 16,
    50256: 12,
}
28 -> {
    1157: 16,
    16: 12,
    50256: 28,
}
24 -> {
    1157: 12,
    16243: 16,
    16: 28,
}

Traceback (most recent call last):
  File "/home/robinpicard/outlines/.idea/dsl.py", line 34, in <module>
    guide.advance(token_id=50256, return_tokens=False)
ValueError: No next state found for the current state: 16 with token ID: 50256
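For comparison, here is a small sketch of the behaviour the printed transitions map suggests, again in plain Python and independent of outlines-core. `ToyGuide` is a hypothetical stand-in, not the library's `Guide` class, and the choice of 24 as the initial state is an assumption based on it being the only state without an EOS transition. It simply follows the map, so the EOS token in a final state loops back to the same state.

```python
EOS = 50256

# Transitions copied from the Index printed above.
TRANSITIONS = {
    16: {50256: 16},
    12: {16: 16, 50256: 12},
    28: {1157: 16, 16: 12, 50256: 28},
    24: {1157: 12, 16243: 16, 16: 28},
}


class ToyGuide:
    """Hypothetical guide that advances strictly by table lookup."""

    def __init__(self, transitions, initial_state):
        self.transitions = transitions
        self.state = initial_state

    def allowed_tokens(self):
        # Token IDs that have a transition from the current state.
        return sorted(self.transitions[self.state])

    def advance(self, token_id):
        next_state = self.transitions[self.state].get(token_id)
        if next_state is None:
            raise ValueError(
                f"No next state found for the current state: {self.state} "
                f"with token ID: {token_id}"
            )
        self.state = next_state
        return self.state


guide = ToyGuide(TRANSITIONS, initial_state=24)  # assumption: 24 is initial
for _ in range(3):
    guide.advance(16)
guide.advance(EOS)  # follows the 50256: 16 self-loop, no error
print(guide.state, guide.allowed_tokens())  # 16 [50256]
```

If the real `Guide.advance` followed the map in the same way, the reproduction script would finish without raising.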

Outlines/Python version information:

outlines-core 0.2.11
python 3.12

Context for the issue:

This makes the integration of outlines-core in outlines buggy.
