Description
Describe the issue as clearly as possible:
When the guide's current state is the final state, an error is raised when calling guide.advance with the eos token as an argument. This stands in contradiction to the transitions map that always indicates that the eos token can be consumed when in the final state (it leads to the same current state).
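The expectation can be sketched with a plain dictionary lookup. This is only an illustrative model, not the outlines-core implementation: the transitions are copied from the Index printed in this report, and the `advance` helper is hypothetical.

```python
# Transitions copied from the Index printed in this report.
EOS = 50256
TRANSITIONS = {
    24: {1157: 12, 16243: 16, 16: 28},  # initial state, EOS not allowed
    28: {1157: 16, 16: 12, EOS: 28},
    12: {16: 16, EOS: 12},
    16: {EOS: 16},
}


def advance(state: int, token_id: int) -> int:
    """Hypothetical lookup-based advance: follow the map or raise."""
    try:
        return TRANSITIONS[state][token_id]
    except KeyError:
        raise ValueError(
            f"No next state found for the current state: {state} "
            f"with token ID: {token_id}"
        ) from None


state = 24
for _ in range(3):
    state = advance(state, 16)  # 24 -> 28 -> 12 -> 16
final_state = state

# According to the map, EOS in state 16 loops back to state 16.
after_eos = advance(final_state, EOS)
print(final_state, after_eos)
```

Under this model, consuming the eos token in state 16 is a valid self-loop, which is the behavior the report expects from guide.advance.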
This is a serious problem when doing batch generation with a regex that contains several final states (for instance [1]{1,3}). In this case, if one of the generations reaches an early final state, the logits processor stops advancing through the guide (to avoid the bug), but the model may keep generating tokens that would lead to a later final state. Since the current state is no longer updated, the generated string is typically too long and does not respect the regex.
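A simplified simulation of that failure mode, assuming the workaround amounts to freezing the guide state once any final state is reached. The `run` helper and the `TOKEN_TEXT` mapping are hypothetical; the token strings are inferred from the transitions printed in this report, not read from the tokenizer.

```python
import re

EOS = 50256
TRANSITIONS = {
    24: {1157: 12, 16243: 16, 16: 28},
    28: {1157: 16, 16: 12, EOS: 28},
    12: {16: 16, EOS: 12},
    16: {EOS: 16},
}
# Assumption: token strings inferred from the transitions above.
TOKEN_TEXT = {16: "1", 1157: "11", 16243: "111"}
# A state is treated as final when the map lets it consume EOS.
FINAL_STATES = {s for s, t in TRANSITIONS.items() if EOS in t}


def run(tokens, freeze_on_final):
    """Feed tokens through the mask; return (text, accepted)."""
    state, frozen, text = 24, False, ""
    for tok in tokens:
        if tok not in TRANSITIONS[state]:
            return text, False  # the mask for this state rejects the token
        if tok == EOS:
            break
        text += TOKEN_TEXT[tok]
        if not frozen:
            state = TRANSITIONS[state][tok]
        if freeze_on_final and state in FINAL_STATES:
            frozen = True  # workaround: stop advancing the guide
    return text, True


proposed = [16, 16, 16, 16, EOS]  # the model tries to emit "1111"
frozen_text, frozen_ok = run(proposed, freeze_on_final=True)
tracked_text, tracked_ok = run(proposed, freeze_on_final=False)

print(frozen_text, frozen_ok, re.fullmatch(r"[1]{1,3}", frozen_text))
print(tracked_text, tracked_ok)
```

With the state frozen at the first final state, the mask keeps allowing "1" and the fourth token is accepted, so the output no longer matches the regex. When the state keeps advancing, the fourth "1" is rejected once the last final state is reached.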
Steps/code to reproduce the bug:
import outlines
import transformers
from outlines.types import Regex
from outlines_core import Guide
model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini"),
    transformers.AutoTokenizer.from_pretrained("erwanf/gpt2-mini"),
)
generator = outlines.Generator(model, Regex(r"[1]{1,3}"))
index = generator.logits_processor.index
print(index)
guide = Guide(index)
for i in range(3):
    guide.advance(token_id=16, return_tokens=False)
guide.advance(token_id=50256, return_tokens=False)
Expected result:
No error raised
Error message:
Index object with transitions:
16 -> {
50256: 16,
}
12 -> {
16: 16,
50256: 12,
}
28 -> {
1157: 16,
16: 12,
50256: 28,
}
24 -> {
1157: 12,
16243: 16,
16: 28,
}
Traceback (most recent call last):
  File "/home/robinpicard/outlines/.idea/dsl.py", line 34, in <module>
    guide.advance(token_id=50256, return_tokens=False)
ValueError: No next state found for the current state: 16 with token ID: 50256
Outlines/Python version information:
outlines-core 0.2.11
python 3.12
Context for the issue:
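This makes the integration of outlines-core in outlines buggy: generation constrained by a regex with several final states can return strings that do not respect the regex.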