[French morphologizer] Mislabeling of Mood=Imp|Number=Sing|Tense=Present #8131
Description
Mislabeling of Mood=Imp|Number=Sing|Tense=Present
Hello,
I am using spaCy in an ongoing project on French textbooks, and I have noticed that verbs in the imperative mood, singular number and present tense (e.g. "Mange ton repas", "Fais tes devoirs") are almost systematically misclassified as NOUN.
Am I the first to notice this, or is it a well-known bias of the French pretrained models?
After investigating the Sequoia dataset, I think this is because all of its imperative verbs are plural (e.g. "Mangez votre repas", "Faites vos devoirs").
Moreover, singular present imperatives are often homographs of nouns (e.g. "Forme", "Danse", "Place"), which makes the task even harder.
Anyway, I know this issue is probably specific to the kind of texts I'm working with, which rely heavily on singular present imperatives to give exercise instructions, and hence a difficult one to address in a pretrained model, but I thought it was worth mentioning.
Idea: one way to fix this could be data augmentation, generating singular present imperatives from the existing plural present imperative instances.
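One way to check the claim about Sequoia is to count imperative tokens by their `Number` feature in the CoNLL-U files. A minimal sketch, assuming the UD release of Sequoia with UD-style features (column 6 of each token line); the function name and the sample sentence below are made up for illustration, and no real counts are claimed here.

```python
from collections import Counter
from typing import Iterable


def count_imperatives(lines: Iterable[str]) -> Counter:
    """Count Mood=Imp tokens in CoNLL-U lines, keyed by their Number feature."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        if feats.get("Mood") == "Imp":
            counts[feats.get("Number", "?")] += 1
    return counts


# Made-up two-token sample in CoNLL-U format (not taken from Sequoia)
SAMPLE = """\
# text = Mangez .
1\tMangez\tmanger\tVERB\t_\tMood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""
print(count_imperatives(SAMPLE.splitlines()))  # → Counter({'Plur': 1})
```

On the real data, pass an open file handle instead, e.g. `count_imperatives(open("fr_sequoia-ud-train.conllu", encoding="utf-8"))` (hypothetical local path).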
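A minimal sketch of that augmentation idea, assuming each training token carries a lemma and UD-style features including `Person=2` (as in the UD version of Sequoia). It only handles regular first-group (-er) verbs plus a small hypothetical irregular table; stem-changing verbs such as "appeler" (→ "appelle") or "acheter" (→ "achète") would be mis-generated, other verb groups are skipped, and agreement elsewhere in the sentence (votre → ton/ta, vos → tes) is not attempted. All names here are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical lemma -> 2sg imperative table for a few frequent irregular verbs
IRREGULAR_2SG_IMP = {
    "aller": "va", "faire": "fais", "dire": "dis", "être": "sois",
    "avoir": "aie", "prendre": "prends", "lire": "lis", "écrire": "écris",
}


def parse_feats(feats: str) -> dict:
    """Parse a UD feature string like 'Mood=Imp|Number=Plur' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialise features back to a UD string, sorted by feature name."""
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items()))


def singularize_imperative(form: str, lemma: str, feats: str) -> Optional[Tuple[str, str]]:
    """Turn a 2pl present imperative into a 2sg one; return None if unsupported."""
    f = parse_feats(feats)
    if f.get("Mood") != "Imp" or f.get("Number") != "Plur" or f.get("Person") != "2":
        return None
    lemma = lemma.lower()
    if lemma in IRREGULAR_2SG_IMP:
        new_form = IRREGULAR_2SG_IMP[lemma]
    elif lemma.endswith("er"):
        # Regular first-group verb: manger -> mange, entourer -> entoure
        new_form = lemma[:-2] + "e"
    else:
        return None  # second/third-group verbs are too irregular for a suffix rule
    if form[:1].isupper():
        new_form = new_form[:1].upper() + new_form[1:]  # keep sentence-initial capital
    f["Number"] = "Sing"
    return new_form, format_feats(f)


print(singularize_imperative("Entourez", "entourer",
                             "Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin"))
# → ('Entoure', 'Mood=Imp|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin')
```

Returning `None` for anything outside the safe cases keeps the augmented data from introducing wrong forms, at the cost of covering fewer verbs.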
How to reproduce the behaviour
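```python
import spacy

# Small French pipeline (the version used here is listed below)
nlp = spacy.load("fr_core_news_sm")
# "Entoure" is a 2nd-person singular present imperative and should be tagged VERB
doc = nlp("Entoure les verbes au présent.")
for token in doc:
    print(token.text, token.pos_, token.morph)
```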
Info about spaCy
- spaCy version: 3.0.6
- Platform: Windows-10-10.0.19041-SP0
- Python version: 3.7.5
- Pipelines: fr_core_news_sm (3.0.0)