[French morphologizer] Mislabelisation of Mood=Imp|Number=Sing|Tense=Present #8131

@LedaguenelArthur

Mislabelisation of Mood=Imp|Number=Sing|Tense=Present

Hello,

I am using spaCy as part of an ongoing project on French textbooks, and I have noticed that verbs in the imperative mood, singular number and present tense (e.g. "Mange ton repas", "Fais tes devoirs") are almost systematically misclassified as NOUN.

Am I the first to notice this, or is it a well-known bias of the French pre-trained models?

After investigating the Sequoia dataset, I think this is because every example of an imperative verb there is given in the plural (e.g. "Mangez votre repas", "Faites vos devoirs").
Moreover, singular present imperative verbs are often homographs of nouns (e.g. "Forme", "Danse", "Place"), which makes the task even more difficult.

Anyway, I know this is probably specific to the kind of texts I'm working with, which rely heavily on singular present imperative verbs to express exercise instructions, and is therefore difficult to address in a pre-trained model, but I thought it was worth mentioning.

Idea: one way to fix this might be to augment the training data with singular forms derived from the existing plural present imperative instances.
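To make the augmentation idea more concrete, here is a rough sketch of what I have in mind. It is only a toy under several assumptions: tokens are plain (form, UPOS, FEATS) tuples in the style of CoNLL-U, the plural-to-singular lexicon `SINGULAR_FORMS` is a hypothetical hand-written table (a real version would need a proper morphological lexicon), and sentences containing "vous", "votre" or "vos" are skipped because rewriting them requires agreement decisions that this sketch does not attempt. The function name `singularize_imperative` is mine, not part of spaCy or Sequoia.

```python
from typing import List, Optional, Tuple

# A token as (form, UPOS, FEATS), loosely following CoNLL-U columns.
Token = Tuple[str, str, str]

# Hypothetical toy lexicon: 2nd person plural imperative -> singular.
SINGULAR_FORMS = {"mangez": "mange", "faites": "fais", "entourez": "entoure"}

# Words that would also need rewriting (and agreement); such sentences are skipped.
SECOND_PLURAL_WORDS = {"vous", "votre", "vos"}


def singularize_imperative(tokens: List[Token]) -> Optional[List[Token]]:
    """Return a singular-imperative copy of the sentence, or None if unsupported."""
    if any(form.lower() in SECOND_PLURAL_WORDS for form, _, _ in tokens):
        return None

    augmented = []
    changed = False
    for form, upos, feats in tokens:
        is_plural_imperative = (
            upos == "VERB" and "Mood=Imp" in feats and "Number=Plur" in feats
        )
        singular = SINGULAR_FORMS.get(form.lower())
        if is_plural_imperative and singular is not None:
            # Keep the sentence-initial capital letter if there was one.
            if form[0].isupper():
                singular = singular.capitalize()
            new_feats = feats.replace("Number=Plur", "Number=Sing")
            augmented.append((singular, upos, new_feats))
            changed = True
        else:
            augmented.append((form, upos, feats))

    # Only return a new sentence if a verb was actually rewritten.
    return augmented if changed else None
```

For example, a sentence starting with ("Entourez", "VERB", "Mood=Imp|Number=Plur|...") would come back starting with "Entoure" and `Number=Sing`, while "Faites vos devoirs" would be skipped because of "vos".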

How to reproduce the behaviour

import spacy
nlp = spacy.load("fr_core_news_sm")
doc = nlp("Entoure les verbes au présent.")

for token in doc:
    print(token.text, token.pos_, token.morph)
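To check more than one sentence at a time, a small helper along these lines could count which coarse POS tag the first token of each sentence receives. This is just a sketch: `first_token_pos_counts` is a name I made up, and it assumes that `nlp` is any callable returning a spaCy-like `Doc` whose tokens expose `pos_`.

```python
from collections import Counter
from typing import Callable, Iterable


def first_token_pos_counts(nlp: Callable, sentences: Iterable[str]) -> Counter:
    """Count the coarse POS tag assigned to the first token of each sentence."""
    counts = Counter()
    for sentence in sentences:
        doc = nlp(sentence)
        if len(doc) > 0:
            counts[doc[0].pos_] += 1
    return counts


# Possible usage with the pipeline from this report (requires the model):
# import spacy
# nlp = spacy.load("fr_core_news_sm")
# sentences = ["Mange ton repas.", "Fais tes devoirs.", "Entoure les verbes au présent."]
# print(first_token_pos_counts(nlp, sentences))
```

A high NOUN count on a list of singular imperative instructions would point to the behaviour described above.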

Info about spaCy

  • spaCy version: 3.0.6
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.7.5
  • Pipelines: fr_core_news_sm (3.0.0)

Metadata

    Labels

    enhancement (Feature requests and improvements), lang / fr (French language data and models)
