Labels and Models not harmonized between languages #13908
Unanswered
cmahnke asked this question in Help: Model Advice
Replies: 0 comments
I'm trying to use spaCy to extract entities from a German text and its English translation to compare the results. This is surprisingly hard to do, since it appears that neither the models nor the labels are harmonized across languages.
Models:
These are offered at [Trained Models & Pipelines](https://spacy.io/models):
- `en_core_web_sm`
- `en_core_web_md`
- `en_core_web_lg`
- `en_core_web_trf`

But there are also models like `en_core_news_trf` that are downloadable; I'm not sure whether they can be considered recent.

- `de_core_news_sm`
- `de_core_news_md`
- `de_core_news_lg`
- `de_dep_news_trf`

When I use the install assistant to select models, I get this combination (optimizing for accuracy):
From my point of view, it's reasonable to expect the models in this combination to behave similarly. But they don't:
- `de_dep_news_trf` doesn't work at all.
- `de_core_news_lg` works, but has an annoying habit of using different labels than the English model, such as `PER` instead of `PERSON`, and a single `MISC` label instead of the multiple more granular ones from the English model.

In case the main issue hasn't been made clear already:
One of the consequences is that in German many things are dumped into one category/label, which makes spaCy (or at least `de_core_news_lg`) almost unusable compared to the English model:

Ideally things should be harmonized a bit more, for example by having similar models use the same labels.
Why is that the case? Should it be considered a bug?
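
To make the comparison concrete: one workaround would be to collapse the English labels onto the coarser German set before comparing. The following is only a minimal sketch of that idea. The mapping table `EN_TO_DE` and the helper names are hypothetical and reflect my own guess at which English labels correspond to `PER`, `LOC`, `ORG` and `MISC`; numeric and temporal labels such as `DATE` or `MONEY` are assumed to have no German counterpart and are dropped.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping (an assumption, not part of spaCy): collapse the
# fine-grained English labels onto the four labels the German model emits.
EN_TO_DE = {
    "PERSON": "PER",
    "ORG": "ORG",
    "GPE": "LOC",
    "LOC": "LOC",
    "FAC": "LOC",
    "NORP": "MISC",
    "EVENT": "MISC",
    "WORK_OF_ART": "MISC",
    "LANGUAGE": "MISC",
    "LAW": "MISC",
    "PRODUCT": "MISC",
}


def harmonize_label(en_label: str) -> Optional[str]:
    """Return the assumed German-style label, or None if there is no counterpart."""
    return EN_TO_DE.get(en_label)


def harmonize_entities(entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (text, label) pairs to the coarse label set, dropping unmapped labels."""
    harmonized = []
    for text, label in entities:
        mapped = harmonize_label(label)
        if mapped is not None:
            harmonized.append((text, mapped))
    return harmonized
```

With spaCy, the input pairs could be built as `[(ent.text, ent.label_) for ent in doc.ents]` from the English `doc`. This only papers over the difference in one direction: the granularity lost in the German `MISC` label cannot be recovered this way.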