Conversation
…versions (except p3.6)
```python
    lang_probability = float(language_detection_object[1])
    return (lang_id, lang_probability)

def _cld_detection(self, doc: AnyStr) -> (AnyStr, float):
```

```python
nlp = spacy.blank(language)  # spaCy language without models (https://spacy.io/usage/models)
# spacy 3.x requires explicit lemmatizer component for blank languages
# Not all languages have lookup data, so we wrap in try/except
if spacy.about.__version__.startswith("3"):
```
Core change for spacy
nicolasdalsass left a comment
Python 3.6 needs fixing
Great job on resurrecting unit tests with a proper setup for both local runs and GitHub Actions 👍
```python
# spacy 3.x requires explicit lemmatizer component for blank languages
# Not all languages have lookup data, so we wrap in try/except
if spacy.about.__version__.startswith("3"):
    try:
        nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
        nlp.initialize()
    except Exception:
        # Language doesn't support lookup lemmatization, continue without it
        if "lemmatizer" in nlp.pipe_names:
            nlp.remove_pipe("lemmatizer")
```
I don't understand the logic behind "spacy requires a lemmatizer, but on the other hand, we can just remove it if something goes wrong"?
So it's a bit tricky and was raised by the unit tests.
- spacy 2.x: Lemmatization was automatically built into the tokenizer
- spacy 3.x: Lemmatization requires an explicit lemmatizer pipe
This is only required for blank languages (languages without a specific pre-trained model that we can load), or when we set use_models=False to keep memory usage light.
We use lookup mode so spaCy can automatically find lemmatization data for the language; however, not all languages have lookup data, and when they don't an exception is raised, so we fall back to no lemmatizer.
```diff
 spacy[lookups,ja,th]==3.8.11; python_version >= '3.10'
 symspellpy==6.7.0
-tqdm==4.60.0
+tqdm==4.66.3
```
The wheel doesn't install for Python 3.6. Let's add a conditional requirement so that it actually installs properly on 3.6, since we keep supporting it.
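A conditional pin could use an environment marker, mirroring the style of the existing `spacy` line in this requirements file. The version split below is a sketch of the pattern only; the exact boundary and versions chosen in the PR are not shown here:

```
tqdm==4.60.0; python_version < '3.7'
tqdm==4.66.3; python_version >= '3.7'
```

pip evaluates the marker per interpreter, so 3.6 environments resolve the older pin while newer interpreters get the updated wheel.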
```
============================= 40 passed in 41.93s ==============================
[DONE] Python 3.6 tests completed
```
Good catch
Add Python 3.10 to 3.13 support by introducing version-conditional dependencies for packages that have breaking changes or lack support across Python versions
Changes
- Replaced `pycld3` with `pycld2` for language detection on Python >= 3.10
- `pycld3` language ID mapping
- `spacy` 3.x support: `spacy` 3.X for Python >= 3.10
- `iteritems()` → `items()`
- `np.NaN` → `np.nan`
- `make` command to execute unit tests within debian containers across Python 3.6-3.13

Local unit tests
Run unit tests on all supported python versions
Execute unit tests on a single python version
Verification
`pycld3` and `pycld2` are only used for text longer than 140 chars; otherwise the existing code using `langid` is executed.
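The length-based routing described above can be sketched roughly as follows. The 140-character threshold comes from this PR; the detector bodies are illustrative stubs, not the real `pycld2`/`langid` calls, and the function names are hypothetical:

```python
# Sketch of length-based language-detection dispatch, as described above.
# Assumption: the two detector functions below are stand-ins for the
# real pycld2/pycld3 and langid code paths.

CLD_MIN_LENGTH = 140  # threshold mentioned in the PR

def _cld_detection(doc: str) -> tuple:
    # Stand-in for the CLD path used on long documents
    return ("en", 0.99)

def _langid_detection(doc: str) -> tuple:
    # Stand-in for the existing langid path used on short documents
    return ("en", 0.75)

def detect_language(doc: str) -> tuple:
    """Route long texts to the CLD detector, short ones to langid."""
    if len(doc) > CLD_MIN_LENGTH:
        return _cld_detection(doc)
    return _langid_detection(doc)
```

This keeps the heavier CLD dependency off the hot path for short strings, where `langid` already performs acceptably.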