Hello,
I am getting the following UnicodeDecodeError:
  File "/usr/src/app/server.py", line 188, in <module>
    application = make_app(args)
  File "/usr/src/app/server.py", line 166, in make_app
    worker_pool = initialize_workers(services)
  File "/usr/src/app/server.py", line 147, in initialize_workers
    worker_pool[lang_pair] = TranslatorInterface(
  File "/usr/src/app/server.py", line 17, in __init__
    self.contentprocessor = ContentProcessor(
  File "/usr/src/app/content_processor.py", line 18, in __init__
    self.bpe_source = BPE(BPEcodes)
  File "/usr/src/app/apply_bpe.py", line 37, in __init__
    firstline = codes.readline()
  File "/usr/local/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 54: invalid start byte
for the following models:
"it-en" : "https://object.pouta.csc.fi/OPUS-MT-models/it-en/opus-2019-12-18.zip" # SentencePiece
"ja-en" : "https://object.pouta.csc.fi/OPUS-MT-models/ja-en/opus-2019-12-18.zip" # SentencePiece
"id-en" : "https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.zip" # SentencePiece
"bn-en" : "https://object.pouta.csc.fi/OPUS-MT-models/bn-en/opus-2020-02-11.zip" # SentencePiece
"et-en" : "https://object.pouta.csc.fi/OPUS-MT-models/et-en/opus-2019-12-18.zip" # SentencePiece
"lv-en" : "https://object.pouta.csc.fi/OPUS-MT-models/lv-en/opus-2019-12-18.zip" # SentencePiece
"th-en" : "https://object.pouta.csc.fi/OPUS-MT-models/th-en/opus-2020-01-16.zip" # SentencePiece
"uk-en" : "https://object.pouta.csc.fi/OPUS-MT-models/uk-en/opus-2020-01-16.zip" # SentencePiece
For most of them (all except "lv-en"), the error goes away when I switch to the BPE model. However, according to the shared metrics, the SentencePiece models give better translation performance.
Please let me know if I am doing something wrong.
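In case it helps narrow this down: the traceback shows the failure inside `BPE(BPEcodes)`, on the first `readline()` of the codes file. One possible cause (an assumption, not something I have confirmed) is that the file passed as BPE codes for these models is not UTF-8 text, for example a binary SentencePiece model. The minimal sketch below checks whether the start of a file decodes as UTF-8. `looks_like_utf8_text` is a hypothetical helper, not part of the server code, and the file path is supplied on the command line.

```python
import codecs
import sys


def looks_like_utf8_text(path, probe_bytes=4096):
    """Return True if the first `probe_bytes` bytes of `path` decode as UTF-8.

    Hypothetical diagnostic helper: a text BPE codes file should pass,
    while a binary file would typically fail.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    # An incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the probe window, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_codes.py <path-to-codes-or-model-file> ...
    for candidate in sys.argv[1:]:
        kind = "UTF-8 text" if looks_like_utf8_text(candidate) else "not UTF-8 text"
        print(f"{candidate}: {kind}")
```

If the file handed to `BPE(...)` for the failing models is reported as not UTF-8 text, that would suggest the server is treating these models as BPE models rather than SentencePiece ones.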