Configure option to default hfst-ospell to only N suggestions #16

@hfst-importer

There is a considerable speed difference between speller runs on the same text depending on whether hfst-ospell is allowed to give all suggestions or just a few:

tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-all.txt

real    0m40.156s
user    0m40.152s
sys     0m0.046s

tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S -n5 tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-5.txt

real    0m10.123s
user    0m10.132s
sys     0m0.039s

tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S -n10 tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-10.txt

real    0m11.897s
user    0m11.897s
sys     0m0.043s

At the same time, voikkospell (which always gives at most 5 suggestions) is markedly slower than hfst-ospell run with the same limit:

$ time preprocess test.txt | voikkospell -s -d kl -p tools/spellcheckers/fstbased/hfst/ > test.res.vk.txt 

real    0m16.588s
user    0m16.334s
sys     0m0.305s

I don't know the details of libvoikko's interactions with libhfstospell, but since hfst-ospell has no built-in configure-time or compile-time option to limit the number of suggestions, could it be that hfst-ospell is generating a lot of suggestions in the background that are never used? Please note that voikkospell would report fewer "misspellings", since it handles upper/lower casing automatically, whereas hfst-ospell (at least with the tested fst) only accepts lexical case. This difference might be one explanation for voikkospell being faster than the all-suggestions call to hfst-ospell (while still being roughly 1.6 times slower than the corresponding hfst-ospell run with only 5 suggestions).
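To make the comparison concrete, the ratios between the wall-clock ("real") times reported above can be computed with a few lines of shell. This is only a small arithmetic sketch: the `ratio` helper is a name introduced here for illustration, and the numbers are copied from the timings in this report.

```shell
# Ratio of two wall-clock times in seconds, rounded to two decimals.
# "ratio" is a hypothetical helper, not part of hfst-ospell or voikkospell.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Timings taken from the "real" lines above.
echo "hfst-ospell all suggestions vs -n5:         $(ratio 40.156 10.123)"
echo "voikkospell vs hfst-ospell -n5:             $(ratio 16.588 10.123)"
echo "hfst-ospell all suggestions vs voikkospell: $(ratio 40.156 16.588)"
```

On these numbers the unlimited run takes roughly four times as long as the `-n5` run, while voikkospell sits in between.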

In any case, I believe that being able to set a default number of suggestions at compile time would be an easy way to ensure that hfst-ospell is no slower than necessary.
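Until such an option exists, a similar effect could perhaps be approximated on the calling side. The sketch below is a workaround under stated assumptions, not the requested feature: the function `ospell_args` and the environment variable `HFST_OSPELL_DEFAULT_N` are hypothetical names invented for this example, and only the `-S` and `-n` flags shown in the commands above are assumed to exist. It prepends a default `-nN` unless the caller already passed a `-n` limit.

```shell
# Hypothetical helper: build an hfst-ospell argument list with a default
# suggestion limit. HFST_OSPELL_DEFAULT_N is an invented variable name;
# it falls back to 5 when unset.
ospell_args() {
    for arg in "$@"; do
        case "$arg" in
            # Caller already chose a limit (e.g. -n10): pass arguments through.
            -n*) printf '%s\n' "$*"; return 0 ;;
        esac
    done
    # No explicit limit given: prepend the default one.
    printf '%s\n' "-n${HFST_OSPELL_DEFAULT_N:-5} $*"
}

# Example (not executed here); relies on word splitting, so it only works
# for paths without whitespace:
#   preprocess test.txt | hfst-ospell $(ospell_args -S tools/spellcheckers/fstbased/hfst/kl.zhfst)
```

A wrapper like this only changes what the command-line tool is asked for; it would not affect programs such as voikkospell that call the library directly, which is why a default inside hfst-ospell itself is still the more useful fix.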

Reported by: snomos
