Faster unicode string compare by christopherlam · Pull Request #2180 · Gnucash/gnucash

christopherlam · 2026-02-16T01:45:14Z

I still think that unicode collator needs to be cached; it's heavily used by ui and report code everywhere... Here's benchmark comparing 5M strings. Without caching:

24: Test command: /home/chris/sources/gnucash/build-stable/bin/test-gnc-unicode
24: Working Directory: /home/chris/sources/gnucash/build-stable/libgnucash/core-utils/test
24: Environment variables: 
24:  GNC_UNINSTALLED=YES
24:  GNC_BUILDDIR=/home/chris/sources/gnucash/build-stable
24: Test timeout computed to be: 10000000
24: Running main() from /tmp/guix-build-googletest-1.17.0.drv-0/source/googletest/src/gtest_main.cc
24: [==========] Running 8 tests from 1 test suite.
24: [----------] Global test environment set-up.
24: [----------] 8 tests from GncUnicode
24: [ RUN      ] GncUnicode.test_ss_base_chars
24: [       OK ] GncUnicode.test_ss_base_chars (3 ms)
24: [ RUN      ] GncUnicode.test_ss_accented
24: [       OK ] GncUnicode.test_ss_accented (0 ms)
24: [ RUN      ] GncUnicode.test_ss_accented_case_sensitive
24: [       OK ] GncUnicode.test_ss_accented_case_sensitive (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_literal
24: [       OK ] GncUnicode.test_german_ss_literal (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_base_chars_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_base_chars_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_cap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_cap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_simple_identical
24: [       OK ] GncUnicode.test_simple_identical (24931 ms)
24: [----------] 8 tests from GncUnicode (24935 ms total)
24: 
24: [----------] Global test environment tear-down
24: [==========] 8 tests from 1 test suite ran. (24935 ms total)
24: [  PASSED  ] 8 tests.
1/1 Test #24: test-gnc-unicode .................   Passed   24.94 sec

With caching:

24: Test command: /home/chris/sources/gnucash/build-stable/bin/test-gnc-unicode
24: Working Directory: /home/chris/sources/gnucash/build-stable/libgnucash/core-utils/test
24: Environment variables: 
24:  GNC_UNINSTALLED=YES
24:  GNC_BUILDDIR=/home/chris/sources/gnucash/build-stable
24: Test timeout computed to be: 10000000
24: Running main() from /tmp/guix-build-googletest-1.17.0.drv-0/source/googletest/src/gtest_main.cc
24: [==========] Running 8 tests from 1 test suite.
24: [----------] Global test environment set-up.
24: [----------] 8 tests from GncUnicode
24: [ RUN      ] GncUnicode.test_ss_base_chars
24: [       OK ] GncUnicode.test_ss_base_chars (2 ms)
24: [ RUN      ] GncUnicode.test_ss_accented
24: [       OK ] GncUnicode.test_ss_accented (0 ms)
24: [ RUN      ] GncUnicode.test_ss_accented_case_sensitive
24: [       OK ] GncUnicode.test_ss_accented_case_sensitive (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_literal
24: [       OK ] GncUnicode.test_german_ss_literal (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_base_chars_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_base_chars_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_cap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_cap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_simple_identical
24: [       OK ] GncUnicode.test_simple_identical (1428 ms)
24: [----------] 8 tests from GncUnicode (1431 ms total)
24: 
24: [----------] Global test environment tear-down
24: [==========] 8 tests from 1 test suite ran. (1431 ms total)
24: [  PASSED  ] 8 tests.
1/1 Test #24: test-gnc-unicode .................   Passed    1.44 sec

Also instead of returning -99 how about logging an ICU failure and returning g_strcmp0 (one,two)?

jralls · 2026-02-16T04:20:32Z

Micro benchmarks like that only show that the micro benchmark benefits from the change. They don't demonstrate anything about the program. Profile GnuCash and show where creating a collator makes a hot path.

~~GnuCash can't change locale after startup so there's no point in having a collection of collators.~~ Sorry, the collection is on strength. But you could cache a single collator and change the strength on each call.

Don't use kvp as a variable name. It has special meaning in GnuCash and using it outside of that special meaning is confusing.

christopherlam added 2 commits February 16, 2026 09:40

[gnc-unicode] cache icu collator

10c0e5e

[benchmark] compare strings 5M times

58dbebb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster unicode string compare#2180

Faster unicode string compare#2180
christopherlam wants to merge 2 commits intoGnucash:stablefrom
christopherlam:faster-unicode-compare

christopherlam commented Feb 16, 2026

Uh oh!

jralls commented Feb 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

christopherlam commented Feb 16, 2026

Uh oh!

jralls commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jralls commented Feb 16, 2026 •

edited

Loading