Skip to content

Faster unicode string compare#2180

Draft
christopherlam wants to merge 2 commits intoGnucash:stablefrom
christopherlam:faster-unicode-compare
Draft

Faster unicode string compare#2180
christopherlam wants to merge 2 commits intoGnucash:stablefrom
christopherlam:faster-unicode-compare

Conversation

@christopherlam
Copy link
Contributor

I still think that unicode collator needs to be cached; it's heavily used by ui and report code everywhere... Here's benchmark comparing 5M strings. Without caching:

24: Test command: /home/chris/sources/gnucash/build-stable/bin/test-gnc-unicode
24: Working Directory: /home/chris/sources/gnucash/build-stable/libgnucash/core-utils/test
24: Environment variables: 
24:  GNC_UNINSTALLED=YES
24:  GNC_BUILDDIR=/home/chris/sources/gnucash/build-stable
24: Test timeout computed to be: 10000000
24: Running main() from /tmp/guix-build-googletest-1.17.0.drv-0/source/googletest/src/gtest_main.cc
24: [==========] Running 8 tests from 1 test suite.
24: [----------] Global test environment set-up.
24: [----------] 8 tests from GncUnicode
24: [ RUN      ] GncUnicode.test_ss_base_chars
24: [       OK ] GncUnicode.test_ss_base_chars (3 ms)
24: [ RUN      ] GncUnicode.test_ss_accented
24: [       OK ] GncUnicode.test_ss_accented (0 ms)
24: [ RUN      ] GncUnicode.test_ss_accented_case_sensitive
24: [       OK ] GncUnicode.test_ss_accented_case_sensitive (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_literal
24: [       OK ] GncUnicode.test_german_ss_literal (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_base_chars_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_base_chars_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_cap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_cap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_simple_identical
24: [       OK ] GncUnicode.test_simple_identical (24931 ms)
24: [----------] 8 tests from GncUnicode (24935 ms total)
24: 
24: [----------] Global test environment tear-down
24: [==========] 8 tests from 1 test suite ran. (24935 ms total)
24: [  PASSED  ] 8 tests.
1/1 Test #24: test-gnc-unicode .................   Passed   24.94 sec

With caching:

24: Test command: /home/chris/sources/gnucash/build-stable/bin/test-gnc-unicode
24: Working Directory: /home/chris/sources/gnucash/build-stable/libgnucash/core-utils/test
24: Environment variables: 
24:  GNC_UNINSTALLED=YES
24:  GNC_BUILDDIR=/home/chris/sources/gnucash/build-stable
24: Test timeout computed to be: 10000000
24: Running main() from /tmp/guix-build-googletest-1.17.0.drv-0/source/googletest/src/gtest_main.cc
24: [==========] Running 8 tests from 1 test suite.
24: [----------] Global test environment set-up.
24: [----------] 8 tests from GncUnicode
24: [ RUN      ] GncUnicode.test_ss_base_chars
24: [       OK ] GncUnicode.test_ss_base_chars (2 ms)
24: [ RUN      ] GncUnicode.test_ss_accented
24: [       OK ] GncUnicode.test_ss_accented (0 ms)
24: [ RUN      ] GncUnicode.test_ss_accented_case_sensitive
24: [       OK ] GncUnicode.test_ss_accented_case_sensitive (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_literal
24: [       OK ] GncUnicode.test_german_ss_literal (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_base_chars_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_base_chars_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_cap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_cap (0 ms)
24: [ RUN      ] GncUnicode.test_german_ss_decorated_accented_nocap
24: [       OK ] GncUnicode.test_german_ss_decorated_accented_nocap (0 ms)
24: [ RUN      ] GncUnicode.test_simple_identical
24: [       OK ] GncUnicode.test_simple_identical (1428 ms)
24: [----------] 8 tests from GncUnicode (1431 ms total)
24: 
24: [----------] Global test environment tear-down
24: [==========] 8 tests from 1 test suite ran. (1431 ms total)
24: [  PASSED  ] 8 tests.
1/1 Test #24: test-gnc-unicode .................   Passed    1.44 sec

Also instead of returning -99 how about logging an ICU failure and returning g_strcmp0 (one,two)?

@jralls
Copy link
Member

jralls commented Feb 16, 2026

Micro benchmarks like that only show that the micro benchmark benefits from the change. They don't demonstrate anything about the program. Profile GnuCash and show where creating a collator makes a hot path.

GnuCash can't change locale after startup so there's no point in having a collection of collators. Sorry, the collection is on strength. But you could cache a single collator and change the strength on each call.

Don't use kvp as a variable name. It has special meaning in GnuCash and using it outside of that special meaning is confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants