
BUG: distinguish bool from int in object-dtype hash table #64639

Open
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:bug-62888

@jbrockmendel
Member

Summary

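  • In pyobject_cmp() (khash_python.h), added a type guard that returns not-equal when exactly one of the two objects is a Python bool (PyBool_Check). Because bool subclasses int, Python treats True == 1 and hash(True) == hash(1) (likewise False and 0), so these keys previously landed in the same bucket and compared equal. With the guard, the khash equality function used by PyObjectHashTable distinguishes int(0) from bool(False) and int(1) from bool(True).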
  • Fixes factorize, unique, duplicated, isin, value_counts, groupby, and Index.get_loc for object-dtype data containing mixed bool/int values.
  • No measurable perf impact: the guard costs one PyBool_Check (a type-pointer read) per operand and runs only when the two objects' types already differ.
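To make the intended semantics concrete, here is a pure-Python model of the patched equality check. It is a sketch, not the C implementation. `pyobject_eq`, `_Key`, and `factorize_sketch` are hypothetical names, and the model omits any other special cases pyobject_cmp() may apply. A dict stands in for the khash table, since both bucket keys by hash and resolve collisions with an equality function.

```python
def pyobject_eq(a: object, b: object) -> bool:
    """Hypothetical model of the patched pyobject_cmp() equality test."""
    if a is b:
        return True
    # New guard: if exactly one operand is a bool, the keys are never equal.
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    # Otherwise fall back to ordinary Python equality.
    return a == b


class _Key:
    """Wraps a value so dict lookups use pyobject_eq instead of ==."""

    __slots__ = ("value",)

    def __init__(self, value: object) -> None:
        self.value = value

    def __hash__(self) -> int:
        # Hashing is unchanged: hash(True) == hash(1), so they share a bucket.
        return hash(self.value)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, _Key) and pyobject_eq(self.value, other.value)


def factorize_sketch(values: list) -> tuple[list[int], list]:
    """Assign integer codes in first-seen order, comparing keys with pyobject_eq."""
    table: dict = {}
    codes: list[int] = []
    uniques: list = []
    for v in values:
        key = _Key(v)
        code = table.get(key)
        if code is None:
            code = len(uniques)
            table[key] = code
            uniques.append(v)
        codes.append(code)
    return codes, uniques
```

A plain set collapses `{0, False, 1, True}` to two elements. With the guard, `factorize_sketch([0, False, 1, True, 0])` keeps four uniques and returns codes `[0, 1, 2, 3, 0]`.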

Test plan

  • New tests in test_algos.py for factorize, unique, isin, value_counts, duplicated
  • New tests in test_indexing.py for get_indexer, get_loc, is_unique
  • New test in test_groupby.py for groupby with mixed bool/int keys
  • Existing test suites pass with no regressions
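The expected results for the unique/duplicated/isin/value_counts cases can be cross-checked against a small pure-Python oracle. This is an assumed testing aid, not part of the PR or the pandas API, and all helper names are hypothetical. It uses a different technique from the C guard: each key is tagged with `isinstance(v, bool)` so that bools and ints can never compare equal. The bool/int separation it produces is the same.

```python
from collections import Counter


def _key(v):
    # Tag bools so True/1 and False/0 never collide (hypothetical helper).
    return (isinstance(v, bool), v)


def unique_oracle(values):
    """Unique values in first-seen order, keeping bools distinct from ints."""
    seen, out = set(), []
    for v in values:
        k = _key(v)
        if k not in seen:
            seen.add(k)
            out.append(v)
    return out


def duplicated_oracle(values):
    """Mirror of keep="first": True for every repeat after the first occurrence."""
    seen, out = set(), []
    for v in values:
        k = _key(v)
        out.append(k in seen)
        seen.add(k)
    return out


def isin_oracle(values, test_values):
    """Membership test that does not treat True as 1 or False as 0."""
    targets = {_key(t) for t in test_values}
    return [_key(v) in targets for v in values]


def value_counts_oracle(values):
    """(value, count) pairs in first-seen order (pandas sorts by count by default)."""
    counts = Counter(_key(v) for v in values)
    return [(k[1], n) for k, n in counts.items()]
```

For `[0, False, 1, True, 1]`, the oracle yields four uniques, flags only the final `1` as duplicated, and reports `1` with a count of 2 but `True` with a count of 1.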

🤖 Generated with Claude Code

…#62888)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

BUG: factorize with objects differentiate 0/1 from False, True
