
[BugFix] Make SQLAlchemy reflection dataclasses hashable #71302

Open
VijayShekhawat7 wants to merge 1 commit into StarRocks:main from VijayShekhawat7:fix/sqlalchemy-unhashable-dataclasses

Conversation


VijayShekhawat7 commented Apr 4, 2026

Why I'm doing:

Fixes #70733

The ReflectedTableKeyInfo, ReflectedPartitionInfo, ReflectedDistributionInfo, and ReflectedRefreshInfo dataclasses in the StarRocks Python client lack __hash__ implementations. SQLAlchemy's reflection cache uses these as dict keys, causing TypeError: unhashable type at runtime.
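
A minimal reproduction of the failure mode, assuming the classes are plain `@dataclass` declarations with the default `eq=True` (in which case Python sets `__hash__` to `None`). `PartitionInfoSketch`, its fields, and the `try_as_dict_key` helper are hypothetical stand-ins, not the actual StarRocks definitions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass  # eq=True by default, so the generated class has __hash__ = None
class PartitionInfoSketch:
    # Hypothetical stand-in for ReflectedPartitionInfo; field names are assumed.
    type: Optional[str] = None
    partition_method: Optional[str] = None


def try_as_dict_key(obj):
    """Return the TypeError message raised when obj is used as a dict key, else None."""
    try:
        {obj: "cached"}
    except TypeError as exc:
        return str(exc)
    return None


error = try_as_dict_key(
    PartitionInfoSketch(type="RANGE", partition_method="RANGE(dt)")
)
print(error)  # a message containing "unhashable type"
```

Any cache that keys a dict or set on such an instance would hit the same error, which matches the symptom reported in the linked issue.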

What I'm doing:

Three changes to make all four dataclasses safely hashable:

  1. Custom __hash__ for list-column classes: ReflectedTableKeyInfo and ReflectedDistributionInfo have a columns field typed Optional[Union[List[str], str]]. The __hash__ generated by @dataclass(unsafe_hash=True) hashes the field values directly, so it raises TypeError whenever columns holds a list. These classes now define explicit __hash__ methods that convert list-typed columns to tuples before hashing. ReflectedRefreshInfo and ReflectedPartitionInfo keep unsafe_hash=True since all their fields are already hashable types (strings).

  2. Non-mutating __str__ methods: The original __str__ methods on ReflectedTableKeyInfo, ReflectedPartitionInfo, and ReflectedDistributionInfo mutated self.type and self.columns in place (e.g., self.type = self.type.upper()). With hashing enabled, this makes the hash unstable: an object's hash can change after str() is called on it, so an object already stored as a dict key may no longer be found. All __str__ methods now use local variables instead.

  3. Dead code removal: Removed unused type_str variable in ReflectedPartitionInfo.__str__.
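
The changes above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the diff: `TableKeyInfoSketch`, the `_hashable_columns` helper, and the output format of `__str__` are all hypothetical; only the `type` and `columns` field names come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass  # no unsafe_hash=True here: an explicit __hash__ is defined below
class TableKeyInfoSketch:
    # Hypothetical stand-in for ReflectedTableKeyInfo.
    type: Optional[str] = None
    columns: Optional[Union[List[str], str]] = None

    def _hashable_columns(self):
        # Lists are unhashable, so hash an equivalent tuple instead.
        if isinstance(self.columns, list):
            return tuple(self.columns)
        return self.columns

    def __hash__(self):
        return hash((self.type, self._hashable_columns()))

    def __str__(self):
        # Work on local values only; self.type and self.columns are never
        # reassigned, so the hash is the same before and after str().
        key_type = (self.type or "").upper()
        if isinstance(self.columns, list):
            cols = ", ".join(self.columns)
        else:
            cols = self.columns or ""
        return f"{key_type} ({cols})"  # assumed output format


key = TableKeyInfoSketch(type="primary key", columns=["id", "dt"])
hash_before = hash(key)
rendered = str(key)
hash_after = hash(key)

# An equal instance finds the same entry, as a reflection cache would expect.
cache = {key: "reflected"}
lookup = cache[TableKeyInfoSketch(type="primary key", columns=["id", "dt"])]
```

Because `@dataclass` leaves an explicitly defined `__hash__` in place, the generated `__eq__` and the custom hash stay consistent: equal instances have equal `columns` lists and therefore equal tuples. The instances remain mutable, so this only holds as long as callers do not modify fields after using an object as a key.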

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

The ReflectedTableKeyInfo, ReflectedPartitionInfo,
ReflectedDistributionInfo, and ReflectedRefreshInfo dataclasses were
not hashable, which caused TypeError when SQLAlchemy's reflection
cache tried to use them as dict keys.

Three changes:
1. Add custom __hash__ methods to ReflectedTableKeyInfo and
   ReflectedDistributionInfo that convert list-typed `columns` fields
   to tuples before hashing, since List[str] is not hashable.
   ReflectedRefreshInfo and ReflectedPartitionInfo keep unsafe_hash=True
   as all their fields are already hashable types.
2. Refactor __str__ methods to use local variables instead of mutating
   self attributes, preventing hash instability after str() calls.
3. Remove dead variable `type_str` in ReflectedPartitionInfo.__str__.

Fixes StarRocks#70733

Signed-off-by: Vijay Shekhawat <vijayshekhawat1995@gmail.com>
Made-with: Cursor
VijayShekhawat7 force-pushed the fix/sqlalchemy-unhashable-dataclasses branch from e2080b5 to a3b776d on April 4, 2026 16:21

Development

Successfully merging this pull request may close these issues.

`TypeError: unhashable type: 'ReflectedPartitionInfo'` in Python SQLAlchemy dialect breaks Superset dataset creation

1 participant