
PERF: fix slow repr for Series/DataFrame with third-party array-like objects #64638

Open
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-xarray

@jbrockmendel
Member

Summary

  • Replaces the duck-type is_sequence() check in pprint_thing() with an explicit isinstance allowlist of types that _pprint_seq was designed to handle.
  • Previously, any object with __iter__ and __len__ (e.g. xarray DataArray) would be recursively iterated element-by-element, causing ~20,000 expensive repr calls for a single large DataArray stored in an object-dtype column.
  • Now only built-in Python sequences, NumPy arrays, and pandas types are iterated; everything else falls through to str(), which uses the object's own string representation.
  • ~1500x speedup for the reproducer in #61809 (BUG: Pandas Series with Xarray slow print time): 3s → 0.002s.
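
The difference between the two dispatch strategies can be sketched in plain Python. This is a simplified illustration, not the pandas implementation: `ArrayLike`, `Element`, `is_sequence_duck`, `pprint_duck`, `pprint_allowlist`, and `ALLOWLIST` are hypothetical names, the allowlist here contains only built-in containers, and the truncation that the real `_pprint_seq` applies is left out.

```python
class Element:
    """Hypothetical element whose repr is counted (stands in for an expensive repr)."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        Element.calls += 1
        return f"Element({self.value})"


class ArrayLike:
    """Hypothetical third-party container: it has __iter__ and __len__ but is not a list."""

    def __init__(self, values):
        self._items = [Element(v) for v in values]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return f"<ArrayLike of {len(self)} items>"


def is_sequence_duck(obj):
    # Duck-type check: anything iterable with a length, except strings and bytes.
    try:
        iter(obj)
        len(obj)
        return not isinstance(obj, (str, bytes))
    except TypeError:
        return False


# Assumed allowlist for this sketch; the PR's real list also covers NumPy and pandas types.
ALLOWLIST = (list, tuple, set, frozenset, range)


def pprint_duck(obj):
    # Old behaviour (sketch): recurse into anything that looks like a sequence.
    if is_sequence_duck(obj):
        return "[" + ", ".join(pprint_duck(x) for x in obj) + "]"
    return str(obj)


def pprint_allowlist(obj):
    # New behaviour (sketch): recurse only into explicitly allowed types.
    if isinstance(obj, ALLOWLIST):
        return "[" + ", ".join(pprint_allowlist(x) for x in obj) + "]"
    return str(obj)


if __name__ == "__main__":
    arr = ArrayLike(range(1000))

    Element.calls = 0
    pprint_duck(arr)
    print("element reprs with duck typing:", Element.calls)

    Element.calls = 0
    print(pprint_allowlist(arr))
    print("element reprs with allowlist:", Element.calls)
```

With the duck-type check, every element of the container is formatted individually; with the allowlist, the container's own `__repr__` is called once and its elements are never touched. The trade-off is that a third-party sequence type not on the allowlist no longer gets pandas-style element formatting.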

Closes #61809

Test plan

  • Existing tests in test_printing.py, test_format.py, test_formats.py, test_repr.py, test_groupby.py, test_sorting.py, test_to_string.py all pass
  • Verified basic sequence types (list, tuple, set, range, ndarray, np.record, Index, DataFrame) preserve existing formatting behavior
  • Verified xarray DataArray repr is fast and produces clean truncated output

🤖 Generated with Claude Code

…objects

Replace the duck-type is_sequence() check in pprint_thing() with an
explicit isinstance allowlist. Previously, any object with __iter__ and
__len__ (e.g. xarray DataArray) would be recursively iterated
element-by-element via _pprint_seq, causing ~20,000 expensive repr
calls for a single DataArray. Now only types that _pprint_seq was
designed to handle are iterated; everything else uses str() directly.

Closes pandas-dev#61809

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
