REGR/PERF: Fix perf regression in _LocationIndexer._post_expansion_casting by rhshadrach · Pull Request #64388 · pandas-dev/pandas

rhshadrach · 2026-03-03T22:13:49Z

Ref: #62523
Ref: pandas-dev/asv-runner#104

Details

dtype = "Float64"
inplace = True
N, M = 10000, 100
if dtype in ("datetime64[ns]", "datetime64[ns, tz]", "timedelta64[ns]"):
    data = {
        "datetime64[ns]": date_range("2011-01-01", freq="h", periods=N),
        "datetime64[ns, tz]": date_range(
            "2011-01-01", freq="h", periods=N, tz="Asia/Tokyo"
        ),
        "timedelta64[ns]": timedelta_range(start="1 day", periods=N, freq="1D"),
    }
    df = DataFrame({f"col_{i}": data[dtype] for i in range(M)})
    df[::2] = None
else:
    values = np.random.randn(N, M)
    values[::2] = np.nan
    if dtype == "Int64":
        values = values.round()
        values = values.astype(object)
        values[::2] = NA
    df = DataFrame(values, dtype=dtype)
fill_values = df.iloc[df.first_valid_index()].to_dict()

df.fillna(value=fill_values, inplace=inplace)
# 1.2s <-- main
# 120 ms <-- PR

Still a large regression after this (~25ms -> 120ms), but at a glance doesn't seem easy to clawback the rest.

cc @jbrockmendel

…sion_casting

jbrockmendel · 2026-03-03T23:44:45Z

pandas/core/indexing.py

-                            orig_obj.iloc[:, i].array, self.obj.iloc[:, i]._values
-                        )
-                        self.obj.isetitem(i, new_arr)
+                mask = (self.obj.dtypes != orig_obj.dtypes).reset_index(drop=True)


i dont understand why the reset_index. why not ._values?

and an enumerate instead of .items() in the next line

ok i guess they're equivalent. never mind.

Yea, lots of different cases will have one being more performant than the other.

def setup(size, prob): return pd.Series(random.choices([True, False], [prob, 1.0 - prob], k=size)) def foo(): mask = ser.reset_index(drop=True) for i, value in mask[mask].items(): pass def bar(): for i, value in enumerate(ser.tolist()): if value: pass ser = setup(size=10000, prob=0.01) %timeit foo() # 130 μs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) %timeit bar() # 291 μs ± 9.07 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ser = setup(size=100, prob=0.01) %timeit foo() # 97.1 μs ± 78.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) %timeit bar() # 2.75 μs ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Happy to go either way here.

Any way we could leverage the self.obj._mgr.get_dtypes() with the self.obj._mgr.blknos to more quickly deduce which columns locations have different dtypes now?

modest prerence for ._values and enumerate since i find it more immediately obvious what it is doing, but not gonna harp on it.

Updated, these shaved off another 50ms.

jbrockmendel · 2026-03-03T23:48:49Z

i suspect the long term solution is to change the concat-path to reindex instead of concat (xref #62369) which in conjunction with #53910 should make the cast-back logic unnecessary

mroeschke · 2026-03-05T01:43:04Z

pandas/core/indexing.py

+                for i, changed_dtype in enumerate(changed_dtypes):
+                    if changed_dtype:


You can use for i in np.flatnonzero(changed_dtypes) to find the indices where the value is True as well

REGR/PERF: Fix performance regression in _LocationIndexer._post_expan…

5158433

…sion_casting

rhshadrach added Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version labels Mar 3, 2026

rhshadrach added this to the 3.1 milestone Mar 3, 2026

jbrockmendel reviewed Mar 3, 2026

View reviewed changes

Use _mgr.get_dtypes

9c67ed5

mroeschke requested a review from jbrockmendel March 5, 2026 01:38

mroeschke reviewed Mar 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

REGR/PERF: Fix perf regression in _LocationIndexer._post_expansion_casting#64388

REGR/PERF: Fix perf regression in _LocationIndexer._post_expansion_casting#64388
rhshadrach wants to merge 2 commits intopandas-dev:mainfrom
rhshadrach:regr_post_expansion_casting

rhshadrach commented Mar 3, 2026 •

edited

Loading

Uh oh!

jbrockmendel Mar 3, 2026

Uh oh!

jbrockmendel Mar 3, 2026

Uh oh!

jbrockmendel Mar 3, 2026

Uh oh!

rhshadrach Mar 4, 2026

Uh oh!

mroeschke Mar 4, 2026

Uh oh!

jbrockmendel Mar 4, 2026

Uh oh!

rhshadrach Mar 4, 2026

Uh oh!

jbrockmendel commented Mar 3, 2026

Uh oh!

mroeschke Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		for i, changed_dtype in enumerate(changed_dtypes):
		if changed_dtype:

Uh oh!

Conversation

rhshadrach commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented Mar 3, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rhshadrach commented Mar 3, 2026 •

edited

Loading