Skip to content

REGR/PERF: Fix perf regression in _LocationIndexer._post_expansion_casting#64388

Open
rhshadrach wants to merge 2 commits intopandas-dev:mainfrom
rhshadrach:regr_post_expansion_casting
Open

REGR/PERF: Fix perf regression in _LocationIndexer._post_expansion_casting#64388
rhshadrach wants to merge 2 commits intopandas-dev:mainfrom
rhshadrach:regr_post_expansion_casting

Conversation

@rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Mar 3, 2026

Ref: #62523
Ref: pandas-dev/asv-runner#104

Details
dtype = "Float64"
inplace = True
N, M = 10000, 100
if dtype in ("datetime64[ns]", "datetime64[ns, tz]", "timedelta64[ns]"):
    data = {
        "datetime64[ns]": date_range("2011-01-01", freq="h", periods=N),
        "datetime64[ns, tz]": date_range(
            "2011-01-01", freq="h", periods=N, tz="Asia/Tokyo"
        ),
        "timedelta64[ns]": timedelta_range(start="1 day", periods=N, freq="1D"),
    }
    df = DataFrame({f"col_{i}": data[dtype] for i in range(M)})
    df[::2] = None
else:
    values = np.random.randn(N, M)
    values[::2] = np.nan
    if dtype == "Int64":
        values = values.round()
        values = values.astype(object)
        values[::2] = NA
    df = DataFrame(values, dtype=dtype)
fill_values = df.iloc[df.first_valid_index()].to_dict()

df.fillna(value=fill_values, inplace=inplace)
# 1.2s <-- main
# 120 ms <-- PR

Still a large regression after this (~25ms -> 120ms), but at a glance doesn't seem easy to clawback the rest.

cc @jbrockmendel

@rhshadrach rhshadrach added Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version labels Mar 3, 2026
@rhshadrach rhshadrach added this to the 3.1 milestone Mar 3, 2026
orig_obj.iloc[:, i].array, self.obj.iloc[:, i]._values
)
self.obj.isetitem(i, new_arr)
mask = (self.obj.dtypes != orig_obj.dtypes).reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont understand why the reset_index. why not ._values?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and an enumerate instead of .items() in the next line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i guess they're equivalent. never mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, lots of different cases will have one being more performant than the other.

def setup(size, prob):
    return pd.Series(random.choices([True, False], [prob, 1.0 - prob], k=size))

def foo():
    mask = ser.reset_index(drop=True)
    for i, value in mask[mask].items():
        pass

def bar():
    for i, value in enumerate(ser.tolist()):
        if value:
            pass

ser = setup(size=10000, prob=0.01)
%timeit foo()
# 130 μs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit bar()
# 291 μs ± 9.07 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

ser = setup(size=100, prob=0.01)
%timeit foo()
# 97.1 μs ± 78.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit bar()
# 2.75 μs ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Happy to go either way here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way we could leverage the self.obj._mgr.get_dtypes() with the self.obj._mgr.blknos to more quickly deduce which columns locations have different dtypes now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modest prerence for ._values and enumerate since i find it more immediately obvious what it is doing, but not gonna harp on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, these shaved off another 50ms.

@jbrockmendel
Copy link
Member

i suspect the long term solution is to change the concat-path to reindex instead of concat (xref #62369) which in conjunction with #53910 should make the cast-back logic unnecessary

@mroeschke mroeschke requested a review from jbrockmendel March 5, 2026 01:38
Comment on lines +961 to +962
for i, changed_dtype in enumerate(changed_dtypes):
if changed_dtype:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use for i in np.flatnonzero(changed_dtypes) to find the indices where the value is True as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants