fix: Add fallback for incompatible string concenation in pandas by MarcoGorelli · Pull Request #3548 · narwhals-dev/narwhals

MarcoGorelli · 2026-04-14T14:01:12Z

closes #3546

Description

What type of PR is this? (check all applicable)

Related issues

Related issue #<issue number>
Closes #<issue number>

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

dangotbanned · 2026-04-14T15:12:12Z

Note

Edited to increase the range

Have you thought about handling the dtypes somewhere around here instead?

narwhals/narwhals/_pandas_like/namespace.py

Lines 320 to 328 in 2eaaa03

    
           def concat_str( 
        
               self, *exprs: PandasLikeExpr, separator: str, ignore_nulls: bool 
        
           ) -> PandasLikeExpr: 
        
               string = self._version.dtypes.String() 
        
               def func(df: PandasLikeDataFrame) -> list[PandasLikeSeries]: 
        
                   expr_results = [s for _expr in exprs for s in _expr(df)] 
        
                   series = [s.cast(string) for s in expr_results]

E.g. you could do 2 passes where:

Collect all the native dtypes
Decide on what's needed to make them compatible & then do the casts at the native-level

This is reminding me a bit of #3398 (comment) 😄

MarcoGorelli · 2026-04-14T15:14:52Z

not sure about doing it for all ops tbh, but for string concatenation at least it seems to be necessary so i'd suggest starting with that, we can make it more generally available later if necessary

dangotbanned · 2026-04-14T15:24:26Z

not sure about doing it for all ops tbh, but for string concatenation at least it seems to be necessary so i'd suggest starting with that, we can make it more generally available later if necessary

Sorry if I wasn't clear, I was talking about concat_str - have edited (#3548 (comment))

MarcoGorelli · 2026-04-14T15:26:04Z

we'd then need to do the same casts for __add__, so i think it's cleaner to do just it in one place?

dangotbanned · 2026-04-14T15:43:27Z

we'd then need to do the same casts for __add__

I was a bit confused by this, I had no idea polars did it too 😂

import polars as pl

>>> pl.Series(["a", "b", "c"]) + pl.Series(["d", "e", "f"])
shape: (3,)
Series: '' [str]
[
	"ad"
	"be"
	"cf"
]

…arisons

MarcoGorelli · 2026-05-04T10:48:49Z

I've aligned this with the upstream pandas behaviour in pandas-dev/pandas#65220 (comment)

any objections?

FBruzzesi

Thanks @MarcoGorelli - I am mostly trusting the test suite on this!

Also added a tiny suggestion

FBruzzesi

Thanks @MarcoGorelli ! No further questions from my side

FBruzzesi · 2026-05-04T20:52:42Z

        return self.native.iloc[-1] if len(self.native) else None

-    def _with_binary(self, op: Callable[..., PandasLikeSeries], other: Any) -> Self:
+    def _with_binary(self, op: Callable[..., pd.Series], other: Any) -> Self:


Nice catch here :)

camriddell

Overall, this is a good compatibility bridge. With the current implementation I think we should:

Discuss the costs of re-trying all failed __add__ operations.
Remove the written out test cases for a combinatoric solution and property derived dtype detection.

camriddell · 2026-05-04T22:07:35Z

+            pd.ArrowDtype(pa.large_string()),
+            pd.ArrowDtype(pa.large_string()),
+            pd.ArrowDtype(pa.large_string()),
+        ),


The cases are serve effectively as a handwritten truth table.

Can this be refactored to

parameterize from itertools.product

embed the logic for deriving the appropriate datatype into the test?

left_dtype is a pandas.StringDtype -> left_dtype

left_dtype is a pandas.ArrowDtype and right_dtype is pandas.StringDtype -> pd.ArrowDtype(pa.large_string()

both left/right are ArrowDtype: pa.string if all inputs are pa.string otherwise pa.large_string

This reduces the length of the parameters spec and codifies the expected rules for returning a datatype.

DTYPES = [ pd.StringDtype("pyarrow", na_value=np.nan), # type: ignore[call-arg] pd.StringDtype("pyarrow", na_value=pd.NA), # type: ignore[call-arg] pd.ArrowDtype(pa.string()), pd.ArrowDtype(pa.large_string()), ] @pytest.mark.parametrize( ("left_dtype", "right_dtype"), [*product(DTYPES, repeat=2)], ) def test_pandas_str_types(...): ... assert_equal_data(res, expected) result_dtype = res.to_native()["concat_col"].dtype if isinstance(left_dtype, pd.StringDtype): expected_dtype = left_dtype elif isinstance(left_dtype, pd.ArrowDtype) and isinstance(right_dtype, pd.StringDtype): expected_dtype = pd.ArrowDtype(pa.large_string()) elif isinstance(left_dtype, pd.ArrowDtype) and isinstance(right_dtype, pd.ArrowDtype): left_type = left_dtype.pyarrow_dtype right_type = right_dtype.pyarrow_dtype if pa.types.is_large_string(left_type) or pa.types.is_large_string(right_type): expected_dtype = pd.ArrowDtype(pa.large_string()) else: expected_dtype = pd.ArrowDtype(pa.string()) assert result_dtype == expected_dtype

camriddell · 2026-05-04T22:20:37Z

-        ).alias(self.name)
+        try:
+            res = op(ser, other_native)
+        except Exception:


Can this be more specific (e.g. TypeError rather than Exception)? Observing the test results without this snippet, I only observe TypeErrors but this could be due to a backport from pandas.

For cases when addition will fail no matter what (e.g. adding integers and strings) retrying __add__ in this fashion makes for very long tracebacks due to the nested exceptions (see below)

This may be avoided by checking for string datatypes on both arrays before retrying. I think we may even be able to do this check against the narwhals type?

very long tracebacks

import pandas as pd import narwhals as nw df = pd.DataFrame({"count": [1, 2, 3], "fruit": ["apple", "banana", "orange"]}) nw.from_native(df).with_columns(concat=nw.col("count") + nw.col("fruit"))

produces

Traceback (most recent call last): File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 988, in _evaluate_op_method result = pc.binary_join_element_wise(other, self._pa_array, sep) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pyarrow/compute.py", line 271, in wrapper return func.call(args, options, memory_pool) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "pyarrow/_compute.pyx", line 399, in pyarrow._compute.Function.call File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status pyarrow.lib.ArrowNotImplementedError: Function 'binary_join_element_wise' has no kernel matching input types (int64, large_string, large_string) The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_pandas_like/series.py", line 407, in _with_binary res = op(ser, other_native) ^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/ops/common.py", line 85, in new_method return method(self, other) ^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arraylike.py", line 190, in __add__ return self._arith_method(other, operator.add) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/series.py", line 6751, in _arith_method return base.IndexOpsMixin._arith_method(self, other, op) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/base.py", line 1644, in _arith_method result = ops.arithmetic_op(lvalues, rvalues, op) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/ops/array_ops.py", line 279, in arithmetic_op res_values = op(left, right) ^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 845, in __array_ufunc__ result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/base.py", line 2704, in __array_ufunc__ result = arraylike.maybe_dispatch_ufunc_to_dunder_op( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "pandas/_libs/ops_dispatch.pyx", line 113, in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/ops/common.py", line 85, in new_method return method(self, other) ^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arraylike.py", line 194, in __radd__ return self._arith_method(other, roperator.radd) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 1079, in _arith_method result = self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 990, in _evaluate_op_method raise TypeError( TypeError: operation 'radd' not supported for dtype 'str' with dtype 'int64' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/cameron/repos/opensource/narwhals-dev/t.py", line 5, in <module> nw.from_native(df).with_columns(concat=nw.col("count") + nw.col("fruit")) File "/home/cameron/repos/opensource/narwhals-dev/narwhals/dataframe.py", line 1506, in with_columns return super().with_columns(*exprs, **named_exprs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/dataframe.py", line 214, in with_columns return self._with_compliant(self._compliant_frame.with_columns(*compliant_exprs)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_pandas_like/dataframe.py", line 503, in with_columns columns = self._evaluate_exprs(*exprs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/dataframe.py", line 367, in _evaluate_exprs return tuple(chain.from_iterable(self._evaluate_expr(expr) for expr in exprs)) # pyright: ignore[reportArgumentType] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/dataframe.py", line 367, in <genexpr> return tuple(chain.from_iterable(self._evaluate_expr(expr) for expr in exprs)) # pyright: ignore[reportArgumentType] ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/dataframe.py", line 380, in _evaluate_expr result = expr(self) ^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/expr.py", line 247, in __call__ return self._call(df) ^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/expr.py", line 651, in <lambda> lambda df: [series.alias(name) for series in self(df)], ^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/expr.py", line 247, in __call__ return self._call(df) ^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_compliant/expr.py", line 372, in _reuse_series_inner series._from_scalar(method(series)) if returns_scalar else method(series) ^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_pandas_like/series.py", line 452, in __add__ return self._with_binary(operator.add, other) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_pandas_like/series.py", line 411, in _with_binary res = binary_string_sum_fallback(ser, other_native, pdx) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/narwhals/_pandas_like/utils.py", line 742, in binary_string_sum_fallback return left + right.astype(left_dtype) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/generic.py", line 6541, in astype new_data = self._mgr.astype(dtype=dtype, errors=errors) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 611, in astype return self.apply("astype", dtype=dtype, errors=errors) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 442, in apply applied = getattr(b, f)(**kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/internals/blocks.py", line 607, in astype new_values = astype_array_safe(values, dtype, errors=errors) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 240, in astype_array_safe new_values = astype_array(values, dtype, copy=copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 182, in astype_array values = values.astype(dtype, copy=copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/string_arrow.py", line 336, in astype return super().astype(dtype, copy=copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/base.py", line 881, in astype return np.asarray(self, dtype=dtype) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 864, in __array__ return self.to_numpy(dtype=dtype, copy=copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/arrays/arrow/array.py", line 1717, in to_numpy result = result.astype(dtype, copy=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ValueError: invalid literal for int() with base 10: 'apple'

camriddell · 2026-05-04T22:24:48Z

+import pyarrow as pa
+
+STRING_DTYPE_NAN = pd.StringDtype("pyarrow", na_value=np.nan)  # type: ignore[call-arg]
+STRING_DTYPE_NA = pd.StringDtype("pyarrow", na_value=pd.NA)  # type: ignore[call-arg]


Would there be any purpose to add non-pyarrow backed StringDtypes into this test?

MarcoGorelli added 3 commits April 14, 2026 15:00

fix: Add fallback for incompatible string concenation in pandas

9a5a726

cvg

b94ba8e

fixup

4187c38

MarcoGorelli marked this pull request as ready for review April 14, 2026 15:19

MarcoGorelli added 3 commits May 4, 2026 10:19

Merge remote-tracking branch 'upstream/main' into binary-pyarrow-comp…

4d6368c

…arisons

align with upstream pandas

b1f312d

pyrefly typing

ddf5615

MarcoGorelli added the fix label May 4, 2026

FBruzzesi reviewed May 4, 2026

View reviewed changes

Comment thread narwhals/_pandas_like/series.py

FBruzzesi approved these changes May 4, 2026

View reviewed changes

camriddell reviewed May 4, 2026

View reviewed changes

Conversation

MarcoGorelli commented Apr 14, 2026

Description

What type of PR is this? (check all applicable)

Related issues

Checklist

Uh oh!

dangotbanned commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MarcoGorelli commented Apr 14, 2026

Uh oh!

dangotbanned commented Apr 14, 2026

Uh oh!

MarcoGorelli commented Apr 14, 2026

Uh oh!

dangotbanned commented Apr 14, 2026

Uh oh!

MarcoGorelli commented May 4, 2026

Uh oh!

FBruzzesi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

FBruzzesi left a comment

Choose a reason for hiding this comment

Uh oh!

FBruzzesi May 4, 2026

Choose a reason for hiding this comment

Uh oh!

camriddell left a comment

Choose a reason for hiding this comment

Uh oh!

camriddell May 4, 2026

Choose a reason for hiding this comment

Uh oh!

camriddell May 4, 2026

Choose a reason for hiding this comment

Uh oh!

camriddell May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dangotbanned commented Apr 14, 2026 •

edited

Loading