BUG: json object parse index failure #6631

@dentiny

Hi team, I ran into this issue while using Lance: filtering with json_get_string on an object key that looks like a number ("0") returns no rows.

>>> import lance, pyarrow as pa, json
>>> arr_data = [json.dumps(["zero", "one", "two"])]
>>> obj_data = [json.dumps({"0": "from_object"})]
>>> table = pa.table({
...     "id": [1, 2],
...     "data": pa.array(arr_data + obj_data, type=pa.json_()),
... })
>>> lance.write_dataset(table, "/tmp/test_json_key", mode="overwrite")
[2026-04-28T06:07:16Z WARN  lance::dataset::write::insert] No existing dataset at /tmp/test_json_key, it will be created
<lance.dataset.LanceDataset object at 0x12aa1ebd0>
>>> ds = lance.dataset("/tmp/test_json_key")
>>> result = ds.to_table(filter="json_get_string(data, '0') = 'from_object'")
>>> print(f"Object field '0' match: {result.num_rows}")
Object field '0' match: 0

I expect the matched row count to be 1, not 0.
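
My guess at the cause, not confirmed against the Lance source: the key '0' may be treated as an array index whenever it parses as a number, so it is never tried as an object field name. The sketch below illustrates that suspected behavior in plain Python. Both `lookup_index_first` and `lookup_type_aware` are hypothetical helpers written for this illustration; neither is a Lance API.

```python
import json


def _as_index(key: str):
    """Return the key as an int if it is a plain ASCII number, else None."""
    return int(key) if key.isascii() and key.isdigit() else None


def lookup_index_first(doc: str, key: str):
    """Hypothetical lookup: a numeric-looking key is ONLY used as an array index."""
    value = json.loads(doc)
    index = _as_index(key)
    if index is not None:
        # Objects are never consulted on this path, so {"0": ...} cannot match.
        if isinstance(value, list) and index < len(value):
            return value[index]
        return None
    return value.get(key) if isinstance(value, dict) else None


def lookup_type_aware(doc: str, key: str):
    """Hypothetical lookup: choose field name vs. index from the JSON value's type."""
    value = json.loads(doc)
    if isinstance(value, dict):
        return value.get(key)
    index = _as_index(key)
    if isinstance(value, list) and index is not None and index < len(value):
        return value[index]
    return None


# Same two rows as in the reproduction above.
rows = [json.dumps(["zero", "one", "two"]), json.dumps({"0": "from_object"})]

index_first = [lookup_index_first(r, "0") for r in rows]
type_aware = [lookup_type_aware(r, "0") for r in rows]

print(index_first)  # ['zero', None]
print(type_aware)   # ['zero', 'from_object']
```

If the real implementation behaves like the index-first variant, the object row yields no value for '0', which would explain the 0 matches reported above.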

An example that works as expected, using a non-numeric key ("hello"):

>>> import lance, pyarrow as pa, json
>>> arr_data = [json.dumps(["zero", "one", "two"])]
>>> obj_data = [json.dumps({"hello": "from_object"})]
>>> table = pa.table({"id": [1, 2], "data": pa.array(arr_data + obj_data, type=pa.json_())})
>>> lance.write_dataset(table, "/tmp/test_json_key", mode="overwrite")
<lance.dataset.LanceDataset object at 0x106b2ebd0>
>>> ds = lance.dataset("/tmp/test_json_key")
>>> result = ds.to_table(filter="json_get_string(data, 'hello') = 'from_object'")
>>> print(f"Object field 'hello' match: {result.num_rows}")
Object field 'hello' match: 1
