Deserialization as Vector{SubArray} breaks push! on DataFrame #506

@maleadt

I'm using Arrow v2.7.2 with DataFrames v1.6.1 on Julia 1.10, and am running into an issue that seems to stem from Arrow.jl deserializing my Vector{Vector{T}} columns as Vector{SubArray{...}}:

julia> using Arrow, DataFrames

julia> df = DataFrame(foo=Vector{Int}[]);

julia> push!(df, [[1,2,3]])
1×1 DataFrame
 Row │ foo
     │ Array
─────┼───────────
   1 │ [1, 2, 3]

julia> Arrow.write("/tmp/test.arrow", df)
"/tmp/test.arrow"

julia> df2 = copy(DataFrame(Arrow.Table("/tmp/test.arrow")));

julia> typeof(df2.foo)
Vector{SubArray{Int64, 1, Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}} (alias for Array{SubArray{Int64, 1, Arrow.Primitive{Int64, Array{Int64, 1}}, Tuple{UnitRange{Int64}}, true}, 1})

This breaks certain push! calls on the DataFrame. I haven't been able to reproduce the failure in isolation, but the error looks as follows:

MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}

Stacktrace:
  [1] push!(a::Vector{SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}}, item::Vector{Int64})
    @ Base ./array.jl:1118
  [2] _row_inserter!(df::DataFrame, loc::Int64, row::Tuple{String, Vector{Int64}, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, String, Bool, Bool, Bool, Vector{Int64}, Vector{Int64}, Vector{Int64}, String, String, Float64}, mode::Val{:push}, promote::Bool)
    @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/insertion.jl:663
  [3] push!(df::DataFrame, row::Tuple{String, Vector{Int64}, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, String, Bool, Bool, Bool, Vector{Int64}, Vector{Int64}, Vector{Int64}, String, String, Float64})
    @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/insertion.jl:457

It's possible I'm doing something wrong; I'm a first-time Arrow.jl user.
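
For reference, the failing step in frame [1] of the stack trace can probably be mimicked with plain Base Julia, without Arrow.jl or DataFrames.jl. The following is only a minimal sketch under assumptions: the names `backing`, `col`, and `owned` are made up for illustration, and an ordinary `view` of a `Vector{Int}` stands in for the `SubArray` over `Arrow.Primitive` that appears in the real column, so it is not a reproduction of the original failure.

```julia
# Plain Base Julia; no Arrow.jl or DataFrames.jl required.
# `backing`, `col`, and `owned` are hypothetical names used only for this sketch.
backing = [1, 2, 3]

# A vector whose elements are views (SubArray), standing in for the
# deserialized column from the report above.
col = [view(backing, 1:3)]
@show eltype(col) <: SubArray

# Pushing a plain Vector{Int} requires converting it to the SubArray element
# type. Base has no such conversion, so the push! is expected to throw.
err = try
    push!(col, [4, 5, 6])
    nothing
catch e
    e
end
@show err isa MethodError

# Possible workaround: materialize every view into an owned Vector{Int},
# which yields a Vector{Vector{Int}} that accepts plain vectors again.
owned = map(collect, col)
push!(owned, [4, 5, 6])
@show typeof(owned)
```

If the same mismatch is what happens in the DataFrame, replacing the column with materialized vectors, for example `df2.foo = map(collect, df2.foo)`, might restore `push!`. That has not been checked against the data in this report.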
