Terminal pivot operations return incorrect T schema, should be Any? instead #1787

@koperagen

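The result of a pivot is always a new set of columns created from the data in the pivoted columns, so it no longer matches the input schema T. With the current return type, accessing the result through T is a guaranteed runtime exception: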

Image

Operations like

public fun <T> Pivot<T>.count(): DataRow<T> = delegate { count() }

should become

public fun <T> Pivot<T>.count(): DataRow<*> = delegate { count() }
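
To show why keeping T on the result is unsound, here is a minimal, library-free sketch. Everything in it is a hypothetical stand-in (Row, Student, pivotCountTyped, pivotCountErased, and the age accessor are not DataFrame API). It only models the idea that a phantom schema type lets an accessor compile even though the column no longer exists after pivoting.

```kotlin
// Hypothetical stand-ins, NOT the DataFrame library's real types.
// T is a phantom "schema" type: nothing checks it against the actual columns.
class Row<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? {
        if (column !in values) throw IllegalArgumentException("Column not found: '$column'")
        return values[column]
    }

    fun columnNames(): Set<String> = values.keys
}

// Marker interface playing the role of a schema declaration.
interface Student

// Typed accessor of the kind that would be generated for the Student schema.
val Row<Student>.age: Int get() = this["age"] as Int

// Mimics the current signature: the result still claims the input schema T.
fun <T> pivotCountTyped(subjects: List<String>): Row<T> =
    Row(subjects.groupingBy { it }.eachCount())

// Mimics the proposed signature: no accessors of the old schema are available.
fun pivotCountErased(subjects: List<String>): Row<*> =
    Row<Any?>(subjects.groupingBy { it }.eachCount())

fun main() {
    val subjects = listOf("math", "art", "math")

    val typed = pivotCountTyped<Student>(subjects)
    println(typed.columnNames())             // [math, art]
    val failure = runCatching { typed.age }  // compiles, but "age" is not a column of the result
    println(failure.isFailure)               // true

    val erased = pivotCountErased(subjects)
    println(erased["math"])                  // 2
    // erased.age                            // does not compile: the accessor needs Row<Student>
}
```

With the star-projected result, the misleading accessor is rejected at compile time, and columns created by the pivot remain reachable by name.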

students.json

import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.*
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

@DataSchema
data class Students(
    val age: Int,
    val name: Name,
    val scores: List<Scores>
) {
    @DataSchema
    data class Name(
        val firstName: String,
        val lastName: String?
    )

    @DataSchema
    data class Scores(
        val subject: String,
        @ColumnName("value")
        val `value`: Int
    )
}

fun main() {
    val students = DataFrame.readJson("/home/nikita/IdeaProjects/dataframe-examples/students.json")
        .cast<Students>()

    students.explode { scores }.pivot { scores.subject }.count().schema().print()
}

=>

math: Int
biology: Int
music: Int
nothing: Int
russian: Int
art: Int

There is some connection to the original data when "with" is used, but even then the T return type is misleading. As a side note,

students.explode { scores }.pivot { scores.subject }.with { it }.schema().print()

=>

math: *
    name:
        firstName: String
        lastName: String
    age: Int
    scores:
        subject: String
        value: Int

biology: *
    name:
        firstName: String
        lastName: String
    age: Int
    scores:
        subject: String
        value: Int

music:
    name:
        firstName: String
        lastName: String
    age: Int
    scores:
        subject: String
        value: Int

nothing:
    name:
        firstName: String
        lastName: String?
    age: Int
    scores:
        subject: String
        value: Int

russian:
    name:
        firstName: String
        lastName: String
    age: Int
    scores:
        subject: String
        value: Int

art:
    name:
        firstName: String
        lastName: String
    age: Int
    scores:
        subject: String
        value: Int

Labels: API (If it touches our API), Compiler plugin (Anything related to the DataFrame Compiler Plugin), bug (Something isn't working)