
Hardcode data type for deploy-on and off date #115

@peterdesmet


The reference data for Galapagos albatrosses contains the field `deploy-off-date`.

  1. Because the field is empty for all records, `add_resource()` assumes it is a string: `type: string`.
  2. Because its definition contains `yyyy-MM-dd HH:mm:ss.SSS`, the format is set to `format: %Y-%m-%d %H:%M:%S.%f`:

From `movepub/R/add_resource.R`, lines 75 to 79 in 6508d5f:

```r
format = ifelse(
  grepl("Format: yyyy-MM-dd HH:mm:ss.SSS;", definition),
  "%Y-%m-%d %H:%M:%S.%f",
  "default"
),
```

  3. That is an invalid format for `type: string`, so `frictionless validate datapackage.json` fails with:
```
                                                dataset
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row  ┃ Field ┃ Type        ┃ Message                                                                ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ None │ None  │ field-error │ Field is not valid: '%Y-%m-%d %H:%M:%S.%f' is not one of ['default',   │
│      │       │             │ 'email', 'uri', 'binary', 'uuid', 'wkt'] at property 'format'          │
└──────┴───────┴─────────────┴────────────────────────────────────────────────────────────────────────┘
```
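The problem is that the two guesses are made independently. A minimal sketch reproducing the combination: the allowed formats are copied from the error message above, while the `definition` text is a hypothetical stand-in containing the pattern that `add_resource()` matches, not the real dictionary entry.

```r
# Formats frictionless accepts for type "string", copied from the error above
string_formats <- c("default", "email", "uri", "binary", "uuid", "wkt")

# Hypothetical definition text containing the pattern add_resource() matches
definition <- "Example definition. Format: yyyy-MM-dd HH:mm:ss.SSS;"

# Guess 1: deploy-off-date is empty for all records, so it ends up as string
field_type <- "string"

# Guess 2: the format comes from the definition alone (same logic as
# lines 75 to 79), without looking at the type chosen in guess 1
field_format <- ifelse(
  grepl("Format: yyyy-MM-dd HH:mm:ss.SSS;", definition),
  "%Y-%m-%d %H:%M:%S.%f",
  "default"
)

# The pair is invalid: a datetime pattern is not an allowed string format
field_type == "string" && field_format %in% string_formats
#> [1] FALSE
```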

We could fix this by hardcoding the data type of date fields instead of guessing it, as we already do for other fields:

From `movepub/R/add_resource.R`, lines 49 to 66 in 6508d5f:

```r
type <- dplyr::recode(
  prefLabel,
  "algorithm marked outlier" = "boolean",
  "animal ID" = "string",
  "barometric height" = "number",
  "barometric pressure" = "number",
  "compass heading" = "number",
  "deployment ID" = "string",
  "event ID" = "integer",
  "GPS satellite count" = "integer",
  "GPS VDOP" = "number",
  "individual local identifier" = "string",
  "tag ID" = "string",
  "tag local identifier" = "string",
  "tag serial no" = "string",
  .missing = field$type,
  .default = field$type
)
```
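A base R sketch of the hardcoding fix, mirroring the `recode()` lookup with a named vector so it runs without dplyr. The prefLabels `"deploy on timestamp"` and `"deploy off timestamp"` are assumptions about the Movebank vocabulary, not checked against the dictionary, and `set_type()` is a hypothetical helper. In `add_resource()` itself the equivalent change would be two extra lines in the `recode()` call.

```r
# Subset of the hardcoded types, extended with the deployment dates.
# "deploy on timestamp" / "deploy off timestamp" are assumed prefLabels.
hardcoded_types <- c(
  "animal ID" = "string",
  "deploy on timestamp" = "datetime",
  "deploy off timestamp" = "datetime",
  "tag ID" = "string"
)

# Return the hardcoded type if the label has one, else keep the guessed type
# (the role of .default and .missing in the recode() call above)
set_type <- function(pref_label, guessed_type) {
  hardcoded <- unname(hardcoded_types[pref_label])
  if (is.na(hardcoded)) guessed_type else hardcoded
}

set_type("deploy off timestamp", "string")
#> [1] "datetime"
set_type("some other label", "number")
#> [1] "number"
```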

Hardcoding works, but we will likely miss some fields. We might need a smarter approach to setting the types of date fields. @sarahcd Ideally, the type would be defined in the Movebank Attribute Dictionary. Is it possible to define it there?
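Until the dictionary carries a type, one possible interim approach is to derive type and format together from the definition, so the two can never disagree. This is a sketch that assumes every date field's definition uses the exact `Format: yyyy-MM-dd HH:mm:ss.SSS;` pattern; `type_and_format()` is a hypothetical helper, not part of movepub.

```r
# Hypothetical helper: decide type and format in one place from the
# definition text; fall back to the guessed type otherwise
type_and_format <- function(definition, guessed_type) {
  is_datetime <- grepl(
    "Format: yyyy-MM-dd HH:mm:ss.SSS;", definition, fixed = TRUE
  )
  if (is_datetime) {
    # A date pattern in the definition overrides the guess (e.g. "string"
    # for an all-empty column) and comes with a matching format
    list(type = "datetime", format = "%Y-%m-%d %H:%M:%S.%f")
  } else {
    list(type = guessed_type, format = "default")
  }
}

# An empty date column guessed as string becomes a valid datetime field
str(type_and_format("Example definition. Format: yyyy-MM-dd HH:mm:ss.SSS;", "string"))
```

Using `fixed = TRUE` makes the match literal, so the `.` in `ss.SSS` is not treated as a regex wildcard.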
