Remove deprecated NGED CKAN code #100

@JackKelly

Once we're confident that NGED's S3 data pipeline is working well, delete all the CKAN code, including:

  • The entire nged_data package.
  • All the old CKAN Dagster assets.
  • Anything in the data contracts that's only used for the CKAN data, e.g. the different substation names in the location data versus the live substation data.

TODO

  • Implement code to ingest NGED's new JSON data
  • Remove code that ingested CKAN data
  • Why does PowerTimeSeries schema have startTime? Can't we just store endTime, and ensure the incoming data is half-hourly?
  • Rename value to power in the XGBoostFeatures, and elsewhere?
  • Maybe rename end_time to valid_time in PowerTimeSeries to match the use of valid_time in the NWP data? Or maybe it's nice to be explicit that end_time is the end time of the half-hour period?
  • Go through the sample archive JSON data, and find the allowed values (e.g. for substation_type), and tighten up the data contracts.
  • Rename packages/nged_json_data to nged_timeseries_data? (assuming we use a different package for the non-time-series data, like the adjacency matrix?) Or maybe to nged_data?
  • Run the pipeline on the JSON data from SharePoint. Save the Delta table & Parquet to disk. Visualise in the dashboard.
  • Rename flows_30m to power_time_series
  • Check the pipeline for creating the h3_res_5 column in the time_series_metadata.parquet, and - crucially - make sure it's not over-complicated. I think there might be some silly code in the model training that fills in the H3 if it's missing. But it shouldn't be missing!
  • Run full pipeline (including model training & eval) on NGED's JSON archive
  • Why are we getting spurious periods of zeros in the cleaned power data?
  • Investigate any remaining mentions of csv or substation #105
  • Pull data from NGED's S3 bucket.
  • Final automated code review
  • Manual code review
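
On the question of whether PowerTimeSeries needs startTime at all: a minimal sketch of what "store only the end time, and ensure the incoming data is half-hourly" could look like. Everything here is an assumption for illustration. The function names are hypothetical, not part of the existing data contracts, and the check assumes timezone-aware timestamps where gaps are tolerated but out-of-order or off-grid values are not.

```python
from datetime import datetime, timedelta, timezone

HALF_HOUR = timedelta(minutes=30)


def is_on_half_hour_grid(end_time: datetime) -> bool:
    """True if end_time falls exactly on :00 or :30 of an hour."""
    return (
        end_time.minute in (0, 30)
        and end_time.second == 0
        and end_time.microsecond == 0
    )


def validate_end_times(end_times: list[datetime]) -> None:
    """Raise ValueError unless every end_time is on the half-hour grid
    and the sequence is strictly increasing (gaps are allowed)."""
    for t in end_times:
        if not is_on_half_hour_grid(t):
            raise ValueError(f"end_time {t.isoformat()} is not on a half-hour boundary")
    for earlier, later in zip(end_times, end_times[1:]):
        if later <= earlier:
            raise ValueError(
                f"end_time {later.isoformat()} does not follow {earlier.isoformat()}"
            )


def implied_start_time(end_time: datetime) -> datetime:
    """Recover the start of the half-hour period from its end."""
    return end_time - HALF_HOUR


# Hypothetical example values, not taken from NGED's data.
end_times = [
    datetime(2025, 1, 1, 0, 30, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 1, 0, tzinfo=timezone.utc),
]
validate_end_times(end_times)
```

If a check like this runs at ingestion, the start time is always recoverable as the end time minus 30 minutes, so storing it separately would add no information.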
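
For the item about finding the allowed values in the sample archive JSON: a rough sketch of one way to enumerate the distinct values of a field before tightening the data contracts. The record layout and the example values ("primary", "bsp") are invented for illustration, and the real archive may be structured differently. The helper name is hypothetical, and it assumes the field holds hashable values such as strings.

```python
import json
from collections import Counter
from typing import Any


def count_field_values(records: list[dict[str, Any]], field: str) -> Counter:
    """Count each distinct value of `field`.

    Records that lack the field are counted under None.
    """
    return Counter(record.get(field) for record in records)


# Hypothetical sample; the real archive's layout and values may differ.
sample_json = """
[
  {"substation_type": "primary"},
  {"substation_type": "primary"},
  {"substation_type": "bsp"},
  {"name": "no type recorded"}
]
"""
records = json.loads(sample_json)
counts = count_field_values(records, "substation_type")

# Candidate set of allowed values for the contract, excluding missing entries.
allowed_values = sorted(v for v in counts if v is not None)
```

The count for None shows how many records omit the field, which helps decide whether the contract should make it required or optional.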

Labels

enhancement: New feature or request
