Problem
When collecting VPC flow logs, records with log_status = 'NODATA' are treated as errors because they have no start_time or end_time values (both fields contain "-", the placeholder for a missing value).
NODATA is a valid, documented VPC flow log status that indicates "no data was recorded during the aggregation interval." These records don't have timestamps by design - there was simply no traffic during that window.
Current behavior
- NODATA records fail validation because tp_timestamp is never set
- These are counted as errors
- Errors prevent the collection state from being updated
- This can cause the same files to be reprocessed repeatedly ("stuck" collecting the same data)
Expected behavior
NODATA records should be silently skipped (not treated as errors) since they:
- Are a normal part of VPC flow logs
- Don't contain traffic data by design
- Should not block collection progress
Proposed Solution
In EnrichRow, check for log_status == 'NODATA' before processing timestamps and skip these records without treating them as errors.
Optionally, count skipped NODATA records separately so users have visibility into how many were skipped.
References
- AWS VPC Flow Log documentation: https://docs.aws.amazon.com/vpc/latest/userguide/flow-log-records.html#flow-logs-fields
  - log-status: "The logging status of the flow log: OK, NODATA, or SKIPDATA"
  - "NODATA – There was no network traffic to or from the network interface during the aggregation interval."