
Commit 7fb109e

github-actions[bot], FAQ Bot, and alexeygrigorev authored
[FAQ Bot] NEW: Why are there unexpected years in lpep_pickup_datetime after loading tax (#194)
* NEW: Why are there unexpected years in lpep_pickup_datetime after loading tax
* Fix filename slug and sort order for BigQuery timestamp FAQ

---------

Co-authored-by: FAQ Bot <faq-bot@datatalks.club>
Co-authored-by: Alexey Grigorev <alexey.s.grigoriev@gmail.com>
1 parent a9fd8e6 commit 7fb109e

1 file changed, +36 -0 lines changed

@@ -0,0 +1,36 @@
---
id: 52e74f0053
question: Why are there unexpected years in lpep_pickup_datetime after loading taxi data into BigQuery?
sort_order: 49
---

Unexpected years in lpep_pickup_datetime after loading taxi data into BigQuery usually indicate a corrupted or incorrect load. Common causes include:

- CSV schema autodetection misreading the timestamp format
- Mixing Parquet and CSV loads into the same table
- Appending instead of replacing during a reload
- Partially failed loads

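If the table was loaded from CSV, it can help to check whether the unexpected years are already in the source file or were introduced by the load. Below is a minimal sketch, assuming an uncompressed, comma-separated file with a header row, no quoted commas, and timestamps that begin with a four-digit year; `pickup_years` is a hypothetical helper, not part of any course tooling.

```bash
# Sketch: count rows per pickup year in a local CSV before loading it.
# Assumptions: uncompressed, comma-separated, header row present, no quoted
# commas, timestamps beginning with a 4-digit year. pickup_years is hypothetical.
pickup_years() {
  local csv="$1" col="${2:-lpep_pickup_datetime}"
  awk -F',' -v col="$col" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }  # locate the column by name
    idx { counts[substr($idx, 1, 4)]++ }                                # tally the year prefix
    END { for (y in counts) print y, counts[y] }
  ' "$csv" | sort
}
```

For example, `pickup_years green_tripdata_2019-01.csv` (a hypothetical local file) prints one `year count` line per year. If the source file contains only the expected years but the BigQuery table does not, the problem was most likely introduced during loading rather than in the data itself.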
How to verify:

- Run a range check on the pickup timestamps:

```sql
SELECT
  MIN(lpep_pickup_datetime) AS min_pickup,
  MAX(lpep_pickup_datetime) AS max_pickup
FROM `project.dataset.table`;
```

If the values fall outside the expected years (for example 2019–2020), reload the table from a clean source using a replacement load, preferably Parquet:

```bash
# --replace overwrites the existing table instead of appending to it
bq load \
  --source_format=PARQUET \
  --replace \
  dataset.table \
  "gs://bucket/path/*.parquet"
```

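Note: Parquet files carry their own column types, so BigQuery does not have to infer the timestamp format as it does with CSV schema autodetection. The `--replace` flag, not the file format, is what makes the reload overwrite the existing rows instead of appending to them.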
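After the reload, the range check can also be scripted so that it fails loudly when the years are off. This is a minimal sketch, assuming the CSV output of the range-check query (a header row followed by one `min,max` data row) and an expected year window passed as arguments; `check_pickup_range` is a hypothetical helper, not part of the bq CLI.

```bash
# Sketch: validate the pickup-year range from the CSV output of the range-check query.
# Assumes stdin is a header row plus one "min,max" data row, with timestamps
# starting with a 4-digit year. check_pickup_range is a hypothetical helper.
check_pickup_range() {
  local first="$1" last="$2" row min_year max_year
  row=$(sed -n '2p')                                   # first data row after the header
  min_year=$(printf '%s\n' "$row" | cut -d, -f1 | cut -c1-4)
  max_year=$(printf '%s\n' "$row" | cut -d, -f2 | cut -c1-4)
  case "$min_year$max_year" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;         # both years parsed
    *) echo "could not parse range: $row" >&2; return 1 ;;
  esac
  if [ "$min_year" -lt "$first" ] || [ "$max_year" -gt "$last" ]; then
    echo "unexpected years: $min_year to $max_year (expected $first to $last)" >&2
    return 1
  fi
  echo "ok: $min_year to $max_year"
}

# Example (requires the bq CLI and access to the table):
# bq --format=csv query --use_legacy_sql=false \
#   'SELECT MIN(lpep_pickup_datetime) AS min_pickup, MAX(lpep_pickup_datetime) AS max_pickup FROM `project.dataset.table`' \
#   | check_pickup_range 2019 2020
```

A non-zero exit status makes this easy to drop into a pipeline step that should stop before anything downstream reads the table.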
