Commit 0849ee3
Update TIMDEXDataset.write method to only overwrite similarly named parquet files
Why these changes are being introduced:
* Since the TIMDEXDataset partitions are now the [year, month, day]
of the 'run_date', parquet files from different source runs
will be written to the same partition. The previous configuration
of existing_data_behavior="delete_matching" would result in
the deletion of any existing parquet files from the partition directory
with every source run, which is not the desired outcome.
To support the new partitions, this updates the configuration to
existing_data_behavior="overwrite_or_ignore", which leaves
existing data in place and only overwrites files with the
same filename.
How this addresses that need:
* Set existing_data_behavior="overwrite_or_ignore" in ds.write_dataset method call
* Add unit tests to demonstrate updated existing_data_behavior
Side effects of this change:
* In the event that multiple runs are performed for the same 'source' and 'run_date',
which is unlikely to occur, parquet files from both runs will exist in the
partitioned directory. DatasetRecords can still be uniquely identified via the
'run_id' column.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-4321
parent 76347b1 commit 0849ee3
2 files changed: +59 -28 lines changed