Commit c6ccd58

Additional current_records view docstrings
1 parent 140a8d5 commit c6ccd58

1 file changed, +8 -0 lines changed
timdex_dataset_api/metadata.py

Lines changed: 8 additions & 0 deletions
@@ -443,6 +443,14 @@ def _create_current_records_view(self, conn: DuckDBPyConnection) -> None:
         This metadata view includes only the most current version of each record in the
         dataset. With the metadata provided from this view, we can streamline data
         retrievals in TIMDEXDataset read methods.
+
+        For performance reasons, the final view reads from a DuckDB temporary table that
+        is constructed, "temp.main.current_records". Because our connection is in memory,
+        the data in this temporary table is mostly in memory but has the ability to spill
+        to disk if we risk getting too close to our memory constraints. We explicitly
+        set the temporary location on disk for DuckDB at "/tmp" to play nice with contexts
+        like AWS ECS or Lambda, where sometimes the $HOME env var is missing; DuckDB
+        often tries to utilize the user's home directory and this works around that.
         """
         logger.info("creating view of current records metadata")
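
The pattern the new docstring describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `timdex_dataset_api/metadata.py`: the source table `records`, the columns `timdex_record_id` and `run_timestamp`, the view name `current_records_view`, and the helpers `setup_statements` and `demo` are all hypothetical names chosen for the example. Only the temporary table name "temp.main.current_records" and the "/tmp" location come from the docstring.

```python
from typing import List


def setup_statements(temp_dir: str = "/tmp") -> List[str]:
    """Return the SQL statements for this sketch, in execution order."""
    # Escape single quotes so the path is a valid SQL string literal.
    safe_dir = temp_dir.replace("'", "''")
    return [
        # Pin DuckDB's spill-to-disk location so it does not depend on $HOME.
        f"SET temp_directory = '{safe_dir}'",
        # Materialize the newest row per record in a temporary table.
        # "records", "timdex_record_id" and "run_timestamp" are assumed names.
        (
            "CREATE OR REPLACE TEMP TABLE current_records AS "
            "SELECT * FROM records "
            "QUALIFY row_number() OVER ("
            "PARTITION BY timdex_record_id ORDER BY run_timestamp DESC) = 1"
        ),
        # The final view only reads from the already materialized temp table.
        (
            "CREATE OR REPLACE VIEW current_records_view AS "
            "SELECT * FROM temp.main.current_records"
        ),
    ]


def demo() -> list:
    """Run the statements against a throwaway in-memory DuckDB connection."""
    import duckdb  # third-party dependency, imported lazily for the demo only

    conn = duckdb.connect()  # no path given, so the database lives in memory
    conn.execute(
        "CREATE TABLE records (timdex_record_id VARCHAR, run_timestamp TIMESTAMP)"
    )
    conn.execute(
        "INSERT INTO records VALUES "
        "('a', TIMESTAMP '2024-01-01 00:00:00'), "
        "('a', TIMESTAMP '2024-02-01 00:00:00'), "
        "('b', TIMESTAMP '2024-01-15 00:00:00')"
    )
    for statement in setup_statements("/tmp"):
        conn.execute(statement)
    # One row per timdex_record_id, keeping the latest run_timestamp.
    return conn.execute(
        "SELECT timdex_record_id, run_timestamp "
        "FROM current_records_view ORDER BY timdex_record_id"
    ).fetchall()
```

Calling `demo()` with the `duckdb` package installed should return one row per record identifier. Materializing the temporary table once means the deduplication is not recomputed each time the view is read, which appears to be the performance motivation the docstring refers to.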
