feat: Handle memory errors in batch_process_dataset #1602
Summary:
closes #1538
Added memory management functionality to limit the available memory of the process. As a result, any out-of-memory error happens earlier, while some memory is still left (a security margin, currently 200 MB by default), so the exception can be handled properly and the HTTP call to the function can return a 200 code. This (in theory) prevents automatic retries.
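As a rough sketch of the idea (not the exact code in this PR; `limit_process_memory` and the margin constant are illustrative names), the limit can be applied with the standard `resource` module on Linux:

```python
import resource

# Illustrative default mirroring the PR description: keep 200 MB free.
SECURITY_MARGIN_BYTES = 200 * 1024 * 1024


def limit_process_memory(available_bytes: int,
                         margin_bytes: int = SECURITY_MARGIN_BYTES) -> None:
    """Cap the process address space below what is really available,
    so a MemoryError is raised while the margin is still free for
    error handling and logging."""
    limit = available_bytes - margin_bytes
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
```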
Log messages can also still be printed since there is memory left, which makes it possible to log the stable_id that caused the error.
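A minimal sketch of how the handler can then catch the early `MemoryError` and still return a 200, assuming a Flask-style entry point and a hypothetical `process_dataset` helper:

```python
import logging


def handle_request(stable_id: str):
    try:
        process_dataset(stable_id)  # hypothetical processing entry point
    except MemoryError:
        # The security margin leaves enough headroom for this logging call.
        logging.error("Out of memory while processing %s", stable_id)
        # Returning 200 tells the caller the invocation "succeeded",
        # which (in theory) suppresses automatic retries.
        return "Out of memory handled", 200
    return "OK", 200
```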
Also modified the way GTFS datasets are unzipped: each .txt file within the dataset is now unzipped separately on the in-memory file system, uploaded to GCP storage, and immediately deleted locally. This reduces the number of out-of-disk-space errors and (apparently) does not make the process significantly slower.
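A minimal sketch of this one-file-at-a-time flow, assuming the `google-cloud-storage` client and an illustrative destination prefix (the PR's actual implementation is `extract_and_upload_files_from_zip`):

```python
import os
import zipfile

from google.cloud import storage


def extract_and_upload_one_at_a_time(zip_path: str,
                                     bucket: storage.Bucket,
                                     dest_prefix: str,
                                     tmp_dir: str = "/tmp") -> None:
    """Extract each .txt member separately, upload it, then delete the
    local copy so the in-memory filesystem never holds more than one file."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            local_path = archive.extract(name, path=tmp_dir)
            blob = bucket.blob(f"{dest_prefix}/{os.path.basename(name)}")
            blob.upload_from_filename(local_path)
            os.remove(local_path)  # free disk space immediately
```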
From Copilot:
This pull request introduces significant improvements to memory management and disk usage in the dataset batch processing function. The main change is the introduction of a memory limiting utility and a new approach for extracting and uploading files from ZIP archives, which minimizes local disk usage and helps prevent out-of-memory errors. Several method names and flows have been updated to reflect these improvements. Additionally, error handling and environment variable parsing have been enhanced for robustness.
Memory Management Enhancements:
- Added `shared/common/gcp_memory_utils.py` with functions to calculate available process memory and set memory limits using cgroups and tmpfs information, and integrated `limit_gcp_memory()` at startup to restrict process memory usage (see the sketch after this list). [1] [2]

Efficient ZIP Extraction and Upload:
- Introduced `extract_and_upload_files_from_zip`, which extracts and uploads files one at a time, immediately deleting temporary files to minimize disk usage. This change is reflected in both the dataset upload and bucket processing flows, and the old `unzip_files` method was removed. [1] [2] [3] [4]

Robustness and Error Handling:
- Enhanced error handling and environment variable parsing (e.g., `MAXIMUM_EXECUTIONS`). [1] [2]

Database Integration Updates:
- Updated `Gtfsdataset` and its relationship with `gtfsfiles`. [1] [2]

These changes collectively make the batch processing function more reliable, efficient, and scalable, especially in environments with constrained memory and disk resources.
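For illustration, here is a rough sketch of how available memory could be derived from cgroup and tmpfs information, and how an integer environment variable such as `MAXIMUM_EXECUTIONS` can be parsed defensively. The paths and helper names are assumptions, not the PR's exact code:

```python
import os
from typing import Optional


def _read_int_file(path: str) -> Optional[int]:
    """Read an integer from a cgroup file; 'max' means no limit."""
    try:
        with open(path) as f:
            value = f.read().strip()
        return int(value) if value != "max" else None
    except (FileNotFoundError, ValueError):
        return None


def available_process_memory() -> Optional[int]:
    """Container memory limit minus what the /tmp tmpfs already consumes.

    Tries the cgroup v2 path first, then falls back to v1. In Cloud
    Functions, /tmp is an in-memory filesystem counted against the
    function's memory, hence the subtraction.
    """
    limit = (_read_int_file("/sys/fs/cgroup/memory.max")
             or _read_int_file("/sys/fs/cgroup/memory/memory.limit_in_bytes"))
    if limit is None:
        return None
    stats = os.statvfs("/tmp")
    tmpfs_used = (stats.f_blocks - stats.f_bfree) * stats.f_frsize
    return limit - tmpfs_used


def get_int_env(name: str, default: int) -> int:
    """Parse an integer env var (e.g. MAXIMUM_EXECUTIONS) without crashing."""
    try:
        return int(os.environ[name])
    except (KeyError, ValueError):
        return default
```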
Expected behavior:
Testing tips:
For the memory limitation change, increased the in-memory disk space to 7 GB (out of 8 GB for the whole process). This left 1 GB of memory for running the code, of which 200 MB were kept as a security margin. Testing with mdb-2014, we now get these errors:
For the separate zip upload improvement, used mdb-2014. With the original code and 6 GB of in-memory disk space, it would originally fail with an out-of-disk-space exception; with the changes, the files were extracted properly.
Please make sure these boxes are checked before submitting your pull request - thanks!
- [ ] Run `./scripts/api-tests.sh` to make sure you didn't break anything