feat: streaming database exports to R2 for large databases#238

Open
kartikganesh wants to merge 2 commits into
outerbase:mainfrom
kartikganesh:feat/streaming-export

@kartikganesh

Summary

Fixes #59. Large database exports now stream chunks to R2 using multipart upload with DO alarm-based continuation for exports exceeding 25s.

/claim #59

Changes

  • src/export/dump-streaming.ts (new): Core streaming export with R2 multipart upload, chunked processing (1000 rows/batch), DO alarm continuation, progress tracking in tmp_export_state
  • src/export/dump.ts: Updated to delegate to the streaming export when R2 is available, and to fall back to the in-memory export when it is not
  • src/do.ts: Added alarm handler integration for export continuation
  • src/handler.ts: Added /export/status/:exportId and /export/download/:exportId endpoints
  • src/index.ts: Added EXPORT_BUCKET R2 binding to Env interface
  • wrangler.toml: Added commented-out R2 bucket config with setup instructions

How it works

  1. If R2 is configured and export will be large, creates a multipart upload to R2
  2. Processes tables in 1000-row batches, flushing to R2 parts at 5MB
  3. If approaching 25s timeout, saves state to DO storage and sets alarm for continuation
  4. On alarm, picks up where it left off (same table, same row offset)
  5. On completion, finalizes multipart upload and optionally POSTs callback URL
  6. If export is small enough (<30s), returns file directly in response (backwards compatible)
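
The batching, flush, and handoff logic in steps 2 to 4 can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in src/export/dump-streaming.ts: `runSlice`, `ExportState`, and the `uploadPart` callback are hypothetical names, tables are modeled as arrays of pre-rendered SQL strings, and a real R2 `uploadPart` call is asynchronous. How the PR treats a partially filled buffer at handoff is not stated here; the sketch simply carries it in the saved state, and it does not enforce R2's part-size rules beyond the 5MB flush threshold.

```typescript
// All names below are hypothetical; they illustrate the flow, not the PR's API.
const BATCH_ROWS = 1000;                 // rows per batch (step 2)
const PART_SIZE_BYTES = 5 * 1024 * 1024; // flush threshold (step 2)
const TIME_BUDGET_MS = 25_000;           // handoff threshold (step 3)

interface ExportState {
  tableIndex: number; // table currently being dumped
  rowOffset: number;  // next row to read in that table
  partNumber: number; // next multipart part number (1-based)
  pending: string;    // SQL text buffered but not yet uploaded
}

interface SliceResult {
  done: boolean;      // false means "save state and set an alarm"
  state: ExportState;
}

function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Processes as many batches as fit in the time budget, starting from `start`.
function runSlice(
  tables: string[][],
  start: ExportState,
  now: () => number,
  uploadPart: (partNumber: number, body: string) => void,
  partSize: number = PART_SIZE_BYTES,
  budgetMs: number = TIME_BUDGET_MS,
): SliceResult {
  const startedAt = now();
  const state: ExportState = { ...start };

  while (state.tableIndex < tables.length) {
    const rows = tables[state.tableIndex];
    if (state.rowOffset >= rows.length) {
      // Current table is finished; move on to the next one.
      state.tableIndex += 1;
      state.rowOffset = 0;
      continue;
    }
    // Check the clock before each batch so the saved offset is always exact.
    if (now() - startedAt >= budgetMs) {
      return { done: false, state };
    }
    const batch = rows.slice(state.rowOffset, state.rowOffset + BATCH_ROWS);
    state.pending += batch.join("");
    state.rowOffset += batch.length;

    // Flush once the buffer reaches the part-size threshold.
    if (byteLength(state.pending) >= partSize) {
      uploadPart(state.partNumber, state.pending);
      state.partNumber += 1;
      state.pending = "";
    }
  }

  // The final part may be smaller than the threshold.
  if (state.pending.length > 0) {
    uploadPart(state.partNumber, state.pending);
    state.partNumber += 1;
    state.pending = "";
  }
  return { done: true, state };
}
```

When `done` is false, the caller would persist `state` and schedule the alarm; the alarm handler then calls `runSlice` again with the saved state, which is what lets the export resume at the same table and row offset.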

Test plan

  • Verify existing /export/dump works unchanged when no R2 binding
  • Configure R2, test small DB export returns inline
  • Test large DB export creates R2 file and returns 202 with exportId
  • Verify /export/status/:id shows progress
  • Verify /export/download/:id streams completed file
  • Test callback URL is called on completion

Fixes outerbase#59. Large database exports now stream chunks to R2 using
multipart upload, with DO alarm-based continuation when exports
exceed 25s. Adds /export/status/:id and /export/download/:id
endpoints. Falls back to in-memory export when R2 is not configured.

/claim outerbase#59

Co-Authored-By: Claude Opus 4.6 <[email protected]>
10 tests covering startStreamingDump, getExportStatus, and
downloadExport with mocked R2 bucket and data source.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@kartikganesh
Author

/claim

@kartikganesh
Author

Hey team — wanted to highlight how this PR maps directly to the 5-point proposed solution in the issue:

| Issue requirement | Implementation |
| --- | --- |
| 1. Require an R2 binding | EXPORT_BUCKET binding, optional; falls back gracefully to the existing in-memory export when not configured |
| 2. File naming dump_YYYYMMDD-HHMMSS.sql | Exact format implemented in generateR2Key() |
| 3. Continuously append chunks to R2 | R2 multipart upload; flushes 5MB parts as it processes rows |
| 4. DO alarm to continue after timeout | Saves state (current table, row offset, part number) and sets alarm at 25s; resumes exactly where it left off |
| 5. Callback URL on completion | Optional callbackUrl query param; POSTs {exportId, status, r2Key} on finish |
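
For reference, the naming format in row 2 and the callback body in row 5 could look roughly like this. It is a sketch only: `formatDumpKey` and `buildCallbackPayload` are hypothetical helpers (the PR's `generateR2Key()` may have a different signature), formatting in UTC is an assumption, and the `status` value shown in the usage note is a placeholder.

```typescript
// Hypothetical helper: formats a key as dump_YYYYMMDD-HHMMSS.sql.
// Using UTC is an assumption; the PR does not state which time zone applies.
function formatDumpKey(now: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`;
  return `dump_${date}-${time}.sql`;
}

// Field names follow the payload described above: {exportId, status, r2Key}.
interface ExportCallbackPayload {
  exportId: string;
  status: string;
  r2Key: string;
}

// Hypothetical helper: serializes the JSON body for the completion callback.
function buildCallbackPayload(exportId: string, status: string, r2Key: string): string {
  const payload: ExportCallbackPayload = { exportId, status, r2Key };
  return JSON.stringify(payload);
}
```

For example, `formatDumpKey(new Date(Date.UTC(2024, 0, 5, 9, 3, 7)))` yields `dump_20240105-090307.sql`, which could then be passed to `buildCallbackPayload("abc", "completed", key)` to produce the POST body.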

Additional design decisions:

  • Backwards compatible: small exports still return the file inline in the response, so there is no behavior change for existing users
  • Observable: /export/status/:id and /export/download/:id endpoints for async exports
  • Tested: 10 unit tests covering the happy path, empty DB, errors, multipart mechanics, and the status/download endpoints

The other open PRs either miss the R2 multipart requirement (they use a single PUT, which fails above 5GB), skip the DO alarm continuation, or don't handle the backwards-compatible inline response for small exports.

Happy to address any feedback.


Development

Successfully merging this pull request may close these issues.

Database dumps do not work on large databases
