
optimized load_blocks_for_chunks and find_chunks_by_dedup_key queries#9526

Open
dannyzaken wants to merge 1 commit into noobaa:master from dannyzaken:danny-fixes

Conversation

@dannyzaken (Member) commented Apr 2, 2026

Describe the Problem

Two functions in md_store.js (load_blocks_for_chunks and find_chunks_by_dedup_key) that query the datablocks table (~3M rows) suffered from PostgreSQL choosing sequential scans when the number of chunk IDs in the IN clause exceeded ~10 values. The default random_page_cost = 4.0 caused the planner to prefer seq scan over the existing idx_btree_datablocks_chunk B-tree index. load_blocks_for_chunks took ~2600ms for 38 chunks, and find_chunks_by_dedup_key added an extra DB round-trip by calling load_blocks_for_chunks separately after fetching chunks.

Explain the Changes

  1. load_blocks_for_chunks: Replaced the mongo-to-pg translation layer (this._blocks.find(...)) with a raw SQL query using UNNEST + LATERAL + OFFSET 0. The OFFSET 0 acts as an optimization barrier that forces PostgreSQL into a nested loop with per-chunk Bitmap Index Scan on idx_btree_datablocks_chunk, reducing execution from ~2600ms to ~2ms for 38 chunks (~1000x faster).
  2. find_chunks_by_dedup_key: Combined the two-step approach (find chunks by dedup_key, then call load_blocks_for_chunks) into a single INNER JOIN query between datachunks and datablocks. The dedup_key index filters chunks to a small set, so the planner reliably picks Nested Loop + index scans on both tables. The denormalized result set is deduplicated in JS using a Map keyed by chunk _id. This eliminates one DB round-trip entirely.
  3. Bumped the eslint max-lines directive from 2200 to 2210 to accommodate the additional lines from the optimized implementations.
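The LATERAL + OFFSET 0 pattern from item 1 can be sketched as follows. This is a minimal illustration, not the actual md_store.js query: the column layout (a `data` JSONB column holding `chunk` and `deleted` fields) is an assumption for the example.

```javascript
// Hypothetical sketch of the query shape: unnest() expands the chunk-id
// array into rows, and the LATERAL subquery is evaluated once per chunk id.
// OFFSET 0 is a planner optimization barrier that keeps PostgreSQL from
// flattening the subquery into one big IN (...) scan, so each iteration
// does an index lookup on the chunk column instead of a seq scan over
// ~3M rows. Column names are assumptions, not the real schema.
const LOAD_BLOCKS_FOR_CHUNKS_SQL = `
    SELECT b.data
    FROM unnest($1::text[]) AS chunk_ids(chunk_id)
    CROSS JOIN LATERAL (
        SELECT data
        FROM datablocks
        WHERE data->>'chunk' = chunk_ids.chunk_id
          AND data->>'deleted' IS NULL
        OFFSET 0
    ) AS b
`;
```

A caller would pass all chunk ids as a single array parameter, e.g. `client.query(LOAD_BLOCKS_FOR_CHUNKS_SQL, [chunk_ids])`, so the statement is planned once regardless of how many ids are supplied.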
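The single-query assembly in item 2 can be sketched like this. The row shape (`chunk_data`/`block_data` JSON, a `frags` array on each chunk, and a `frag` id on each block) is an assumption for illustration, not the exact md_store.js structures.

```javascript
// Sketch of the in-memory assembly described in item 2. The JOIN returns
// one denormalized row per (chunk, block) pair; chunks are deduplicated
// with a Map keyed by chunk _id, and each block is attached to the frag
// it belongs to. Field names are illustrative assumptions.
function assemble_chunks(rows) {
    const chunks_by_id = new Map();
    for (const { chunk_data, block_data } of rows) {
        let chunk = chunks_by_id.get(chunk_data._id);
        if (!chunk) {
            chunk = { ...chunk_data };
            for (const frag of chunk.frags) frag.blocks = [];
            chunks_by_id.set(chunk._id, chunk);
        }
        const frag = chunk.frags.find(f => f._id === block_data.frag);
        if (frag) frag.blocks.push(block_data);
    }
    return [...chunks_by_id.values()];
}
```

The Map makes the dedup O(1) per row, so the denormalized result set costs only the extra bytes on the wire, not extra query time.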

Issues: Fixed #xxx / Gap #xxx

  1. https://redhat.atlassian.net/browse/DFBUGS-5326

Testing Instructions:

  1. Run existing integration tests: node src/test/integration_tests/db/test_md_store.js — covers find_chunks_by_dedup_key with matching keys, non-matching keys, and empty arrays.
  2. Verify load_blocks_for_chunks via any flow that reads object data (e.g. map_reader, map_builder, map_deleter).
  3. On a production-scale DB (~3M datablocks rows), run EXPLAIN ANALYZE on both queries to confirm index scans are used instead of seq scans.
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Refactor
    • Data retrieval now returns chunk fragments populated with their associated blocks in one step, improving consistency and reducing extra lookups; optional ordering is applied when provided.
  • Bug Fixes
    • Soft-deleted chunks and deleted blocks are now excluded from results.
  • Tests
    • Integration tests extended to verify fragments include correct blocks, grouping, ordering, and deletion handling.

@coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

Replaced separate chunk and block lookups with SQL joins and in-function assembly: find_chunks_by_dedup_key now selects joined datachunks+datablocks, decodes rows, groups blocks by chunk→frag and assigns frag.blocks; load_blocks_for_chunks uses unnest($1::text[]) + LATERAL to fetch and attach blocks. Tests extended for Postgres.

Changes

Cohort / File(s) Summary:

  • MD store DB/query changes (src/server/object_services/md_store.js):
    Rewrote find_chunks_by_dedup_key() to use a joined SQL query returning chunk+block JSON, decode rows in-function, dedupe chunks, group blocks by chunk/frag and set frag.blocks. Replaced the prior Mongo-style block lookup in load_blocks_for_chunks() with a raw SQL unnest($1::text[]) + LATERAL query; grouping and optional sorting applied. ESLint max-lines increased from 2200 to 2210.
  • Integration tests, Postgres-focused (src/test/integration_tests/db/test_md_store.js):
    Expanded tests for find_chunks_by_dedup_key and load_blocks_for_chunks: multiple chunks/frags/blocks insertion, assertions that each frag.blocks is populated and correctly sized, exclusions for soft-deleted chunks and deleted blocks, sorter behavior, and handling of empty/null inputs.
  • Lint config (.eslintrc.js):
    Bumped max-lines rule to accommodate added code (+10).

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Client
  participant MD as MDStore
  participant DB as Postgres
  participant Processor as In-memory assembler

  Caller->>MD: find_chunks_by_dedup_key(bucket, dedup_keys)
  MD->>DB: Execute JOIN query (datachunks JOIN datablocks) with dedup_keys
  DB-->>MD: Rows with chunk + block JSON
  MD->>Processor: Decode rows, build unique chunk map
  Processor-->>MD: Chunks with grouped frags and blocks assigned
  MD-->>Caller: Return chunks with frags[].blocks populated

  Caller->>MD: load_blocks_for_chunks(chunkIds, sorter)
  MD->>DB: Execute unnest($1::text[]) + LATERAL query for blocks
  DB-->>MD: Block rows (non-deleted)
  MD->>Processor: Group blocks by chunk -> frag, apply sorter
  Processor-->>MD: Updated chunks with frag.blocks
  MD-->>Caller: Return updated chunks

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes


Suggested labels

size/L

Suggested reviewers

  • liranmauda
  • tangledbytes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The title directly and accurately describes the main objective of the pull request: optimizing two query functions (load_blocks_for_chunks and find_chunks_by_dedup_key) by replacing inefficient mongo-to-postgres translations with optimized raw SQL queries.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.




@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/object_services/md_store.js`:
- Around lines 1548-1561: The SQL INNER JOIN between this._chunks and this._blocks omits chunks that have no blocks. Change it to a LEFT JOIN in the query that selects c.data AS chunk_data, b.data AS block_data, so chunks without blocks are still returned. Then update the post-processing loop that builds frag.blocks (which uses blocks_by_frag and frag._id) to guard against null block_data: only map/push block entries when block_data exists (skip, or treat as an empty array, when block_data is null), so that frag.blocks = blocks_by_frag[frag._id.toHexString()] || [] still works for chunks with no blocks.
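A hedged sketch of the null guard this comment asks for, using the same illustrative row shape as above rather than the actual md_store.js code:

```javascript
// Hypothetical sketch: with a LEFT JOIN, a chunk that has no blocks still
// yields one row, but its block_data is null. Skipping those rows keeps
// the grouping correct, and a later `blocks_by_frag[key] || []` lookup
// gives such frags an empty blocks array. Field names are illustrative
// assumptions, not the exact md_store.js structures.
function group_blocks_by_frag(rows) {
    const blocks_by_frag = {};
    for (const { block_data } of rows) {
        if (!block_data) continue; // LEFT JOIN row for a chunk without blocks
        (blocks_by_frag[block_data.frag] ??= []).push(block_data);
    }
    return blocks_by_frag;
}
```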

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d20403b-b76d-4b95-8b7f-4fb3137d60ab

📥 Commits

Reviewing files that changed from the base of the PR and between efb5389 and c0f1f25.

📒 Files selected for processing (1)
  • src/server/object_services/md_store.js

- Replace mongo-to-pg translation in load_blocks_for_chunks with a raw LATERAL + OFFSET 0 query that forces per-chunk index lookups on idx_btree_datablocks_chunk, reducing execution from ~2600ms to ~2ms for 38 chunks.
- Combine the two-step find_chunks_by_dedup_key into a single JOIN query that eliminates an extra DB round-trip while the planner reliably picks nested loop + index scans on both tables.

Signed-off-by: Danny Zaken <[email protected]>
