
combine db querries for get command #9514

Open
nadavMiz wants to merge 1 commit into noobaa:master from nadavMiz:get_db_querries

Conversation

Contributor

@nadavMiz nadavMiz commented Mar 25, 2026

Describe the Problem

  1. During a get command we made separate queries to read parts, chunks, and blocks. This causes redundant round trips to the database, which can take a lot of time.
  2. There was a redundant call to read_object_md inside read_object_mapping, even though the data for that call always came from the object_md itself.
  3. Added a short path for small objects, using a method similar to the namespace_s3 short path: the data is loaded as part of read_object_md. Moved the check for can_use_get_inline so we only fetch the data for get commands, as this is the only place where we need it.

Explain the Changes

  1. Combine the db queries for parts, chunks, and blocks into one query.
  2. Move the relevant information from object_md into read_object_mapping and don't return it, as the data is not used.
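A simplified in-memory model of point 1: instead of three sequential queries, the combined lookup returns one flat row per (part, chunk, block) and groups them back into chunks. This is an illustrative sketch only; groupRows is a hypothetical name, not the actual MDStore code.

```javascript
// Group flat join rows (one per part/chunk/block combination) back into
// chunks. Blocks are deduplicated, since the join emits one row per
// (part, block) pair when several parts reference the same chunk.
function groupRows(rows) {
    const chunks = new Map();
    for (const { part, chunk, block } of rows) {
        let entry = chunks.get(chunk._id);
        if (!entry) {
            entry = { _id: chunk._id, parts: new Set(), blocks: new Map() };
            chunks.set(chunk._id, entry);
        }
        entry.parts.add(part._id);
        entry.blocks.set(block._id, block); // Map key dedupes blocks
    }
    return [...chunks.values()].map(c => ({
        _id: c._id,
        parts: [...c.parts],
        blocks: [...c.blocks.values()],
    }));
}

// Two parts referencing the same chunk, which has two blocks:
const rows = [
    { part: { _id: 'p1' }, chunk: { _id: 'c1' }, block: { _id: 'b1' } },
    { part: { _id: 'p1' }, chunk: { _id: 'c1' }, block: { _id: 'b2' } },
    { part: { _id: 'p2' }, chunk: { _id: 'c1' }, block: { _id: 'b1' } },
    { part: { _id: 'p2' }, chunk: { _id: 'c1' }, block: { _id: 'b2' } },
];
const grouped = groupRows(rows);
console.log(grouped.length, grouped[0].blocks.length); // 1 chunk, 2 unique blocks
```

The same grouping shape also surfaces the duplicate-blocks hazard the reviews below discuss.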

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Breaking Changes

    • The object-mapping read API now requires a numeric size parameter in requests.
    • Mapping responses no longer include object metadata; responses return chunk information only.
  • Tests

    • Integration and unit tests updated to include the new size field in mapping requests and validations.

@coderabbitai

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds a required numeric size param to read_object_mapping requests and removes object_md from responses. Server accepts a minimal { _id, size }, introduces a Postgres single-query parts↔chunks↔blocks path, and client/cache/tests were updated to include size.

Changes

  • API Schema (src/api/object_api.js): read_object_mapping request params now require size (integer); the reply no longer includes object_md, and chunks is the sole required reply property.
  • Clients / SDK (src/sdk/map_client.js, src/sdk/namespace_cache.js, src/sdk/namespace_nb.js, src/sdk/namespace_s3.js): RPC payloads and cache calls now include size; NamespaceNB.read_object_md became async and may perform an inline read when can_use_get_inline is true; NamespaceS3.read_object_md uses the caller-provided can_use_get_inline.
  • Server: map reader (src/server/object_services/map_reader.js): Added read_parts_mapping_postgres() and switched the Postgres branch to the combined-query helper; read_object_mapping now accepts { _id, size } and retains the rebuild-on-read flow.
  • Server: metadata store (src/server/object_services/md_store.js): Added MDStore.find_parts_chunks_blocks_by_range(...), performing a single JOIN across parts↔chunks↔blocks with decoding/deduping/grouping of frags and optional per-frag sorting; increased the ESLint max-lines limit.
  • Server: RPC handler (src/server/object_services/object_server.js): read_object_mapping now derives a minimal object { _id, size } from params, validates size, authorizes using the provided key, updates read stats by obj_id, and returns only chunks.
  • Endpoint behavior (src/endpoint/s3/ops/s3_get_object.js): Sets md_params.can_use_get_inline = true for GETs without part_number on non-folder keys.
  • Tests / Utilities (src/test/.../test_s3_bucket_policy.js, src/test/integration_tests/internal/test_map_builder.js, src/test/utils/object_api_functions.js): Tests and helpers updated to pass size into read_object_mapping calls and related helpers.
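The endpoint condition above (inline reads only for plain GETs of non-folder keys) can be sketched as follows. should_use_get_inline is a hypothetical name that models the described behavior; it is not the actual s3_get_object code.

```javascript
// Inline GET is only considered when the request is a plain GET:
// no partNumber query parameter and a key that is not a folder marker.
function should_use_get_inline(req) {
    const part_number = req.query && req.query.partNumber;
    const is_folder = req.params.key.endsWith('/');
    return !part_number && !is_folder;
}

console.log(should_use_get_inline({ query: {}, params: { key: 'a/b.txt' } }));                    // true
console.log(should_use_get_inline({ query: { partNumber: '2' }, params: { key: 'a/b.txt' } }));   // false
console.log(should_use_get_inline({ query: {}, params: { key: 'a/folder/' } }));                  // false
```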

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RPC as RPC Server
    participant MapSrv as Map Reader
    participant MDStore
    participant DB as Postgres DB

    Client->>RPC: object.read_object_mapping(params including size)
    RPC->>MapSrv: read_object_mapping({ _id, size, start, end, location_info })
    alt DB_TYPE == 'postgres'
        MapSrv->>MDStore: find_parts_chunks_blocks_by_range(obj_id, range, sorter)
        MDStore->>DB: single JOIN query (parts ↔ chunks ↔ blocks)
        DB-->>MDStore: rows
        MDStore-->>MapSrv: { parts, chunks_db }
        MapSrv->>MapSrv: prepare_chunks({ chunks_db })
    else non-postgres
        MapSrv->>MDStore: find_parts_by_start_range(...)
        MapSrv->>MDStore: read_parts_mapping(parts)
    end
    MapSrv-->>RPC: { chunks }
    RPC-->>Client: reply (chunks only)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • dannyzaken
  • nimrod-becker
  • romayalon
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'combine db querries for get command' directly describes the main objective of the PR (combining database queries for the get command), which aligns with the primary change across the codebase.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.


@nadavMiz nadavMiz marked this pull request as ready for review March 25, 2026 10:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/object_services/object_server.js`:
- Around line 694-698: Validate req.rpc_params.size before building the object
descriptor: in the block that calls load_bucket(req) and constructs obj (the
line creating obj = { _id: obj_id, size }), ensure size is a finite,
non-negative number (e.g. typeof size === 'number' && isFinite(size) && size >=
0); if it is missing/invalid, fail fast by returning an RPC error/logging and do
not construct/pass an object with undefined/NaN size to sanitize_object_range,
or explicitly coerce to a safe default only after clearly documenting that
behavior. This touches the code around get_obj_id, load_bucket, and the obj
construction where sanitize_object_range later uses obj.size.
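As a standalone illustration of the validation described above: a minimal sketch, assuming the check happens before the object descriptor is built. validateSize is a hypothetical name, and the real handler would throw RpcError('BAD_REQUEST', ...) rather than a plain Error.

```javascript
// Fail fast on a missing or invalid size instead of letting an
// undefined/NaN value reach sanitize_object_range downstream.
function validateSize(size) {
    if (typeof size !== 'number' || !Number.isFinite(size) || size < 0) {
        throw new Error(`BAD_REQUEST: size must be a non-negative finite number, got ${size}`);
    }
    return size;
}

validateSize(0);     // ok: empty object
validateSize(1024);  // ok
try { validateSize(NaN); } catch (e) { console.log(e.message); }
try { validateSize(undefined); } catch (e) { console.log(e.message); }
```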

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86025c67-302d-49b1-9b53-d704b4a762fb

📥 Commits

Reviewing files that changed from the base of the PR and between 91c9962 and bb8c7d1.

📒 Files selected for processing (9)
  • src/api/object_api.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/server/object_services/map_reader.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/object_services/map_reader.js`:
- Around line 26-38: The code reuses joined chunk payloads from
MDStore.instance().find_parts_chunks_blocks_by_range which returns one row per
(part, block), causing duplicated entries in each chunk's frag.blocks when
multiple parts reference the same chunk; update the construction of ChunkDB
instances (the chunks creation step that maps parts to new ChunkDB) to group
rows by chunk id and deduplicate the frag.blocks for each chunk before calling
map_server.prepare_chunks; specifically, aggregate parts/blocks from parts and
chunks_db into a single ChunkDB per unique chunk id (used in the chunks array)
ensuring frag.blocks contains only unique block identifiers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d55dcb1-9f0b-4c8b-bb5f-b0e2fa4c8f74

📥 Commits

Reviewing files that changed from the base of the PR and between bb8c7d1 and 21db7c2.

📒 Files selected for processing (9)
  • src/api/object_api.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/server/object_services/map_reader.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (4)
  • src/test/integration_tests/internal/test_map_builder.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/test/utils/object_api_functions.js
  • src/api/object_api.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/sdk/map_client.js (1)

389-397: Fail fast when object_md.size is missing.

read_object_mapping() now depends on size, but MapClient still accepts Partial<nb.ObjectInfo> and the read-side construction in src/sdk/object_io.js:700-710 forwards params.object_md as-is. A local guard here will turn an RPC schema failure into an immediate, easier-to-debug error.

Suggested guard

```diff
     async read_object_mapping() {
+        if (typeof this.object_md?.size !== 'number') {
+            throw new Error('MapClient.read_object_mapping requires object_md.size');
+        }
         const res = await this.rpc_client.object.read_object_mapping({
             obj_id: this.object_md.obj_id,
             bucket: this.object_md.bucket,
             key: this.object_md.key,
             size: this.object_md.size,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sdk/map_client.js` around lines 389 - 397, Add a fast-fail guard in
MapClient before calling rpc_client.object.read_object_mapping: verify that
this.object_md and this.object_md.size are present (non-null/undefined) and
throw a clear, synchronous Error if size is missing so the RPC isn't invoked
with an invalid schema; update the call site that uses read_start, read_end,
location_info and read_object_mapping to rely on this check (identify symbols:
MapClient, this.object_md, read_object_mapping, read_start, read_end,
location_info) so callers that pass Partial<nb.ObjectInfo> get an immediate,
debuggable failure instead of an RPC schema error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/object_services/md_store.js`:
- Around line 1922-1934: The current SQL joins blocks directly and causes
duplicate part/chunk rows; refactor the query inside the method that builds this
SQL (using this._parts.name, this._chunks.name, this._blocks.name and the
p.data->>'chunk' / c.data->>'_id' join) so blocks are aggregated per chunk first
(e.g. a subquery on ${this._blocks.name} grouping by chunk id and using
jsonb_agg(blocks.data) / jsonb_agg(...) AS blocks) and then join that
aggregated-blocks subquery to chunks and parts; preserve all existing WHERE
filters and the ORDER BY on (p.data->>'start')::bigint ASC and ensure the joined
chunk payload uses the aggregated blocks field instead of producing one row per
block.
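The aggregated-blocks query shape described above could look roughly like this. It is built as a JS template string so the structure is visible; the table names, casts, and the deleted-filter expression are placeholders, not the actual MDStore SQL.

```javascript
// Sketch of the suggested refactor: aggregate blocks per chunk in a
// subquery (one jsonb array per chunk) before joining back to parts and
// chunks, so the outer query returns one row per part instead of one
// row per (part, block).
function buildCombinedQuery({ parts, chunks, blocks }) {
    return `
        SELECT p.data AS part_data, c.data AS chunk_data, gb.blocks
        FROM ${parts} p
        JOIN ${chunks} c ON c.data->>'_id' = p.data->>'chunk'
        LEFT JOIN (
            SELECT b.data->>'chunk' AS chunk_id, jsonb_agg(b.data) AS blocks
            FROM ${blocks} b
            WHERE b.data->'deleted' IS NULL
            GROUP BY b.data->>'chunk'
        ) gb ON gb.chunk_id = c.data->>'_id'
        ORDER BY (p.data->>'start')::bigint ASC`;
}

console.log(buildCombinedQuery({ parts: 'parts', chunks: 'chunks', blocks: 'blocks' }));
```

Note the review also suggests joining the grouped result on the chunk table's direct _id column rather than c.data->>'_id'; the sketch keeps the json form only for brevity.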

---

Nitpick comments:
In `@src/sdk/map_client.js`:
- Around line 389-397: Add a fast-fail guard in MapClient before calling
rpc_client.object.read_object_mapping: verify that this.object_md and
this.object_md.size are present (non-null/undefined) and throw a clear,
synchronous Error if size is missing so the RPC isn't invoked with an invalid
schema; update the call site that uses read_start, read_end, location_info and
read_object_mapping to rely on this check (identify symbols: MapClient,
this.object_md, read_object_mapping, read_start, read_end, location_info) so
callers that pass Partial<nb.ObjectInfo> get an immediate, debuggable failure
instead of an RPC schema error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee451efa-bbe9-4542-8b76-7dd83a32a155

📥 Commits

Reviewing files that changed from the base of the PR and between 21db7c2 and 6fdc255.

📒 Files selected for processing (9)
  • src/api/object_api.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/server/object_services/map_reader.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
  • src/sdk/namespace_cache.js
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js
  • src/server/object_services/object_server.js


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
src/server/object_services/map_reader.js (1)

33-37: ⚠️ Potential issue | 🟠 Major

Duplicate frag.blocks issue remains unaddressed.

The combined query returns one row per (part, block) pair. When multiple selected parts reference the same chunk, frag.blocks may contain duplicates. The existing read_parts_mapping path (line 131) loads blocks once per unique chunk, avoiding this issue.

Consider deduplicating frag.blocks for each chunk before the keyBy:

Suggested fix

```diff
     if (parts.length === 0) return [];
+    for (const chunk_db of chunks_db) {
+        for (const frag of chunk_db.frags || []) {
+            frag.blocks = _.uniqBy(frag.blocks || [], block => block._id.toHexString());
+        }
+    }
     const chunks_db_by_id = _.keyBy(chunks_db, '_id');
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/object_services/map_reader.js` around lines 33 - 37, The fragments
returned by the combined query can contain duplicate blocks so before building
chunks_db_by_id and mapping parts into ChunkDB you should deduplicate
frag.blocks per chunk id: iterate the chunks_db array (the variable used to
build chunks_db_by_id), for each entry (refer to frag.blocks / blocks property)
remove duplicate block entries (e.g., by block id or unique key) and replace
with the de-duplicated blocks array, then build chunks_db_by_id and continue
creating ChunkDB instances (the code creating chunks via parts.map -> new
ChunkDB(...)). This mirrors the deduplication behavior in read_parts_mapping and
prevents duplicate frag.blocks in the resulting ChunkDB objects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/server/object_services/map_reader.js`:
- Around line 33-37: The fragments returned by the combined query can contain
duplicate blocks so before building chunks_db_by_id and mapping parts into
ChunkDB you should deduplicate frag.blocks per chunk id: iterate the chunks_db
array (the variable used to build chunks_db_by_id), for each entry (refer to
frag.blocks / blocks property) remove duplicate block entries (e.g., by block id
or unique key) and replace with the de-duplicated blocks array, then build
chunks_db_by_id and continue creating ChunkDB instances (the code creating
chunks via parts.map -> new ChunkDB(...)). This mirrors the deduplication
behavior in read_parts_mapping and prevents duplicate frag.blocks in the
resulting ChunkDB objects.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f82ef749-3c3e-48a8-a738-f1020d8f34b7

📥 Commits

Reviewing files that changed from the base of the PR and between 6fdc255 and 1eec208.

📒 Files selected for processing (9)
  • src/api/object_api.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/server/object_services/map_reader.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
  • src/sdk/namespace_cache.js
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/test/utils/object_api_functions.js
  • src/sdk/map_client.js
  • src/server/object_services/object_server.js
  • src/api/object_api.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/server/object_services/md_store.js


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)

1922-1927: ⚠️ Potential issue | 🟠 Major

Aggregate blocks before joining them back to parts.

This still expands to one row per (part, block), so Postgres ships and decode_json() re-parses the same part_data/chunk_data once per block. On top of that, joining chunks via c.data->>'_id' gives up the direct _id lookup the old find_chunks_by_ids() path had. On replicated/EC chunks that can wipe out the latency win this PR is targeting. Please aggregate blocks per chunk first and join that grouped result back on the chunk table’s _id column.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/object_services/md_store.js` around lines 1922 - 1927, Aggregate
blocks by chunk before joining: replace the direct LEFT JOIN to
${this._blocks.name} with a subquery that groups blocks by their chunk id (e.g.,
SELECT b.data->>'chunk' AS chunk_id, jsonb_agg(b.data) AS blocks FROM
${this._blocks.name} b WHERE (b.data->'deleted' IS NULL OR b.data->'deleted' =
'null'::jsonb) GROUP BY chunk_id) and then LEFT JOIN that grouped result back to
${this._chunks.name} using the chunk's _id column (match the grouped.chunk_id to
the chunk `_id` value) so the main SELECT returns part_data, chunk_data and the
aggregated blocks per chunk (e.g., blocks) instead of one row per (part,block);
update the SELECT list and any downstream decode_json()/parsing to expect
aggregated blocks.
🧹 Nitpick comments (1)
src/server/object_services/map_reader.js (1)

25-39: Extract the shared chunk-assembly step.

read_parts_mapping_postgres() now duplicates the chunks_db_by_id → new ChunkDB() → prepare_chunks() flow from read_parts_mapping(). A small helper that consumes { parts, chunks_db } would keep the Mongo and Postgres paths aligned the next time this mapping logic changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/object_services/map_reader.js` around lines 25 - 39, The Postgres
path in read_parts_mapping_postgres duplicates the chunk-assembly flow from
read_parts_mapping; extract that shared logic into a new helper (e.g.,
assemble_chunks_from_parts or buildChunksFromParts) that takes an object with
{parts, chunks_db}, constructs ChunkDB instances (using chunk.toHexString()
lookups into chunks_db_by_id and attaching parts), calls
map_server.prepare_chunks({chunks}), and returns the prepared chunks; then
replace the duplicated block in read_parts_mapping_postgres (and update
read_parts_mapping to use the same helper) so both functions call this single
helper instead of repeating the chunks_db_by_id → new ChunkDB(...) →
prepare_chunks flow.
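A minimal sketch of such a helper, with ChunkDB and map_server.prepare_chunks stubbed so the grouping logic is visible. assemble_chunks_from_parts is the suggested name from the comment, not existing code, and the stubs stand in for the real noobaa classes.

```javascript
// Shared chunk-assembly step: look up each part's chunk payload by id,
// wrap it in a ChunkDB, and hand the result to prepare_chunks. Both the
// Mongo and Postgres paths could call this instead of duplicating it.
function assemble_chunks_from_parts({ parts, chunks_db }, { ChunkDB, prepare_chunks }) {
    const chunks_db_by_id = new Map(chunks_db.map(c => [String(c._id), c]));
    const chunks = parts.map(part => {
        const chunk_db = chunks_db_by_id.get(String(part.chunk));
        return new ChunkDB({ ...chunk_db, parts: [part] });
    });
    return prepare_chunks({ chunks });
}

// Stubs for the real classes, just to exercise the helper:
class StubChunkDB { constructor(data) { this.data = data; } }
const parts = [{ _id: 'p1', chunk: 'c1' }, { _id: 'p2', chunk: 'c1' }];
const chunks_db = [{ _id: 'c1', frags: [] }];
const prepared = assemble_chunks_from_parts(
    { parts, chunks_db },
    { ChunkDB: StubChunkDB, prepare_chunks: ({ chunks }) => chunks }
);
console.log(prepared.length); // 2, one ChunkDB per part
```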
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sdk/namespace_nb.js`:
- Around line 88-97: The inline-read path uses this.target_bucket instead of the
resolved bucket present in rpc_params, causing reads to use an undefined bucket
when NamespaceNB was called without a fixed bucket; update the call to
object_sdk.object_io.read_entire_object(...) to pass the resolved bucket from
rpc_params (rpc_params.bucket) instead of this.target_bucket so the inline GET
uses the same effective bucket returned/used by
object_sdk.rpc_client.object.read_object_md; keep all other args (client:
object_sdk.rpc_client, object_md: info) unchanged.

In `@src/server/object_services/object_server.js`:
- Around line 696-705: The code currently trusts caller-supplied obj_id/size
before permission and mapping, allowing ID/key mismatch attacks; after
load_bucket and get_obj_id but before calling req.has_s3_bucket_permission or
map_reader/read_object_mapping, look up the object row by its identifier with
bucket and key predicates (e.g., query where _id = obj_id AND bucket =
req.bucket AND key = '/' + key or equivalent), rebind obj_id and replace size
with the stored size from that row (or return NOT_FOUND if no matching row),
then perform the authorization check (req.has_s3_bucket_permission) and only
call map_reader/read_object_mapping using the verified object row and stored
size.

---

Duplicate comments:
In `@src/server/object_services/md_store.js`:
- Around line 1922-1927: Aggregate blocks by chunk before joining: replace the
direct LEFT JOIN to ${this._blocks.name} with a subquery that groups blocks by
their chunk id (e.g., SELECT b.data->>'chunk' AS chunk_id, jsonb_agg(b.data) AS
blocks FROM ${this._blocks.name} b WHERE (b.data->'deleted' IS NULL OR
b.data->'deleted' = 'null'::jsonb) GROUP BY chunk_id) and then LEFT JOIN that
grouped result back to ${this._chunks.name} using the chunk's _id column (match
the grouped.chunk_id to the chunk `_id` value) so the main SELECT returns
part_data, chunk_data and the aggregated blocks per chunk (e.g., blocks) instead
of one row per (part,block); update the SELECT list and any downstream
decode_json()/parsing to expect aggregated blocks.

---

Nitpick comments:
In `@src/server/object_services/map_reader.js`:
- Around line 25-39: The Postgres path in read_parts_mapping_postgres duplicates
the chunk-assembly flow from read_parts_mapping; extract that shared logic into
a new helper (e.g., assemble_chunks_from_parts or buildChunksFromParts) that
takes an object with {parts, chunks_db}, constructs ChunkDB instances (using
chunk.toHexString() lookups into chunks_db_by_id and attaching parts), calls
map_server.prepare_chunks({chunks}), and returns the prepared chunks; then
replace the duplicated block in read_parts_mapping_postgres (and update
read_parts_mapping to use the same helper) so both functions call this single
helper instead of repeating the chunks_db_by_id → new ChunkDB(...) →
prepare_chunks flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5a9f2a8b-3d5b-44a1-812f-677e7aeeda97

📥 Commits

Reviewing files that changed from the base of the PR and between 1eec208 and 151df05.

📒 Files selected for processing (12)
  • src/api/object_api.js
  • src/endpoint/s3/ops/s3_get_object.js
  • src/sdk/map_client.js
  • src/sdk/namespace_cache.js
  • src/sdk/namespace_nb.js
  • src/sdk/namespace_s3.js
  • src/server/object_services/map_reader.js
  • src/server/object_services/md_store.js
  • src/server/object_services/object_server.js
  • src/test/integration_tests/api/s3/test_s3_bucket_policy.js
  • src/test/integration_tests/internal/test_map_builder.js
  • src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
  • src/sdk/map_client.js
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/test/utils/object_api_functions.js
  • src/sdk/namespace_cache.js
  • src/test/integration_tests/internal/test_map_builder.js

Comment on lines 696 to 705

```diff
 load_bucket(req);
 const obj_id = get_obj_id(req, 'BAD_OBJECT_ID');
 if (typeof size !== 'number' || size < 0) {
     throw new RpcError('BAD_REQUEST', 'size must be a positive number');
 }
 const obj = { _id: obj_id, size };

 // Check if the requesting account is authorized to read the object
-if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + obj.key, undefined)) {
+if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + key, undefined)) {
     throw new RpcError('UNAUTHORIZED', 'requesting account is not authorized to read the object');
```


⚠️ Potential issue | 🔴 Critical

Don’t authorize read_object_mapping against a caller-supplied key alone.

Lines 697-705 no longer verify that obj_id actually belongs to the requested bucket/key/system before the permission check. Since obj_id and size are already exposed by metadata/list responses, a caller can pair an allowed key with another object’s obj_id and fetch that other object’s mapping. Please re-bind the ID to the real object row (or enforce bucket/key predicates in the mapping query) and use its stored size before calling map_reader.

🛡️ Proposed fix

```diff
     load_bucket(req);
     const obj_id = get_obj_id(req, 'BAD_OBJECT_ID');
     if (typeof size !== 'number' || size < 0) {
-        throw new RpcError('BAD_REQUEST', 'size must be a positive number');
+        throw new RpcError('BAD_REQUEST', 'size must be a non-negative number');
     }
-    const obj = { _id: obj_id, size };
+    const obj = await MDStore.instance().find_object_by_id(obj_id);
+    check_object_mode(req, obj, 'NO_SUCH_OBJECT');

-    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + key, undefined)) {
+    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + obj.key, undefined)) {
         throw new RpcError('UNAUTHORIZED', 'requesting account is not authorized to read the object');
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/object_services/object_server.js` around lines 696 - 705, The code
currently trusts caller-supplied obj_id/size before permission and mapping,
allowing ID/key mismatch attacks; after load_bucket and get_obj_id but before
calling req.has_s3_bucket_permission or map_reader/read_object_mapping, look up
the object row by its identifier with bucket and key predicates (e.g., query
where _id = obj_id AND bucket = req.bucket AND key = '/' + key or equivalent),
rebind obj_id and replace size with the stored size from that row (or return
NOT_FOUND if no matching row), then perform the authorization check
(req.has_s3_bucket_permission) and only call map_reader/read_object_mapping
using the verified object row and stored size.

