Conversation
662fb61 to abfaa2a
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough: Adds a required numeric `size` parameter to the object-mapping read path.
Sequence Diagram(s): mermaid (collapsed)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
abfaa2a to bb8c7d1
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/object_services/object_server.js`:
- Around line 694-698: Validate req.rpc_params.size before building the object
descriptor: in the block that calls load_bucket(req) and constructs obj (the
line creating obj = { _id: obj_id, size }), ensure size is a finite,
non-negative number (e.g. typeof size === 'number' && isFinite(size) && size >=
0); if it is missing/invalid, fail fast by returning an RPC error/logging and do
not construct/pass an object with undefined/NaN size to sanitize_object_range,
or explicitly coerce to a safe default only after clearly documenting that
behavior. This touches the code around get_obj_id, load_bucket, and the obj
construction where sanitize_object_range later uses obj.size.
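The validation the comment asks for can be sketched as a standalone guard. This is only an illustration: the `validate_object_size` name is made up here, and `RpcError` is modeled as a plain `Error` subclass rather than the actual NooBaa class.

```javascript
// Illustrative stand-in for NooBaa's RpcError.
class RpcError extends Error {
    constructor(rpc_code, message) {
        super(message);
        this.rpc_code = rpc_code;
    }
}

// Reject anything that is not a finite, non-negative number before the
// value can reach sanitize_object_range via obj.size.
function validate_object_size(size) {
    if (typeof size !== 'number' || !Number.isFinite(size) || size < 0) {
        throw new RpcError('BAD_REQUEST', `invalid object size: ${size}`);
    }
    return size;
}
```

Failing fast here keeps an `undefined`/`NaN` size from silently propagating into range arithmetic downstream.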
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 86025c67-302d-49b1-9b53-d704b4a762fb
📒 Files selected for processing (9)
- src/api/object_api.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/server/object_services/map_reader.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
bb8c7d1 to 21db7c2
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/object_services/map_reader.js`:
- Around line 26-38: The code reuses joined chunk payloads from
MDStore.instance().find_parts_chunks_blocks_by_range which returns one row per
(part, block), causing duplicated entries in each chunk's frag.blocks when
multiple parts reference the same chunk; update the construction of ChunkDB
instances (the chunks creation step that maps parts to new ChunkDB) to group
rows by chunk id and deduplicate the frag.blocks for each chunk before calling
map_server.prepare_chunks; specifically, aggregate parts/blocks from parts and
chunks_db into a single ChunkDB per unique chunk id (used in the chunks array)
ensuring frag.blocks contains only unique block identifiers.
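The grouping and deduplication step described above can be sketched in isolation. The row and chunk shapes here are simplified stand-ins for the real ChunkDB structures, not the actual NooBaa types.

```javascript
// Group one-row-per-(part, block) query results by chunk id and
// deduplicate each frag's blocks, as the comment suggests.
function group_chunks_dedup_blocks(rows) {
    const chunks_by_id = new Map();
    for (const row of rows) {
        let chunk = chunks_by_id.get(row.chunk_id);
        if (!chunk) {
            chunk = { _id: row.chunk_id, frags: new Map() };
            chunks_by_id.set(row.chunk_id, chunk);
        }
        let frag = chunk.frags.get(row.frag_id);
        if (!frag) {
            frag = { _id: row.frag_id, blocks: new Map() };
            chunk.frags.set(row.frag_id, frag);
        }
        // A Map keyed by block id keeps only one copy of each block,
        // even when several parts reference the same chunk.
        frag.blocks.set(row.block._id, row.block);
    }
    // Flatten the Maps back into plain arrays.
    return [...chunks_by_id.values()].map(chunk => ({
        _id: chunk._id,
        frags: [...chunk.frags.values()].map(frag => ({
            _id: frag._id,
            blocks: [...frag.blocks.values()],
        })),
    }));
}
```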
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3d55dcb1-9f0b-4c8b-bb5f-b0e2fa4c8f74
📒 Files selected for processing (9)
- src/api/object_api.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/server/object_services/map_reader.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (4)
- src/test/integration_tests/internal/test_map_builder.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
🚧 Files skipped from review as they are similar to previous changes (4)
- src/test/utils/object_api_functions.js
- src/api/object_api.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
21db7c2 to 6fdc255
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/sdk/map_client.js (1)
389-397: Fail fast when `object_md.size` is missing.
`read_object_mapping()` now depends on `size`, but `MapClient` still accepts `Partial<nb.ObjectInfo>` and the read-side construction in `src/sdk/object_io.js:700-710` forwards `params.object_md` as-is. A local guard here will turn an RPC schema failure into an immediate, easier-to-debug error.
Suggested guard:
```diff
 async read_object_mapping() {
+    if (typeof this.object_md?.size !== 'number') {
+        throw new Error('MapClient.read_object_mapping requires object_md.size');
+    }
     const res = await this.rpc_client.object.read_object_mapping({
         obj_id: this.object_md.obj_id,
         bucket: this.object_md.bucket,
         key: this.object_md.key,
         size: this.object_md.size,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sdk/map_client.js` around lines 389 - 397, Add a fast-fail guard in MapClient before calling rpc_client.object.read_object_mapping: verify that this.object_md and this.object_md.size are present (non-null/undefined) and throw a clear, synchronous Error if size is missing so the RPC isn't invoked with an invalid schema; update the call site that uses read_start, read_end, location_info and read_object_mapping to rely on this check (identify symbols: MapClient, this.object_md, read_object_mapping, read_start, read_end, location_info) so callers that pass Partial<nb.ObjectInfo> get an immediate, debuggable failure instead of an RPC schema error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/object_services/md_store.js`:
- Around line 1922-1934: The current SQL joins blocks directly and causes
duplicate part/chunk rows; refactor the query inside the method that builds this
SQL (using this._parts.name, this._chunks.name, this._blocks.name and the
p.data->>'chunk' / c.data->>'_id' join) so blocks are aggregated per chunk first
(e.g. a subquery on ${this._blocks.name} grouping by chunk id and using
jsonb_agg(blocks.data) / jsonb_agg(...) AS blocks) and then join that
aggregated-blocks subquery to chunks and parts; preserve all existing WHERE
filters and the ORDER BY on (p.data->>'start')::bigint ASC and ensure the joined
chunk payload uses the aggregated blocks field instead of producing one row per
block.
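The aggregated-query shape the comment suggests might look roughly like the sketch below. The table names come in as parameters standing in for the real `this._parts.name` / `this._chunks.name` / `this._blocks.name` interpolation, and the exact join condition (a direct `_id` column on the chunks table, cast to text) is an assumption taken from the review's advice, not confirmed schema.

```javascript
// Hypothetical sketch of the aggregated-blocks SQL suggested above.
function build_parts_chunks_blocks_sql({ parts, chunks, blocks }) {
    return `
        SELECT p.data AS part_data, c.data AS chunk_data, b.blocks AS blocks
        FROM ${parts} p
        JOIN ${chunks} c ON c._id::text = p.data->>'chunk'
        LEFT JOIN (
            -- Aggregate blocks per chunk first, so the outer join stays
            -- one row per (part, chunk) instead of one row per (part, block).
            SELECT bl.data->>'chunk' AS chunk_id, jsonb_agg(bl.data) AS blocks
            FROM ${blocks} bl
            GROUP BY bl.data->>'chunk'
        ) b ON b.chunk_id = c._id::text
        ORDER BY (p.data->>'start')::bigint ASC`;
}
```

With this shape, `decode_json()` parses each `part_data`/`chunk_data` payload once, and the blocks arrive pre-grouped as a JSON array per chunk.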
---
Nitpick comments:
In `@src/sdk/map_client.js`:
- Around line 389-397: Add a fast-fail guard in MapClient before calling
rpc_client.object.read_object_mapping: verify that this.object_md and
this.object_md.size are present (non-null/undefined) and throw a clear,
synchronous Error if size is missing so the RPC isn't invoked with an invalid
schema; update the call site that uses read_start, read_end, location_info and
read_object_mapping to rely on this check (identify symbols: MapClient,
this.object_md, read_object_mapping, read_start, read_end, location_info) so
callers that pass Partial<nb.ObjectInfo> get an immediate, debuggable failure
instead of an RPC schema error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee451efa-bbe9-4542-8b76-7dd83a32a155
📒 Files selected for processing (9)
- src/api/object_api.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/server/object_services/map_reader.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
- src/sdk/namespace_cache.js
🚧 Files skipped from review as they are similar to previous changes (3)
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
- src/server/object_services/object_server.js
6fdc255 to 1eec208
♻️ Duplicate comments (1)
src/server/object_services/map_reader.js (1)
33-37: ⚠️ Potential issue | 🟠 Major: Duplicate `frag.blocks` issue remains unaddressed.
The combined query returns one row per `(part, block)` pair. When multiple selected parts reference the same chunk, `frag.blocks` may contain duplicates. The existing `read_parts_mapping` path (line 131) loads blocks once per unique chunk, avoiding this issue. Consider deduplicating `frag.blocks` for each chunk before the `keyBy`:
Suggested fix:
```diff
     if (parts.length === 0) return [];
+    for (const chunk_db of chunks_db) {
+        for (const frag of chunk_db.frags || []) {
+            frag.blocks = _.uniqBy(frag.blocks || [], block => block._id.toHexString());
+        }
+    }
     const chunks_db_by_id = _.keyBy(chunks_db, '_id');
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/object_services/map_reader.js` around lines 33 - 37, The fragments returned by the combined query can contain duplicate blocks so before building chunks_db_by_id and mapping parts into ChunkDB you should deduplicate frag.blocks per chunk id: iterate the chunks_db array (the variable used to build chunks_db_by_id), for each entry (refer to frag.blocks / blocks property) remove duplicate block entries (e.g., by block id or unique key) and replace with the de-duplicated blocks array, then build chunks_db_by_id and continue creating ChunkDB instances (the code creating chunks via parts.map -> new ChunkDB(...)). This mirrors the deduplication behavior in read_parts_mapping and prevents duplicate frag.blocks in the resulting ChunkDB objects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/server/object_services/map_reader.js`:
- Around line 33-37: The fragments returned by the combined query can contain
duplicate blocks so before building chunks_db_by_id and mapping parts into
ChunkDB you should deduplicate frag.blocks per chunk id: iterate the chunks_db
array (the variable used to build chunks_db_by_id), for each entry (refer to
frag.blocks / blocks property) remove duplicate block entries (e.g., by block id
or unique key) and replace with the de-duplicated blocks array, then build
chunks_db_by_id and continue creating ChunkDB instances (the code creating
chunks via parts.map -> new ChunkDB(...)). This mirrors the deduplication
behavior in read_parts_mapping and prevents duplicate frag.blocks in the
resulting ChunkDB objects.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f82ef749-3c3e-48a8-a738-f1020d8f34b7
📒 Files selected for processing (9)
- src/api/object_api.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/server/object_services/map_reader.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
- src/sdk/namespace_cache.js
🚧 Files skipped from review as they are similar to previous changes (6)
- src/test/utils/object_api_functions.js
- src/sdk/map_client.js
- src/server/object_services/object_server.js
- src/api/object_api.js
- src/test/integration_tests/internal/test_map_builder.js
- src/server/object_services/md_store.js
1eec208 to 151df05
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)
1922-1927: ⚠️ Potential issue | 🟠 Major: Aggregate blocks before joining them back to parts.
This still expands to one row per `(part, block)`, so Postgres ships and `decode_json()` re-parses the same `part_data`/`chunk_data` once per block. On top of that, joining chunks via `c.data->>'_id'` gives up the direct `_id` lookup the old `find_chunks_by_ids()` path had. On replicated/EC chunks that can wipe out the latency win this PR is targeting. Please aggregate blocks per chunk first and join that grouped result back on the chunk table's `_id` column.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/object_services/md_store.js` around lines 1922 - 1927, Aggregate blocks by chunk before joining: replace the direct LEFT JOIN to ${this._blocks.name} with a subquery that groups blocks by their chunk id (e.g., SELECT b.data->>'chunk' AS chunk_id, jsonb_agg(b.data) AS blocks FROM ${this._blocks.name} b WHERE (b.data->'deleted' IS NULL OR b.data->'deleted' = 'null'::jsonb) GROUP BY chunk_id) and then LEFT JOIN that grouped result back to ${this._chunks.name} using the chunk's _id column (match the grouped.chunk_id to the chunk `_id` value) so the main SELECT returns part_data, chunk_data and the aggregated blocks per chunk (e.g., blocks) instead of one row per (part,block); update the SELECT list and any downstream decode_json()/parsing to expect aggregated blocks.
🧹 Nitpick comments (1)
src/server/object_services/map_reader.js (1)
25-39: Extract the shared chunk-assembly step.
`read_parts_mapping_postgres()` now duplicates the `chunks_db_by_id` → `new ChunkDB()` → `prepare_chunks()` flow from `read_parts_mapping()`. A small helper that consumes `{ parts, chunks_db }` would keep the Mongo and Postgres paths aligned the next time this mapping logic changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/object_services/map_reader.js` around lines 25 - 39, The Postgres path in read_parts_mapping_postgres duplicates the chunk-assembly flow from read_parts_mapping; extract that shared logic into a new helper (e.g., assemble_chunks_from_parts or buildChunksFromParts) that takes an object with {parts, chunks_db}, constructs ChunkDB instances (using chunk.toHexString() lookups into chunks_db_by_id and attaching parts), calls map_server.prepare_chunks({chunks}), and returns the prepared chunks; then replace the duplicated block in read_parts_mapping_postgres (and update read_parts_mapping to use the same helper) so both functions call this single helper instead of repeating the chunks_db_by_id → new ChunkDB(...) → prepare_chunks flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/sdk/namespace_nb.js`:
- Around line 88-97: The inline-read path uses this.target_bucket instead of the
resolved bucket present in rpc_params, causing reads to use an undefined bucket
when NamespaceNB was called without a fixed bucket; update the call to
object_sdk.object_io.read_entire_object(...) to pass the resolved bucket from
rpc_params (rpc_params.bucket) instead of this.target_bucket so the inline GET
uses the same effective bucket returned/used by
object_sdk.rpc_client.object.read_object_md; keep all other args (client:
object_sdk.rpc_client, object_md: info) unchanged.
In `@src/server/object_services/object_server.js`:
- Around line 696-705: The code currently trusts caller-supplied obj_id/size
before permission and mapping, allowing ID/key mismatch attacks; after
load_bucket and get_obj_id but before calling req.has_s3_bucket_permission or
map_reader/read_object_mapping, look up the object row by its identifier with
bucket and key predicates (e.g., query where _id = obj_id AND bucket =
req.bucket AND key = '/' + key or equivalent), rebind obj_id and replace size
with the stored size from that row (or return NOT_FOUND if no matching row),
then perform the authorization check (req.has_s3_bucket_permission) and only
call map_reader/read_object_mapping using the verified object row and stored
size.
---
Duplicate comments:
In `@src/server/object_services/md_store.js`:
- Around line 1922-1927: Aggregate blocks by chunk before joining: replace the
direct LEFT JOIN to ${this._blocks.name} with a subquery that groups blocks by
their chunk id (e.g., SELECT b.data->>'chunk' AS chunk_id, jsonb_agg(b.data) AS
blocks FROM ${this._blocks.name} b WHERE (b.data->'deleted' IS NULL OR
b.data->'deleted' = 'null'::jsonb) GROUP BY chunk_id) and then LEFT JOIN that
grouped result back to ${this._chunks.name} using the chunk's _id column (match
the grouped.chunk_id to the chunk `_id` value) so the main SELECT returns
part_data, chunk_data and the aggregated blocks per chunk (e.g., blocks) instead
of one row per (part,block); update the SELECT list and any downstream
decode_json()/parsing to expect aggregated blocks.
---
Nitpick comments:
In `@src/server/object_services/map_reader.js`:
- Around line 25-39: The Postgres path in read_parts_mapping_postgres duplicates
the chunk-assembly flow from read_parts_mapping; extract that shared logic into
a new helper (e.g., assemble_chunks_from_parts or buildChunksFromParts) that
takes an object with {parts, chunks_db}, constructs ChunkDB instances (using
chunk.toHexString() lookups into chunks_db_by_id and attaching parts), calls
map_server.prepare_chunks({chunks}), and returns the prepared chunks; then
replace the duplicated block in read_parts_mapping_postgres (and update
read_parts_mapping to use the same helper) so both functions call this single
helper instead of repeating the chunks_db_by_id → new ChunkDB(...) →
prepare_chunks flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5a9f2a8b-3d5b-44a1-812f-677e7aeeda97
📒 Files selected for processing (12)
- src/api/object_api.js
- src/endpoint/s3/ops/s3_get_object.js
- src/sdk/map_client.js
- src/sdk/namespace_cache.js
- src/sdk/namespace_nb.js
- src/sdk/namespace_s3.js
- src/server/object_services/map_reader.js
- src/server/object_services/md_store.js
- src/server/object_services/object_server.js
- src/test/integration_tests/api/s3/test_s3_bucket_policy.js
- src/test/integration_tests/internal/test_map_builder.js
- src/test/utils/object_api_functions.js
✅ Files skipped from review due to trivial changes (1)
- src/sdk/map_client.js
🚧 Files skipped from review as they are similar to previous changes (3)
- src/test/utils/object_api_functions.js
- src/sdk/namespace_cache.js
- src/test/integration_tests/internal/test_map_builder.js
```diff
     load_bucket(req);
     const obj_id = get_obj_id(req, 'BAD_OBJECT_ID');
     if (typeof size !== 'number' || size < 0) {
         throw new RpcError('BAD_REQUEST', 'size must be a positive number');
     }
     const obj = { _id: obj_id, size };

     // Check if the requesting account is authorized to read the object
-    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + obj.key, undefined)) {
+    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + key, undefined)) {
         throw new RpcError('UNAUTHORIZED', 'requesting account is not authorized to read the object');
```
Don’t authorize read_object_mapping against a caller-supplied key alone.
Lines 697-705 no longer verify that obj_id actually belongs to the requested bucket/key/system before the permission check. Since obj_id and size are already exposed by metadata/list responses, a caller can pair an allowed key with another object’s obj_id and fetch that other object’s mapping. Please re-bind the ID to the real object row (or enforce bucket/key predicates in the mapping query) and use its stored size before calling map_reader.
🛡️ Proposed fix
```diff
     load_bucket(req);
     const obj_id = get_obj_id(req, 'BAD_OBJECT_ID');
     if (typeof size !== 'number' || size < 0) {
-        throw new RpcError('BAD_REQUEST', 'size must be a positive number');
+        throw new RpcError('BAD_REQUEST', 'size must be a non-negative number');
     }
-    const obj = { _id: obj_id, size };
+    const obj = await MDStore.instance().find_object_by_id(obj_id);
+    check_object_mode(req, obj, 'NO_SUCH_OBJECT');
-    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + key, undefined)) {
+    if (!await req.has_s3_bucket_permission(req.bucket, 's3:GetObject', '/' + obj.key, undefined)) {
         throw new RpcError('UNAUTHORIZED', 'requesting account is not authorized to read the object');
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/object_services/object_server.js` around lines 696 - 705, The code
currently trusts caller-supplied obj_id/size before permission and mapping,
allowing ID/key mismatch attacks; after load_bucket and get_obj_id but before
calling req.has_s3_bucket_permission or map_reader/read_object_mapping, look up
the object row by its identifier with bucket and key predicates (e.g., query
where _id = obj_id AND bucket = req.bucket AND key = '/' + key or equivalent),
rebind obj_id and replace size with the stored size from that row (or return
NOT_FOUND if no matching row), then perform the authorization check
(req.has_s3_bucket_permission) and only call map_reader/read_object_mapping
using the verified object row and stored size.
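The verified-lookup flow described in the prompt can be sketched as follows. Here `find_object` is a hypothetical stand-in for an MDStore query with bucket/key predicates, and the error shape is illustrative rather than NooBaa's actual `RpcError`.

```javascript
// Bind the caller-supplied obj_id to the real object row before any
// authorization or mapping work: id, bucket AND key must all match.
async function load_verified_object({ obj_id, bucket, key }, find_object) {
    const obj = await find_object({ _id: obj_id, bucket, key });
    if (!obj) {
        const err = new Error('no such object for the given bucket/key');
        err.rpc_code = 'NO_SUCH_OBJECT';
        throw err;
    }
    // Use the stored size, never the caller-supplied one.
    return { _id: obj._id, key: obj.key, size: obj.size };
}
```

Only after this lookup would the permission check and `map_reader.read_object_mapping` run, so a caller cannot pair an allowed key with another object's `obj_id`.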
151df05 to ea4f625
Signed-off-by: nadav mizrahi <[email protected]>
ea4f625 to 6e28f77
Describe the Problem
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
Breaking Changes
Tests