Add new rego function for blob files #3125
Conversation
The Konflux failure seems unrelated, I can see this:
Force-pushed: 12b88a8 → 06a6d9d
Codecov Report ❌ Patch coverage is …
... and 1 file with indirect coverage changes
Force-pushed: 048bb45 → a231832
Review Summary by Qodo
Add ec.oci.blob_files rego function for blob tar archive extraction

Walkthrough
• Add new ec.oci.blob_files rego function to extract structured files from OCI blob tar archives
• Support YAML and JSON file parsing within blob archives for policy checks
• Refactor blob digest verification to be optional for compressed layer handling
• Add comprehensive unit tests and documentation for the new function

Diagram
flowchart LR
A["OCI Blob Reference"] -->|ec.oci.blob_files| B["Tar Archive Reader"]
B -->|Extract Files| C["YAML/JSON Parser"]
C -->|Convert to JSON| D["Structured File Object"]
E["File Paths Array"] -->|Filter| B
D -->|Cache Result| F["Return to Rego Policy"]
File Changes
1. internal/rego/oci/oci.go
Code Review by Qodo
1. Digest verification disabled
Force-pushed: a231832 → ee25133
I'm burning some tokens to get above the CodeCov threshold. New revision coming soon.
Force-pushed: ee25133 → 075ae19
Review Summary by Qodo
Add ec.oci.blob_files rego function with defensive limits and comprehensive tests

Walkthrough
• Add new ec.oci.blob_files rego function to extract structured files from blob tar archives
• Implement defensive size limits (500MB) to prevent memory exhaustion attacks
• Refactor blob digest verification logic to support both compressed and uncompressed layers
• Distinguish cache keys between blob and image file operations to avoid theoretical clashes
• Add comprehensive unit tests covering success cases, error handling, and edge cases
• Add make target for automated Go code formatting with gofmt -w

Diagram
flowchart LR
A["OCI Blob Reference"] -->|ociBlobFiles| B["Tar Archive Reader"]
B -->|Extract Files| C["Path Matching"]
C -->|Size Validation| D["YAML/JSON Parsing"]
D -->|Cache| E["Structured Files Object"]
F["Memory Limit 500MB"] -.->|Protection| D
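The flow in the diagram above can be sketched in plain Go. This is an illustrative, self-contained sketch (not the actual code in internal/rego/oci/oci.go): it walks a tar archive, keeps only entries whose paths match the requested list, and parses them into structured objects. Only JSON parsing is shown here; the real function also handles YAML, and the function and variable names below are assumptions for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"encoding/json"
	"fmt"
)

// extractBlobFiles is a hypothetical stand-in for the ec.oci.blob_files
// internals: scan a tar archive and parse entries matching targetPaths.
func extractBlobFiles(blob []byte, targetPaths []string) (map[string]any, error) {
	wanted := make(map[string]bool, len(targetPaths))
	for _, p := range targetPaths {
		wanted[p] = true
	}
	results := make(map[string]any)
	archive := tar.NewReader(bytes.NewReader(blob))
	for {
		hdr, err := archive.Next()
		if err != nil {
			break // io.EOF (or a read error) ends the walk
		}
		if hdr.Typeflag != tar.TypeReg || !wanted[hdr.Name] {
			continue
		}
		var parsed any
		if err := json.NewDecoder(archive).Decode(&parsed); err != nil {
			return nil, fmt.Errorf("parse %s: %w", hdr.Name, err)
		}
		results[hdr.Name] = parsed
	}
	return results, nil
}

func main() {
	// Build a tiny in-memory tar archive with one JSON file.
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	content := []byte(`{"kind":"Task"}`)
	tw.WriteHeader(&tar.Header{Name: "task.json", Mode: 0o644, Size: int64(len(content)), Typeflag: tar.TypeReg})
	tw.Write(content)
	tw.Close()

	files, _ := extractBlobFiles(buf.Bytes(), []string{"task.json"})
	fmt.Println(files["task.json"].(map[string]any)["kind"]) // Task
}
```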
File Changes
1. internal/rego/oci/oci.go
Code Review by Qodo
1. Unverified blob poisons cache
```go
// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
blobTerm, err := ociBlobInternal(bctx, refTerm, false)
if err != nil || blobTerm == nil {
```
1. Unverified blob poisons cache 🐞 Bug ⛨ Security
ociBlobFiles fetches blob content via ociBlobInternal(..., verifyDigest=false), but ociBlobInternal still stores the blob in cc.blobCache keyed only by refStr. Subsequent calls to ec.oci.blob for the same ref will be served from cache and skip digest verification, making results depend on evaluation order.
Agent Prompt
### Issue description
`ociBlobFiles` calls `ociBlobInternal(..., verifyDigest=false)` which still stores the blob into the shared `cc.blobCache` keyed only by `refStr`. If `ec.oci.blob` is later called for the same `refStr`, it can return the cached value without performing digest verification, making the result depend on the order of builtin evaluation.
### Issue Context
`cc.blobCache` is checked before any digest verification occurs, and `cc.blobCache.Store(refStr, term)` happens regardless of the `verifyDigest` flag.
### Fix Focus Areas
- internal/rego/oci/oci.go[440-444]
- internal/rego/oci/oci.go[512-532]
- internal/rego/oci/oci.go[1048-1050]
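One way to close the ordering hazard described above is to fold the verification state into the cache key itself, so an unverified fetch can never satisfy a later verified lookup. This is a minimal sketch under that assumption; the helper name and key format are hypothetical, not Conforma's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// blobCacheKey is a hypothetical helper: by encoding the verifyDigest flag
// into the key, ec.oci.blob (verified) and ec.oci.blob_files (unverified)
// can never share a cache entry, regardless of builtin evaluation order.
func blobCacheKey(refStr string, verified bool) string {
	if verified {
		return "verified::" + refStr
	}
	return "unverified::" + refStr
}

func main() {
	var cache sync.Map
	ref := "registry.local/repo@sha256:abc123"

	// An unverified fetch stores under its own key...
	cache.Store(blobCacheKey(ref, false), "blob-bytes")

	// ...so a later verified lookup misses, forcing a fresh fetch + digest check.
	_, hit := cache.Load(blobCacheKey(ref, true))
	fmt.Println(hit) // false
}
```

Other alternatives the bot's own guidance hints at (verified-only caching, re-verifying on cache hit, or separate caches) trade off differently; the key-prefix approach is the cheapest since it adds no extra verification work.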
Hmm, I guess we could check the digest after every cache fetch. Not sure how much I care about it.
Question:
Hmm, I guess we could check the digest after every cache fetch. Not sure how much I care about it.
Answer:
If you want Qodo Merge to be less “pushy” on items like this (where you may accept the risk / tradeoff), add pragmatic guidance to the improve suggestions prompt.
Put this in your .pr_agent.toml (or wiki config) to make security/cache-order findings come with a brief risk assessment + options, and to avoid insisting on fixing low-severity issues:
[pr_code_suggestions]
extra_instructions = """\
- When raising Security/Bug findings, include: (a) realistic exploitability, (b) likelihood, (c) impact, and (d) whether results depend on evaluation order.
- If a fix has performance/caching tradeoffs (e.g., verifying digest after every cache hit), propose at least 2 alternatives (e.g., cache keys include verifyDigest/digest; store verified-only; verify on cache fetch; separate caches) and clearly label pros/cons.
- Do not insist on changing behavior when the issue is low likelihood/low impact; explicitly mark it as "optional hardening" unless you can justify a concrete exploit path.
- Prefer concise recommendations and avoid repeating the same concern across multiple suggestions.
"""

If you meant this should never be flagged, you can further refine the instruction to treat “optional hardening” items as low-priority (or increase your filtering using pr_code_suggestions.suggestions_score_threshold), but you’ll need to decide your team’s acceptable risk level to tune that accurately.
Relevant Sources:
- https://docs.qodo.ai/qodo-documentation/code-review/qodo-merge/tools/improve#extra-instructions-and-best-practices
- https://docs.qodo.ai/qodo-documentation/code-review/qodo-merge/tools/improve#extra-instructions
- https://docs.qodo.ai/qodo-documentation/code-review/qodo-merge/usage-guide/additional_configurations#extra-instructions
- https://docs.qodo.ai/qodo-documentation/code-review/qodo-merge/usage-guide/configuration_options#local-configuration-file
```go
// Get the blob content first (skip digest verification due to compressed/uncompressed mismatch)
blobTerm, err := ociBlobInternal(bctx, refTerm, false)
if err != nil || blobTerm == nil {
	logger.WithFields(log.Fields{
		"action": "fetch blob",
		"error":  err,
	}).Error("failed to fetch blob content")
	return nil, nil //nolint:nilerr
}

blobContent, ok := blobTerm.Value.(ast.String)
if !ok {
	logger.Error("blob content is not a string")
	return nil, nil
}

pathsArray, err := builtins.ArrayOperand(pathsTerm.Value, 1)
if err != nil {
	logger.WithFields(log.Fields{
		"action": "convert paths",
		"error":  err,
	}).Error("failed to convert paths to array operand")
	return nil, nil //nolint:nilerr
}

// Collect target paths for exact file matching
var targetPaths []string
err = pathsArray.Iter(func(pathTerm *ast.Term) error {
	pathString, ok := pathTerm.Value.(ast.String)
	if !ok {
		return fmt.Errorf("path is not a string: %#v", pathTerm)
	}
	targetPaths = append(targetPaths, string(pathString))
	return nil
})
if err != nil {
	logger.WithFields(log.Fields{
		"action": "iterate paths",
		"error":  err,
	}).Error("failed iterating paths")
	return nil, nil //nolint:nilerr
}

if len(targetPaths) == 0 {
	logger.Debug("No paths specified, returning empty result")
	term := ast.NewTerm(ast.NewObject())
	cc.imageFilesCache.Store(cacheKey, term)
	return term, nil
}

// Create a tar reader from the blob content
blobReader := strings.NewReader(string(blobContent))
archive := tar.NewReader(blobReader)
```
2. Buffers entire layer in memory 🐞 Bug ⛯ Reliability
ociBlobFiles fetches the whole uncompressed blob into memory (via ociBlobInternal), then constructs a tar.Reader over the in-memory string. This defeats the per-entry size guard (maxTarEntrySize) because the process can OOM while buffering/decompressing the full layer before any tar-entry checks run.
Agent Prompt
### Issue description
`ociBlobFiles` currently loads the entire uncompressed layer into memory (via `ociBlobInternal`) before parsing it as a tar archive. This can OOM on large layers even if individual file entries are limited.
### Issue Context
The per-entry `maxTarEntrySize` check happens after the entire blob has already been copied into a `bytes.Buffer`.
### Fix Focus Areas
- internal/rego/oci/oci.go[473-532]
- internal/rego/oci/oci.go[1048-1101]
- internal/rego/oci/oci.go[1143-1163]
Yeah, this is true. I don't want to fix it, but I'll add a comment.
st3penta left a comment
There was a problem hiding this comment.
LGTM, I just left a comment about a possible duplicate
A couple of Qodo suggestions make sense. I'm pushing a couple more tweaks.
I guess our Renovate PRs don't do this. Unrelated cleanup while working on the PR for.... Ref: https://issues.redhat.com/browse/EC-1681
The goal is to be able to access the task definitions inside a Tekton task bundle, so we can then apply task definition related policy checks to them, but I think this is a generally useful addition to our custom oci functions. Ref: https://issues.redhat.com/browse/EC-1681 Co-authored-by: Claude Code <[email protected]>
Introduce a limit to avoid potential memory exhaustion attacks. Honestly I'm not sure how important this is, but I guess it's fine. It was suggested in code review by Qodo. We actually did have a limit in the past, but it was too small and caused problems with large SBOMs. I hesitated to bring back a limit; however, the context is a little different, ociBlob vs ociBlobFile, and I'm making the limit 500MB instead of 10MB. Ref: https://issues.redhat.com/browse/EC-1681 Co-authored-by: Claude Code <[email protected]>
Multiple LLMs are worried about using the same cache key for two potentially different things. Let's make it clear that it's not a problem. In reality I don't think it could be a problem, since the cache key includes the image ref, which in the case of a blob will always have a digest. But, I figure there's no harm in making the code intent clear, and avoid having to explain that subtle detail to the bots. Ref: https://issues.redhat.com/browse/EC-1681 Co-authored-by: Claude Code <[email protected]>
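The clash-avoidance this commit describes amounts to prefixing the cache key with the operation type, so image-file and blob-file lookups for the same ref string occupy disjoint keyspaces even in theory. A tiny sketch with hypothetical names:

```go
package main

import "fmt"

// filesCacheKey is an illustrative helper: the operation prefix guarantees
// image-file and blob-file entries for the same ref can never collide.
func filesCacheKey(op, refStr, paths string) string {
	return op + "::" + refStr + "::" + paths
}

func main() {
	ref := "registry.local/repo@sha256:abc123"
	imgKey := filesCacheKey("image-files", ref, "manifests/task.yaml")
	blobKey := filesCacheKey("blob-files", ref, "manifests/task.yaml")
	fmt.Println(imgKey != blobKey) // true
}
```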
Sometimes Claude doesn't format everything so I want an easy way to fix the formatting. There might be a better way to do it (and maybe make lintfix will do it?), but this works for me. Unrelated tweak while working on... Ref: https://issues.redhat.com/browse/EC-1681
Force-pushed: 075ae19 → 3c0f194
Remove the snake case vars, and rename the cache vars to indicate they're used for image files and blob files now. Also added a comment about the limiting. The other Qodo suggestions I think I'm comfortable ignoring for now.
Well the bot wanted the limit checks, then the bot pointed out that they aren't very effective. I don't want to add limit checks on all the oci fetches, but I also don't want to re-remove what we have here, so let's at least point out that its value is questionable. Ref: https://issues.redhat.com/browse/EC-1681
Since we're using the cache for both image files and blob files, let's stop calling it imageFilesCache. Same thing for single flight. Suggested by Qodo in code review. Co-authored-by: Claude Code <[email protected]> Ref: https://issues.redhat.com/browse/EC-1681
Force-pushed: 3c0f194 → 46155ec
The goal is to be able to access the task definitions inside a Tekton task bundle, so we can then apply the task related policy checks to them.
Ref: https://issues.redhat.com/browse/EC-1681