feat: add support for preserving and labeling intermediate stage images#6556
feat: add support for preserving and labeling intermediate stage images#6556ezopezo wants to merge 1 commit intocontainers:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ezopezo The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Ephemeral COPR build failed. @containers/packit-build please check. |
59cd9ae to
6bd3187
Compare
b7e81df to
3314051
Compare
|
/retest |
|
@ezopezo: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@nalind can you please take a look and put ok-to-test label? It seems to me that tests are failing most likely with some timeouts and thus I would like to try to re-run them (or please tell me what I just broke :) ). |
|
/ok-to-test |
|
/test |
|
@ezopezo: No presubmit jobs available for containers/buildah@main DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
3314051 to
3955b20
Compare
|
@nalind @mtrmac @TomSweeneyRedHat can you please take a look on this? (or pick up some appropriate reviewers?) Thanks in advance! |
|
@flouthoc if you have time |
|
|
||
| buildah build --layers -t imageName . | ||
|
|
||
| buildah build --cache-stages --stage-labels -t imageName . |
There was a problem hiding this comment.
just to show the bool values in play, perhaps
| buildah build --cache-stages --stage-labels -t imageName . | |
| buildah build --cache-stages false --stage-labels false -t imageName . |
There was a problem hiding this comment.
I skipped that since this is a relatively uncommon use case for boolean flags, because the false behavior is implicit whenever the flag is absent. Other flags such as --no-cache or --layers follow the same pattern (present when true, absent when not, without presence of the --layers false) so I wanted to stay consistent with your docs. If you feel strongly about it, I certainly can add the examples :)
There was a problem hiding this comment.
There's a long-standing bit of confusion in the man pages where, even though the boolean argument values are optional, we don't suggest that they always have to be supplied in the --flag=value form, with an equal sign, to prevent them from being treated as unrelated arguments.
There was a problem hiding this comment.
@nalind Does it make a sense to you to add args with equal sign and boolean or leave it as it is? What do you think?
There was a problem hiding this comment.
If we want an example where the flag is set to false, the equal sign is going to be necessary.
| Find an intermediate image for a specific stage name: | ||
|
|
||
| buildah images --filter "label=io.buildah.stage.name=builder" | ||
|
|
pkg/cli/build.go
Outdated
| LayerLabels: iopts.LayerLabel, | ||
| Layers: layers, | ||
| CacheStages: iopts.CacheStages, | ||
| StageLabels: iopts.StageLabels, |
There was a problem hiding this comment.
Please add these three options in alphabetically. I.e. CacheStages at line 396.5 and so on.
pkg/cli/common.go
Outdated
| Format string | ||
| From string | ||
| Iidfile string | ||
| BuildIDFile string |
There was a problem hiding this comment.
To keep making my OCD happy, please move this to line 63.5
pkg/cli/common.go
Outdated
| Layers bool | ||
| ForceRm bool | ||
| Layers bool | ||
| CacheStages bool |
There was a problem hiding this comment.
More OCD alpha changes, please put this at line 28.5
|
|
||
| run_buildah rmi --all | ||
| } | ||
|
|
There was a problem hiding this comment.
can you update and/or add tests to use "true" and "false" for the boolean flags? Leave at least one using the default true for each too.
There was a problem hiding this comment.
For false I added two new tests for --cache-stages and --stage-labels and for true I updated one existing test 👍
6b2baab to
3272226
Compare
Does it make a sense to add an mutual exclusion condition that --layers and --cache-stages cannot be used together (as serve fundamentally different purposes)? I want to keep this functionality as isolated as possible from others to avoid unnecessary complexities like this. |
|
@nalind Would you be able to take an another look please? Thanks in advance! |
|
|
Yes, it works well |
|
@nalind Also I almost forgot to ask - how I can effectively test if current changes are compatible with podman?
|
Since we're on |
0fee359 to
a54b1ee
Compare
1bfc86d to
89f7b79
Compare
@nalind Thank you I did it according your guide and created this testing PR. I think it is failing because of the docs mismatch and exceeding some limit of the binary size (which could be overridden by label I think) and also I think I need from maintainers approval for running all of the jobs, could you help me with this please? |
|
I've given it a kick. |
ea82074 to
3818667
Compare
The configuration for the bot which handled those things was updated to not drop most containers-org repositories by openshift/release#71397; anything that was in-progress before it was merged would still include a record of previous interactions with the bot. |
I had to think about this for a bit, and I'm not sure it's what we want. Much like earlier versions of For the Would this use case be better served by doing something similar? The |
I may be misunderstanding this concern. Our implementation explicitly disables cache lookup when --cache-stages is used (stage_executor.go:1260-1262): This ensures every build with --cache-stages is a fresh build, and caches new intermediate images with a new build UUID. Cache evaluation logic (search) for stages doesn't run at all when --cache-stages is used. Regarding cache reuse scenarios:
Scenario 2:
Scenario 3 (considering that podman has always
However, our actual use case (Konflux SBOM generation) will use --cache-stages without --layers, so the layered cache scenario isn't relevant for us.
*I see the architectural similarity, but there's a practical difference IMHO:
Given that cache is disabled when --cache-stages is used, is there still a concern about labels (added by --stage labels, that cannot be used without --cache-stages) not being in build history? Could you clarify what specific scenario you're concerned about where the current approach would fail? |
3818667 to
cf654f6
Compare
This adds support for preserving and labeling intermediate stage images in multi-stage builds. In contrast to the --layers flag, --cache-stages preserves only the final image from each named stage (FROM ... AS name), not every instruction layer. This also keeps the final image's layer count unchanged compared to a regular build. --layers and --cache-stages can be used together. When --cache-stages is enabled with --layers together, all other subsequent builds without --cache-stages can use cached layers. When --cache-stages is enabled (with or without --layers) checkForLayers is set to false, blocking cache lookup (fresh build every time with --cache-stages). New flags: - --cache-stages: preserve intermediate stage images instead of removing them - --stage-labels: add metadata labels to intermediate stage images (stage name, base image, build ID, parent stage name). Requires --cache-stages. - --build-id-file: write unique build ID (UUID) to file for easier identification and grouping of intermediate images from a single build. Requires --stage-labels. The implementation also includes: - Detection of the pattern when stage is using another intermediate stage as base (indicated by labels) - Validation that --stage-labels requires --cache-stages - Validation that --build-id-file requires --stage-labels - Test coverage and documentation updates This functionality is useful for debugging, exploring, and reusing final intermediate stage images from multi-stage builds. Signed-off-by: Erik Mravec <emravec@redhat.com>
cf654f6 to
7031351
Compare
@nalind I understand, thank you for explanation! I am just repushing commit with chnages to trigger pipeline - is it common that I am getting so much failures from timeouts? It is very hard for me to get all-green (I wittnessed it few times) :| |
Yes, I was hoping to be able to remove this special-case check.
Your understand matches mine, yes.
There's really nothing enforcing that distinction, though, and I would advise against assuming otherwise when using the produced images.
Throwing the stage labels into the history makes the new feature's interaction with caching much easier to reason about - based on where in the parsed tree the new labels are added, we can predict how the produced images will interact with the cache evaluation logic in future builds, and I value that. |
This adds support for preserving and labeling intermediate stage images in multi-stage builds. In contrast to the
--layersflag,--cache-stagespreserves only the final image from each named stage (FROM ... AS name), not every instruction layer. This also keeps the final image's layer count unchanged compared to a regular build.New flags:
--cache-stages: preserve intermediate stage images instead of removing them--stage-labels: add metadata labels to intermediate stage images (stage name, base image, build ID, parent stage name). Requires--cache-stages.--build-id-file: write unique build ID (UUID) to file for easier identification and grouping of intermediate images from a single build. Requires--stage-labels.The implementation also includes:
--stage-labelsrequires--cache-stages--build-id-filerequires--stage-labelsWhat type of PR is this?
/kind feature
What this PR does / why we need it:
General use: This functionality is useful for identification, debugging, and reusing intermediate stage images in multi-stage builds.
Specific need: Identifying the content copied from intermediate stages in multi-stage builds into the final image is a hard requirement for supporting Contextual SBOM - an SBOM that understands the origin of each component.
While intermediate images can be extracted using the
--layersoption, this approach has several issues for our use case:buildah images --all), which introduces unnecessary noise for our purposes.--layers), meaning:Related repositories:
konflux (uses mobster for SBOM generation),
mobster (implements contextual SBOM functionality requiring this change),
capo (wraps builder content identification functionality for mobster),
Contact person: emravec (RedHat) / @ezopezo (Github)
How to verify it
Run any multistage build with intermediate stage specified with implemented arguments. Resulting intermediate images should be correctly labeled. Example:
buildah build --cache-stages --stage-labels --build-id-file ./file.txt -t test:0.1 .Which issue(s) this PR fixes:
Fixes: #6257
Internal Jira: https://issues.redhat.com/browse/ISV-6122
Does this PR introduce a user-facing change?