Add support for memory pages compression #2895
Open
rst0git wants to merge 17 commits into
avagin reviewed Feb 18, 2026
Pull request overview
This PR adds optional LZ4-based compression for memory page images (pages.img) in CRIU. It records per-page compressed sizes in the pagemap image so that restore can locate and decompress pages correctly (including for streaming, pre-dump chains, and page-server flows).
Changes:
- Introduces --compress/RPC support and persists the setting in inventory images.
- Extends pagemap images with compressed_size[] and total_compressed_size, and updates dump/restore page I/O paths (including a helper daemon for PIE restore).
- Updates ZDTM and CI scripts to exercise compressed dumps/restores.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| criu/compression.c | New LZ4 compression/decompression + helper daemon for restore-time decompression. |
| criu/include/compression.h | Public compression API and page-size bound macro. |
| criu/include/cr_options.h | Adds pages_compression option. |
| criu/config.c | Adds -c/--compress option parsing. |
| criu/cr-service.c | Wires RPC option to enable compression. |
| criu/crtools.c | Adds CLI help text for --compress (under CONFIG_LZ4). |
| criu/page-xfer.c / criu/include/page-xfer.h | Implements compressed write path (local + page-server receive-side buffering). |
| criu/pagemap.c / criu/include/pagemap.h | Implements compressed read paths (local + streaming) and carries compressed metadata into restorer args. |
| criu/mem.c | Starts helper daemon and passes pipe fds to restorer. |
| criu/pie/restorer.c / criu/include/restorer.h | Adds compressed restore path via pipe protocol to helper daemon. |
| criu/cr-restore.c | Fixes up restorer pointers for compressed_size arrays. |
| criu/image.c | Persists compression setting in inventory.img and enables it on restore when present. |
| images/pagemap.proto | Adds compressed_size[] and total_compressed_size. |
| images/inventory.proto | Adds pages_compression to inventory entry. |
| images/rpc.proto | Adds RPC compress boolean option. |
| criu/unittest/unit.c / criu/Makefile* | Adds unit test coverage and build integration for compression module. |
| test/zdtm.py / scripts/ci/run-ci-tests.sh | Adds --compress wiring and CI test runs. |
| Makefile.config / dependency scripts | Adds LZ4 feature detection and distro package dependencies. |
| Documentation/criu.txt | Documents --compress. |
| contrib/criu-compression-benchmark.py | Adds benchmarking script for compression impact. |
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:

    @@              Coverage Diff               @@
    ##           criu-dev    #2895      +/-   ##
    ============================================
    - Coverage     57.26%   56.59%    -0.67%
    ============================================
      Files           154      156        +2
      Lines         40444    41666     +1222
      Branches       8866     9146      +280
    ============================================
    + Hits          23161    23582      +421
    - Misses        17019    17820      +801
      Partials        264      264

        total_blocks = nr_pages;
    }

    if ((uint64_t)total_blocks * sizeof(uint32_t) > SIZE_MAX / 2) {

    try:
        with open(p, "rb") as f:
            chunks.append(f.read())
    except OSError:
Add build system plumbing for LZ4 compression. When liblz4 is found via pkg-config, CONFIG_LZ4 is defined and the library is linked.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add the protobuf fields used to encode memory page compression both in images and on the wire.
- (inventory) uint32 compress: compression mode for the dump, encoded with enum compress_mode values: 0 = off, 1 = per-page, 2 = region. Lets the restore side detect and reproduce the compression encoding automatically.
- (pagemap) repeated uint32 compressed_size: per-block compressed size array. Each value is the number of bytes the compressed block occupies in the pages image. In per-page mode each block is one page; in region mode each block covers up to region_pages consecutive pages. Sentinel values: 0 = all-zero block (no payload is stored); a value equal to the block size in bytes = stored raw (no decompression needed); anything else = LZ4-compressed block of that size.
- (pagemap) uint64 total_compressed_size: sum of compressed_size[]. Used to size the read in one pread(); uint64 is needed because a single pagemap entry can cover millions of pages and the sum can exceed 4 GiB.
- (pagemap) uint32 region_pages: number of pages per compressed block in region mode. Absent or 0 means per-page compression.
- (rpc) uint32 compress: same encoding as the inventory field.
- (rpc) uint32 compress_acceleration: LZ4 acceleration value.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
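The sentinel rules for compressed_size[] can be illustrated with a small decoder. This is a minimal sketch, not CRIU's code: it assumes the reader already knows each block's size in bytes (one page in per-page mode, the block's page count times the page size in region mode). The names block_kind, classify_block() and sum_compressed() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of one compressed_size[] entry. */
enum block_kind {
	BLOCK_ZERO, /* 0: all-zero block, no payload in pages.img */
	BLOCK_RAW,  /* equal to the block size: stored uncompressed */
	BLOCK_LZ4,  /* anything else: LZ4 payload of that many bytes */
};

static enum block_kind classify_block(uint32_t csize, uint32_t block_bytes)
{
	assert(block_bytes > 0);
	if (csize == 0)
		return BLOCK_ZERO;
	if (csize == block_bytes)
		return BLOCK_RAW;
	return BLOCK_LZ4;
}

/*
 * Sum per-block sizes into a 64-bit total. With 4 KiB pages, 2^20 raw
 * pages already add up to 4 GiB, one more than UINT32_MAX, which is why
 * total_compressed_size needs to be uint64.
 */
static uint64_t sum_compressed(const uint32_t *csize, size_t n_blocks)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n_blocks; i++)
		total += csize[i];
	return total;
}
```

Because a zero block contributes nothing to the sum, the total is exactly the number of payload bytes a reader has to pread() for the entry.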
Add compression.h with the public helpers used by the dump and
restore paths:
- compress_data() / decompress_data(): per-page LZ4 round-trip.
- compress_region() / decompress_region(): multi-page region LZ4
round-trip with built-in zero-region detection (returns 0) and a
store-raw fallback (returns block_bytes) when the region does
not compress below REGION_COMPRESSION_THRESHOLD.
- page_is_all_zero(): fast zero-page detection using unsigned long
comparison, mirroring is_folio_zero_filled() in the kernel.
The header also exports:
- enum compress_mode { COMPRESS_OFF, COMPRESS_PER_PAGE,
COMPRESS_REGION }.
- PAGE_COMPRESSED_SIZE_BOUND, REGION_COMPRESSED_SIZE_BOUND(n_pages)
-- LZ4 worst-case output size for one page or for a region of
n_pages pages.
- PAGE_COMPRESSION_THRESHOLD,
REGION_COMPRESSION_THRESHOLD(region_bytes) -- store-raw thresholds.
- LZ4_DEFAULT_ACCELERATION, LZ4_MAX_ACCELERATION.
- MAX_REGION_PAGES (1024), DEFAULT_REGION_PAGES (64).
Stubs are provided for builds without CONFIG_LZ4.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
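Two of the helpers listed above can be sketched in a few lines: a word-at-a-time zero check in the spirit of page_is_all_zero(), and the LZ4 worst-case bound. This is an illustration under assumptions, not the actual header. The bound below copies the formula of LZ4_COMPRESSBOUND() from lz4.h (minus its maximum-input check); that PAGE_COMPRESSED_SIZE_BOUND and REGION_COMPRESSED_SIZE_BOUND(n) reduce to this formula is an assumption. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Zero check comparing one unsigned long at a time. Assumes len is a
 * multiple of sizeof(unsigned long) and buf is suitably aligned, which
 * holds for page-aligned buffers.
 */
static bool page_is_all_zero_sketch(const void *buf, size_t len)
{
	const unsigned long *w = buf;
	size_t n = len / sizeof(*w);

	assert(len % sizeof(*w) == 0);
	for (size_t i = 0; i < n; i++)
		if (w[i])
			return false;
	return true;
}

/* Same arithmetic as LZ4_COMPRESSBOUND(isize) for valid input sizes. */
static size_t compress_bound_sketch(size_t isize)
{
	return isize + isize / 255 + 16;
}
```

With 4 KiB pages the bound is 4128 bytes for one page and 263188 bytes for a 64-page region, so a restore-side check can reject any compressed_size above these values as corrupt.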
Add CLI options to enable memory page compression and the
corresponding feature check used to gate the LZ4 build.
-c, --compress enable per-page LZ4 compression
--compress-region SIZE enable region LZ4 compression with
the given region size; SIZE accepts
K/M/G suffixes (e.g. 256K, 1M)
--compress-acceleration N LZ4 acceleration; implies --compress
if no other mode is set
criu check --feature compress
The selected mode is stored in opts.compress_mode (enum compress_mode
value) and persisted in the inventory image so that the restore
side detects the encoding automatically. When CRIU is built without
CONFIG_LZ4, the option is rejected early in check_options() with a
clear error message. --compress-region is also rejected when used
with --page-server or --stream, because those wire formats are
per-page only.
The RPC interface accepts the same options via the compress,
compress_acceleration and compress_region_size fields.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
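The --compress-region SIZE argument has to be turned into a page count. The sketch below shows one way to do that with strtoull() and a K/M/G suffix; it is hypothetical, not CRIU's parser. Requiring SIZE to be a whole number of pages is an assumption; the 1024-page cap is the MAX_REGION_PAGES value given for compression.h.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_REGION_PAGES 1024 /* value from the compression.h description */

/*
 * Hypothetical parser: "256K", "1M", ... -> pages per region.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_region_size(const char *s, unsigned long page_size,
			     unsigned int *region_pages)
{
	char *end;
	unsigned long long bytes, pages;
	unsigned int shift = 0;

	assert(page_size > 0);
	errno = 0;
	bytes = strtoull(s, &end, 10);
	if (errno || end == s)
		return -1;

	switch (*end) {
	case 'K': shift = 10; end++; break;
	case 'M': shift = 20; end++; break;
	case 'G': shift = 30; end++; break;
	default: break;
	}
	/* reject trailing junk, zero, and values that would overflow */
	if (*end != '\0' || bytes == 0 || bytes > (ULLONG_MAX >> shift))
		return -1;
	bytes <<= shift;

	if (bytes % page_size)
		return -1; /* assumption: a region is a whole number of pages */
	pages = bytes / page_size;
	if (pages > MAX_REGION_PAGES)
		return -1;
	*region_pages = (unsigned int)pages;
	return 0;
}
```

With 4 KiB pages, 256K maps to 64 pages (DEFAULT_REGION_PAGES), and 4M is the largest accepted size.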
Add the local-image and page-server write paths for memory page compression.
write_pagemap_loc_compressed() and write_pages_loc_compressed() buffer per-block compressed sizes into pending_pe and flush a PagemapEntry once all blocks of an iovec have been compressed. The loop body is parameterised on pending_pe.region_pages: when 0, each page is compressed independently; when non-zero, pages are accumulated into regions of region_pages and compressed as a single LZ4 block. Zero pages and zero regions are stored with compressed_size=0 (no image payload); blocks that do not compress below the 7/8 store-raw threshold are written verbatim.
For the page server, add PS_IOV_ADD_F_COMPRESSED and write_pages_to_server_compressed(): pages are compressed before being sent over the network, and the receiver writes the compressed bytes to the local image without re-compressing. write_fd_full() handles short writes on the pages image.
close_page_xfer() frees pending_pe.compressed_size on error paths; the pointer is initialised to NULL, so the free is a no-op when the compressed branch is never used.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
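The per-iovec block layout and the store-raw decision described in this commit can be sketched as follows. The function names are hypothetical, and the exact comparison used for the 7/8 threshold is an assumption (LZ4 output is kept only if it is strictly below 7/8 of the block). An lz4_len of 0 stands for a failed compression call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blocks needed for nr_pages pages; region_pages == 0 means per-page mode. */
static size_t nr_blocks(size_t nr_pages, uint32_t region_pages)
{
	if (region_pages == 0)
		return nr_pages;
	return (nr_pages + region_pages - 1) / region_pages;
}

/* Page count of block idx; only the last block of an entry may be short. */
static uint32_t block_pages(size_t nr_pages, uint32_t region_pages, size_t idx)
{
	size_t first = idx * (region_pages ? region_pages : 1);
	size_t left;

	assert(first < nr_pages);
	if (region_pages == 0)
		return 1;
	left = nr_pages - first;
	return left < region_pages ? (uint32_t)left : region_pages;
}

/*
 * Value to record in compressed_size[] for one block: 0 for an all-zero
 * block, block_bytes when storing raw, otherwise the LZ4 length.
 */
static uint32_t block_size_entry(int all_zero, uint32_t lz4_len,
				 uint32_t block_bytes)
{
	if (all_zero)
		return 0;
	if (lz4_len == 0 || (uint64_t)lz4_len * 8 >= (uint64_t)block_bytes * 7)
		return block_bytes; /* store raw */
	return lz4_len;
}
```

One side effect of the threshold is that a stored LZ4 length can never equal the block size, so the raw sentinel stays unambiguous.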
Add comments to the page_read function pointers and data fields. No functional changes.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The PIE restorer cannot link against LZ4, so a helper daemon process handles decompression. The daemon is forked in prepare_vma_ios() and communicates with the restorer over a pair of pipes.
Wire protocol header (struct pipe_hdr):
  pid_t remote_pid;
  off_t offs;                      /* file offset in pages.img */
  uint64_t total_compressed_size;
  int n_pages;                     /* total pages in request */
  int nr_iovs;                     /* number of destination iovecs */
  int n_blocks;                    /* count of compressed_size[] */
  uint32_t region_pages;           /* 0 = per-page, >0 = region */
After the header come compressed_size[n_blocks]; in region mode the daemon then reads block_pages[n_blocks] (uint16 per block), giving each block's actual page count (the last block of an entry may be shorter than region_pages). The remote-destination iovs[nr_iovs] follow last.
The daemon reads compressed data with a single pread() per request, decompresses block-by-block (one page in per-page mode, up to region_pages pages in region mode), and writes the result into the target process via process_vm_writev(). Zero pages are not written at all: the target process VMAs are MAP_ANONYMOUS, so unwritten pages remain on the kernel zero page and do not consume physical memory.
The decompression buffer is allocated with mmap(MAP_ANONYMOUS) and marked MADV_HUGEPAGE to enable the fast GUP path in process_vm_writev() and to reduce TLB misses. MADV_DONTNEED re-zeros the buffer between requests. posix_fadvise(FADV_DONTNEED) is called after each batch read to release the page cache held by already-read compressed data.
Per-block compressed sizes (and per-block page counts in region mode) are validated against the corresponding bounds before use to prevent out-of-bounds reads from corrupted images. Negative n_pages/nr_iovs/n_blocks values are rejected. The process_vm_writev() iovec count is capped at IOV_MAX per call. Pipe I/O uses pipe_write_full()/pipe_read_full() in the PIE restorer and read_full() in the daemon to handle short reads and writes on pipe buffer boundaries.
The daemon PID is stored in decompress_daemon_pid in task_restore_args instead of being appended to the helpers array, which would corrupt the array built by collect_helper_pids(). The restorer waits for the daemon explicitly after closing the pipes.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
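A header like struct pipe_hdr arrives from another process, so the daemon should sanity-check it before sizing any reads. The sketch below reuses the field list from the commit message; pipe_hdr_valid() and nr_writev_calls() are hypothetical. Only the negative-count check is stated in the commit; the block-count consistency rules are assumptions derived from the protocol description (one block per page in per-page mode, 1..region_pages pages per block in region mode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Field list as given in the commit message. */
struct pipe_hdr {
	pid_t remote_pid;
	off_t offs;                     /* file offset in pages.img */
	uint64_t total_compressed_size;
	int n_pages;                    /* total pages in request */
	int nr_iovs;                    /* number of destination iovecs */
	int n_blocks;                   /* count of compressed_size[] */
	uint32_t region_pages;          /* 0 = per-page, >0 = region */
};

#define MAX_REGION_PAGES 1024

/* Hypothetical check run before any allocation sized from the header. */
static int pipe_hdr_valid(const struct pipe_hdr *h)
{
	uint64_t min_blocks;

	if (h->n_pages < 0 || h->nr_iovs < 0 || h->n_blocks < 0)
		return 0;
	if (h->region_pages > MAX_REGION_PAGES)
		return 0;
	if (h->region_pages == 0)
		return h->n_blocks == h->n_pages; /* one block per page */
	/* each block holds between 1 and region_pages pages */
	min_blocks = ((uint64_t)h->n_pages + h->region_pages - 1) / h->region_pages;
	return (uint64_t)h->n_blocks >= min_blocks && h->n_blocks <= h->n_pages;
}

/* process_vm_writev() calls needed when each call takes at most iov_max iovecs. */
static int nr_writev_calls(int nr_iovs, int iov_max)
{
	assert(nr_iovs >= 0 && iov_max > 0);
	return (nr_iovs + iov_max - 1) / iov_max;
}
```

The region-mode check uses a range rather than an exact count because a request may span several pagemap entries, each of which can end in a short block.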
Add maybe_read_page_local_compressed() and maybe_read_page_img_streamer_compressed() for restoring compressed pages from local images and streaming pipes, respectively. Both readers fall back to the uncompressed path when a pagemap entry has no compressed_size array, which happens with shared memory pagemaps or entries from uncompressed parent images.
They also dispatch on pe->region_pages: per-page mode uses read_compressed_pages(), which decompresses page-by-page directly into the destination buffer; region mode uses read_compressed_pages_region(), which decompresses an entire block (up to region_pages pages) into a heap scratch buffer and copies the requested page slice into the destination iovec, supporting partial-region reads via an in-block cursor (region_block_offset).
skip_pagemap_pages() advances pi_off by summing per-block compressed sizes; in region mode it walks block-by-block and keeps region_block_offset consistent so partial-region skips remain correct.
Per-block compressed sizes are validated against PAGE_COMPRESSED_SIZE_BOUND or REGION_COMPRESSED_SIZE_BOUND(n_pages) as appropriate. Zero blocks (compressed_size=0) are restored with memset. The pread() calls loop to handle short reads.
The PR_ASYNC flag is supported. Compressed reads are enqueued via pagemap_enqueue_iovec(); coalescing requires matching region_pages between piovs. process_async_reads() reads all compressed data in one pread() call and decompresses block-by-block into the destination iovecs, with a direct-into-iovec fast path in region mode when a block fits inside a single destination slot. posix_fadvise(FADV_SEQUENTIAL) is applied to the pages image fd to hint the kernel for aggressive readahead.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
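A region block is a single LZ4 stream, so a reader that stops in the middle of a block must keep the file offset at the block's start and remember how many pages of that block are already consumed. The sketch below models that bookkeeping for a single entry. struct region_cursor and region_skip() are hypothetical stand-ins for pi_off and region_block_offset, and passing per-block page counts as an array (bp[]) is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over one region-mode pagemap entry. */
struct region_cursor {
	uint64_t pi_off;     /* offset of the current block in pages.img */
	size_t block;        /* index into compressed_size[] */
	uint32_t block_off;  /* pages already consumed inside this block */
};

/*
 * Skip nr pages. csize[] holds the per-block compressed sizes and bp[]
 * the per-block page counts (the last block may be short). Returns 0 on
 * success, -1 if the skip runs past the last block.
 */
static int region_skip(struct region_cursor *c, const uint32_t *csize,
		       const uint32_t *bp, size_t n_blocks, uint64_t nr)
{
	while (nr > 0) {
		uint32_t avail;

		if (c->block >= n_blocks)
			return -1;
		assert(c->block_off < bp[c->block]);
		avail = bp[c->block] - c->block_off;
		if (nr < avail) {
			/* stop inside the block; pi_off still points at its start */
			c->block_off += (uint32_t)nr;
			return 0;
		}
		/* consume the rest of this block and move to the next one */
		nr -= avail;
		c->pi_off += csize[c->block];
		c->block++;
		c->block_off = 0;
	}
	return 0;
}
```

A zero block (compressed_size 0) advances the block index without moving pi_off, which matches the absence of payload for such blocks.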
process_async_reads() allocates a single buffer for all compressed data in a piov batch. When pages coalesce into one giant piov (common with large GPU checkpoints), this buffer can exceed host memory. For example, a checkpoint of LLaMA 3.1-8B running on an A100-SXM4-80GB covers 77 GB of memory and produces ~72 GiB of compressed data. Without this patch, restore would need 72 GiB for the decompression buffer plus 77 GiB of premapped pages, 149 GiB in total, which can exceed host memory and cause an OOM during restore.
Cap compressed piov batches at 1 GiB of compressed data during coalescing in pagemap_enqueue_iovec(). Larger checkpoints are split into multiple batches, each allocating a bounded decompression buffer.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
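The coalescing rule can be expressed as a single predicate: append the next iovec to the current batch only if it is contiguous, uses the same region_pages, and keeps the batch's compressed payload within the 1 GiB cap. This is a simplified sketch; struct batch and can_coalesce() are hypothetical, and reducing CRIU's contiguity checks to one address comparison is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COMPRESSED_BATCH_CAP (1ULL << 30) /* 1 GiB, per the commit message */

/* Hypothetical summary of the batch being built. */
struct batch {
	uint64_t end;            /* first address past the batch */
	uint64_t compressed_len; /* compressed bytes queued so far */
	uint32_t region_pages;   /* block layout shared by the batch */
};

/*
 * May an iovec starting at addr, carrying compressed_len bytes of
 * payload, join the batch, or must it start a new one?
 */
static bool can_coalesce(const struct batch *b, uint64_t addr,
			 uint64_t compressed_len, uint32_t region_pages)
{
	assert(b->compressed_len <= COMPRESSED_BATCH_CAP);
	if (addr != b->end)
		return false; /* not contiguous */
	if (region_pages != b->region_pages)
		return false; /* mixed block layouts */
	return compressed_len <= COMPRESSED_BATCH_CAP - b->compressed_len;
}
```

Comparing against the remaining headroom instead of adding the two lengths avoids any overflow in the check. With the cap in place, the decompression buffer for the example above is bounded by roughly 1 GiB instead of ~72 GiB.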
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Wire up the memory page compression options through the zdtm test framework for both CLI and RPC modes:
  -c, --compress
  --compress-region SIZE (K/M/G suffix accepted)
  --compress-acceleration N
The page-count validation auto-detects compression from the test descriptor opts, so the flags work whether they come from the CLI or from a .desc file.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add zdtm tests that verify memory page content after checkpoint/restore with compression, in both per-page and region modes:
- compress_pages00 / compress_pages_region00: single process with zero-filled pages, compressible pattern pages, and incompressible random pages. Exercises all three compression outcomes (zero-skip, LZ4 compressed, raw fallback).
- compress_pages01 / compress_pages_region01: parent/child process tree with copy-on-write pages. The parent fills 64 pages and the child modifies 16 of them. After restore, both parent and child verify their respective views byte-by-byte.
- compress_pages02 / compress_pages_region02: eight different mapping types in a parent/child tree: MAP_PRIVATE anonymous (data and zeros), MAP_SHARED anonymous, private and shared file-backed, memfd shared, read-only (PROT_READ after mprotect), and a PROT_NONE guard page adjacent to a data page.
The compress_pages_region* siblings share C source with the per-page tests (via symlinks) and differ only in their .desc opts string. All tests use the compress feature check to auto-skip when CRIU is built without LZ4. The .desc files set --compress (-c) or --compress-region=256K, so compression stays active when the tests run with --pre, --page-server, --lazy-pages, --stream, etc.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add compression test coverage to the CI script:
- iterative checkpointing with compression
- iterative + dedup, iterative + page-server
- compress_pages tests in basic, iterative, page-server, dedup, and lazy-pages modes
- streaming tests with compress_pages
- mixed-compression parent chain test
Add test/others/compress-mixed/, which tests mixed-compression parent chains: two uncompressed pre-dumps followed by a compressed final dump, then restore. This exercises the per-entry fallback in the compressed reader when parent pagemap entries have no compressed_size array.
Add shellcheck coverage for test/others/compress-mixed/.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add a Python benchmark that measures the storage and performance
impact of memory page compression across the configured modes and
data patterns.
Layout:
contrib/compression-benchmark/main.py -- driver and reporter
contrib/compression-benchmark/workload.py -- pattern generators;
also runs as the
long-lived workload
process under criu.
main.py imports workload.fill_pattern() so the SHA-256 the driver
expects after restore is computed from the same code that wrote
the bytes inside the workload, avoiding any drift between the two
sides of the integrity check.
Sweeps compression mode (none / per-page / region) and, for region
mode, region size (default 64 K, 256 K, 1 M). Workload patterns:
zero (highly compressible), mixed (50% zero / 25% repeating /
25% random), random (incompressible), text (JSON-shaped), elf
(concatenated system binaries). Reports compression ratio, dump
and restore latency (median with interquartile range), throughput,
and CRIU stats counters; validates memory integrity via SHA-256
across each restore.
Usage:
sudo python3 contrib/compression-benchmark/main.py
sudo python3 contrib/compression-benchmark/main.py \
-p mixed text elf --modes none per-page region \
--region-sizes 65536 262144 1048576 --json out.json
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add offline tools to convert checkpoint images between compressed and uncompressed formats:
  crit compress <dir>     -- compress memory pages with LZ4
  crit decompress <dir>   -- decompress memory pages
By default, original files are backed up as .bak. Use --in-place to skip backups. The --acceleration flag controls the LZ4 speed/ratio trade-off.
Requires the Python lz4 package (an optional dependency, added to all package manager dependency lists). When lz4 is not installed, other crit commands work normally and the compress/decompress commands print install instructions.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add five tests covering compress and decompress round-trips using compress_pages02, which exercises all eight mapping types (anonymous, zeros, shared, file-backed, memfd, read-only, guard pages):
- compressed dump, decompress with crit, restore and verify
- uncompressed dump, compress with crit, restore and verify
- compress already compressed, decompress already decompressed
- compress, decompress, compress, verify pages are identical
- decompress, compress, decompress, verify pages are identical
Each restore runs the test process, which verifies all memory regions byte-by-byte. The round-trip tests also compare md5 checksums of the raw pages data across cycles. When lz4 or CRIU compression support is not available, the tests are skipped gracefully.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Test compress_data() / decompress_data() with zero-filled,
repeating pattern, pseudo-random, and single-byte pages across
three LZ4 acceleration levels.
Test compress_region() / decompress_region() with the same
patterns at region sizes {16, 64, 256} pages and acceleration
levels {1, 4, 32}, including an "all zeros except one non-zero
page" case to exercise the zero pre-pass fast path and per-page
zero detection inside the decompression result.
Also test page_is_all_zero() edge cases.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This pull request extends CRIU with support for LZ4 compression of memory pages during dump, pre-dump, and restore. Memory pages are compressed individually and their compressed sizes are stored in the pagemap entry. During restore, the file offset of each page in the pages image is computed by summing the compressed sizes of the pages that precede it. This approach preserves support for optimizations such as iterative checkpointing, lazy pages, page server migration, and image streaming.
Before compression, zero-filled pages are detected and skipped entirely, while pages that compress poorly are stored raw to avoid unnecessary decompression overhead on restore. When compression is used with the page server, pages are compressed before being sent over the network to reduce the amount of data transferred during live migration.
On restore, a helper daemon handles decompression since the PIE restorer cannot link against external libraries like LZ4. In the future, this daemon can be extended to support decryption of memory pages as well.
This pull request also includes a compress/decompress extension of the CRIT tool for converting checkpoint images offline, a benchmark tool for measuring compression performance and storage impact, and ZDTM tests covering various combinations of memory page content and mapping types.