New option --mmio-region to specify memory regions#6747
New option --mmio-region to specify memory regions#6747tautschnig wants to merge 3 commits intodiffblue:developfrom
Conversation
|
Is that a 'check' or an 'instrumentation'? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #6747 +/- ##
===========================================
+ Coverage 80.00% 80.04% +0.03%
===========================================
Files 1700 1702 +2
Lines 188252 188533 +281
Branches 73 73
===========================================
+ Hits 150613 150910 +297
+ Misses 37639 37623 -16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
It affects what checks are generated (instrumented?). |
The test goes as follows: how would the flow with goto-cc look like? |
I see two possible approaches we could take, with the current PR implementing the first variant, but we could also go for option 2 (or any other approach that I might not be aware of at this time):
|
ac2eaf3 to
fafe18a
Compare
If "2" seems more like the the right thing to do, then let's do the right thing rather than the easy thing. |
|
Can we make sure this PR also adds documentation to the user manual about the behaviour of MMIO? |
|
Documentation here: #7157 |
|
@kroening - now that we have the requested documentation can we have approval for this PR? |
|
CBMC should not get new options that change the model. Modeling should be done in |
|
@tautschnig As a PR (#6748) depending on this is marked |
fafe18a to
6f275f7
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a new --mmio-region command-line option for CBMC and goto-instrument that provides a more efficient and scalable way to model memory-mapped I/O regions. The implementation creates individual byte-array objects for each declared MMIO region, avoiding the performance issues of the existing single unbounded __CPROVER_memory array approach. The new per-region model can be combined with the existing callback model (--mmio) for maximum flexibility.
Changes:
- Added new
--mmio-region addr:sizecommand-line option to bothcbmcandgoto-instrument - Implemented per-region MMIO instrumentation that creates individual byte-array symbols for each region
- Added deprecation warning for
__CPROVER_allocated_memoryrecommending migration to--mmio-region
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/goto-programs/mm_io.h | Added API for per-region MMIO model including mmio_regiont struct and parse_mmio_regions function |
| src/goto-programs/mm_io.cpp | Implemented per-region instrumentation logic with region parsing, overlap checking, and read/write dispatch |
| src/goto-programs/process_goto_program.cpp | Integrated per-region MMIO processing into the CBMC pipeline |
| src/cbmc/cbmc_parse_options.h/.cpp | Added --mmio-region command-line option to cbmc |
| src/goto-instrument/goto_instrument_parse_options.h/.cpp | Added --mmio-region command-line option to goto-instrument |
| src/ansi-c/goto-conversion/goto_check_c.cpp | Added deprecation warning for __CPROVER_allocated_memory |
| src/goto-cc/linker_script_merge.cpp | Cleaned up outdated comments about __CPROVER_allocated_memory |
| unit/goto-programs/mm_io.cpp | Added unit tests for per-region symbol creation and write instrumentation |
| unit/Makefile | Added mm_io.cpp to unit test build |
| regression/goto-instrument/mmio_* | Added 5 new regression tests covering basic functionality, offsets, symbolic addresses, bad format, and overlaps |
| regression/cbmc/mmio_per_region | Added regression test for per-region model in cbmc |
| regression/cbmc/union12, memory_allocation*, Pointer28, pointer-overflow2 | Added alternative cmdline.desc tests using --mmio-region |
| regression/cbmc-incr-smt2/pointers-conversions/pointer_from_int_cmdline.desc | Added test variant using --mmio-region |
| doc/man/goto-instrument.1 | Documented --mmio-region option with examples |
| doc/man/cbmc.1 | Documented --mmio-region option with examples |
| doc/cprover-manual/modeling-mmio.md | Updated user guide with comprehensive documentation of per-region model and deprecation notice |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/goto-programs/mm_io.h
Outdated
| /// Instrument MMIO using the per-region object model. | ||
| /// Each region in \p regions must have been previously registered via | ||
| /// `--mmio-region`. The regions must not overlap. |
There was a problem hiding this comment.
The documentation comment states "Each region in \p regions must have been previously registered via --mmio-region", but this is misleading. The regions parameter contains the already-parsed region specifications; they don't need to be "previously registered". The function itself creates the region objects in the symbol table. Consider rephrasing to "Each region in \p regions specifies an MMIO region to be instrumented" or similar.
| // Handle reads | ||
| if(deref_expr_r.size() == 1) | ||
| { | ||
| const dereference_exprt &d = *deref_expr_r.begin(); | ||
| source_locationt source_location = it->source_location(); | ||
|
|
||
| exprt addr_as_int = typecast_exprt::conditional_cast( | ||
| d.pointer(), size_type()); | ||
|
|
||
| exprt mmio_access = get_mmio_object_for_address( | ||
| addr_as_int, d.type(), source_location); | ||
|
|
||
| if_exprt if_expr( | ||
| integer_address(d.pointer()), mmio_access, d); | ||
|
|
||
| replace_expr(d, if_expr, a_rhs); | ||
| ++reads_replaced; | ||
| } | ||
|
|
||
| // Handle writes | ||
| if(a_lhs.id() == ID_dereference) |
There was a problem hiding this comment.
The code handles reads and writes sequentially for each assignment instruction. If an instruction has both a dereference read on the RHS and a dereference write on the LHS (e.g., *p1 = *p2), the read instrumentation is applied first (line 135-151), modifying a_rhs. Then the write instrumentation checks if the LHS is still a dereference (line 154). However, this should work correctly because the read instrumentation only modifies the RHS expression, not the LHS. This is correct behavior, but consider adding a comment to clarify this scenario.
| if( | ||
| a.start_address < b.start_address + b.size && | ||
| b.start_address < a.start_address + a.size) |
There was a problem hiding this comment.
The overlap check at line 435 computes a.start_address + a.size and b.start_address + b.size. If a region has a very large start_address and size such that their sum overflows the mp_integer type, this could lead to incorrect overlap detection. While mp_integer can represent arbitrarily large numbers in practice, it would be good to validate that start_address + size doesn't wrap around the address space (e.g., for a 64-bit address space, ensure start_address + size <= 2^64).
src/goto-programs/mm_io.cpp
Outdated
| mp_integer start_address = | ||
| (addr_str.find("0x") == 0 || addr_str.find("0X") == 0) | ||
| ? string2integer(addr_str.substr(2), 16) | ||
| : string2integer(addr_str); | ||
|
|
||
| mp_integer size = | ||
| (size_str.find("0x") == 0 || size_str.find("0X") == 0) | ||
| ? string2integer(size_str.substr(2), 16) | ||
| : string2integer(size_str); |
There was a problem hiding this comment.
The parsing logic does not validate that the parsed address and size are non-negative and non-zero. If a user provides a negative address (e.g., "-100:256") or a zero/negative size (e.g., "0x1000:0" or "0x1000:-10"), the code will accept these invalid values. Additionally, string2integer returns 0 for invalid input strings (e.g., "xyz:abc"), which would create a region at address 0 with size 0 without any error. Consider adding validation after parsing to ensure: 1) start_address >= 0, 2) size > 0, and 3) that string2integer successfully parsed valid numeric values.
src/goto-programs/mm_io.cpp
Outdated
| @@ -66,6 +101,7 @@ mm_iot::mm_iot(symbol_table_baset &symbol_table) | |||
|
|
|||
| if(const auto mm_io_w_symbol = symbol_table.lookup(id_w)) | |||
| mm_io_w = mm_io_w_symbol->symbol_expr(); | |||
|
|
|||
| } | |||
|
|
|||
| static std::set<dereference_exprt> collect_deref_expr(const exprt &src) | |||
| @@ -80,6 +116,179 @@ static std::set<dereference_exprt> collect_deref_expr(const exprt &src) | |||
|
|
|||
| void mm_iot::mm_io(goto_functionst::goto_functiont &goto_function) | |||
| { | |||
| // Use per-region object model if regions are specified | |||
| if(!regions.empty()) | |||
| { | |||
| // Instrument with per-region object model | |||
| for(auto it = goto_function.body.instructions.begin(); | |||
| it != goto_function.body.instructions.end(); | |||
| it++) | |||
| { | |||
| if(!it->is_assign()) | |||
| continue; | |||
|
|
|||
| auto &a_lhs = it->assign_lhs(); | |||
| auto &a_rhs = it->assign_rhs_nonconst(); | |||
| const auto deref_expr_r = collect_deref_expr(a_rhs); | |||
|
|
|||
| // Handle reads | |||
| if(deref_expr_r.size() == 1) | |||
| { | |||
| const dereference_exprt &d = *deref_expr_r.begin(); | |||
| source_locationt source_location = it->source_location(); | |||
|
|
|||
| exprt addr_as_int = typecast_exprt::conditional_cast( | |||
| d.pointer(), size_type()); | |||
|
|
|||
| exprt mmio_access = get_mmio_object_for_address( | |||
| addr_as_int, d.type(), source_location); | |||
|
|
|||
| if_exprt if_expr( | |||
| integer_address(d.pointer()), mmio_access, d); | |||
|
|
|||
| replace_expr(d, if_expr, a_rhs); | |||
| ++reads_replaced; | |||
| } | |||
|
|
|||
| // Handle writes | |||
| if(a_lhs.id() == ID_dereference) | |||
| { | |||
| const dereference_exprt &d = to_dereference_expr(a_lhs); | |||
| source_locationt source_location = it->source_location(); | |||
|
|
|||
| exprt addr_as_int = typecast_exprt::conditional_cast( | |||
| d.pointer(), size_type()); | |||
|
|
|||
| // We build the following structure after the original | |||
| // assignment (which we then erase): | |||
| // | |||
| // IF !integer_address(ptr) GOTO lbl_orig | |||
| // IF in_region_0 GOTO lbl_r0 | |||
| // IF in_region_1 GOTO lbl_r1 | |||
| // ... | |||
| // GOTO lbl_end // not in any known region | |||
| // lbl_r0: region0[off] = rhs; GOTO lbl_end | |||
| // lbl_r1: region1[off] = rhs; GOTO lbl_end | |||
| // ... | |||
| // lbl_orig: *ptr = rhs; GOTO lbl_end | |||
| // lbl_end: SKIP | |||
|
|
|||
| // First pass: build region assignments in a temporary | |||
| // program and collect targets for the dispatch GOTOs. | |||
| goto_programt region_code; | |||
| struct region_info | |||
| { | |||
| exprt condition; | |||
| goto_programt::targett label; | |||
| }; | |||
| std::vector<region_info> region_gotos; | |||
|
|
|||
| for(const auto ®ion : regions) | |||
| { | |||
| const symbolt *sym = | |||
| symbol_table.lookup(region.object_name); | |||
| if(!sym) | |||
| continue; | |||
|
|
|||
| binary_predicate_exprt lower( | |||
| addr_as_int, | |||
| ID_ge, | |||
| from_integer( | |||
| region.start_address, addr_as_int.type())); | |||
| binary_predicate_exprt upper( | |||
| addr_as_int, | |||
| ID_lt, | |||
| from_integer( | |||
| region.start_address + region.size, | |||
| addr_as_int.type())); | |||
| and_exprt in_range(lower, upper); | |||
|
|
|||
| minus_exprt offset( | |||
| addr_as_int, | |||
| from_integer( | |||
| region.start_address, addr_as_int.type())); | |||
|
|
|||
| index_exprt idx(sym->symbol_expr(), offset); | |||
| exprt rhs_cast = typecast_exprt::conditional_cast( | |||
| a_rhs, idx.type()); | |||
|
|
|||
| auto lbl_region = region_code.add( | |||
| goto_programt::make_assignment( | |||
| idx, rhs_cast, source_location)); | |||
| // goto lbl_end added below after we know the target | |||
|
|
|||
| region_gotos.push_back({std::move(in_range), lbl_region}); | |||
| } | |||
|
|
|||
| // Original dereference write | |||
| auto lbl_orig = region_code.add( | |||
| goto_programt::make_assignment( | |||
| d, a_rhs, source_location)); | |||
| // goto lbl_end added below | |||
|
|
|||
| // lbl_end | |||
| auto lbl_end = region_code.add( | |||
| goto_programt::make_skip(source_location)); | |||
|
|
|||
| // Now insert GOTO lbl_end after each region assignment | |||
| // and after lbl_orig. We do this by iterating and | |||
| // inserting before the next instruction. | |||
| for(auto ri = region_code.instructions.begin(); | |||
| ri != lbl_end;) | |||
| { | |||
| if(ri->is_assign()) | |||
| { | |||
| auto next_ri = std::next(ri); | |||
| region_code.instructions.insert( | |||
| next_ri, | |||
| goto_programt::make_goto(lbl_end, source_location)); | |||
| ri = next_ri; | |||
| } | |||
| else | |||
| ++ri; | |||
| } | |||
|
|
|||
| // Build the dispatch block | |||
| goto_programt result; | |||
|
|
|||
| // Guard: not integer address → original dereference | |||
| result.add(goto_programt::make_goto( | |||
| lbl_orig, | |||
| not_exprt(integer_address(d.pointer())), | |||
| source_location)); | |||
|
|
|||
| // Conditional jumps to each region | |||
| for(const auto &rg : region_gotos) | |||
| { | |||
| result.add(goto_programt::make_goto( | |||
| rg.label, rg.condition, source_location)); | |||
| } | |||
|
|
|||
| // Fall-through: address not in any region → skip | |||
| result.add( | |||
| goto_programt::make_goto(lbl_end, source_location)); | |||
|
|
|||
| // Append region assignments + original + end | |||
| result.destructive_append(region_code); | |||
|
|
|||
| // Splice into the function body after `it` | |||
| auto next = std::next(it); | |||
| goto_function.body.destructive_insert( | |||
| next, result); | |||
|
|
|||
| // Erase the original assignment | |||
| goto_function.body.instructions.erase(it); | |||
| // Continue iteration from lbl_end | |||
| it = lbl_end; | |||
|
|
|||
| ++writes_replaced; | |||
| } | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Trailing whitespace detected on lines 56, 104, and 287. Please remove the trailing spaces to maintain consistent code formatting.
src/goto-programs/mm_io.h
Outdated
| mp_integer start_address; ///< First byte address of the region | ||
| mp_integer size; ///< Size of the region in bytes | ||
| irep_idt object_name; ///< Symbol name (e.g. `__CPROVER_mmio_region_0x1000`) |
There was a problem hiding this comment.
Inconsistent alignment in the Doxygen comments for struct members. Line 65 has fewer spaces before the comment delimiter compared to lines 63-64. Consider aligning the comment delimiters consistently for better readability.
Refactor MMIO modeling to use per-region byte-array objects instead of a single unbounded __CPROVER_memory array. Each --mmio-region addr:size declaration creates an individual byte-array symbol in the symbol table. Reads from constant addresses within a region are mapped directly to the corresponding array element. Symbolic addresses produce a chain of conditionals over all declared regions. Writes are replaced by a conditional GOTO dispatch that directs the store to the matching region. Overlapping regions are rejected at parse time. The --mmio-region option can be used standalone or combined with --mmio (which adds callback-based read/write handlers via __CPROVER_mm_io_r/w). Includes man page, CProver manual, and Doxygen documentation. Includes regression tests (per_region, overlap, bad_format, symbolic_addr, offset_access) and unit tests. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add --mmio-region to cbmc so that per-region MMIO instrumentation is available without requiring goto-instrument as a separate step. The option is handled in process_goto_program, which is shared by cbmc, goto-analyzer, goto-diff, and the C++ API. Per-region instrumentation runs before the callback model so that declared regions get precise array-backed modeling while callbacks can still handle remaining dereferences. Extract parse_mmio_regions() into mm_io.h/mm_io.cpp to eliminate duplicated region-parsing and overlap-checking logic. Add cmdline.desc tests for all existing tests that use __CPROVER_allocated_memory (memory_allocation1/2, Pointer28, union12, pointer-overflow2, cbmc-incr-smt2/pointer_from_int) with #ifndef CMDLINE guards so the same source works with both mechanisms. Includes cbmc man page entry and updated CProver manual and goto-instrument man page with cross-references. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Emit a warning at analysis time when __CPROVER_allocated_memory is encountered, directing users to --mmio-region instead. Update the CProver manual to list --mmio-region as the recommended mechanism and mark __CPROVER_allocated_memory as deprecated. Remove the dead code branch in linker_script_merge.cpp that used __CPROVER_allocated_memory. The active implementation already creates proper array symbols without relying on the deprecated function. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
6f275f7 to
e93a3f6
Compare
This new option will enable us to eventually drop support for
__CPROVER_allocated_memory, which