Skip to content

New option --mmio-region to specify memory regions#6747

Open
tautschnig wants to merge 3 commits intodiffblue:developfrom
tautschnig:feature/mmio-cmdline
Open

New option --mmio-region to specify memory regions#6747
tautschnig wants to merge 3 commits intodiffblue:developfrom
tautschnig:feature/mmio-cmdline

Conversation

@tautschnig
Copy link
Collaborator

This new option will enable us to eventually drop support for
__CPROVER_allocated_memory, which

  1. requires awkward scanning of goto functions in goto_check_c,
  2. wrongly suggests these declarations are limited to some scope, and
  3. yields a spurious undefined-function warning in symex.
  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@kroening
Copy link
Collaborator

Is that a 'check' or an 'instrumentation'?

@codecov
Copy link

codecov bot commented Mar 23, 2022

Codecov Report

❌ Patch coverage is 91.40893% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.04%. Comparing base (15eb10a) to head (e93a3f6).

Files with missing lines Patch % Lines
src/goto-programs/mm_io.cpp 88.26% 25 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6747      +/-   ##
===========================================
+ Coverage    80.00%   80.04%   +0.03%     
===========================================
  Files         1700     1702       +2     
  Lines       188252   188533     +281     
  Branches        73       73              
===========================================
+ Hits        150613   150910     +297     
+ Misses       37639    37623      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tautschnig
Copy link
Collaborator Author

Is that a 'check' or an 'instrumentation'?

It affects what checks are generated (instrumented?).

@kroening
Copy link
Collaborator

It affects what checks are generated (instrumented?).

The test goes as follows: how would the flow with goto-cc look like?

@tautschnig
Copy link
Collaborator Author

It affects what checks are generated (instrumented?).

The test goes as follows: how would the flow with goto-cc look like?

I see two possible approaches we could take, with the current PR implementing the first variant, but we could also go for option 2 (or any other approach that I might not be aware of at this time):

  1. goto-cc does not have awareness of MMIO regions. Checks added by goto_check_c are updated based on the information provided by --mmio-regions. Two invocations of CBMC on the same GOTO binary may use different MMIO regions and thereby may yield different verification outcomes.
  2. goto-cc gets the --mmio-regions flag and adds such information to the architecture configuration, storing it in the GOTO binary as we already do for other architecture/platform aspects. Requires updating the GOTO binary version and users might have to change their compilation workflow. The advantage, however, is that the implementation is simpler in that we wouldn't have to update properties after the fact.

@tautschnig tautschnig force-pushed the feature/mmio-cmdline branch from ac2eaf3 to fafe18a Compare July 6, 2022 10:21
@jimgrundy
Copy link
Collaborator

I see two possible approaches we could take, with the current PR implementing the first variant, but we could also go for option 2 (or any other approach that I might not be aware of at this time):

  1. goto-cc does not have awareness of MMIO regions. Checks added by goto_check_c are updated based on the information provided by --mmio-regions. Two invocations of CBMC on the same GOTO binary may use different MMIO regions and thereby may yield different verification outcomes.
  2. goto-cc gets the --mmio-regions flag and adds such information to the architecture configuration, storing it in the GOTO binary as we already do for other architecture/platform aspects. ...

If "2" seems more like the the right thing to do, then let's do the right thing rather than the easy thing.

@jimgrundy jimgrundy added aws Bugs or features of importance to AWS CBMC users aws-high and removed aws-high labels Aug 24, 2022
@danielsn
Copy link
Contributor

Can we make sure this PR also adds documentation to the user manual about the behaviour of MMIO?

@TGWDB
Copy link
Contributor

TGWDB commented Oct 6, 2022

Documentation here: #7157

@jimgrundy
Copy link
Collaborator

@kroening - now that we have the requested documentation can we have approval for this PR?

@kroening
Copy link
Collaborator

kroening commented Oct 9, 2022

CBMC should not get new options that change the model. Modeling should be done in goto-instrument, but not in CBMC. We should gradually remove model creation from CBMC.

@feliperodri feliperodri added the Kani Bugs or features of importance to Kani Rust Verifier label Oct 26, 2022
@esteffin
Copy link
Contributor

esteffin commented Nov 8, 2023

@tautschnig As a PR (#6748) depending on this is marked version 6, should this also be considered part of version 6?

@esteffin esteffin added the Version 6 Pull requests and issues requiring a major version bump label Nov 9, 2023
@esteffin esteffin removed the Version 6 Pull requests and issues requiring a major version bump label Nov 9, 2023
@NlightNFotis NlightNFotis added the Version 6 Pull requests and issues requiring a major version bump label Nov 14, 2023
@tautschnig tautschnig added Version 7 and removed Version 6 Pull requests and issues requiring a major version bump labels Nov 30, 2023
Copilot AI review requested due to automatic review settings February 24, 2026 10:13
@tautschnig tautschnig changed the title New option --mmio-regions to specify memory regions New option --mmio-region to specify memory regions Feb 24, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new --mmio-region command-line option for CBMC and goto-instrument that provides a more efficient and scalable way to model memory-mapped I/O regions. The implementation creates individual byte-array objects for each declared MMIO region, avoiding the performance issues of the existing single unbounded __CPROVER_memory array approach. The new per-region model can be combined with the existing callback model (--mmio) for maximum flexibility.

Changes:

  • Added new --mmio-region addr:size command-line option to both cbmc and goto-instrument
  • Implemented per-region MMIO instrumentation that creates individual byte-array symbols for each region
  • Added deprecation warning for __CPROVER_allocated_memory recommending migration to --mmio-region

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/goto-programs/mm_io.h Added API for per-region MMIO model including mmio_regiont struct and parse_mmio_regions function
src/goto-programs/mm_io.cpp Implemented per-region instrumentation logic with region parsing, overlap checking, and read/write dispatch
src/goto-programs/process_goto_program.cpp Integrated per-region MMIO processing into the CBMC pipeline
src/cbmc/cbmc_parse_options.h/.cpp Added --mmio-region command-line option to cbmc
src/goto-instrument/goto_instrument_parse_options.h/.cpp Added --mmio-region command-line option to goto-instrument
src/ansi-c/goto-conversion/goto_check_c.cpp Added deprecation warning for __CPROVER_allocated_memory
src/goto-cc/linker_script_merge.cpp Cleaned up outdated comments about __CPROVER_allocated_memory
unit/goto-programs/mm_io.cpp Added unit tests for per-region symbol creation and write instrumentation
unit/Makefile Added mm_io.cpp to unit test build
regression/goto-instrument/mmio_* Added 5 new regression tests covering basic functionality, offsets, symbolic addresses, bad format, and overlaps
regression/cbmc/mmio_per_region Added regression test for per-region model in cbmc
regression/cbmc/union12, memory_allocation*, Pointer28, pointer-overflow2 Added alternative cmdline.desc tests using --mmio-region
regression/cbmc-incr-smt2/pointers-conversions/pointer_from_int_cmdline.desc Added test variant using --mmio-region
doc/man/goto-instrument.1 Documented --mmio-region option with examples
doc/man/cbmc.1 Documented --mmio-region option with examples
doc/cprover-manual/modeling-mmio.md Updated user guide with comprehensive documentation of per-region model and deprecation notice

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 87 to 89
/// Instrument MMIO using the per-region object model.
/// Each region in \p regions must have been previously registered via
/// `--mmio-region`. The regions must not overlap.
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation comment states "Each region in \p regions must have been previously registered via --mmio-region", but this is misleading. The regions parameter contains the already-parsed region specifications; they don't need to be "previously registered". The function itself creates the region objects in the symbol table. Consider rephrasing to "Each region in \p regions specifies an MMIO region to be instrumented" or similar.

Copilot uses AI. Check for mistakes.
Comment on lines 134 to 154
// Handle reads
if(deref_expr_r.size() == 1)
{
const dereference_exprt &d = *deref_expr_r.begin();
source_locationt source_location = it->source_location();

exprt addr_as_int = typecast_exprt::conditional_cast(
d.pointer(), size_type());

exprt mmio_access = get_mmio_object_for_address(
addr_as_int, d.type(), source_location);

if_exprt if_expr(
integer_address(d.pointer()), mmio_access, d);

replace_expr(d, if_expr, a_rhs);
++reads_replaced;
}

// Handle writes
if(a_lhs.id() == ID_dereference)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code handles reads and writes sequentially for each assignment instruction. If an instruction has both a dereference read on the RHS and a dereference write on the LHS (e.g., *p1 = *p2), the read instrumentation is applied first (line 135-151), modifying a_rhs. Then the write instrumentation checks if the LHS is still a dereference (line 154). However, this should work correctly because the read instrumentation only modifies the RHS expression, not the LHS. This is correct behavior, but consider adding a comment to clarify this scenario.

Copilot uses AI. Check for mistakes.
Comment on lines +434 to +436
if(
a.start_address < b.start_address + b.size &&
b.start_address < a.start_address + a.size)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overlap check at line 435 computes a.start_address + a.size and b.start_address + b.size. If a region has a very large start_address and size such that their sum overflows the mp_integer type, this could lead to incorrect overlap detection. While mp_integer can represent arbitrarily large numbers in practice, it would be good to validate that start_address + size doesn't wrap around the address space (e.g., for a 64-bit address space, ensure start_address + size <= 2^64).

Copilot uses AI. Check for mistakes.
Comment on lines 407 to 415
mp_integer start_address =
(addr_str.find("0x") == 0 || addr_str.find("0X") == 0)
? string2integer(addr_str.substr(2), 16)
: string2integer(addr_str);

mp_integer size =
(size_str.find("0x") == 0 || size_str.find("0X") == 0)
? string2integer(size_str.substr(2), 16)
: string2integer(size_str);
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parsing logic does not validate that the parsed address and size are non-negative and non-zero. If a user provides a negative address (e.g., "-100:256") or a zero/negative size (e.g., "0x1000:0" or "0x1000:-10"), the code will accept these invalid values. Additionally, string2integer returns 0 for invalid input strings (e.g., "xyz:abc"), which would create a region at address 0 with size 0 without any error. Consider adding validation after parsing to ensure: 1) start_address >= 0, 2) size > 0, and 3) that string2integer successfully parsed valid numeric values.

Copilot uses AI. Check for mistakes.
Comment on lines 56 to 287
@@ -66,6 +101,7 @@ mm_iot::mm_iot(symbol_table_baset &symbol_table)

if(const auto mm_io_w_symbol = symbol_table.lookup(id_w))
mm_io_w = mm_io_w_symbol->symbol_expr();

}

static std::set<dereference_exprt> collect_deref_expr(const exprt &src)
@@ -80,6 +116,179 @@ static std::set<dereference_exprt> collect_deref_expr(const exprt &src)

void mm_iot::mm_io(goto_functionst::goto_functiont &goto_function)
{
// Use per-region object model if regions are specified
if(!regions.empty())
{
// Instrument with per-region object model
for(auto it = goto_function.body.instructions.begin();
it != goto_function.body.instructions.end();
it++)
{
if(!it->is_assign())
continue;

auto &a_lhs = it->assign_lhs();
auto &a_rhs = it->assign_rhs_nonconst();
const auto deref_expr_r = collect_deref_expr(a_rhs);

// Handle reads
if(deref_expr_r.size() == 1)
{
const dereference_exprt &d = *deref_expr_r.begin();
source_locationt source_location = it->source_location();

exprt addr_as_int = typecast_exprt::conditional_cast(
d.pointer(), size_type());

exprt mmio_access = get_mmio_object_for_address(
addr_as_int, d.type(), source_location);

if_exprt if_expr(
integer_address(d.pointer()), mmio_access, d);

replace_expr(d, if_expr, a_rhs);
++reads_replaced;
}

// Handle writes
if(a_lhs.id() == ID_dereference)
{
const dereference_exprt &d = to_dereference_expr(a_lhs);
source_locationt source_location = it->source_location();

exprt addr_as_int = typecast_exprt::conditional_cast(
d.pointer(), size_type());

// We build the following structure after the original
// assignment (which we then erase):
//
// IF !integer_address(ptr) GOTO lbl_orig
// IF in_region_0 GOTO lbl_r0
// IF in_region_1 GOTO lbl_r1
// ...
// GOTO lbl_end // not in any known region
// lbl_r0: region0[off] = rhs; GOTO lbl_end
// lbl_r1: region1[off] = rhs; GOTO lbl_end
// ...
// lbl_orig: *ptr = rhs; GOTO lbl_end
// lbl_end: SKIP

// First pass: build region assignments in a temporary
// program and collect targets for the dispatch GOTOs.
goto_programt region_code;
struct region_info
{
exprt condition;
goto_programt::targett label;
};
std::vector<region_info> region_gotos;

for(const auto &region : regions)
{
const symbolt *sym =
symbol_table.lookup(region.object_name);
if(!sym)
continue;

binary_predicate_exprt lower(
addr_as_int,
ID_ge,
from_integer(
region.start_address, addr_as_int.type()));
binary_predicate_exprt upper(
addr_as_int,
ID_lt,
from_integer(
region.start_address + region.size,
addr_as_int.type()));
and_exprt in_range(lower, upper);

minus_exprt offset(
addr_as_int,
from_integer(
region.start_address, addr_as_int.type()));

index_exprt idx(sym->symbol_expr(), offset);
exprt rhs_cast = typecast_exprt::conditional_cast(
a_rhs, idx.type());

auto lbl_region = region_code.add(
goto_programt::make_assignment(
idx, rhs_cast, source_location));
// goto lbl_end added below after we know the target

region_gotos.push_back({std::move(in_range), lbl_region});
}

// Original dereference write
auto lbl_orig = region_code.add(
goto_programt::make_assignment(
d, a_rhs, source_location));
// goto lbl_end added below

// lbl_end
auto lbl_end = region_code.add(
goto_programt::make_skip(source_location));

// Now insert GOTO lbl_end after each region assignment
// and after lbl_orig. We do this by iterating and
// inserting before the next instruction.
for(auto ri = region_code.instructions.begin();
ri != lbl_end;)
{
if(ri->is_assign())
{
auto next_ri = std::next(ri);
region_code.instructions.insert(
next_ri,
goto_programt::make_goto(lbl_end, source_location));
ri = next_ri;
}
else
++ri;
}

// Build the dispatch block
goto_programt result;

// Guard: not integer address → original dereference
result.add(goto_programt::make_goto(
lbl_orig,
not_exprt(integer_address(d.pointer())),
source_location));

// Conditional jumps to each region
for(const auto &rg : region_gotos)
{
result.add(goto_programt::make_goto(
rg.label, rg.condition, source_location));
}

// Fall-through: address not in any region → skip
result.add(
goto_programt::make_goto(lbl_end, source_location));

// Append region assignments + original + end
result.destructive_append(region_code);

// Splice into the function body after `it`
auto next = std::next(it);
goto_function.body.destructive_insert(
next, result);

// Erase the original assignment
goto_function.body.instructions.erase(it);
// Continue iteration from lbl_end
it = lbl_end;

++writes_replaced;
}
}

Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace detected on lines 56, 104, and 287. Please remove the trailing spaces to maintain consistent code formatting.

Copilot uses AI. Check for mistakes.
Comment on lines 63 to 65
mp_integer start_address; ///< First byte address of the region
mp_integer size; ///< Size of the region in bytes
irep_idt object_name; ///< Symbol name (e.g. `__CPROVER_mmio_region_0x1000`)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent alignment in the Doxygen comments for struct members. Line 65 has fewer spaces before the comment delimiter compared to lines 63-64. Consider aligning the comment delimiters consistently for better readability.

Copilot uses AI. Check for mistakes.
tautschnig and others added 3 commits February 24, 2026 11:23
Refactor MMIO modeling to use per-region byte-array objects instead of
a single unbounded __CPROVER_memory array. Each --mmio-region addr:size
declaration creates an individual byte-array symbol in the symbol table.

Reads from constant addresses within a region are mapped directly to
the corresponding array element. Symbolic addresses produce a chain of
conditionals over all declared regions. Writes are replaced by a
conditional GOTO dispatch that directs the store to the matching region.

Overlapping regions are rejected at parse time. The --mmio-region
option can be used standalone or combined with --mmio (which adds
callback-based read/write handlers via __CPROVER_mm_io_r/w).

Includes man page, CProver manual, and Doxygen documentation.
Includes regression tests (per_region, overlap, bad_format,
symbolic_addr, offset_access) and unit tests.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add --mmio-region to cbmc so that per-region MMIO instrumentation is
available without requiring goto-instrument as a separate step. The
option is handled in process_goto_program, which is shared by cbmc,
goto-analyzer, goto-diff, and the C++ API.

Per-region instrumentation runs before the callback model so that
declared regions get precise array-backed modeling while callbacks
can still handle remaining dereferences.

Extract parse_mmio_regions() into mm_io.h/mm_io.cpp to eliminate
duplicated region-parsing and overlap-checking logic.

Add cmdline.desc tests for all existing tests that use
__CPROVER_allocated_memory (memory_allocation1/2, Pointer28, union12,
pointer-overflow2, cbmc-incr-smt2/pointer_from_int) with #ifndef
CMDLINE guards so the same source works with both mechanisms.

Includes cbmc man page entry and updated CProver manual and
goto-instrument man page with cross-references.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Emit a warning at analysis time when __CPROVER_allocated_memory is
encountered, directing users to --mmio-region instead.

Update the CProver manual to list --mmio-region as the recommended
mechanism and mark __CPROVER_allocated_memory as deprecated.

Remove the dead code branch in linker_script_merge.cpp that used
__CPROVER_allocated_memory. The active implementation already creates
proper array symbols without relying on the deprecated function.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aws Bugs or features of importance to AWS CBMC users blocker Kani Bugs or features of importance to Kani Rust Verifier Property Instrumentation Version 7

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants