StrictMetricsEvaluator::Evaluate currently returns ROWS_MUST_MATCH when data_file.record_count <= 0:
|
Result<bool> StrictMetricsEvaluator::Evaluate(const DataFile& data_file) const { |
|
if (data_file.record_count <= 0) { |
|
return kRowsMustMatch; |
|
} |
This is correct for record_count == 0, but not for record_count == -1. iceberg-cpp uses -1 as an unknown row-count sentinel when writer metrics do not include row_count:
- data writer:
|
auto data_file = std::make_shared<DataFile>(DataFile{ |
|
.content = DataFile::Content::kData, |
|
.file_path = options_.path, |
|
.file_format = options_.format, |
|
.partition = options_.partition, |
|
.record_count = metrics.row_count.value_or(-1), |
|
.file_size_in_bytes = length, |
- equality delete writer:
|
auto data_file = std::make_shared<DataFile>(DataFile{ |
|
.content = DataFile::Content::kEqualityDeletes, |
|
.file_path = options_.path, |
|
.file_format = options_.format, |
|
.partition = options_.partition, |
|
.record_count = metrics.row_count.value_or(-1), |
|
.file_size_in_bytes = length, |
- position delete writer:
|
auto data_file = std::make_shared<DataFile>(DataFile{ |
|
.content = DataFile::Content::kPositionDeletes, |
|
.file_path = options_.path, |
|
.file_format = options_.format, |
|
.partition = options_.partition, |
|
.record_count = metrics.row_count.value_or(-1), |
|
.file_size_in_bytes = length, |
Other code already treats negative record count as unknown/missing:
- inclusive metrics:
|
Result<bool> InclusiveMetricsEvaluator::Evaluate(const DataFile& data_file) const { |
|
if (data_file.record_count == 0) { |
|
return kRowCannotMatch; |
|
} |
|
if (data_file.record_count < 0) { |
|
// we haven't implemented parsing record count from avro file and thus set record |
|
// count -1 when importing avro tables to iceberg tables. This should be updated once |
|
// we implemented and set correct record count. |
|
return kRowsMightMatch; |
|
} |
- count aggregate:
|
Result<int64_t> CountStarAggregate::CountFor(const DataFile& file) const { |
|
if (!HasValue(file)) { |
|
return NotFound("Record count is missing"); |
|
} |
|
return file.record_count; |
|
} |
|
|
|
bool CountStarAggregate::HasValue(const DataFile& file) const { |
|
return file.record_count >= 0; |
|
} |
Impact: a file with unknown row count can be treated as if every row must match, even for predicates like AlwaysFalse.
Suggested fix: only special-case record_count == 0; let negative/unknown counts fall through to normal strict metrics evaluation.
A focused regression test with record_count = -1 and AlwaysFalse fails before changing <= 0 to == 0, and passes after.
StrictMetricsEvaluator::Evaluatecurrently returnsROWS_MUST_MATCHwhendata_file.record_count <= 0:iceberg-cpp/src/iceberg/expression/strict_metrics_evaluator.cc
Lines 534 to 537 in c0c6b01
This is correct for
record_count == 0, but not forrecord_count == -1. iceberg-cpp uses-1as an unknown row-count sentinel when writer metrics do not includerow_count:iceberg-cpp/src/iceberg/data/data_writer.cc
Lines 80 to 86 in c0c6b01
iceberg-cpp/src/iceberg/data/equality_delete_writer.cc
Lines 80 to 86 in c0c6b01
iceberg-cpp/src/iceberg/data/position_delete_writer.cc
Lines 140 to 146 in c0c6b01
Other code already treats negative record count as unknown/missing:
iceberg-cpp/src/iceberg/expression/inclusive_metrics_evaluator.cc
Lines 507 to 516 in c0c6b01
iceberg-cpp/src/iceberg/expression/aggregate.cc
Lines 379 to 388 in c0c6b01
Impact: a file with unknown row count can be treated as if every row must match, even for predicates like
AlwaysFalse.Suggested fix: only special-case
record_count == 0; let negative/unknown counts fall through to normal strict metrics evaluation.A focused regression test with
record_count = -1andAlwaysFalsefails before changing<= 0to== 0, and passes after.