[Bug](round function) Use shortest round-trip when serializing double to string#64577
[Bug](round function) Use shortest round-trip when serializing double to string#64577xy720 wants to merge 4 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Automated review result: request changes.
Findings are inline. The main blocker is that the FE formatter no longer stays aligned with the BE runtime formatter for FLOAT, and the DOUBLE shortest-round-trip change is still incomplete for FE scientific notation.
Critical checkpoint conclusions:
- Goal/tests: the reported fixed-notation ROUND display case is covered, but the implementation does not fully accomplish FE/BE cast-to-string consistency.
- Scope/focus: mostly focused, but the FE FLOAT path changes behavior despite the PR being scoped to DOUBLE.
- Concurrency/lifecycle/config/storage/transactions/observability: not applicable to this formatter/test-only change.
- Parallel paths/compatibility: BE and FE parallel formatting paths remain inconsistent for FLOAT and scientific DOUBLE values.
- Test coverage/results: added regression coverage exercises the reported cases, but misses the inconsistent FLOAT/scientific DOUBLE paths, and the added suite drops its table after testing against the regression-test rules.
- Performance: the BE row-by-row double serialization path should keep the compiled fmt pattern.
- User focus: no additional user-provided review focus was supplied.
|
/review |
There was a problem hiding this comment.
Automated review result: no additional blocking issues found on the current head.
Current head appears to address the earlier inline concerns I saw in the existing review context: BE now uses FMT_COMPILE("{}"), the regression table is only dropped before use, FLOAT keeps the fixed 7-significant-digit path, and DOUBLE has a shortest-format scientific branch. I did not submit duplicate inline comments.
Critical checkpoint conclusions:
- Goal/tests: the PR targets incorrect DOUBLE-to-string exposure of IEEE-754 residuals for ROUND output. The changed BE/FE formatting paths and added regression cases cover the reported fixed-notation ROUND cases, including table-backed BE output and one-row FE output.
- Scope/focus: the change is small and focused on double serialization plus matching FE formatting and regression coverage.
- Concurrency/lifecycle/config/storage/transactions/observability: not applicable; this is formatter and regression-test code only.
- Parallel paths/compatibility: FE and BE double formatting paths now use the same fixed/scientific cutoff and shortest-round-trip value source for the cases checked; FLOAT behavior remains on the prior fixed-precision branch.
- Special checks/error handling/memory: no new Status/exception, allocation, nullable, or defensive-control-flow concern found in the modified code.
- Test coverage/results: the appended regression results are deterministic for the new queries and follow the table cleanup rule. I did not run the full regression suite in this review runner.
- Performance: the BE row serialization path keeps a compiled fmt pattern.
- User focus: no additional user-provided review focus was supplied.
|
run buildall |
TPC-H: Total hot run time: 29040 ms |
TPC-DS: Total hot run time: 175053 ms |
TPC-H: Total hot run time: 29310 ms |
TPC-DS: Total hot run time: 175619 ms |
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one current-head correctness issue. The common round/cast cases are improved and the previous inline concerns about float formatting, the old FE scientific branch, BE compiled format, and the trailing regression-test drop are not duplicated here.
Critical checkpoint conclusions:
- Goal and proof: the PR aims to avoid exposing residual double tails when casting rounded doubles to strings. The added regression coverage proves the reported round cases, but edge finite double values still have an FE/BE cast-to-string mismatch.
- Scope: the implementation is small and focused on double string formatting plus tests.
- Concurrency, locks, lifecycle, config, persistence, observability: not materially involved in this change.
- Parallel paths: both BE runtime cast output and FE literal/constant-folding output are modified; the new FE path does not fully match BE for extreme finite doubles.
- Tests/results: regression output is updated for the reported cases, but it does not cover FE folding for
Double.MIN_VALUE/Double.MAX_VALUE. The current BE UT macOS failure I inspected is due to the runner using JDK 25 where JDK 17 is required, so I am not treating that CI failure as a PR code defect. - User focus: no additional user-provided review focus was supplied.
TPC-H: Total hot run time: 29449 ms |
|
/review |
TPC-DS: Total hot run time: 175411 ms |
There was a problem hiding this comment.
Review result: request changes.
New finding:
- BE unit expectations for CastToString double output were not updated; inline comment left.
Checkpoint conclusions:
- Goal/test: The goal is to avoid exposing double residual tails in cast-to-string/round display. The main round regression coverage is useful, but BE unit coverage for the changed formatter is stale, and existing review threads already cover remaining FE/BE edge-format consistency concerns.
- Scope: The change is focused on double string formatting and round regression coverage.
- Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle-managed state, configuration, storage format, or FE-BE protocol changes found.
- Parallel paths: BE runtime cast, FE legacy literal rendering, and Nereids literal rendering are all touched. Existing review threads still need the edge-value consistency concern resolved across these paths and affected expected outputs.
- Special conditions: Float-compatible FE rendering keeps the previous fixed-precision path; the double path is the changed behavior.
- Tests/results: New regression SQL is ordered where needed and no longer drops the table after the test, but the changed BE unit expectations are missing. I did not run the Doris test suites in this review environment.
- Observability/transactions/writes: Not applicable.
- Performance: The BE hot-path format uses FMT_COMPILE in current head; no additional hot-path concern found.
- User focus: .code-review.6GbyXi/review_focus.txt had no additional focus points.
| std::numeric_limits<double>::digits10 + 1); | ||
| // shortest round-trip, fixed-precision %g would expose IEEE-754 | ||
| // residual error e.g. round(23900/293, 2) -> "81.56999999999999". | ||
| end = fmt::format_to(buffer, FMT_COMPILE("{}"), value); |
There was a problem hiding this comment.
This changes CastToString::from_number<double> output, but the BE unit test expectations for this same path were not updated. be/test/exprs/function/cast/cast_to_string.cpp still expects 12345678901234567.12345 -> 1.234567890123457e+16, 123456789012345678.12345 -> 1.234567890123457e+17, and denorm_min -> 4.940656458412465e-324, while this PR's generated regression output changed those spellings to 1.2345678901234568e+16, 1.2345678901234568e+17, and 5e-324. The BE UT will fail once run. Please update the BE unit expectations, and any other intentional edge-value expected data, together with this formatter change.
What problem does this PR solve?
Problem Summary:
select round(23900/293, 2)returned81.56999999999999instead of81.57, and similar issue onselect round(45800/486,2)etc.The double-to-string formatters in BE (#54760) and FE both used fixed 16-digit
%g, which exposes the residual representation error of doubles whose decimal form is not exactly representable in binary.Float branch left unchanged (regression only affects double; int/int, bigint/bigint and double/double are all promoted to double in Nereids).
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)