Skip to content

[fix](lazy topn) V2: Fix slot-not-found after PullUpProjectExprUnderTopN with chained expressions #64486

Open
englefly wants to merge 2 commits into
apache:masterfrom
englefly:topn-expr-slot-notfound-v2
Open

[fix](lazy topn) V2: Fix slot-not-found after PullUpProjectExprUnderTopN with chained expressions #64486
englefly wants to merge 2 commits into
apache:masterfrom
englefly:topn-expr-slot-notfound-v2

Conversation

@englefly

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29320 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e39e9577bfc587ce24435073a2cde390a94d894d, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17643	4044	3968	3968
q2	q3	10836	1397	831	831
q4	4680	480	338	338
q5	7521	866	587	587
q6	184	176	145	145
q7	782	831	662	662
q8	9395	1594	1721	1594
q9	6727	4553	4529	4529
q10	6791	1804	1549	1549
q11	432	268	266	266
q12	652	436	294	294
q13	18136	3438	2786	2786
q14	270	261	236	236
q15	q16	841	771	711	711
q17	1138	951	895	895
q18	7066	5745	5644	5644
q19	1545	1315	1100	1100
q20	531	409	265	265
q21	6141	2850	2611	2611
q22	460	388	309	309
Total cold run time: 101771 ms
Total hot run time: 29320 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5062	4911	4823	4823
q2	q3	5139	5321	4696	4696
q4	2169	2280	1396	1396
q5	4957	4766	4821	4766
q6	233	196	140	140
q7	1864	1766	1559	1559
q8	2486	2214	1968	1968
q9	7465	7565	7461	7461
q10	4769	4696	4273	4273
q11	551	384	355	355
q12	739	756	534	534
q13	3016	3442	2857	2857
q14	264	277	258	258
q15	q16	685	713	628	628
q17	1298	1297	1276	1276
q18	7346	6978	6761	6761
q19	1113	1087	1069	1069
q20	2229	2233	1945	1945
q21	5383	4630	4560	4560
q22	546	486	457	457
Total cold run time: 57314 ms
Total hot run time: 51782 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169457 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e39e9577bfc587ce24435073a2cde390a94d894d, data reload: false

query5	4312	637	490	490
query6	430	194	175	175
query7	4891	597	306	306
query8	358	219	201	201
query9	8760	4098	4107	4098
query10	457	310	262	262
query11	5910	2357	2189	2189
query12	148	102	98	98
query13	1300	594	440	440
query14	6403	5441	5084	5084
query14_1	4406	4421	4402	4402
query15	204	200	179	179
query16	994	460	438	438
query17	1139	702	594	594
query18	2449	467	347	347
query19	202	185	149	149
query20	125	109	110	109
query21	224	139	119	119
query22	13721	13513	13412	13412
query23	17518	16581	16264	16264
query23_1	16276	16379	16268	16268
query24	7520	1790	1313	1313
query24_1	1309	1287	1314	1287
query25	535	432	364	364
query26	1320	333	162	162
query27	2662	557	335	335
query28	4439	2031	2007	2007
query29	1062	612	477	477
query30	313	234	197	197
query31	1119	1069	963	963
query32	110	59	59	59
query33	518	313	252	252
query34	1187	1134	654	654
query35	739	793	678	678
query36	1364	1382	1230	1230
query37	155	106	90	90
query38	3204	3140	3073	3073
query39	943	907	908	907
query39_1	909	885	885	885
query40	225	124	99	99
query41	63	61	60	60
query42	95	92	91	91
query43	330	327	286	286
query44	
query45	193	184	177	177
query46	1136	1203	746	746
query47	2425	2385	2250	2250
query48	405	419	300	300
query49	625	477	341	341
query50	997	347	259	259
query51	4386	4338	4269	4269
query52	86	86	75	75
query53	251	274	189	189
query54	260	210	188	188
query55	78	81	67	67
query56	232	211	218	211
query57	1427	1413	1322	1322
query58	235	209	205	205
query59	1553	1638	1407	1407
query60	277	246	228	228
query61	151	184	150	150
query62	700	641	587	587
query63	234	190	179	179
query64	2541	770	596	596
query65	
query66	1790	452	335	335
query67	29712	29565	29607	29565
query68	
query69	418	301	260	260
query70	998	937	921	921
query71	291	229	219	219
query72	2854	2779	2359	2359
query73	835	819	450	450
query74	5094	4976	4772	4772
query75	2625	2556	2260	2260
query76	2346	1156	776	776
query77	354	363	289	289
query78	12392	12379	11878	11878
query79	1321	1071	742	742
query80	512	472	392	392
query81	450	278	242	242
query82	234	155	118	118
query83	267	276	247	247
query84	
query85	800	496	398	398
query86	325	300	272	272
query87	3352	3310	3260	3260
query88	3662	2737	2716	2716
query89	405	382	331	331
query90	2146	174	176	174
query91	169	155	131	131
query92	65	58	58	58
query93	1422	1484	954	954
query94	541	364	304	304
query95	668	399	430	399
query96	1107	778	367	367
query97	2706	2689	2574	2574
query98	208	203	216	203
query99	1134	1166	1031	1031
Total cold run time: 250029 ms
Total hot run time: 169457 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 86.30% (63/73) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 87.67% (64/73) 🎉
Increment coverage report
Complete coverage report

@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

@englefly englefly marked this pull request as ready for review June 14, 2026 12:30
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29047 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b173f31cdd9ef3635440ce7736c23abff00cf4df, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17672	4073	4172	4073
q2	q3	10820	1366	793	793
q4	4681	468	340	340
q5	7500	852	585	585
q6	177	172	134	134
q7	746	854	632	632
q8	9336	1696	1570	1570
q9	5860	4511	4517	4511
q10	6783	1805	1535	1535
q11	429	263	243	243
q12	628	424	290	290
q13	18114	3432	2811	2811
q14	260	262	245	245
q15	q16	825	793	709	709
q17	928	940	851	851
q18	6827	5739	5716	5716
q19	1182	1172	1075	1075
q20	526	404	251	251
q21	5648	2606	2387	2387
q22	423	357	296	296
Total cold run time: 99365 ms
Total hot run time: 29047 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4542	4446	4287	4287
q2	q3	4539	4966	4350	4350
q4	2090	2243	1638	1638
q5	4499	4305	4321	4305
q6	227	169	127	127
q7	1772	1659	1759	1659
q8	2742	2210	2158	2158
q9	8105	8396	7940	7940
q10	4848	4762	4297	4297
q11	581	412	387	387
q12	756	753	541	541
q13	3341	3649	2921	2921
q14	311	319	274	274
q15	q16	711	754	656	656
q17	1324	1306	1318	1306
q18	8006	7248	7277	7248
q19	1151	1137	1091	1091
q20	2217	2216	1931	1931
q21	5233	4515	4424	4424
q22	539	446	404	404
Total cold run time: 57534 ms
Total hot run time: 51944 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169006 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b173f31cdd9ef3635440ce7736c23abff00cf4df, data reload: false

query5	4300	624	478	478
query6	434	192	174	174
query7	4904	553	315	315
query8	362	212	198	198
query9	8772	4000	3989	3989
query10	442	303	251	251
query11	5906	2335	2192	2192
query12	150	99	103	99
query13	1276	585	378	378
query14	6368	5402	5039	5039
query14_1	4382	4351	4384	4351
query15	207	196	177	177
query16	1041	459	459	459
query17	1122	699	571	571
query18	2713	470	349	349
query19	202	186	143	143
query20	115	107	105	105
query21	222	141	117	117
query22	13639	13483	13333	13333
query23	17358	16440	16181	16181
query23_1	16232	16303	16261	16261
query24	7505	1731	1289	1289
query24_1	1300	1293	1314	1293
query25	584	459	389	389
query26	1318	312	173	173
query27	2658	586	332	332
query28	4445	2032	2039	2032
query29	1100	605	513	513
query30	311	237	203	203
query31	1121	1085	965	965
query32	111	62	59	59
query33	527	328	298	298
query34	1174	1139	652	652
query35	763	768	682	682
query36	1378	1395	1227	1227
query37	153	103	91	91
query38	3144	3116	3063	3063
query39	921	920	898	898
query39_1	884	887	867	867
query40	218	119	97	97
query41	63	60	60	60
query42	90	89	89	89
query43	317	323	273	273
query44	
query45	188	183	176	176
query46	1021	1186	754	754
query47	2363	2395	2240	2240
query48	378	372	293	293
query49	606	455	349	349
query50	998	366	257	257
query51	4330	4391	4202	4202
query52	87	87	74	74
query53	247	261	187	187
query54	273	215	188	188
query55	82	75	72	72
query56	239	226	209	209
query57	1420	1407	1311	1311
query58	243	210	203	203
query59	1620	1659	1462	1462
query60	273	241	222	222
query61	144	146	146	146
query62	698	650	587	587
query63	227	182	179	179
query64	2481	789	613	613
query65	
query66	1759	443	351	351
query67	29699	29668	29565	29565
query68	
query69	426	307	258	258
query70	981	882	937	882
query71	292	216	212	212
query72	2892	2671	2315	2315
query73	827	765	426	426
query74	5084	4917	4755	4755
query75	2647	2556	2240	2240
query76	2329	1150	759	759
query77	351	375	287	287
query78	12489	12584	11811	11811
query79	1470	1077	786	786
query80	605	459	381	381
query81	448	281	246	246
query82	584	155	120	120
query83	360	274	242	242
query84	
query85	858	491	439	439
query86	363	304	308	304
query87	3395	3338	3241	3241
query88	3582	2734	2698	2698
query89	409	385	331	331
query90	1986	174	177	174
query91	170	161	132	132
query92	63	57	55	55
query93	1535	1415	867	867
query94	545	353	323	323
query95	664	467	344	344
query96	1057	799	322	322
query97	2696	2719	2596	2596
query98	211	206	200	200
query99	1147	1170	1024	1024
Total cold run time: 250468 ms
Total hot run time: 169006 ms

@englefly

Copy link
Copy Markdown
Contributor Author

run cloud_p0

@englefly

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the new bottom-up rewrite. The rule can now move aliases that canPullUp explicitly rejects when those aliases depend on a lower alias that was pulled up.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to fix slot-not-found/chained-expression behavior for PullUpProjectExprUnderTopN; the added unit tests cover several nested TopN and rename cases, but miss the chained non-movable/volatile case below.
  • Scope: The change is focused to the Nereids rewrite and its unit test, but the replacement-map broadening changes semantic eligibility beyond the stated fix.
  • Concurrency/lifecycle/config/compatibility/persistence: Not applicable; this is a logical optimizer rewrite with no new config, storage, protocol, or transaction persistence changes.
  • Parallel paths: The affected path is the custom TopN expression pull-up rule used before lazy materialization; the lower lazy-materialization post processor still relies on this rule preserving expression movability constraints.
  • Tests: I attempted ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PullUpProjectExprUnderTopNTest, but it failed before test execution because thirdparty/installed/bin/protoc is missing in this runner.
  • Observability/performance: No additional observability needed; performance intent is reasonable once the semantic guard is restored.

User focus: No additional user-provided review focus was present.

@@ -237,37 +170,27 @@ public Plan visitLogicalTopN(LogicalTopN topN, CollectorContext context) {
* along the path from the TopN to the current node. An expression whose output ExprId
* is in this set cannot be pulled up past the operators that reference it.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This records every alias in the replacement map, but later simplifyProject removes any alias whose inputs became unavailable and addUpperProject reconstructs it above the TopN. That bypasses canPullUp for aliases that were intentionally rejected. For example, with TopN -> Project(z = assert_true(x > 0)) -> Project(x = a + 1) -> Scan, x is pullable, so the lower project is simplified and x disappears from the upper project's child. Because z was also placed in pullUpExprReplaceMap, the upper project is removed and z = assert_true(a + 1 > 0) is restored above the TopN, even though AssertTrue is a NoneMovableFunction and canPullUp would have returned false for z. The same pattern applies to volatile/Score/L2Distance aliases. This can suppress errors or change semantics for rows discarded by TopN, so the replacement map needs to distinguish aliases that are safe to synthesize above TopN from aliases that must remain below it.

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 90.41% (66/73) 🎉
Increment coverage report
Complete coverage report

1 similar comment
@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 90.41% (66/73) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants