[opt](maxcompute) Optimize split generation for LIMIT queries with partition equality predicates#60895

Open
morningman wants to merge 3 commits into apache:master from morningman:opt-mc

@morningman morningman commented Feb 27, 2026

What problem does this PR solve?

When a MaxCompute query contains only partition equality predicates and
a LIMIT clause, use the row_offset split strategy to read only the required
number of rows instead of generating splits for all of the data. This reduces
the split count from potentially many to exactly one, which improves query
latency for common LIMIT patterns such as SELECT * FROM t WHERE pt='x' LIMIT N.

Key changes:

  • Add checkOnlyPartitionEqualityPredicate() to detect eligible queries
  • Add getSplitsWithLimitOptimization() using SplitByRowOffset with
    crossPartition=false, reading min(limit, totalRowCount) rows
  • Add session variable enable_mc_limit_split_optimization (default off)
  • Add timing logs for split generation phases to aid performance diagnosis
  • Add unit tests for predicate check and limit optimization logic
  • Add regression tests covering single/multi-partition tables, JOINs,
    aggregations, subqueries, window functions, and edge cases
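
The decision logic described in the key changes can be sketched roughly as follows. This is an illustrative Python model, not the Java code in this PR: `Predicate`, `RowOffsetSplit`, `only_partition_equality`, and `plan_limit_split` are hypothetical names, and treating an empty predicate list as eligible and a non-positive limit as "no LIMIT" are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Predicate:
    """Simplified stand-in for one pushed-down conjunct: `column op literal`."""
    column: str
    op: str        # e.g. "=", ">", "IN"
    value: object


@dataclass(frozen=True)
class RowOffsetSplit:
    """A single split that reads `row_count` rows starting at `start_row`."""
    start_row: int
    row_count: int


def only_partition_equality(predicates: List[Predicate],
                            partition_columns: Set[str]) -> bool:
    """Return True when every conjunct is `partition_column = literal`.

    Assumption: an empty predicate list counts as eligible.
    """
    return all(p.op == "=" and p.column in partition_columns
               for p in predicates)


def plan_limit_split(predicates: List[Predicate],
                     partition_columns: Set[str],
                     limit: Optional[int],
                     total_row_count: int,
                     enabled: bool) -> Optional[RowOffsetSplit]:
    """Return one row-offset split, or None to fall back to normal split generation."""
    # The optimization is opt-in (the session variable defaults to off).
    if not enabled:
        return None
    # Assumption: a missing or non-positive limit means the query has no LIMIT.
    if limit is None or limit <= 0:
        return None
    if not only_partition_equality(predicates, partition_columns):
        return None
    # Read min(limit, totalRowCount) rows in exactly one split.
    return RowOffsetSplit(start_row=0, row_count=min(limit, total_row_count))
```

For a query such as SELECT * FROM t WHERE pt='x' LIMIT 10 against a partition with 1000 rows, this model yields a single split covering 10 rows; any predicate on a non-partition column, or any non-equality operator, returns None so that the regular split path is used.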

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman changed the title from "[opt](maxcompute) opt performance for query with limit" to "[opt](maxcompute) Optimize split generation for LIMIT queries with partition equality predicates" on Feb 27, 2026
@morningman

run buildall

@doris-robot

TPC-H: Total hot run time: 28829 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dfb60731327e1a8cdbf45d7ee8d744f3318bc2ba, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17592	4485	4519	4485
q2	
q3	10653	772	508	508
q4	4692	374	254	254
q5	7554	1193	1017	1017
q6	174	183	146	146
q7	784	856	657	657
q8	9296	1464	1307	1307
q9	4774	4764	4651	4651
q10	6742	1869	1645	1645
q11	461	257	233	233
q12	689	566	471	471
q13	17752	4204	3419	3419
q14	234	228	214	214
q15	919	783	790	783
q16	731	719	670	670
q17	717	862	411	411
q18	6015	5445	5192	5192
q19	1123	968	620	620
q20	497	486	398	398
q21	4799	2112	1476	1476
q22	378	340	272	272
Total cold run time: 96576 ms
Total hot run time: 28829 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4616	4538	4525	4525
q2	
q3	1831	2217	1769	1769
q4	883	1182	777	777
q5	4028	4365	4284	4284
q6	180	174	136	136
q7	1732	1598	1519	1519
q8	2572	2704	2487	2487
q9	7398	7197	7388	7197
q10	2682	2853	2411	2411
q11	530	451	433	433
q12	492	596	443	443
q13	3957	4462	3639	3639
q14	282	305	293	293
q15	835	796	777	777
q16	698	753	718	718
q17	1150	1527	1304	1304
q18	7006	7042	6697	6697
q19	891	839	896	839
q20	2120	2224	2110	2110
q21	3895	3449	3334	3334
q22	484	414	362	362
Total cold run time: 48262 ms
Total hot run time: 46054 ms
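
The result rows above carry no column headers. Judging from the numbers, the four timing columns appear to be the cold run, two hot runs, and the faster of the two hot runs, with the reported totals being column sums. The following Python sketch parses rows under that reading; the column interpretation is inferred from the data rather than stated by the bot, and `parse_rows` and `totals` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# A few rows copied from Round 1 above (whitespace-separated, times in ms).
SAMPLE = """
q1 17592 4485 4519 4485
q4 4692 374 254 254
q15 919 783 790 783
"""


def parse_rows(text: str) -> Dict[str, List[int]]:
    """Map each query label to its four timings.

    Lines that do not carry exactly four integer fields are skipped.
    If a line has more than one label, the timings go to the last label.
    """
    rows: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        timings = [int(f) for f in fields if f.isdigit()]
        labels = [f for f in fields if not f.isdigit()]
        if labels and len(timings) == 4:
            rows[labels[-1]] = timings
    return rows


def totals(rows: Dict[str, List[int]]) -> Tuple[int, int]:
    """Return (total cold ms, total hot ms) under the inferred column meaning."""
    cold = sum(t[0] for t in rows.values())
    # Hot time per query is taken as the faster of the two hot runs.
    hot = sum(min(t[1], t[2]) for t in rows.values())
    return cold, hot


if __name__ == "__main__":
    cold_ms, hot_ms = totals(parse_rows(SAMPLE))
    print(f"cold={cold_ms} ms, hot={hot_ms} ms")
```

Summing the full Round 1 table this way reproduces the reported totals of 96576 ms cold and 28829 ms hot, which supports the inferred column meaning.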

@doris-robot

TPC-DS: Total hot run time: 184210 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dfb60731327e1a8cdbf45d7ee8d744f3318bc2ba, data reload: false

query5	4982	623	509	509
query6	347	234	210	210
query7	4217	466	280	280
query8	343	238	232	232
query9	8759	2748	2727	2727
query10	550	357	330	330
query11	17023	16771	16695	16695
query12	198	131	125	125
query13	1262	447	353	353
query14	6682	3207	2990	2990
query14_1	2798	2804	2806	2804
query15	203	191	176	176
query16	986	471	464	464
query17	1081	718	612	612
query18	2711	447	348	348
query19	211	210	179	179
query20	141	128	126	126
query21	224	154	125	125
query22	5155	5923	5681	5681
query23	17631	17216	17098	17098
query23_1	17037	16919	16745	16745
query24	7079	1598	1259	1259
query24_1	1252	1237	1219	1219
query25	567	535	412	412
query26	1221	263	145	145
query27	2783	477	285	285
query28	4470	1834	1833	1833
query29	780	575	467	467
query30	303	244	209	209
query31	873	715	631	631
query32	79	67	69	67
query33	499	344	282	282
query34	936	896	559	559
query35	622	680	587	587
query36	1079	1127	971	971
query37	127	93	80	80
query38	2964	2901	2846	2846
query39	898	890	837	837
query39_1	830	832	827	827
query40	227	149	137	137
query41	64	60	99	60
query42	105	101	102	101
query43	396	382	346	346
query44	
query45	196	187	180	180
query46	881	974	606	606
query47	2114	2142	2076	2076
query48	303	323	222	222
query49	615	459	384	384
query50	666	273	211	211
query51	4090	4122	4056	4056
query52	106	106	95	95
query53	282	330	289	289
query54	312	266	256	256
query55	89	86	81	81
query56	315	317	302	302
query57	1366	1379	1287	1287
query58	289	269	274	269
query59	2604	2649	2568	2568
query60	349	330	318	318
query61	154	152	153	152
query62	627	576	538	538
query63	308	282	284	282
query64	4880	1288	1023	1023
query65	
query66	1375	472	354	354
query67	16364	16340	16258	16258
query68	
query69	393	301	282	282
query70	1006	988	936	936
query71	335	305	291	291
query72	2802	2743	2452	2452
query73	530	545	329	329
query74	10039	9886	9739	9739
query75	2838	2768	2480	2480
query76	2311	1049	661	661
query77	406	366	304	304
query78	11340	11494	10693	10693
query79	1151	793	589	589
query80	1374	623	554	554
query81	567	287	252	252
query82	1017	148	112	112
query83	361	262	256	256
query84	251	121	100	100
query85	928	486	452	452
query86	424	335	326	326
query87	3129	3095	2992	2992
query88	3597	2675	2667	2667
query89	431	369	336	336
query90	1996	172	168	168
query91	165	156	128	128
query92	73	69	70	69
query93	1025	831	495	495
query94	658	328	301	301
query95	585	396	312	312
query96	647	540	232	232
query97	2453	2494	2388	2388
query98	232	219	217	217
query99	1003	1004	917	917
Total cold run time: 254438 ms
Total hot run time: 184210 ms

@morningman
Copy link
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28723 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e46f793bfafe4e81ca257cc5df01afd60ac4ebb7, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17623	4458	4346	4346
q2	
q3	10652	772	523	523
q4	4690	355	251	251
q5	7776	1216	1023	1023
q6	205	180	144	144
q7	798	842	673	673
q8	10538	1471	1344	1344
q9	5761	4786	4721	4721
q10	6823	1910	1616	1616
q11	462	245	228	228
q12	743	567	468	468
q13	17773	4198	3422	3422
q14	228	231	223	223
q15	974	788	794	788
q16	757	722	674	674
q17	732	865	388	388
q18	6055	5395	5280	5280
q19	1486	974	606	606
q20	503	484	392	392
q21	4448	1840	1368	1368
q22	342	279	245	245
Total cold run time: 99369 ms
Total hot run time: 28723 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4525	4362	4362	4362
q2	
q3	1774	2185	1718	1718
q4	840	1160	774	774
q5	4033	4314	4319	4314
q6	179	177	142	142
q7	1714	1602	1482	1482
q8	2406	2664	2520	2520
q9	7750	7364	7360	7360
q10	2920	2927	2402	2402
q11	515	432	414	414
q12	582	593	448	448
q13	3920	4465	3631	3631
q14	274	300	291	291
q15	872	816	853	816
q16	740	794	734	734
q17	1156	1500	1322	1322
q18	6954	6828	6567	6567
q19	980	891	919	891
q20	2093	2272	2022	2022
q21	3990	3461	3336	3336
q22	462	424	382	382
Total cold run time: 48679 ms
Total hot run time: 45928 ms

@doris-robot

TPC-DS: Total hot run time: 184524 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e46f793bfafe4e81ca257cc5df01afd60ac4ebb7, data reload: false

query5	4785	645	505	505
query6	319	226	205	205
query7	4223	456	265	265
query8	327	241	256	241
query9	8754	2717	2691	2691
query10	550	394	330	330
query11	16928	16754	16559	16559
query12	189	125	127	125
query13	1280	459	352	352
query14	6466	3213	2984	2984
query14_1	2859	2844	2806	2806
query15	203	201	180	180
query16	997	475	449	449
query17	1065	712	618	618
query18	2603	458	340	340
query19	207	208	188	188
query20	138	131	134	131
query21	222	143	124	124
query22	4852	5932	5668	5668
query23	17612	17371	16991	16991
query23_1	17117	17061	17208	17061
query24	7639	1621	1238	1238
query24_1	1212	1214	1225	1214
query25	534	453	398	398
query26	1247	267	146	146
query27	2765	460	285	285
query28	4468	1851	1858	1851
query29	824	569	478	478
query30	321	250	206	206
query31	884	735	632	632
query32	83	75	68	68
query33	521	340	290	290
query34	917	919	563	563
query35	641	675	580	580
query36	1064	1137	1001	1001
query37	133	94	82	82
query38	2931	2963	2874	2874
query39	889	865	835	835
query39_1	828	835	817	817
query40	230	151	133	133
query41	64	59	58	58
query42	105	100	102	100
query43	369	373	343	343
query44	
query45	201	185	181	181
query46	889	993	588	588
query47	2133	2117	2077	2077
query48	305	308	235	235
query49	634	487	394	394
query50	686	288	219	219
query51	4056	4172	4038	4038
query52	103	105	96	96
query53	299	335	282	282
query54	292	263	266	263
query55	92	87	80	80
query56	313	312	291	291
query57	1347	1328	1286	1286
query58	286	270	271	270
query59	2610	2703	2559	2559
query60	342	335	315	315
query61	150	145	146	145
query62	651	592	546	546
query63	318	284	273	273
query64	4883	1258	963	963
query65	
query66	1438	452	350	350
query67	16441	16317	16407	16317
query68	
query69	392	285	282	282
query70	990	995	893	893
query71	330	308	286	286
query72	2744	2715	2732	2715
query73	538	541	326	326
query74	9940	9911	9792	9792
query75	2867	2779	2500	2500
query76	2320	1044	687	687
query77	363	364	316	316
query78	11247	11410	10680	10680
query79	1111	782	606	606
query80	1368	661	569	569
query81	549	316	251	251
query82	1017	154	113	113
query83	333	267	231	231
query84	248	121	96	96
query85	884	478	428	428
query86	419	310	300	300
query87	3110	3086	3023	3023
query88	3530	2642	2589	2589
query89	420	366	339	339
query90	1994	161	167	161
query91	160	156	133	133
query92	76	79	69	69
query93	905	825	496	496
query94	633	327	297	297
query95	584	347	385	347
query96	651	514	226	226
query97	2460	2526	2389	2389
query98	234	225	227	225
query99	1020	935	907	907
Total cold run time: 253814 ms
Total hot run time: 184524 ms
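
To put the two buildall runs side by side, the reported hot totals for commits dfb60731 and e46f793b can be compared as a relative change. This is only a rough sketch: both commits belong to this PR, so it compares two runs of the branch rather than the branch against a master baseline, and `relative_change` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Reported "Total hot run time" values in ms: (first run, second run).
HOT_TOTALS: Dict[str, Tuple[int, int]] = {
    "TPC-H": (28829, 28723),
    "TPC-H, runtime_filter_mode=off": (46054, 45928),
    "TPC-DS": (184210, 184524),
}


def relative_change(before_ms: int, after_ms: int) -> float:
    """Percentage change from before_ms to after_ms (negative means faster)."""
    return (after_ms - before_ms) * 100.0 / before_ms


if __name__ == "__main__":
    for name, (before, after) in HOT_TOTALS.items():
        print(f"{name}: {relative_change(before, after):+.2f}%")
```

All three totals differ by less than half a percent between the two runs.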
