
@CalvinKirs (Member) commented Jan 12, 2026

What problem does this PR solve?

Aliyun OSS supports accessing buckets using a bucket domain name format, for example:

oss://bucket.endpoint/path

In the Hadoop ecosystem, support for this URI style differs:

  • JindoFS supports this format (JindoFS rewrites and extends AliyunOSSFileSystem).
    • It correctly parses and handles oss://bucket.endpoint URIs.
  • The community hadoop-aliyun implementation does not support this format.
    • bucket.endpoint is treated as the bucket name.
    • This leads to failures in request signing and endpoint resolution.

In our system, OSS access is implemented on top of an S3-compatible protocol, which likewise cannot recognize bucket.endpoint as a valid bucket identifier.

Problem

When users provide a URI such as:

oss://bucket.endpoint/path

there are two incompatibilities:

  1. The bucket part (bucket.endpoint) is not compatible with S3-style bucket parsing.
  2. hadoop-aliyun and S3-compatible implementations do not automatically split it into a bucket and an endpoint.

As a result, the same URI behaves differently across implementations and fails in our current setup.
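To make the first incompatibility concrete, the Java sketch below shows what happens when a client takes the URI authority verbatim as the bucket name. It is a simplified, hypothetical stand-in, not the actual hadoop-aliyun or Doris parsing code; the class name `NaiveBucketParsing`, the bucket `my-bucket`, and the endpoint `oss-cn-hangzhou.aliyuncs.com` are example values.

```java
import java.net.URI;

// Hypothetical illustration only: a naive parser that uses the URI host as the bucket name.
// This mimics the failure mode described above; it is not real hadoop-aliyun or Doris code.
public class NaiveBucketParsing {

    // Returns the host part of the URI, which a naive implementation
    // would pass on unchanged as the bucket name.
    static String naiveBucket(String location) {
        return URI.create(location).getHost();
    }

    public static void main(String[] args) {
        String location = "oss://my-bucket.oss-cn-hangzhou.aliyuncs.com/warehouse/t1";
        // Prints "my-bucket.oss-cn-hangzhou.aliyuncs.com": the endpoint stays glued to the bucket,
        // so signing and endpoint resolution would be done against a bucket that does not exist.
        System.out.println(naiveBucket(location));
    }
}
```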

Solution

This PR introduces an explicit URI rewrite step to align our behavior with JindoFS:

  • when the scheme is oss:// or s3://, and
  • the bucket is specified in the form bucket.endpoint,

rewrite it into the canonical form:

oss://bucket/path

HTTP and HTTPS URIs are returned unchanged to avoid affecting direct domain-based access.
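A minimal Java sketch of the rewrite rule described above. The class `OssUriRewriter` and method `rewrite` are hypothetical names, not the code in this PR. It assumes the bucket is everything before the first '.' in the authority and, because the description only shows the oss:// form, it keeps whichever scheme (oss or s3) the input used.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch of the bucket.endpoint rewrite; names are illustrative, not the PR's code.
public final class OssUriRewriter {

    private OssUriRewriter() {
    }

    /**
     * Rewrites scheme://bucket.endpoint/path to scheme://bucket/path for oss:// and s3:// URIs.
     * Assumption: the bucket is everything before the first '.' in the authority.
     * Other schemes (including http:// and https://) and bucket-only authorities pass through unchanged.
     */
    public static String rewrite(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return location;
        }
        String normalized = scheme.toLowerCase(Locale.ROOT);
        if (!normalized.equals("oss") && !normalized.equals("s3")) {
            // Direct domain-based access (http/https) is left untouched.
            return location;
        }
        String authority = uri.getRawAuthority();
        int dot = authority == null ? -1 : authority.indexOf('.');
        if (dot <= 0) {
            // No endpoint suffix: already in canonical scheme://bucket form.
            return location;
        }
        StringBuilder rewritten = new StringBuilder()
                .append(scheme).append("://").append(authority, 0, dot);
        // Preserve the raw (still-encoded) path, query and fragment as given.
        if (uri.getRawPath() != null) {
            rewritten.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            rewritten.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            rewritten.append('#').append(uri.getRawFragment());
        }
        return rewritten.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("oss://my-bucket.oss-cn-hangzhou.aliyuncs.com/warehouse/t1"));
        System.out.println(rewrite("https://my-bucket.oss-cn-hangzhou.aliyuncs.com/warehouse/t1"));
    }
}
```

For example, `rewrite("oss://my-bucket.oss-cn-hangzhou.aliyuncs.com/warehouse/t1")` returns `oss://my-bucket/warehouse/t1`, while an https:// URI or an already-canonical `oss://my-bucket/warehouse/t1` comes back unchanged. Splitting at the first dot is safe for OSS, whose bucket names cannot contain dots, but S3 bucket names may contain dots, so a production check for s3:// would likely need to match a known endpoint suffix instead.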

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs (Member, Author)

run buildall

@CalvinKirs (Member, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 31034 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 45738091db3d19a7df893a57df6d557436d1a291, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (all in ms)
q1	17630	4180	4053	4053
q2	2031	352	237	237
q3	10154	1269	685	685
q4	10234	892	319	319
q5	7551	2055	1850	1850
q6	187	170	140	140
q7	935	771	663	663
q8	9280	1380	1157	1157
q9	4835	4464	4452	4452
q10	6783	1831	1397	1397
q11	540	294	282	282
q12	684	734	575	575
q13	17763	3814	3030	3030
q14	283	310	269	269
q15	581	511	501	501
q16	677	659	626	626
q17	665	781	515	515
q18	6627	6464	6208	6208
q19	1147	957	572	572
q20	382	349	253	253
q21	2953	2325	2279	2279
q22	1040	1026	971	971
Total cold run time: 102962 ms
Total hot run time: 31034 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (all in ms)
q1	4084	4074	4042	4042
q2	325	404	317	317
q3	2087	2563	2227	2227
q4	1315	1734	1320	1320
q5	4051	3964	3996	3964
q6	213	174	133	133
q7	1864	1796	1655	1655
q8	2628	2571	2384	2384
q9	7281	7135	7178	7135
q10	2452	2724	2365	2365
q11	562	476	464	464
q12	739	834	625	625
q13	3666	4039	3265	3265
q14	288	312	408	312
q15	567	517	497	497
q16	645	695	627	627
q17	1124	1357	1361	1357
q18	8069	8110	7786	7786
q19	863	833	824	824
q20	1957	2320	1864	1864
q21	4751	4478	4139	4139
q22	1054	1006	981	981
Total cold run time: 50585 ms
Total hot run time: 48283 ms

@doris-robot

TPC-DS: Total hot run time: 172694 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 45738091db3d19a7df893a57df6d557436d1a291, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (all in ms)
query5	4380	576	450	450
query6	331	229	218	218
query7	4213	462	256	256
query8	329	249	229	229
query9	8732	2606	2644	2606
query10	516	375	325	325
query11	15238	15093	14877	14877
query12	174	117	116	116
query13	1264	499	388	388
query14	6263	2993	2751	2751
query14_1	2644	2659	2580	2580
query15	199	189	169	169
query16	1004	452	408	408
query17	1071	644	540	540
query18	2576	424	327	327
query19	217	215	190	190
query20	123	116	113	113
query21	215	140	115	115
query22	3911	4188	4055	4055
query23	16315	15486	15324	15324
query23_1	15420	15614	15562	15562
query24	7393	1554	1188	1188
query24_1	1170	1188	1180	1180
query25	539	461	388	388
query26	1238	255	152	152
query27	2783	442	278	278
query28	4584	2117	2124	2117
query29	827	580	434	434
query30	307	239	211	211
query31	793	618	546	546
query32	77	65	67	65
query33	529	330	283	283
query34	877	890	529	529
query35	732	745	664	664
query36	871	883	795	795
query37	120	89	81	81
query38	2754	2685	2659	2659
query39	779	759	742	742
query39_1	725	713	708	708
query40	219	130	114	114
query41	64	60	64	60
query42	104	106	104	104
query43	457	461	441	441
query44	1310	720	717	717
query45	189	184	172	172
query46	848	960	585	585
query47	1451	1478	1386	1386
query48	316	311	237	237
query49	602	418	315	315
query50	670	267	204	204
query51	3790	3829	3799	3799
query52	104	108	94	94
query53	293	331	274	274
query54	281	256	249	249
query55	78	77	68	68
query56	276	293	292	292
query57	1030	994	881	881
query58	267	258	243	243
query59	2151	2044	2018	2018
query60	330	316	295	295
query61	161	151	151	151
query62	400	352	311	311
query63	300	266	269	266
query64	5002	1327	980	980
query65	3739	3746	3688	3688
query66	1395	414	303	303
query67	14533	15925	14865	14865
query68	2699	1050	735	735
query69	443	359	314	314
query70	994	877	942	877
query71	323	305	267	267
query72	5829	3594	3613	3594
query73	622	715	294	294
query74	8755	8790	8600	8600
query75	2764	2764	2452	2452
query76	2382	1046	636	636
query77	375	362	280	280
query78	9748	9782	9188	9188
query79	1134	836	590	590
query80	1276	563	475	475
query81	541	262	230	230
query82	996	140	105	105
query83	340	256	231	231
query84	251	116	101	101
query85	923	509	454	454
query86	410	304	319	304
query87	2830	2817	2774	2774
query88	3306	2232	2205	2205
query89	402	347	313	313
query90	1970	148	147	147
query91	164	170	145	145
query92	67	66	61	61
query93	1105	891	527	527
query94	636	327	287	287
query95	559	384	296	296
query96	581	457	202	202
query97	2326	2361	2308	2308
query98	211	203	208	203
query99	582	582	510	510
Total cold run time: 247342 ms
Total hot run time: 172694 ms

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 62.50% (15/24) 🎉
Increment coverage report
Complete coverage report

@CalvinKirs (Member, Author)

run external
