Support Iceberg REST catalog storage credentials in ForwardingFileIo#28425
Support Iceberg REST catalog storage credentials in ForwardingFileIo#28425kaveti wants to merge 2 commits intotrinodb:masterfrom
Conversation
|
#Note: This change is the Trino-side prerequisite for Iceberg REST catalog storage credentials support. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/fileio/ForwardingFileIo.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/fileio/ForwardingFileIo.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/fileio/ForwardingFileIo.java
Show resolved
Hide resolved
|
@findinpath i have addressed your review comments |
0a7c730 to
882184c
Compare
...test/java/io/trino/plugin/iceberg/catalog/rest/TestIcebergStorageCredentialsRestCatalog.java
Show resolved
Hide resolved
...eberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoIcebergRestCatalogFactory.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoRestCatalog.java
Outdated
Show resolved
Hide resolved
f6a6771 to
1e7f22e
Compare
|
@findinpath could you please review again |
1 similar comment
|
@findinpath could you please review again |
| for (StorageCredential credential : credentials) { | ||
| merged.putAll(credential.config()); | ||
| } | ||
| return merged.buildKeepingLast(); |
There was a problem hiding this comment.
This logic makes not much sense to me.
We have the storageCredentials additionally to the fileIo.properties() (considered as a fallback) - meaning that they're separate, not mixed.
There was a problem hiding this comment.
agree. we need to retain prefix scoped creds
| RESTSessionCatalog icebergCatalogInstance = new RESTSessionCatalog( | ||
| config -> HTTPClient.builder(config) | ||
| .uri(config.get(CatalogProperties.URI)) | ||
| .withHeaders(RESTUtil.configHeaders(config)) | ||
| .build(), | ||
| (context, config) -> { | ||
| ConnectorIdentity currentIdentity = (context.wrappedIdentity() != null) |
There was a problem hiding this comment.
It seems that the changes from apache/iceberg#12591 disregarded the fact that the ioBuilder is not receiving now the storageCredentials.
@nastra was it intentional to skip it during the implementation phase?
There was a problem hiding this comment.
@findinpath this was not intentional. I believe Trino is the only engine that goes through that path. Can you please open a PR with a fix?
There was a problem hiding this comment.
opened PR: apache/iceberg#15752 , cc @nastra
| : ConnectorIdentity.ofUser("fake"); | ||
| return fileIoFactory.create(fileSystemFactory.create(currentIdentity, config), true, config); | ||
| }); | ||
| null); |
There was a problem hiding this comment.
I think we should have a preparatory commit doing this change to see better whether it makes any sense to have it.
| @@ -45,15 +52,23 @@ | |||
| import static java.util.stream.Collectors.joining; | |||
|
|
|||
| public class ForwardingFileIo | |||
| implements SupportsBulkOperations | |||
| implements SupportsBulkOperations, SupportsStorageCredentials | |||
There was a problem hiding this comment.
I'm looking at S3FileIO
and am rather thinking that we should have a prefixed map of TrinoFileSystem actually (and probably the default one we used to have as fallback).
findinpath
left a comment
There was a problem hiding this comment.
I think the PR needs to be intensively reshaped to have it moving forward.
| warehouseLocation = Files.createTempDirectory(null); | ||
| closeAfterClass(() -> deleteRecursively(warehouseLocation, ALLOW_INSECURE)); | ||
|
|
||
| DelegatingRestSessionCatalog delegatingCatalog = DelegatingRestSessionCatalog.builder() |
There was a problem hiding this comment.
This was introduced as part of the Iceberg REST catalog spec and is used by catalogs like Unity Catalog, Polaris, and Gravitino.
Isn't there an actual implementation we can test against instead of relying on mocked code?
1e7f22e to
937535c
Compare
| private final String staticRegion; | ||
| private final String staticEndpoint; | ||
| private final boolean staticCrossRegionAccessEnabled; | ||
| private final Map<ClientKey, S3Client> dynamicClients; |
There was a problem hiding this comment.
This is not the right design. S3FileSystemLoader is already responsible for creating and managing clients. We shouldn't have a second system here for doing that.
|
Waiting on apache/iceberg#15752 Consider rebasing in the meantime on #26640 |
Make ForwardingFileIo implement SupportsStorageCredentials so that storage credentials from the Iceberg REST catalog LoadTableResponse are captured and their config is merged into properties(). Storage credential keys take precedence over base config properties.
This allows storage credential keys (e.g. s3.access-key-id, gcs.oauth2.token) to flow through io().properties() into fileIoProperties, where downstream consumers like
IcebergRestCatalogFileSystemFactory can use them.
Description
Support Iceberg REST catalog storage credentials in ForwardingFileIo
Why this change
Iceberg REST catalogs can return short-lived, scoped credentials alongside table metadata so that
the query engine can access the underlying storage (S3, GCS, ADLS, etc.) without needing long-lived
static credentials. There are two mechanisms for this in the Iceberg REST spec:
Config-based vended credentials — credential keys are embedded directly in the
configmapof the
loadTableresponse. Trino already supports this path today.Storage credentials — a dedicated
storage-credentialsfield in theloadTableresponsethat carries a list of
StorageCredentialobjects, each scoped to a storage prefix with itsown config map. This was introduced as part of the Iceberg REST catalog spec and is used by
catalogs like Unity Catalog, Polaris, and Gravitino.
Trino currently ignores storage credentials entirely. The Iceberg library's
RESTSessionCatalogcalls
FileIO.setCredentials()after loading a table, but Trino'sForwardingFileIodoes notimplement the
SupportsStorageCredentialsinterface — so those credentials are silently dropped.This means that any catalog server returning credentials via the
storage-credentialsfield(instead of or in addition to the
configmap) will fail with access denied errors when Trinotries to read the table data.
What this change does
ForwardingFileIonow implementsSupportsStorageCredentials. When the Iceberg library callssetCredentials(), the credential configs are captured and merged intoproperties().This is the same
io().properties()map that Trino already reads asfileIoPropertiesthroughoutthe Iceberg connector (splits, page sources, metadata operations). So all existing downstream
credential handling (e.g.
IcebergRestCatalogFileSystemFactoryfor S3) works without anyadditional changes.
Storage credential keys take precedence over base config properties when there is a conflict.
For example, if the base config has
s3.access-key-id=static-keyand a storage credentialhas
s3.access-key-id=vended-key, the vended key wins.Example
A REST catalog
loadTableresponse might include:{ "metadata": { ... }, "config": { "s3.access-key-id": "fallback-key" }, "storage-credentials": [ { "prefix": "s3://my-bucket/warehouse/", "config": { "s3.access-key-id": "AKIA...", "s3.secret-access-key": "...", "s3.session-token": "..." } } ] }Before this change, Trino would only see the
configmap (fallback-key). After this change,Trino sees the merged result where storage credentials take precedence (
AKIA...).This works the same way for GCS (
gcs.oauth2.token), Azure (adls.sas-token.*), or any otherprovider — the change is cloud-agnostic.
Spec references
LoadTableResultschema withstorage-credentialsfield:https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml
Additional context
#When storage-credentials is used vs config properties:
The Iceberg REST catalog has two mechanisms for vending credentials:
config map (older, ad-hoc) The REST server puts credential keys like s3.access-key-id, s3.secret-access-key directly into the config map of LoadTableResponse. This is what TestIcebergVendingRestCatalogConnectorSmokeTest exercises — the tabulario/iceberg-rest backend only supports this approach.
storage-credentials list (Iceberg REST Spec) The REST server returns a structured list of StorageCredential objects in LoadTableResponse.credentials(). Each credential has:
A prefix (e.g., s3://my-bucket/warehouse/) scoping the credential to a specific storage location
A config map with the actual credential key-value pairs
This enables:
Prefix-scoped credentials — least-privilege access per storage location
Multi-cloud support — different credentials for S3, GCS, ADLS in the same response
Short-lived, table-specific tokens — the catalog can vend narrowly-scoped STS tokens per table
#Which catalogs use storage-credentials?
Apache Polaris (Snowflake Open Catalog)
Unity Catalog (Databricks)
AWS Glue Data Catalog (REST endpoint)
Gravitino
The tabulario/iceberg-rest test backend predates this spec addition and only returns credentials via config, which is why you see properties being filled but not storageCredentials.
I've added TestIcebergStorageCredentialsRestCatalog which uses a mock REST catalog that injects storage-credentials into LoadTableResponse to validate this flow end-to-end. I've also updated the issue description with this clarification.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: