[WIP] feat(lance): integrate lance-namespace API #27481
jja725 wants to merge 2 commits into prestodb:master
…pport

Replace hand-rolled filesystem-based namespace operations in LanceNamespaceHolder with the proper LanceNamespace API from lance-namespace-core. This enables pluggable namespace implementations (dir, rest, glue, etc.) and aligns with the lance-trino connector design.

Key changes:
- LanceNamespaceHolder now uses LanceNamespace.connect() for table discovery, creation, and deletion
- Table paths are resolved via namespace API instead of hardcoded filesystem conventions
- LanceTableHandle/LanceWritableTableHandle carry tablePath and tableId resolved once in getTableHandle()
- LanceConnectorFactory passes through all lance.* properties to the namespace implementation
- Config property lance.root-url replaced by lance.root (passed through to namespace, not a dedicated config field)
- Added lance.parent config for multi-level namespace support

Co-Authored-By: Claude Opus 4.6 <[email protected]>
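The property pass-through described in the key changes could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `NamespacePropertyFilter`, the contents of `KNOWN_CONFIG_PROPERTIES`, the `lance.single-level-ns` key, and the decision to strip the `lance.` prefix are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class NamespacePropertyFilter
{
    // Assumed: keys the connector binds to its own config class and therefore
    // does not forward. The real set lives in LanceConnectorFactory.
    private static final Set<String> KNOWN_CONFIG_PROPERTIES = new HashSet<>(
            Arrays.asList("lance.impl", "lance.single-level-ns", "lance.parent"));

    private static final String PREFIX = "lance.";

    private NamespacePropertyFilter() {}

    // Collects every lance.* entry that is not a known connector property and
    // returns it with the "lance." prefix removed (prefix stripping is an assumption).
    static Map<String, String> namespaceProperties(Map<String, String> catalogConfig)
    {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : catalogConfig.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && !KNOWN_CONFIG_PROPERTIES.contains(key)) {
                result.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }
}
```

Under these assumptions, a catalog file containing `lance.impl=dir` and `lance.root=/tmp/lance` would forward only `root=/tmp/lance` to the namespace implementation, which is why `lance.root` no longer needs a dedicated config field.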
Reviewer's Guide
Integrates the generic lance-namespace API into the Presto Lance connector, replacing filesystem-specific logic with pluggable namespace implementations, threading tablePath/tableId through table and write handles, and updating configuration and tests to support flexible namespace backends.
Sequence diagram for table handle resolution via LanceNamespace
sequenceDiagram
actor PrestoEngine
participant LanceMetadata
participant LanceNamespaceHolder
participant LanceNamespace
PrestoEngine->>LanceMetadata: getTableHandle(session, schemaTableName)
LanceMetadata->>LanceMetadata: schemaExists(schemaName)
LanceMetadata->>LanceNamespaceHolder: schemaExists(schemaName)
LanceNamespaceHolder->>LanceNamespaceHolder: prestoSchemaToLanceNamespace(schemaName)
LanceNamespaceHolder->>LanceNamespace: namespaceExists(NamespaceExistsRequest)
LanceNamespace-->>LanceNamespaceHolder: namespaceExists response
LanceNamespaceHolder-->>LanceMetadata: boolean
alt schemaExists
LanceMetadata->>LanceNamespaceHolder: getTablePath(schemaName, tableName)
LanceNamespaceHolder->>LanceNamespaceHolder: getTableId(schemaName, tableName)
LanceNamespaceHolder->>LanceNamespace: describeTable(DescribeTableRequest)
LanceNamespace-->>LanceNamespaceHolder: DescribeTableResponse(location)
LanceNamespaceHolder-->>LanceMetadata: tablePath
LanceMetadata->>LanceNamespaceHolder: getTableId(schemaName, tableName)
LanceNamespaceHolder-->>LanceMetadata: tableId
LanceMetadata-->>PrestoEngine: LanceTableHandle(schemaName, tableName, tablePath, tableId)
else schemaMissingOrTableMissing
LanceMetadata-->>PrestoEngine: null
end
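The flow in the sequence diagram can be approximated in plain Java as below. This is a sketch under stated assumptions: the `Namespace` interface is a hypothetical stand-in for the real `LanceNamespace` client (whose `namespaceExists` returns `void` according to the class diagram), `Handle` is a simplified stand-in for `LanceTableHandle`, and the sketch returns `Optional.empty()` where the diagram returns `null`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class TableHandleResolution
{
    // Hypothetical stand-in for the namespace calls shown in the diagram.
    interface Namespace
    {
        boolean namespaceExists(List<String> namespaceId);

        // Returns the table location, or empty when the table is unknown.
        Optional<String> describeTableLocation(List<String> tableId);
    }

    // Simplified stand-in for LanceTableHandle.
    static final class Handle
    {
        final String schemaName;
        final String tableName;
        final String tablePath;
        final List<String> tableId;

        Handle(String schemaName, String tableName, String tablePath, List<String> tableId)
        {
            this.schemaName = schemaName;
            this.tableName = tableName;
            this.tablePath = tablePath;
            this.tableId = tableId;
        }
    }

    private TableHandleResolution() {}

    // Check the schema first, then resolve path and id once and carry both in the handle.
    static Optional<Handle> getTableHandle(Namespace namespace, String schemaName, String tableName)
    {
        if (!namespace.namespaceExists(Collections.singletonList(schemaName))) {
            return Optional.empty();
        }
        List<String> tableId = Arrays.asList(schemaName, tableName);
        return namespace.describeTableLocation(tableId)
                .map(path -> new Handle(schemaName, tableName, path, tableId));
    }
}
```

Resolving `tablePath` and `tableId` once here means later metadata and write calls can read them from the handle instead of calling the namespace again.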
Class diagram for updated Lance namespace integration
classDiagram
class LanceNamespaceHolder {
+String DEFAULT_SCHEMA
-BufferAllocator allocator
-LanceNamespace namespace
-boolean singleLevelNs
-Optional~List~String~~ parentPrefix
-Map~String,String~ namespaceStorageOptions
+LanceNamespaceHolder(LanceConfig config, Map~String,String~ namespaceProperties)
+void shutdown()
+BufferAllocator getAllocator()
+LanceNamespace getNamespace()
+boolean isSingleLevelNs()
+List~String~ prestoSchemaToLanceNamespace(String schema)
+List~String~ addParentPrefix(List~String~ namespaceId)
+List~String~ getTableId(String schemaName, String tableName)
+List~String~ listSchemaNames()
+boolean schemaExists(String schema)
+String getTablePath(String schemaName, String tableName)
+boolean tableExists(String schemaName, String tableName)
+Map~String,String~ getStorageOptionsForTable(List~String~ tableId)
+Schema describeTable(String tablePath)
+List~String~ listTables(String schemaName)
+String createTable(String schemaName, String tableName, Schema arrowSchema)
+void dropTable(List~String~ tableId)
+void commitAppend(String tablePath, List~FragmentMetadata~ fragments)
+List~Fragment~ getFragments(String tablePath)
}
class LanceConfig {
-String impl
-boolean singleLevelNs
-String parent
-int readBatchSize
-int maxRowsPerFile
-int maxRowsPerGroup
-int writeBatchSize
+String getImpl()
+LanceConfig setImpl(String impl)
+boolean isSingleLevelNs()
+LanceConfig setSingleLevelNs(boolean singleLevelNs)
+String getParent()
+LanceConfig setParent(String parent)
}
class LanceConnectorFactory {
-Set~String~ KNOWN_CONFIG_PROPERTIES
+String getName()
+Connector create(String catalogName, Map~String,String~ config, ConnectorContext context)
}
class LanceMetadata {
-LanceNamespaceHolder namespaceHolder
-JsonCodec commitTaskDataCodec
+boolean schemaExists(ConnectorSession session, String schemaName)
+List~String~ listSchemaNames(ConnectorSession session)
+ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
+ConnectorTableMetadata getTableMetadata(ConnectorSession session, ConnectorTableHandle table)
+List~SchemaTableName~ listTables(ConnectorSession session, Optional~String~ schemaName)
+Map~String,ColumnHandle~ getColumnHandles(ConnectorSession session, ConnectorTableHandle tableHandle)
+ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorTableHandle~ existingTableHandle)
+Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
+ConnectorInsertTableHandle beginInsert(ConnectorSession session, ConnectorTableHandle tableHandle)
+Optional~ConnectorOutputMetadata~ finishInsert(ConnectorSession session, ConnectorInsertTableHandle insertHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
+void dropTable(ConnectorSession session, ConnectorTableHandle tableHandle)
}
class LanceTableHandle {
-String schemaName
-String tableName
-String tablePath
-List~String~ tableId
+LanceTableHandle(String schemaName, String tableName, String tablePath, List~String~ tableId)
+String getSchemaName()
+String getTableName()
+String getTablePath()
+List~String~ getTableId()
+int hashCode()
+boolean equals(Object obj)
+String toString()
}
class LanceWritableTableHandle {
-String schemaName
-String tableName
-String tablePath
-List~String~ tableId
-String schemaJson
-List~LanceColumnHandle~ inputColumns
+LanceWritableTableHandle(String schemaName, String tableName, String tablePath, List~String~ tableId, String schemaJson, List~LanceColumnHandle~ inputColumns)
+String getSchemaName()
+String getTableName()
+String getTablePath()
+List~String~ getTableId()
+String getSchemaJson()
+List~LanceColumnHandle~ getInputColumns()
+int hashCode()
+boolean equals(Object obj)
+String toString()
}
class LanceNamespaceProperties {
<<annotation>>
}
class LanceNamespace {
<<external>>
+static LanceNamespace connect(String impl, Map~String,String~ properties, BufferAllocator allocator)
+ListNamespacesResponse listNamespaces(ListNamespacesRequest request)
+void namespaceExists(NamespaceExistsRequest request)
+ListTablesResponse listTables(ListTablesRequest request)
+DescribeTableResponse describeTable(DescribeTableRequest request)
+CreateEmptyTableResponse createEmptyTable(CreateEmptyTableRequest request)
+void dropTable(DropTableRequest request)
}
LanceNamespaceHolder --> LanceNamespace : uses
LanceNamespaceHolder --> LanceConfig : configures
LanceMetadata --> LanceNamespaceHolder : delegates
LanceConnectorFactory --> LanceConfig : bootstraps
LanceConnectorFactory --> LanceNamespaceProperties : injects
LanceTableHandle --> LanceNamespaceHolder : tableId built by
LanceWritableTableHandle --> LanceTableHandle : created from
LanceNamespaceProperties <|.. annotation : binding
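One way the identifier helpers on `LanceNamespaceHolder` (`prestoSchemaToLanceNamespace`, `addParentPrefix`, `getTableId`) might fit together is sketched below. The behaviour is assumed rather than taken from the PR: the sketch treats `lance.parent` as a `$`-delimited list of namespace levels and treats the Presto schema as purely virtual in single-level mode. The class name `NamespaceIds` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NamespaceIds
{
    private final boolean singleLevelNs;
    private final List<String> parentPrefix;

    NamespaceIds(boolean singleLevelNs, String parent)
    {
        this.singleLevelNs = singleLevelNs;
        // Assumption: the parent is written as "level1$level2" and split on "$".
        this.parentPrefix = (parent == null || parent.isEmpty())
                ? Collections.<String>emptyList()
                : Arrays.asList(parent.split("\\$"));
    }

    // Assumption: in single-level mode the Presto schema maps to the root namespace.
    List<String> prestoSchemaToLanceNamespace(String schema)
    {
        return singleLevelNs
                ? Collections.<String>emptyList()
                : Collections.singletonList(schema);
    }

    // Prepends the configured parent levels to a namespace id.
    List<String> addParentPrefix(List<String> namespaceId)
    {
        List<String> result = new ArrayList<>(parentPrefix);
        result.addAll(namespaceId);
        return result;
    }

    // Full table id: parent levels, then the schema's namespace, then the table name.
    List<String> getTableId(String schemaName, String tableName)
    {
        List<String> id = addParentPrefix(prestoSchemaToLanceNamespace(schemaName));
        id.add(tableName);
        return id;
    }
}
```

With a parent of `org$team`, schema `sales` and table `orders` would resolve to the id `[org, team, sales, orders]` under these assumptions.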
- Add tableId to equals/hashCode in LanceTableHandle and LanceWritableTableHandle (#1, prestodb#5)
- Add logging to exception handlers in getTableMetadata and getColumnHandles instead of silently swallowing (#2, #3)
- Cap Arrow allocator at 8 GB instead of unbounded (prestodb#4)
- Return ImmutableMap from getStorageOptionsForTable (prestodb#8)
- Document $ delimiter choice for parent prefix (prestodb#9)
- Make prestoSchemaToLanceNamespace and addParentPrefix package-private (prestodb#10)
- Validate lance.root is set when impl=dir (prestodb#11)
- Move lance dependency versions to root pom dependencyManagement (prestodb#12)
- Clean up namespace entry if Dataset.create fails (prestodb#7)
- Add comments to KNOWN_CONFIG_PROPERTIES explaining sync requirement (prestodb#6)
- Add TestLanceNamespaceHolder with tests for multi-level namespace and parent prefix behavior

Co-Authored-By: Claude Opus 4.6 <[email protected]>
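The "clean up namespace entry if Dataset.create fails" item follows a common register-then-roll-back pattern, which might look like the sketch below. The `Namespace` and `DatasetWriter` interfaces are hypothetical stand-ins for the namespace client and for `Dataset.create`; the actual signatures in the connector are not shown on this page.

```java
import java.util.List;

final class CreateTableWithCleanup
{
    // Hypothetical stand-in for the namespace operations used during table creation.
    interface Namespace
    {
        // Registers the table and returns its storage location.
        String createEmptyTable(List<String> tableId);

        void dropTable(List<String> tableId);
    }

    // Hypothetical stand-in for the step that writes the dataset at the given path.
    interface DatasetWriter
    {
        void create(String tablePath);
    }

    private CreateTableWithCleanup() {}

    static String createTable(Namespace namespace, DatasetWriter writer, List<String> tableId)
    {
        String tablePath = namespace.createEmptyTable(tableId);
        try {
            writer.create(tablePath);
            return tablePath;
        }
        catch (RuntimeException e) {
            // Roll back the namespace entry so a failed create leaves no dangling table.
            try {
                namespace.dropTable(tableId);
            }
            catch (RuntimeException cleanupFailure) {
                // Keep the original failure as the primary error.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }
}
```

Attaching a cleanup failure as a suppressed exception keeps the original cause visible to the caller.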
if (!namespaceHolder.tableExists(lanceTable.getTableName())) {
    return null;
try {
    Schema arrowSchema = namespaceHolder.describeTable(lanceTable.getTablePath());
Does this require the describeTable endpoint to return an Arrow schema?
From the Lance schema spec:
The schema describes the structure of a Lance table, including all fields, their data types, and metadata. Schemas use a logical type system where data types are represented as strings that map to Apache Arrow data types. Each field in the schema has a unique identifier (field ID) that enables robust schema evolution and version tracking.
So I assume it's in Arrow format.
Later, at line 114, we take this Arrow type and convert it to a Presto type.
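For readers unfamiliar with that conversion step, a string-level sketch is shown below. It is illustrative only: the connector converts Arrow types, whereas this example maps a handful of Lance logical type names to Presto type names, and both the class name `LogicalTypeMapping` and the chosen subset of mappings are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LogicalTypeMapping
{
    // Assumed subset of Lance logical type names and the Presto types they could map to.
    private static final Map<String, String> LANCE_TO_PRESTO = new HashMap<>();

    static {
        LANCE_TO_PRESTO.put("bool", "boolean");
        LANCE_TO_PRESTO.put("int8", "tinyint");
        LANCE_TO_PRESTO.put("int16", "smallint");
        LANCE_TO_PRESTO.put("int32", "integer");
        LANCE_TO_PRESTO.put("int64", "bigint");
        LANCE_TO_PRESTO.put("float", "real");
        LANCE_TO_PRESTO.put("double", "double");
        LANCE_TO_PRESTO.put("string", "varchar");
        LANCE_TO_PRESTO.put("binary", "varbinary");
    }

    private LogicalTypeMapping() {}

    // Returns the Presto type name, or empty for logical types this sketch does not cover.
    static Optional<String> toPrestoType(String lanceLogicalType)
    {
        return Optional.ofNullable(LANCE_TO_PRESTO.get(lanceLogicalType));
    }
}
```

Returning an empty `Optional` for unknown types leaves it to the caller to decide whether to skip the column or fail the query.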
Summary
WIP - Integrate the lance-namespace API into the Presto Lance connector.
- Replace hand-rolled filesystem-based namespace operations in LanceNamespaceHolder with the LanceNamespace API from lance-namespace-core
- Pluggable namespace implementations selected via the lance.impl config
- LanceTableHandle now carries tablePath and tableId resolved once via namespace API
- lance.root-url replaced by lance.root (passed through to namespace)
Test plan
🤖 Generated with Claude Code
Summary by Sourcery
Integrate the Presto Lance connector with the LanceNamespace API for pluggable table namespaces and path resolution, replacing direct filesystem access and propagating namespace-aware table identifiers throughout metadata and write paths.