
[WIP] feat(lance): integrate lance-namespace API #27481

Draft
jja725 wants to merge 2 commits into prestodb:master from jja725:worktree-lance-namespace

@jja725 jja725 commented Apr 1, 2026

Summary

WIP - Integrate the lance-namespace API into the Presto Lance connector.

  • Replace hand-rolled filesystem namespace ops in LanceNamespaceHolder with the LanceNamespace API from lance-namespace-core
  • Enable pluggable namespace implementations (dir, rest, glue, etc.) via lance.impl config
  • Align with the lance-trino connector's namespace design
  • LanceTableHandle now carries tablePath and tableId resolved once via namespace API
  • Config property lance.root-url replaced by lance.root (passed through to namespace)

Test plan

  • All 18 existing unit tests pass
  • Integration test with REST namespace
  • End-to-end test with LanceQueryRunner

🤖 Generated with Claude Code

Summary by Sourcery

Integrate the Presto Lance connector with the LanceNamespace API for pluggable table namespaces and path resolution, replacing direct filesystem access and propagating namespace-aware table identifiers throughout metadata and write paths.

New Features:

  • Support pluggable Lance namespace implementations configured via connector properties and backed by the LanceNamespace API.
  • Expose schema and table identification through namespace-derived table paths and IDs in connector handles to avoid repeated namespace lookups.

Enhancements:

  • Refactor LanceNamespaceHolder to delegate schema, table lifecycle, and fragment operations to LanceNamespace instead of manual directory management.
  • Extend connector configuration to support multi-level namespaces via a parent prefix and to pass through arbitrary lance.* properties to the namespace implementation.
  • Adjust metadata, split, page source/sink, and writable handle logic to use resolved table paths and IDs rather than computing filesystem paths on demand.

Build:

  • Add lance-namespace-core and lance-namespace-apache-client dependencies and update tests and query runner configuration to use the new lance.root property instead of lance.root-url.

Tests:

  • Update existing unit tests and query runner helpers to construct handles and namespace holders via the new namespace-based configuration and APIs.

…pport

Replace hand-rolled filesystem-based namespace operations in
LanceNamespaceHolder with the proper LanceNamespace API from
lance-namespace-core. This enables pluggable namespace implementations
(dir, rest, glue, etc.) and aligns with the lance-trino connector design.

Key changes:
- LanceNamespaceHolder now uses LanceNamespace.connect() for table
  discovery, creation, and deletion
- Table paths are resolved via namespace API instead of hardcoded
  filesystem conventions
- LanceTableHandle/LanceWritableTableHandle carry tablePath and tableId
  resolved once in getTableHandle()
- LanceConnectorFactory passes through all lance.* properties to the
  namespace implementation
- Config property lance.root-url replaced by lance.root (passed through
  to namespace, not a dedicated config field)
- Added lance.parent config for multi-level namespace support

Co-Authored-By: Claude Opus 4.6 <[email protected]>

sourcery-ai bot commented Apr 1, 2026

Reviewer's Guide

Integrates the generic lance-namespace API into the Presto Lance connector, replacing filesystem-specific logic with pluggable namespace implementations, threading tablePath/tableId through table and write handles, and updating configuration and tests to support flexible namespace backends.

Sequence diagram for table handle resolution via LanceNamespace

sequenceDiagram
    actor PrestoEngine
    participant LanceMetadata
    participant LanceNamespaceHolder
    participant LanceNamespace

    PrestoEngine->>LanceMetadata: getTableHandle(session, schemaTableName)
    LanceMetadata->>LanceMetadata: schemaExists(schemaName)
    LanceMetadata->>LanceNamespaceHolder: schemaExists(schemaName)
    LanceNamespaceHolder->>LanceNamespaceHolder: prestoSchemaToLanceNamespace(schemaName)
    LanceNamespaceHolder->>LanceNamespace: namespaceExists(NamespaceExistsRequest)
    LanceNamespace-->>LanceNamespaceHolder: namespaceExists response
    LanceNamespaceHolder-->>LanceMetadata: boolean

    alt schemaExists
        LanceMetadata->>LanceNamespaceHolder: getTablePath(schemaName, tableName)
        LanceNamespaceHolder->>LanceNamespaceHolder: getTableId(schemaName, tableName)
        LanceNamespaceHolder->>LanceNamespace: describeTable(DescribeTableRequest)
        LanceNamespace-->>LanceNamespaceHolder: DescribeTableResponse(location)
        LanceNamespaceHolder-->>LanceMetadata: tablePath
        LanceMetadata->>LanceNamespaceHolder: getTableId(schemaName, tableName)
        LanceNamespaceHolder-->>LanceMetadata: tableId
        LanceMetadata-->>PrestoEngine: LanceTableHandle(schemaName, tableName, tablePath, tableId)
    else schemaMissingOrTableMissing
        LanceMetadata-->>PrestoEngine: null
    end
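
The resolution flow in the diagram above can be sketched in plain Java roughly as follows. This is a minimal sketch and not the connector's code: `NamespaceLookup`, `TableHandle`, and `HandleResolver` are hypothetical stand-ins for LanceNamespaceHolder, LanceTableHandle, and LanceMetadata, and the sketch assumes a missing table surfaces as an empty `Optional`, which the diagram does not specify.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the namespace holder; only the calls shown in the diagram.
interface NamespaceLookup {
    boolean schemaExists(String schemaName);

    // Assumption: an absent table is reported as Optional.empty().
    Optional<String> getTablePath(String schemaName, String tableName);

    List<String> getTableId(String schemaName, String tableName);
}

// Hypothetical handle carrying the identifiers resolved once at lookup time.
final class TableHandle {
    final String schemaName;
    final String tableName;
    final String tablePath;
    final List<String> tableId;

    TableHandle(String schemaName, String tableName, String tablePath, List<String> tableId) {
        this.schemaName = schemaName;
        this.tableName = tableName;
        this.tablePath = tablePath;
        this.tableId = tableId;
    }
}

final class HandleResolver {
    private HandleResolver() {}

    // Returns null when the schema or the table is missing, mirroring the "else" branch.
    static TableHandle resolve(NamespaceLookup ns, String schemaName, String tableName) {
        if (!ns.schemaExists(schemaName)) {
            return null;
        }
        Optional<String> path = ns.getTablePath(schemaName, tableName);
        if (!path.isPresent()) {
            return null;
        }
        // Path and ID are resolved once here so later stages can reuse them from the handle.
        return new TableHandle(schemaName, tableName, path.get(), ns.getTableId(schemaName, tableName));
    }
}
```

Because the handle stores both values, downstream components in this sketch would read `tablePath` and `tableId` from the handle rather than calling the namespace again.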

Class diagram for updated Lance namespace integration

classDiagram
    class LanceNamespaceHolder {
        +String DEFAULT_SCHEMA
        -BufferAllocator allocator
        -LanceNamespace namespace
        -boolean singleLevelNs
        -Optional~List~ parentPrefix
        -Map~String,String~ namespaceStorageOptions
        +LanceNamespaceHolder(LanceConfig config, Map~String,String~ namespaceProperties)
        +void shutdown()
        +BufferAllocator getAllocator()
        +LanceNamespace getNamespace()
        +boolean isSingleLevelNs()
        +List~String~ prestoSchemaToLanceNamespace(String schema)
        +List~String~ addParentPrefix(List~String~ namespaceId)
        +List~String~ getTableId(String schemaName, String tableName)
        +List~String~ listSchemaNames()
        +boolean schemaExists(String schema)
        +String getTablePath(String schemaName, String tableName)
        +boolean tableExists(String schemaName, String tableName)
        +Map~String,String~ getStorageOptionsForTable(List~String~ tableId)
        +Schema describeTable(String tablePath)
        +List~String~ listTables(String schemaName)
        +String createTable(String schemaName, String tableName, Schema arrowSchema)
        +void dropTable(List~String~ tableId)
        +void commitAppend(String tablePath, List~FragmentMetadata~ fragments)
        +List~Fragment~ getFragments(String tablePath)
    }

    class LanceConfig {
        -String impl
        -boolean singleLevelNs
        -String parent
        -int readBatchSize
        -int maxRowsPerFile
        -int maxRowsPerGroup
        -int writeBatchSize
        +String getImpl()
        +LanceConfig setImpl(String impl)
        +boolean isSingleLevelNs()
        +LanceConfig setSingleLevelNs(boolean singleLevelNs)
        +String getParent()
        +LanceConfig setParent(String parent)
    }

    class LanceConnectorFactory {
        -Set~String~ KNOWN_CONFIG_PROPERTIES
        +String getName()
        +Connector create(String catalogName, Map~String,String~ config, ConnectorContext context)
    }

    class LanceMetadata {
        -LanceNamespaceHolder namespaceHolder
        -JsonCodec commitTaskDataCodec
        +boolean schemaExists(ConnectorSession session, String schemaName)
        +List~String~ listSchemaNames(ConnectorSession session)
        +ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
        +ConnectorTableMetadata getTableMetadata(ConnectorSession session, ConnectorTableHandle table)
        +List~SchemaTableName~ listTables(ConnectorSession session, Optional~String~ schemaName)
        +Map~String,ColumnHandle~ getColumnHandles(ConnectorSession session, ConnectorTableHandle tableHandle)
        +ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorTableHandle~ existingTableHandle)
        +Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
        +ConnectorInsertTableHandle beginInsert(ConnectorSession session, ConnectorTableHandle tableHandle)
        +Optional~ConnectorOutputMetadata~ finishInsert(ConnectorSession session, ConnectorInsertTableHandle insertHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
        +void dropTable(ConnectorSession session, ConnectorTableHandle tableHandle)
    }

    class LanceTableHandle {
        -String schemaName
        -String tableName
        -String tablePath
        -List~String~ tableId
        +LanceTableHandle(String schemaName, String tableName, String tablePath, List~String~ tableId)
        +String getSchemaName()
        +String getTableName()
        +String getTablePath()
        +List~String~ getTableId()
        +int hashCode()
        +boolean equals(Object obj)
        +String toString()
    }

    class LanceWritableTableHandle {
        -String schemaName
        -String tableName
        -String tablePath
        -List~String~ tableId
        -String schemaJson
        -List~LanceColumnHandle~ inputColumns
        +LanceWritableTableHandle(String schemaName, String tableName, String tablePath, List~String~ tableId, String schemaJson, List~LanceColumnHandle~ inputColumns)
        +String getSchemaName()
        +String getTableName()
        +String getTablePath()
        +List~String~ getTableId()
        +String getSchemaJson()
        +List~LanceColumnHandle~ getInputColumns()
        +int hashCode()
        +boolean equals(Object obj)
        +String toString()
    }

    class LanceNamespaceProperties {
        <<annotation>>
    }

    class LanceNamespace {
        <<external>>
        +static LanceNamespace connect(String impl, Map~String,String~ properties, BufferAllocator allocator)
        +ListNamespacesResponse listNamespaces(ListNamespacesRequest request)
        +void namespaceExists(NamespaceExistsRequest request)
        +ListTablesResponse listTables(ListTablesRequest request)
        +DescribeTableResponse describeTable(DescribeTableRequest request)
        +CreateEmptyTableResponse createEmptyTable(CreateEmptyTableRequest request)
        +void dropTable(DropTableRequest request)
    }

    LanceNamespaceHolder --> LanceNamespace : uses
    LanceNamespaceHolder --> LanceConfig : configures
    LanceMetadata --> LanceNamespaceHolder : delegates
    LanceConnectorFactory --> LanceConfig : bootstraps
    LanceConnectorFactory --> LanceNamespaceProperties : injects
    LanceTableHandle --> LanceNamespaceHolder : tableId built by
    LanceWritableTableHandle --> LanceTableHandle : created from
    LanceNamespaceProperties <|.. annotation : binding

File-Level Changes

Change Details Files
Replace filesystem-based LanceNamespaceHolder with LanceNamespace API and add namespace utilities.
  • Inject LanceNamespace using LanceNamespace.connect based on lance.impl and filtered lance.* catalog properties, including default options for dir implementation.
  • Add support for single-level and multi-level namespaces via prestoSchemaToLanceNamespace, getTableId, and optional parent prefix handling from LanceConfig.lance.parent.
  • Implement schema and table operations (listSchemaNames, schemaExists, getTablePath, tableExists, listTables, createTable, dropTable, commitAppend, getFragments) using lance-namespace model requests/responses instead of direct filesystem access.
  • Add getStorageOptionsForTable to resolve per-table storage options via describeTable with fallback to connector-level storage options.
  • Ensure proper shutdown of namespace (Closeable) and allocator, and update describeTable/commitAppend/getFragments signatures to work with resolved table paths.
presto-lance/src/main/java/com/facebook/presto/lance/LanceNamespaceHolder.java
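
The schema-to-namespace mapping described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in LanceNamespaceHolder: the class `NamespaceIds` and the method `parseParent` are hypothetical, and the sketch assumes that in single-level mode the Presto schema is a virtual name that is not added to the namespace ID. Only the method names `prestoSchemaToLanceNamespace` and `getTableId` and the `$`-delimited parent prefix come from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class NamespaceIds {
    private NamespaceIds() {}

    // Splits a "$"-delimited parent value such as "a$b" into ["a", "b"].
    static List<String> parseParent(String parent) {
        if (parent == null || parent.isEmpty()) {
            return Collections.emptyList();
        }
        // "$" is a regex metacharacter, so it is quoted before splitting.
        return Collections.unmodifiableList(Arrays.asList(parent.split(Pattern.quote("$"))));
    }

    // Assumption: single-level mode omits the schema from the namespace ID.
    static List<String> prestoSchemaToLanceNamespace(String schema, boolean singleLevelNs, List<String> parentPrefix) {
        List<String> id = new ArrayList<>(parentPrefix);
        if (!singleLevelNs) {
            id.add(schema);
        }
        return Collections.unmodifiableList(id);
    }

    // A table ID is the namespace ID followed by the table name.
    static List<String> getTableId(String schema, String table, boolean singleLevelNs, List<String> parentPrefix) {
        List<String> id = new ArrayList<>(prestoSchemaToLanceNamespace(schema, singleLevelNs, parentPrefix));
        id.add(table);
        return Collections.unmodifiableList(id);
    }
}
```

Under these assumptions, `NamespaceIds.getTableId("sales", "orders", false, NamespaceIds.parseParent("a$b"))` yields `[a, b, sales, orders]`.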
Propagate resolved tablePath and tableId through metadata, table handles, writable handles, and execution paths.
  • Extend LanceTableHandle to carry tablePath and tableId, update JSON serialization, equals/hashCode/toString, and adjust all call sites to construct it with namespace-resolved identifiers.
  • Extend LanceWritableTableHandle with tablePath and tableId, update JSON serialization, equals/hashCode/toString, and ensure beginCreateTable/beginInsert pass through these fields.
  • Update LanceMetadata to use namespaceHolder.getTablePath and getTableId when building table handles, and to use tablePath-based describeTable and commitAppend/dropTable calls.
  • Update LancePageSourceProvider, LancePageSinkProvider, and LanceSplitManager to use tablePath from handles instead of recomputing paths by table name.
  • Adjust tests to obtain handles via metadata or LanceNamespaceHolder, assert presence of tablePath/tableId, and update constructors and expectations accordingly.
presto-lance/src/main/java/com/facebook/presto/lance/LanceMetadata.java
presto-lance/src/main/java/com/facebook/presto/lance/LanceTableHandle.java
presto-lance/src/main/java/com/facebook/presto/lance/LanceWritableTableHandle.java
presto-lance/src/main/java/com/facebook/presto/lance/LancePageSinkProvider.java
presto-lance/src/main/java/com/facebook/presto/lance/LancePageSourceProvider.java
presto-lance/src/main/java/com/facebook/presto/lance/LanceSplitManager.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceMetadata.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceFragmentPageSource.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceWritableTableHandle.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceTableHandle.java
Revise LanceConfig and connector factory to separate connector config from namespace pass-through properties and support multi-level namespaces.
  • Remove rootUrl from LanceConfig and instead treat lance.* properties as generic namespace configuration, adding parent and updating singleLevelNs semantics/documentation.
  • Add lance.parent config to model higher-level namespace prefixes with $-delimited segments and adjust tests for new defaults and explicit mappings.
  • In LanceConnectorFactory, keep an immutable copy of catalog properties, filter only known lance.* properties into Bootstrap (LanceConfig), and bind the full properties map via LanceNamespaceProperties so namespace holder can see free-form options.
  • Introduce LanceNamespaceProperties binding annotation to inject the raw lance.* property map into LanceNamespaceHolder.
presto-lance/src/main/java/com/facebook/presto/lance/LanceConfig.java
presto-lance/src/main/java/com/facebook/presto/lance/LanceConnectorFactory.java
presto-lance/src/main/java/com/facebook/presto/lance/LanceNamespaceProperties.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceConfig.java
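
One way to picture the split between connector config and namespace pass-through properties is the sketch below. It is a hedged illustration rather than the logic in LanceConnectorFactory: the class `LancePropertySplit` and both method names are hypothetical, the known-property set is a two-entry subset chosen for the example, and the sketch assumes unknown `lance.*` keys are withheld from the config binder while every `lance.*` key is handed to the namespace.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class LancePropertySplit {
    private LancePropertySplit() {}

    // Illustrative subset only; the real KNOWN_CONFIG_PROPERTIES may contain more entries.
    static final Set<String> KNOWN_CONFIG_PROPERTIES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("lance.impl", "lance.parent")));

    // Properties for the typed config object: drops lance.* keys the config class does not declare.
    static Map<String, String> connectorConfig(Map<String, String> catalogProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : catalogProperties.entrySet()) {
            boolean isLance = entry.getKey().startsWith("lance.");
            if (!isLance || KNOWN_CONFIG_PROPERTIES.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return Collections.unmodifiableMap(result);
    }

    // Properties for the namespace implementation: every lance.* key, known or free-form.
    static Map<String, String> namespaceProperties(Map<String, String> catalogProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : catalogProperties.entrySet()) {
            if (entry.getKey().startsWith("lance.")) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return Collections.unmodifiableMap(result);
    }
}
```

With a catalog that sets `lance.impl`, `lance.parent`, and `lance.root`, this sketch keeps `lance.root` out of the typed config but includes it in the namespace map, which matches the description of `lance.root` as a pass-through property.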
Update tests and plugin/query runner wiring for new configuration surface and namespace usage.
  • Adjust unit tests to construct LanceNamespaceHolder with a namespace properties map (e.g., lance.root) instead of LanceConfig.rootUrl, and resolve tablePath/tableId via namespace APIs.
  • Update LanceQueryRunner and TestLancePlugin to use lance.root instead of lance.root-url when configuring the catalog.
  • Ensure metadata tests use metadata.getTableHandle to obtain fully-populated LanceTableHandle instances before calling metadata methods.
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceMetadata.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceFragmentPageSource.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceWritableTableHandle.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceConfig.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLanceTableHandle.java
presto-lance/src/test/java/com/facebook/presto/lance/LanceQueryRunner.java
presto-lance/src/test/java/com/facebook/presto/lance/TestLancePlugin.java
Add lance-namespace-core and apache-client dependencies needed for namespace integration.
  • Include org.lance:lance-namespace-core:0.6.1 and org.lance:lance-namespace-apache-client:0.6.1 as new dependencies in presto-lance module pom, removing prior exclusion of the apache client.
  • Ensure the connector has access to LanceNamespace and REST client implementations for pluggable namespaces.
presto-lance/pom.xml

- Add tableId to equals/hashCode in LanceTableHandle and
  LanceWritableTableHandle (#1, prestodb#5)
- Add logging to exception handlers in getTableMetadata and
  getColumnHandles instead of silently swallowing (#2, #3)
- Cap Arrow allocator at 8 GB instead of unbounded (prestodb#4)
- Return ImmutableMap from getStorageOptionsForTable (prestodb#8)
- Document $ delimiter choice for parent prefix (prestodb#9)
- Make prestoSchemaToLanceNamespace and addParentPrefix
  package-private (prestodb#10)
- Validate lance.root is set when impl=dir (prestodb#11)
- Move lance dependency versions to root pom
  dependencyManagement (prestodb#12)
- Clean up namespace entry if Dataset.create fails (prestodb#7)
- Add comments to KNOWN_CONFIG_PROPERTIES explaining sync
  requirement (prestodb#6)
- Add TestLanceNamespaceHolder with tests for multi-level
  namespace and parent prefix behavior

Co-Authored-By: Claude Opus 4.6 <[email protected]>
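
The first item in the commit message above, including tableId in equals/hashCode, can be sketched as follows. This is a simplified illustration, not the actual LanceTableHandle: the class name `SketchTableHandle` is hypothetical and JSON serialization annotations are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class SketchTableHandle {
    private final String schemaName;
    private final String tableName;
    private final String tablePath;
    private final List<String> tableId;

    SketchTableHandle(String schemaName, String tableName, String tablePath, List<String> tableId) {
        this.schemaName = Objects.requireNonNull(schemaName, "schemaName is null");
        this.tableName = Objects.requireNonNull(tableName, "tableName is null");
        this.tablePath = Objects.requireNonNull(tablePath, "tablePath is null");
        // Defensive copy so the handle cannot change after construction.
        this.tableId = Collections.unmodifiableList(new ArrayList<>(Objects.requireNonNull(tableId, "tableId is null")));
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        SketchTableHandle other = (SketchTableHandle) obj;
        // tableId takes part in equality alongside the other fields.
        return schemaName.equals(other.schemaName)
                && tableName.equals(other.tableName)
                && tablePath.equals(other.tablePath)
                && tableId.equals(other.tableId);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals so equal handles always hash alike.
        return Objects.hash(schemaName, tableName, tablePath, tableId);
    }
}
```

Without tableId in the comparison, two handles that share a schema name, table name, and path but come from different namespace IDs would compare equal.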
if (!namespaceHolder.tableExists(lanceTable.getTableName())) {
return null;
try {
Schema arrowSchema = namespaceHolder.describeTable(lanceTable.getTablePath());

Does this require describeTable endpoint to return arrowSchema?

Contributor Author


From the Lance schema spec:
The schema describes the structure of a Lance table, including all fields, their data types, and metadata. Schemas use a logical type system where data types are represented as strings that map to Apache Arrow data types. Each field in the schema has a unique identifier (field ID) that enables robust schema evolution and version tracking.

So I assume it's in Arrow format.

Contributor Author


Later, on line 114, we take this Arrow type and convert it to a Presto type.
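
The kind of conversion mentioned in this reply can be illustrated with a small lookup from logical type strings to Presto type names. This is a dependency-free sketch and not the connector's conversion, which presumably works on Arrow type objects: the class `LogicalTypeSketch` is hypothetical, and the handful of type strings covered here is an assumed, illustrative subset.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LogicalTypeSketch {
    private LogicalTypeSketch() {}

    // Assumed, illustrative subset of logical type strings and their Presto type names.
    private static final Map<String, String> PRESTO_TYPE_NAMES;

    static {
        Map<String, String> types = new HashMap<>();
        types.put("bool", "boolean");
        types.put("int32", "integer");
        types.put("int64", "bigint");
        types.put("float", "real");
        types.put("double", "double");
        types.put("string", "varchar");
        types.put("binary", "varbinary");
        PRESTO_TYPE_NAMES = Collections.unmodifiableMap(types);
    }

    // Fails loudly on anything outside the subset instead of guessing a type.
    static String toPrestoTypeName(String logicalType) {
        String prestoType = PRESTO_TYPE_NAMES.get(logicalType);
        if (prestoType == null) {
            throw new IllegalArgumentException("Unsupported logical type: " + logicalType);
        }
        return prestoType;
    }
}
```

Rejecting unknown types keeps a schema mismatch visible at metadata time rather than letting it surface later as a read error.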

2 participants