
Allow avro_schema_url property alongside partitioning #27490

Draft
denodo-research-labs wants to merge 1 commit into prestodb:master from denodo-research-labs:AvroPartirtions

Conversation

denodo-research-labs (Contributor) commented Apr 1, 2026

Description

This PR enables the creation of Hive tables in AVRO format using the avro_schema_url property in conjunction with partitioning.

Previously, providing an external Avro schema URL blocked the use of partitioning. This change updates the validation logic to allow partitioned_by columns to coexist with an external schema URL.

Example:

CREATE TABLE test_avro_partitioned (
  dummy_col VARCHAR,
  p_col VARCHAR
) WITH (
  format='AVRO',
  partitioned_by=ARRAY['p_col'],
  avro_schema_url='url'
)

Motivation and Context

The prepareTable method in HiveMetadata threw a PrestoException with the NOT_SUPPORTED error code whenever either bucketing or partitioning was specified together with an Avro schema URL.

Impact

Previously, if a user attempted to create a partitioned Hive table using an external Avro schema URL, the operation failed with a PrestoException. This PR fixes the validation logic in HiveMetadata to allow partitioning while still restricting bucketing.

Calling CREATE TABLE with both partitioned_by and avro_schema_url currently throws:

com.facebook.presto.spi.PrestoException: Bucketing/Partitioning columns not supported when Avro schema url is set

This change allows users to create partitioned Hive tables using an external Avro schema URL. The logic in HiveMetadata is updated to restrict only bucketing, enabling partitioning support.

Before:

if ((bucketProperty.isPresent() || !partitionedBy.isEmpty()) && getAvroSchemaUrl(tableMetadata.getProperties()) != null) {
    throw new PrestoException(NOT_SUPPORTED, "Bucketing/Partitioning columns not supported when Avro schema url is set");
}

After:

if (bucketProperty.isPresent() && getAvroSchemaUrl(tableMetadata.getProperties()) != null) {
    throw new PrestoException(NOT_SUPPORTED, "Bucketing columns not supported when Avro schema url is set");
}
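The effect of the relaxed check can be illustrated with a self-contained sketch. Note that BucketSpec, the validate signature, and the use of IllegalStateException are simplified stand-ins for illustration only, not the actual Presto classes (the real code uses HiveBucketProperty and PrestoException inside prepareTable):

```java
import java.util.List;
import java.util.Optional;

public class AvroValidationSketch {
    // Illustrative stand-in for Presto's HiveBucketProperty
    record BucketSpec(List<String> columns) {}

    // Simplified version of the relaxed check: bucketing combined with an
    // Avro schema URL is rejected; partitioning alone is now allowed.
    static void validate(Optional<BucketSpec> bucketProperty,
                         List<String> partitionedBy,
                         String avroSchemaUrl) {
        if (bucketProperty.isPresent() && avroSchemaUrl != null) {
            throw new IllegalStateException(
                    "Bucketing columns not supported when Avro schema url is set");
        }
        // partitionedBy no longer participates in the check
    }

    public static void main(String[] args) {
        // Partitioned table with an Avro schema URL: now accepted
        validate(Optional.empty(), List.of("p_col"), "url");
        System.out.println("partitioned + schema url: accepted");

        // Bucketed table with an Avro schema URL: still rejected
        try {
            validate(Optional.of(new BucketSpec(List.of("b_col"))), List.of(), "url");
        }
        catch (IllegalStateException e) {
            System.out.println("bucketed + schema url: " + e.getMessage());
        }
    }
}
```

Running the sketch shows the partitioned case passing through while the bucketed case still raises the error.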

Test Plan

Verified the fix by:

  • Creating a partitioned Avro table with avro_schema_url and confirming it no longer throws NOT_SUPPORTED.
  • Confirming that attempting to create a bucketed table with avro_schema_url still correctly throws the expected PrestoException.
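The bucketed case from the second bullet can be reproduced with a statement along these lines (table and column names and the bucket count are illustrative, not taken from the PR's tests):

```sql
-- Expected to still fail with:
-- "Bucketing columns not supported when Avro schema url is set"
CREATE TABLE test_avro_bucketed (
  dummy_col VARCHAR,
  b_col VARCHAR
) WITH (
  format='AVRO',
  bucketed_by=ARRAY['b_col'],
  bucket_count=2,
  avro_schema_url='url'
)
```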

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Hive Connector Changes
* Allow creating partitioned tables using AVRO format and avro_schema_url property.

sourcery-ai bot (Contributor) commented Apr 1, 2026

Reviewer's Guide

Adjusts Hive AVRO table validation so avro_schema_url only restricts bucketing, adds a new product test suite for partitioned Avro tables using external schemas, and updates existing tests and error messages accordingly.

Sequence diagram for CREATE TABLE AVRO with avro_schema_url and partitioning/bucketing

sequenceDiagram
    actor User
    participant PrestoCoordinator
    participant HiveMetadata
    participant HiveMetastore

    User->>PrestoCoordinator: CREATE TABLE ... format=AVRO, partitioned_by, avro_schema_url
    PrestoCoordinator->>HiveMetadata: prepareTable(session, tableMetadata)

    HiveMetadata->>HiveMetadata: getPartitionedBy(properties)
    HiveMetadata->>HiveMetadata: getBucketProperty(properties)
    HiveMetadata->>HiveMetadata: getAvroSchemaUrl(properties)

    alt bucketProperty present AND avro_schema_url not null
        HiveMetadata-->>PrestoCoordinator: throw PrestoException(NOT_SUPPORTED, bucketing not supported)
        PrestoCoordinator-->>User: error Bucketing columns not supported when Avro schema url is set
    else only partitioned_by present with avro_schema_url
        HiveMetadata->>HiveMetastore: createTable(table)
        HiveMetastore-->>PrestoCoordinator: success
        PrestoCoordinator-->>User: table created successfully
    end

Updated class diagram for HiveMetadata validation and Avro partitioned tests

classDiagram

class HiveMetadata {
    - Table prepareTable(ConnectorSession session, ConnectorTableMetadata tableMetadata)
    - List~String~ getPartitionedBy(Map~String,Object~ properties)
    - Optional~HiveBucketProperty~ getBucketProperty(Map~String,Object~ properties)
    - String getAvroSchemaUrl(Map~String,Object~ properties)
}

class HiveBucketProperty {
}

class TestAvroPartitioned {
    + void testCreatePartitionedAvroTableWithSchemaUrl()
    + void testBucketedAvroTableWithSchemaUrlFails()
}

TestAvroPartitioned ..> HiveMetadata : uses
HiveMetadata o--> HiveBucketProperty

File-Level Changes

1. Relax HiveMetadata validation so avro_schema_url blocks bucketing but allows partitioning, and update the associated error message.
  • Change prepareTable validation to check only for the presence of a bucket property when avro_schema_url is set, removing the partitionedBy check from the condition.
  • Update the thrown PrestoException message to say "Bucketing columns not supported when Avro schema url is set" to reflect the new behavior.
  File: presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java

2. Align existing Hive integration test expectations with the new, bucketing-only restriction.
  • Update the bucketed-table-with-avro_schema_url smoke test to expect the new error message text.
  • Remove the test that asserted partitioned tables fail when avro_schema_url is set, since partitioning is now supported.
  File: presto-hive/src/test/java/com/facebook/presto/hive/TestHiveIntegrationSmokeTest.java

3. Introduce product tests that exercise partitioned Avro tables created with avro_schema_url, including DML and metadata operations.
  • Add a new product test class that creates a partitioned Avro table using avro_schema_url and inserts initial data before tests run.
  • Add tests that select from the partitioned Avro table, inspect column metadata (including the partition key), and insert an additional partition to verify multi-partition behavior.
  • Ensure cleanup by dropping the test table after tests complete.
  File: presto-product-tests/src/main/java/com/facebook/presto/tests/hive/TestAvroPartitioned.java


steveburnett (Contributor) left a comment


LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!
