add support for platform_instance in DataProcessInstance #16907

@sgomezvillamor

Description

#12751 adds platform_instance support in many places, but DataProcessInstance still lacks it.

The DataProcessInstanceKey class only includes 3 fields for URN hash generation:

class DataProcessInstanceKey(DatahubKey):
    cluster: Optional[str] = None
    orchestrator: str
    id: str

Source: dataprocess_instance.py

The URN is generated in __post_init__:

self.urn = DataProcessInstanceUrn(
    id=DataProcessInstanceKey(
        cluster=self.cluster,
        orchestrator=self.orchestrator,
        id=self.id,
    ).guid()
)

Source: dataprocess_instance.py#L78-L84

Impact

platform_instance is NOT included in DataProcessInstanceKey, so PLATFORM_A and PLATFORM_B Airflow instances running the same DAG generate identical URNs.

Originally posted by @q30327 in #13358
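The collision described under Impact can be reproduced with a small self-contained sketch. It does not import DataHub: `KeySketch` is a hypothetical stand-in for `DataProcessInstanceKey`, the MD5-over-sorted-JSON hashing is an assumption about how `guid()` behaves, and the `platform_instance` field is an illustrative addition, not the project's actual fix.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class KeySketch:
    """Hypothetical stand-in for DataProcessInstanceKey; not the DataHub class."""

    orchestrator: str
    id: str
    cluster: Optional[str] = None
    # Hypothetical extra field; the key quoted in this issue does not have it.
    platform_instance: Optional[str] = None

    def guid(self) -> str:
        # Assumed scheme: MD5 over the sorted JSON of the non-None fields.
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
        return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Today: the same DAG run on two Airflow deployments. The platform instance
# is not part of the key, so both keys carry identical fields.
a = KeySketch(orchestrator="airflow", id="my_dag_run_1", cluster="prod")
b = KeySketch(orchestrator="airflow", id="my_dag_run_1", cluster="prod")
collides = a.guid() == b.guid()

# With the hypothetical field populated, the hashes diverge.
a2 = KeySketch("airflow", "my_dag_run_1", "prod", platform_instance="PLATFORM_A")
b2 = KeySketch("airflow", "my_dag_run_1", "prod", platform_instance="PLATFORM_B")
distinct = a2.guid() != b2.guid()

print(collides, distinct)
```

Because the sketch drops `None` fields before hashing, a key without `platform_instance` hashes the same as it did before the field existed. Whether the real change should preserve existing URNs this way is a design question for the fix, not something this issue confirms.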
