
MultipleAspectTransformer silently drops non-matching MCPWs in pipeline #16939

@loustler

Description

Describe the bug
MultipleAspectTransformer silently drops every MetadataChangeProposalWrapper (MCPW) record that does not match its target aspect as records pass through BaseTransformer.transform(). For example, a transformer that handles globalTags also swallows the domains, subTypes, dataPlatformInstance, and any other MCPWs whose aspect is not globalTags. As a result, source-level domain configuration has no effect whenever a MultipleAspectTransformer is present in the pipeline.

To Reproduce
Steps to reproduce the behavior:

from datahub.ingestion.transformer.base_transformer import BaseTransformer, MultipleAspectTransformer
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.api.common import EndOfStream, PipelineContext, RecordEnvelope
from datahub.metadata.schema_classes import DomainsClass, GlobalTagsClass

# Minimal MultipleAspectTransformer that targets globalTags and returns it unchanged.
class MyTransformer(BaseTransformer, MultipleAspectTransformer):
    def __init__(self):
        super().__init__()

    @classmethod
    def create(cls, config_dict, ctx):
        return cls()

    def entity_types(self):
        return ["dataset"]

    def aspect_name(self):
        return "globalTags"

    def transform_aspects(self, entity_urn, aspect_name, aspect):
        yield (aspect_name, aspect)

ctx = PipelineContext(run_id="test")
transformer = MyTransformer.create({}, ctx)
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"

# One MCPW for the targeted aspect (globalTags) and one for an unrelated aspect (domains).
tags_mcp = MetadataChangeProposalWrapper(entityUrn=urn, aspect=GlobalTagsClass(tags=[]))
domain_mcp = MetadataChangeProposalWrapper(entityUrn=urn, aspect=DomainsClass(domains=["urn:li:domain:domm"]))
inputs = [RecordEnvelope(r, metadata={}) for r in [tags_mcp, domain_mcp, EndOfStream()]]
outputs = list(transformer.transform(inputs))

# The domains MCPW should come out of the transformer untouched.
domain_outputs = [
    o.record
    for o in outputs
    if isinstance(o.record, MetadataChangeProposalWrapper) and isinstance(o.record.aspect, DomainsClass)
]
assert len(domain_outputs) == 1  # FAILS: domain_outputs is empty

Expected behavior
MCPWs whose aspect does not match the transformer's target aspect should pass through unchanged, as they do with a SingleAspectTransformer.
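To make the expected routing concrete, here is a minimal, self-contained sketch of the pass-through rule: MCPWs for the target aspect and entity type go through transform_aspects, and everything else (other aspects, end-of-stream markers) is forwarded as-is. It does not use DataHub: FakeMCPW, Envelope, entity_type_of and transform_with_passthrough are hypothetical names, and the sketch handles records one at a time, ignoring any buffering or end-of-stream logic the real BaseTransformer may perform. It illustrates the expected behavior only and is not the actual fix.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple


# Hypothetical stand-ins for DataHub's MetadataChangeProposalWrapper and
# RecordEnvelope; only the fields this sketch needs are modelled.
@dataclass
class FakeMCPW:
    entity_urn: str
    aspect_name: str
    aspect: Any


@dataclass
class Envelope:
    record: Any
    metadata: dict


# Callback shape modelled on transform_aspects(entity_urn, aspect_name, aspect).
AspectFn = Callable[[str, str, Any], Iterable[Tuple[str, Optional[Any]]]]


def entity_type_of(urn: str) -> str:
    # "urn:li:dataset:(...)" -> "dataset"
    return urn.split(":", 3)[2]


def transform_with_passthrough(
    envelopes: Iterable[Envelope],
    target_aspect: str,
    entity_types: List[str],
    transform_aspects: AspectFn,
) -> Iterator[Envelope]:
    """Send matching MCPWs through transform_aspects; forward everything else as-is."""
    for env in envelopes:
        rec = env.record
        matches = (
            isinstance(rec, FakeMCPW)
            and rec.aspect_name == target_aspect
            and entity_type_of(rec.entity_urn) in entity_types
        )
        if not matches:
            # Non-matching MCPWs (domains, subTypes, ...) and control records
            # such as an end-of-stream marker pass through unchanged.
            yield env
            continue
        for name, new_aspect in transform_aspects(rec.entity_urn, rec.aspect_name, rec.aspect):
            # Assumption of this sketch: a None aspect means "drop this aspect".
            if new_aspect is not None:
                yield Envelope(FakeMCPW(rec.entity_urn, name, new_aspect), env.metadata)


def identity(entity_urn: str, aspect_name: str, aspect: Any):
    # Same behavior as MyTransformer.transform_aspects in the reproduction.
    return [(aspect_name, aspect)]


# Usage mirroring the reproduction: one globalTags MCPW, one domains MCPW, end marker.
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"
inputs = [
    Envelope(FakeMCPW(urn, "globalTags", {"tags": []}), {}),
    Envelope(FakeMCPW(urn, "domains", {"domains": ["urn:li:domain:domm"]}), {}),
    Envelope("END_OF_STREAM", {}),
]
outputs = list(transform_with_passthrough(inputs, "globalTags", ["dataset"], identity))
domain_outputs = [
    e.record
    for e in outputs
    if isinstance(e.record, FakeMCPW) and e.record.aspect_name == "domains"
]
assert len(domain_outputs) == 1  # the domains MCPW survives
```

Here the domains MCPW is yielded on the "not matches" branch, which is exactly the record the reproduction above shows being lost.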

Environment

  • DataHub Version: acryl-datahub==1.5.0.1

Metadata

Assignees

No one assigned

    Labels

    bug (Bug report)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests