Bug: Data Loss During Network Package Deduplication #40

@Jaydeep869

Description

Network package deduplication in pkg/resolver/network/*.go is keyed by PURL and keeps the first-seen record. If a lightweight metadata trace is processed before a richer archive trace for the same PURL, fields that only the later trace carries, such as the checksum/hash, can be lost.

Steps to Reproduce

  1. Process attestation traces in which the package metadata API call appears before the archive download.
  2. Ensure the first event has no BodyHash, while the later download event has a checksum/hash.
  3. Generate the SBOM and inspect the final package entry.

Expected Behavior

When duplicate PURLs are found, package entries are merged so richer later metadata (hashes, download URL) is preserved.

Actual Behavior

The first-seen record wins and later, richer data is discarded, so the final SBOM may be missing checksums.
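
For illustration, the first-wins behavior can be sketched as below. This is not the actual sbomit code: the `Package` struct, the `dedupeFirstWins` function, and the sample PURL and URL are hypothetical stand-ins; only the field names BodyHash and DownloadURL are taken from this issue.

```go
package main

import "fmt"

// Package is a hypothetical, simplified stand-in for a resolved network
// package; the real type in pkg/resolver/network may differ.
type Package struct {
	PURL        string
	BodyHash    string
	DownloadURL string
}

// dedupeFirstWins keeps only the first record seen for each PURL.
// Any later record with the same PURL is dropped, even if it is richer.
func dedupeFirstWins(events []Package) []Package {
	seen := make(map[string]bool)
	var out []Package
	for _, p := range events {
		if seen[p.PURL] {
			continue // later record discarded, including its hash
		}
		seen[p.PURL] = true
		out = append(out, p)
	}
	return out
}

func main() {
	events := []Package{
		// Metadata API call: no hash, no download URL.
		{PURL: "pkg:npm/left-pad@1.3.0"},
		// Archive download: carries the checksum and URL (made-up values).
		{
			PURL:        "pkg:npm/left-pad@1.3.0",
			BodyHash:    "sha256:abc123",
			DownloadURL: "https://registry.example/left-pad-1.3.0.tgz",
		},
	}
	for _, p := range dedupeFirstWins(events) {
		fmt.Printf("%s hash=%q url=%q\n", p.PURL, p.BodyHash, p.DownloadURL)
	}
}
```

With this event order, the single surviving entry has an empty BodyHash and DownloadURL, which matches the missing checksum described above.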

Environment

  • sbomit version: current main branch
  • Go version: any supported version
  • OS: Linux/macOS/Windows

Additional Context

  • Area: pkg/resolver/network/*.go
  • Suggested fix:
    • Replace first-wins dedupe with metadata merge logic.
    • Prefer non-empty fields from later traces (BodyHash, DownloadURL, archive-specific metadata).
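
A minimal sketch of the suggested merge, under assumptions: the `Package` struct, `mergeInto`, and `dedupeMerge` are hypothetical names rather than existing sbomit identifiers, and only BodyHash and DownloadURL are modeled. A non-empty field from a later trace overwrites the stored value, and an empty one never erases data already collected.

```go
package main

import "fmt"

// Package is a hypothetical, simplified stand-in for a resolved network
// package; the real type in pkg/resolver/network may differ.
type Package struct {
	PURL        string
	BodyHash    string
	DownloadURL string
}

// mergeInto copies non-empty fields from a later record into the stored one.
// Empty fields in src are ignored, so existing data is never cleared.
func mergeInto(dst *Package, src Package) {
	if src.BodyHash != "" {
		dst.BodyHash = src.BodyHash
	}
	if src.DownloadURL != "" {
		dst.DownloadURL = src.DownloadURL
	}
}

// dedupeMerge returns one entry per PURL, in first-seen order, with
// duplicates merged instead of discarded.
func dedupeMerge(events []Package) []Package {
	index := make(map[string]int) // PURL -> position in out
	var out []Package
	for _, p := range events {
		if i, ok := index[p.PURL]; ok {
			mergeInto(&out[i], p)
			continue
		}
		index[p.PURL] = len(out)
		out = append(out, p)
	}
	return out
}

func main() {
	events := []Package{
		{PURL: "pkg:npm/left-pad@1.3.0"}, // metadata call, no hash
		{
			PURL:        "pkg:npm/left-pad@1.3.0",
			BodyHash:    "sha256:abc123", // made-up value
			DownloadURL: "https://registry.example/left-pad-1.3.0.tgz",
		},
	}
	for _, p := range dedupeMerge(events) {
		fmt.Printf("%s hash=%q url=%q\n", p.PURL, p.BodyHash, p.DownloadURL)
	}
}
```

Keeping a PURL-to-index map preserves the output order of the current first-seen approach, so only field contents change. Whether a later non-empty hash should override an earlier, different hash, or be reported as a conflict, is a policy decision this sketch does not settle.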
