Backport 2287 to release/0.4.z#2294
Conversation
Reviewer's GuideBackports the normalized, pre-expanded license infrastructure and query refactor to release/0.4.z, replacing runtime expansion/CTE-based logic with dictionary/junction tables, COALESCE-based querying, and new migrations plus tests, while fixing backport-specific type and performance issues. Sequence diagram for SBOM ingestion and expanded license usagesequenceDiagram
actor User
participant API as SbomIngestionAPI
participant Ingestor as SbomIngestor
participant DB as Database
participant LicenseSvc as LicenseService
User->>API: POST /api/v1/sboms (SPDX or CycloneDX)
API->>Ingestor: parse_and_store_sbom(document)
Ingestor->>DB: insert sbom, packages, licenses, sbom_package_license
Ingestor->>DB: populate_expanded_license(sbom_id)
activate DB
DB-->>DB: INSERT expanded texts into expanded_license
DB-->>DB: INSERT mappings into sbom_license_expanded
deactivate DB
Ingestor-->>API: ingestion_result(sbom_id)
API-->>User: 201 Created (sbom_id)
User->>API: GET /api/v1/licenses?q=license:MIT
API->>LicenseSvc: licenses(search, paginated, connection)
LicenseSvc->>DB: SELECT via UNION
DB-->>LicenseSvc: rows from expanded_license and license
LicenseSvc-->>API: PaginatedResults<LicenseText>
API-->>User: normalized license list
ER diagram for expanded_license and sbom_license_expanded normalizationerDiagram
LICENSE {
uuid id
text text
}
SBOM {
uuid sbom_id
}
SBOM_PACKAGE_LICENSE {
uuid sbom_id
uuid license_id
uuid node_id
text license_type
}
EXPANDED_LICENSE {
uuid id
text expanded_text
text text_hash
}
SBOM_LICENSE_EXPANDED {
uuid sbom_id
uuid license_id
uuid expanded_license_id
}
SBOM ||--o{ SBOM_PACKAGE_LICENSE : has_package_license
LICENSE ||--o{ SBOM_PACKAGE_LICENSE : used_by_package
SBOM ||--o{ SBOM_LICENSE_EXPANDED : has_expanded_license
LICENSE ||--o{ SBOM_LICENSE_EXPANDED : normalized_for
EXPANDED_LICENSE ||--o{ SBOM_LICENSE_EXPANDED : referenced_by
SBOM_PACKAGE_LICENSE ||--o{ SBOM_LICENSE_EXPANDED : maps_to_expanded_via_keys
Class diagram for license querying and expanded license ingestionclassDiagram
class LicenseService {
+get_license_ref_mapping(connection) Result~Option~Vec~LicenseRefMapping~~
+licenses(search, paginated, connection) Result~PaginatedResults~LicenseText~~
}
class SbomService {
+fetch_sboms(search, paginated, connection) Result~PaginatedResults~Sbom~~
+describes_packages(sbom_id, options, db) Result~R::Output~SbomPackageRelation~~
}
class PurlService {
+search_purls(search, paginated, connection) Result~PaginatedResults~Purl~~
}
class PurlDetails {
+from_model(qualified_package, connection) Result~PurlDetails~~
}
class LicenseFilterCommon {
<<module>>
+LICENSE : &str
+license_text_coalesce() SimpleExpr
}
class ExpandedLicenseIngestor {
<<module>>
+populate_expanded_license(sbom_id, db) Result~(), DbErr~
}
class ExpandedLicenseEntity {
<<entity>>
+id : Uuid
+expanded_text : String
+text_hash : String
}
class SbomLicenseExpandedEntity {
<<entity>>
+sbom_id : Uuid
+license_id : Uuid
+expanded_license_id : Uuid
}
class SbomPackageLicenseEntity {
<<entity>>
+sbom_id : Uuid
+license_id : Uuid
+node_id : Uuid
+license_type : String
+Relation::License
+Relation::SbomLicenseExpanded
}
class LicenseEntity {
<<entity>>
+id : Uuid
+text : String
}
class SbomEntity {
<<entity>>
+sbom_id : Uuid
}
LicenseService --> LicenseFilterCommon : uses_license_text_coalesce
LicenseService --> SbomLicenseExpandedEntity : joins
LicenseService --> ExpandedLicenseEntity : joins
LicenseService --> LicenseEntity : joins
LicenseService --> SbomPackageLicenseEntity : joins
SbomService --> LicenseFilterCommon : uses_LICENSE_constant
SbomService --> SbomLicenseExpandedEntity : filters_via_subquery
SbomService --> ExpandedLicenseEntity : filters_via_subquery
SbomService --> SbomPackageLicenseEntity : filters_via_subquery
SbomService --> LicenseEntity : filters_cyclonedx
PurlService --> LicenseFilterCommon : uses_LICENSE_constant
PurlService --> SbomLicenseExpandedEntity : filters_via_subquery
PurlService --> ExpandedLicenseEntity : filters_via_subquery
PurlService --> SbomPackageLicenseEntity : filters_via_subquery
PurlService --> LicenseEntity : filters_cyclonedx
PurlDetails --> LicenseFilterCommon : uses_license_text_coalesce
PurlDetails --> SbomLicenseExpandedEntity : joins
PurlDetails --> ExpandedLicenseEntity : joins
PurlDetails --> SbomPackageLicenseEntity : joins
PurlDetails --> LicenseEntity : joins
ExpandedLicenseIngestor --> ExpandedLicenseEntity : inserts
ExpandedLicenseIngestor --> SbomLicenseExpandedEntity : inserts
ExpandedLicenseIngestor --> LicenseEntity : reads
ExpandedLicenseIngestor --> SbomPackageLicenseEntity : reads
ExpandedLicenseIngestor --> SbomEntity : uses_sbom_id_keys
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- There are unresolved merge conflict markers in
modules/ingestor/src/graph/sbom/cyclonedx.rs(e.g.<<<<<<< HEAD/>>>>>>> 49185eb8aroundComponentCreatorandComponentType), which must be resolved before this can be merged. - The normalization/backfill logic for expanded licenses is currently duplicated between the migration SQL (
m0002120_normalize_expanded_license/up.sql) and the ingestion helper (populate_expanded_license); consider centralizing this into a single SQL statement or helper to avoid future drift between the two code paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are unresolved merge conflict markers in `modules/ingestor/src/graph/sbom/cyclonedx.rs` (e.g. `<<<<<<< HEAD` / `>>>>>>> 49185eb8` around `ComponentCreator` and `ComponentType`), which must be resolved before this can be merged.
- The normalization/backfill logic for expanded licenses is currently duplicated between the migration SQL (`m0002120_normalize_expanded_license/up.sql`) and the ingestion helper (`populate_expanded_license`); consider centralizing this into a single SQL statement or helper to avoid future drift between the two code paths.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
ctron
left a comment
There was a problem hiding this comment.
That seems to pull in all kinds of stuff from main which should not be in the PR.
1af2d7b to
a4dee30
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The raw SQL used to populate
expanded_license/sbom_license_expandedappears in both the migration backfill andpopulate_expanded_license; consider consolidating this logic or at least clearly documenting the intended differences to avoid subtle drift between ingestion-time and migration-time behavior. - In
LicenseService::licenses, the comment describingnon_sbom_querystill mentions CycloneDX licenses "that were never expanded", but CycloneDX ingestion now callspopulate_expanded_license; updating this comment to match the actual behavior will make the intent clearer for future maintainers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The raw SQL used to populate `expanded_license`/`sbom_license_expanded` appears in both the migration backfill and `populate_expanded_license`; consider consolidating this logic or at least clearly documenting the intended differences to avoid subtle drift between ingestion-time and migration-time behavior.
- In `LicenseService::licenses`, the comment describing `non_sbom_query` still mentions CycloneDX licenses "that were never expanded", but CycloneDX ingestion now calls `populate_expanded_license`; updating this comment to match the actual behavior will make the intent clearer for future maintainers.
## Individual Comments
### Comment 1
<location path="modules/ingestor/src/graph/sbom/common/expanded_license.rs" line_range="19-28" />
<code_context>
+pub async fn populate_expanded_license(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** populate_expanded_license should run in a transaction to avoid partial state between dictionary and junction tables
Because this runs two separate SQL statements (one for `expanded_license`, one for `sbom_license_expanded`), a failure in the second after the first succeeds can leave inconsistent data that ingestion won’t fix. Please wrap both in a single transaction (or take a `&DatabaseTransaction` instead of a generic `ConnectionTrait`) so they either both commit or both roll back.
Suggested implementation:
```rust
pub async fn populate_expanded_license(
sbom_id: Uuid,
db: &impl ConnectionTrait,
) -> Result<(), DbErr> {
// Run all dictionary and junction-table operations in a single transaction
let txn = db.begin().await?;
// Step 1: Insert into expanded_license dictionary
txn.execute(Statement::from_sql_and_values(
txn.get_database_backend(),
r#"
INSERT INTO expanded_license (expanded_text)
SELECT DISTINCT expand_license_expression_with_mappings(
l.text,
```
1. Replace all other uses of `db.execute(...)` within `populate_expanded_license` with `txn.execute(...)`, and use `txn.get_database_backend()` instead of `db.get_database_backend()` where needed.
2. Before the final `Ok(())` in `populate_expanded_license`, add `txn.commit().await?;` so the transaction is committed only after both the `expanded_license` and `sbom_license_expanded` insert statements succeed.
3. Ensure there are no early `return` statements in the middle of the function that would bypass the `txn.commit().await?;` (errors will naturally roll back the transaction).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@ctron I have aligned the changes and it requires backport from PR #2242 as well. Along with the couple of changes included for the resolve conflicts as well. PR is ready for review - Please let me know your thoughts.
|
a4dee30 to
272e73d
Compare
ctron
left a comment
There was a problem hiding this comment.
Minor nit with the ignores.
However, I'm just saying that that PR has quite an impact in how we handle query as well as ingestion. And we don't have any performance measurements, as far as I know.
And the goal here is to merge this into a patch release.
I think it would make sense to, at least manually, give it a try and see how it impacts performance with a larger set of data.
.gitignore
Outdated
| *-secret.* | ||
| *.tar | ||
| *.sql | ||
| #*.sql |
There was a problem hiding this comment.
Couldn't we just remove this line and the next one?
8df8be5 to
298a629
Compare
ctron
left a comment
There was a problem hiding this comment.
Thanks ❤️ Technically that PR is fine. I think someone needs to make a decision if we want to this merged without performance checks. Unless we already have some somewhere.
2bcd50c to
48cb905
Compare
Signed-off-by: mrizzi <mrizzi@redhat.com> Assisted-by: Claude Code
…h_mappings (TC-3591) Signed-off-by: mrizzi <mrizzi@redhat.com> Assisted-by: Claude Code
…on expansion (TC-3591) Signed-off-by: mrizzi <mrizzi@redhat.com> Assisted-by: Claude Code
…h test (TC-3591) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I think we'd like to keep the Query processing logic consistent, i.e. DRY. So we can take a mutable reference to the Select struct and create the union before applying the sort filtering. And we use an expression for the license alias that effectively does the field name translation for us. I removed the test for an invalid sort direction as that behavior is inconsistent with every other TPA endpoint accepting a 'q' parameter. If we think we should accept invalid sort directions, we should make that change in the query module, but since the API is largely used internally, I think it's fine to expect a valid direction from callers.
…ced ExpandedLicenseCreator with a function Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This ensures both expanded_license dictionary and sbom_license_expanded junction table inserts are atomic - if either fails, both roll back. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.>
48cb905 to
6dd8564
Compare
Manual backport of PR #2287 to the release/0.4.z branch.
The automated backport failed due to merge conflicts. This PR contains the manually cherry-picked and conflict-resolved commits.
Changes
This backport includes 14 commits from PR #2287 and PR #2242
Resolved Conflicts
Merge conflicts were resolved in:
Custom Fix:
Summary
Fixes compatibility issues and restores performance optimizations lost during the backport of PR #2287 to release/0.4.z.
Summary by Sourcery
Normalize and pre-expand license expressions into dedicated tables, refactor license querying and filtering across services to use the new schema and COALESCE-based expressions, and add supporting migrations, entities, and tests while increasing recursion limits for large builds.
New Features:
Bug Fixes:
Enhancements:
Tests:
Summary by Sourcery
Backport license normalization and querying improvements to the 0.4.z release branch, including new expanded license storage, ingestion-time expansion, and performance-optimized license filtering across SBOM and PURL services.
New Features:
Enhancements:
Tests: