Implement reuse scanner plugin #11299

Draft
maennchen wants to merge 5 commits into oss-review-toolkit:main from maennchen:jm/reuse-analyzer

Conversation

@maennchen
Contributor

Adds a scanner plugin for reuse-tool.

See individual commits.

SPDX documents without packages are valid per the specification.
This is needed to support REUSE tool output which only contains files.

Signed-off-by: Jonatan Männchen <[email protected]>
@codecov

codecov bot commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.44%. Comparing base (c59b261) to head (aa6f6aa).

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11299      +/-   ##
============================================
+ Coverage     57.41%   57.44%   +0.03%     
- Complexity     1705     1708       +3     
============================================
  Files           346      346              
  Lines         12875    12874       -1     
  Branches       1228     1227       -1     
============================================
+ Hits           7392     7396       +4     
+ Misses         5005     5002       -3     
+ Partials        478      476       -2     
Flag Coverage Δ
funTest-external-tools 13.67% <ø> (ø)
funTest-no-external-tools 30.96% <ø> (+0.02%) ⬆️
test-ubuntu-24.04 42.43% <ø> (+0.02%) ⬆️
test-windows-2025 42.41% <ø> (+0.02%) ⬆️

Add SpdxTagValueParser to parse SPDX Tag:Value format and return an
SpdxDocument.

Signed-off-by: Jonatan Männchen <[email protected]>
Add REUSE tool (version 6.2.0) to the Docker image via pip.

Document the REUSE tool license (GPL-3.0-or-later) in /opt/licenses
to comply with GPL requirements. The tool is installed unmodified
from pip.

Signed-off-by: Jonatan Männchen <[email protected]>
Add a scanner plugin that uses the REUSE tool to detect license and
copyright information in source code projects following the REUSE
specification.

The plugin runs `reuse spdx` to generate an SPDX bill of materials and
uses SpdxTagValueParser to extract license and copyright findings.

Signed-off-by: Jonatan Männchen <[email protected]>
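
For illustration, the extraction step described in this commit could be sketched roughly as follows. This is not the PR's `SpdxTagValueParser`: `FileFindings` and `parseFileFindings` are hypothetical names, and the sketch assumes that each file section starts with a `FileName` tag and that only `LicenseInfoInFile` and `FileCopyrightText` are of interest.

```kotlin
// Hypothetical result type for this sketch; not one of ORT's actual finding classes.
data class FileFindings(
    val path: String,
    val licenses: MutableList<String> = mutableListOf(),
    val copyrights: MutableList<String> = mutableListOf()
)

// Collect per-file license and copyright tags from SPDX Tag:Value text.
fun parseFileFindings(tagValue: String): List<FileFindings> {
    val result = mutableListOf<FileFindings>()
    val lines = tagValue.lines().iterator()

    while (lines.hasNext()) {
        val line = lines.next()

        // Skip comment lines and lines without a "Tag: value" shape.
        val tag = line.substringBefore(':', "").trim()
        if (tag.isEmpty() || line.startsWith("#")) continue

        var value = line.substringAfter(':').trim()

        // A <text>...</text> value may span several lines; join them until the closing marker.
        if (value.startsWith("<text>")) {
            while (!value.trimEnd().endsWith("</text>") && lines.hasNext()) {
                value += "\n" + lines.next()
            }
            value = value.trim().removePrefix("<text>").removeSuffix("</text>").trim()
        }

        when (tag) {
            // A new FileName tag starts a new file section.
            "FileName" -> result += FileFindings(value)
            "LicenseInfoInFile" -> result.lastOrNull()?.licenses?.add(value)
            "FileCopyrightText" -> result.lastOrNull()?.copyrights?.add(value)
        }
    }

    return result
}

fun main() {
    val sample = """
        SPDXVersion: SPDX-2.1
        DataLicense: CC0-1.0
        FileName: ./src/main.c
        SPDXID: SPDXRef-1
        LicenseInfoInFile: MIT
        FileCopyrightText: <text>SPDX-FileCopyrightText: 2024 Jane Doe</text>
    """.trimIndent()

    parseFileFindings(sample).forEach(::println)
}
```

The sketch keeps placeholder values such as `NONE` or `NOASSERTION` as they are. A real scanner would probably need to filter those before creating findings.
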
The REUSE scanner test fixtures contain files with custom license
identifiers (e.g., LicenseRef-BS) that are intentionally used to test
the scanner's behavior. Using `precedence = "override"` prevents REUSE
from requiring corresponding license files in the root LICENSES
directory for these test-only identifiers.

Signed-off-by: Jonatan Männchen <[email protected]>
@maennchen maennchen marked this pull request as ready for review January 13, 2026 11:35
@maennchen maennchen requested a review from a team as a code owner January 13, 2026 11:35

require(dataLicense.isNotBlank()) { "The data license must not be blank." }

require(packages.isNotEmpty()) { "At least one package must be listed in packages" }
Member

SPDX documents without packages are valid per the specification.

For reference, please add a link to the spec to support that "claim". I'd probably use https://spdx.github.io/spdx-spec/v2.3/composition-of-an-SPDX-document/#522-package-information-section, with an emphasis on "If" in "If SPDX information is being used to describe packages, then one instance of the package information per package being described shall exist".
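
A minimal sketch of what the relaxed validation amounts to, assuming simplified stand-in classes (`SketchSpdxDocument`, `SketchSpdxPackage`, and `SketchSpdxFile` are hypothetical and much smaller than ORT's actual SPDX model): the data license check stays, while the package check is dropped so that a files-only document can be constructed.

```kotlin
// Hypothetical, simplified stand-ins for illustration only; the real model has many more fields.
data class SketchSpdxPackage(val spdxId: String, val name: String)

data class SketchSpdxFile(val spdxId: String, val filename: String)

data class SketchSpdxDocument(
    val dataLicense: String,
    val packages: List<SketchSpdxPackage> = emptyList(),
    val files: List<SketchSpdxFile> = emptyList()
) {
    init {
        require(dataLicense.isNotBlank()) { "The data license must not be blank." }
        // Deliberately no require(packages.isNotEmpty()): a document that only lists files is accepted.
    }
}

fun main() {
    // A files-only document, similar in shape to what a REUSE-generated SPDX document contains.
    val filesOnly = SketchSpdxDocument(
        dataLicense = "CC0-1.0",
        files = listOf(SketchSpdxFile("SPDXRef-File-1", "./src/main.c"))
    )

    println("packages=${filesOnly.packages.size}, files=${filesOnly.files.size}")
}
```
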

@@ -0,0 +1,226 @@
/*
* Copyright (C) 2017 The ORT Project Copyright Holders <https://github.com/oss-review-toolkit/ort/blob/main/NOTICE>
Member

Wrong Copyright year, I believe.

@@ -0,0 +1,226 @@
/*
Member

Add SpdxTagValueParser to parse SPDX Tag:Value format and return an SpdxDocument.

Please don't just repeat in the commit message what's more or less obvious from the diff. Rather explain why this is being added, what's the planned use-case?

Member

In general, I'm a bit reluctant to add more custom code for an SPDX parser, as we decided to give the new upstream SPDX Java library a try. So could that be used instead, with https://github.com/spdx/spdx-java-tagvalue-store?

Contributor Author

Sure, I'll give this a try. I'll put the PR in draft and pick it up at a later point when I have time for it.

@@ -0,0 +1,4 @@
BS License
Member

Before I review this commit in more detail, please document the motivation to add REUSE as a scanner in the commit message. Questions I'd like to see answered include:

  • Why do none of the other scanner implementations suffice? It seems that REUSE is a very simple "scanner", and other scanners, like Askalono or Licensee, probably cover the same feature set and are fast as well. Or do you, for comparison or some other reason, rely on the specifics of how REUSE reports things?
  • Letting REUSE produce SPDX in order to parse it and then create a scan result is a bit like taking the long way. But it indeed seems that SPDX, and in tag-value format (!), is the only way to get license and copyright findings out of the REUSE tool. So in a way, I wonder whether https://codeberg.org/fsfe/reuse-tool/issues/394 should be implemented first? But then again, that REUSE issue does not seem to have moved forward in 4 years.

Contributor Author

Why do none of the other scanner implementations suffice?

The main reason is that this is less about generic license detection and more about semantic compatibility and trust.

Many projects already rely on `reuse lint` in their CI, and a lot of developers run it locally because it’s lightweight and easy to use. For those projects, REUSE is effectively the source of truth for licensing and copyright compliance.

REUSE is also intentionally simple. It checks compliance against a very explicit specification (SPDX headers, .license files, dep5, etc.) and reports deterministically whether a project follows those rules. It does not try to infer licenses from file contents. Tools like Askalono or Licensee solve a different problem by doing best-effort license detection via matching or classification.

Using the same tool inside ORT matters because it guarantees identical results. Even small semantic differences in how headers or edge cases are handled would lead to confusing discrepancies between `reuse lint` and ORT.
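
To illustrate the difference between reading declared information and inferring it, here is a rough sketch that only picks up explicit SPDX tags from a file's content. It is not how the REUSE tool is implemented (which also handles `.license` files, dep5, and more); `DeclaredInfo` and `readDeclaredInfo` are hypothetical names used for this example only.

```kotlin
// Match only explicitly declared SPDX tags; nothing is inferred from license text.
val licenseTag = Regex("""SPDX-License-Identifier:\s*(.+)""")
val copyrightTag = Regex("""SPDX-FileCopyrightText:\s*(.+)""")

// Hypothetical result type for this sketch.
data class DeclaredInfo(val licenses: List<String>, val copyrights: List<String>)

fun readDeclaredInfo(fileContent: String): DeclaredInfo {
    // Capture the rest of the line after each tag and drop a trailing block comment terminator.
    fun values(regex: Regex): List<String> =
        regex.findAll(fileContent)
            .map { it.groupValues[1].trim().removeSuffix("*/").trim() }
            .toList()

    return DeclaredInfo(values(licenseTag), values(copyrightTag))
}

fun main() {
    val tagged = """
        // SPDX-FileCopyrightText: 2024 Jane Doe
        // SPDX-License-Identifier: MIT
        int main() { return 0; }
    """.trimIndent()

    // A file that merely contains license-like prose yields no declared information.
    val untagged = "Permission is hereby granted, free of charge, to any person obtaining a copy..."

    println(readDeclaredInfo(tagged))
    println(readDeclaredInfo(untagged))
}
```

A matching-based scanner would likely report a license for the second input, whereas a purely declarative check reports nothing. That is the kind of semantic difference described above.
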

Letting REUSE produce SPDX in order to parse it and then create a scan result is a bit like taking the long way.

I agree that going via SPDX tag/value looks like the long way around. Unfortunately, at the moment it’s also the only supported way to extract structured findings from REUSE. I would much prefer a JSON output as well.

That said, my intent here is deliberately narrow: I want to use REUSE only as a scanner for ORT. REUSE is not trying to replicate what ORT provides around license policy evaluation, vulnerabilities, or dependency-wide analysis. ORT remains the system that aggregates results and applies compliance rules.

REUSE also has a clearly defined scope: it only works if a project (and optionally its dependencies) actually implements REUSE. If that’s not the case, this scanner simply isn’t applicable, which is expected.

Thanks for raising this. I'd like to see the reaction before proceeding here.

@maennchen maennchen marked this pull request as draft January 15, 2026 11:07

2 participants