nfpm.native_libs: new backend to generate pkg deps for nfpm packages#22861

Closed
cognifloyd wants to merge 57 commits intomainfrom
cognifloyd/nfpm-native_libs

@cognifloyd cognifloyd commented Nov 6, 2025

This PR introduces a new backend: pants.backend.nfpm.native_libs
Initially, the backend will be available as: pants.backend.experimental.nfpm.native_libs

I proposed this new backend (originally named bindeps) in this discussion: #22396

This backend inspects ELF bin/lib files (like lib*.so) in packaged contents (for now, only in pex_binary targets) to identify package dependency metadata and injects that metadata into the relevant nfpm_deb_package or nfpm_rpm_package targets. Effectively, it provides an approximation of the native packagers' features:

  • rpm: rpmdeps + elfdeps
  • deb: dh_shlibdeps + dpkg-shlibdeps (these substitute ${shlibs:Depends} in debian control files)

Goal: Platform-agnostic package builds

This pants backend is designed to be platform-agnostic, like nFPM.

Native packaging tools are often restricted to a single release of a single distro. Unlike native package builders, this new pants backend does not use any of those distro-specific or distro-release-specific utilities or local package databases. This new backend should be able to build deb and rpm packages anywhere that pants can run (macOS, rpm linux distros, deb linux distros, other linux distros, docker, ...).

To achieve the platform-agnostic goal, the scripts in this new backend use pure-python deps (elfdeps and pyelftools) to search the ELF bin/lib files for provided and/or required SONAMEs. The deb script also uses the official remote package search API for debian and ubuntu to map each SONAME to the package(s) that provide it. Thanks to pants' caching, and the stability of official package names for a given distro-release, these API calls are readily cacheable, which should minimize the traffic sent to these services.

Code Overview

pants.backend.experimental.nfpm.native_libs

Approximate logic flow (including rule calls):

  • inject_native_libs_dependencies_in_package_fields:

    • deb_depends_from_pex:

      • extract the wheels from the pex
      • elfdeps_analyze_pex_wheels: Inspect each wheel, searching for .so (ELF libs) and executable files, and analyzing any ELF metadata to collect SONAMEs (ELF library names, generally a .so file name) that are required.
      • deb_search_for_sonames: Lookup packages containing the required SONAMEs.
    • rpm_depends_from_pex:

      • extract the wheels from the pex
      • elfdeps_analyze_pex_wheels: rpm only needs the SONAMEs in requires, so they do not need to be mapped to package names like deb requires. rpm also tracks which SONAMEs are provided, not just those required.
    • Inject depends (and provides) field(s) on the nfpm_deb_package or nfpm_rpm_package targets. This is used when generating the config passed to nFPM so that nFPM includes the package dependency metadata in the built system package.
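
The difference between the deb and rpm branches above can be sketched roughly as follows. This is an illustration only, not the backend's rule code: the `SOInfo` fields and the capability string format are assumptions about what elfdeps reports, and `soname_to_packages` is a hypothetical stand-in for the output of deb_search_for_sonames.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SOInfo:
    """Hypothetical, simplified record of one ELF dependency entry."""

    soname: str  # e.g. "libssl.so.3"
    version: str  # symbol version; may be empty
    marker: str  # e.g. "(64bit)"; may be empty

    @property
    def so_info(self) -> str:
        # Assumed rpm-style capability string, e.g. "libc.so.6()(64bit)".
        version = f"({self.version})" if self.version or self.marker else ""
        return f"{self.soname}{version}{self.marker}"


def rpm_requires(required: list[SOInfo]) -> list[str]:
    # rpm can depend on the SONAME capability strings directly.
    return sorted({so.so_info for so in required})


def deb_depends(
    required: list[SOInfo], soname_to_packages: dict[str, list[str]]
) -> list[str]:
    # deb needs package names, so each SONAME is mapped to its package(s).
    packages: set[str] = set()
    for so in required:
        packages.update(soname_to_packages.get(so.soname, []))
    return sorted(packages)
```

In this sketch, SONAMEs with no known package are silently skipped; how the real rule reports unmatched SONAMEs is not shown here.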

pants.backend.experimental.nfpm.native_libs.elfdeps

elfdeps is a subsystem (with a default elfdeps.lock file) that allows configuring an alternate version of elfdeps. This might be useful if an alternate version (or a fork?) has fixes/features that apply to libraries contained in the wheels. As with all other backends, we only test with one version in CI, so users may encounter issues with other versions, which they are welcome to report on GitHub or in Slack.

Approximate logic flow (including rule calls):

  • elfdeps_analyze_pex_wheels:
    • Run pex3 repository extract to create a directory with all of the wheels in the pex.
    • subsystem.setup_elfdeps_analyze_wheels_tool: Prepare a pex venv using elfdeps.lock
    • Run script pants.backend.nfpm.native_libs.elfdeps.analyze_wheels in pex venv:
      • iterates over the contents of each wheel as a zip file,
      • analyzes the contents with elfdeps
      • collects SONAMEs in requires and provides ELF metadata
      • returns collected results as JSON
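
The "iterate over each wheel as a zip file" step can be sketched with the standard library alone. This is a simplified stand-in, not the analyze_wheels script: it only detects ELF members by their magic bytes, whereas the real script hands the contents to elfdeps to extract SONAMEs. The names `find_elf_members` and `make_demo_wheel` are hypothetical.

```python
from __future__ import annotations

import io
import zipfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def find_elf_members(wheel: zipfile.ZipFile) -> list[str]:
    """Return the names of zip members that look like ELF files."""
    elf_names = []
    for info in wheel.infolist():
        if info.is_dir():
            continue
        # Only the header is read; the member is not extracted to disk.
        with wheel.open(info) as member:
            if member.read(len(ELF_MAGIC)) == ELF_MAGIC:
                elf_names.append(info.filename)
    return sorted(elf_names)


def make_demo_wheel() -> zipfile.ZipFile:
    """Build a tiny in-memory zip with one fake ELF member, for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("pkg/__init__.py", "x = 1\n")
        zf.writestr("pkg/_native.so", ELF_MAGIC + b"\x00" * 12)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

Checking the magic bytes rather than the file extension also catches ELF executables that have no `.so` suffix.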

pants.backend.experimental.nfpm.native_libs.deb

The script in nfpm.native_libs.deb requires aiohttp, aiohttp-retry, and beautifulsoup4. This script is treated as "internal" pants code for which users cannot change the dependency versions.

I couldn't think of any reasons to deal with the added complexity of yet another subsystem and lockfile. So, I added these deps to 3rdparty/python/requirements.txt, and then pull those package versions from the pants venv when running the script that needs them.

The rule code constructs a pex that constrains the dependencies to the locked versions present in the pants venv.

Approximate logic flow (including rule calls):

  • deb_search_for_sonames:
    • create a pex venv that:
      • includes the search_for_sonames script
      • includes script dependencies constrained to the locked versions present in the pants venv.
    • Run script pants.backend.nfpm.native_libs.deb.search_for_sonames in pex venv:
      • Issue HTTPS package search API calls, which return results in HTML
        (the HTML-based API design predates JSON and other web API technologies).
      • Retry API calls up to 5 times with jitter + exponential backoff
      • Use beautifulsoup4 to extract the package names from the table in the HTML API response
        (this data structure has been consistent for many years, so script updates/maintenance should be minimal).
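
The retry behaviour described above (jitter plus exponential backoff) can be sketched with plain asyncio. The script itself relies on aiohttp-retry for this; the helper below is a hypothetical stand-in that only illustrates the technique, and its parameter names, default delays, and the use of `OSError` as the retryable error are assumptions.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying up to `retries` times on the given errors."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: the cap doubles per attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2**attempt)
            # Full jitter: sleep a random amount between 0 and the cap.
            await sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so that tests can swap in a no-op and avoid real delays.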

Note

The search_for_sonames script uses asyncio (and therefore does not use synchronous requests) because it could involve many IO-bound API calls (all of which should be cached using standard pants caching). As the dep is included with pants, I chose to use aiohttp because it's smaller than httpx. But, httpx has a more modern feature set, so if something like HTTP/2 proves to be an important optimization, the script could be refactored to use it instead.

TODO:

and drop the BUILD copy pasta
Without the elfdeps req, we can't run pytest or mypy on the nfpm.native_libs backend's analyze_wheels.py script.

This commit regenerated 3rdparty/python/user_reqs.lock before rebasing
on main. After the rebase, the lockfile regeneration is batched in a
single commit later on.
I had to use a dummy backend in the new PythonTool(...) in generate_builtin_lockfiles.py to generate the initial lockfile,
because the backend isn't loadable until the lockfile exists. After generating the lockfile for the first time,
the backend could actually load, so I updated the PythonTool(...) entry to use the actual backend.
Regeneration works just fine after all of that.

Lockfile diff: elfdeps.lock [elfdeps]

==                      Added dependencies                      ==

  elfdeps                        0.2.0
  pyelftools                     0.32
move the sort logic into the class, so tests don't need to sort it at the usage site.
This commit was rebased on main. Before rebase, the lockfile was
regenerated. After rebase, the lockfile regeneration is batched into a
single follow-up commit.
The rule runs the deb_search_for_sonames.py script.
It pulls the pex requirements from the pants venv so that the script
runs with the same version of python
and a subset of the dists used to run pants itself.
This way, the deps are only defined once.
…ames rule

This is in the scripts integration test file instead of the one for rules to facilitate sharing TEST_CASES for both tests.
We need slightly different things for deb vs rpm. For deb, we need just
the soname to search for relevant packages, and parsing the so_info
string did not seem wise when I can just preserve the data as elfdeps
returned it. So, we now have a SOInfo dataclass (a limited mirror of the
elfdeps.SOInfo dataclass).
I reviewed dh_shlibdeps and dpkg-shlibdeps sources so that we can do
something similar without all the heavy-weight baggage of native tooling
(which includes a complete installation of a specific distro release,
because dpkg-shlibdeps uses symbols and shlibs files from other
installed packages to identify packages that provide required sonames.)

The whole point of nfpm is to be able to build the package for any
version of a distro on any version of that distro or some other distro
or even on a different OS. So, we necessarily cannot replicate all of
the dpkg-shlibdeps logic, but we can do quite a bit thanks to the
package search API.
We need to select the relevant package(s) based on the .so file path,
following (a simplified set of) standard LD lookup rules.
This commit updates the deb_search_for_sonames script and rule
so that .so file names are available for that selection process.
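
A rough sketch of that selection step, assuming the search results have already been reduced to a mapping of package name to matching file paths. The directory list is a hypothetical, simplified search order for a 64-bit x86 system, and `select_packages` is an illustrative name, not the function used in the script.

```python
from __future__ import annotations

import posixpath

# Assumed search order; the real lookup rules are more involved.
DEFAULT_LIB_DIRS = (
    "/lib/x86_64-linux-gnu",
    "/usr/lib/x86_64-linux-gnu",
    "/lib",
    "/usr/lib",
)


def select_packages(
    soname: str,
    candidates: dict[str, list[str]],
    lib_dirs: tuple[str, ...] = DEFAULT_LIB_DIRS,
) -> list[str]:
    """Pick packages that ship `soname` in the first matching library dir."""
    for lib_dir in lib_dirs:
        wanted = posixpath.join(lib_dir, soname)
        matches = sorted(
            package
            for package, paths in candidates.items()
            # Normalize paths that are listed without a leading slash.
            if any("/" + path.lstrip("/") == wanted for path in paths)
        )
        if matches:
            # Earlier directories win, as in a linker search path.
            return matches
    return []
```

Packages that only carry the file outside the searched directories (for example under /opt or a debug directory) are ignored.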
This reverts commit c1d34e11516715d41adcf32b23466d90fab9151c.
This only adds package dependencies without any version constraints.
It might be possible to add a version constraint based on the so_version.
But, this is good enough for now.
Next commit will move rules into that package
This encapsulates the logic, based on dpkg-shlibdeps, that filters SONAMEs.
And give the tests more time (API calls can be flaky and retries take time.
But once cached, the test shouldn't repeat).
This batches updates from previous commits that were rebased on main
into a single lockfile regeneration.

__________________________________________________________________
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]
__________________________________________________________________
==                    Upgraded dependencies                     ==
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
  graphql-core                   3.2.6        -->   3.2.7
  pbr                            7.0.1        -->   7.0.3
__________________________________________________________________
==                      Added dependencies                      ==
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
  aiohappyeyeballs               2.6.1
  aiohttp                        3.12.15
  aiohttp-retry                  2.9.1
  aiosignal                      1.4.0
  attrs                          25.4.0
  elfdeps                        0.2.0
  frozenlist                     1.8.0
  multidict                      6.7.0
  propcache                      0.4.1
  pyelftools                     0.32
  yarl                           1.22.0
This is a convenience method that handles retrieving a field that was
injected by an earlier rule in the chain, or falling back to getting
the field from the target.
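
The fallback described in that commit can be pictured with plain mappings. This is only a sketch of the lookup order: the real method works with pants field and target types, and `get_field_value` and its arguments are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional


def get_field_value(
    field_name: str,
    target_fields: Mapping[str, Any],
    injected_fields: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Prefer a value injected by an earlier rule, else use the target's own."""
    if injected_fields is not None and field_name in injected_fields:
        return injected_fields[field_name]
    return target_fields.get(field_name)
```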
Add these fields to the `nfpm_deb_package` target:
- `distro`
- `distro_codename`

These are required as input parameters for this script:
- `pants.backend.nfpm.native_libs.deb.search_for_sonames`