Debian and Ubuntu packages - Call for testing and feedback #20042

ckastner · 2026-03-02T15:49:44Z

ckastner
Mar 2, 2026

Hi,

we have been working on shipping llama.cpp, whisper.cpp, ggml and other ggml-org projects directly from Debian's and Ubuntu's official repositories.

We want users to be able to just sudo apt-get install llama.cpp.

Our packages should be in good shape now (as tested by the CI we built for this task), so now we would like to ask you, the community, for feedback:

Is everything clear and intuitive? If not, what can we do better?
What are you missing? eg: which ackends (we've started with only with a few); documentation (what?)
Bugs, issues, etc.

Please check ggml-Debian for a summary of what we have, how to install/use it, and what we are working on.

Note that packages for trixie-backports and noble-backports are currently still being shipped from our own development repository (instructions included in the link above), but these will be made available in the official channels soon. We want to ship ~monthly updates this way.

Also note that newer GPUs might not be supported by HIP/CUDA in the above releases. Try the Vulkan backend in that case.

aendk · 2026-03-04T09:19:47Z

aendk
Mar 4, 2026

@ckastner
what exactly do you mean by

Also note that newer GPUs might not be supported by HIP/CUDA in the above releases. Try the Vulkan backend in that case.

?
That only on the *-backports, newer GPUs are not supported? And what does "new" mean in this context?

5 replies

ckastner Mar 4, 2026
Author

@aendk

@ckastner what exactly do you mean by
...
That only on the *-backports, newer GPUs are not supported? And what does "new" mean in this context?

In all cases, we build the packages against the ROCm/CUDA releases available in the target distribution, where "available" means the packages in the official Debian resp. Ubuntu repositories, and not from third-party AMD/NVIDIA repositories.

For example, for noble-backports, we build with CUDA 12.0, because that's the version of nvidia-cuda-toolkit in noble. For resolute, it's CUDA 12.4.

It's a bit more complicated for the HIP backend. Basically, for the current *-backports, RDNA2 and CDNA2 are supported. In unstable, support goes up to RDNA3 and CDNA3.

"Newer" GPUs means whatever is supported by those stack versions. An RTX 5090 is not supported because it requires CUDA 12.8, which neither Ubuntu nor Debian have yet.

One of the things we are considering is additionally building flavors of these packages with the vendor versions of the stacks, to get more recent versions of ROCm and CUDA. But because these packages would require third-party sources.list entries to work, we couldn't ship them via the official Debian and Ubuntu repos. Though we'd be fine with self-hosting them, or having the vendors ship them.

peterwwillis Mar 12, 2026

As a user of RDNA3, but not someone who wants to use unstable, I would very much like it if the packaged solution could work with LTS. Having the vendors ship them sounds great to me, but how does that work with the llama.cpp packages? What would be shipped by whom?

ckastner Mar 12, 2026
Author

(Assuming that with "shipping", we're always talking about an APT repository, so that apt-get update works)

The vendor continue to ship the CUDA resp. ROCm stack through their own APT repo.

For llama.cpp, the current strategy would be for us (the Debian maintainers) to ship the packages using our own APT server, following the example from the gist.

An alternative would be for HF/ggml.ai to set up their own APT server (which I could help with), and to ship them from there. Though this is a bit more tricky because of the build step, and distro integration. For example, in Debian we also re-test all reverse dependencies on every upload, to catch regressions, and do other maintenance work.

Another alternative would be for vendors to include our packages in their APT repo, the one where they ship CUDA resp. ROCm through. Though I'm not sure they'd be open to that.

peterwwillis Mar 12, 2026

Debian's llama.cpp package would need to link against the upstream GPU vendors' libs, right? If the vendors kept their own apt repo and packages, do you pull their packages from their apt repo to build your llama.cpp pkgs? If so, how does the package work for the user? I imagine they would need to first add the vendor apt pkg sources, and then install the Debian llama.cpp pkgs? I haven't heard of this being a thing before so I'm assuming I misunderstood something.

But if that's what's proposed, it does sound like the it could be the simplest option... A downside to it would be that upstream vendors would, like you mention, need to test the downstream dependencies for regressions before they release, and I personally doubt they would do that.

Is it at all possible for Debian to pull the apt pkg sources from the vendor's apt repo, and use that to build Debian pkgs of both the GPU vendors' code, as well as llama.cpp pkgs? That way if a vendor update breaks, it won't make it to the end user since you're building/testing everything in tandem. Perhaps if this were its own apt pkg repo, sort of not an official branch but still maintained by Debian?

ckastner Mar 12, 2026
Author

Debian's llama.cpp package would need to link against the upstream GPU vendors' libs, right? If the vendors kept their own apt repo and packages, do you pull their packages from their apt repo to build your llama.cpp pkgs? If so, how does the package work for the user? I imagine they would need to first add the vendor apt pkg sources, and then install the Debian llama.cpp pkgs? I haven't heard of this being a thing before so I'm assuming I misunderstood something.

No, you got it exactly right. Users would need to enable two APT repositories: the vendor one, and the one where we ship the llama.cpp packages, which won't be the official Debian APT (deb.debian.org) but rather our team's APT repo.

This reason why we can't use the official one to host these vendor-targeting builds is precisely because these packages would need dependencies from outside of the official sources. So they would be uninstallable with "pure" Debian or Ubuntu sources.list.

But if that's what's proposed, it does sound like the it could be the simplest option... A downside to it would be that upstream vendors would, like you mention, need to test the downstream dependencies for regressions before they release, and I personally doubt they would do that.

Luckily we can do that downstream, in our own CI. So we'd discover such regressions pretty fast.

Is it at all possible for Debian to pull the apt pkg sources from the vendor's apt repo, and use that to build Debian pkgs of both the GPU vendors' code, as well as llama.cpp pkgs?

Not(*) within Debian. We do have our own versions of the GPU vendor's code, that's what the llama.cpp packages in the official Debian/Ubuntu repos are built with. But these are release-specific, and thus get old. That's why we don't support RDNA3 in trixie-backports or noble-backports, because the vendor version is so old.

The vendors ship up-to-date versions of their stacks in their own APT repos. If they wanted to, with help from Debian people, they could ship backports in the official Debian repos, but I suspect it's simpler for them to just do their own APT.

(*) "Not" in the sense of practically not. It would be technically possible, but would require vendor commitment beyond what I suspect they are willing to do.

That way if a vendor update breaks, it won't make it to the end user since you're building/testing everything in tandem. Perhaps if this were its own apt pkg repo, sort of not an official branch but still maintained by Debian?

That would be one way I guess.

In any case, we're open to try many things.

The first attempt will be the one you described: we pull the vendor packages to build our llama.cpp packages, and ship them in our own APT repo. And we can iterate on that, based on user feedback.

ckastner · 2026-03-25T14:01:28Z

ckastner
Mar 25, 2026
Author

Here's an interesting problem that we've been hearing from our users, for which we could really use more feedback:

Which of ggml's compute backends should the llama.cpp package install by default?

Currently, it only installs the CPU backend. So people with GPUs won't benefit from them unless they know to additionally install the HIP, CUDA or Vulkan backends.

Possible strategies

1. ggml backend meta-package

This strategy involves creating a ggml meta-package libggml0-backend-all recommending all real backends, and have llama.cpp depend on libggml0-backend-all | libggml0-backend-<specific>.

With such a setup, by default, apt-get install llama.cpp would install all backends. Users could disable/override this with --no-install-recommends.

Pros
- Best hardware coverage
Cons
- Download and installation of many useless dependencies on any given system
- This won't work for CUDA because of its placement in Debian non-free resp. Ubuntu multiverse

2. llama.cpp meta-packages

This strategy adds new packages llama.cpp-cuda, llama.cpp-hip, and so on, where each package depends on the right ggml backend.

Pros
- Easy to find and install
Cons
- Can get confusing quite fast, if the number of meta-packages grows too large

3. Vulkan backend

This strategy installs the Vulkan backend by default.

Pros
- Should provide reasonable results on most systems
Cons
- Performance for GPUs probably slightly worse than native
- Installing the backend alone may be insufficient to get it working. For example, for NVIDIA cards, users would have to manually install nvidia-vulkan-icd.

22 replies

mbaudier Mar 31, 2026

I tend to agree with @ggerganov. The CPU backend always "just works", while a GPU backend requires some consideration.

For example, if the device is too small for a model, only some layers may be offloaded, which may not necessarily be better than running on plenty of modern CPUs.

I have the feeling that it should be a conscious choice to use anything else than the CPU backend.

ckastner Mar 31, 2026
Author

For me, the current way makes sense and I cannot say with confidence that any of the 3 new options would generally be better.

If our concern is that the current way is maybe not immediately obvious to everyone, then we should keep in mind that this could probably also be solved largely by an Option 4, documenting this upstream: by adding a "Debian/Ubuntu" bullet to the Quick Start list and having the popular solutions there, for example.

Are there any prior examples of other packages that solve a similar issue? Maybe we can follow their example.

The one @helmutg pointed out is for the X server and the video drivers it needs. That solution corresponds to Option 1 above: by default all drivers are depended on, but a user can then deselect drivers, as long as at least one remains installed.

The X drivers are pretty small though AFAICT. Compare this to installing the ggml HIP backend:

Summary:
  Upgrading: 3, Installing: 22, Removing: 0, Not Upgrading: 10
  Download size: 973 MB
  Space needed: 7403 MB / 308 GB available

ckastner Apr 2, 2026
Author

Just a quick ping, the packages have been updated:

ggml: 0.9.10
llama.cpp: b8611
llama.vim: 4103c5c

As announced, we plan frequent releases, so in future we'll only ping if something noteworthy happens.

ggerganov Apr 2, 2026
Maintainer

ggml: 0.9.10

There was an unfortunate regression in the CUDA backend just before that release (#20998 (comment)). I'm preparing v0.9.11 now.

Edit: it's up: https://github.com/ggml-org/ggml/releases/tag/v0.9.11

ckastner Apr 2, 2026
Author

There was an unfortunate regression in the CUDA backend just before that release (#20998 (comment)). I'm preparing v0.9.11 now.

No problem, I updated the ggml packages to 0.9.11.

mbaudier · 2026-04-05T05:43:21Z

mbaudier
Apr 5, 2026

Based on your feedback, we have been working over the last two weeks on the packaging of a ggml CUDA backend built against the Nvidia-provided repositories (cc @aendk). It is available for Debian unstable/sid and stable/trixie in the Debian AI team repository (both built against the 'debian13' Nvidia repo), with detailed installation instructions available here:

https://salsa.debian.org/deeplearning-team/ggml/-/blob/debian/unstable/debian/README.vendor-cuda.md

There are various use cases where using the Nvidia-provided packages can be useful or necessary, which in turn require a dedicated build of the ggml CUDA backend:

On cloud instances with Nvidia GPUs, only the Nvidia-provided driver supports the lightweight cloud kernel. It is possible to use the regular Debian 'contrib' and 'non-free' packages, but it requires switching to the full-fledged kernel, which is not trivial.
The Nvidia-provided repositories offer multiple versions of CUDA and of the Nvidia driver, which may be useful in order to support newer GPUs or pin down specific versions. This new ggml CUDA backend package is therefore dependent on a given CUDA version and marked as such (currently libggml0-backend-vendor-cuda-13-2 for CUDA v13.2).

Thanks to the ggml and llama.cpp architectures, all the other packages (currently in the Debian AI team repository, and soon in the trixie-backports repository) can be used as such. All this affects only the CUDA backend.

We have tried to provide it for Ubuntu noble-backports as well, but we ran into issues related to the package dependencies in the Nvidia-provided Ubuntu repositories, that we therefore cannot address on our side. Please do not use the Ubuntu packages, except in order to help us troubleshoot these issues.

These ggml CUDA backend packages for deployments using the Nvidia-provided repositories have been systematically tested with Debian Trixie on AWS instances with Nvidia GPUs (amd64 and arm64). As usual, your feedback is welcome and very useful!

2 replies

ckastner Apr 5, 2026
Author

We have tried to provide it for Ubuntu noble-backports as well, but we ran into issues related to the package dependencies in the Nvidia-provided Ubuntu repositories, that we therefore cannot address on our side. Please do not use the Ubuntu packages, except in order to help us troubleshoot these issues.

To expand on this a bit, we use an unversioned nvidia-compute as a package build-time dependency, and resulting backend this generates a versioned run-time dependency of libnvidia-compute-560, which causes installation problems on systems without a matching 560 driver.

However: I'm assuming ggml doesn't interact with the driver directly, only via the CUDA libraries. These all seem to have the same SOVER across driver versions, thus providing the same ABI. So we probably just need to loosen the runtime dependency to something that supports any libnvidia-compute-xxx.

But rather than tap in the dark, we'd thought we'd ping the thread and see what strategies others are using.

ckastner Apr 6, 2026
Author

We have tried to provide it for Ubuntu noble-backports as well, but we ran into issues related to the package dependencies in the Nvidia-provided Ubuntu repositories, that we therefore cannot address on our side

We found a solution to these issues. The libggml0-backend-vendor-cuda-13-2 backend provided in the vendor builds should now work with any installed Nvidia driver version.

We only build targeting CUDA 13.2 for now, but we can also build for other versions.

Debian and Ubuntu packages - Call for testing and feedback #20042

Uh oh!

ckastner Mar 2, 2026

Replies: 3 comments · 29 replies

Uh oh!

aendk Mar 4, 2026

Uh oh!

ckastner Mar 4, 2026 Author

Uh oh!

peterwwillis Mar 12, 2026

Uh oh!

ckastner Mar 12, 2026 Author

Uh oh!

peterwwillis Mar 12, 2026

Uh oh!

ckastner Mar 12, 2026 Author

Uh oh!

ckastner Mar 25, 2026 Author

Possible strategies

Uh oh!

Uh oh!

mbaudier Mar 31, 2026

Uh oh!

Uh oh!

ckastner Mar 31, 2026 Author

Uh oh!

Uh oh!

ckastner Apr 2, 2026 Author

Uh oh!

Uh oh!

ggerganov Apr 2, 2026 Maintainer

Uh oh!

ckastner Apr 2, 2026 Author

Uh oh!

mbaudier Apr 5, 2026

Uh oh!

Uh oh!

ckastner Apr 5, 2026 Author

Uh oh!

ckastner Apr 6, 2026 Author

ckastner
Mar 2, 2026

Replies: 3 comments 29 replies

aendk
Mar 4, 2026

ckastner Mar 4, 2026
Author

ckastner Mar 12, 2026
Author

ckastner Mar 12, 2026
Author

ckastner
Mar 25, 2026
Author

ckastner Mar 31, 2026
Author

ckastner Apr 2, 2026
Author

ggerganov Apr 2, 2026
Maintainer

ckastner Apr 2, 2026
Author

mbaudier
Apr 5, 2026

ckastner Apr 5, 2026
Author

ckastner Apr 6, 2026
Author