I've been running Ollama for a while, hosting several models. I'm VRAM-poor (12 GB) but system-RAM-rich (96 GB DDR5). Ollama, which uses llama.cpp under the hood, appears to auto-fit whichever model I choose and retries if loading fails; I'm running it in Docker.
I've been experimenting with llama.cpp's server (`llama-server`) and I'm finding that I have to tune each model individually.
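To make the question concrete, this is the kind of per-model adjustment I mean. The model paths and layer counts below are just placeholders; the right `-ngl` (GPU layer offload) value depends on each model's size relative to my 12 GB of VRAM:

```sh
# Fully offload a small model that fits in 12 GB of VRAM
# (paths and values are placeholders, not recommendations):
llama-server -m ./models/small-model.gguf -ngl 99 -c 8192

# Partially offload a larger model; the workable -ngl value has to be
# found by trial and error before the server stops running out of VRAM:
llama-server -m ./models/big-model.gguf -ngl 24 -c 4096
```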
Is there a process that helps automate this tuning? It seems like a lot of work to adjust the parameters for every model I use.
How do others manage their presets?
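The closest I've gotten is the idea of a small wrapper script that hard-codes one preset per model, roughly like the sketch below. Every model name, path, and flag value here is invented for illustration and would need tuning for the actual models in use:

```sh
#!/usr/bin/env sh
# Sketch of a per-model preset launcher; all names and values
# are hypothetical placeholders.
case "$1" in
  small)
    # Fits entirely in 12 GB of VRAM, so offload all layers.
    exec llama-server -m ./models/small-model.gguf -ngl 99 -c 16384 --port 8080
    ;;
  big)
    # Only partially fits on the GPU; remaining layers run from system RAM.
    exec llama-server -m ./models/big-model.gguf -ngl 24 -c 4096 --port 8080
    ;;
  *)
    echo "unknown preset: $1" >&2
    exit 1
    ;;
esac
```

But that still means hand-finding the values for each model once, which is what I'm hoping to avoid.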