I've been running Ollama for a while, hosting several models. I'm VRAM-poor (12 GB) but system-RAM-rich (96 GB DDR5). Ollama, which uses llama.cpp under the hood, appears to auto-fit whichever model I choose and retries if loading fails; I'm running it in Docker.
I've been experimenting with llama.cpp's server (`llama-server`) and I'm finding that I have to tune each model individually.
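To make the question concrete, this is the kind of per-model adjustment I mean. The model paths and layer counts below are just placeholders; the right `-ngl` (GPU layer offload) value depends on each model's size relative to my 12 GB of VRAM:

```sh
# Fully offload a small model that fits in 12 GB of VRAM
# (paths and values are placeholders, not recommendations):
llama-server -m ./models/small-model.gguf -ngl 99 -c 8192

# Partially offload a larger model; the workable -ngl value has to be
# found by trial and error before the server stops running out of VRAM:
llama-server -m ./models/big-model.gguf -ngl 24 -c 4096
```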
Is there a process that helps automate this tuning? It seems like a lot of work to adjust the parameters for every model I use.
How do others manage their presets?
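The closest I've gotten is the idea of a small wrapper script that hard-codes one preset per model, roughly like the sketch below. Every model name, path, and flag value here is invented for illustration and would need tuning for the actual models in use:

```sh
#!/usr/bin/env sh
# Sketch of a per-model preset launcher; all names and values
# are hypothetical placeholders.
case "$1" in
  small)
    # Fits entirely in 12 GB of VRAM, so offload all layers.
    exec llama-server -m ./models/small-model.gguf -ngl 99 -c 16384 --port 8080
    ;;
  big)
    # Only partially fits on the GPU; remaining layers run from system RAM.
    exec llama-server -m ./models/big-model.gguf -ngl 24 -c 4096 --port 8080
    ;;
  *)
    echo "unknown preset: $1" >&2
    exit 1
    ;;
esac
```

But that still means hand-finding the values for each model once, which is what I'm hoping to avoid.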