Ollama/LM Studio hosting section could use more setup detail and compatibility notes #58

@rmdodhia

Context
The README helpfully notes that vLLM requires 24GB+ VRAM and points users with lower-VRAM GPUs toward Ollama/LM Studio with GGUF quantized models. However, I ran into some difficulties getting this path to work end-to-end and wanted to share feedback that might help other users.

Experience
I tried hosting the GGUF model via Ollama on an 8GB laptop GPU. While the server started, fara-cli failed on its first model call. Since Fara-7B is a vision-language model (Qwen2.5-VL) that sends base64-encoded screenshots via the OpenAI image_url content type on every step, it's possible Ollama's OpenAI-compatible endpoint doesn't fully support this for Qwen2.5-VL GGUF models — though I'm not certain whether the root cause was the endpoint or VRAM constraints.

It would help to know whether the team has validated this path end-to-end, and if so, what configuration was used.
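One way to narrow down whether the failure is vision support or VRAM is to send a minimal image_url request straight at the endpoint. Below is a sketch of the message shape fara-cli-style payloads take; the endpoint URL and model name are placeholders, the tiny hardcoded image is assumed valid, and the actual send is left commented since it needs a running server:

```python
# Build the OpenAI-style vision message shape (text part + base64 data-URL
# image part) so the Ollama endpoint can be probed directly. A tiny image
# rules out VRAM as the failure cause; the model name is a placeholder.
import json

# Base64 of a minimal 1x1 PNG (assumed valid; any small screenshot works too).
TINY_PNG_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAA"
    "C0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
)

def vision_message(prompt: str, image_b64: str) -> dict:
    """Chat message with a text part plus a base64 data-URL image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }

msg = vision_message("Describe this image.", TINY_PNG_B64)
print(json.dumps(msg)[:60])

# To actually send it (requires a running server and the `openai` package):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
#   out = client.chat.completions.create(model="<model_name>",
#                                        messages=[msg], max_tokens=32)
#   print(out.choices[0].message.content)
```

If this small request fails the same way fara-cli does, the problem is likely vision support in the endpoint rather than memory.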

Suggestions for the documentation

1. Example commands

The section says to specify the correct --base_url, --api_key, and --model but does not provide concrete values. Adding something like this would reduce trial and error:

```shell
ollama pull <exact_model_name>

fara-cli \
  --task "..." \
  --base_url http://localhost:11434/v1 \
  --api_key ollama \
  --model <model_name>
```
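Those three flags map directly onto a standard OpenAI-style HTTP request, so the server can be smoke-tested before involving fara-cli at all. A stdlib-only sketch (the model name is a placeholder, and the actual call is commented out since it needs a running server):

```python
# Build the /chat/completions request that the --base_url / --api_key /
# --model flags describe, using only the standard library.
import json
import urllib.request

def chat_request(base_url: str, api_key: str, model: str,
                 prompt: str) -> urllib.request.Request:
    """POST request for an OpenAI-compatible chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )

req = chat_request("http://localhost:11434/v1", "ollama",
                   "<model_name>", "ping")
print(req.full_url)

# Uncomment once the Ollama server is running:
# with urllib.request.urlopen(req, timeout=60) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

A plain text prompt succeeding here while fara-cli fails would further point at vision inputs as the culprit.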

2. VRAM guidance

The advice to select the largest model that fits your GPU is reasonable, but a rough table would help users choose a quantization level more confidently:

| VRAM  | Suggested quantization | Notes               |
|-------|------------------------|---------------------|
| 8GB   | Q4_K_M (~4.5GB)        | Tight with KV cache |
| 12GB  | Q5_K_M / Q6_K          |                     |
| 16GB  | Q8_0 or FP16           |                     |
| 24GB+ | FP16 via vLLM          | Recommended path    |
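The table's weight sizes follow from simple arithmetic: parameters times effective bits per weight. A sketch of that estimate (the bits-per-weight figures are approximate community numbers for GGUF quant types, not official values, and KV cache plus runtime overhead come on top):

```python
# Rough VRAM arithmetic behind the quantization table above.
PARAMS_B = 7.0  # Fara-7B parameter count, in billions

BITS_PER_WEIGHT = {  # approximate effective bits per weight (assumption)
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K":   6.6,
    "Q8_0":   8.5,
    "FP16":   16.0,
}

def weights_gb(quant: str, params_b: float = PARAMS_B) -> float:
    """Model weights only, in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q:>7}: ~{weights_gb(q):.1f} GB + KV cache + overhead")
```

For Q4_K_M this lands around 4.2 GB of weights, which is why an 8GB card is workable but tight once the KV cache for long screenshot-heavy contexts is added.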

3. Vision model compatibility note

Since GGUF quantization and llama.cpp-based servers may handle vision inputs differently than vLLM, it would help to clarify whether any quality or compatibility trade-offs should be expected compared to the vLLM path.

4. Modelfile reference

There is a Modelfile in the repository root that is not mentioned in the README. If it is intended for Ollama use, a short note linking to it would make the workflow clearer.

Not a blocker; just sharing this in case it helps improve onboarding for users who start with the Ollama or LM Studio path.
