Runnable examples demonstrating how to use ElBruno.LocalLLMs in different scenarios. Each sample is a standalone .NET project you can build and run directly.
Prerequisites:
- .NET 8.0+ SDK
- ~2-4 GB of free disk space (models are downloaded on first run)
- CPU is sufficient; GPU (CUDA/DirectML) is optional
What it demonstrates: The simplest possible usage — create a client and ask a question.
Run it:
```bash
dotnet run --project samples/HelloChat
```

Key code:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
// Create a local chat client (downloads Phi-3.5 mini on first run)
using var client = await LocalChatClient.CreateAsync();
var response = await client.GetResponseAsync([
new(ChatRole.User, "What is the capital of France?")
]);
Console.WriteLine(response.Text);
```

What happens:
- `CreateAsync()` downloads the default model (Phi-3.5 mini) from Hugging Face on first run
- Sends a single user message through the ONNX Runtime GenAI inference pipeline
- Prints the complete response
Expected output:
```text
The capital of France is Paris. Paris is the largest city in France and serves as
the country's political, economic, and cultural center.
```
Note: First run takes longer due to the model download (~2.4 GB). Subsequent runs start much faster.
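HelloChat sends a single message, but the same client handles multi-turn conversations if you keep a running history and append each assistant reply before the next question. A minimal sketch using only the APIs shown above; the follow-up question is illustrative:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

// Keep the conversation history across turns.
var history = new List<ChatMessage>
{
    new(ChatRole.User, "What is the capital of France?")
};

var first = await client.GetResponseAsync(history);
Console.WriteLine(first.Text);

// Append the assistant's reply so the model sees the full context next turn.
history.Add(new ChatMessage(ChatRole.Assistant, first.Text));
history.Add(new ChatMessage(ChatRole.User, "What is its population?"));

var second = await client.GetResponseAsync(history);
Console.WriteLine(second.Text);
```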
What it demonstrates: Real-time token streaming — see the response appear word by word as it's generated.
Run it:
```bash
dotnet run --project samples/StreamingChat
```

Key code:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
Model = KnownModels.Phi35MiniInstruct
});
Console.WriteLine("Streaming response:");
await foreach (var update in client.GetStreamingResponseAsync([
new(ChatRole.System, "You are a helpful assistant."),
new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
Console.Write(update.Text);
}
Console.WriteLine();
```

What happens:
- Creates a client with an explicit model selection (`Phi35MiniInstruct`)
- Sends a system prompt and a user message
- Uses `await foreach` over `GetStreamingResponseAsync` to receive `ChatResponseUpdate` objects
- Each update contains one or more tokens, printed immediately with `Console.Write`
Expected output:
```text
Streaming response:
Quantum computing is a type of computing that uses quantum bits, or qubits, instead
of classical bits. While a classical bit can be either 0 or 1, a qubit can be both
0 and 1 at the same time — a property called superposition...
```
Tip: Streaming is ideal for chatbot UIs where users expect to see the response progressively.
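If you also need the complete text after streaming (for logging or to append to a conversation history), accumulate the updates as they arrive. A small sketch, assuming the same client setup as the sample:

```csharp
using System.Text;
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

var buffer = new StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
    Console.Write(update.Text);   // print tokens as they arrive
    buffer.Append(update.Text);   // and keep them for the full text
}

Console.WriteLine();
var fullText = buffer.ToString();
Console.WriteLine($"[{fullText.Length} chars total]");
```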
What it demonstrates: Using different models for the same question — compare model outputs side by side.
Run it:
```bash
dotnet run --project samples/MultiModelChat
```

Key code:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
var question = new ChatMessage(ChatRole.User, "What is machine learning? Answer in one sentence.");
// Try with Phi-3.5 mini
Console.WriteLine("=== Phi-3.5 mini ===");
using (var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
Model = KnownModels.Phi35MiniInstruct
}))
{
var response = await client.GetResponseAsync([question]);
Console.WriteLine(response.Text);
}
// Try with Qwen 2.5 0.5B (tiny model)
Console.WriteLine("\n=== Qwen 2.5 0.5B ===");
using (var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
Model = KnownModels.Qwen25_05BInstruct
}))
{
var response = await client.GetResponseAsync([question]);
Console.WriteLine(response.Text);
}
```

What happens:
- Defines a single question as a `ChatMessage`
- Creates a `LocalChatClient` with Phi-3.5 mini, asks the question, and prints the answer
- Creates a second `LocalChatClient` with Qwen 2.5 0.5B and asks the same question
- Each client downloads its model on first use (if not already cached)
Expected output:
```text
=== Phi-3.5 mini ===
Machine learning is a subset of artificial intelligence where algorithms learn patterns
from data to make predictions or decisions without being explicitly programmed.

=== Qwen 2.5 0.5B ===
Machine learning is a field of AI that enables computers to learn from data and improve
their performance over time without explicit programming.
```
Note: Larger models generally produce more nuanced and accurate responses.
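Because the two blocks differ only in the model, the comparison is easy to generalize. A sketch that loops over the same `KnownModels` values used above; the tuple labels are just for display:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

var question = new ChatMessage(ChatRole.User, "What is machine learning? Answer in one sentence.");

// Pair each model with a display label; both values come from KnownModels as above.
var runs = new[]
{
    (Label: "Phi-3.5 mini", Model: KnownModels.Phi35MiniInstruct),
    (Label: "Qwen 2.5 0.5B", Model: KnownModels.Qwen25_05BInstruct),
};

foreach (var (label, model) in runs)
{
    Console.WriteLine($"=== {label} ===");

    // One client per model, created and disposed per iteration, as in the sample.
    using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions { Model = model });
    var response = await client.GetResponseAsync([question]);
    Console.WriteLine(response.Text);
    Console.WriteLine();
}
```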
What it demonstrates: Using AddLocalLLMs() to register IChatClient in ASP.NET Core's dependency injection container, then injecting it into endpoints.
Run it:
```bash
dotnet run --project samples/DependencyInjection
```

Then test with:

```bash
curl -X POST http://localhost:5000/chat -d "What is the capital of France?"
```

Key code:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddLocalLLMs(options =>
{
options.Model = KnownModels.Phi35MiniInstruct;
});
var app = builder.Build();
app.MapPost("/chat", async (IChatClient client, HttpContext ctx) =>
{
using var reader = new StreamReader(ctx.Request.Body);
var message = await reader.ReadToEndAsync();
var response = await client.GetResponseAsync([
new ChatMessage(ChatRole.User, message)
]);
return response.Text;
});
app.MapGet("/", () => "ElBruno.LocalLLMs — POST /chat with a message to chat!");
app.Run();
```

What happens:
- `AddLocalLLMs()` registers `LocalChatClient` as `IChatClient` in the DI container
- The model is lazily initialized on the first request (no blocking during startup)
- The `/chat` endpoint receives `IChatClient` as a handler parameter, resolved from the DI container
- Any request body text is sent as a user message, and the model's response is returned
Expected output:
```text
# Terminal shows:
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://localhost:5000

# curl response:
The capital of France is Paris.
```
Tip: Because `LocalChatClient` implements `IChatClient`, you can swap it for Azure OpenAI, Ollama, or any other Microsoft.Extensions.AI (MEAI) provider without changing your service code.
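Streaming works the same way behind DI. A sketch of a hypothetical `/chat/stream` endpoint (not part of the sample) that flushes tokens to the caller as they are generated, using only the `IChatClient` API shown above:

```csharp
// Hypothetical streaming endpoint; add alongside the /chat endpoint above.
app.MapPost("/chat/stream", async (IChatClient client, HttpContext ctx) =>
{
    using var reader = new StreamReader(ctx.Request.Body);
    var message = await reader.ReadToEndAsync();

    ctx.Response.ContentType = "text/plain; charset=utf-8";

    // Flush after each update so the caller sees tokens as they are generated.
    await foreach (var update in client.GetStreamingResponseAsync([
        new ChatMessage(ChatRole.User, message)
    ]))
    {
        await ctx.Response.WriteAsync(update.Text);
        await ctx.Response.Body.FlushAsync();
    }
});
```

Test it with `curl -N http://localhost:5000/chat/stream -d "..."`; the `-N` flag disables curl's output buffering so the tokens appear incrementally.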
- 📖 Getting Started — full setup guide with GPU configuration
- 🎯 Supported Models — find the right model for your use case
- 📊 Benchmarks — measure performance on your hardware
- 🏗️ Architecture — understand the internal design