Hi, I'm Kevin.
AI Engineer @ Narrative AI — building AI that helps fashion brands coordinate with their manufacturers without drowning in WhatsApp messages and Excel trackers.
HKUST Computer Engineering + AI. Based in Hong Kong.
> [!NOTE]
> I build production AI systems, and I've learned a few things the hard way. If you can't measure whether your model is getting better or worse, you're guessing. That's why I built a 34-task LLM-as-judge benchmark: "looks good to me" isn't a test suite. Agents that silently produce garbage when they get stuck are more dangerous than no agent at all, so I design failure recovery into agentic workflows from day one. And inference cost isn't a DevOps afterthought. It's a design constraint that shapes everything from model selection to deployment architecture.

| Project | What it does | Why I built it |
|---|---|---|
| story-bench | 34-task LLM narrative benchmark with a 3-model judge ensemble | Existing evals weren't catching the failure modes I care about |
| llm-cloud-inference | Production vLLM deployment serving Qwen3-8B-AWQ on Google Cloud Run | Wanted an OpenAI-compatible endpoint that scales to zero instead of idling at $2/hr |
| document-mcp | 26 MCP tools for large-scale document management | Documents are the atomic unit of most business workflows, and I needed structured operations on unstructured content |
| stockchat | DSPy-powered stock analysis agent with RAG and technical indicators | Built to explore DSPy's optimization pipeline on a real domain with messy data |
- APRU × Google Tech Policy Hackathon 2025: AI-powered agricultural credit scoring with NASA satellite imagery and SHAP explainability

- Email: chilonchin@gmail.com
- LinkedIn: linkedin.com/in/clchinkc