
Hi, I'm Kevin.

AI Engineer @ Narrative AI — building AI that helps fashion brands coordinate with their manufacturers without drowning in WhatsApp messages and Excel trackers.

HKUST Computer Engineering + AI. Based in Hong Kong.

Note

I build production AI systems, and I've learned a few things the hard way. If you can't measure whether your model is getting better or worse, you're guessing — I built a 34-task LLM-as-judge benchmark because "looks good to me" isn't a test suite. Agents that silently produce garbage when they get stuck are more dangerous than no agent at all, so I design failure recovery into agentic workflows from day one. And inference cost isn't a DevOps afterthought — it's a design constraint that shapes everything from model selection to deployment architecture.


Projects

| Project | What it does | Why I built it |
|---|---|---|
| story-bench | 34-task LLM narrative benchmark with a 3-model judge ensemble | Existing evals weren't catching the failure modes I care about |
| llm-cloud-inference | Production vLLM serving Qwen3-8B-AWQ on Google Cloud Run | Wanted an OpenAI-compatible endpoint that scales to zero instead of idling at $2/hr |
| document-mcp | 26 MCP tools for large-scale document management | Documents are the atomic unit of most business workflows; I needed structured ops on unstructured content |
| stockchat | DSPy-powered stock analysis agent with RAG and technical indicators | Built to explore DSPy's optimization pipeline on a real domain with messy data |
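For readers curious how a multi-judge score might be combined, here is a minimal sketch, assuming each of three judges returns one numeric score per task and the ensemble takes the per-task median. The `aggregate_ensemble` function, the judge names, the score scale, and the disagreement threshold are all hypothetical illustrations, not story-bench's actual implementation.

```python
from statistics import mean, median

# Hypothetical shape: scores[task_id][judge_name] -> numeric score from that judge.
Scores = dict[str, dict[str, float]]


def aggregate_ensemble(scores: Scores, disagreement_threshold: float = 2.0) -> dict:
    """Combine per-judge scores into one score per task, plus an overall mean.

    Assumed rule: the per-task score is the median of the judges' scores, so with
    three judges a single outlying judge cannot drag the result. Tasks whose
    judge scores span more than `disagreement_threshold` are flagged for review.
    """
    per_task = {}
    flagged = []
    for task_id, judge_scores in scores.items():
        values = list(judge_scores.values())
        per_task[task_id] = median(values)
        # Large spread between judges suggests the task (or a judge) needs a manual look.
        if max(values) - min(values) > disagreement_threshold:
            flagged.append(task_id)
    overall = mean(per_task.values()) if per_task else float("nan")
    return {"per_task": per_task, "overall": overall, "flagged": flagged}


# Usage with made-up scores for two tasks (illustrative data only).
example = {
    "task_01": {"judge_a": 7, "judge_b": 8, "judge_c": 7},
    "task_02": {"judge_a": 3, "judge_b": 9, "judge_c": 4},
}
result = aggregate_ensemble(example)
print(result)
# {'per_task': {'task_01': 7, 'task_02': 4}, 'overall': 5.5, 'flagged': ['task_02']}
```

The median is one reasonable choice among several (mean, majority vote on pass/fail, weighted by judge reliability); it is used here only because it keeps a single generous or harsh judge from dominating a task's score.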

Toolbox


Hackathons

  • APRU × Google Tech Policy Hackathon 2025 — AI-powered agricultural credit scoring with NASA satellite imagery and SHAP explainability

Get in touch



Pinned

  1. story-bench (Public)

    Story Theory Benchmark - LLM Narrative Generation Evaluation

    Python, 4 stars

  2. llm-cloud-inference (Public)

    vLLM-powered Qwen3-8B-AWQ inference API on Google Cloud Run. OpenAI-compatible, scale-to-zero serverless.

    Python

  3. document-mcp (Public)

    Model Context Protocol Tools for Document Management

    Python

  4. stockchat (Public archive)

    Personal project, Generative AI, Python, FastAPI, React, Docker/Docker Compose

    TypeScript, 149 stars, 27 forks

  5. open-spec-ralph-wiggum (Public)

    Multi-LLM creative collaboration experiment with 5-model jury evaluation, iterative hill-climbing optimization, protocol synthesis, and cross-session strategy memory.

    Python

  6. story_crowdsource_preference (Public)

    Personal project, Generative AI, Python, Streamlit, Supabase, PyTorch

    Python, 1 star