WeirdBench

WeirdBench is an unconventional LLM benchmarking site.

The site reads benchmark scores from Neon Postgres. Benchmark execution happens locally.

How It Works

Benchmark definitions live in lib/benchmarks.ts.
The website reads leaderboard data from Neon through lib/benchmark-store.ts.
Benchmark runner scripts execute locally, use your local env vars, and write scores into the database.
Scores are cached in Postgres by (benchmark_id, model_id), so an existing model score is never recomputed unless you explicitly delete it.
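
The caching rule above can be illustrated with a minimal in-memory sketch. It is not the code in lib/benchmark-store.ts: the `ScoreCache` class, its method names, and the use of a `Map` in place of Postgres are all assumptions made for illustration, and the runner is synchronous here only to keep the example short.

```typescript
// Hypothetical in-memory stand-in for the Postgres score table.
// Keys mirror the (benchmark_id, model_id) pair described above.
class ScoreCache {
  private readonly scores = new Map<string, number>();

  private keyFor(benchmarkId: string, modelId: string): string {
    return JSON.stringify([benchmarkId, modelId]);
  }

  // Returns the stored score if one exists; otherwise runs the benchmark once and stores the result.
  getOrCompute(benchmarkId: string, modelId: string, run: () => number): number {
    const key = this.keyFor(benchmarkId, modelId);
    const cached = this.scores.get(key);
    if (cached !== undefined) return cached;
    const score = run();
    this.scores.set(key, score);
    return score;
  }

  // Explicit deletion is the only way to force a re-run in this sketch.
  delete(benchmarkId: string, modelId: string): boolean {
    return this.scores.delete(this.keyFor(benchmarkId, modelId));
  }
}
```

A second call with the same pair returns the stored value without invoking `run`; only a call made after `delete` recomputes the score.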

Current Benchmarks

nutrition-prediction
- Source: The Nutrition Prediction Benchmark
- Task: predict calories, protein, carbs, and fat from ingredient lists for a fixed 50-dish Nutrition5k sample
- Scoring: 0.6 * accuracy + 0.4 * correlation, where accuracy = 100 / (1 + avg_mape_percent)
- Ranking: higher is better

semantic-diversity
- Source: The Semantic Diversity Benchmark
- Task: generate exactly 20 English words that are maximally semantically unrelated
- Scoring: average pairwise semantic similarity
- Ranking: lower is better

orthographic-diversity
- Task: output 20 real English words that differ as much as possible in spelling, subject to deterministic validity and overlap penalties
- Scoring: average pairwise Levenshtein distance minus penalties
- Ranking: higher is better
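
Two of the scoring rules above are simple enough to sketch directly. The following is an illustration, not the repository's scoring code: the function names are hypothetical, the scale of `correlation` is not stated here and is passed through unchanged, and the orthographic validity and overlap penalties are not modeled because their definitions are not given.

```typescript
// Composite nutrition score exactly as written in the Scoring line above.
// Assumption: `correlation` is supplied on whatever scale the real benchmark uses.
function nutritionScore(avgMapePercent: number, correlation: number): number {
  const accuracy = 100 / (1 + avgMapePercent);
  return 0.6 * accuracy + 0.4 * correlation;
}

// Standard dynamic-programming Levenshtein distance
// (insertion, deletion, and substitution each cost 1).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Mean distance over all unordered word pairs, before any penalties.
function averagePairwiseDistance(words: readonly string[]): number {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < words.length; i++) {
    for (let j = i + 1; j < words.length; j++) {
      total += levenshtein(words[i], words[j]);
      pairs += 1;
    }
  }
  return pairs === 0 ? 0 : total / pairs;
}
```

For a 20-word answer the average runs over 190 pairs, so a single near-duplicate pair lowers the raw score only slightly; presumably that is part of why the benchmark applies separate overlap penalties.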

Environment

Expected in .env.local:

DATABASE_URL
OPENROUTER_API_KEY
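
A runner script could fail fast when either variable is missing. The helper below is a hypothetical sketch, not a function from this repository; it takes the environment as a plain object so it stays testable without Node-specific types.

```typescript
// Returns the names of required variables that are missing or blank.
// The default list matches the two variables named above.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = ["DATABASE_URL", "OPENROUTER_API_KEY"],
): string[] {
  return required.filter((name) => !env[name]?.trim());
}
```

In a script this would typically be called as `missingEnvVars(process.env)`, with a non-empty result reported as an error before any model call is made.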

Install

pnpm install

Run The Site

pnpm dev

Initialize The DB

pnpm db:init

Run this once before starting the app or running benchmark scripts against a fresh database. Runtime code does not create tables automatically.

Add A Model To Nutrition Prediction

pnpm benchmark:nutrition-prediction <model-id>

Examples:

pnpm benchmark:nutrition-prediction openai/gpt-oss-120b
pnpm benchmark:nutrition-prediction anthropic/claude-opus-4.1 openai/gpt-oss-120b

Behavior:

Runs locally.
Uses OPENROUTER_API_KEY.
Fetches Nutrition5k metadata from the public Google bucket and deterministically samples the same 50 dishes each time.
Writes the score to Neon using DATABASE_URL.
Returns cached data immediately if that model already has a stored score.
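
One common way to get the same sample on every run is a seeded shuffle over a sorted ID list. The sketch below shows that general technique only; the actual sampling method, seed, and dish ID format used by the runner are not described here, and `mulberry32` and `sampleDeterministic` are names chosen for illustration.

```typescript
// Small seeded PRNG (mulberry32); returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sorts the IDs first so the result does not depend on input order,
// then runs a partial Fisher-Yates shuffle driven by the seeded PRNG.
function sampleDeterministic(ids: readonly string[], count: number, seed: number): string[] {
  const pool = [...ids].sort();
  const rand = mulberry32(seed);
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```

With a fixed seed and a fixed metadata file, every model is scored on the same dishes, which keeps scores comparable across runs.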

Add A Model To Semantic Diversity

pnpm benchmark:semantic-diversity <model-id>

Examples:

pnpm benchmark:semantic-diversity google/gemini-2.5-pro
pnpm benchmark:semantic-diversity anthropic/claude-opus-4.1
pnpm benchmark:semantic-diversity openai/gpt-5
pnpm benchmark:semantic-diversity google/gemini-2.5-pro,anthropic/claude-opus-4.1,openai/gpt-5
pnpm benchmark:semantic-diversity google/gemini-2.5-pro anthropic/claude-opus-4.1 openai/gpt-5

Behavior:

Runs locally.
Uses OPENROUTER_API_KEY.
Writes the score to Neon using DATABASE_URL.
Returns cached data immediately if that model already has a stored score.
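
The examples above pass model IDs either space-separated or comma-separated. A minimal sketch of argument parsing that accepts both forms follows; `parseModelIds` is a hypothetical name, and dropping duplicate IDs is an assumption rather than documented behavior.

```typescript
// Accepts CLI arguments where model IDs are separated by spaces, commas, or both.
function parseModelIds(args: readonly string[]): string[] {
  const ids = args
    .join(",")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  // Drop duplicates while keeping first-seen order (an assumption, not documented behavior).
  return ids.filter((id, index) => ids.indexOf(id) === index);
}
```

Both multi-model invocations in the examples would then resolve to the same three-element list.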

Common Commands

pnpm dev
pnpm lint
pnpm build
pnpm db:init
pnpm benchmark:nutrition-prediction <model-id>
pnpm benchmark:semantic-diversity <model-id>
