
RunPod infrastructure costs: charged $8.65 for pods that never ran training #821

@oleksiivinogradov


Problem

Participants using RunPod to evaluate their submissions are being charged for GPU provisioning/boot time, even when pods never reach a usable state. This makes iterating on submissions prohibitively expensive for individual contributors.

My Experience

Across five pod-creation attempts over two days (March 25-26, 2026), I was charged $8.65 without completing a single training run:

| Pod ID | Cloud Tier | Timeout | Result |
| --- | --- | --- | --- |
| dh821zsbo1s1ee | SECURE | ~15 min (manual kill) | Never booted |
| vrz82l0ml9qans | SECURE | ~15 min (manual kill) | Never booted |
| tx6ibgui70rl5u | ALL | ~15 min (manual kill) | Never booted |
| owotqbnqk6el4v | ALL | 120 s (auto-kill) | Never booted |
| gesb3y7hq454zq | ALL | 180 s (auto-kill) | Never booted |
  • Starting balance: $25.00
  • Ending balance: $16.35
  • Pods that completed training: 0 out of 5
  • Current spend rate: $0.00/hr (confirmed no pods running)
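For scale, the per-attempt cost implied by the numbers above can be worked out directly. A minimal sketch; the per-attempt billed time is inferred from the $21.52/hr on-demand rate, not confirmed against RunPod's actual billing breakdown:

```python
# Rough cost accounting for the five failed pod launches (figures from this report).
starting_balance = 25.00
ending_balance = 16.35
attempts = 5
hourly_rate = 21.52  # $/hr on-demand for 8x H100 SXM, as quoted by RunPod

total_charged = starting_balance - ending_balance            # $8.65
cost_per_attempt = total_charged / attempts                  # $1.73
billed_minutes_per_attempt = cost_per_attempt / hourly_rate * 60  # roughly 4.8 min

print(f"Total charged: ${total_charged:.2f}")
print(f"Per failed attempt: ${cost_per_attempt:.2f} "
      f"(~{billed_minutes_per_attempt:.1f} min of billed time)")
```

If billing really does start at provisioning, each attempt burned roughly five minutes of billed 8xH100 time before the pod was even reachable.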

Key Issue

RunPod's own API reported stockStatus: "High" for 8xH100 SXM at the time of each launch, yet pods consistently failed to boot within 3 minutes. Billing begins during the "provisioning" phase, before the container is usable, so participants are charged for infrastructure they never actually use.

πŸ” H100 SXM x8 β†’ Stock: 🟒 High ($21.52/hr on-demand)

Despite "High" availability, no pod ever reached SSH-ready state.
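The auto-kill timeouts in the table above were enforced by a simple watchdog; a generic sketch is below. The `get_status` and `terminate` callables are hypothetical stand-ins for whatever API client you use (e.g. RunPod's pod query/terminate calls), not part of any official SDK:

```python
import time

def watch_pod(get_status, terminate, timeout_s=180, poll_s=5,
              clock=time.monotonic, sleep=time.sleep):
    """Poll a pod until it is usable or the boot timeout elapses.

    get_status: () -> str, e.g. "PROVISIONING" or "RUNNING" (assumed states)
    terminate:  () -> None, kills the pod so billing stops
    Returns True if the pod became usable, False if it was auto-killed.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_status() == "RUNNING":
            return True
        sleep(poll_s)
    terminate()  # never booted in time: kill it so charges stop accruing
    return False
```

Injecting `clock` and `sleep` keeps the watchdog testable without waiting out real timeouts; in production the `time.monotonic`/`time.sleep` defaults apply.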

Impact on Competition Fairness

  • The competition requires 8xH100 GPUs for official evaluation (10-minute wallclock constraint)
  • Individual participants with limited budgets ($25-50) can exhaust their credits just attempting to boot pods, before any training happens
  • This creates an uneven playing field where only participants with large cloud budgets or institutional backing can afford to iterate on submissions
  • OpenAI covers 100% of RunPod's evaluation costs, but individual participants testing their code bear the full risk of failed provisioning charges

Suggestions

  1. Provide official evaluation infrastructure: a shared evaluation endpoint where participants can submit their train_gpt.py and receive val_bpb results without managing cloud GPU provisioning themselves
  2. Document recommended cloud providers with reliable 8xH100 availability and fair billing (no charges during provisioning)
  3. Provide evaluation credits to active participants so failed provisioning attempts don't block participation
  4. Add a local evaluation mode: even a rough approximation on smaller hardware (e.g., a single-GPU short run) would help participants validate their code before committing to expensive 8xH100 runs
