Problem
Participants using RunPod to evaluate their submissions are being charged for GPU provisioning/boot time, even when pods never reach a usable state. This makes iterating on submissions prohibitively expensive for individual contributors.
My Experience
Over 5 pod creation attempts across two days (March 25-26, 2026), I was charged $8.65 without completing a single training run:
| Pod ID | Cloud Tier | Timeout | Result |
|---|---|---|---|
| `dh821zsbo1s1ee` | SECURE | ~15 min manual kill | Never booted |
| `vrz82l0ml9qans` | SECURE | ~15 min manual kill | Never booted |
| `tx6ibgui70rl5u` | ALL | ~15 min manual kill | Never booted |
| `owotqbnqk6el4v` | ALL | 120 s auto-kill | Never booted |
| `gesb3y7hq454zq` | ALL | 180 s auto-kill | Never booted |
- Starting balance: $25.00
- Ending balance: $16.35
- Pods that completed training: 0 out of 5
- Current spend rate: $0.00/hr (confirmed no pods running)
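For anyone else hitting this, a client-side watchdog like the auto-kill in the table above at least caps the damage from a stuck pod. This is a minimal sketch, assuming a generic poll/terminate interface: `poll_status` and `terminate` are hypothetical callables you would wire to your provider's actual API, and the status strings are placeholders, not RunPod's documented values.

```python
import time

def pod_watchdog(poll_status, terminate, timeout_s=180, interval_s=10):
    """Kill a pod that does not become ready within timeout_s.

    poll_status: callable returning "READY", "PROVISIONING", or "FAILED"
                 (hypothetical statuses; map them to your provider's API).
    terminate:   callable that destroys the pod so billing stops.
    Returns True if the pod became ready, False if it was killed.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = poll_status()
        if status == "READY":
            return True
        if status == "FAILED":
            terminate()
            return False
        time.sleep(interval_s)
    # Still "provisioning" at the deadline: terminate to stop the meter.
    terminate()
    return False
```

With a 180 s budget this bounds the worst case to roughly $1.08 per failed attempt at $21.52/hr, instead of an open-ended manual kill.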
Key Issue
RunPod's own API reports `stockStatus: "High"` for 8xH100 SXM at the time of launch, yet pods consistently fail to boot within 3 minutes. Billing begins during the "provisioning" phase, before the container is usable, meaning competitors are charged for infrastructure they never actually use.

> H100 SXM x8 → Stock: 🟢 High ($21.52/hr on-demand)
Despite "High" availability, no pod ever reached SSH-ready state.
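A pre-launch gate on the reported availability data can at least avoid launching into obviously bad conditions. This is a sketch only: the dict keys mirror the `stockStatus` field quoted above plus an assumed `costPerHr` field, and are not RunPod's documented schema; and as this issue shows, "High" stock does not guarantee a successful boot, so treat this as a filter, not a guarantee.

```python
def should_launch(gpu_info, max_hourly_usd=25.0):
    """Decide whether to attempt a pod launch from availability data.

    gpu_info is assumed to look like {"stockStatus": "High",
    "costPerHr": 21.52}; field names are an assumption, not a
    documented API contract. Returns True only when reported stock
    is High/Medium and the on-demand rate fits the budget.
    """
    if gpu_info.get("stockStatus") not in ("High", "Medium"):
        return False
    return gpu_info.get("costPerHr", float("inf")) <= max_hourly_usd
```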
Impact on Competition Fairness
- The competition requires 8xH100 GPUs for official evaluation (10-minute wallclock constraint)
- Individual participants with limited budgets ($25-50) can exhaust their credits just attempting to boot pods, before any training happens
- This creates an uneven playing field where only participants with large cloud budgets or institutional backing can afford to iterate on submissions
- OpenAI covers 100% of RunPod's evaluation costs, but individual participants testing their code bear the full risk of failed provisioning charges
Suggestions
- Provide official evaluation infrastructure: a shared evaluation endpoint where participants can submit their `train_gpt.py` and receive `val_bpb` results without managing cloud GPU provisioning themselves
- Document recommended cloud providers with reliable 8xH100 availability and fair billing (no charges during provisioning)
- Provide evaluation credits to active participants so failed provisioning attempts don't block participation
- Add a local evaluation mode: even a rough approximation on smaller hardware (e.g., a short single-GPU run) would help participants validate their code before committing to expensive 8xH100 runs
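As a stopgap for the last suggestion, participants can shrink the full config themselves before renting anything. A minimal sketch, using hypothetical config keys (`num_gpus`, `train_steps`, `batch_size`) that would need to be mapped onto whatever `train_gpt.py` actually reads; the goal is only to catch crashes and NaNs cheaply, not to predict the final `val_bpb`.

```python
def smoke_test_config(full_config, scale=0.01, min_steps=10):
    """Shrink a full 8xH100 training config for a single-GPU sanity run.

    Keys are illustrative, not from the competition harness:
    - num_gpus:    forced to 1 for a local run
    - train_steps: scaled down, with a floor so the loop still executes
    - batch_size:  per-GPU share of the original global batch
    Returns a new dict; the original config is left untouched.
    """
    cfg = dict(full_config)
    cfg["num_gpus"] = 1
    cfg["train_steps"] = max(min_steps, int(full_config["train_steps"] * scale))
    cfg["batch_size"] = max(1, full_config["batch_size"] // full_config["num_gpus"])
    return cfg
```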