A hand-curated collection of resources for Prompt Engineering and Context Engineering – covering papers, tools, models, APIs, benchmarks, courses, and communities for working with Large Language Models.
New to prompt engineering? Follow this path:
- Learn the basics – ChatGPT Prompt Engineering for Developers (free, ~90 min)
- Read the guide – Prompt Engineering Guide by DAIR.AI (open-source, comprehensive)
- Study provider docs – OpenAI Prompt Engineering Guide · Anthropic Prompt Engineering Guide
- Understand where the field is heading – Anthropic: Effective Context Engineering for AI Agents
- Read the research – The Prompt Report, a taxonomy of 58+ prompting techniques from 1,500+ papers
- Papers
- Major Surveys
- Prompt Optimization and Automatic Prompting
- Prompt Compression
- Reasoning Advances
- In-Context Learning
- Agentic Prompting and Multi-Agent Systems
- Multimodal Prompting
- Structured Output and Format Control
- Prompt Injection and Security
- Applications of Prompt Engineering
- Text-to-Image Generation
- Text-to-Music/Audio Generation
- Foundational Papers (Pre-2024)
- Tools and Code
- APIs
- Datasets and Benchmarks
- Models
- AI Content Detectors
- Books
- Courses
- Tutorials and Guides
- Videos
- Communities
- How to Contribute
## Papers
- The Prompt Report: A Systematic Survey of Prompting Techniques [2024] – The most comprehensive survey: a taxonomy of 58 text and 40 multimodal prompting techniques from 1,500+ papers. Co-authored with OpenAI, Microsoft, Google, and Stanford.
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [2024] – 44 techniques across application areas with per-task performance summaries.
- A Survey of Prompt Engineering Methods in LLMs for Different NLP Tasks [2024] – 39 prompting methods across 29 NLP tasks.
- A Survey of Automatic Prompt Engineering: An Optimization Perspective [2025] – Formalizes auto-PE methods as discrete/continuous/hybrid optimization problems.
- Efficient Prompting Methods for Large Language Models: A Survey [2024] – Survey of efficiency-oriented prompting (compression, optimization, APE) for reducing compute and latency.
- Navigate through Enigmatic Labyrinth: A Survey of Chain of Thought Reasoning [2023, ACL 2024] – Systematic CoT survey.
- Demystifying Chains, Trees, and Graphs of Thoughts [2024] – Unified framework for multi-prompt reasoning topologies.
- Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey [2024] – Focuses on prompts designed around explicit task goals.
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning LLMs [2025] – Distinguishes Long CoT from Short CoT in o1/R1-era models.
- OPRO: Large Language Models as Optimizers [2023, NeurIPS 2024] – Uses LLMs as optimizers via meta-prompts; optimized prompts outperform human-designed ones by up to 50% on BBH.
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [2023, ICLR 2024] – Framework for programming (not prompting) LLMs with automatic prompt optimization.
- MIPRO: Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs [2024, EMNLP 2024] – Bayesian optimization for multi-stage LM programs; up to 13% accuracy gains.
- TextGrad: Automatic "Differentiation" via Text [2024] – Treats compound AI systems as computation graphs with textual feedback as gradients. Published in Nature.
- EvoPrompt [2023, ACL 2024] – Evolutionary algorithm approach for automatically optimizing discrete prompts.
- Meta Prompting for AI Systems [2023, ICLR 2024 Workshop] – Example-agnostic structural templates formalized using category theory.
- Prompt Engineering a Prompt Engineer (PE²) [2024, ACL Findings] – Uses LLMs to meta-prompt themselves, refining prompts with step-by-step templates to significantly improve reasoning.
- Large Language Models Are Human-Level Prompt Engineers [2022] – Automatic prompt generation via APE.
- Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning [2023]
- SPO: Self-Supervised Prompt Optimization [2025] – Competitive performance at 1–6% of the cost of prior methods.
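The search loop behind optimizers like OPRO can be sketched in a few lines. Below is a minimal, hypothetical hill-climbing version: `score` and `mutate` are stubs standing in for dev-set evaluation and an LLM's meta-prompted rewrite, not the actual procedures from any paper above.

```python
import random

def score(prompt: str) -> float:
    """Stub for task accuracy; a real optimizer would evaluate the prompt
    on a dev set with an actual LLM. Here: certain phrases score higher."""
    p = prompt.lower()
    keywords = ("step by step", "carefully", "verify")
    return sum(kw in p for kw in keywords) / len(keywords)

def mutate(prompt: str) -> str:
    """Stub for the meta-prompted LLM that proposes a revised instruction."""
    additions = ["Think step by step.", "Answer carefully.", "Verify your result."]
    return prompt + " " + random.choice(additions)

def optimize(seed_prompt: str, iterations: int = 20) -> str:
    # Hill climbing: keep the best-scoring prompt seen so far.
    random.seed(0)
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(iterations):
        candidate = mutate(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best

best = optimize("Solve the problem.")
```

Real systems replace the stubs with an eval harness and an LLM proposer, and typically keep a population of candidates rather than a single incumbent.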
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression [2024, ACL 2024] – 3x–6x faster than LLMLingua with GPT-4 data distillation.
- LongLLMLingua [2023, ACL 2024] – Question-aware compression for long contexts; 21.4% performance boost with 4x fewer tokens.
- Prompt Compression for Large Language Models: A Survey [2024] – Comprehensive survey of hard and soft prompt compression methods.
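As a toy illustration of the idea behind hard prompt compression (not any of the surveyed methods, which rank tokens by a small LM's perplexity rather than word frequency):

```python
from collections import Counter

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Toy prompt compression: keep the rarest words (a crude proxy for
    information content), preserving their original order. Real methods
    such as LLMLingua rank tokens with a small LM's perplexity instead."""
    words = prompt.split()
    freq = Counter(w.lower() for w in words)
    budget = max(1, int(len(words) * keep_ratio))
    # Indices sorted rarest-first; the sort is stable, so ties keep document order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i].lower()])
    keep = set(ranked[:budget])
    return " ".join(w for i, w in enumerate(words) if i in keep)

text = "the model should answer the question about the French Revolution in detail"
short = compress(text)  # drops the three occurrences of "the" first
```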
- Scaling LLM Test-Time Compute Optimally [2024] – Shows optimal test-time compute allocation can outperform 14x larger models.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [2025] – Pure RL-trained reasoning model matching o1; open-source with distilled variants.
- s1: Simple Test-Time Scaling [2025] – SFT on just 1,000 examples creates a competitive reasoning model via "budget forcing."
- Reasoning Language Models: A Blueprint [2025] – Systematic framework organizing reasoning LM approaches.
- Demystifying Long Chain-of-Thought Reasoning in LLMs [2025] – Analyzes long CoT behavior in modern reasoning models.
- Graph of Thoughts: Solving Elaborate Problems with LLMs [2023, AAAI 2024] – Models thoughts as arbitrary graphs; 62% quality improvement over ToT on sorting.
- Tree of Thoughts: Deliberate Problem Solving with LLMs [2023, NeurIPS 2023] – Tree search over reasoning paths.
- Everything of Thoughts [2023] – Integrates CoT, ToT, and external solvers via MCTS.
- Skeleton-of-Thought [2023] – Parallel decoding via answer skeleton generation for up to 2.69x speedup.
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [2022] – The foundational CoT paper.
- Self-Consistency Improves Chain of Thought Reasoning [2022] – Aggregating multiple CoT outputs for reliability.
- Large Language Models are Zero-Shot Reasoners [2022] – "Let's think step by step" as a zero-shot reasoning trigger.
- ReAct: Synergizing Reasoning and Acting in Language Models [2022] – Interleaving reasoning and tool use.
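Self-consistency, listed above, is straightforward to implement once you can sample multiple chains of thought: extract each final answer and take a majority vote. A minimal sketch with stubbed generations standing in for temperature-sampled LLM outputs:

```python
from collections import Counter

def self_consistency(samples: list[str]) -> str:
    """Majority vote over final answers extracted from sampled CoT outputs.
    Assumes each generation ends with an 'Answer:' line."""
    answers = [s.rsplit("Answer:", 1)[-1].strip() for s in samples]
    return Counter(answers).most_common(1)[0][0]

# Three stubbed chain-of-thought generations for the same question.
samples = [
    "15 + 27 = 42. Answer: 42",
    "First add 15 and 27 to get 42. Answer: 42",
    "15 + 27 = 41. Answer: 41",   # one faulty reasoning path
]
result = self_consistency(samples)  # the majority answer wins
```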
- Many-Shot In-Context Learning [2024, NeurIPS 2024 Spotlight] – Significant gains from scaling ICL to hundreds or thousands of examples; introduces Reinforced and Unsupervised ICL.
- Many-Shot In-Context Learning in Multimodal Foundation Models [2024] – Scales multimodal ICL to ~2,000 examples across 14 datasets.
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [2022]
- Fantastically Ordered Prompts and Where to Find Them [2021] – Overcoming few-shot prompt order sensitivity.
- Calibrate Before Use: Improving Few-Shot Performance of Language Models [2021]
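A few-shot prompt is just demonstrations concatenated ahead of the query; the papers above show accuracy is sensitive to example choice and ordering. A minimal builder (the `Input:`/`Output:` format is illustrative, not from any specific paper):

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble a few-shot prompt. Since ordering matters (see
    'Fantastically Ordered Prompts'), callers may want to evaluate
    several permutations of `examples`."""
    blocks = [instruction]
    for x, y in examples:
        blocks.append(f"Input: {x}\nOutput: {y}")
    blocks.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved it", "positive"), ("Terrible service", "negative")],
    "The food was great",
)
```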
- Agentic Large Language Models: A Survey [2025] – Comprehensive survey organizing agentic LLMs by reasoning, acting, and interacting capabilities.
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges [2024] – Covers profiling, communication, and growth mechanisms.
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs [2025] – Reviews debate and cooperation strategies in LLM-based multi-agent systems.
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation [2023] – Microsoft's foundational multi-agent framework paper.
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs [2023, ICLR 2024] – Trains LLMs to use massive real-world API collections.
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [2023, ICLR 2024] – The benchmark driving agentic coding progress.
- AgentBench: Evaluating LLMs as Agents [2023, ICLR 2024] – Benchmark across 8 environments.
- PAL: Program-aided Language Models [2023] – Offloading computation to code interpreters.
- Visual Prompting in Multimodal Large Language Models: A Survey [2024] – First comprehensive survey on visual prompting methods in MLLMs.
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V [2023] – Visual markers dramatically improve visual grounding.
- A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks [2024] – Covers text, image, video, and audio MLLMs.
- Multimodal Chain-of-Thought Reasoning in Language Models [2023]
- From Prompt Engineering to Prompt Craft [2024] – Design-research view of prompt "craft" for diffusion models.
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of LLMs [2024] – Examines how constraining outputs to structured formats impacts reasoning performance.
- Batch Prompting: Efficient Inference with LLM APIs [2023]
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples [2022]
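A common pattern in practice, motivated by the format-restriction findings above, is validate-and-retry: parse the model's reply and re-prompt on failure. A sketch with a stubbed `generate` function; production systems would instead use provider-side structured outputs or constrained decoding:

```python
import json

def generate(prompt: str, attempt: int) -> str:
    """Stub LLM: fails to emit valid JSON on the first attempt."""
    if attempt == 0:
        return "Sure! Here is the JSON: {'name': 'Ada'}"  # invalid: single quotes
    return '{"name": "Ada", "year": 1815}'

def get_json(prompt: str, max_attempts: int = 3) -> dict:
    """Ask for JSON, validate with json.loads, and retry on parse failure."""
    for attempt in range(max_attempts):
        raw = generate(prompt, attempt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Tighten the instruction before retrying (the stub ignores this).
            prompt += "\nReturn ONLY valid JSON, no prose."
    raise ValueError("model never produced valid JSON")

record = get_json("Describe Ada Lovelace as JSON with keys name and year.")
```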
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses [2023, USENIX Security 2024] – Formal framework with systematic evaluation of 5 attacks and 10 defenses across 10 LLMs.
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions [2024] – OpenAI's priority-level training for injection defense.
- AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses [2024] – Realistic agent scenario benchmark.
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents [2024]
- SecAlign: Defending Against Prompt Injection with Preference Optimization [2024] – DPO-based defense.
- WASP: Benchmarking Web Agent Security Against Prompt Injection [2025] – Security benchmark for web/computer-use agents.
- Many-Shot Jailbreaking [2024] – Scaling harmful examples in long-context windows enables jailbreaking (Anthropic Technical Report).
- Constitutional AI: Harmlessness from AI Feedback [2022]
- Ignore Previous Prompt: Attack Techniques For Language Models [2022]
- Artificial Intelligence and Cybersecurity: Documented Risks, Enterprise Guardrails, and Emerging Threats in 2024–2025 [2025] – Survey of real prompt-injection incidents with practical governance prompt patterns.
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [2023]
- Legal Prompt Engineering for Multilingual Legal Judgement Prediction [2023]
- Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems [2022]
- Commonsense-Aware Prompting for Controllable Empathetic Dialogue Generation [2023]
- PLACES: Prompting Language Models for Social Conversation Synthesis [2023]
- Medical Image Segmentation Using Transformer Encoders and Prompt-Based Learning: A Systematic Review [2025]
- TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning [2025] – SQL-based interface preserving tabular structure for multi-hop queries.
- A Taxonomy of Prompt Modifiers for Text-To-Image Generation [2022]
- Design Guidelines for Prompt Engineering Text-to-Image Generative Models [2021]
- High-Resolution Image Synthesis with Latent Diffusion Models [2021]
- DALL·E: Creating Images from Text [2021]
- Investigating Prompt Engineering in Diffusion Models [2022]
- MusicLM: Generating Music From Text [2023]
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [2023]
- AudioLM: A Language Modeling Approach to Audio Generation [2023]
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [2023]
These papers established the core concepts that modern prompt engineering builds on:
- Language Models are Few-Shot Learners (GPT-3) [2020] – Demonstrated few-shot prompting at scale.
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [2021]
- The Power of Scale for Parameter-Efficient Prompt Tuning [2021]
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [2021]
- Show Your Work: Scratchpads for Intermediate Computation with Language Models [2021]
- Generated Knowledge Prompting for Commonsense Reasoning [2021]
- Making Pre-trained Language Models Better Few-shot Learners [2021]
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [2020]
- How Can We Know What Language Models Know? [2020]
- A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT [2023]
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for LLMs [2023]
- Progressive Prompts: Continual Learning for Language Models [2023]
- Successive Prompting for Decomposing Complex Questions [2022]
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [2022]
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming [2022]
- Ask Me Anything: A Simple Strategy for Prompting Language Models [2022]
- Prompting GPT-3 To Be Reliable [2022]
- On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning [2022]
## Tools and Code
| Name | Description | Link |
|---|---|---|
| Promptfoo | Open-source CLI for testing, evaluating, and red-teaming LLM prompts. YAML configs, CI/CD integration, adversarial testing. ~9K+ ★ | GitHub |
| Promptify | Generate prompts for common NLP tasks with popular generative models such as GPT and PaLM. | GitHub |
| Agenta | Open-source LLM developer platform for prompt management, evaluation, human feedback, and deployment. | GitHub |
| PromptLayer | Version, test, and monitor every prompt and agent with robust evals, tracing, and regression sets. | Website |
| Helicone | Production prompt monitoring and optimization platform. | Website |
| LangGPT | Framework for structured and meta-prompt design. 10K+ ★ | GitHub |
| ChainForge | Visual toolkit for building, testing, and comparing LLM prompt responses without code. | GitHub |
| LMQL | A query language for LLMs making complex prompt logic programmable. | GitHub |
| Promptotype | Platform for developing, testing, and managing structured LLM prompts. | Website |
| PromptPanda | AI-powered prompt management system for streamlining prompt workflows. | Website |
| Promptimize AI | Browser extension to automatically improve user prompts for any AI model. | Website |
| PROMPTMETHEUS | Web-based "Prompt Engineering IDE" for iteratively creating and running prompts. | Website |
| Better Prompt | Test suite for LLM prompts before pushing to production. | GitHub |
| OpenPrompt | Open-source framework for prompt-learning research. | GitHub |
| Prompt Source | Toolkit for creating, sharing, and using natural language prompts. | GitHub |
| Prompt Engine | NPM utility library for creating and maintaining prompts for LLMs (Microsoft). | GitHub |
| PromptInject | Framework for quantitative analysis of LLM robustness to adversarial prompt attacks. | GitHub |
| Name | Description | Link |
|---|---|---|
| DeepEval | Open-source evaluation framework covering RAG, agents, and conversations with CI/CD integration. ~7K+ ★ | GitHub |
| Ragas | RAG evaluation with knowledge-graph-based test set generation and 30+ metrics. ~8K+ ★ | GitHub |
| LangSmith | LangChain's platform for debugging, testing, evaluating, and monitoring LLM applications. | Website |
| Langfuse | Open-source LLM observability with tracing, prompt management, and human annotation. ~7K+ ★ | GitHub |
| Braintrust | End-to-end AI evaluation platform, SOC2 Type II certified. | Website |
| Arize AI / Phoenix | Real-time LLM monitoring with drift detection and tracing. | GitHub |
| TruLens | Evaluating and explaining LLM apps; tracks hallucinations, relevance, groundedness. | GitHub |
| InspectAI | Purpose-built for evaluating agents against benchmarks (UK AISI). | GitHub |
| Opik | Evaluate, test, and ship LLM applications across dev and production lifecycles. | GitHub |
| Name | Description | Link |
|---|---|---|
| LangChain / LangGraph | Most widely adopted LLM app framework; LangGraph adds graph-based multi-step agent workflows. ~100K+ / ~10K+ ★ | GitHub · LangGraph |
| CrewAI | Role-playing AI agent orchestration with 700+ integrations. ~44K+ ★ | GitHub |
| AutoGen (AG2) | Microsoft's multi-agent conversational framework. ~40K+ ★ | GitHub |
| DSPy | Stanford's framework for programming LLMs with automatic prompt/weight optimization. ~22K+ ★ | GitHub |
| OpenAI Agents SDK | Official agent framework with function calling, guardrails, and handoffs. ~10K+ ★ | GitHub |
| Semantic Kernel | Microsoft's AI framework powering M365 Copilot; C#, Python, Java. ~24K+ ★ | GitHub |
| LlamaIndex | Data framework for RAG and agent capabilities. ~40K+ ★ | GitHub |
| Haystack | Open-source NLP framework with pipeline architecture for RAG and agents. ~20K+ ★ | GitHub |
| Agno (formerly Phidata) | Python agent framework with microsecond instantiation. ~20K+ ★ | GitHub |
| Smolagents | Hugging Face's minimalist code-centric agent framework (~1000 LOC). ~15K+ ★ | GitHub |
| Pydantic AI | Type-safe agent framework using Pydantic for structured validation. ~8K+ ★ | GitHub |
| Mastra | TypeScript AI agent framework with assistants, RAG, and observability. ~20K+ ★ | GitHub |
| Google ADK | Agent Development Kit deeply integrated with Gemini and Google Cloud. | GitHub |
| Strands Agents (AWS) | Model-agnostic framework with deep AWS integrations. | GitHub |
| Langflow | Node-based visual agent builder with drag-and-drop. ~50K+ ★ | GitHub |
| n8n | Workflow automation with AI agent capabilities and 400+ integrations. ~60K+ ★ | GitHub |
| Dify | All-in-one backend for agentic workflows with tool-using agents and RAG. | GitHub |
| PraisonAI | Multi-AI Agents framework with 100+ LLM support, MCP integration, and built-in memory. | GitHub |
| Neurolink | Multi-provider AI agent framework unifying 12+ providers with workflow orchestration. | GitHub |
| Composio | Connect 100+ tools to AI agents with zero setup. | GitHub |
| Name | Description | Link |
|---|---|---|
| DSPy | Multiple optimizers (MIPROv2, BootstrapFewShot, COPRO) for automatic prompt tuning. ~22K+ ★ | GitHub |
| TextGrad | Automatic differentiation via text (Stanford). ~2K+ ★ | GitHub |
| OPRO | Google DeepMind's optimization by prompting. | GitHub |
| Name | Description | Link |
|---|---|---|
| Garak (NVIDIA) | LLM vulnerability scanner for hallucination, injection, and jailbreaks – the "nmap for LLMs." ~3K+ ★ | GitHub |
| PyRIT (Microsoft) | Python Risk Identification Tool for automated red-teaming. ~3K+ ★ | GitHub |
| DeepTeam | 40+ vulnerabilities, 10+ attack methods, OWASP Top 10 support. | GitHub |
| LLM Guard | Security toolkit for LLM I/O validation. ~2K+ ★ | GitHub |
| NeMo Guardrails (NVIDIA) | Programmable guardrails for conversational systems. ~5K+ ★ | GitHub |
| Guardrails AI | Define strict output formats (JSON schemas) to ensure system reliability. | Website |
| Lakera | AI security platform for real-time prompt injection detection. | Website |
| Purple Llama (Meta) | Open-source LLM safety evaluation including CyberSecEval. | GitHub |
| GPTFuzz | Automated jailbreak template generation achieving >90% success rates. | GitHub |
| Rebuff | Open-source tool for detection and prevention of prompt injection. | GitHub |
MCP is an open standard developed by Anthropic (Nov 2024, donated to Linux Foundation Dec 2025) for connecting AI assistants to external data sources and tools through a standardized interface. It has 97M+ monthly SDK downloads and has been adopted by GitHub, Google, and most major AI providers.
| Name | Description | Link |
|---|---|---|
| MCP Specification | The core protocol specification and SDKs. ~15K+ ★ | GitHub |
| MCP Reference Servers | Official implementations: fetch, filesystem, GitHub, Slack, Postgres. | GitHub |
| FastMCP (Python) | High-level Pythonic framework for building MCP servers. ~5K+ ★ | GitHub |
| GitHub MCP Server | GitHub's official MCP server for repo, issue, PR, and Actions interaction. ~15K+ ★ | GitHub |
| Awesome MCP Servers | Curated list of 10,000+ community MCP servers. ~30K+ ★ | GitHub |
| Context7 | MCP server providing version-specific documentation to reduce code hallucination. | GitHub |
| GitMCP | Creates remote MCP servers for any GitHub repo by changing the domain. | Website |
| MCP Inspector | Visual testing tool for MCP server development. | GitHub |
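Under the hood, MCP messages are JSON-RPC 2.0. A sketch of constructing a `tools/call` request with only the standard library; real clients use an MCP SDK and a stdio or HTTP transport, and the `fetch` tool name here mirrors the reference server listed above:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request. MCP messages follow JSON-RPC 2.0:
    a method name plus a params object naming the tool and its arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = mcp_tool_call(1, "fetch", {"url": "https://example.com"})
parsed = json.loads(msg)  # a server would route this to the named tool
```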
| Name | Description | Link |
|---|---|---|
| Claude Code | Anthropic's command-line AI coding tool; widely considered one of the best AI coding assistants (2026). | Docs |
| Cursor | AI-native code editor; Composer feature generates entire applications from natural language. | Website |
| Windsurf (Codeium) | "First agentic IDE" with multi-file editing and project-wide context. | Website |
| GitHub Copilot | AI pair programmer; ~30% of new GitHub code comes from Copilot. | Website |
| Aider | Open-source terminal AI pair programmer with Git integration. ~25K+ ★ | GitHub |
| Cline | Open-source VS Code AI assistant connecting editor and terminal through MCP. ~20K+ ★ | GitHub |
| Continue | Open-source IDE extensions for custom AI code assistants. ~22K+ ★ | GitHub |
| OpenAI Codex CLI | Lightweight terminal coding agent. | GitHub |
| Gemini CLI | Google's open-source terminal AI agent. | GitHub |
| Bolt.new | Browser-based prompt-to-app generation with one-click deployment. | Website |
| Lovable | Full-stack apps from natural language descriptions. | Website |
| v0 (Vercel) | AI assistant for building Next.js frontend components from text. | Website |
| Firebase Studio | Google's agentic cloud-based development environment. | Website |
| Name | Description | Link |
|---|---|---|
| Prompt Engineering Guide (DAIR.AI) | The definitive open-source guide and resource hub. 3M+ learners. ~55K+ ★ | GitHub |
| Awesome ChatGPT Prompts / Prompts.chat | World's largest open-source prompt library. 1000s of prompts for all major models. | GitHub |
| 12-Factor Agents | Principles for building production-grade LLM-powered software. ~17K+ ★ | GitHub |
| NirDiamant/Prompt_Engineering | 22 hands-on Jupyter Notebook tutorials. ~3K+ ★ | GitHub |
| Context Engineering Repository | First-principles handbook for moving beyond prompt engineering to context design. | GitHub |
| AI Agent System Prompts Library | Collection of system prompts from production AI coding agents (Claude Code, Gemini CLI, Cline, Aider, Roo Code). | GitHub |
| Awesome Vibe Coding | Curated list of 245+ tools and resources for building software through natural language prompts. | GitHub |
| OpenAI Cookbook | Official recipes for prompts, tools, RAG, and evaluations. | GitHub |
| Embedchain | Framework to create ChatGPT-like bots over your dataset. | GitHub |
| ThoughtSource | Framework for the science of machine thinking. | GitHub |
| Promptext | Extracts and formats code context for AI prompts with token counting. | GitHub |
| Price Per Token | Compare LLM API pricing across 200+ models. | Website |
## APIs
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| GPT-5.2 / 5.2 Thinking | 400K | $1.75 / $14 | Latest flagship, 90% cached discount, configurable reasoning |
| GPT-5.1 | 400K | $1.25 / $10 | Previous generation flagship |
| GPT-4.1 / 4.1 mini / nano | 1M | $2 / $8 | Best non-reasoning model, 40% faster and 80% cheaper than GPT-4o |
| o3 / o3-pro | 200K | Varies | Reasoning models with native tool use |
| o4-mini | 200K | Cost-efficient | Fast reasoning, best on AIME at its cost class |
| GPT-OSS-120B / 20B | 128K | $0.03 / $0.30 | First open-weight models, Apache 2.0 |
Key features: Responses API, Agents SDK, Structured Outputs, function calling, prompt caching (90% discount), Batch API (50% discount), MCP support. Platform Docs
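As a worked example of the pricing arithmetic, a small cost helper using the GPT-5.2 list rates and the 90% cached-input discount from the table above (prices change; treat this as illustrative, not authoritative):

```python
def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 in_rate: float = 1.75, out_rate: float = 14.0,
                 cache_discount: float = 0.90) -> float:
    """Cost in USD for one request, with rates in $ per 1M tokens.
    Cached input tokens are billed at (1 - cache_discount) of the input rate."""
    fresh = input_tokens - cached_tokens
    cost_in = (fresh * in_rate + cached_tokens * in_rate * (1 - cache_discount)) / 1e6
    cost_out = output_tokens * out_rate / 1e6
    return cost_in + cost_out

# 100K-token prompt, 80K of it served from cache, 2K-token reply:
cost = request_cost(100_000, 80_000, 2_000)  # ≈ $0.049 input + $0.028 output
```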
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Claude Opus 4.6 | 1M (beta) | $5 / $25 | Most powerful, state-of-the-art coding and agentic tasks |
| Claude Sonnet 4.5 | 200K | $3 / $15 | Best coding model, 61.4% OSWorld (computer use) |
| Claude Haiku 4.5 | 200K | Fast tier | Near-frontier, fastest model class |
| Claude Opus 4 / Sonnet 4 | 200K | $15/$75 (Opus) | Opus: 72.5% SWE-bench, Sonnet 4 powers GitHub Copilot |
Key features: Extended Thinking with tool use, Computer Use, MCP (originated here), prompt caching, Claude Code CLI, available on AWS Bedrock and Google Vertex AI. API Docs
| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Gemini 3 Pro Preview | 1M | $2 / $12 | Most intelligent Google model, deployed to 2B+ Search users |
| Gemini 2.5 Pro | 1M | $1.25 / $10 | Best for coding/agentic tasks, thinking model |
| Gemini 2.5 Flash / Flash-Lite | 1M | $0.30/$1.50 Β· $0.10/$0.40 | Price-performance leaders |
Key features: Thinking (all 2.5+ models), Google Search grounding, code execution, Live API (real-time audio/video), context caching. Google AI Studio
| Model | Architecture | Context | Key Feature |
|---|---|---|---|
| Llama 4 Scout | 109B MoE / 17B active | 10M | Fits single H100, multimodal, open-weight |
| Llama 4 Maverick | 400B MoE / 17B active, 128 experts | 1M | Beats GPT-4o, open-weight |
| Llama 3.3 70B | Dense | 128K | Matches Llama 3.1 405B |
Available on 25+ cloud partners, Hugging Face, and inference APIs. Llama
| Provider | Description | Link |
|---|---|---|
| Mistral AI | Mistral Large 3 (675B MoE), Devstral 2, Ministral 3. Apache 2.0. | Website |
| DeepSeek | V3.2 (671B MoE), R1 (reasoning, MIT license). $0.15/$0.75 per 1M tokens. | Website |
| xAI (Grok) | Grok 4.1 Fast: 2M context, $0.20/$0.50 per 1M tokens. | Website |
| Cohere | Command A (111B, 256K context), Embed v4, Rerank 4.0. Excels at RAG. | Website |
| Together AI | 200+ open models with sub-100ms latency. | Website |
| Groq | LPU hardware with ~300+ tokens/sec inference. | Website |
| Fireworks AI | Fast inference with HIPAA + SOC2 compliance. | Website |
| OpenRouter | Unified API for 300+ models from all providers. | Website |
| Cerebras | Wafer-scale chips with best total response time. | Website |
| Perplexity AI | Search-augmented API with citations. | Website |
| Amazon Bedrock | Managed multi-model service with Claude, Llama, Mistral, Cohere. | Website |
| Hugging Face Inference | Access to open models via API. | Website |
## Datasets and Benchmarks
| Name | Description | Link |
|---|---|---|
| Chatbot Arena / LM Arena | 6M+ user votes for Elo-rated pairwise LLM comparisons. De facto standard for human preference. | Website |
| MMLU-Pro | 12,000+ graduate-level questions across 14 domains. NeurIPS 2024 Spotlight. | GitHub |
| GPQA | 448 "Google-proof" STEM questions; non-expert validators achieve only 34%. | arXiv |
| SWE-bench Verified | Human-validated 500-task subset for real-world GitHub issue resolution. | Website |
| SWE-bench Pro | 1,865 tasks across 41 professional repos; best models score only ~23%. | Leaderboard |
| Humanity's Last Exam (HLE) | 2,500 expert-vetted questions; top AI scores only ~10–30%. | Website |
| BigCodeBench | 1,140 coding tasks across 7 domains; AI achieves ~35.5% vs. 97% human success. | Leaderboard |
| LiveBench | Contamination-resistant with frequently updated questions. | Paper |
| FrontierMath | Research-level math; AI solves only ~2% of problems. | Research |
| ARC-AGI v2 | Abstract reasoning measuring fluid intelligence. | Research |
| IFEval | Instruction-following evaluation with formatting/content constraints. | arXiv |
| MLE-bench | OpenAI's ML engineering evaluation via Kaggle-style tasks. | GitHub |
| PaperBench | Evaluates AI's ability to replicate 20 ICML 2024 papers from scratch. | GitHub |
| Name | Description | Link |
|---|---|---|
| Hugging Face Open LLM Leaderboard v2 | Evaluates open models on MMLU-Pro, GPQA, IFEval, MATH. | Leaderboard |
| Artificial Analysis Intelligence Index v3 | Aggregates 10 evaluations. | Website |
| SEAL by Scale AI | Hosts SWE-bench Pro and agentic evaluations. | Leaderboard |
| Name | Description | Link |
|---|---|---|
| P3 (Public Pool of Prompts) | Prompt templates for 270+ NLP tasks used to train T0 and similar models. | HuggingFace |
| System Prompts Dataset | 944 system prompt templates for agent workflows (by Daniel Rosehill, Aug 2025). | HuggingFace |
| OpenAssistant Conversations (OASST) | 161,443 messages in 35 languages with 461,292 quality ratings. | HuggingFace |
| UltraChat / UltraFeedback | Large-scale synthetic instruction and preference datasets for alignment training. | HuggingFace |
| SoftAge Prompt Engineering Dataset | 1,000 diverse prompts across 10 categories for benchmarking prompt performance. | HuggingFace |
| Text Transformation Prompt Library | Comprehensive collection of text transformation prompts (May 2025). | HuggingFace |
| Writing Prompts | ~300K human-written stories paired with prompts from r/WritingPrompts. | Kaggle |
| Midjourney Prompts | Text prompts and image URLs scraped from MidJourney's public Discord. | HuggingFace |
| CodeAlpaca-20k | 20,000 programming instruction-output pairs. | HuggingFace |
| ProPEX-RAG | Dataset for prompt optimization in RAG workflows. | HuggingFace |
| NanoBanana Trending Prompts | 1,000+ curated AI image prompts from X/Twitter, ranked by engagement. | GitHub |
| Name | Description | Link |
|---|---|---|
| HarmBench | 510 harmful behaviors across standard, contextual, copyright, and multimodal categories. | Website |
| JailbreakBench | Open robustness benchmark for jailbreaking with 100 prompts. | Research |
| AgentHarm | 110 malicious agent tasks across 11 harm categories. | arXiv |
| DecodingTrust | 243,877 prompts evaluating trustworthiness across 8 perspectives. | Research |
| SafetyPrompts.com | Aggregator tracking 50+ safety/red-teaming datasets. | Website |
## Models
| Model | Provider | Context | Key Strength |
|---|---|---|---|
| GPT-5.2 | OpenAI | 400K | General intelligence, 100% AIME 2025 |
| Claude Opus 4.6 | Anthropic | 1M (beta) | Coding, agentic tasks, extended thinking |
| Gemini 3 Pro | Google | 1M | #1 LMArena (~1500 Elo), multimodal |
| Grok 4.1 | xAI | 2M | #2 LMArena (1483 Elo), low hallucination |
| Mistral Large 3 | Mistral AI | 256K | Best open-weight (675B MoE/41B active), Apache 2.0 |
| DeepSeek-V3.2 | DeepSeek | 128K | Best value (671B MoE/37B active), MIT license |
| Llama 4 Maverick | Meta | 1M | Beats GPT-4o (400B MoE/17B active), open-weight |
| Model | Key Detail |
|---|---|
| OpenAI o3 / o3-pro | 87.7% GPQA Diamond. Native tool use. |
| OpenAI o4-mini | Best AIME at its cost class with visual reasoning. |
| DeepSeek-R1 / R1-0528 | Open-weight, RL-trained. 87.5% on AIME 2025. MIT license. |
| QwQ (Qwen with Questions) | 32B reasoning model. Apache 2.0. Comparable to R1. |
| Gemini 2.5 Pro/Flash (Thinking) | Built-in reasoning with configurable thinking budget. |
| Claude Extended Thinking | Hybrid mode with visible chain-of-thought and tool use. |
| Phi-4 Reasoning / Plus | 14B reasoning models rivaling much larger models. Open-weight. |
| GPT-OSS-120B | OpenAI's open-weight with CoT. Near-parity with o4-mini. Apache 2.0. |
| Model | Provider | Key Detail |
|---|---|---|
| Qwen3-235B-A22B | Alibaba | Flagship MoE. Strong reasoning/code/multilingual. Apache 2.0. Most downloaded family on HuggingFace. |
| Gemma 3 | Google | 270M to 27B. Multimodal. 128K context. 140+ languages. |
| OLMo 2/3 | Allen AI | Fully open (data, code, weights, logs). OLMo 2 32B surpasses GPT-3.5. Apache 2.0. |
| SmolLM3-3B | Hugging Face | Outperforms Llama-3.2-3B. Dual-mode reasoning. 128K context. |
| Kimi K2 | Moonshot AI | 32B active. Open-weight. Tailored for coding/agentic use. |
| Llama 4 Scout | Meta | 109B MoE/17B active. 10M token context. Fits single H100. |
| Model | Key Detail |
|---|---|
| Qwen3-Coder (480B-A35B) | 69.6% SWE-bench – a milestone for open-source coding. 256K context. Apache 2.0. |
| Devstral 2 (123B) | 72.2% SWE-bench Verified. 7x more cost-efficient than Claude Sonnet. |
| Codestral 25.01 | Mistral's code model. 80+ languages. Fill-in-the-Middle support. |
| DeepSeek-Coder-V2 | 236B MoE / 21B active. 338 programming languages. |
| Qwen 2.5-Coder | 7B/32B. 92 programming languages. 88.4% HumanEval. Apache 2.0. |
These models established key concepts but are largely superseded for practical use:
| Model | Provider | Significance |
|---|---|---|
| BLOOM 176B | BigScience | First major open multilingual LLM (2022) |
| GLM-130B | Tsinghua | Open bilingual English/Chinese LLM (2023) |
| Falcon 180B | TII | Large open generative model (2023) |
| Mixtral 8x7B | Mistral AI | Pioneered MoE architecture for open models (2023) |
| GPT-NeoX-20B | EleutherAI | Early open autoregressive LLM |
| GPT-J-6B | EleutherAI | Early open causal language model |
## AI Content Detectors

### Commercial Detectors
| Name | Accuracy | Key Feature | Link |
|---|---|---|---|
| GPTZero | 99% claimed | 10M+ users, #1 on G2 (2025). Detects GPT-4/5, Gemini, Claude, Llama. Free tier available. | Website |
| Originality.ai | 98–100% (peer-reviewed) | Consistently rated most accurate. Combines AI detection + plagiarism + fact checking. From $14.95/month. | Website |
| Turnitin AI Detection | 98%+ on unmodified AI text | Dominant in academia. Launched AI bypasser/humanizer detection (Aug 2025). Institutional licensing. | Website |
| Copyleaks | 99%+ claimed | Enterprise tool detecting AI in 30+ languages. LMS integrations. | Website |
| Winston AI | 99.98% claimed | OCR for scanned documents, AI image/deepfake detection. 11 languages. | Website |
| Pangram Labs | 99.3% (COLING 2025) | Highest score in COLING 2025 Shared Task. 100% TPR on "humanized" text. 97.7% adversarial robustness. | Website |
### Free and Open Detectors

| Name | Description | Link |
|---|---|---|
| Binoculars | Open-source research detector using cross-perplexity between two LLMs. | arXiv |
| DetectGPT / Fast-DetectGPT | Statistical method comparing log-probabilities of original text vs. perturbations. | arXiv |
| OpenAI Detector | Python wrapper for OpenAI's AI text classifier. | GitHub |
| Sapling AI Detector | Free browser-based detector (up to 2,000 chars). 97% accuracy in some studies. | Website |
| QuillBot AI Detector | Free, no sign-up required. | Website |
| Writer AI Content Detector | Free tool with color-coded results. | Website |
| ZeroGPT | Popular free detector evaluated in multiple academic studies. | Website |
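The DetectGPT entry above relies on a simple statistical idea: machine-generated text tends to sit near a local maximum of the model's log-probability, so perturbing it lowers the score more than perturbing human text does. A minimal sketch of that curvature score, with `toy_log_prob` and `toy_perturb` as hypothetical stand-ins for a real scoring LLM and a mask-filling perturbation model:

```python
import random

def detect_gpt_score(log_prob, perturb, text, n_perturbations=20):
    """DetectGPT-style curvature score: log-probability of the original
    text minus the mean log-probability of lightly perturbed rewrites.
    Larger (positive) scores suggest machine-generated text."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)

# Toy stand-ins: a real detector would use an LLM's token log-probs and
# a mask-filling model (e.g. T5) to produce the perturbations.
rng = random.Random(0)
toy_log_prob = lambda t: -0.1 * len(t)                      # shorter = "more probable"
toy_perturb = lambda t: t + " " + "x" * rng.randint(1, 5)   # degrade the text slightly

score = detect_gpt_score(toy_log_prob, toy_perturb, "some candidate text")
```

In the real method the score is thresholded (or normalized by the perturbations' standard deviation) to classify a passage; Fast-DetectGPT replaces the perturbation step with a sampling-based estimate for a large speedup.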
### Watermarking

| Name | Description | Link |
|---|---|---|
| SynthID (Google DeepMind) | Watermarking for AI text, images, and audio via statistical token sampling. Deployed in Google products. | Website |
| OpenAI Text Watermarking | Developed but still experimental as of 2025. Research shows fragility concerns. | Experimental |
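SynthID's exact algorithm is not public, but the statistical idea behind token-sampling watermarks can be sketched with a "green list" scheme in the style of Kirchenbauer et al.: the generator biases sampling toward a pseudorandom subset of the vocabulary keyed on preceding tokens, and the detector checks how often tokens land in that subset. A toy, illustration-only version (`green_fraction` and `in_green_list` are hypothetical names, not any provider's API):

```python
import hashlib

def in_green_list(prev_token, token, green_ratio=0.5):
    """Pseudorandom 'green list' membership test keyed on the previous
    token: hash the (prev, current) pair and compare against the ratio."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32 < green_ratio

def green_fraction(token_ids, green_ratio=0.5):
    """Detector side: fraction of tokens falling in the green list.
    Unwatermarked text hovers near green_ratio; watermarked text,
    whose generator favored green tokens, scores well above it."""
    pairs = list(zip(token_ids, token_ids[1:]))
    if not pairs:
        return 0.0
    hits = sum(in_green_list(p, c, green_ratio) for p, c in pairs)
    return hits / len(pairs)
```

A real detector turns this fraction into a z-score against the null hypothesis that the fraction equals `green_ratio`, and flags text only when it exceeds a significance threshold over a sufficiently long passage.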
Important caveat: no detector is 100% accurate. Mixed human/AI text remains the hardest to detect (50–70% accuracy), and adversarial robustness varies widely. The AI detection market is projected to grow from ~$2.3B (2025) to $15B by 2035.
## Books

### Prompt Engineering
| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Prompt Engineering for LLMs | John Berryman & Albert Ziegler | O'Reilly | 2024 |
| Prompt Engineering for Generative AI | James Phoenix & Mike Taylor | O'Reilly | 2024 |
| Prompt Engineering for LLMs | Thomas R. Caldwell | Independent | 2025 |
### LLM Engineering

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| AI Engineering: Building Applications with Foundation Models | Chip Huyen | O'Reilly | 2025 |
| Build a Large Language Model (From Scratch) | Sebastian Raschka | Manning | 2024 |
| Building LLMs for Production | Louis-François Bouchard & Louie Peters | O'Reilly | 2024 |
| LLM Engineer's Handbook | Paul Iusztin & Maxime Labonne | Packt | 2024 |
| The Hundred-Page Language Models Book | Andriy Burkov | Self-Published | 2025 |
### AI Agents

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Building Applications with AI Agents | Michael Albada | O'Reilly | 2025 |
| AI Agents and Applications | Roberto Infante | Manning | 2025 |
| AI Agents in Action | Micheal Lanham | Manning | 2025 |
### Production and Security

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| LLMs in Production | Christopher Brousseau & Matthew Sharp | Manning | 2025 |
| Building Reliable AI Systems | Rush Shahani | Manning | 2025 |
| The Developer's Playbook for LLM Security | Steve Wilson | O'Reilly | 2024 |
## Courses
- ChatGPT Prompt Engineering for Developers – Co-taught by Andrew Ng and OpenAI's Isa Fulford. The foundational starting point. (DeepLearning.AI)
- Building Systems with the ChatGPT API – Multi-step LLM system design for production. (DeepLearning.AI)
- AI Agents in LangGraph – Agentic dataflows with tool use and research agents. (DeepLearning.AI)
- Building Agentic RAG with LlamaIndex – RAG research agent construction. (DeepLearning.AI)
- Functions, Tools and Agents with LangChain – Function calling and agent building. (DeepLearning.AI)
- Prompt Engineering for Vision Models – Visual prompting techniques. (DeepLearning.AI)
- Prompt Engineering Specialization (Vanderbilt) – 3-course series by Dr. Jules White covering foundational to advanced PE. (Coursera)
- Generative AI with LLMs (DeepLearning.AI + AWS) – LLM lifecycle, transformers, RLHF, deployment. (Coursera)
- Stanford CS336: Language Modeling from Scratch – Build an LLM end-to-end. (Stanford, 2024–2026)
- MIT 6.S191: Introduction to Deep Learning – Annual course including LLMs and generative AI. (MIT, 2024–2026)
- The Complete Prompt Engineering for AI Bootcamp – Covers GPT-5, DSPy, LangGraph, agent architectures. 58K+ ratings. (Udemy, updated Feb 2026)
- Google Prompting Essentials – 5-step prompt design, meta-prompting, Gemini. Under 6 hours.
- Microsoft Azure AI Fundamentals: Generative AI – Free learning path covering LLMs, prompts, agents, Azure OpenAI.
- Hugging Face LLM Course – Community-driven course covering transformers, fine-tuning, building reasoning models.
- Hugging Face AI Agents Course – Agent theory to practice. 100K+ registered students.
- ChatGPT for Everyone
- Introduction to Prompt Engineering
- Advanced Prompt Engineering
- Introduction to Prompt Hacking
- Advanced Prompt Hacking
- Introduction to Generative AI Agents for Business Professionals
- AI Safety
## Tutorials and Guides
- OpenAI Prompt Engineering Guide – Comprehensive, covering GPT-4.1/5 prompting, reasoning models, structured outputs, agentic workflows. Continuously updated.
- OpenAI GPT-4.1 Prompting Guide [2025] – Structured agent-like prompt design: goal persistence, tool integration, long-context processing.
- Anthropic Prompt Engineering Overview – Iterative prompt design, XML tags, chain-of-thought, role assignment. Includes prompt generator.
- Anthropic Claude 4 Best Practices [2025–2026] – Parallel tool execution, thinking capabilities, image processing.
- Anthropic: Effective Context Engineering for AI Agents [2025] – The evolution from prompt engineering to context engineering: agent state, memory, tools, MCP.
- Google Gemini Prompting Strategies – Multimodal prompting for Gemini via Vertex AI and AI Studio.
- Microsoft Prompt Engineering in Azure AI Studio – Tool calling, function design, few-shot prompting, prompt chaining.
- Prompt Engineering Guide (DAIR.AI / promptingguide.ai) – Most comprehensive open-source guide. 18+ techniques, model-specific guides, research papers. 3M+ learners. Now includes context engineering.
- Learn Prompting (learnprompting.org) – Structured free platform. Beginner to advanced PE, AI security, HackAPrompt competition.
- IBM 2026 Guide to Prompt Engineering [2026] – Curated tools, tutorials, real-world examples with Python code.
- Anthropic Interactive Tutorial – 9-chapter Jupyter notebook course with hands-on exercises.
- Lilian Weng's Prompt Engineering Guide [2023] – Highly respected technical blog from an OpenAI researcher.
- Google Prompt Engineering Guide (68-page PDF) [2025] – Internal-style best-practice guide for Gemini with concrete patterns.
- DigitalOcean: Prompt Engineering Best Practices [2025] – Updated guide summarizing techniques: few-shot, chain-of-thought, role prompting, etc.
- Aakash Gupta: Prompt Engineering in 2025 [2025] – Practical guide with lessons from shipping AI at OpenAI, Shopify, and Google.
- Best practices for prompt engineering with OpenAI API – OpenAI's introductory best practices.
- OpenAI Cookbook – Official recipes for function calling, RAG, evaluation, and complex workflows.
- Microsoft Prompt Engineering Docs – Microsoft's open prompt engineering resources.
- DALL·E Prompt Book – Visual guide for text-to-image prompting.
- Best 100+ Stable Diffusion Prompts – Community-curated image generation prompts.
- Vibe Engineering (Manning) – Book by Tomasz Lelek & Artur Skowronski on building software through natural language prompts.
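Several of the guides above cover the same core techniques, notably few-shot examples and chain-of-thought. As a minimal, provider-agnostic sketch of how the two combine (function and variable names are illustrative, not taken from any particular guide):

```python
def build_prompt(instruction, examples, question):
    """Assemble a few-shot chain-of-thought prompt: a task instruction,
    worked examples with visible reasoning, then the new question with
    an open 'Reasoning:' slot for the model to complete."""
    parts = [instruction]
    for ex_question, reasoning, answer in examples:
        parts.append(f"Q: {ex_question}\nReasoning: {reasoning}\nA: {answer}")
    parts.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Answer the question. Think step by step before answering.",
    [("A shirt costs $20 and is 25% off. Final price?",
      "25% of 20 is 5, so the price is 20 - 5 = 15.",
      "$15")],
    "A book costs $40 and is 10% off. Final price?",
)
```

The resulting string is sent as a single user message (or split across system/user turns); the worked example both demonstrates the output format and elicits step-by-step reasoning before the final answer.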
## Videos
- Andrej Karpathy: "Deep Dive into LLMs" & "How I Use LLMs" [2024–2025] – Two of the most influential AI videos of 2024–2025: a comprehensive technical deep dive followed by practical usage patterns.
- Karpathy: "Software in the Era of AI" (YC AI Startup School) [2025] – Coined "vibe coding" (Feb 2025) and championed "context engineering" (Jun 2025).
- Karpathy: Neural Networks: Zero to Hero [2023–2024] – Full lecture series building from backpropagation to GPT.
- 3Blue1Brown: Neural Networks Series [Updated 2024] – Iconic animated visual explanations of transformers and attention mechanisms. 7M+ subscribers.
- AI Explained [2024–2025] – Long-form analysis breaking down papers, model capabilities, and PE developments.
- Sam Witteveen [2024–2025] – Practical tutorials on prompt engineering, LangChain, RAG, and agents.
- Matthew Berman [2024–2025] – Popular channel covering model releases and practical LLM usage. 600K+ subscribers.
- DeepLearning.AI YouTube [2024–2026] – Structured lessons, course previews, and Andrew Ng talks on agents and AI careers.
- Lex Fridman Podcast (AI Episodes) [2024–2025] – Long-form interviews with Altman, Hinton, Amodei on LLMs, prompting, and safety.
- ICSE 2025: AIware Prompt Engineering Tutorial [2025] – Conference tutorial covering prompt patterns, fragility, anti-patterns, and optimization DSLs.
- CMU Advanced NLP 2022: Prompting – Foundational academic lecture on prompting methods.
- ChatGPT: 5 Prompt Engineering Secrets For Beginners – Accessible intro for beginners.
## Communities
- Learn Prompting – 40,000+ members. Largest PE Discord with courses, hackathons, HackAPrompt competitions.
- PromptsLab Discord – Community server for PromptsLab projects.
- Midjourney – 1M+ members. Primary hub for text-to-image prompt sharing.
- OpenAI Discord – Official community with channels for GPTs, Sora, DALL-E, and API help.
- Anthropic Discord – Official Claude community for AI development collaboration.
- Hugging Face Discord – Model discussions, library support, community events.
- FlowGPT – 33K+ members. 100K+ prompts across ChatGPT, DALL-E, Stable Diffusion, Claude.
- r/PromptEngineering – Dedicated subreddit for prompt-crafting techniques and discussions.
- r/ChatGPT – 10M+ members. Primary hub for ChatGPT users and prompt sharing.
- r/LocalLLaMA – Highly technical community for running open-source LLMs locally.
- r/ClaudeAI – Anthropic's Claude community: prompt sharing, API tips, model comparisons.
- r/MachineLearning – Academic-oriented ML research discussions.
- r/OpenAI – OpenAI product and API discussions.
- r/StableDiffusion – 450K+ members for AI art prompting and workflows.
- r/ChatGPTPromptGenius – 35K+ members sharing and refining prompts.
- OpenAI Developer Community – Official forum for API help, best practices, project sharing.
- Hugging Face Community – Hub for open-source AI collaboration.
- DeepLearning.AI Community – Forum for learners discussing courses and AI careers.
- LessWrong – In-depth technical posts on AI capabilities and safety.
- AI Alignment Forum – Specialized alignment research discussions.
- CivitAI – Generative AI creators platform for sharing models, LoRAs, and prompts.
- LangChain – Open-source LLM app framework. 100K+ stars.
- Promptslab – GitHub organization for generative models, prompt engineering, and LLM resources.
- Hugging Face – Central hub: Transformers, Diffusers, Datasets, TRL.
- DSPy (Stanford NLP) – Growing community for systematic prompt optimization.
- OpenAI – Open-source models, benchmarks, and tools.
## How to Contribute

We welcome contributions to this list! Before contributing, please review our contribution guidelines; they help ensure your contributions align with our objectives and meet our standards for quality and relevance.
What we're looking for:
- New high-quality papers, tools, or resources with a brief description of why they matter
- Updates to existing entries (broken links, outdated information)
- Corrections to star counts, pricing, or model details
- Translations and accessibility improvements
Quality standards:
- All tools should be actively maintained (updated within the last 6 months)
- Papers should be from peer-reviewed venues or have significant community adoption
- Datasets should be publicly accessible
- Please include a one-line description explaining why the resource is valuable
Thank you for your interest in contributing to this project!
Maintained by PromptsLab · Star this repo if you find it useful!
