Merged
This commit introduces a comprehensive redesign of the PChatBot architecture:

## New Architecture Components

### 1. LLM Backend Abstraction (src/core/llm/)
- Abstract LLMProvider base class with unified interface
- Support for multiple backends: Snowflake Cortex, AWS Bedrock, Anthropic Direct
- Factory pattern with auto-detection from environment variables
- Configurable model parameters (temperature, max_tokens, etc.)

### 2. Service Layer (src/core/services/)
- GenerationService: UI-agnostic code generation with two-stage machine generation
- CompilationService: P compiler integration with structured error parsing
- FixerService: Intelligent error fixing with retry logic and human-in-the-loop

### 3. Workflow Engine (src/core/workflow/)
- Step-based workflow execution with retry and skip logic
- Event-driven architecture for observability
- Support for pause/resume with human intervention
- Pre-built workflows: full_generation, compile_and_fix, full_verification

### 4. MCP Server Integration (src/ui/mcp/)
- Full MCP server for Cursor IDE integration
- Tools: generate_*, compile, check, fix_*, syntax_help, workflows
- Preview-then-save workflow for code review before saving
- New fix_buggy_program tool for automatic PChecker error analysis and fixing

### 5. Validation Pipeline (src/core/validation/)
- Input validators for design documents
- P code validators for syntax and semantic checks
- Composable validation pipeline

### 6. Compilation Utilities (src/core/compilation/)
- PCompilerErrorParser: Structured parsing of P compiler output
- PErrorFixer: Specialized fixes for common P errors
- CheckerErrorParser: Parse PChecker traces for debugging
- CheckerFixer: Auto-fix common runtime errors (null_target, unhandled_event)

## Configuration
- env.template: Template for environment configuration
- cursor-mcp-settings.example.json: Example MCP settings for Cursor
- mcp-config.json: MCP server configuration with env var references
- .gitignore: Exclude secrets and local configuration

## Documentation
- Updated README with new architecture overview
- DESIGN_DOCUMENT.md with detailed component specifications
- CLAUDE.md for AI assistant context
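As a rough illustration of the "abstract base class plus factory with auto-detection from environment variables" idea in section 1, here is a minimal sketch. It is not the code from src/core/llm/: the `LLMConfig` fields, the `complete()` method, the stub provider classes, and every `EXAMPLE_*` environment variable name are assumptions made up for this example, and the stubs make no network calls.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMConfig:
    # Hypothetical defaults; the real parameter set is not shown in the commit.
    model: str = "example-model"
    temperature: float = 0.0
    max_tokens: int = 4096


class LLMProvider(ABC):
    """Unified interface that every backend implements."""

    name = "base"

    def __init__(self, config: LLMConfig):
        self.config = config

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's completion for the prompt."""


class _StubProvider(LLMProvider):
    """Stand-in backend: echoes the prompt instead of calling a service."""

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class SnowflakeProvider(_StubProvider):
    name = "snowflake"


class BedrockProvider(_StubProvider):
    name = "bedrock"


class AnthropicProvider(_StubProvider):
    name = "anthropic"


# Hypothetical variable names, checked in order; the first one set wins.
_DETECTION_ORDER = [
    ("EXAMPLE_SNOWFLAKE_ACCOUNT", SnowflakeProvider),
    ("EXAMPLE_BEDROCK_REGION", BedrockProvider),
    ("EXAMPLE_ANTHROPIC_API_KEY", AnthropicProvider),
]


def create_provider(env=None, config=None) -> LLMProvider:
    """Pick a backend based on which environment variable is present."""
    env = os.environ if env is None else env
    for var, provider_cls in _DETECTION_ORDER:
        if env.get(var):
            return provider_cls(config or LLMConfig())
    raise RuntimeError("no LLM backend configured")
```

Passing `env` explicitly keeps the factory testable without touching the real process environment.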
This commit enhances the PChatBot architecture by introducing refined error handling mechanisms and validation processes across various services. Key updates include: - Enhanced error handling in GenerationService and CompilationService. - Improved input validation in the validation pipeline. - Updates to documentation reflecting these changes. These improvements aim to increase the robustness and reliability of the PChatBot system.
… parsing, RAG examples - Snowflake provider: removed auto-discovery, default to claude-sonnet-4-5 - Factory: simplified Snowflake config, removed CORTEX_USE_LATEST_MODEL - Generation: robust _extract_p_code with 4 fallback strategies (XML, markdown, bare blocks) - Compilation: added [Error:][file.p:line:col] parser pattern (Pattern 3) - Fixer: resolve relative error paths, parse combined stdout+stderr - Compilation: clear error message when PChecker finds no test declarations - Workflow factory: improved machine name extraction from numbered component lists - Added curated RAG examples (73 files): tutorials + generated Paxos/2PC - Added MCP E2E protocol test harness (scripts/mcp_e2e_protocols.py) - Added Snowflake model selection tests
…th 5 protocols - load_dotenv(override=True) in all entrypoints (MCP, Streamlit, CLI) - Removed os.chdir() from server.py and cli/app.py - Fixed TypeConsistencyChecker: replaced broken check_project with working cross-file check - Generation retry: all generation methods retry up to 2x on extraction failure - Fixer: cross-file context passed to LLM, spiral detection (3x same error = stop) - Error messages: strip ~~ [PTool] ~~ trailer before passing to fixer - Post-processor: bare halt; -> raise halt; fix - Post-processor: forbidden keyword detection in spec monitors (this/send/new) - Post-processor: Timer(this) wiring warning in PTst scenario machines - Instructions: machines must handle/ignore/defer all receivable events - Instructions: specs cannot use this/send/new/announce/receive - Instructions: test files must not re-declare PSrc machines or invent events - Instructions: scenario machines must be simple launchers, correct wiring order - Regression suite: 5 protocols (Paxos, 2PC, MessageBroker, DistributedLock, Hotel) - Regression suite: scoring 0-100, baseline comparison, protocol-level retry - Suppress Streamlit warnings in non-Streamlit mode - Baseline: 450/500 (90%) — 3 protocols at 100/100
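The spiral detection and trailer stripping mentioned above could look roughly like the sketch below. It assumes "3x same error" means three consecutive identical messages and that the trailer is everything from the `~~ [PTool]` marker to the end of the message; both are guesses, and `SpiralDetector` and `strip_trailer` are hypothetical names rather than the fixer's real API.

```python
import re

# Assumption: the trailer starts at "~~ [PTool]" and runs to the end of the text.
_TRAILER = re.compile(r"~~\s*\[PTool\].*", re.DOTALL)


def strip_trailer(message: str) -> str:
    """Drop the tool trailer so it does not pollute the fixer prompt."""
    return _TRAILER.sub("", message).rstrip()


class SpiralDetector:
    """Signals when the same error has been seen `limit` times in a row."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self._last = None
        self._count = 0

    def observe(self, message: str) -> bool:
        """Record one error; return True when the fix loop should stop."""
        key = strip_trailer(message)
        if key == self._last:
            self._count += 1
        else:
            self._last = key
            self._count = 1
        return self._count >= self.limit
```

A fix loop would call `observe()` after each failed compile and break out (or switch strategy) once it returns True.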
…validation
- Parallel machine generation: generate_machines_parallel() with ThreadPoolExecutor. Uses shared context snapshot; wired into generate_complete_project
- Incremental regeneration: when fixer detects spiral (3x same error), rewrites the failing file from scratch with all project files as context
- Spec validation: validate_spec_events() checks that all events in spec 'observes' clauses exist in the types file; runs in generate_complete_project
- Design doc validation: validate_design_doc() checks required sections, component extraction, scenario count; blocks generation on invalid docs
- Added concurrent.futures import for parallel execution
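A minimal sketch of parallel generation over a shared context snapshot follows. Only the name `generate_machines_parallel` and the use of `ThreadPoolExecutor` come from the commit message; the signature, the read-only snapshot via `MappingProxyType`, the error collection, and the placeholder `generate_machine` (which stands in for the LLM call) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from types import MappingProxyType


def generate_machine(name, context):
    # Placeholder for the real LLM-backed generation call.
    return f"machine {name} {{ /* sees {len(context)} context files */ }}"


def generate_machines_parallel(names, context, max_workers=4):
    """Generate every machine concurrently against one frozen context."""
    # Copy once and expose it read-only so no worker can mutate shared state.
    snapshot = MappingProxyType(dict(context))
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate_machine, n, snapshot): n for n in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going; report failures per machine
                errors[name] = exc
    return results, errors
```

Threads suit this workload because each task mostly waits on network I/O; the trade-off of a snapshot is that machines generated in the same batch cannot see each other's output.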
- generate_complete_project: uses types_result.filename, spec_result.filename, test_result.filename instead of hardcoded Enums_Types_Events.p/Safety.p/TestDriver.p - Workflow steps: types_events_filename propagated through context dict - GenerateMachineStep/GenerateSpecStep: read types filename from context - SaveGeneratedFilesStep._collect_all_context: uses dynamic types filename - GenerateTypesEventsStep.can_skip: checks for any .p in PSrc, not hardcoded name - Post-processor in generate_complete_project: detects PTst files by path - All hardcoded names converted to fallback defaults (or 'X.p') only used when LLM doesn't provide a filename
…nd generation pipeline
- Renamed Src/PChatBot/ directory to Src/PeasyAI/ - Renamed report/PChatBot_Analysis_Report.md to PeasyAI_Analysis_Report.md - Renamed evaluate_chatbot.py to evaluate_peasyai.py - Updated all PChatBot/pchatbot/p-chatbot/P-ChatBot references in source code - Updated MCP server name, UI titles, documentation, and config files - Replaced generic 'chatbot' references with 'AI assistant' where appropriate - Updated CLAUDE.md with new paths and naming
…nk test - Fix sys.path in tests/rag/test_rag_index.py to point to src/rag/ - Add pytest.importorskip for faiss graceful skip when not installed - Update create_rag_index.py import from langchain.text_splitter to langchain_text_splitters (current package layout) - Fix test_create_chunks to use text longer than chunk_size (500 chars) so the splitter actually produces multiple chunks
- Update path triggers from Src/PChatBot/** to Src/PeasyAI/** - Update working-directory from Src/PChatBot to Src/PeasyAI - Update workflow name to PeasyAI Contract Tests The directory was renamed but the workflow still referenced the old name, so CI would never trigger and would fail if run manually.
- Add ~/.peasyai/settings.json config system (like ~/.claude/settings.json) replacing .env for LLM provider credentials and settings - Add pyproject.toml: pip-installable package with peasyai-mcp CLI entry point - Add src/core/config.py: config loader with env var fallback - Add src/ui/mcp/entry.py: CLI with init, config, and serve sub-commands - Add .peasyai-schema.json for IDE autocomplete in settings file - Update MCP server to load config from ~/.peasyai/settings.json - Update env validation tool to check for settings.json - Update ResourceLoader to find bundled resources in installed wheels - Update README with install steps for Cursor and Claude Code - Deprecate env.template in favor of peasyai-mcp init
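The "config loader with env var fallback" might be sketched as below. The setting keys, the `EXAMPLE_*` variable names, and the precedence (settings file first, environment second) are assumptions for this example; the actual src/core/config.py may order them differently.

```python
import json
import os
from pathlib import Path


def load_settings(path):
    """Read a JSON settings file; a missing or malformed file yields {}."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers JSON and decoding errors
        return {}
    return data if isinstance(data, dict) else {}


def get_setting(settings, key, env_var, env=None, default=None):
    """Prefer the settings file, then the environment, then the default."""
    env = os.environ if env is None else env
    if settings.get(key) not in (None, ""):
        return settings[key]
    return env.get(env_var, default)
```

For example, `get_setting(load_settings(Path.home() / ".peasyai" / "settings.json"), "provider", "EXAMPLE_PROVIDER")` would fall back to the environment when the file has no `provider` entry.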
Delete:
- Config (Amazon Brazil build config, not relevant to open-source)
- cursor-mcp-settings.example.json, mcp-config.json (superseded by peasyai-mcp CLI)
- env.template (deprecated in favor of ~/.peasyai/settings.json)
- analyze-checker-errors.py, analyze-errors.py, compute_metrics.py, visualize-pk-vs-tokens.py (one-off analysis scripts at project root)
- resources/pipeline.json (old pipeline config, unused)
- src/resources/p_syntax_rules.txt (unused duplicate)
- src/rag/ (old faiss-based RAG scripts, replaced by src/core/rag/)

Move:
- evaluate_peasyai.py → scripts/evaluate_peasyai.py

Update .gitignore:
- Add generated_projects/, .peasyai_workflows.json
- Remove stale cursor-mcp-settings.json entry
Major improvements to the code generation pipeline based on regression analysis across 9 protocols (Paxos, 2PC, MessageBroker, DistributedLock, HotelManagement, ClientServer, FailureDetector, EspressoMachine, Raft).

Generation pipeline:
- Add p_code_utils.py with brace-balanced extraction replacing fragile regex for function bodies, state bodies, and LLM response parsing
- LLM-based machine name extraction from design docs (replaces brittle regex that misidentified "Front Desk" as "Front", "Lock Server" as "Lock", etc.)
- Auto-inject Common_Timer template when design doc mentions timers, heartbeats, or timeouts, which prevents the LLM from reinventing the Timer machine
- Pass expected_name to code extraction for reliable filenames
- Inject spec monitor names into test generation context
- Enrich RAG facet derivation from already-generated context files

RAG retrieval:
- Add timer/heartbeat/appliance/leader-election pattern facets
- Cross-derive related facets (failure-detector → timer-timeout, raft → broadcast + leader-election + timer-timeout)
- Fix timer hint to reference CreateTimer/StartTimer/CancelTimer API

Ensemble scoring:
- Add compile-check verification for top-3 candidates (+50 bonus)
- Penalize illegal var init, redeclared events/types, forbidden keywords in specs
- Cap defer/ignore scoring to prevent verbose-but-wrong candidates

PChecker fix loop:
- Feed trace analysis back into targeted regeneration as checker_feedback
- Add LLM-based fallback fixer when specialized fixer fails
- Add assertion_failure support to PCheckerErrorFixer
- Rank traces by error category priority across all failing tests
- Increase re-check schedules from 20 to 50
- Improve spiral detection with error message normalization
- Add build_checker_feedback() as core utility

Post-processor:
- Broaden single-field tuple fix to all contexts (new, raise, type annotations, function params) not just send statements
- Broaden _ensure_test_declarations to detect scenario machines by send/name patterns, with fallback to all machines

Prompts and design docs:
- Strengthen spec generation with assert requirement, empty function ban, and working MutualExclusion example
- Strengthen test generation checklist (assert SpecName required)
- Add tuple construction guidance to machine generation prompt
- Add 4 new regression protocols with design docs
- Improve 2PC design doc with explicit constructor signatures, timer module guidance, and state handling instructions

Co-authored-by: Cursor <cursoragent@cursor.com>
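To show why brace-balanced extraction is more robust than a regex for state and function bodies, here is a small sketch of the technique. It is not the code in p_code_utils.py; `extract_braced_block` is a hypothetical helper, and it assumes C-style `//` and `/* */` comments and double-quoted strings, inside which braces must not be counted.

```python
def extract_braced_block(source: str, open_index: int):
    """Return the text between the '{' at open_index and its matching '}'.

    Returns None when the braces never balance (e.g. a truncated response).
    """
    if source[open_index] != "{":
        raise ValueError("open_index must point at an opening brace")
    depth = 0
    i = open_index
    n = len(source)
    while i < n:
        ch = source[i]
        if source.startswith("//", i):  # line comment: skip to end of line
            newline = source.find("\n", i)
            i = n if newline == -1 else newline
            continue
        if source.startswith("/*", i):  # block comment: skip past "*/"
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if ch == '"':  # string literal: skip to the closing quote
            i += 1
            while i < n and source[i] != '"':
                i += 2 if source[i] == "\\" else 1
            i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[open_index + 1:i]
        i += 1
    return None
```

A non-greedy regex such as `\{.*?\}` would stop at the first closing brace of a nested block, which is the failure mode the counter avoids.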
- Add 6 portfolio RAG examples (BenOr, ChangRoberts, DAO, German, Streamlet, TokenRing) for broader protocol coverage - Add p_documentation_reference.txt for comprehensive P language docs - Enhance p_corpus.py with faceted indexing, multi-lane retrieval, and richer metadata extraction for RAG examples - Update MCP tools (compilation, query, rag_tools, workflows) with improved error handling and API consistency - Update workflow p_steps with enhanced step implementations - Update regression baseline with latest results - Update embeddings with caching improvements - Remove stale CMakeLists.txt and PeasyAI_Analysis_Report.md Co-authored-by: Cursor <cursoragent@cursor.com>
…handling The generate_machine tool was failing silently with an opaque ' ' error because P code examples in instruction templates contained unescaped curly braces that collided with Python's str.format(). Added _safe_format() helpers that fall back to manual substitution when str.format() raises KeyError/ValueError. Also improved all exception handlers across the service layer (generation, compilation, fixer) to include the exception type name and full traceback in logs, preventing opaque error messages. Includes design doc migration from .txt to .md, post-processor enhancements, validation updates, and a verified BasicPaxos tutorial generated end-to-end via the MCP tools. Co-authored-by: Cursor <cursoragent@cursor.com>
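The failure mode is easy to reproduce: in a template containing P code, `str.format()` parses `{ }` as a replacement field named `" "` and raises `KeyError(' ')`, whose string form is the opaque `' '`. Below is a minimal sketch of a fallback formatter in the spirit of the `_safe_format()` helpers; the signature and the regex-based fallback are assumptions, and catching `IndexError` (raised for a bare `{}`) is an addition not mentioned in the commit.

```python
import re


def safe_format(template: str, **values) -> str:
    """Format with str.format(), falling back to literal substitution.

    The fallback replaces only exact "{name}" placeholders for known names
    and leaves every other brace (e.g. P code blocks) untouched.
    """
    try:
        return template.format(**values)
    except (KeyError, ValueError, IndexError):
        if not values:
            return template
        names = "|".join(re.escape(name) for name in values)
        pattern = re.compile(r"\{(" + names + r")\}")
        return pattern.sub(lambda m: str(values[m.group(1)]), template)
```

One limitation of this sketch: on the fallback path, doubled braces (`{{` and `}}`) are left as written rather than collapsed to single braces.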
All MCP tool names now follow the peasy-ai-<action> convention (e.g., peasy-ai-compile, peasy-ai-gen-machine, peasy-ai-fix-compile-error) to avoid collisions with other MCP servers and improve discoverability. Updated tool definitions, descriptions, cross-references, docs, and tests. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a GitHub Actions workflow that builds and attaches PeasyAI wheels to GitHub Releases on peasyai-v* tags, so developers can install via pip without cloning the full P repo. Also adds the LICENSE file referenced by pyproject.toml and updates the README install instructions. Co-authored-by: Cursor <cursoragent@cursor.com>
Point users to the P installation guide for .NET SDK 8.0, Java, and the P compiler. Add Quick Start section, troubleshooting table, upgrade instructions, and reorganize development sections. Co-authored-by: Cursor <cursoragent@cursor.com>
- Replace ad-hoc code review with unified 4-stage validation pipeline:
  - Stage 1: PCodePostProcessor (deterministic regex auto-fixes)
  - Stage 2: Structured validator chain (13 validators with auto-fix)
  - Stage 3: LLM wiring review for test files (circular deps, init order)
  - Stage 4: LLM spec correctness review (observes completeness, assertions)
- Add NamedTupleConstructionValidator for cross-file type checking
- Add extraneous-semicolon auto-fix in SyntaxValidator
- Fix ValidationPipeline context merging for preview-time cross-file validation
- Update Timer template with bounded delays for liveness property support
- Update FailureDetector design doc: liveness property, hot states, Timer as component
- Add review_test_wiring and review_spec_correctness LLM review prompts
- Improve generation prompts: named tuples, circular dependency patterns, helper fns
- Fix streamlit lazy-import issue for non-UI contexts
- Remove stale tools (simulator, trace_explorer) and dead code
- Increase PChecker per-test timeout from 20s to 300s
- Update CLAUDE.md with pipeline architecture docs and fix stale .env references
- Add regression test support for wiring_fixes and spec_fixes

Made-with: Cursor
Made-with: Cursor
…eview The old approach copied text verbatim from design docs into comments, which was redundant. The new approach uses an LLM review step (Stage 5) that reads both the generated code and design doc, then writes insightful comments explaining invariants, protocol steps, and design rationale. - Remove ~500 lines of regex documentation methods from PCodePostProcessor - Add GenerationService.review_code_documentation() as new LLM review step - Add review_code_documentation.txt instruction prompt - Wire into all 4 MCP generation tools and all workflow steps - Update tests and CLAUDE.md pipeline documentation Made-with: Cursor
…rage to 217 tests - Remove ~400 lines of dead/deprecated code across regex_utils, compile_utils, file_utils, string_utils, log_utils, generate_p_code, and pipelines - Delete dead modules: interactive.py, DesignDocInputMode.py, pipelining/examples.py - Fix broken InteractiveMode reference in app.py (would crash at runtime) - Remove debug print statements and unused imports from pipelines.py, pchecker_mode.py - Fix var-declaration-order detection bug in VarDeclarationOrderValidator and PCodePostProcessor: the detection loop broke early, missing cases where vars appeared after statements or were interleaved with statements - Add test_config.py: 21 tests for settings loading, env var overrides, defaults, malformed config, provider aliases, all 3 provider types - Add test_validators_extended.py: 49 tests covering all 7 previously-untested validators (InlineInit, VarDeclOrder, CollectionOps, SpecObservesConsistency, DuplicateDecl, SpecForbiddenKeyword, PayloadField, TestFile) plus 10 post-processor fix categories - Add test_error_parsers.py: 29 tests for compiler error parsing, categorization, CompilationResult, checker trace parsing, MachineState, EventInfo - Update CI workflow to run unit tests alongside contract tests in parallel jobs - Update release workflow to gate on full test suite (unit + contract) - Fix stale test assertion for Snowflake default model (claude-opus-4-6) Made-with: Cursor
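The var-declaration-order bug described above (a detection loop that broke out early) can be illustrated with a simplified, line-based sketch. This is not the validator's real logic: `find_late_var_declarations` is a hypothetical helper that assumes one statement per line and treats any line starting with `var ` as a declaration.

```python
def find_late_var_declarations(body_lines):
    """Return 1-based line numbers of `var` declarations after a statement.

    The loop deliberately scans the whole body instead of stopping at the
    first statement, so interleaved declarations are all reported.
    """
    late = []
    seen_statement = False
    for lineno, line in enumerate(body_lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # blank lines and comments do not affect ordering
        if stripped.startswith("var "):
            if seen_statement:
                late.append(lineno)
        else:
            seen_statement = True
    return late
```

With a `break` after the first statement, the declarations on lines 3 and 5 of `["var x: int;", "x = 1;", "var y: int;", "y = 2;", "var z: int;"]` would both be missed.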
The LLM-based code documentation review (Stage 5) was silently failing for every generation call due to three root causes:

1. Snowflake provider capped max_tokens at 8192, causing response truncation before the closing </documented_code> tag
2. GenerateTypesEventsParams was missing the context_files field, causing an AttributeError swallowed by except Exception
3. No visibility into failures: callers had no way to distinguish "doc review succeeded with no comments" from "doc review crashed"

Changes:
- Raise Snowflake provider token cap from 8192 to 20000
- Raise doc review request from 8192 to 16384 tokens
- Add retry on truncation (doubles max_tokens, up to 20k cap)
- Return structured Dict with status/code/reason instead of Optional[str]
- Surface doc_review_status field in all MCP generation responses
- Extract shared _run_doc_review() helper to replace 4 duplicated try/except blocks
- Add context_files field to GenerateTypesEventsParams
- Add 2 new truncation test cases (53 total tests pass)

Made-with: Cursor
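The retry-on-truncation behaviour and the structured result can be sketched as follows. The token numbers and the `status`/`code`/`reason` keys come from the commit message; the function signature, the `call_llm` callback, the specific status strings, and the opening `<documented_code>` tag are assumptions for illustration, not the real `_run_doc_review()` helper.

```python
OPEN_TAG = "<documented_code>"
CLOSE_TAG = "</documented_code>"


def run_doc_review(call_llm, code, max_tokens=16384, cap=20000):
    """Ask the LLM to document `code`, retrying with more tokens if truncated.

    `call_llm(code, max_tokens)` is a hypothetical callback returning raw text.
    Always returns a dict so callers can tell success from failure.
    """
    while True:
        try:
            response = call_llm(code, max_tokens)
        except Exception as exc:
            reason = f"{type(exc).__name__}: {exc}"
            return {"status": "error", "code": None, "reason": reason}
        if OPEN_TAG in response and CLOSE_TAG in response:
            body = response.split(OPEN_TAG, 1)[1].split(CLOSE_TAG, 1)[0]
            return {"status": "ok", "code": body.strip(), "reason": None}
        if max_tokens >= cap:
            reason = f"no closing tag at {max_tokens} tokens"
            return {"status": "truncated", "code": None, "reason": reason}
        # Missing closing tag: assume truncation and double the budget.
        max_tokens = min(max_tokens * 2, cap)
```

Returning a dict instead of `Optional[str]` is what lets a caller surface a field such as `doc_review_status` rather than treating a crash and an empty review as the same `None`.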
Infrastructure: - Upgrade CI workflow (checkout@v4, setup-python@v5, pip caching) - Pin MkDocs dependencies in Docs/requirements.txt - Fix deprecated emoji imports, duplicate extensions in mkdocs.yml - Add dark mode toggle, update copyright to 2026, switch to HTTPS Content refresh: - Replace all net6.0 references with net8.0 across 7 files - Update stale forward-looking statements in whatisP.md - Remove "coming soon" text, fix placeholder alt text - Add CACM 2025 "Systems Correctness Practices at AWS" to case studies and publications - Update toolchain image and rewrite What is P page to match new 4-stage pipeline New content: - Add PeasyAI documentation page (getstarted/peasyai.md) - Write full Paxos tutorial from Tutorial/5_Paxos source code - Expand videos page with re:Invent 2023 talk Beautification (MkDocs Material features): - Redesign home page with grid cards, admonitions, and navigation cards - Add Material icons, horizontal rules, and grid cards across all pages - Improve all tutorial pages with section icons and page titles - Beautify PObserve (9 pages) and PVerifier (9 pages) documentation - Restructure navigation: PObserve and PVerifier as distinct subsections Polish: - Rename foriegntypesfunctions.md to foreigntypesfunctions.md - Update IDE recommendations to Peasy IDE - Add deprecation banners to old/ directory files - Clean up Paxos references, restore to navigation Made-with: Cursor
These are internal architecture documents that should not be part of the public documentation update. Made-with: Cursor
These internal PeasyAI resource files are not needed in the documentation update PR. Made-with: Cursor
These are local development configuration files that should not be in the documentation update PR. Made-with: Cursor
Add prominent "What's New" section featuring PeasyAI and PObserve, toolchain image at the top, and framework overview grid cards. Made-with: Cursor
No description provided.