Agentic AI Infrastructure: local coding-agent benchmarks, visible eval loops, RAG harnesses, and safety boundaries for tools that can be inspected from a fresh source checkout.
I build small infrastructure around agentic systems: benchmark generation, agent runtimes, prompt regression testing, retrieval pipelines, hardened parsers, and local-first developer tools. The pinned repositories are the main story; the utility libraries are the disciplined substrate underneath them.
| Project | Why It Matters | First Demo |
|---|---|---|
| PatchGym | Mine real Git history into local SWE-bench-style coding-agent tasks with hidden tests and auditable oracle patches. | bash scripts/demo.sh |
| agent-framework | A tiny visible agent runtime: plan, act, observe, remember, finish, with readable traces. | python examples/no_api_key_agent.py |
| rag-pipeline | RAG from first principles: chunking, retrieval, citations, evaluation, and reports without hosted services. | python examples/local_rag_demo.py |
| prompt-eval | Prompt regression tests that can run in CI without secrets or hidden service calls. | python examples/no_api_key_regression.py |
| safejson | JSON parsing treated as a security boundary with duplicate-key rejection, typed errors, and resource limits. | python examples/security_boundary.py |
| decimal-ts | Exact fixed-point decimal arithmetic for TypeScript, backed by BigInt instead of floating point. | npm install && npm run demo |
PatchGym is the flagship and the best first run:

    git clone https://github.com/nripankadas07/patchgym
    cd patchgym
    python3 -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip
    python -m pip install -e ".[dev]"
    bash scripts/demo.sh

I use AI heavily for scaffolding, test generation, edge-case brainstorming, and first-pass documentation. The design direction, architecture choices, project boundaries, quality bar, and final review are mine.
My workflow is simple: AI-assisted output has to survive source-checkout setup, local tests, CI, release packaging, security notes, limitations notes, and manual review before it becomes part of the public portfolio. That is why the profile emphasizes reproducible demos and auditable artifacts instead of vague claims.
These release-track projects support the agentic AI infrastructure theme. They are not competing with the flagships for attention; they are the pieces that make agent workflows easier to test, parse, package, and reason about.
| Area | Projects |
|---|---|
| AI and evaluation | PatchGym, agent-framework, rag-pipeline, prompt-eval, token-counter |
| Security and operations | safejson, dep-audit |
| Parsers and data formats | tomlmini, bencode, csvinfer, urlnorm |
| TypeScript primitives | decimal-ts, argv-zod, argv-strict |
| Terminal and text tooling | wordwrap |
The release-track repositories have GitHub releases with build artifacts: Python wheels/source distributions for Python projects and npm tarballs for TypeScript projects.
The Internet Ownership Kit is ten compact projects for owning more of the internet workflows people usually rent from platforms. They sit beside the AI work as local-first infrastructure practice.
| Project | Inspired By | Job |
|---|---|---|
| lanbeam | LocalSend | Local network file drops with one-time share tokens. |
| rssdeck | FreeTube | No-account RSS and YouTube feed dashboard generator. |
| passhouse | Vaultwarden | Small encrypted local secrets vault with explicit safety notes. |
| syncplan | Syncthing | Directory snapshot and sync-plan engine before copying bytes. |
| readmine | Ladder | Ethical offline reader for public pages you can access. |
| photoflow | Immich | Local photo inventory, duplicate detection, and album planning. |
| dnswarden | AdGuard Home | Compile hosts-style blocklists into clean DNS sinkhole rules. |
| medialoom | Jellyfin | Static local media catalog for movies, shows, music, and audiobooks. |
| chatmux | LibreChat | Provider-neutral chat transcript hub with no-key local mocks. |
| uptimelog | Uptime Kuma | Tiny uptime monitor with JSON logs and static status pages. |
Each project has a README, CLI, tests, CI, quality docs, contribution templates, and a v0.1.0 GitHub release with wheel/source artifacts.
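To give a feel for the size of job these projects take on, here is a minimal sketch of compiling hosts-style blocklist lines into a clean, deduplicated domain list, the kind of task described for dnswarden in the table above. It is an illustration only, not dnswarden's code: the function name `compile_blocklist`, the set of accepted sink IPs, and the skipped names are all assumptions.

```python
def compile_blocklist(lines):
    """Return sorted, unique domains from hosts-style lines (hypothetical helper)."""
    sink_ips = {"0.0.0.0", "127.0.0.1"}  # assumed sinkhole addresses
    skip = {"localhost", "localhost.localdomain", "broadcasthost"}
    domains = set()
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] in sink_ips:
            names = fields[1:]   # hosts format: IP followed by one or more names
        elif len(fields) == 1:
            names = fields       # bare domain on its own line
        else:
            continue             # anything else is ignored in this sketch
        for name in names:
            name = name.lower().rstrip(".")
            if name and name not in skip:
                domains.add(name)
    return sorted(domains)


if __name__ == "__main__":
    sample = [
        "# example list",
        "0.0.0.0 Ads.Example.com  # tracker",
        "127.0.0.1 localhost",
        "tracker.example.net",
    ]
    for domain in compile_blocklist(sample):
        print(domain)
```

The sketch stops at a normalized domain list; turning that list into rules for a particular resolver is left out because the target format is not specified here.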
- PatchGym: Local Coding-Agent Benchmarks From Real Git History
- Visible Agent Evaluation: Testing The Loop, Not The Demo
- Safe Local-First AI Tooling: Small Systems With Hard Boundaries
The six flagship repositories also include architecture notes, release notes, limitations, and runnable examples.
Last audited on May 25, 2026 across the public GitHub profile.
| Signal | Current State |
|---|---|
| Public repositories | 111 total: 110 active, 1 archived scratchpad |
| Active repo hygiene | 110/110 have README, license metadata, license file, CI, issue templates, and PR templates |
| CI state | 110/110 active repos have a latest completed GitHub Actions run passing |
| Flagships | 6 pinned repositories, all unarchived, release-tracked, and green |
| Release track | 25 repositories with v0.1.0 GitHub releases and build artifacts |
| Internet Ownership Kit | 10/10 projects shipped with CLI, tests, CI, docs, releases, and contribution templates |
| Open issue load | 0 open issues across active repositories at audit time |
The detailed audit note is in docs/PORTFOLIO_AUDIT.md.
Every active repository is expected to have:
- a specific README that says why the project exists and where it stops;
- a license, security policy, contribution guide, code of conduct, changelog, roadmap, and quality notes;
- tests or an honest docs-only status;
- source-checkout installation instructions until registry publication is real;
- CI where a build or test surface exists;
- issue templates and a pull request template;
- no fake package badges, fake benchmark numbers, or fake social proof.
For parsers and evaluators, correctness means adversarial inputs, conformance checks where possible, explicit limits, typed failure modes, and repeatable local tests. For TypeScript packages, correctness means strict typechecking, tests, build output, and package metadata that matches what is actually shipped.
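As a rough illustration of what "explicit limits" and "typed failure modes" can mean for a parser, the sketch below wraps Python's standard `json` module with a size limit and duplicate-key rejection. It is not safejson's API: the names `strict_loads`, `JsonBoundaryError`, `DuplicateKeyError`, `InputTooLargeError`, and the default limit are assumptions chosen for the example.

```python
import json


class JsonBoundaryError(ValueError):
    """Base class for failures at the parsing boundary (hypothetical name)."""


class DuplicateKeyError(JsonBoundaryError):
    """An object repeated a key, so its meaning would be ambiguous."""


class InputTooLargeError(JsonBoundaryError):
    """The input exceeded the configured byte limit."""


MAX_BYTES = 1_000_000  # illustrative default, not a recommendation


def _reject_duplicates(pairs):
    # json.loads passes each object's (key, value) pairs here in document order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text, max_bytes=MAX_BYTES):
    """Parse JSON text, raising typed errors instead of guessing."""
    if len(text.encode("utf-8")) > max_bytes:
        raise InputTooLargeError(f"input exceeds {max_bytes} bytes")
    try:
        return json.loads(text, object_pairs_hook=_reject_duplicates)
    except json.JSONDecodeError as exc:
        raise JsonBoundaryError(f"malformed JSON: {exc.msg}") from exc


if __name__ == "__main__":
    print(strict_loads('{"a": 1, "b": [2]}'))
    try:
        strict_loads('{"a": 1, "a": 2}')
    except DuplicateKeyError as exc:
        print("rejected:", exc)
```

Callers can catch `JsonBoundaryError` to handle every rejection in one place, or a specific subclass when the reason matters. A repeatable local test then only needs a small set of adversarial inputs and the error type expected for each.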
This GitHub profile is intentionally code-first. Career credentials, product leadership context, and publication context live on LinkedIn.
Open an issue on the relevant repository for bugs, design questions, or focused collaboration. For profile-level context, use nripankadas07/nripankadas07.
- PatchGym: the flagship, because it turns real Git history into coding-agent tasks with hidden tests and oracle patches.
- Visible Agent Evaluation: the testing thesis behind the profile, focused on evaluating the loop instead of the demo.
- agent-framework: the smallest readable runtime for plans, tool calls, memory, traces, and no-key agent examples.