A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.
- Total entries: 207
- GitHub entries: 182 (87.9%)
- GitHub in project categories (excluding readings): 178/178 (100.0%)
- Categories: 9
- Last verified: 2026-05-25
- Language: English | 中文
- Scaling Managed Agents: Decoupling the brain from the hands: Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
- Claude Code auto mode: Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
- Harness engineering (OpenAI): Field report on building reliable agent-first software via harness constraints and verification.
- Building Effective AI Agents: Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
- Writing effective tools for AI agents: Best practices for tool interface design so agents call tools safely and reliably.
- Effective harnesses for long-running agents: Practical guide to maintaining state, resumability, and reliability over long agent runs.
- Harness design for long-running application development: Follow-up article on improving long-running app generation through harness structure.
- Improving Deep Agents with harness engineering: Evidence that harness improvements alone can move benchmark performance.
- Evaluating Deep Agents: Our Learnings: LangChain's practical lessons on evaluating stateful and long-horizon agents.
- Your Agent Needs a Harness, Not a Framework: Argument for reliability-first infrastructure around agents instead of framework-only thinking.
- Category Overview
- Featured Harness Blogs
- Catalog
  - Harness Architecture & Orchestration
  - Context & Working-State Engineering
  - Execution Substrates & Sandboxing
  - Protocols, Tool Interfaces & Agent Contracts
  - Evaluation Harnesses & Benchmarks
  - Observability & Reliability Operations
  - Guardrails, Security & Governance
  - Reference Harness Implementations
  - Essential Readings & Ecosystem Maps
- Maintenance Notes
- Citation
| Category | Entries |
|---|---|
| Harness Architecture & Orchestration | 27 |
| Context & Working-State Engineering | 10 |
| Execution Substrates & Sandboxing | 23 |
| Protocols, Tool Interfaces & Agent Contracts | 14 |
| Evaluation Harnesses & Benchmarks | 24 |
| Observability & Reliability Operations | 14 |
| Guardrails, Security & Governance | 16 |
| Reference Harness Implementations | 50 |
| Essential Readings & Ecosystem Maps | 29 |
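
The per-category counts can be cross-checked against the summary figures at the top of the list. The sketch below is only an illustration of that arithmetic: the numbers are copied from this page, while the function and variable names are hypothetical and not part of the project's tooling.

```python
# Category counts copied from the overview table on this page.
CATEGORY_COUNTS = {
    "Harness Architecture & Orchestration": 27,
    "Context & Working-State Engineering": 10,
    "Execution Substrates & Sandboxing": 23,
    "Protocols, Tool Interfaces & Agent Contracts": 14,
    "Evaluation Harnesses & Benchmarks": 24,
    "Observability & Reliability Operations": 14,
    "Guardrails, Security & Governance": 16,
    "Reference Harness Implementations": 50,
    "Essential Readings & Ecosystem Maps": 29,
}
READINGS = "Essential Readings & Ecosystem Maps"
GITHUB_ENTRIES = 182  # "GitHub entries" figure from the summary list


def summarize(counts, github_entries, readings_key):
    """Recompute the headline statistics from per-category counts."""
    total = sum(counts.values())
    # Project categories are every category except the readings one.
    project_total = total - counts[readings_key]
    github_share = round(100 * github_entries / total, 1)
    return {
        "total": total,
        "project_total": project_total,
        "github_share": github_share,
    }


summary = summarize(CATEGORY_COUNTS, GITHUB_ENTRIES, READINGS)
print(summary)
```

Under these inputs the recomputed total, project-category total, and GitHub share agree with the 207, 178, and 87.9% stated in the summary.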
Notes:
- Stars are rendered as badges from snapshot values.
- Repository update dates are tracked in `data/projects.yaml` and validation reports.
- Entries are sorted by stars (descending) within each category.
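
The ordering rule in the notes (stars descending within each category) can be sketched as follows. This is a minimal illustration, not the project's validation code: the record fields (`name`, `category`, `stars`), the sample entries, and the alphabetical tie-breaker are all assumptions, and the real schema of `data/projects.yaml` may differ.

```python
from collections import defaultdict

# Hypothetical records; names and star counts are made up for illustration.
ENTRIES = [
    {"name": "alpha", "category": "Evaluation", "stars": 120},
    {"name": "beta", "category": "Evaluation", "stars": 950},
    {"name": "gamma", "category": "Sandboxing", "stars": 300},
    {"name": "delta", "category": "Sandboxing", "stars": 300},
]


def sort_catalog(entries):
    """Group entries by category and sort each group by stars, descending.

    Ties are broken alphabetically by name (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["category"]].append(entry)
    return {
        category: sorted(items, key=lambda e: (-e["stars"], e["name"]))
        for category, items in grouped.items()
    }


def is_sorted_by_stars(items):
    """Return True if star counts never increase from one row to the next."""
    return all(a["stars"] >= b["stars"] for a, b in zip(items, items[1:]))


catalog = sort_catalog(ENTRIES)
for category, items in catalog.items():
    print(category, [e["name"] for e in items], is_sorted_by_stars(items))
```

A check like `is_sorted_by_stars` could run as part of a validation report so that a refreshed star snapshot does not leave a table out of order.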

Harness Architecture & Orchestration

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| ECC | GitHub | | cross-harness, hooks, skills | Cross-harness operator system combining skills, hooks, memory optimization, security scanning, and validation workflows for agentic work. |
| DeerFlow | GitHub | | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |
| AutoGen | GitHub | | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |
| Ruflo | GitHub | | multi-agent, swarm, mcp | Multi-agent orchestration platform for Claude Code with swarms, persistent memory, federation, plugins, and MCP hooks. |
| Agno | GitHub | | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |
| LangGraph | GitHub | | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |
| Semantic Kernel | GitHub | | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |
| OpenAI Agents SDK (Python) | GitHub | | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |
| Symphony | GitHub | | orchestration, control-plane, workflows | Ticket-driven orchestration layer that turns project work into isolated autonomous implementation runs. |
| deepagents | GitHub | | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |
| Archon | GitHub | | workflow-engine, worktrees, validation | Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates. |
| Google ADK (Python) | GitHub | | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |
| PydanticAI | GitHub | | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |
| Microsoft Agent Framework | GitHub | | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |
| Hive | GitHub | | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |
| VoltAgent | GitHub | | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |
| mcp-agent | GitHub | | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |
| Yao | GitHub | | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |
| Open Multi-Agent | GitHub | | multi-agent, dag, tracing | TypeScript-native multi-agent orchestrator that turns goals into task DAGs with parallel execution, MCP integration, and live tracing. |
| Cloudflare Agents | GitHub | | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |
| Flue | GitHub | | typescript, headless, sandbox | TypeScript harness framework for building headless agents with sessions, tools, skills, and pluggable sandboxes. |
| Docker Agent | GitHub | | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |
| NeMo Agent Toolkit | GitHub | | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |
| Scion | GitHub | | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |
| deepagentsjs | GitHub | | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |
| Pydantic AI Harness | GitHub | | capabilities, hooks, pydantic | Official Pydantic AI capability library for composing tools, lifecycle hooks, instructions, and model settings into reusable agent harnesses. |
| hankweave | GitHub | | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |

Context & Working-State Engineering

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| claude-mem | GitHub | | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |
| planning-with-files | GitHub | | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |
| agentmemory | GitHub | | memory, mcp, hooks | Persistent memory server for coding agents using hooks, MCP/REST integration, hybrid search, and shared session recall. |
| Agent Skills for Context Engineering | GitHub | | skills, context, production | Large skill library oriented around context engineering and production agents. |
| Context Mode | GitHub | | context, mcp, session | MCP context optimization server that sandboxes tool output, indexes session events, and restores continuity across agent compactions. |
| Context-Engineering Handbook | GitHub | | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |
| Trellis | GitHub | | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |
| CCPM | GitHub | | planning, github-issues, parallel-execution | Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution. |
| Awesome Context Engineering | GitHub | | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |
| context-space | GitHub | | context, infrastructure, mcp | Infrastructure project focused on context engineering building blocks and MCP-centric integrations. |

Execution Substrates & Sandboxing

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Daytona | GitHub | | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |
| CUA | GitHub | | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |
| Browser Harness | GitHub | | browser, cdp, self-healing | Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight. |
| E2B | GitHub | | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |
| OpenSandbox | GitHub | | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |
| Microsandbox | GitHub | | sandbox, vm, mcp | Rootless local VM sandbox runtime with SDKs, detached long-running sessions, agent skills, and MCP server integration. |
| OpenShell | GitHub | | sandbox, policy, runtime | Safe private runtime for autonomous agents with sandbox lifecycle control and declarative filesystem, network, process, and inference policies. |
| CubeSandbox | GitHub | | microvm, sandbox, e2b-compatible | MicroVM-based sandbox service for AI agents with sub-60ms startup, E2B-compatible APIs, and hardware-level isolation. |
| Sandcastle | GitHub | | sandbox, typescript, branch-strategy | TypeScript library for orchestrating coding agents inside isolated sandboxes with configurable branch strategies. |
| agent-infra sandbox | GitHub | | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |
| Judge0 | GitHub | | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |
| Agent Sandbox | GitHub | | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |
| stakpak/agent | GitHub | | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |
| OSS-Fuzz Gen | GitHub | | fuzzing, security, execution | LLM-powered fuzzing workflows integrated with controlled execution contexts. |
| E2B Desktop Sandbox | GitHub | | desktop, sandbox, computer-use | Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming. |
| AgentBay SDK | GitHub | | cloud-sandbox, computer-use, sdk | Cloud sandbox SDK for agents spanning browser, desktop, mobile, and code execution environments. |
| Tensorlake | GitHub | | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |
| Arrakis | GitHub | | sandbox, microvm, snapshots | Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use. |
| AgentScope Runtime | GitHub | | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |
| SWE-ReX | GitHub | | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |
| sandboxed.sh | GitHub | | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |
| Capsule | GitHub | | wasm, sandbox, task-runtime | Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking. |
| terminal-bench-env | GitHub | | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |

Protocols, Tool Interfaces & Agent Contracts

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| GitHub Spec Kit | GitHub | | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |
| MCP Servers | GitHub | | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |
| Chrome DevTools MCP | GitHub | | mcp, browser, devtools | Official MCP server that gives coding agents Chrome DevTools access for reliable browser automation, debugging, and performance analysis. |
| AGENTS.md | GitHub | | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |
| Model Context Protocol | GitHub | | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |
| directories (rules and MCP indexes) | GitHub | | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |
| Atmosphere | GitHub | | jvm, multi-protocol, governance | JVM runtime for streaming governable AI agents across MCP, A2A, AG-UI, and browser-facing transports. |
| LangChain MCP Adapters | GitHub | | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |
| Microsoft MCP Servers | GitHub | | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |
| GitAgentProtocol | GitHub | | standard, git-native, workflows | Git-native, framework-agnostic standard for defining agents, skills, workflows, tools, and runtime memory in repositories. |
| ACPX | GitHub | | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |
| Microsoft Learn MCP | GitHub | | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |
| IBM MCP | GitHub | | mcp, clients, tooling | IBM collection of MCP servers, clients, and developer tooling. |
| AGENT.md | GitHub | | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |

Evaluation Harnesses & Benchmarks

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Promptfoo | GitHub | | eval, red-team, ci | Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool. |
| DeepEval | GitHub | | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |
| RAGAS | GitHub | | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |
| lm-evaluation-harness | GitHub | | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |
| SWE-bench | GitHub | | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |
| verifiers | GitHub | | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |
| AgentBench | GitHub | | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |
| LangWatch | GitHub | | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |
| EvalScope | GitHub | | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |
| Terminal-Bench | GitHub | | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |
| Harbor | GitHub | | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |
| tau2-bench | GitHub | | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |
| Meta-Harness | GitHub | | harness-search, optimization, terminal-bench | Framework for automated search over task-specific model harnesses, with reference experiments for memory systems and terminal-agent scaffolds. |
| NeMo Gym | GitHub | | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM/agent training and eval. |
| TheAgentCompany | GitHub | | benchmark, workplace, multi-step | Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy. |
| Claw-Eval | GitHub | | benchmark, trajectory, safety | Evaluation harness and benchmark for autonomous agents with human-verified tasks, trajectory auditing, and completion, safety, and robustness rubrics. |
| Inspect Evals | GitHub | | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |
| auto-harness | GitHub | | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |
| WildClawBench | GitHub | | benchmark, harness-comparison, multimodal | In-the-wild benchmark that compares multiple agent harnesses on end-to-end multimodal, coding, safety, and productivity tasks inside a live OpenClaw environment. |
| SWE-Bench Pro | GitHub | | swe, benchmark, long-horizon | Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents. |
| Agent Evaluation | GitHub | | evaluation, testing, ci | AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows. |
| WorkArena | GitHub | | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |
| OpenHands Benchmarks | GitHub | | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |
| WebArena-Verified | GitHub | | web-agent, benchmark, deterministic | Verified web-agent benchmark with deterministic evaluators. |

Observability & Reliability Operations

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Langfuse | GitHub | | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |
| MLflow | GitHub | | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |
| Opik | GitHub | | monitoring, eval, tracing | End-to-end debug/eval/monitoring stack for LLM apps and agent workflows. |
| RagaAI Catalyst | GitHub | | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |
| TensorZero | GitHub | | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |
| Arize Phoenix | GitHub | | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |
| OpenLLMetry | GitHub | | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |
| Helicone | GitHub | | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |
| AgentOps SDK | GitHub | | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |
| Latitude | GitHub | | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |
| Laminar | GitHub | | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |
| claude-code-reverse | GitHub | | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |
| Future AGI | GitHub | | observability, evaluation, guardrails | Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations. |
| OpenInference | GitHub | | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |

Guardrails, Security & Governance

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| LiteLLM | GitHub | | gateway, proxy, guardrails | Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails. |
| Kong | GitHub | | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |
| Parlant | GitHub | | interaction-control, guardrails, customer-agents | Interaction-control harness for customer-facing agents focused on consistent, predictable, and governed LLM behavior. |
| Portkey Gateway | GitHub | | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |
| CAI (Cybersecurity AI) | GitHub | | security, governance, framework | Security-focused agent framework for offensive/defensive AI workflows. |
| OpenAI Realtime Agents | GitHub | | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |
| Plano | GitHub | | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |
| OpenAI CS Agents Demo | GitHub | | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |
| ContextForge | GitHub | | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability. |
| Archestra | GitHub | | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |
| Tracecat | GitHub | | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |
| AgentGateway | GitHub | | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |
| Agent Governance Toolkit | GitHub | | governance, policy, sandboxing | Runtime governance toolkit that deterministically enforces agent policy, identity, sandboxing, and audit controls before actions execute. |
| Haft | GitHub | | governance, decisions, mcp | Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute. |
| ClawManager | GitHub | | control-plane, governance, runtimes | Kubernetes-native control plane for governing agent runtimes, AI gateway access, and reusable skills across multiple agent backends. |
| DashClaw | GitHub | | approvals, policy, audit | Governance layer that intercepts risky agent actions, enforces policy, routes approvals, and records audit-ready decision trails. |

Reference Harness Implementations

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Hermes Agent | GitHub | | memory, skills, subagents | Self-improving agent runtime with memory, skill creation, subagents, scheduled automations, and pluggable terminal backends. |
| OpenCode | GitHub | | terminal, coding-agent, subagents | Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime. |
| Claude Code | GitHub | | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |
| Gemini CLI | GitHub | | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |
| Codex CLI | GitHub | | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |
| OpenHands | GitHub | | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |
| Paperclip | GitHub | | managed-agents, control-plane, governance | Managed-agent control plane with org charts, ticketing, budgets, heartbeats, and audit trails for coordinating agent teams. |
| learn-claude-code | GitHub | | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |
| Cline | GitHub | | coding-agent, mcp, checkpoints | Open-source coding agent spanning IDE, terminal, SDK, and kanban surfaces with shared approvals, MCP, checkpoints, and agent teams. |
| OpenManus | GitHub | | general-agent, autonomy, workflows | Open foundation for broad autonomous agent workflows with coding-heavy use cases. |
| pi | GitHub | | coding-agent, runtime, monorepo | Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack. |
| aider | GitHub | | terminal, repo-map, testing | Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops. |
| CLI-Anything | GitHub | | cli, tool-use, automation | CLI agent system that unifies command-line tool usage in agent loops. |
| Claude Code Plugins: Orchestration and Automation | GitHub | | claude-code, plugins, orchestration | Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators. |
| oh-my-claudecode | GitHub | | claude-code, multi-agent, worktrees | Team-first orchestration layer for Claude Code with staged multi-agent execution, worktree-aware setup, and persistent session artifacts. |
| Multica | GitHub | | managed-agents, coding-agent, runtimes | Managed-agents platform that assigns issues to coding agents, routes execution through runtimes, and compounds reusable skills. |
| NanoClaw | GitHub | | containers, claude-sdk, scheduling | Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization. |
| Vibe Kanban | GitHub | | coding-agent, workspaces, review | Kanban control plane for planning, running, reviewing, and merging work from coding agents in isolated workspaces. |
| Qwen Code | GitHub | | terminal, coding-agent, cli | Terminal-native open-source coding agent tuned for practical dev loops. |
| SuperClaude Framework | GitHub | | config, personas, workflow | Configuration framework adding commands, personas, and method templates to coding agents. |
| Devika | GitHub | | assistant, planning, coding | Open-source coding assistant system for planning and implementing development tasks. |
| SWE-agent | GitHub | | swe, issue-fixing, tooling | Research-grade coding agent that resolves GitHub issues with explicit tooling loops. |
| cmux | GitHub | | macos, workspace, browser | Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control. |
| Compound Engineering | GitHub | | plugins, worktrees, review | Cross-agent engineering plugin that codifies brainstorming, planning, worktree execution, review, and knowledge compounding loops. |
| Aperant | GitHub | | coding-agent, parallel, memory | Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory. |
| Eigent | GitHub | | desktop, cowork, productivity | Open-source desktop cowork agent for autonomous task execution and productivity. |
| OpenHarness | GitHub | | tool-use, memory, multi-agent | Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination. |
| IronClaw | GitHub | | security, wasm, routines | Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory. |
| Superset | GitHub | | worktrees, desktop, parallel | Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace. |
| GitHub Copilot CLI | GitHub | | terminal, coding-agent, mcp | Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context. |
| Open SWE | GitHub | | async, coding-agent, swe | Asynchronous open-source coding agent focused on software issue workflows. |
| Agent Orchestrator | GitHub | | worktrees, parallel, dashboard | Worktree-based orchestration layer for parallel coding agents with autonomous CI and review feedback handling. |
| oh-my-pi | GitHub | | terminal, lsp, subagents | Terminal AI coding agent with edit safety, LSP integration, and subagent support. |
| Paseo | GitHub | | coding-agent, daemon, multi-device | Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows. |
| holaOS | GitHub | | long-horizon, desktop, durable-state | Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state. |
| 1Code | GitHub | | coding-agent, orchestration, worktrees | Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers. |
| OSAURUS | GitHub | | macos, local-first, memory | Native macOS harness for autonomous coding agents with persistent memory. |
| HiClaw | GitHub | | multi-agent, human-in-the-loop, shared-state | Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms. |
| mini-swe-agent | GitHub | | minimal, swe, coding-agent | Minimal coding agent implementation with strong benchmark competitiveness. |
| TinyAGI | GitHub | | team-orchestration, autonomous, workflows | Team-style agent orchestrator for one-person-company style autonomous workflows. |
| Harness | GitHub | | claude-code, meta-factory, agent-teams | Claude Code meta-factory that generates domain-specific agent teams, skills, orchestration patterns, and validation steps from a project description. |
| Devon | GitHub | | pair-programming, coding-agent, autonomous | Open-source pair programmer agent with autonomous coding execution patterns. |
| Open Claude Cowork | GitHub | | desktop, ui, orchestration | Desktop coding cowork assistant that turns agent orchestration into GUI workflows. |
| Maestro | GitHub | | desktop, worktrees, orchestration | Desktop command center for parallel coding agents with worktree isolation, queued tasks, auto-run playbooks, and reusable sessions. |
| Amazon Bedrock AgentCore Samples | GitHub | | aws, runtime, operations | Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers. |
| Google Agents CLI | GitHub | | google-cloud, lifecycle, skills | Google Cloud CLI and skill bundle that gives coding agents scaffold, evaluation, deployment, publishing, and observability workflows. |
| AI-DLC Workflows | GitHub | | workflow-rules, quality-gates, steering | Official AWS workflow ruleset that steers coding agents through adaptive phases, quality gates, and IDE-specific context files. |
| Open Cowork | GitHub | | desktop, sandbox, mcp | Desktop agent app with VM-backed sandboxing, MCP connectors, GUI control, and built-in skill workflows. |
| mini-coding-agent | GitHub | | coding-agent, minimal, approvals | Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts. |
| codex-autorunner | GitHub | | meta-harness, tickets, long-running | Meta-harness that treats tickets as the control plane for long-running coding agents, with queue execution, hub UI, and chat notifications. |

Essential Readings & Ecosystem Maps

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| awesome-claude-code | GitHub | | awesome-list, claude-code, skills | Community collection of Claude Code skills, hooks, and orchestrator tooling. |
| awesome-agentic-patterns | GitHub | | awesome-list, patterns, design | Catalog of reusable agentic design patterns and implementation motifs. |
| awesome-mcp-servers | GitHub | | awesome-list, mcp, tools | Curated MCP server index for tool interoperability in agent systems. |
| awesome-harness-engineering | GitHub | | awesome-list, curation, harness | Curated list focused on harness engineering articles, benchmarks, and implementations. |
| 12 Factor Agents | Reference | - | reading, operations, principles | Operations-oriented principles for building maintainable production agents. |
| Agent Frameworks, Runtimes, and Harnesses, oh my! | Reference | - | reading, langchain, architecture | Clear decomposition of framework vs runtime vs harness responsibilities. |
| An open-source spec for Codex orchestration: Symphony. | Reference | - | reading, openai, orchestration | OpenAI's orchestration write-up on turning issue trackers into always-on control planes for coding agents. |
| Building agents with the Claude Agent SDK | Reference | - | reading, claude, sdk | Claude blog on production-oriented SDK usage for sessions, tools, and orchestration. |
| Building Effective AI Agents | Reference | - | reading, anthropic, agents | Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them. |
| Claude Code auto mode | Reference | - | reading, anthropic, permissions | Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs. |
| Code execution with MCP | Reference | - | reading, anthropic, mcp | Anthropic's design notes on controlled code execution via MCP boundaries. |
| Demystifying Evals for AI Agents | Reference | - | reading, evals, anthropic | Methodology for designing robust agent evals over non-deterministic trajectories. |
| Effective context engineering for AI agents | Reference | - | reading, context, anthropic | Guidance on context-window budgeting and working-state management for agents. |
| Effective harnesses for long-running agents | Reference | - | reading, long-running, anthropic | Practical guide to maintaining state, resumability, and reliability over long agent runs. |
| Evaluating Deep Agents: Our Learnings | Reference | - | reading, langchain, evaluation | LangChain's practical lessons on evaluating stateful and long-horizon agents. |
| Harness design for long-running application development | Reference | - | reading, app-dev, anthropic | Follow-up article on improving long-running app generation through harness structure. |
| Harness Engineering (Martin Fowler) | Reference | - | reading, architecture, fowler | Architectural perspective on harness engineering and entropy control. |
| Harness engineering (OpenAI) | Reference | - | reading, methodology, openai | Field report on building reliable agent-first software via harness constraints and verification. |
| How we built our multi-agent research system | Reference | - | reading, anthropic, multi-agent | Anthropic's architecture write-up on role separation and coordination in multi-agent systems. |
| Improving Deep Agents with harness engineering | Reference | - | reading, langchain, harness | Evidence that harness improvements alone can move benchmark performance. |
| Making Claude Code more secure and autonomous with sandboxing | Reference | - | reading, anthropic, sandboxing | How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls. |
| Quantifying infrastructure noise in agentic coding evals | Reference | - | reading, anthropic, evaluation | Analysis of how infrastructure choices impact coding-agent benchmark outcomes. |
| Scaling Managed Agents: Decoupling the brain from the hands | Reference | - | reading, anthropic, architecture | Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents. |
| Skill Issue: Harness Engineering for Coding Agents | Reference | - | reading, humanlayer, coding-agents | Practical breakdown of why coding-agent quality depends heavily on harness setup. |
| Testing Agent Skills Systematically with Evals | Reference | - | reading, openai, evals | OpenAI Developers guide for turning agent traces into repeatable skill evaluations. |
| The Anatomy of an Agent Harness | Reference | - | reading, architecture, langchain | Conceptual decomposition of agent harness components and their responsibilities. |
| Unrolling the Codex agent loop | Reference | - | reading, openai, architecture | OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs. |
| Writing effective tools for AI agents | Reference | - | reading, anthropic, tools | Best practices for tool interface design so agents call tools safely and reliably. |
| Your Agent Needs a Harness, Not a Framework | Reference | - | reading, inngest, reliability | Argument for reliability-first infrastructure around agents instead of framework-only thinking. |
- Source of truth: `data/projects.yaml`
- Regenerate README files: `python3 scripts/render_readme.py`
- Verify catalog and links: `python3 scripts/verify_catalog.py`
@misc{li2026agentharness,
title={Agent Harness Engineering: A Survey},
author={Li, Junjie and Xiao, Xi and Zhang, Yunbei and Liu, Chen and Zhao, Lin and Liao, Xiaoying and Ji, Yingrui and Wang, Janet and Gu, Jianyang and Ge, Yingqiang and Xu, Weijie and Fang, Xi and Xu, Xiang and Zhao, Tianchen and Kim, Youngeun and Wang, Tianyang and Hamm, Jihun and Krishnaswamy, Smita and Huan, Jun and Reddy, Chandan},
url={https://openreview.net/pdf?id=eONq7FdiHa},
year={2026}
}