
Agent Harness

An agent harness is the non-model layer that turns a language model into a working agent by providing state, tools, planning, execution environments, permissions, feedback loops, memory, observability, and human control points.

Why it matters

Modern agents are not only models. Their reliability depends on the surrounding harness: the runtime that executes tools, the environment that exposes files and browsers, the controls that constrain behavior, and the feedback loops that help the agent self-correct.

Source-backed summary

The term is used across several adjacent communities. OpenAI uses harness engineering for systems, scaffolding, feedback loops, tools, PR review, browser/devtools access, and repo-embedded skills around Codex. Anthropic describes long-running agent harnesses for work across context windows. Martin Fowler frames a user-side coding-agent harness as guides and sensors. LangChain and Rebyte use harness for the capabilities layer around models, tools, files, sandboxes, subagents, and human approval.

Key points

Start broad: discover official docs, engineering blogs, product docs, academic papers, Reddit discussions, and builder posts before ranking evidence.
Explain the difference between the model, the agent product, the framework, and the harness around them.
Track runtime, long-running, user-side, adapter, and evaluation harness meanings separately.
Use source confidence labels to rank adoption, not to exclude community discovery.

The broad scope of an agent harness

Agent harness is not limited to OpenAI Codex or to an app server. In the broadest definition, it is everything around the model that makes agentic work possible: prompts, state, tool execution, permissions, memory, planning, environment access, feedback, evaluation, and handoff artifacts.

Runtime harness: agent loop, tools, filesystem, browser, shell, sandbox, approvals, events, and persistence.
Long-running harness: initializer agents, progress files, work decomposition, context-window handoff, and artifacts for the next session.
User harness: AGENTS.md, repo docs, test matrices, linters, review agents, browser checks, logs, and domain-specific skills.
Adapter harness: a unified interface over Codex, Claude Code, Cursor, OpenCode, Gemini CLI, or other agent executors.
Evaluation harness: reproducible runs, traces, budget controls, model snapshots, scoring, and regression checks.
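
As a rough illustration of the runtime meaning above, the sketch below wires a stand-in model to a tool registry, an approval hook, and an event log. Everything in it is an assumption made for illustration: the `Harness` class, the tool names, the `NEEDS_APPROVAL` policy, and the scripted model are hypothetical and do not reflect any vendor's actual harness.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: names and behaviors are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"<contents of {arg}>",
    "shell": lambda arg: f"<output of `{arg}`>",
}

# Assumed policy: tools listed here need explicit approval before running.
NEEDS_APPROVAL = {"shell"}


@dataclass
class Harness:
    model: Callable[[list[str]], dict]   # stand-in for a real model call
    approve: Callable[[str, str], bool]  # human or policy approval hook
    events: list[dict] = field(default_factory=list)  # trace of each step

    def run(self, task: str, max_steps: int = 5) -> str:
        history = [f"task: {task}"]
        for step in range(max_steps):
            action = self.model(history)
            if action["type"] == "finish":
                self.events.append({"step": step, "finish": action["answer"]})
                return action["answer"]
            name, arg = action["tool"], action["arg"]
            # Permission check happens in the harness, not in the model.
            if name in NEEDS_APPROVAL and not self.approve(name, arg):
                result = "denied by approval policy"
            else:
                result = TOOLS[name](arg)
            self.events.append({"step": step, "tool": name, "result": result})
            history.append(f"{name}({arg}) -> {result}")
        return "stopped: step budget exhausted"


def scripted_model(history: list[str]) -> dict:
    """Toy model: reads a file, tries a shell command, then finishes."""
    script = [
        {"type": "tool", "tool": "read_file", "arg": "README.md"},
        {"type": "tool", "tool": "shell", "arg": "rm -rf build"},
        {"type": "finish", "answer": f"done after {len(history) - 1} tool calls"},
    ]
    return script[len(history) - 1]


# The approval hook rejects everything, so the shell call is blocked.
harness = Harness(model=scripted_model, approve=lambda name, arg: False)
final_answer = harness.run("tidy the repo")
```

The model only proposes actions. The harness decides whether they run, records what happened, and enforces the step budget.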

How major sources use the term

OpenAI emphasizes engineering environments and feedback loops around Codex. Anthropic emphasizes long-running work across many context windows. Martin Fowler emphasizes feedforward guides and feedback sensors that let coding agents self-correct before humans review the result. LangChain lists harness capabilities such as planning, virtual filesystem, permissions, subagents, context management, code execution, human-in-the-loop, skills, and memory. Rebyte and harness.lol use the term for swappable executors and unified interfaces across coding agents.
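
The guides-and-sensors framing can be sketched as a small loop: guides are handed to the agent before it acts, and sensors inspect the result and feed complaints back until none remain. This is a minimal sketch under stated assumptions, not Martin Fowler's implementation; the guide strings, the two sensors, and `toy_agent` are all hypothetical.

```python
from typing import Callable, Optional

# Feedforward guides: context given to the agent before it acts.
# These strings are illustrative, not taken from any real project.
GUIDES = [
    "Functions must have docstrings.",
    "Do not use print for logging.",
]

# A sensor returns a complaint string, or None when the draft passes.
Sensor = Callable[[str], Optional[str]]


def docstring_sensor(code: str) -> Optional[str]:
    return None if '"""' in code else "missing docstring"


def no_print_sensor(code: str) -> Optional[str]:
    return "print call found" if "print(" in code else None


def run_with_sensors(agent, task: str, sensors: list[Sensor], max_rounds: int = 3):
    """Loop until every sensor passes or the round budget runs out."""
    feedback: list[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = agent(task, GUIDES, feedback)
        results = [sensor(draft) for sensor in sensors]
        feedback = [msg for msg in results if msg is not None]
        if not feedback:
            return draft, round_no
    return draft, max_rounds


def toy_agent(task: str, guides: list[str], feedback: list[str]) -> str:
    """Stand-in for a model: the first draft ignores the guides,
    later drafts fix whatever the sensors reported."""
    if not feedback:
        return "def add(a, b):\n    print(a, b)\n    return a + b\n"
    return 'def add(a, b):\n    """Return a + b."""\n    return a + b\n'


final_draft, rounds_used = run_with_sensors(
    toy_agent, "write add()", [docstring_sensor, no_print_sensor]
)
```

Here the self-correction happens inside the loop, so a human reviewer would only see the draft that already passed both sensors.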

Why this is not just an agent framework

A framework is usually a developer library or SDK for composing agent logic. A harness can include a framework, but it also covers the runtime, environment, controls, feedback signals, and the user-side layer of docs, tests, and review checks around the agent. This is why two agents using the same model can perform differently when their harnesses differ.

Related entities

Codex Pets

A visible companion feature that depends on active agent status.

OpenClaw

A personal agent system that exposes gateway, tools, skills, channels, apps, and sandbox choices.

Hermes Agent

A Nous Research personal agent with memory, skills, gateway access, scheduling, tools, and subagents.

Related concepts

AI Coding Companion

The user-facing companion layer around coding agents.

Comparisons

Agent Harness vs Agent Framework

Separate product runtime layers from developer libraries.

Sources

Harness engineering: leveraging Codex in an agent-first world. OpenAI. Source confidence: official-docs.
Unlocking the Codex harness: how we built the App Server. OpenAI. Source confidence: official-docs.
Effective harnesses for long-running agents. Anthropic Engineering. Source confidence: official-docs.
LangChain Docs.
Rebyte Docs.
harness.lol.
Harness engineering for coding agent users. Martin Fowler.
Has anyone here started building a harness for coding agents? Reddit / r/codex. Source confidence: kol-community.

Agent Harness FAQ

Page-level questions for Agent Harness.

Is an agent harness only the Codex App Server?

No. The Codex App Server is one concrete product harness, but agent harness is a broader concept. It can mean the runtime around an agent, a long-running workflow pattern, user-side controls such as docs and tests, a unified adapter over multiple coding agents, or an evaluation harness for reproducible agent runs.
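
The adapter meaning can be illustrated with a small interface that hides which executor does the work. The sketch below is hypothetical throughout: `Executor`, `RunResult`, `EchoExecutor`, and `AdapterHarness` are invented names, and the executors only echo their input rather than calling any real coding agent.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RunResult:
    executor: str
    output: str


class Executor(Protocol):
    """Assumed common shape that every wrapped agent must expose."""
    name: str

    def run(self, prompt: str, workdir: str) -> RunResult: ...


class EchoExecutor:
    """Hypothetical executor; a real adapter would call an agent CLI or API here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, prompt: str, workdir: str) -> RunResult:
        return RunResult(self.name, f"[{self.name}] {prompt} (in {workdir})")


class AdapterHarness:
    """Routes one call signature to whichever executor is selected."""

    def __init__(self) -> None:
        self._executors: dict[str, Executor] = {}

    def register(self, executor: Executor) -> None:
        self._executors[executor.name] = executor

    def run(self, executor_name: str, prompt: str, workdir: str = ".") -> RunResult:
        if executor_name not in self._executors:
            raise KeyError(f"unknown executor: {executor_name}")
        return self._executors[executor_name].run(prompt, workdir)


adapter = AdapterHarness()
adapter.register(EchoExecutor("executor-a"))
adapter.register(EchoExecutor("executor-b"))
result = adapter.run("executor-b", "fix the failing test")
```

Callers depend only on `AdapterHarness.run`, so swapping the executor is a one-argument change.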

Is an agent harness the same as an agent framework?

No. An agent framework is usually a library or SDK for composing agent workflows, while an agent harness is the broader system that makes the agent reliable in a real environment. A harness can include framework code, but it also includes tools, files, sandboxes, memory, permissions, feedback sensors, evaluation, and human approval loops.

Why does a coding agent need a harness?

A coding agent needs a harness because raw model output is not enough to ship reliable software changes. The harness gives the agent context, tools, executable environments, tests, logs, review loops, permissions, and memory across sessions. Better harnesses can improve reliability even when the underlying model stays the same.
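
Memory across sessions can be as simple as a progress file that one session writes and the next one reads. The following is a minimal sketch of that handoff pattern, assuming a JSON file named `progress.json` and a fixed per-session task budget. Both are illustrative choices, and the agent call itself is replaced by a placeholder.

```python
import json
import tempfile
from pathlib import Path


def load_progress(path: Path, tasks: list[str]) -> dict:
    """Read the handoff artifact, or initialize it on the first session."""
    if path.exists():
        return json.loads(path.read_text())
    return {"pending": list(tasks), "done": [], "notes": []}


def run_session(path: Path, tasks: list[str], budget: int) -> dict:
    """One session: do up to `budget` tasks, then write state for the next."""
    progress = load_progress(path, tasks)
    for _ in range(min(budget, len(progress["pending"]))):
        task = progress["pending"].pop(0)
        # A real harness would call the agent here; this only records the task.
        progress["done"].append(task)
        progress["notes"].append(f"finished {task}")
    path.write_text(json.dumps(progress, indent=2))
    return progress


with tempfile.TemporaryDirectory() as tmp:
    progress_file = Path(tmp) / "progress.json"  # hypothetical file name
    plan = ["write tests", "implement parser", "update docs"]
    first = run_session(progress_file, plan, budget=2)   # fresh start
    second = run_session(progress_file, plan, budget=2)  # resumes from file
```

The second session never sees the first session's context. It picks up the remaining work only from what was written to disk.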

Can Reddit or community posts help define agent harness?

Yes, Reddit and community posts are useful for discovering emerging harness use cases, pain points, and comparison questions. They should not outrank official docs for factual claims, but they are valid evidence for search intent, terminology, adoption friction, and what builders are trying in practice.
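
One way to apply this is to sort sources by confidence label instead of filtering on it, so community posts rank lower but stay in the set. The sketch below assumes a numeric ordering for the labels. `official-docs` and `kol-community` appear on this page, while the `unlabeled` bucket and the rank values are assumptions for illustration.

```python
# Lower rank sorts first. The `unlabeled` bucket and the numbers are assumed.
CONFIDENCE_RANK = {"official-docs": 0, "unlabeled": 1, "kol-community": 2}

sources = [
    {
        "title": "Has anyone here started building a harness for coding agents?",
        "label": "kol-community",
    },
    {
        "title": "Effective harnesses for long-running agents",
        "label": "official-docs",
    },
    {
        "title": "Harness engineering for coding agent users",
        "label": "unlabeled",
    },
]


def rank_sources(items: list[dict]) -> list[dict]:
    """Order by confidence without dropping any source."""
    fallback = len(CONFIDENCE_RANK)  # unknown labels sort last
    return sorted(items, key=lambda s: CONFIDENCE_RANK.get(s["label"], fallback))


ranked = rank_sources(sources)
```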