Its Harness
OPEN SOURCE · FULL HARNESS IMPLEMENTED · APACHE 2.0

It's all about the harness.
It's AI's harness.

Not just a workflow. A complete reasoning and control architecture for AI agents.

A workflow tells your AI what to do. A harness makes sure it actually does it — with a world model that tracks beliefs and contradictions, a control layer that knows when to slow down or stop, nine-layer verification before every action, and recovery strategies when things go wrong. Its Harness has delivered the full architecture: draw it on a canvas, run it on any framework, trace every decision.

LangGraph CrewAI Mastra MS Agent Framework
Simple Agent Loop
Input / Caller → LLM Call → Tool Call (↺ loop) → Output
prompt in → answer out
no world model · no control state · no verification

vs

Full Harness (implemented)
Caller State: constraints · clarification · propagation
World Model: beliefs · contradictions · generation_id
Reasoning: evidence · hypotheses (4 sources) · VOI
Control: 5-tier resolver · deadlock detection
Planning: task graph (6-state) · parallel concurrency
Execution: VOI · review gate
Verification: 9 layers
Recovery: 6 strategies
Memory: compression · journal
Learning: experience store · warm start (optional)
Output & Reviewer Pass: contract · 3-lens review
22 nodes · 11 layers · world model + 5-tier control · 9-layer verification

WHY A HARNESS, NOT JUST A WORKFLOW

A harness does what a workflow can't.

A workflow routes prompts from node to node. A harness governs what the AI believes, what it's allowed to do, how it catches its own mistakes, and what it learns for next time. Its Harness delivers the full architecture — canvas and framework adapters as the foundation, and the complete reasoning and control layer on top.

Reasoning, not just prompting

The full harness maintains a world model — typed beliefs, contradictions, hypotheses from four generation sources, and a value-of-information gate before every action. Not just "ask the LLM."
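
To make the world-model idea concrete, here is a minimal sketch in Python. It is not the project's implementation: the class names, fields, and the "latest belief wins, conflict is logged" policy are all assumptions chosen for illustration. It shows typed beliefs, a generation_id that advances on every update, and contradictions recorded rather than silently overwritten.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Belief:
    key: str            # what the belief is about, e.g. "ticket.severity"
    value: Any          # the believed value
    source: str         # where it came from (a node name, in this sketch)
    generation_id: int  # world-model generation in which it was recorded


@dataclass
class WorldModel:
    """Hypothetical world model: beliefs keyed by subject, plus a contradiction log."""

    generation_id: int = 0
    beliefs: dict[str, Belief] = field(default_factory=dict)
    contradictions: list[tuple[Belief, Belief]] = field(default_factory=list)

    def assert_belief(self, key: str, value: Any, source: str) -> Belief:
        # Every update starts a new generation, so stale reads can be detected.
        self.generation_id += 1
        new = Belief(key, value, source, self.generation_id)
        old = self.beliefs.get(key)
        if old is not None and old.value != value:
            # Keep both sides so a later step can resolve the conflict explicitly.
            self.contradictions.append((old, new))
        self.beliefs[key] = new  # assumption: the latest belief wins
        return new


wm = WorldModel()
wm.assert_belief("ticket.severity", "low", source="llm_call")
wm.assert_belief("ticket.severity", "high", source="tool_invoke")
print(wm.generation_id, len(wm.contradictions))  # 2 1
```

Logging the contradiction instead of raising lets a control layer decide what to do with it, which matches the split between world model and control described on this page.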

Control that holds

A five-tier control state resolver (NORMAL → CAUTIOUS → BLOCKED) governs every action. Diagnostic health vectors drive it. Deadlock detection stops it escalating forever. Your AI knows when to slow down.
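
As a rough illustration of how a health vector could drive a control state, the sketch below maps scores to a state and flags a deadlock when the agent stays out of NORMAL for too long. Everything here is assumed for the example: the thresholds, the "worst dimension decides" rule, the health dimension names, and the detector. Only the three states named above are modelled; the other two tiers are not named on this page and are left out.

```python
from enum import IntEnum


class ControlState(IntEnum):
    # Only the three tiers named in the text; the remaining two are not modelled.
    NORMAL = 0
    CAUTIOUS = 1
    BLOCKED = 2


def resolve(health: dict[str, float],
            cautious_at: float = 0.7,
            blocked_at: float = 0.4) -> ControlState:
    """Map a health vector (scores in [0, 1], higher is healthier) to a state.

    The worst dimension decides, so one failing signal cannot be averaged away.
    """
    worst = min(health.values())
    if worst < blocked_at:
        return ControlState.BLOCKED
    if worst < cautious_at:
        return ControlState.CAUTIOUS
    return ControlState.NORMAL


class DeadlockDetector:
    """Flags a deadlock after too many consecutive non-NORMAL steps."""

    def __init__(self, max_stuck_steps: int = 3) -> None:
        self.max_stuck_steps = max_stuck_steps
        self.stuck_steps = 0

    def observe(self, state: ControlState) -> bool:
        if state is ControlState.NORMAL:
            self.stuck_steps = 0
        else:
            self.stuck_steps += 1
        return self.stuck_steps >= self.max_stuck_steps


detector = DeadlockDetector(max_stuck_steps=2)
state = resolve({"evidence_quality": 0.9, "consistency": 0.5})
print(state.name, detector.observe(state))  # CAUTIOUS False
```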

Verification with teeth

Nine verification layers, a pre-execution review gate across five dimensions, an adversarial reviewer pass, and contract validation before every return. Trust, but verify — every time.
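
One way to picture layered verification is as an ordered list of checks that fails closed. The sketch below uses two invented layers ("schema" and "allowlist") as stand-ins; the nine real layers are not enumerated on this page, so the layer names, the action shape, and the function names are all hypothetical.

```python
from typing import Callable, NamedTuple


class Finding(NamedTuple):
    layer: str
    passed: bool
    detail: str


# A layer is a name plus a check that returns (passed, detail).
Layer = tuple[str, Callable[[dict], tuple[bool, str]]]


def schema_layer(action: dict) -> tuple[bool, str]:
    ok = isinstance(action.get("tool"), str) and isinstance(action.get("args"), dict)
    return ok, ("well-formed" if ok else "missing tool or args")


def allowlist_layer(action: dict) -> tuple[bool, str]:
    ok = action.get("tool") in {"search", "summarize"}
    return ok, ("tool allowed" if ok else f"tool {action.get('tool')!r} not allowed")


LAYERS: list[Layer] = [("schema", schema_layer), ("allowlist", allowlist_layer)]


def verify(action: dict, layers: list[Layer] = LAYERS) -> list[Finding]:
    """Run layers in order and stop at the first failure (fail closed)."""
    findings: list[Finding] = []
    for name, check in layers:
        passed, detail = check(action)
        findings.append(Finding(name, passed, detail))
        if not passed:
            break  # later layers never see a rejected action
    return findings


def approved(findings: list[Finding]) -> bool:
    return bool(findings) and all(f.passed for f in findings)


result = verify({"tool": "delete_all", "args": {}})
print([(f.layer, f.passed) for f in result])  # [('schema', True), ('allowlist', False)]
```

Returning the full list of findings, not just a boolean, keeps the reason for a rejection available for tracing.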

Recovery and learning

Six named recovery strategies, typed failure detection, local vs global replanning, and an optional experience store that reuses successful decompositions — not just success-rate priors — across runs.

Canvas + 4 framework adapters + Langfuse observability + full harness: world model, control state resolver, 9-layer verification, recovery, experience store, reviewer pass — 22 harness nodes · 379 tests passing.

FULL HARNESS ARCHITECTURE

Canvas foundation to full reasoning and control.

Its Harness is a complete reasoning and control architecture for AI agents. The canvas and framework adapters are the visible surface — below them sit world model management, diagnostic health vectors, a five-tier control state resolver, nine-layer verification, named recovery strategies, an experience store, and a three-lens reviewer pass. 379 acceptance tests across all four frameworks.

Canvas & execution layer
Canvas with 24 node types (14 execution + 10 harness)
4 framework adapters (LangGraph, CrewAI, Mastra, MAF)
Langfuse observability — harness traces, all 4 runtimes
HITL pause/resume · REST/MCP/A2A deploy
FlowSpec v0.2.0 — open, portable JSON format
Process concepts — pre-seeded task graph scaffolds
Reasoning & control layer
World model · typed beliefs · contradiction detection
5-tier control state resolver · deadlock detection
Pre-execution review gate · 9-layer verification
6 named recovery strategies · typed failure library
Experience store — cross-run structural reuse
Adversarial reviewer pass · output contract validation
Read the full implementation plan → 22 nodes · 11 layers · 379 tests · all harness invariants
Foundation State Architecture
Evidence & Reasoning
World Model & Contradiction
Diagnostics & Control State
Planning & Task Graph
Execution & Verification
Recovery & Memory
Caller State & Escalation
Experience Store
Reviewer Pass & Output Contract
Canvas Integration
E2E Integration & Testing
State & reasoning infrastructure

World model with typed beliefs and generation_id tracking, evidence store, hypothesis system with diversity enforcement, contradiction detection, diagnostic health vectors, five-tier control state resolver, and a six-state task graph.

Execution hardening & learning

VOI estimation, nine-layer verification, pre-execution review gate, reversibility strategy, six named recovery strategies, context compression, experience store for cross-run structural reuse, adversarial reviewer pass, and output contract validation.
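
For readers new to value of information (VOI), the textbook quantity is easy to compute for a small decision. The sketch below calculates the expected value of perfect information, which is an upper bound on what any evidence-gathering step can be worth, and compares it with a cost. The hypotheses, payoffs, and function names are made up for the example, and this is not necessarily how the harness estimates VOI.

```python
def value_of_information(prior: dict[str, float],
                         utility: dict[str, dict[str, float]]) -> float:
    """Expected value of perfect information.

    prior[h]      : probability of hypothesis h
    utility[h][a] : payoff of taking action a when h is true
    """
    actions = {a for payoffs in utility.values() for a in payoffs}
    # Acting now: commit to the single action with the best expected payoff.
    act_now = max(sum(prior[h] * utility[h][a] for h in prior) for a in actions)
    # Acting after learning the truth: pick the best action per hypothesis.
    informed = sum(prior[h] * max(utility[h].values()) for h in prior)
    return informed - act_now


def should_gather_evidence(voi: float, cost: float) -> bool:
    """Gather more evidence only when it can pay for itself."""
    return voi > cost


prior = {"bug": 0.6, "user_error": 0.4}
utility = {
    "bug": {"escalate": 10.0, "reply": -5.0},
    "user_error": {"escalate": -2.0, "reply": 4.0},
}
voi = value_of_information(prior, utility)
print(round(voi, 2), should_gather_evidence(voi, cost=1.0))  # 2.4 True
```

Here acting now is worth 5.2 (always escalate), acting with full knowledge is worth 7.6, so evidence is worth at most 2.4 and a check that costs 1.0 is justified.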

Canvas & full integration

Ten harness canvas node types, diagnostic health dashboard, updated framework adapters with harness-aware tracing in Langfuse, caller state and escalation, process concepts, and end-to-end tests across all four frameworks.

GET STARTED

Up and running in two commands.

Its Harness runs entirely on your machine via Docker. No cloud account, no sign-up — just clone and run.

terminal
# 1. Generate secrets and configure your environment
./setup-env.sh

# 2. Start all services
docker compose up

What starts up

Canvas — localhost:3000

The visual workflow editor. Draw your flows here.

API — localhost:8000

The backend that compiles and runs your flows.

Langfuse — localhost:3001

Monitoring dashboard. Every run is traced automatically.

Nine services in total. setup-env.sh handles all the secrets and configuration automatically — you only need to provide your LLM API key (or skip it and use a free local model instead).

AI MODELS

Use OpenAI, Claude, or run completely free locally.

Its Harness routes all AI calls through LiteLLM — a proxy that works with any model provider. You pick the model in your workflow; the rest is handled automatically.

OpenAI

Add your OPENAI_API_KEY and use gpt-4o or gpt-4o-mini in any flow.

Anthropic

Add your ANTHROPIC_API_KEY and use claude-sonnet, claude-haiku, or claude-opus.

Ollama — free & local

Run mistral, qwen3, or qwen2.5-coder locally. No API key, no cost, no data leaving your machine.

Any custom model

Edit one config file to add any OpenAI-compatible model or endpoint — including self-hosted or fine-tuned models.

Want to try it without an API key? Install Ollama, pull a model (ollama pull mistral), and run ./setup-ollama.sh — it tests all four frameworks end-to-end with no paid account needed.

HOW IT WORKS

One harness definition. Runs on any engine.

Its Harness stores your flow in an open format called FlowSpec. The same spec compiles to LangGraph, CrewAI, Mastra, or MAF with no rewriting. FlowSpec also includes harness node types for world model management, control state, verification gates, and recovery, all backwards-compatible with existing flows.

support-triage.flow.json
{
  "workflow": "support-triage",

  "nodes": [
    {
      "type": "llm_call",
      "prompt": "Classify severity"
    },
    {
      "type": "condition",
      "route": "high_priority"
    },
    {
      "type": "human_review"
    }
  ],

  "telemetry": {
    "provider": "langfuse"
  }
}
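
The "one spec, many engines" idea can be sketched as a loader plus a per-runtime dispatch. The code below parses the support-triage spec shown above and produces a simple build plan for a chosen runtime. The compile_flow function, the runtime keys, and the plan format are hypothetical stand-ins; the real adapters generate framework code and are not reproduced here.

```python
import json

FLOW_JSON = """
{
  "workflow": "support-triage",
  "nodes": [
    {"type": "llm_call", "prompt": "Classify severity"},
    {"type": "condition", "route": "high_priority"},
    {"type": "human_review"}
  ],
  "telemetry": {"provider": "langfuse"}
}
"""

# Hypothetical runtime keys for the four adapters named in the text.
RUNTIMES = ("langgraph", "crewai", "mastra", "maf")


def compile_flow(spec: dict, runtime: str) -> list[str]:
    """Return an ordered build plan for one runtime (a stand-in for real compilation)."""
    if runtime not in RUNTIMES:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return [f"{runtime}:{i}:{node['type']}" for i, node in enumerate(spec["nodes"])]


spec = json.loads(FLOW_JSON)
for runtime in ("langgraph", "mastra"):
    print(compile_flow(spec, runtime))
```

The point of the sketch is that the spec is read once and never edited per runtime; only the dispatch target changes.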

What you can build

Multi-step AI flows

Routing, branching, conditional logic, and tool calling — all drawn visually.

Safety & reliability

Guardrails, retry logic, and human approval checkpoints built right in.

Team collaboration

Edit workflows together, share with your team, and push to production.

Full observability

Langfuse traces give you complete visibility into every run, cost, and failure.

Node taxonomy · execution layer (14 types)
i/o: input · output
llm: llm_call
tools: tool_invoke
agents: agent_role · agent_debate
flow control: condition · parallel_fork · parallel_join
human: hitl_breakpoint
memory: memory_read · memory_write
composition: subgraph · transform

Harness canvas layer (10 harness node types)
world_model · hypothesis_set · control_state · task_graph · verification_gate · recovery_node · evidence_store · experience_store · reviewer_pass · gather_evidence

SUPPORTED FRAMEWORKS

Use the AI framework your team already knows.

Most workflow tools are tied to one framework. Its Harness separates the workflow design from the execution — so you can experiment, migrate, or compare frameworks without rebuilding your flows each time.

LangGraph

Graph-based orchestration for complex, stateful AI pipelines.

CrewAI

Multi-agent workflows where AI "crew members" collaborate on tasks.

Mastra

TypeScript-native AI workflows, built for modern JS teams.

Microsoft Agent Framework

Enterprise AI workflows powered by Semantic Kernel.

| Feature | LangGraph | CrewAI | Mastra | MS Agent Framework |
| --- | --- | --- | --- | --- |
| llm_call · structured output: prompt templates, model overrides, validators. | full | full | full | full |
| agent_role · agent_debate: named personas, multi-agent loops. MAF maps natively to AgentGroupChat. | via synthesis | role native | via synthesis | AgentGroupChat native |
| condition · parallel_fork / join: branching, fan-out, configurable join reducers. | full | full | full | full |
| hitl_breakpoint: pause, inspect, edit state, resume. | full | partial | full | _HitlPause exception |
| memory · semantic (vector): embeddings, top-k, similarity threshold. | via plugin | full | full | full |
| streaming · tokens / updates: token streaming and graph-update events. | full | partial | full | via adapter |
| checkpoint · Postgres / Redis: durable execution, resume from any node. | full | via plugin | full | via Dapr / Orleans |

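
When choosing a runtime, the support table above can be treated as data. The sketch below encodes it as a dictionary and filters for runtimes marked "full" on every feature a flow needs. The dictionary keys and the helper name are invented for this example, and treating anything other than "full" as a non-match is a deliberately strict assumption.

```python
RUNTIMES = ("LangGraph", "CrewAI", "Mastra", "MS Agent Framework")

# Support levels transcribed from the table, in RUNTIMES order.
SUPPORT = {
    "llm_call": ("full", "full", "full", "full"),
    "agents": ("via synthesis", "role native", "via synthesis", "AgentGroupChat native"),
    "condition_parallel": ("full", "full", "full", "full"),
    "hitl_breakpoint": ("full", "partial", "full", "_HitlPause exception"),
    "semantic_memory": ("via plugin", "full", "full", "full"),
    "streaming": ("full", "partial", "full", "via adapter"),
    "checkpoint": ("full", "via plugin", "full", "via Dapr / Orleans"),
}


def runtimes_with_full_support(features: list[str]) -> list[str]:
    """Return runtimes marked 'full' for every requested feature."""
    return [
        runtime
        for i, runtime in enumerate(RUNTIMES)
        if all(SUPPORT[feature][i] == "full" for feature in features)
    ]


print(runtimes_with_full_support(["hitl_breakpoint", "streaming", "checkpoint"]))
# ['LangGraph', 'Mastra']
```

Entries such as "via plugin" or "via adapter" may be perfectly adequate in practice; relax the comparison if partial or indirect support is acceptable for your flow.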
EXAMPLE FLOWS

Five ready-made flows to learn from or build on.

The repo ships with five real, working flows. Each one demonstrates a different set of features — open them in the canvas, run them, and modify them to fit your use case.

| Flow | Best framework | What it shows |
| --- | --- | --- |
| RAG Agent: search a knowledge base, retrieve the most relevant chunks, and generate a grounded answer. | LangGraph | Memory read, semantic search, caching |
| Content Moderation + Human Review: classify content, auto-approve low-risk items, and pause for a human to review anything flagged as high risk. | Mastra | Human-in-the-loop, structured output |
| Parallel Risk Assessment: three specialist AI agents review a document simultaneously, then their findings are merged into one report. | CrewAI | Parallel agents, fan-out/merge |
| Research Crew: a team of AI agents collaborate on a research task, with human approval required before any external tool is used. | CrewAI | Multi-agent, tool approval |
| Debate Agent: multiple AI agents argue different sides of a question until they reach a conclusion; the flow is also exposed as an A2A agent other systems can call. | MS Agent Framework | Multi-agent debate, A2A protocol |

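
The fan-out/merge pattern behind the Parallel Risk Assessment flow can be sketched with plain asyncio. This is not the flow shipped in the repo: the specialist names, the fixed risk scores, and the reducer are placeholders, and the sleep stands in for a model call.

```python
import asyncio

# Placeholder scores so the example is deterministic; a real agent would call a model.
FAKE_SCORES = {"legal": 2, "security": 5, "finance": 1}


async def specialist(name: str, document: str) -> dict:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return {"agent": name, "risk": FAKE_SCORES[name], "chars_read": len(document)}


def merge_max_risk(findings: list[dict]) -> dict:
    """Join reducer: report the highest risk and who contributed."""
    return {
        "max_risk": max(f["risk"] for f in findings),
        "agents": [f["agent"] for f in findings],
    }


async def assess(document: str, reducer=merge_max_risk) -> dict:
    # Fork: all specialists run concurrently; gather preserves argument order.
    findings = await asyncio.gather(
        *(specialist(name, document) for name in FAKE_SCORES)
    )
    # Join: a configurable reducer merges the findings into one report.
    return reducer(list(findings))


report = asyncio.run(assess("contract text"))
print(report)  # {'max_risk': 5, 'agents': ['legal', 'security', 'finance']}
```

Passing the reducer as a parameter mirrors the idea of a configurable join: swapping in an average or a majority vote changes the merge without touching the fork.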
MONITORING

Know exactly what your AI did — and why it did it.

Every prompt, decision, tool call, failure, and slowdown is tracked automatically via Langfuse. Compare runs across frameworks, spot problems fast, and understand exactly what happened — without digging through logs.

Traces also cover harness-specific spans: world model generation_id per step, control state transitions (NORMAL → CAUTIOUS → BLOCKED), diagnostic health vectors, recovery strategy changes, and reviewer-pass findings. Together they give you a complete audit trail of every reasoning decision the agent made.

DEPLOY & INTEGRATE

One click to publish your flow as an API, a Claude tool, or an agent.

Once your flow is ready, deploying it takes a single API call. It goes live simultaneously as three different things — so whatever system needs to use it can call it the way that makes sense.

REST endpoint

Any app can trigger your flow with a standard HTTP POST. No special SDK needed.
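
A call to a deployed flow might look like the sketch below, which uses only the Python standard library. The URL path (/flows/<id>/run) and the payload shape are assumptions made for illustration, not the documented API; only the localhost:8000 address comes from this page. The request is built but not sent, so the example runs without the services started.

```python
import json
import urllib.request


def build_run_request(base_url: str, flow_id: str, payload: dict) -> urllib.request.Request:
    """Build an HTTP POST for a deployed flow (hypothetical endpoint path)."""
    url = f"{base_url.rstrip('/')}/flows/{flow_id}/run"  # assumed path, check the API docs
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request(
    "http://localhost:8000",
    "support-triage",
    {"input": "Checkout page returns a 500 error"},  # assumed payload shape
)
print(req.get_method(), req.full_url)
# POST http://localhost:8000/flows/support-triage/run

# To actually send it once the services are running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```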

MCP tool

Your flow becomes a tool that Claude Desktop (and any MCP client) can call directly in conversation.

A2A agent

Expose your flow as an agent other AI systems can discover and invoke using the open A2A protocol.

Embeddable canvas

Drop the visual editor into your own app with the @itsharness/canvas npm package.

PUBLIC ALPHA

Its Harness is in public alpha.
The full harness is here.

The canvas, adapters, observability layer, and the full reasoning and control architecture are all working and ready to use — world model, control state resolver, nine-layer verification, recovery, experience store, adversarial reviewer pass. Your real flows, bug reports, and contributions shape what gets fixed and what gets built next.

Public alpha — current and planned work is in the open

The canvas-and-adapters layer works but is still alpha: APIs may shift, Docker Compose behaviour may vary, and edge cases in less-common node combinations aren't fully covered yet. The full harness reasoning and control architecture is implemented and tested. Run it, break it, and tell us what you need. Every report shapes what gets prioritised.

track A · run & report

Run it against real flows

One docker compose up starts all nine services. Point it at a real flow — your actual use case, not a toy example. When something breaks, crashes silently, or produces wrong output, open a bug report with your FlowSpec JSON, the runtime you used, and the full error. The more specific, the faster it gets fixed.

Report a bug →
track B · shape the roadmap

Tell us what you need

Missing a node type? A harness feature that would change how you build? A runtime behaviour that doesn't map cleanly? Open a feature request. Describe what you're building and where Its Harness falls short — concrete use cases carry far more weight than abstract asks. Phase priority is influenced by community demand.

Request a feature →
track C · build with it

Build a node pack or contribute a phase

FlowSpec v0.2.0 is stable for third-party node packs (@itsharness/nodes/…). The full harness implementation is public and open for community contribution. The spec, adapter interface, and canvas package are all stable enough to build on today.

Read the spec → View the plan →