Not just a workflow. A complete reasoning and control architecture for AI agents.
A workflow tells your AI what to do. A harness makes sure it actually does it, with a world model that tracks beliefs and contradictions, a control layer that knows when to slow down or stop, nine-layer verification before every action, and recovery strategies when things go wrong. Its Harness delivers the full architecture: draw it on a canvas, run it on any of four supported frameworks, trace every decision.
A workflow routes prompts from node to node. A harness governs what the AI believes, what it's allowed to do, how it catches its own mistakes, and what it learns for next time. Its Harness delivers the full architecture — canvas and framework adapters as the foundation, and the complete reasoning and control layer on top.
The full harness maintains a world model — typed beliefs, contradictions, hypotheses from four generation sources, and a value-of-information gate before every action. Not just "ask the LLM."
A five-tier control state resolver (ranging from NORMAL through CAUTIOUS to BLOCKED) governs every action, driven by diagnostic health vectors. Deadlock detection stops the resolver from escalating forever. Your AI knows when to slow down.
Nine verification layers, a pre-execution review gate across five dimensions, an adversarial reviewer pass, and contract validation before every return. Trust, but verify — every time.
Six named recovery strategies, typed failure detection, local vs global replanning, and an optional experience store that reuses successful decompositions — not just success-rate priors — across runs.
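To make the control-state idea above concrete, here is a minimal sketch, not the project's actual resolver. It models only the three tiers named in the text (the real resolver has five), and the `HealthVector` fields, the thresholds, and the "three BLOCKED states in a row" deadlock rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class ControlState(IntEnum):
    # Only the three tiers named in the text; the other two are not modelled.
    NORMAL = 0
    CAUTIOUS = 1
    BLOCKED = 2


@dataclass
class HealthVector:
    # Hypothetical diagnostic signals, each in [0, 1]; higher is worse.
    contradiction_rate: float
    failure_rate: float


def resolve(health: HealthVector,
            cautious_at: float = 0.3,
            blocked_at: float = 0.7) -> ControlState:
    """Map the worst health signal to a control state (thresholds are assumed)."""
    worst = max(health.contradiction_rate, health.failure_rate)
    if worst >= blocked_at:
        return ControlState.BLOCKED
    if worst >= cautious_at:
        return ControlState.CAUTIOUS
    return ControlState.NORMAL


def is_deadlocked(history: list[ControlState], window: int = 3) -> bool:
    """Assumed rule: deadlock means the last `window` states were all BLOCKED."""
    recent = history[-window:]
    return len(recent) == window and all(s is ControlState.BLOCKED for s in recent)
```

In a sketch like this, the caller would check `is_deadlocked` after each resolution and hand off to a recovery strategy or a human instead of retrying indefinitely.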
Canvas + 4 framework adapters + Langfuse observability + full harness: world model, control state resolver, 9-layer verification, recovery, experience store, reviewer pass — 22 harness nodes · 379 tests passing.
Its Harness is a complete reasoning and control architecture for AI agents. The canvas and framework adapters are the visible surface — below them sit world model management, diagnostic health vectors, a five-tier control state resolver, nine-layer verification, named recovery strategies, an experience store, and a three-lens reviewer pass. 379 acceptance tests across all four frameworks.
World model with typed beliefs and generation_id tracking, evidence store, hypothesis system with diversity enforcement, contradiction detection, diagnostic health vectors, five-tier control state resolver, and a six-state task graph.
VOI estimation, nine-layer verification, pre-execution review gate, reversibility strategy, six named recovery strategies, context compression, experience store for cross-run structural reuse, adversarial reviewer pass, and output contract validation.
Ten harness canvas node types, diagnostic health dashboard, updated framework adapters with harness-aware tracing in Langfuse, caller state and escalation, process concepts, and end-to-end tests across all four frameworks.
Its Harness runs entirely on your machine via Docker. No cloud account, no sign-up — just clone and run.
# 1. Generate secrets and configure your environment
./setup-env.sh
# 2. Start all services
docker compose up
localhost:3000
The visual workflow editor. Draw your flows here.
localhost:8000
The backend that compiles and runs your flows.
localhost:3001
Monitoring dashboard. Every run is traced automatically.
Nine services in total. setup-env.sh handles all the secrets and configuration automatically — you only need to provide your LLM API key (or skip it and use a free local model instead).
Its Harness routes all AI calls through LiteLLM — a proxy that works with any model provider. You pick the model in your workflow; the rest is handled automatically.
Add your OPENAI_API_KEY and use gpt-4o or gpt-4o-mini in any flow.
Add your ANTHROPIC_API_KEY and use claude-sonnet, claude-haiku, or claude-opus.
Run mistral, qwen3, or qwen2.5-coder locally. No API key, no cost, no data leaving your machine.
Edit one config file to add any OpenAI-compatible model or endpoint — including self-hosted or fine-tuned models.
Want to try it without an API key? Install Ollama, pull a model (ollama pull mistral), and run ./setup-ollama.sh — it tests all four frameworks end-to-end with no paid account needed.
Its Harness stores your flow in an open format called FlowSpec. The same spec compiles to LangGraph, CrewAI, Mastra, or MAF — no rewriting. As the harness phases land, FlowSpec gains new node types for world model management, control state, verification gates, and recovery — all backwards-compatible with existing flows.
{
"workflow": "support-triage",
"nodes": [
{
"type": "llm_call",
"prompt": "Classify severity"
},
{
"type": "condition",
"route": "high_priority"
},
{
"type": "human_review"
}
],
"telemetry": {
"provider": "langfuse"
}
}
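Because FlowSpec is plain JSON, a flow like the one above can be inspected with standard tooling. The following is a rough sketch that assumes nothing beyond the fields visible in the example; the `summarize` helper and the `KNOWN_TYPES` set are hypothetical and are not part of Its Harness.

```python
import json

# The example spec from above, embedded so the sketch is self-contained.
FLOW_SPEC = """
{
  "workflow": "support-triage",
  "nodes": [
    {"type": "llm_call", "prompt": "Classify severity"},
    {"type": "condition", "route": "high_priority"},
    {"type": "human_review"}
  ],
  "telemetry": {"provider": "langfuse"}
}
"""

# Only the node types that appear in the example; the real format has more.
KNOWN_TYPES = {"llm_call", "condition", "human_review"}


def summarize(spec_text: str) -> dict:
    """Parse a spec and return its name, node types in order, and telemetry provider."""
    spec = json.loads(spec_text)
    node_types = [node["type"] for node in spec["nodes"]]
    unknown = [t for t in node_types if t not in KNOWN_TYPES]
    if unknown:
        raise ValueError(f"unrecognised node types: {unknown}")
    return {
        "workflow": spec["workflow"],
        "node_types": node_types,
        "telemetry": spec.get("telemetry", {}).get("provider"),
    }


summary = summarize(FLOW_SPEC)
```

A check along these lines can be useful in CI to catch a malformed spec before it reaches a framework adapter.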
Routing, branching, conditional logic, and tool calling — all drawn visually.
Guardrails, retry logic, and human approval checkpoints built right in.
Edit workflows together, share with your team, and push to production.
Langfuse traces give you complete visibility into every run, cost, and failure.
Most workflow tools are tied to one framework. Its Harness separates the workflow design from the execution — so you can experiment, migrate, or compare frameworks without rebuilding your flows each time.
Graph-based orchestration for complex, stateful AI pipelines.
Multi-agent workflows where AI "crew members" collaborate on tasks.
TypeScript-native AI workflows, built for modern JS teams.
Enterprise AI workflows powered by Semantic Kernel.
The repo ships with five real, working flows. Each one demonstrates a different set of features — open them in the canvas, run them, and modify them to fit your use case.
Every prompt, decision, tool call, failure, and slowdown is tracked automatically via Langfuse. Compare runs across frameworks, spot problems fast, and understand exactly what happened — without digging through logs.
As the harness phases land, traces will extend to harness-specific spans: world model generation_id per step, control state transitions (NORMAL → CAUTIOUS → BLOCKED), diagnostic health vectors, recovery strategy changes, and reviewer-pass findings. Together these give you a complete audit trail of every reasoning decision the agent made.
Once your flow is ready, deploying it takes a single API call. It goes live simultaneously as three interfaces (an HTTP endpoint, an MCP tool, and an A2A agent), so whatever system needs to use it can call it the way that makes sense.
Any app can trigger your flow with a standard HTTP POST. No special SDK needed.
Your flow becomes a tool that Claude Desktop (and any MCP client) can call directly in conversation.
Expose your flow as an agent other AI systems can discover and invoke using the open A2A protocol.
Drop the visual editor into your own app with the @itsharness/canvas npm package.
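As an illustration of the HTTP trigger described above, the sketch below builds a standard POST request using only the Python standard library. The route `/flows/<flow_id>/run` and the payload shape are assumptions made for this example; check the project's API documentation for the real path and body.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, flow_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a deployed flow."""
    # Hypothetical route; the actual endpoint may differ.
    url = f"{base_url.rstrip('/')}/flows/{flow_id}/run"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# localhost:8000 is the backend address listed earlier; the payload is made up.
req = build_trigger_request(
    "http://localhost:8000",
    "support-triage",
    {"input": "Customer reports a checkout outage"},
)
# To actually send it: urllib.request.urlopen(req)
```

Any HTTP client works equally well here; the point is only that no special SDK is required.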
The canvas, adapters, observability layer, and the full reasoning and control architecture are all working and ready to use — world model, control state resolver, nine-layer verification, recovery, experience store, adversarial reviewer pass. Your real flows, bug reports, and contributions shape what gets fixed and what gets built next.
The canvas-and-adapters layer is working but still alpha: APIs may shift, Docker Compose behaviour may vary, and edge cases in less-common node combinations aren't fully covered yet. The full harness reasoning and control architecture is implemented and tested. Run it, break it, and tell us what you need. Every report shapes what gets prioritised.