tracehouse — live trace LIVE
+0ms USER refactor the auth module
+12ms THINK planning the change, need to check current session handling first
+38ms AGENT I'll read the current auth file first
+41ms TOOL Read auth.ts
+89ms RESULT 284 lines, exports: login, logout, verifyToken
+94ms THINK the token guard uses req.session.exp — that's stale
+102ms AGENT patching the token guard to use req.session.expiresAt
+108ms TOOL Edit auth.ts
+143ms TOOL Bash npm test
+198ms RESULT FAIL: 3 failing, 2 erroring
+201ms AGENT fixing the session middleware
+214ms TOOL Edit middleware/session.ts
+247ms TOOL Bash npm test
+312ms RESULT PASS: all green
The flight recorder for AI coding agents

Record every
move your
agent makes.

Tracehouse tails your Claude Code and Codex transcripts live.
Every prompt, thought, tool call, result — in one waterfall.
Share any run. Compare any two. Export to HuggingFace.

$ curl -fsSL https://tracehouse.ai/install.sh | bash or pip install tracehouse-sdk
Claude Code + Codex CLI · Zero-dependency SDK · Free to start

Trace waterfalls.
Share links. Compare runs.

Trace waterfalls

Every prompt, thought, tool call and result — nested by subagent, on a real timeline. You see exactly what the model saw, in order.

Shareable trace links

One click turns a run into a public link. Anyone can replay the full waterfall — no account needed. Teammates, stakeholders, your future self.

Diff two runs

Side-by-side comparison. Flag the bad ones. See exactly where it went sideways — down to the specific tool call and timestamp.

Runs to datasets

Turn agent runs into shareable HuggingFace datasets in a click. Your traces become training data. Ship better agents, faster.

Zero-dependency SDK

Drop into any Python script. Works with Claude Code, Codex, and any LLM agent. No framework lock-in. Full observability in three lines of code.

Local-first

The local agent tails your transcripts. Your code never leaves your machine. We ship the trace — you control what gets recorded.

Your teammates should
be able to replay it.

"It failed" is not a bug report. Share the trace. Anyone opens the link, sees the full waterfall, and understands exactly what happened. No account. No setup. Just a link.

tracehouse.ai/d/refactor-auth-run-7a3f
RUN refactor-auth · 4m 12s · 23 spans
+0msUSERrefactor the auth module
+12msTHINKplanning the change...
+41msTOOLRead auth.ts
+89msRESULT284 lines, exports: login, logout...
+198msRESULTFAIL: 3 failing, 2 erroring
+214msTOOLEdit middleware/session.ts
+312msRESULTPASS: all green
diff — auth refactor run A vs run B · 3 changes
A — failed
+89ms TOOL Read auth.ts 36ms
+198ms RESULT FAIL — 3 failing
B — passed
+82ms TOOL Read auth.ts 31ms
+287ms RESULT PASS — all green
run B added Edit middleware/session.ts at +214ms — the fix. A skipped it entirely.

Flag the bad runs.
Ship the good ones.

Diff any two runs side by side. See the exact moment a change broke something — or fixed it. Turn regression hunting from intuition into precision.