Canary deployments for LLM prompts and models. Split traffic, evaluate quality with an AI judge, and auto-rollback when things break — before users notice.
Live Traffic Flow
Three-step loop running automatically, 24/7.
Route a percentage of requests to a new prompt or model version. Users see no difference — Repath transparently proxies both.
An LLM judge scores every response on your criteria — helpfulness, accuracy, completeness. Async, never slows down requests.
Every 30 seconds, the controller checks scores. Quality good? Advance to more traffic. Quality dropped? Instant rollback.
Traffic advances through gates automatically when quality holds
You can't detect "responses got 23% worse" by watching error rates. Zero HTTP errors. Zero exceptions. Just degraded output your users silently abandon.
When the new prompt degrades, Repath catches it instantly
OpenAI and Anthropic push model updates without notice. Your carefully-tuned prompts degrade overnight — zero errors in logs.
Without quality scoring, prompt regressions hide for days. By the time users complain, the damage is done.
Feature flags control deployment, not quality. They can't answer: 'is the new version producing better outputs?'
Purpose-built for AI deployment safety. Not feature flags bolted onto LLMs.
5% → 25% → 50% → 100% with quality gates at each step. Configurable weights, durations, thresholds.
GPT-4o-mini scores every response. Define criteria in plain English. Fully async — never slows requests.
Quality < threshold? Traffic returns to baseline in <500ms. No human intervention needed.
Rust gateway with lock-free config. 50K+ req/s per instance. No GC, no runtime overhead.
Every advance/rollback decision logged with exact scores. Full visibility into why actions were taken.
Docker Compose — one command. Data stays on your infra. No vendor lock-in.
Change base_url in your OpenAI client. One line. No SDK, no wrappers, no code changes.
Live traffic split, quality graphs, decision timeline. See everything as it happens.
Declare rollouts as code. Version control your deployment strategy. GitOps-ready.
The only tool that auto-rolls back based on semantic quality — not just error rates.
Not just error rates — we evaluate whether responses are actually good using LLM-as-judge scoring on your criteria.
Change base_url in your OpenAI client. That's the entire integration. No SDK, no wrapper functions.
Rollback triggers on 'responses got less helpful' — not just latency spikes or error rate increases.
OpenAI today. Anthropic and Gemini coming Q3 2026. Any OpenAI-compatible endpoint works now.
Rust gateway, Python evaluators, Next.js dashboard. Not a prototype.
Where we are and what's coming. Building in public.
Clone, configure, run. No account needed.
Know if your prompt change is better or worse — before your users do.