Progressive Delivery for AI

Ship AI changes without the guesswork.

Canary deployments for LLM prompts and models. Split traffic, evaluate quality with an AI judge, and auto-rollback when things break — before users notice.

Live Traffic Flow

[Diagram: Your App → Repath Gateway (routes + evaluates) → 90% to Baseline (current prompt), 10% to Candidate (new prompt); the LLM Judge scores responses]

How It Works

Three-step loop running automatically, 24/7.

01

Split Traffic

Route a percentage of requests to a new prompt or model version. Users see no difference — Repath transparently proxies both.
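
The split described above can be sketched in a few lines. This is a minimal illustration, not Repath's gateway code: the function name `pick_variant`, the SHA-256 bucketing, and the choice to key on a request ID are all assumptions made for the example.

```python
import hashlib


def pick_variant(request_id: str, candidate_percent: float) -> str:
    """Return "candidate" for roughly candidate_percent of IDs, else "baseline"."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    # Hash the ID into a stable bucket in [0, 10000) so the same ID
    # always maps to the same variant (hypothetical design choice).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "candidate" if bucket < candidate_percent * 100 else "baseline"


def candidate_share(n: int, candidate_percent: float) -> float:
    """Fraction of n synthetic request IDs that land on the candidate."""
    hits = sum(
        pick_variant(f"req-{i}", candidate_percent) == "candidate"
        for i in range(n)
    )
    return hits / n
```

Calling `candidate_share(20_000, 10)` should return a value close to 0.10. A random draw per request would also work; hashing is used here only so that the example is deterministic.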

02

Evaluate Quality

An LLM judge scores every response on your criteria — helpfulness, accuracy, completeness. Async, never slows down requests.
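
To show why asynchronous scoring does not add latency, here is a small sketch in which the request handler only enqueues the response and returns. It assumes an in-process `asyncio.Queue` as a stand-in for the evaluation queue, and `stub_judge` is a hypothetical placeholder for a real LLM judge call; none of these names come from Repath.

```python
import asyncio


async def stub_judge(response_text: str) -> float:
    """Hypothetical stand-in for an LLM judge; returns a score in [0, 1]."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return 1.0 if "refund" in response_text.lower() else 0.4


async def handle_request(prompt: str, queue: asyncio.Queue) -> str:
    response = f"Echo: {prompt}"
    queue.put_nowait(response)  # enqueue for scoring; do not wait for it
    return response


async def eval_worker(queue: asyncio.Queue, scores: list) -> None:
    while True:
        response = await queue.get()
        scores.append(await stub_judge(response))
        queue.task_done()


async def demo():
    queue: asyncio.Queue = asyncio.Queue()
    scores: list = []
    worker = asyncio.create_task(eval_worker(queue, scores))
    replies = [await handle_request(p, queue) for p in ("Refund status?", "Hello")]
    scored_at_reply_time = len(scores)  # replies were returned before any scoring
    await queue.join()  # wait until the worker has scored everything
    worker.cancel()
    return replies, scored_at_reply_time, scores
```

Running `asyncio.run(demo())` returns both replies with zero scores recorded at reply time; the scores arrive afterwards from the background worker.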

03

Decide & Act

Every 30 seconds, the controller checks scores. Quality good? Advance to more traffic. Quality dropped? Instant rollback.
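
One controller check can be sketched as a pure function over the candidate's recent scores. The traffic stages (5, 25, 50, 100) come from this page; the thresholds, the minimum sample count, and the name `decide` are assumptions chosen for illustration, not Repath defaults.

```python
from statistics import mean

STAGES = [5, 25, 50, 100]  # candidate traffic percentages per gate


def decide(stage_index, candidate_scores,
           rollback_below=0.75, advance_at=0.85, min_samples=20):
    """Return (action, candidate_percent) for one controller check.

    All thresholds here are hypothetical example values.
    """
    current = STAGES[stage_index]
    if len(candidate_scores) < min_samples:
        return ("hold", current)  # not enough evidence yet
    avg = mean(candidate_scores)
    if avg < rollback_below:
        return ("rollback", 0)  # send all traffic back to baseline
    if avg >= advance_at and stage_index < len(STAGES) - 1:
        return ("advance", STAGES[stage_index + 1])
    return ("hold", current)
```

For example, `decide(0, [0.9] * 20)` advances from 5% to 25%, while `decide(1, [0.6] * 20)` rolls back to 0%. Requiring a minimum number of samples is one way to keep a single bad score from triggering a rollback.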

Progressive Rollout in Action

Traffic advances through gates automatically when quality holds

[Diagram: 5% (Start) → 25% (Gate 1) → 50% (Gate 2) → 100% (Full)]

AI models break silently

You can't detect "responses got 23% worse" by watching error rates. Zero HTTP errors. Zero exceptions. Just degraded output your users silently abandon.

Quality Score Over Time

When the new prompt degrades, Repath catches it at the next controller check

[Chart: quality scores over time (0.92, 0.89, 0.91, 0.93, 0.88, 0.85, 0.72, 0.65), with a rollback threshold line and an "Auto-rollback triggered" marker]
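
Using the scores plotted above, the sketch below shows how a rolling average could flag this kind of silent degradation. The 0.80 threshold and the three-sample window are assumptions for the example (the page does not state the actual threshold), and `first_breach` is a hypothetical helper.

```python
from collections import deque
from typing import Iterable, Optional

PLOTTED_SCORES = [0.92, 0.89, 0.91, 0.93, 0.88, 0.85, 0.72, 0.65]


def first_breach(scores: Iterable[float], threshold: float,
                 window: int = 3) -> Optional[int]:
    """Index at which the rolling mean first drops below threshold, else None."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for index, score in enumerate(scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window < threshold:
            return index
    return None


print(first_breach(PLOTTED_SCORES, 0.80))            # 7
print(first_breach(PLOTTED_SCORES, 0.80, window=1))  # 6
```

With a window of three, the breach is flagged at the last sample; with no smoothing it is flagged one sample earlier. A wider window trades detection speed for resistance to one-off low scores.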

Provider Updates Break You

Providers such as OpenAI and Anthropic can update hosted models with little notice. Your carefully tuned prompts can degrade overnight, with zero errors in the logs.

Weeks Before You Notice

Without quality scoring, prompt regressions can hide for days or weeks. By the time users complain, the damage is done.

Feature Flags Can't Help

Feature flags control deployment, not quality. They can't answer: 'is the new version producing better outputs?'

Everything You Need

Purpose-built for AI deployment safety. Not feature flags bolted onto LLMs.

Canary Deployments

5% → 25% → 50% → 100% with quality gates at each step. Configurable weights, durations, thresholds.

LLM-as-Judge

GPT-4o-mini scores every response. Define criteria in plain English. Fully async — never slows requests.

Auto-Rollback

Quality < threshold? Traffic returns to baseline in <500ms. No human intervention needed.

Sub-2ms Overhead

Rust gateway with lock-free config. 50K+ req/s per instance. No GC, no runtime overhead.

Audit Trail

Every advance/rollback decision logged with exact scores. Full visibility into why actions were taken.

Self-Hosted

Docker Compose — one command. Data stays on your infra. No vendor lock-in.

Drop-in Integration

Change base_url in your OpenAI client. One line. No SDK, no wrappers, no other code changes.

Real-time Dashboard

Live traffic split, quality graphs, decision timeline. See everything as it happens.

YAML Config

Declare rollouts as code. Version control your deployment strategy. GitOps-ready.

Why Repath?

The only tool that auto-rolls back based on semantic quality — not just error rates.

Semantic quality evaluation

Not just error rates — we evaluate whether responses are actually good using LLM-as-judge scoring on your criteria.

One-line integration

Change base_url in your OpenAI client. That's the entire integration. No SDK, no wrapper functions.

Quality drives rollout decisions

Rollback triggers on 'responses got less helpful' — not just latency spikes or error rate increases.

Works with any LLM provider

OpenAI and any OpenAI-compatible endpoint work today. Native Anthropic and Gemini support is planned for Q3 2026.

vs. Traditional Tools

Feature | Repath | Traditional tools
Auto-rollback on quality | Built-in, free | Enterprise only ($50K+/yr)
LLM-as-judge scoring | Drives decisions | Observability only
Integration effort | 1 line change | SDK + config + metrics
Self-hosted option | Docker Compose | SaaS only
AI-native design | Purpose-built | Feature flags + bolt-on

Built for Production

Rust gateway, Python evaluators, Next.js dashboard. Not a prototype.

Rust Gateway: Axum + Tokio
Rust Controller: 30s state machine
Python Evaluators: Async workers
Next.js Dashboard: React 19
PostgreSQL 16: Persistent state
Redis Streams: Eval queue
Docker Compose: One command
CLI: Rust-powered

Roadmap

Where we are and what's coming. Building in public.

PHASE 1 · COMPLETE · Q2 2026

Core Platform

  • OpenAI-compatible transparent proxy
  • Canary traffic splitting
  • LLM-as-judge quality evaluation
  • Auto advance & rollback controller
  • Real-time dashboard
  • CLI for rollout management
  • PostgreSQL + Redis infrastructure
  • Docker Compose one-command startup
PHASE 2 · IN PROGRESS · Q3 2026

Multi-Provider & Cloud

  • Anthropic Claude support
  • Google Gemini support
  • Shadow testing mode
  • Repath Cloud (managed hosting)
  • Webhook & Slack alerts
  • Cost-aware routing
PHASE 3 · Q4 2026

Advanced Evals

  • Drift detection
  • Human feedback loops
  • A/B with significance
  • Custom judge models
PHASE 4 · 2027

Enterprise

  • SSO / SAML
  • Team RBAC
  • Kubernetes Helm chart
  • Enterprise SLA & support

Running in 60 Seconds

Clone, configure, run. No account needed.

1

Clone & configure

git clone https://github.com/repathhq/repath
cd repath && cp .env.example .env
# Add your OPENAI_API_KEY to .env
2

Start all services

docker compose up
# PostgreSQL, Redis, Gateway, Controller, Evaluator, Dashboard
# Ready in ~30 seconds
3

Point your app

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080/v1",  # point at the Repath gateway
)
4

Create a canary rollout

repath rollout create -f examples/demo-canary.yaml
repath rollout status demo-customer-support --watch
Repath

Stop shipping AI blind.

Know if your prompt change is better or worse — before your users do.