Progressive Delivery for AI

Ship AI changes
without the guesswork.

Canary deployments for LLM prompts and models. Split traffic, evaluate quality with an AI judge, and auto-rollback when things break — before users notice.


Live Traffic Flow

Your App → Repath Gateway (Routes + Evaluates) → 90% Baseline (current prompt) / 10% Candidate (new prompt) → LLM Judge

How It Works

Three-step loop running automatically, 24/7.

Split Traffic

Route a percentage of requests to a new prompt or model version. Users see no difference — Repath transparently proxies both.
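
As a rough illustration of percentage-based splitting, here is a minimal Python sketch. It assumes a hash-based bucketing scheme keyed on a request ID. The page does not say how Repath picks a variant, so the function name `pick_variant` and the bucketing approach are hypothetical.

```python
import hashlib


def pick_variant(request_id: str, candidate_percent: float) -> str:
    """Deterministically map a request ID to 'baseline' or 'candidate'.

    Hypothetical helper: hash bucketing is an assumption, not Repath's
    documented routing method.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # candidate_percent=10 sends buckets 0..999 to the candidate.
    return "candidate" if bucket < candidate_percent * 100 else "baseline"


if __name__ == "__main__":
    sample = [pick_variant(f"req-{i}", 10) for i in range(10_000)]
    share = sample.count("candidate") / len(sample)
    print(f"candidate share: {share:.3f}")
```

Hashing keeps the choice stable for a given ID, which makes a split reproducible when debugging. A random draw per request would give the same proportions without that property.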

Evaluate Quality

An LLM judge scores every response on your criteria — helpfulness, accuracy, completeness. Async, never slows down requests.
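
To show why asynchronous scoring does not add request latency, here is a minimal Python sketch. It uses an in-process `asyncio.Queue` as a stand-in for the evaluation queue and a stub in place of a real judge call. The names `stub_judge`, `handle_request`, and `eval_worker` are hypothetical and do not come from Repath.

```python
import asyncio


async def stub_judge(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an LLM judge; returns a score in [0, 1]."""
    await asyncio.sleep(0.01)  # simulate judge latency
    return 1.0 if response.strip() else 0.0


async def handle_request(prompt: str, queue: asyncio.Queue) -> str:
    response = f"echo: {prompt}"  # placeholder for the proxied LLM response
    queue.put_nowait((prompt, response))  # enqueue and return without waiting
    return response


async def eval_worker(queue: asyncio.Queue, scores: list) -> None:
    """Score queued responses in the background."""
    while True:
        prompt, response = await queue.get()
        scores.append(await stub_judge(prompt, response))
        queue.task_done()


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    scores: list = []
    worker = asyncio.create_task(eval_worker(queue, scores))
    replies = [await handle_request(f"q{i}", queue) for i in range(3)]
    scored_before_drain = len(scores)  # replies are back; nothing scored yet
    await queue.join()  # wait for the background worker to catch up
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return {
        "replies": replies,
        "scored_before_drain": scored_before_drain,
        "scores": scores,
    }


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The request path only enqueues work, so all three replies return before any score exists. The worker fills in the scores afterwards.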

Decide & Act

Every 30 seconds, the controller checks scores. Quality good? Advance to more traffic. Quality dropped? Instant rollback.
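
One way to picture a single controller tick is the Python sketch below. It uses the 5% → 25% → 50% → 100% steps listed elsewhere on this page. The threshold of 0.8, the minimum sample count, and the names `Decision` and `decide` are assumptions for illustration, not Repath's actual controller logic.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

STAGES = [5, 25, 50, 100]  # candidate traffic percentages, in rollout order


@dataclass(frozen=True)
class Decision:
    action: str  # "advance", "hold", or "rollback"
    candidate_percent: int


def decide(
    current_percent: int,
    scores: List[float],
    threshold: float = 0.8,  # hypothetical quality threshold
    min_samples: int = 20,   # hypothetical minimum before acting
) -> Decision:
    """Evaluate one tick; current_percent must be one of STAGES."""
    if len(scores) < min_samples:
        # Too little evidence: keep the current split.
        return Decision("hold", current_percent)
    if mean(scores) < threshold:
        # Quality dropped: send all traffic back to the baseline.
        return Decision("rollback", 0)
    stage = STAGES.index(current_percent)
    if stage == len(STAGES) - 1:
        return Decision("hold", current_percent)  # already at full traffic
    return Decision("advance", STAGES[stage + 1])


if __name__ == "__main__":
    print(decide(5, [0.9] * 20))
    print(decide(25, [0.6] * 20))
```

In this sketch a scheduler would call `decide` every 30 seconds with the scores collected for the current stage.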

Progressive Rollout in Action

Traffic advances through gates automatically when quality holds

Start → Gate 1 (25%) → Gate 2 (50%) → Full (100%)

AI models break silently

You can't detect "responses got 23% worse" by watching error rates. Zero HTTP errors. Zero exceptions. Just degraded output your users silently abandon.

Quality Score Over Time

When the new prompt degrades, Repath catches it instantly

Chart: quality scores of 0.92, 0.89, 0.91, 0.93, 0.88, 0.85, 0.72, 0.65, with a rollback threshold line and an "Auto-rollback triggered" marker.
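
To make the threshold check concrete, the Python sketch below runs a rolling mean over the eight scores shown in the chart. The threshold value (0.75) and window size (3) are assumptions chosen for illustration. The page does not state the values Repath uses, and `first_rollback_index` is a hypothetical helper.

```python
from collections import deque
from typing import Iterable, Optional

# Scores taken from the chart on this page.
CHART_SCORES = [0.92, 0.89, 0.91, 0.93, 0.88, 0.85, 0.72, 0.65]


def first_rollback_index(
    scores: Iterable[float], threshold: float, window: int
) -> Optional[int]:
    """Return the index at which the rolling mean first drops below threshold.

    Returns None if the rolling mean never falls below the threshold.
    """
    recent = deque(maxlen=window)
    for index, score in enumerate(scores):
        recent.append(score)
        # Only judge once the window is full, to avoid reacting to one sample.
        if len(recent) == window and sum(recent) / window < threshold:
            return index
    return None


if __name__ == "__main__":
    print(first_rollback_index(CHART_SCORES, threshold=0.75, window=3))
```

A larger window smooths out noisy judge scores at the cost of reacting later. With a window of 1 the same data would trigger one sample earlier.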

Provider Updates Break You

OpenAI and Anthropic push model updates without notice. Your carefully-tuned prompts degrade overnight — zero errors in logs.

Weeks Before You Notice

Without quality scoring, prompt regressions can hide for days or weeks. By the time users complain, the damage is done.

Feature Flags Can't Help

Feature flags control deployment, not quality. They can't answer: "Is the new version producing better outputs?"

Everything You Need

Purpose-built for AI deployment safety. Not feature flags bolted onto LLMs.

Canary Deployments

5% → 25% → 50% → 100% with quality gates at each step. Configurable weights, durations, thresholds.

LLM-as-Judge

GPT-4o-mini scores every response. Define criteria in plain English. Fully async — never slows requests.

Auto-Rollback

Quality < threshold? Traffic returns to baseline in <500ms. No human intervention needed.

Sub-2ms Overhead

Rust gateway with lock-free config. 50K+ req/s per instance. No GC, no runtime overhead.

Audit Trail

Every advance/rollback decision logged with exact scores. Full visibility into why actions were taken.
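
As an illustration of what one logged decision could contain, here is a minimal Python sketch that serializes a record to a JSON line. The field names and the JSON format are assumptions. The page does not describe Repath's actual audit schema or storage layout.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical audit record; every field name is an assumption."""
    rollout: str
    action: str        # "advance" or "rollback"
    from_percent: int
    to_percent: int
    mean_score: float
    threshold: float
    sample_count: int
    decided_at: str    # ISO 8601 timestamp


def to_log_line(record: DecisionRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = DecisionRecord(
        rollout="demo-customer-support",
        action="rollback",
        from_percent=25,
        to_percent=0,
        mean_score=0.65,
        threshold=0.75,
        sample_count=40,
        decided_at="2026-01-01T00:00:00+00:00",
    )
    print(to_log_line(record))
```

Storing the score, threshold, and sample count next to the action is what lets a reader later reconstruct why a decision was made.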

Self-Hosted

Docker Compose — one command. Data stays on your infra. No vendor lock-in.

Drop-in Integration

Change base_url in your OpenAI client. One line. No SDK, no wrappers, no other code changes.

Real-time Dashboard

Live traffic split, quality graphs, decision timeline. See everything as it happens.

YAML Config

Declare rollouts as code. Version control your deployment strategy. GitOps-ready.

Why Repath?

The only tool that auto-rolls back based on semantic quality — not just error rates.

Semantic quality evaluation

Not just error rates — we evaluate whether responses are actually good using LLM-as-judge scoring on your criteria.

One-line integration

Change base_url in your OpenAI client. That's the entire integration. No SDK, no wrapper functions.

Quality drives rollout decisions

Rollback triggers on 'responses got less helpful' — not just latency spikes or error rate increases.

Works with any LLM provider

OpenAI and any OpenAI-compatible endpoint work today. Native Anthropic and Gemini support is planned for Q3 2026.

vs. Traditional Tools

Capability | Repath | Traditional tools
Auto-rollback on quality | Built-in, free | Enterprise only ($50K+/yr)
LLM-as-judge scoring | Drives decisions | Observability only
Integration effort | 1 line change | SDK + config + metrics
Self-hosted option | Docker Compose | SaaS only
AI-native design | Purpose-built | Feature flags + bolt-on

Built for Production

Rust gateway, Python evaluators, Next.js dashboard. Not a prototype.

Rust Gateway: Axum + Tokio

Rust Controller: 30s state machine

Python Evaluators: Async workers

Next.js Dashboard: React 19

PostgreSQL 16: Persistent state

Redis Streams: Eval queue

Docker Compose: One command

CLI: Rust-powered

Roadmap

Where we are and what's coming. Building in public.

PHASE 1 (COMPLETE, Q2 2026)

Core Platform

OpenAI-compatible transparent proxy

Canary traffic splitting

LLM-as-judge quality evaluation

Auto advance & rollback controller

Real-time dashboard

CLI for rollout management

PostgreSQL + Redis infrastructure

Docker Compose one-command startup

PHASE 2 (IN PROGRESS, Q3 2026)

Multi-Provider & Cloud

Anthropic Claude support

Google Gemini support

Shadow testing mode

Repath Cloud (managed hosting)

Webhook & Slack alerts

Cost-aware routing

PHASE 3 (Q4 2026)

Advanced Evals

Drift detection
Human feedback loops
A/B with significance
Custom judge models

PHASE 4 (2027)

Enterprise

SSO / SAML
Team RBAC
Kubernetes Helm chart
Enterprise SLA & support

Running in 60 Seconds

Clone, configure, run. No account needed.

Clone & configure

git clone https://github.com/repathhq/repath
cd repath && cp .env.example .env
# Add your OPENAI_API_KEY to .env

Start all services

docker compose up
# PostgreSQL, Redis, Gateway, Controller, Evaluator, Dashboard
# Ready in ~30 seconds

Point your app

from openai import OpenAI

# Point the standard OpenAI client at the local Repath gateway.
client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:8080/v1",
)

Create a canary rollout

repath rollout create -f examples/demo-canary.yaml
repath rollout status demo-customer-support --watch

Stop shipping AI blind.

Know if your prompt change is better or worse — before your users do.

