
Model drift.
Made visible.

Daily signals for empty outputs, constraint failures, and factual drift across OpenAI, Claude, and Gemini.

What failed today, in one view.

Provider comparison

Reliability, speed, and factual stability for the selected day.

Empty output trend

Strict prompt failures over time
Metric: complex prompt failure rate, last 30 days. Lower is better.

Other signals

Instruction following
Higher is better. Empty outputs lower the score.
Format compliance
Higher is better on strict output rules.
Factual stability
Higher is better on repeated fact checks.

Risk feed

Worst scenarios

Top 5 by worst empty rate.
Metric URLs
Proof
Aggregated metrics only. Each day links back to a canonical run root hash.
FAQ
What is a “no-text response”?
A request that “succeeds” (often 200 OK) but returns no usable output text. Your app might treat it as success and then crash later.
What does “complex prompt failure rate” mean?
It’s how often models return no text on prompts with strict constraints (e.g., “exactly 3 sentences”). These are common in production (JSON schemas, word limits, bullet counts).
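The metric itself is a simple ratio. A sketch of how it could be computed over one day's runs (the record shape and prompt IDs are illustrative; A054–A057 are the test IDs mentioned below):

```python
def complex_prompt_failure_rate(results: list[dict]) -> float:
    """Share of strict-constraint prompts that returned no usable text.

    Each result is assumed to look like {"prompt_id": "A054", "text": "..."}.
    A result counts as a failure when its text is empty after stripping.
    """
    if not results:
        return 0.0
    failures = sum(1 for r in results if not (r.get("text") or "").strip())
    return failures / len(results)

runs = [
    {"prompt_id": "A054", "text": "One. Two. Three."},
    {"prompt_id": "A055", "text": ""},           # no-text failure
    {"prompt_id": "A056", "text": "   "},        # whitespace-only failure
    {"prompt_id": "A057", "text": "Exactly 3 sentences. Yes. Done."},
]
complex_prompt_failure_rate(runs)  # 0.5
```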
What are A054–A057 codes?
Internal test IDs for specific constraint prompts. They’re stable identifiers so we can compare behavior over time.

What does “tamper-evident” mean?
Each day’s metrics link back to a hash root of the underlying run. If someone edits history later, the proof won’t match.
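The exact hashing scheme isn't specified here, but the idea can be sketched with a minimal Merkle-style root over per-run records (all names below are illustrative assumptions, not this service's implementation):

```python
import hashlib

def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold a list of per-record hashes into a single root hash.

    If any underlying record is edited later, its leaf hash changes and
    the recomputed root no longer matches the published one — which is
    what makes the published history tamper-evident.
    """
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = leaf_hashes
    while len(level) > 1:
        if len(level) % 2 == 1:            # duplicate the last hash on odd levels
            level = level + [level[-1]]
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

records = [b"run-1 metrics", b"run-2 metrics"]
leaves = [hashlib.sha256(r).hexdigest() for r in records]
published = merkle_root(leaves)

# Editing any record changes its leaf, so the recomputed root diverges:
leaves[0] = hashlib.sha256(b"edited metrics").hexdigest()
merkle_root(leaves) == published  # False
```

Verification only needs the raw records and the published root: recompute and compare.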