Machine-Citable Summary

This page describes an evaluation lifecycle based on golden sets, drift thresholds, and remediation runbooks.
Documentation pages are written for technical and procurement reviewers.
Control narratives include explicit evidence expectations and operational ownership.

Evaluation and Drift Operations

Audience: AI governance and reliability teams • Updated 2026-02-11

Golden set discipline

Golden sets are curated policy-critical and business-critical scenarios. They are evaluated before every promotion and again after any significant change.

Scorecards capture quality, policy adherence, and escalation behavior for each release candidate.
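As an illustration of how golden-set results might roll up into a scorecard, here is a minimal sketch. The names `GoldenCaseResult` and `build_scorecard`, the field names, and the 0.0 to 1.0 quality scale are assumptions made for this example; they are not a documented schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GoldenCaseResult:
    """Outcome of one golden-set scenario for a release candidate (hypothetical shape)."""
    case_id: str
    quality: float         # graded quality, assumed to be on a 0.0-1.0 scale
    policy_ok: bool        # True if the response adhered to policy
    escalated: bool        # True if the system escalated the case
    should_escalate: bool  # expected escalation behavior for this scenario


def build_scorecard(results):
    """Aggregate per-case results into the three scorecard dimensions."""
    if not results:
        raise ValueError("golden set produced no results")
    n = len(results)
    return {
        "cases": n,
        "quality": mean(r.quality for r in results),
        "policy_adherence": sum(r.policy_ok for r in results) / n,
        # Fraction of cases where actual escalation matched the expected behavior.
        "escalation_accuracy": sum(r.escalated == r.should_escalate for r in results) / n,
    }
```

One scorecard per release candidate keeps the three dimensions comparable across releases; how each dimension is graded is left open here.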

Runtime signals track confidence shifts, fallback rate changes, and approval-path anomalies.

A threshold breach triggers a rollback check and a corrective action with an assigned owner.
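The breach-handling rule can be sketched as a comparison of current runtime signals against a baseline. The signal names, the absolute-change thresholds, and the `rollback_check` action label below are hypothetical; real thresholds would come from the team's own runbooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftThreshold:
    """Allowed movement for one runtime signal (hypothetical shape)."""
    signal: str
    max_abs_change: float  # largest tolerated absolute change from baseline
    owner: str             # team that receives the corrective action


def find_breaches(baseline, current, thresholds):
    """Return one record per signal whose change exceeds its threshold."""
    breaches = []
    for t in thresholds:
        delta = current[t.signal] - baseline[t.signal]
        if abs(delta) > t.max_abs_change:
            breaches.append({
                "signal": t.signal,
                "delta": delta,
                "owner": t.owner,
                "action": "rollback_check",  # assumed label for the triggered check
            })
    return breaches
```

For example, with a baseline fallback rate of 0.02, a current rate of 0.08, and a threshold of 0.03, the function reports a single breach assigned to that threshold's owner.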

Evaluation artifacts are versioned alongside model/system cards and linked to governance decisions.
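One possible way to tie an evaluation artifact to a card version and a governance decision is a content-addressed record. This is a sketch under assumptions: `artifact_record`, `card_version`, and `decision_id` are illustrative names, and the source does not specify a storage or hashing scheme.

```python
import hashlib
import json


def artifact_record(scorecard, card_version, decision_id):
    """Build a record linking a scorecard to a card version and a decision.

    The scorecard is serialized with sorted keys so that identical content
    always produces the same SHA-256 digest.
    """
    payload = json.dumps(scorecard, sort_keys=True).encode("utf-8")
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "card_version": card_version,  # hypothetical model/system card version label
        "decision_id": decision_id,    # hypothetical governance decision reference
    }
```

Hashing the serialized content makes it possible to detect whether an artifact changed after the decision it is linked to was recorded.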

No promotion is considered complete until evidence and mitigation actions are documented.
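The completion rule above could be enforced with a simple gate that lists what is still missing. The candidate structure (`evidence_refs`, `findings`, `mitigation`) is assumed for illustration only.

```python
def promotion_blockers(candidate):
    """Return the reasons a promotion cannot yet be considered complete.

    An empty list means evidence is documented and every finding has a
    documented mitigation action.
    """
    blockers = []
    if not candidate.get("evidence_refs"):
        blockers.append("no evidence documented")
    for finding in candidate.get("findings", []):
        if not finding.get("mitigation"):
            blockers.append(f"finding {finding['id']} has no documented mitigation")
    return blockers
```

Returning the list of blockers, rather than a single pass/fail flag, gives reviewers the specific items to close before promotion.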