Red Team and Eval Engine
Sentinel Red Team and Eval workflows run repeatable probes, detectors, generators, and config-driven assertions against LLM applications and provider integrations.
Sentinel Red Team and Eval Engine is a repeatable testing layer for LLM applications. It turns prompts, probes, expected assertions, and provider settings into evidence that can be run locally, in CI, and during release review.
Config-driven evals
Use YAML evals when you need a stable regression suite for prompts, providers, and tool behavior. Every important app behavior should have an assertion that explains what must stay true.
- Provider IDs and model choices
- Prompt templates and variables
- Assertions for refusal, containment, JSON shape, and leakage
- Thresholds that decide release pass/fail
Red team probes
Red team probes are designed to exercise failure modes, not happy paths. They help security teams find injection, leakage, excessive agency, and unsafe output handling before a real user does.
- Prompt injection and indirect prompt injection
- Tool-use abuse and overbroad permissions
- RAG leakage and poisoned context
- Output handling and structured response bypasses
Reporting evidence
Eval evidence should be versioned with the application. A failed probe should include the prompt, model/provider, assertion, observed output, severity, and retest command.
- Attach JSON output to CI artifacts
- Keep failed prompts small and reproducible
- Map failures to OWASP LLM where possible
Probe coverage benchmarks (OWASP LLM Top 10 2025 + benchmarks)
External benchmarks provide objective probe corpus and difficulty ratings. JailbreakBench and AIRTBench score how effectively evals catch adversarial inputs. ISC-Bench evaluates instruction-following safety compliance. Cross-reference probe results against OWASP LLM Top 10 2025 categories to verify coverage.
- OWASP LLM01:2025 Prompt Injection — probe corpus: direct instruction override, ignore-previous-instructions patterns
- OWASP LLM06:2025 Excessive Agency — probe corpus: tool permission boundary violations, overbroad capability requests
- OWASP LLM02:2025 Sensitive Info Disclosure — probe corpus: PII extraction, credential leakage, training data extraction
- JailbreakBench (jailbreakbench.github.io): standardized leaderboard and probe corpus for jailbreak safety evaluation
- AIRTBench: AI red teaming evaluation suite covering both LLM and agentic attack surfaces
- ISC-Bench: instruction-following safety compliance benchmark for evaluating refusal quality
SARIF output and CI integration
Sentinel red-team eval output can be emitted as SARIF (Static Analysis Results Interchange Format), making probe failures first-class findings in GitHub Advanced Security, Azure DevSecOps, and other SARIF consumers.
- SARIF rule ID maps to OWASP LLM category (e.g., LLM01, LLM06) for automatic triage
- Each finding includes: probe prompt, model response, assertion, severity, and remediation hint
- Upload to GitHub Code Scanning via `actions/upload-sarif` for inline PR annotation
- SARIF severity levels align with Sentinel severity guide: CRITICAL/HIGH block merge, MEDIUM/LOW advisory
Commands
sentinel redteam --target openai/gpt-4o
sentinel redteam --list-probes
sentinel evaluate eval.yaml --fail-on-threshold 0.95
sentinel evaluate eval.yaml -f json -o eval-report.jsonExpected output
Output should carry rule ID, severity, surface, evidence, and release decision in a way other teams can understand.
suite: agent-redteam
pass_rate: 0.91
failed:
- prompt_injection.system_prompt_leak
- tool_use.excessive_agency
decision: fail threshold 0.95FAQ
How often should evals run?
Run fast evals on prompt/tool PRs, full suites before release, and scheduled suites when providers or retrieval data change.
Should eval failures block release?
Failures tied to data leakage, privileged tool abuse, or policy bypass should block release. Cosmetic response drift can be reviewed as MEDIUM or LOW.
Which external benchmarks should I use?
JailbreakBench for jailbreak probe coverage and leaderboard comparison, AIRTBench for agentic red teaming, and ISC-Bench for instruction-following safety compliance. Map probe results to OWASP LLM Top 10 2025 to confirm coverage across all ten categories.
References
Eresus support
Turn the finding into an action your team can actually close.
If you need exploit evidence, prioritization, remediation direction, and retesting for Red Team and Eval Engine, Eresus can help scope the work with your team.
Start Security Test