Runtime Gateway
Sentinel Runtime Gateway is a provider-neutral LLM gateway pattern for monitoring or enforcing policy decisions on prompts, responses, and tool calls in application runtime paths. It places Sentinel checks between the application and the model provider, so findings do not depend on a single vendor.
Monitor, enforce, and shadow modes
Start in monitor mode to understand baseline traffic and false-positive rates. Move to enforce mode only after the owning teams agree on which findings should block production actions.
- Monitor: log decisions and evidence without changing request flow
- Enforce: block requests or responses that violate policy
- Shadow: compare a candidate policy against the active one without changing the user experience
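A minimal Python sketch of how the three modes could differ in effect, assuming a policy is any function that returns a rule ID for violating text and `None` otherwise. `Mode`, `evaluate`, and the shadow-mode behavior (run a candidate policy next to the active one and log only disagreements) are illustrative assumptions, not Sentinel's API.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

# Hypothetical policy signature: return a rule ID if the text violates policy, else None.
Policy = Callable[[str], Optional[str]]


class Mode(Enum):
    MONITOR = "monitor"
    ENFORCE = "enforce"
    SHADOW = "shadow"


def evaluate(
    mode: Mode,
    text: str,
    active: Policy,
    candidate: Optional[Policy] = None,
) -> Tuple[bool, List[dict]]:
    """Return (allowed, log_entries) for one request under the given mode."""
    log: List[dict] = []
    rule = active(text)

    if mode is Mode.SHADOW:
        # Run the candidate policy next to the active one and record disagreements;
        # shadow mode never changes whether the request proceeds.
        candidate_rule = candidate(text) if candidate else None
        if candidate_rule != rule:
            log.append({"event": "shadow_diff", "active": rule, "candidate": candidate_rule})
        return True, log

    if rule is None:
        return True, log

    log.append({"event": "violation", "rule": rule, "mode": mode.value})
    # Monitor records the finding and lets the request through; enforce blocks it.
    allowed = mode is Mode.MONITOR
    return allowed, log


if __name__ == "__main__":
    def contains_api_key(text: str) -> Optional[str]:
        return "SECRET-EXPOSURE" if "api_key=" in text else None

    print(evaluate(Mode.MONITOR, "api_key=abc", contains_api_key))
    print(evaluate(Mode.ENFORCE, "api_key=abc", contains_api_key))
```

The same policy produces the same log entry in monitor and enforce mode; only the `allowed` flag changes, which is what makes monitor-mode data a fair preview of enforcement.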
Where to place it
The gateway belongs at the boundary where prompts, tool calls, retrieval context, and provider calls converge. This is usually inside the application backend rather than only in the frontend, because client-side checks can be bypassed by anyone who calls the backend directly.
- Before provider API calls
- Before tool execution
- After retrieval context is assembled
- Before a response reaches a downstream parser or the user
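The checkpoints above can be sketched as gates around a single completion call. This is a hypothetical illustration: `guarded_completion`, `_gate`, `PolicyViolation`, and the stage names are not part of Sentinel, and `check` stands in for whatever policy engine the gateway runs. Tool execution would get the same kind of gate just before the tool runs; it is left out to keep the sketch short.

```python
from typing import Callable, List, Optional, Tuple


class PolicyViolation(Exception):
    """Raised when a checkpoint blocks the request."""


# Hypothetical check signature: (stage, text) -> rule ID or None.
Check = Callable[[str, str], Optional[str]]


def _gate(stage: str, text: str, check: Check, enforce: bool,
          events: List[Tuple[str, str]]) -> None:
    rule = check(stage, text)
    if rule is not None:
        events.append((stage, rule))  # the finding is recorded in every mode
        if enforce:
            raise PolicyViolation(f"{stage}: {rule}")


def guarded_completion(
    user_prompt: str,
    retrieved: List[str],
    call_provider: Callable[[str], str],
    check: Check,
    enforce: bool = True,
    events: Optional[List[Tuple[str, str]]] = None,
) -> str:
    events = [] if events is None else events
    # Checkpoint: after retrieval context is assembled.
    context = "\n".join(retrieved)
    _gate("retrieval_context", context, check, enforce, events)
    # Checkpoint: before the provider API call.
    prompt = f"{context}\n\n{user_prompt}"
    _gate("provider_request", prompt, check, enforce, events)
    response = call_provider(prompt)
    # Checkpoint: before the response reaches a downstream parser or the user.
    _gate("response", response, check, enforce, events)
    return response
```

Checking retrieval context separately from the final prompt lets the evidence point at the stage where a problem entered, which is easier to route to an owner than a single verdict on the combined prompt.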
Evidence model
Runtime evidence should be minimal and privacy-aware. Store only the rule ID, decision, severity, redacted evidence, owner, and request correlation ID.
- Redact secrets and user data from logs
- Keep request correlation IDs for incident response
- Export policy decisions to a SIEM when needed
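A sketch of a minimal, redacted evidence record, assuming evidence is a short text snippet and that JSON lines are an acceptable SIEM export format. The field set follows the list above; `EvidenceRecord`, `redact`, and the redaction patterns (an `sk-`-prefixed key shape and a loose email pattern) are illustrative placeholders, not a complete secret or PII detector.

```python
import json
import re
from dataclasses import asdict, dataclass

# Illustrative patterns only; a production deployment needs a reviewed pattern set.
_REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


@dataclass(frozen=True)
class EvidenceRecord:
    rule_id: str
    decision: str
    severity: str
    evidence: str
    owner: str
    request_id: str  # correlation ID kept for incident response

    @classmethod
    def create(cls, rule_id: str, decision: str, severity: str,
               raw_evidence: str, owner: str, request_id: str) -> "EvidenceRecord":
        # Redact before the record exists, so raw secrets never reach storage.
        return cls(rule_id, decision, severity, redact(raw_evidence), owner, request_id)

    def to_json(self) -> str:
        # One JSON object per line is easy for log shippers and SIEMs to ingest.
        return json.dumps(asdict(self), sort_keys=True)
```

Redacting in the constructor rather than at export time means every sink (local logs, SIEM, incident tickets) receives the same already-redacted evidence.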
Detection → response loop
Effective runtime security requires a closed loop: observe traffic, surface anomalies, limit suspicious behavior, investigate with correlation IDs, and update policy based on findings. OWASP AI Exchange §2.0 lists three general controls against threats through use that support this loop: MONITOR USE, RATE LIMIT, and MODEL ACCESS CONTROL.
- MONITOR USE (OWASP AI Exchange §2.0): log all LLM interactions with enough context to reconstruct decision paths
- RATE LIMIT (OWASP AI Exchange §2.0): restrict request frequency to slow model extraction and limit resource exhaustion
- MODEL ACCESS CONTROL (OWASP AI Exchange §2.0): enforce least-privilege access to model endpoints and capabilities
- MITRE ATLAS AML.M0024: restrict model access as a mitigation for supply chain and runtime compromise
- Detection → alert → throttle → investigate → update policy: a closed-loop response limits escalation while preserving evidence
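For the RATE LIMIT control and the throttle step, a per-client token bucket is one common technique. The sketch below is an assumption-level illustration: `RateLimiter`, `penalize`, and the idea of draining a client's tokens after a detection are hypothetical, and a gateway running on several instances would need shared state instead of an in-process dict.

```python
from typing import Dict, Tuple


class RateLimiter:
    """Per-client token buckets. `now` (seconds) is passed in so callers can inject a clock."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size per client
        self.refill_rate = refill_rate  # tokens restored per second
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def _refill(self, client_id: str, now: float) -> float:
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Restore tokens in proportion to elapsed time, capped at capacity.
        return min(self.capacity, tokens + (now - last) * self.refill_rate)

    def allow(self, client_id: str, now: float) -> bool:
        tokens = self._refill(client_id, now)
        allowed = tokens >= 1
        self._buckets[client_id] = (tokens - 1 if allowed else tokens, now)
        return allowed

    def penalize(self, client_id: str, now: float, amount: float) -> None:
        # Detection-driven throttle: drain tokens after a policy hit.
        tokens = self._refill(client_id, now)
        self._buckets[client_id] = (max(0.0, tokens - amount), now)
```

In a gateway, `allow` would run before forwarding each request (for example with `time.monotonic()` as the clock), and `penalize` would run when a rule fires, so repeated suspicious traffic slows down while the correlation ID and evidence remain in the logs for investigation.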
Commands
sentinel proxy --mode http --upstream http://localhost:3000 --port 8080
sentinel proxy --mode stdio -- npx my-mcp-server
Expected output
Each decision should carry the rule ID, severity, affected surface, evidence, and release decision in a form that other teams can read and act on.
mode: monitor
request_id: req_01
decision: blocked
reason: system prompt leakage pattern
rule: JINJA2-SECRET-EXPOSURE
FAQ
Should runtime gateway block everything suspicious?
No. Use monitor mode first, then enforce only high-confidence classes such as secret exposure, forbidden tool calls, and policy-critical leakage.
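One way to make "monitor first, then enforce only high-confidence classes" concrete is to promote a rule class to enforcement only after its monitor-mode findings have been triaged and show a low false-positive rate. `ClassStats`, `enforceable_classes`, the thresholds, and the class names in the usage are hypothetical; the owning teams would set the real values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassStats:
    """Monitor-mode triage counts for one rule class."""
    true_positives: int = 0
    false_positives: int = 0

    @property
    def total(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def false_positive_rate(self) -> float:
        # Treat an untriaged class as fully noisy so it is never promoted by default.
        return self.false_positives / self.total if self.total else 1.0


def enforceable_classes(
    stats: Dict[str, ClassStats],
    max_fp_rate: float = 0.01,  # hypothetical threshold
    min_samples: int = 50,      # hypothetical minimum number of triaged findings
) -> List[str]:
    """Return rule classes whose monitor-mode record justifies enforcement."""
    return sorted(
        name
        for name, s in stats.items()
        if s.total >= min_samples and s.false_positive_rate <= max_fp_rate
    )
```

Requiring a minimum sample size keeps a class with only a handful of clean findings from being promoted on luck.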
Does this replace application authorization?
No. Gateway checks complement authorization; they do not replace server-side permission checks.
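A sketch of that complementary relationship for a tool call, assuming a role-based permission table owned by the application server. `PERMISSIONS`, `run_tool`, `Forbidden`, and `gateway_check` are hypothetical names. Authorization runs first on every call; the gateway can only add restrictions, never grant access.

```python
from typing import Any, Callable, Dict, Optional, Set


class Forbidden(Exception):
    pass


# Hypothetical server-side permission table: role -> tools that role may invoke.
PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"search_docs"},
    "admin": {"search_docs", "delete_record"},
}


def run_tool(
    role: str,
    tool: str,
    args: Dict[str, Any],
    gateway_check: Callable[[str, Dict[str, Any]], Optional[str]],
    tools: Dict[str, Callable[..., Any]],
) -> Any:
    # 1. Server-side authorization always runs; no gateway verdict can grant access.
    if tool not in PERMISSIONS.get(role, set()):
        raise Forbidden(f"role {role!r} may not call {tool!r}")
    # 2. Gateway policy check on the concrete call, e.g. a forbidden argument pattern.
    rule = gateway_check(tool, args)
    if rule is not None:
        raise Forbidden(f"gateway blocked {tool!r}: {rule}")
    return tools[tool](**args)
```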
What is the difference between monitor and enforce mode?
Monitor mode logs decisions and evidence without changing request flow. Enforce mode blocks or modifies requests and responses that violate policy. Start with monitor mode to understand baseline traffic and false-positive rates before enabling enforcement.
References
- Eresus Sentinel GitHub
- OWASP LLM Top 10 2025
- OWASP AI Exchange §2.0 Runtime Controls
- MITRE ATLAS AML.M0024 Restrict Model Access
Eresus support
Turn the finding into an action your team can actually close.
If you need exploit evidence, prioritization, remediation direction, and retesting for Runtime Gateway, Eresus can help scope the work with your team.