Runtime Gateway

Sentinel Runtime Gateway provides a provider-neutral contract for monitoring LLM requests and responses and for enforcing policy decisions on them in application runtime paths.

Definition

Sentinel Runtime Gateway is a provider-neutral LLM gateway pattern that can observe or enforce prompt, response, tool, and policy decisions. It lets teams place Sentinel checks between the application and model provider without making findings depend on a single vendor.

Monitor and enforce modes

Start in monitor mode to understand traffic and false positives. Move to enforce mode only after owners agree on which findings should block production actions.

Operational checklist

Monitor: log decisions and evidence
Enforce: block requests or responses that violate policy
Shadow: compare policies without changing user experience
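
As a minimal sketch of how the three modes could differ in code: the Mode, Finding, Outcome, and apply_mode names below are hypothetical and are not part of any documented Sentinel API. The point is that only enforce mode changes the request flow; monitor and shadow record what would have happened.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"
    ENFORCE = "enforce"
    SHADOW = "shadow"


@dataclass
class Finding:
    rule_id: str            # hypothetical rule identifier
    violates_policy: bool


@dataclass
class Outcome:
    allowed: bool           # whether the request or response continues
    note: str               # text written to the decision log


def apply_mode(mode: Mode, finding: Finding) -> Outcome:
    """Map a policy finding to an outcome for the configured mode."""
    if not finding.violates_policy:
        return Outcome(allowed=True, note="no violation")
    if mode is Mode.ENFORCE:
        # Only enforce mode interrupts the flow.
        return Outcome(allowed=False, note=f"blocked by {finding.rule_id}")
    if mode is Mode.MONITOR:
        return Outcome(allowed=True, note=f"would block: {finding.rule_id}")
    # Shadow: a candidate policy is evaluated for comparison only.
    return Outcome(allowed=True, note=f"shadow result: {finding.rule_id}")
```

Keeping the mode out of the rule logic means the same finding can be replayed under each mode when owners review false positives.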

Where to place it

The gateway belongs at the boundary where prompts, tools, retrieval context, and provider calls converge. This is usually inside the application backend, not only at the frontend.

Operational checklist

Before provider API calls
Before tool execution
After retrieval context assembly
Before response reaches a downstream parser or user
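
The four placement points above can be sketched as named checkpoints in a backend request handler. This is an illustration under stated assumptions: run_checkpoint, execute_tool, handle, and the no_secret_marker check are hypothetical names, and the retrieve, provider, and tool callables stand in for whatever the application already uses.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]  # returns True when the payload is acceptable


class Blocked(Exception):
    """Raised when a checkpoint rejects a payload."""


def no_secret_marker(payload: str) -> bool:
    # Hypothetical check: treat an "sk-" prefix as a leaked API key.
    return "sk-" not in payload


def run_checkpoint(name: str, payload: str, checks: Sequence[Check]) -> str:
    """Run every check at one boundary; raise on the first failure."""
    for check in checks:
        if not check(payload):
            raise Blocked(f"{name}: {check.__name__}")
    return payload


def execute_tool(tool: Callable[[str], str], argument: str,
                 checks: Sequence[Check]) -> str:
    # Checkpoint: before tool execution.
    run_checkpoint("before_tool", argument, checks)
    return tool(argument)


def handle(prompt: str, retrieve: Callable[[str], str],
           provider: Callable[[str], str], checks: Sequence[Check]) -> str:
    # Checkpoint: after retrieval context assembly.
    context = run_checkpoint("after_retrieval", retrieve(prompt), checks)
    # Checkpoint: before the provider API call.
    request = run_checkpoint("before_provider", f"{context}\n\n{prompt}", checks)
    # Checkpoint: before the response reaches a parser or user.
    return run_checkpoint("before_response", provider(request), checks)
```

Because every checkpoint goes through the same function, the checkpoint name in the raised error shows which boundary caught the problem.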

Evidence model

Runtime evidence should be minimal and privacy-aware. Store rule ID, decision, severity, redacted evidence, owner, and request correlation ID.

Operational checklist

Redact secrets and user data from logs
Keep request IDs for incident response
Export policy decisions to SIEM when needed
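
A minimal sketch of such an evidence record, assuming the six fields named above. The redaction patterns, the "high" severity label, the owner name, and the helper names are illustrative assumptions; real deployments need patterns matched to their own secrets and user data.

```python
import json
import re
from dataclasses import asdict, dataclass

# Hypothetical patterns: an "sk-" style key and a simple email shape.
_REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Replace every match of the configured patterns with a placeholder."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


@dataclass(frozen=True)
class EvidenceRecord:
    rule_id: str
    decision: str
    severity: str
    redacted_evidence: str
    owner: str
    request_id: str


def build_record(rule_id: str, decision: str, severity: str, raw_evidence: str,
                 owner: str, request_id: str, max_len: int = 120) -> EvidenceRecord:
    # Redact first, then truncate, so a cut never exposes part of a secret.
    evidence = redact(raw_evidence)[:max_len]
    return EvidenceRecord(rule_id, decision, severity, evidence, owner, request_id)


def to_json_line(record: EvidenceRecord) -> str:
    """Serialize one record as a single JSON line, e.g. for log export."""
    return json.dumps(asdict(record), sort_keys=True)


record = build_record("JINJA2-SECRET-EXPOSURE", "blocked", "high",
                      "key sk-abcdef123456 sent by bob@example.com",
                      "platform-team", "req_01")
print(to_json_line(record))
```

One JSON object per line keeps the record easy to forward to a log pipeline while the request ID stays available for incident response.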

Detection → response loop

Effective runtime security requires a closed loop: observe traffic, surface anomalies, limit suspicious behavior, investigate with correlation IDs, and update policy based on findings. OWASP AI Exchange §2.0 defines three foundational controls for LLM runtime: MONITOR USE, RATE LIMIT, and MODEL ACCESS CONTROL.

Operational checklist

MONITOR USE (OWASP AI Exchange §2.0): log all LLM interactions with enough context to reconstruct decision paths
RATE LIMIT (OWASP AI Exchange §2.0): restrict request frequency to prevent model extraction and resource exhaustion
MODEL ACCESS CONTROL (OWASP AI Exchange §2.0): enforce least-privilege access to model endpoints and capabilities
MITRE ATLAS AML.M0024: restrict model access as a mitigation for supply chain and runtime compromise
Detection → alert → throttle → investigate: closed-loop response prevents escalation while preserving evidence
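
The throttle step of this loop can be sketched as a sliding-window rate limiter keyed by client. This is one common technique and not necessarily how Sentinel implements rate limiting; the class, the check_request helper, and the "throttled" decision label are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def check_request(limiter: SlidingWindowLimiter, client_id: str,
                  request_id: str, now: Optional[float] = None) -> dict:
    """Return a decision record that keeps the correlation ID for investigation."""
    allowed = limiter.allow(client_id, now)
    return {
        "request_id": request_id,
        "client_id": client_id,
        "decision": "allowed" if allowed else "throttled",
    }
```

Returning a record for both outcomes means throttled requests leave the same evidence trail as allowed ones, which supports the investigate step.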

Commands

sentinel proxy --mode http --upstream http://localhost:3000 --port 8080
sentinel proxy --mode stdio -- npx my-mcp-server

Expected output

Output should carry rule ID, severity, surface, evidence, and release decision in a way other teams can understand.

mode: enforce
request_id: req_01
decision: blocked
reason: system prompt leakage pattern
rule: JINJA2-SECRET-EXPOSURE
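
If other teams consume this output in a release pipeline, a small parser is often enough. The sketch below assumes the output is line-oriented "key: value" text like the sample; the real format is not confirmed here, and parse_decision and should_fail_release are hypothetical helper names.

```python
from typing import Dict


def parse_decision(text: str) -> Dict[str, str]:
    """Parse "key: value" lines into a dict, skipping lines without a key."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


def should_fail_release(record: Dict[str, str]) -> bool:
    # Assumption: "blocked" is the decision value that stops a release.
    return record.get("decision") == "blocked"


sample = "request_id: req_01\ndecision: blocked\nrule: JINJA2-SECRET-EXPOSURE"
parsed = parse_decision(sample)
print(parsed["rule"], should_fail_release(parsed))
```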

FAQ

Should the runtime gateway block everything suspicious?

No. Use monitor mode first, then enforce only high-confidence classes such as secret exposure, forbidden tool calls, and policy-critical leakage.
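
One way to express this is a per-class rule that enforces only the named classes above a confidence threshold. This is a sketch: the class labels, the 0.9 threshold, and the effective_mode helper are illustrative assumptions, not documented Sentinel settings.

```python
# Hypothetical labels for the high-confidence classes named in the answer above.
ENFORCED_CLASSES = frozenset({
    "secret_exposure",
    "forbidden_tool_call",
    "policy_critical_leakage",
})


def effective_mode(finding_class: str, confidence: float,
                   threshold: float = 0.9) -> str:
    """Enforce only allow-listed classes at high confidence; otherwise monitor."""
    if finding_class in ENFORCED_CLASSES and confidence >= threshold:
        return "enforce"
    return "monitor"
```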

Does this replace application authorization?

No. Gateway checks complement authorization; they do not replace server-side permission checks.

What is the difference between monitor and enforce mode?

Monitor mode logs decisions and evidence without changing request flow. Enforce mode blocks or modifies requests and responses that violate policy. Start with monitor mode to understand baseline traffic and false-positive rates before enabling enforcement.
