Guardrails API Server
The Mend Guardrails SDK ships with a built-in HTTP server that exposes an
OpenAI-compatible REST API — the same /v1/chat/completions surface your
application already calls — but with Mend guardrails applied to every request.
Client app Guardrails Server Upstream LLM
(OpenAI SDK) ──POST──► pre_flight + input rails ──► (OpenAI / Azure / …)
◄──JSON── output rails applied ◄──
Why use the server instead of the SDK?
| Scenario | Recommended approach |
|---|---|
| Your app is already calling OpenAI directly | SDK (drop-in client, zero infra) |
| You want a language-agnostic guardrail layer | Server (any HTTP client works) |
| You run multiple services, different stacks | Server (one central guardrail proxy) |
| You can't modify application code | Server (transparent HTTP proxy) |
| You need per-team policy isolation | Server (multiple named policies) |
How it works
-
Policies are loaded from disk. Every
*.jsonor*.yamlfile inside your policy directory becomes a named configuration, addressable by its file stem (e.g.strict.json→config_id: "strict"). -
Clients are cached. The first request for a given
config_idpays the ONNX model warm-up cost once. All subsequent requests reuse the cached client with zero additional overhead. -
The guardrail pipeline runs on every request. Pre-flight and input guardrails are checked before the upstream LLM is called. If either stage blocks the request the server returns
HTTP 400immediately — the upstream is never called. Output guardrails run on the LLM's response before it is returned to the caller. -
The response is standard OpenAI JSON. Any existing code that handles a
ChatCompletionobject continues to work without modification.
Server deployment options
| Method | Command |
|---|---|
| Console script | mend-guardrails-server --policy-dir ./policies |
| Uvicorn directly | uvicorn guardrails.server.api:app --port 8000 |
| Docker | docker run … -e MEND_KEY=… mend-guardrails-server |
Next steps
- Run the Server — installation, environment variables, and Docker
- Chat Completions — request format, enforcement responses, and examples
- List Configurations — discover and inspect loaded policies