Guardrails API Server

The Mend Guardrails SDK ships with a built-in HTTP server that exposes an OpenAI-compatible REST API — the same /v1/chat/completions surface your application already calls — but with Mend guardrails applied to every request.

Client app                Guardrails Server              Upstream LLM
(OpenAI SDK)  ──POST──►  pre_flight + input rails  ──►  (OpenAI / Azure / …)
              ◄──JSON──  output rails applied        ◄──

Why use the server instead of the SDK?

Scenario	Recommended approach
Your app is already calling OpenAI directly	SDK (drop-in client, zero infra)
You want a language-agnostic guardrail layer	Server (any HTTP client works)
You run multiple services, different stacks	Server (one central guardrail proxy)
You can't modify application code	Server (transparent HTTP proxy)
You need per-team policy isolation	Server (multiple named policies)

How it works

Policies are loaded from disk. Every *.json or *.yaml file inside your policy directory becomes a named configuration, addressable by its file stem (e.g. strict.json → config_id: "strict").
Clients are cached. The first request for a given config_id pays the ONNX model warm-up cost once. All subsequent requests reuse the cached client with zero additional overhead.
The guardrail pipeline runs on every request. Pre-flight and input guardrails are checked before the upstream LLM is called. If either stage blocks the request the server returns HTTP 400 immediately — the upstream is never called. Output guardrails run on the LLM's response before it is returned to the caller.
The response is standard OpenAI JSON. Any existing code that handles a ChatCompletion object continues to work without modification.

Server deployment options

Method	Command
Console script	`mend-guardrails-server --policy-dir ./policies`
Uvicorn directly	`uvicorn guardrails.server.api:app --port 8000`
Docker	`docker run … -e MEND_KEY=… mend-guardrails-server`

Next steps

Run the Server — installation, environment variables, and Docker
Chat Completions — request format, enforcement responses, and examples
List Configurations — discover and inspect loaded policies