Skip to content

Guardrails API Server

The Mend Guardrails SDK ships with a built-in HTTP server that exposes an OpenAI-compatible REST API — the same /v1/chat/completions surface your application already calls — but with Mend guardrails applied to every request.

Client app                Guardrails Server              Upstream LLM
(OpenAI SDK)  ──POST──►  pre_flight + input rails  ──►  (OpenAI / Azure / …)
              ◄──JSON──  output rails applied        ◄──

Why use the server instead of the SDK?

Scenario Recommended approach
Your app is already calling OpenAI directly SDK (drop-in client, zero infra)
You want a language-agnostic guardrail layer Server (any HTTP client works)
You run multiple services, different stacks Server (one central guardrail proxy)
You can't modify application code Server (transparent HTTP proxy)
You need per-team policy isolation Server (multiple named policies)

How it works

  1. Policies are loaded from disk. Every *.json or *.yaml file inside your policy directory becomes a named configuration, addressable by its file stem (e.g. strict.jsonconfig_id: "strict").

  2. Clients are cached. The first request for a given config_id pays the ONNX model warm-up cost once. All subsequent requests reuse the cached client with zero additional overhead.

  3. The guardrail pipeline runs on every request. Pre-flight and input guardrails are checked before the upstream LLM is called. If either stage blocks the request the server returns HTTP 400 immediately — the upstream is never called. Output guardrails run on the LLM's response before it is returned to the caller.

  4. The response is standard OpenAI JSON. Any existing code that handles a ChatCompletion object continues to work without modification.

Server deployment options

Method Command
Console script mend-guardrails-server --policy-dir ./policies
Uvicorn directly uvicorn guardrails.server.api:app --port 8000
Docker docker run … -e MEND_KEY=… mend-guardrails-server

Next steps