
Overview

The Unbound Security AI Gateway provides a RESTful API that allows you to integrate AI capabilities into your applications while maintaining enterprise-grade security, cost control, and compliance. The API is compatible with OpenAI’s Chat Completions API format, making it easy to migrate existing integrations.

Base URL

All API requests should be made to:
https://api.getunbound.ai/v1

Authentication

The Unbound API uses Bearer token authentication. Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Get your API key at gateway.getunbound.ai/connect.

Chat Completions

Create a completion for the provided chat messages.

Endpoint

POST /v1/chat/completions

Request Headers

| Header | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer token with your API key |
| Content-Type | string | Yes | Must be application/json |

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The model to use for completion. Accepts either provider/model (e.g. anthropic/claude-sonnet-4-20250514) or the bare model id (e.g. claude-sonnet-4-20250514). |
| messages | array | Yes | Array of message objects |
| max_tokens | integer | No | Maximum tokens to generate (default: 1000) |
| temperature | number | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
| stream | boolean | No | Whether to stream the response (default: false) |

The model field in the response is always the bare model id without the provider prefix, regardless of which form you sent. The original provider is returned separately as the top-level provider field (see Response Body).

Message Object

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | Role of the message sender (user, assistant, system) |
| content | string | Yes | Content of the message |
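
The request parameters and message fields above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any official SDK: the names `build_chat_request` and `bare_model_id` are hypothetical, and `bare_model_id` assumes the provider prefix is everything before the first slash, which is a guess at the gateway's rule rather than documented behavior.

```python
from typing import Any, Optional

# Roles listed in the Message Object table.
VALID_ROLES = {"user", "assistant", "system"}


def build_chat_request(
    model: str,
    messages: list[dict[str, str]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a request body, enforcing the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("message content must be a string")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    body: dict[str, Any] = {"model": model, "messages": list(messages)}
    # Optional fields are omitted so the gateway applies its own defaults.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if temperature is not None:
        body["temperature"] = temperature
    if stream:
        body["stream"] = True
    return body


def bare_model_id(model: str) -> str:
    """Assumption: the provider prefix ends at the first '/'."""
    return model.split("/", 1)[-1]
```

`bare_model_id` is only useful for comparing the model you sent against the `model` field in the response, which always comes back without the prefix.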

Response Body

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Object type. Anthropic, Google, Vertex AI, Cohere, and Workers AI return "chat_completion"; OpenAI and Bedrock return "chat.completion". |
| created | integer | Unix timestamp (seconds) when the response was generated |
| model | string | The bare model id (provider prefix stripped) |
| provider | string | The upstream provider that served the request (e.g. "anthropic", "openai", "google") |
| choices | array | Array of completion choices (described below) |
| usage | object | Token usage: prompt_tokens, completion_tokens, total_tokens, plus optional cache_read_input_tokens / cache_creation_input_tokens |
| hook_results | object | Present only when guardrail hooks ran for this request. Contains before_request_hooks and after_request_hooks arrays describing each hook’s verdict. |

Each entry in choices has:
| Field | Type | Description |
| --- | --- | --- |
| index | integer | Position of this choice in the array |
| message.role | string | Always "assistant" for completion responses |
| message.content | string or array | Plain string for text-only replies; an array of content blocks (e.g. [{"type": "text", "text": "..."}]) when the upstream returns structured content such as text + tool use |
| message.tool_calls | array | Present when the model invoked tools |
| finish_reason | string | "stop" (natural completion), "length" (hit max_tokens), "tool_calls" (model invoked a tool), or a provider-specific passthrough value for less common stop reasons |
| logprobs | object or null | Log probabilities, when requested by the provider |
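
Because the `object` value and several response fields vary by provider, a client may want to reduce a response to the few facts it acts on. The sketch below is one way to do that under the field descriptions above; the function name `summarize_completion` and the keys of the returned dictionary are hypothetical choices for illustration.

```python
from typing import Any

# Both spellings are documented for non-streaming responses.
NON_STREAMING_OBJECTS = {"chat_completion", "chat.completion"}


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Pull out the fields a caller typically branches on."""
    if response.get("object") not in NON_STREAMING_OBJECTS:
        raise ValueError(f"unexpected object type: {response.get('object')!r}")

    choice = response["choices"][0]
    finish_reason = choice.get("finish_reason")
    return {
        "provider": response.get("provider"),
        "model": response.get("model"),
        "finish_reason": finish_reason,
        # "length" means generation stopped at max_tokens.
        "truncated": finish_reason == "length",
        # tool_calls is present only when the model invoked tools.
        "has_tool_calls": bool(choice["message"].get("tool_calls")),
        # hook_results is present only when guardrail hooks ran.
        "guardrails_ran": "hook_results" in response,
    }
```

Any `finish_reason` other than the three documented values is passed through unchanged, so callers should treat unknown values as a non-error case.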

Example Request

curl -X POST 'https://api.getunbound.ai/v1/chat/completions' \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": "Help me plan meals for the week."
      }
    ]
  }'
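
For readers calling the endpoint from code rather than curl, the same request can be sketched with Python's standard library. The function names `make_request` and `send` are hypothetical, and error handling (HTTP errors, retries) is deliberately left out.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://api.getunbound.ai/v1"


def make_request(api_key: str, body: dict[str, Any]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_request(
    "YOUR_API_KEY",
    {
        "model": "anthropic/claude-sonnet-4-20250514",
        "messages": [
            {"role": "user", "content": "Help me plan meals for the week."}
        ],
    },
)
# Calling send(request) performs the network call; it needs a real API key.
```

Separating construction from sending keeps the request easy to inspect or log before anything goes over the network.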

Example Response

{
  "id": "msg_01ABCdefGHIJkLMnopQRStu",
  "object": "chat_completion",
  "created": 1677652288,
  "model": "claude-sonnet-4-20250514",
  "provider": "anthropic",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Here's a balanced meal plan for the week..."
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 245,
    "total_tokens": 260
  }
}
For plain text-only completions, some providers return message.content as a string instead of a content-block array. Clients should accept both shapes: check Array.isArray(content), or the equivalent in your language, before iterating.
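
The two content shapes can be handled with one small helper. This is a sketch in Python, where `isinstance(content, str)` plays the role of the `Array.isArray` check; the name `message_text` is hypothetical, and skipping non-text blocks is a choice made for this example rather than documented behavior.

```python
from typing import Any


def message_text(content: Any) -> str:
    """Return the text of message.content for either documented shape."""
    if isinstance(content, str):
        # Text-only replies may arrive as a plain string.
        return content
    parts = []
    for block in content:
        # Keep only text blocks; other block types (e.g. tool use) are skipped.
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)
```

Called on the example response above, `message_text(response["choices"][0]["message"]["content"])` returns the text of the single text block.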

Multi-turn Conversation

curl -X POST 'https://api.getunbound.ai/v1/chat/completions' \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful coding assistant."
      },
      {
        "role": "user",
        "content": "How do I create a REST API in Python?"
      },
      {
        "role": "assistant",
        "content": "I can help you create a REST API in Python using Flask or FastAPI. Which framework would you prefer?"
      },
      {
        "role": "user",
        "content": "Let's use FastAPI"
      }
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
The response shape is identical to the single-turn case:
{
  "id": "msg_01XYZabc456",
  "object": "chat_completion",
  "created": 1677652290,
  "model": "claude-sonnet-4-20250514",
  "provider": "anthropic",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Great — let's set up a basic FastAPI app..."
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 84,
    "completion_tokens": 312,
    "total_tokens": 396
  }
}
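
Continuing a conversation means sending the full history back with each request, including the assistant's previous reply. The sketch below shows one way to extend the history from a response. The names `next_turn` and `_as_text` are hypothetical, and the reply is flattened to a string because the Message Object table documents `content` as a string; whether the gateway also accepts content-block arrays on input is not stated here.

```python
from typing import Any


def _as_text(content: Any) -> str:
    """Collapse a string or a list of content blocks into plain text."""
    if isinstance(content, str):
        return content
    return "".join(
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    )


def next_turn(
    history: list[dict[str, str]], response: dict[str, Any], user_text: str
) -> list[dict[str, str]]:
    """Return a new message list: history, assistant reply, next user message."""
    reply = response["choices"][0]["message"]
    return history + [
        {"role": "assistant", "content": _as_text(reply["content"])},
        {"role": "user", "content": user_text},
    ]
```

The function returns a new list instead of mutating `history`, so the caller can retry a failed request with the original messages.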

Streaming Response

Pass --no-buffer so curl flushes each chunk as it arrives instead of waiting for the full response:
curl --no-buffer -X POST 'https://api.getunbound.ai/v1/chat/completions' \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate fibonacci numbers"
      }
    ],
    "stream": true
  }'
Streaming responses are always normalized to OpenAI’s Server-Sent Events shape, regardless of the upstream provider. Every chunk uses "object": "chat.completion.chunk" (dot notation), including chunks served by Anthropic, Google, Vertex AI, Cohere, and Workers AI, whose non-streaming object value is "chat_completion" (underscore). Clients that branch on object should therefore expect the dotted chunk value on every stream, even for providers whose non-streaming responses use the underscore form. The stream ends with data: [DONE]:
data: {"id":"msg_01ABC","object":"chat.completion.chunk","created":1677652288,"model":"claude-sonnet-4-20250514","provider":"anthropic","choices":[{"index":0,"delta":{"role":"assistant","content":"def "},"finish_reason":null}]}

data: {"id":"msg_01ABC","object":"chat.completion.chunk","created":1677652288,"model":"claude-sonnet-4-20250514","provider":"anthropic","choices":[{"index":0,"delta":{"content":"fib(n):"},"finish_reason":null}]}

...

data: {"id":"msg_01ABC","object":"chat.completion.chunk","created":1677652288,"model":"claude-sonnet-4-20250514","provider":"anthropic","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"completion_tokens":62}}

data: [DONE]
The final usage block on streaming responses contains prompt_tokens and completion_tokens only — total_tokens is not emitted on the stream. Sum the two client-side if you need it.
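
The stream format above can be consumed line by line. The following sketch assumes the caller already has an iterable of decoded text lines (for example from an HTTP client's line iterator); the name `read_stream` is hypothetical. It concatenates the content deltas and fills in `total_tokens` client-side, as suggested above.

```python
import json
from typing import Any, Iterable


def read_stream(lines: Iterable[str]) -> tuple[str, dict[str, Any]]:
    """Collect streamed text and usage from SSE 'data:' lines."""
    parts: list[str] = []
    usage: dict[str, Any] = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if isinstance(piece, str):
                parts.append(piece)
        if "usage" in chunk:
            usage = dict(chunk["usage"])
    # total_tokens is not emitted on the stream, so sum it here.
    if usage and "total_tokens" not in usage:
        usage["total_tokens"] = usage.get("prompt_tokens", 0) + usage.get(
            "completion_tokens", 0
        )
    return "".join(parts), usage
```

This sketch only reads string deltas; streamed tool calls or structured content would need additional handling that is not covered here.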

List Models

Retrieve available models and their pricing from the gateway

Python SDK

Use the Unbound Python SDK for drop-in OpenAI compatibility