Overview
The Unbound Security AI Gateway provides a RESTful API that allows you to integrate AI capabilities into your applications while maintaining enterprise-grade security, cost control, and compliance. The API is compatible with OpenAI’s Chat Completions API format, making it easy to migrate existing integrations.Base URL
All API requests should be made to:Authentication
The Unbound API uses Bearer token authentication. Include your API key in the Authorization header:Get your API key at gateway.getunbound.ai/connect.
Chat Completions
Create a completion for the provided chat messages.Endpoint
Request Headers
| Header | Type | Required | Description |
|---|---|---|---|
Authorization | string | Yes | Bearer token with your API key |
Content-Type | string | Yes | Must be application/json |
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | The model to use for completion. Accepts either provider/model (e.g. anthropic/claude-sonnet-4-20250514) or the bare model id (e.g. claude-sonnet-4-20250514). |
messages | array | Yes | Array of message objects |
max_tokens | integer | No | Maximum tokens to generate (default: 1000) |
temperature | number | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
stream | boolean | No | Whether to stream the response (default: false) |
The
model field in the response is always the bare model id without the provider prefix, regardless of which form you sent. The original provider is returned separately as the top-level provider field (see Response Body).Message Object
| Parameter | Type | Required | Description |
|---|---|---|---|
role | string | Yes | Role of the message sender (user, assistant, system) |
content | string | Yes | Content of the message |
Response Body
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier for the completion |
object | string | Object type. Anthropic, Google, Vertex AI, Cohere, and Workers AI return "chat_completion"; OpenAI and Bedrock return "chat.completion". |
created | integer | Unix timestamp (seconds) when the response was generated |
model | string | The bare model id (provider prefix stripped) |
provider | string | The upstream provider that served the request (e.g. "anthropic", "openai", "google") |
choices | array | Array of completion choices — see below |
usage | object | Token usage — prompt_tokens, completion_tokens, total_tokens, plus optional cache_read_input_tokens / cache_creation_input_tokens |
hook_results | object | Present only when guardrail hooks ran for this request. Contains before_request_hooks and after_request_hooks arrays describing each hook’s verdict. |
choices has:
| Field | Type | Description |
|---|---|---|
index | integer | Position of this choice in the array |
message.role | string | Always "assistant" for completion responses |
message.content | string | array | Plain string for text-only replies; an array of content blocks (e.g. [{"type": "text", "text": "..."}]) when the upstream returns structured content such as text + tool use |
message.tool_calls | array | Present when the model invoked tools |
finish_reason | string | "stop" (natural completion), "length" (hit max_tokens), "tool_calls" (model invoked a tool), or a provider-specific passthrough value for less common stop reasons |
logprobs | object | null | Log probabilities, when requested by the provider |
Example Request
Example Response
For plain text-only completions some providers return
message.content as a string instead of a content-block array. Clients should accept both shapes — check Array.isArray(content) before iterating.Multi-turn Conversation
Streaming Response
Pass--no-buffer so curl flushes each chunk as it arrives instead of waiting for the full response:
"object": "chat.completion.chunk" (dot notation) — including chunks served by Anthropic, Google, Vertex AI, Cohere, and Workers AI, whose non-streaming object value is "chat_completion" (underscore). Build clients that branch on object accordingly. The stream ends with data: [DONE]:
The final usage block on streaming responses contains
prompt_tokens and completion_tokens only — total_tokens is not emitted on the stream. Sum the two client-side if you need it.List Models
Retrieve available models and their pricing from the gateway
Python SDK
Use the Unbound Python SDK for drop-in OpenAI compatibility

