API Reference
Complete reference for the chat completions endpoint. OpenAI-compatible request and response format.
POST https://api.continuum.au/v1/chat/completions

Creates a model response for the given chat conversation.
Authentication
All requests require an API key sent via the Authorization header.
| Header | Value | Required |
|---|---|---|
| `Authorization` | `Bearer YOUR_API_KEY` | Yes |
| `Content-Type` | `application/json` | Yes |
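As a minimal sketch, the two required headers can be assembled like this and passed to any HTTP client (the key value is a placeholder; substitute your own):

```python
# Placeholder key; replace with your actual API key.
API_KEY = "YOUR_API_KEY"

# Both headers from the table above are required on every request.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```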
Request body
- **model** (string, required): ID of the model to use. Currently available: `deepseek-v4-flash`.
- **messages** (array, required): A list of messages comprising the conversation. Each message is an object with `role` and `content` fields. Supported roles: `system`, `user`, `assistant`, `tool`.
- **temperature** (number, default 1): Sampling temperature between 0 and 2. Higher values (e.g. 0.8) produce more random output; lower values (e.g. 0.2) produce more focused output. Not available in thinking mode.
- **top_p** (number, default 1): Nucleus sampling. A value of 0.1 means only tokens comprising the top 10% probability mass are considered. Alter this or `temperature`, not both. Not available in thinking mode.
- **max_tokens** (integer): Maximum number of tokens to generate. The total of input and output tokens is limited by the model context length (1,048,576 tokens).
- **stream** (boolean, default false): If true, partial message deltas are sent as server-sent events (SSE). The stream is terminated by a `data: [DONE]` message. See the streaming guide.
- **tools** (array): A list of tools the model may call. Currently only functions are supported, with a maximum of 128 tools per request. Each tool requires `type` (always `"function"`), `function.name`, and `function.parameters` (JSON Schema). See the tool calling guide.
- **tool_choice** (string | object, default `auto`): Controls tool selection. `none` prevents tool calls, `auto` lets the model decide, and `required` forces a tool call. Specify a particular tool with `{"type": "function", "function": {"name": "my_fn"}}`.
- **response_format** (object): Set to `{"type": "json_object"}` to enable JSON output mode. You must also include the word "json" in your system or user message. See the JSON output guide.
- **thinking** (object): Controls reasoning mode. `{"type": "enabled"}` activates thinking mode; `{"type": "disabled"}` uses standard mode. See the thinking modes guide.
- **reasoning_effort** (string): Sets the depth of reasoning when thinking mode is enabled. `high` for extended reasoning; `max` for maximum reasoning budget (384K+ context recommended).
- **stop** (string | array): Up to 16 sequences where the API will stop generating. The returned text will not contain the stop sequence.
- **frequency_penalty** (number, default 0): Number between -2.0 and 2.0. Positive values penalise tokens based on their frequency in the text so far. Not available in thinking mode.
- **presence_penalty** (number, default 0): Number between -2.0 and 2.0. Positive values penalise tokens based on whether they appear in the text so far. Not available in thinking mode.
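A minimal request body using the parameters above might look like the following sketch. Only `model` and `messages` are required; the other fields shown are optional and use values chosen for illustration:

```python
import json

# Sketch of a request body; only "model" and "messages" are required.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Australia?"},
    ],
    "temperature": 0.2,  # more focused output; omit in thinking mode
    "max_tokens": 256,
}

# Serialise to JSON for the POST body.
body = json.dumps(payload)
```

The serialised `body` is what you send to the endpoint along with the headers described under Authentication.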
Response
response.json

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714387200,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Australia is Canberra.",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
```
Response fields
- **id** (string): Unique identifier for the completion.
- **choices** (array): Array of completion choices. Standard requests return one choice.
- **choices[].message.content** (string | null): The model response text. Null when the model makes a tool call.
- **choices[].message.reasoning_content** (string | null): The chain-of-thought reasoning. Only present when thinking mode is enabled; null otherwise.
- **choices[].message.tool_calls** (array | null): Tool calls requested by the model. Each contains `id`, `function.name`, and `function.arguments` (a JSON string).
- **choices[].finish_reason** (string): `stop` for natural completion, `length` when `max_tokens` is hit, `tool_calls` when the model wants to call a tool, `content_filter` when content is filtered.
- **usage.prompt_tokens** (integer): Number of tokens in the input messages.
- **usage.completion_tokens** (integer): Number of tokens generated (includes reasoning tokens when thinking mode is enabled).
- **usage.total_tokens** (integer): Total tokens consumed. This is the billable amount.
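Since `content` is null when the model makes a tool call, it is worth checking `finish_reason` before reading the text. A sketch of parsing the example response above (the payload is reproduced inline so the snippet is self-contained):

```python
import json

# The example response from above, reproduced inline.
raw = json.dumps({
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1714387200,
    "model": "deepseek-v4-flash",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "The capital of Australia is Canberra.",
            "reasoning_content": None,
            "tool_calls": None,
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33},
})

resp = json.loads(raw)
choice = resp["choices"][0]

# A tool call leaves content null, so branch on finish_reason first.
if choice["finish_reason"] == "tool_calls":
    calls = choice["message"]["tool_calls"]
else:
    text = choice["message"]["content"]

billable = resp["usage"]["total_tokens"]  # the billable amount
```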
Error codes
| Code | Meaning | Action |
|---|---|---|
| 400 | Bad request | Check request body format and required fields. |
| 401 | Unauthorized | Check your API key is correct and active. |
| 403 | Forbidden | Your API key does not have access to this resource. |
| 429 | Rate limited | Too many requests. Implement exponential backoff. |
| 500 | Server error | Retry after a brief delay. Contact support if persistent. |
| 503 | Service unavailable | The service is temporarily overloaded. Retry with backoff. |
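The retryable codes above (429, 500, 503) can be handled with exponential backoff plus jitter. A minimal sketch, where `send` is a hypothetical callable wrapping your HTTP call and returning a `(status, body)` pair:

```python
import random
import time

# Status codes worth retrying, per the error table above.
RETRYABLE = {429, 500, 503}

def call_with_retry(send, max_retries=5, base=1.0, cap=30.0):
    """Retry `send` with exponential backoff and jitter on retryable codes.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_retries):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # Delay doubles each attempt, capped, with random jitter.
        delay = min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
        time.sleep(delay)
    return send()  # final attempt; caller handles a persistent failure
```

Non-retryable codes (400, 401, 403) are returned immediately, since retrying them without fixing the request will not help.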
Ready to integrate?
Start with the quickstart guide, or explore tool calling, JSON output, and thinking modes.