API Reference

Complete reference for the chat completions endpoint. OpenAI-compatible request and response format.

POST https://api.continuum.au/v1/chat/completions

Creates a model response for the given chat conversation.

Authentication

All requests require an API key sent via the Authorization header.

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_KEY | Yes |
| Content-Type | application/json | Yes |
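
The two required headers can be assembled as follows (a minimal sketch; the key value is a placeholder you substitute with your real key):

```python
import json

API_KEY = "YOUR_API_KEY"  # placeholder; never hard-code a real key

# Both headers are required on every request.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

print(json.dumps(headers, indent=2))
```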

Request body

model (string, required)
ID of the model to use. Currently available: deepseek-v4-flash

messages (array, required)
A list of messages comprising the conversation. Each message is an object with role and content fields. Supported roles: system, user, assistant, tool.

temperature (number, default: 1)
Sampling temperature between 0 and 2. Higher values (e.g. 0.8) produce more random output; lower values (e.g. 0.2) produce more focused output. Not available in thinking mode.

top_p (number, default: 1)
Nucleus sampling. 0.1 means only tokens comprising the top 10% probability mass are considered. Alter this or temperature, not both. Not available in thinking mode.

max_tokens (integer)
Maximum number of tokens to generate. The total of input + output tokens is limited by the model context length (1,048,576 tokens).

stream (boolean, default: false)
If true, partial message deltas are sent as server-sent events (SSE). The stream is terminated by a data: [DONE] message. See the streaming guide.

tools (array)
A list of tools the model may call. Currently only functions are supported. Max 128 tools per request. Each tool requires type (always "function"), function.name, and function.parameters (JSON Schema). See the tool calling guide.

tool_choice (string | object, default: auto)
Controls tool selection. none prevents tool calls. auto lets the model decide. required forces a tool call. Specify a particular tool with {"type": "function", "function": {"name": "my_fn"}}.

response_format (object)
Set to {"type": "json_object"} to enable JSON output mode. You must also include the word "json" in your system or user message. See the JSON output guide.

thinking (object)
Controls reasoning mode. {"type": "enabled"} activates thinking mode. {"type": "disabled"} uses standard mode. See the thinking modes guide.

reasoning_effort (string)
Sets the depth of reasoning when thinking mode is enabled. high for extended reasoning; max for the maximum reasoning budget (384K+ context recommended).

stop (string | array)
Up to 16 sequences where the API will stop generating. The returned text will not contain the stop sequence.

frequency_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens based on their frequency in the text so far. Not available in thinking mode.

presence_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens that have already appeared in the text so far. Not available in thinking mode.
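
Putting the parameters above together, a request body with a single tool might be built like this (a sketch; the get_weather function is a hypothetical example for illustration, not part of the API):

```python
import json

# Hypothetical tool definition for illustration; any JSON Schema
# object works as function.parameters.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Canberra?"},
    ],
    "temperature": 0.2,      # focused output; omit in thinking mode
    "max_tokens": 1024,
    "tools": [get_weather],  # up to 128 tools per request
    "tool_choice": "auto",   # let the model decide
}

body = json.dumps(payload)  # send this as the POST body
```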

Response

response.json

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714387200,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Australia is Canberra.",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
```
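
The sample body can be picked apart in a few lines (a sketch using the sample payload inlined as a string; real code would read the HTTP response body instead):

```python
import json

# Sample response body, as returned by the API.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714387200,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Australia is Canberra.",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33}
}
"""

resp = json.loads(raw)
choice = resp["choices"][0]

if choice["finish_reason"] == "tool_calls":
    # The model wants a tool run; content is null in this case.
    calls = choice["message"]["tool_calls"]
else:
    text = choice["message"]["content"]

billable = resp["usage"]["total_tokens"]  # the billable amount
```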

Response fields

id (string)
Unique identifier for the completion.

choices (array)
Array of completion choices. Standard requests return one choice.

choices[].message.content (string | null)
The model response text. Null when the model makes a tool call.

choices[].message.reasoning_content (string | null)
The chain-of-thought reasoning. Only present when thinking mode is enabled; null otherwise.

choices[].message.tool_calls (array | null)
Tool calls requested by the model. Each contains id, function.name, and function.arguments (a JSON string).

choices[].finish_reason (string)
stop — natural completion. length — hit max_tokens. tool_calls — the model wants to call a tool. content_filter — content was filtered.

usage.prompt_tokens (integer)
Number of tokens in the input messages.

usage.completion_tokens (integer)
Number of tokens generated (includes reasoning tokens when thinking mode is enabled).

usage.total_tokens (integer)
Total tokens consumed. This is the billable amount.
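
When stream is true, the body arrives as SSE lines rather than a single JSON object. A minimal parser sketch, assuming OpenAI-compatible chunks where incremental text arrives under choices[0].delta.content (the exact chunk shape is an assumption here; see the streaming guide):

```python
import json

# Simulated SSE lines as they would arrive on the wire; a real client
# would iterate over the HTTP response stream instead.
lines = [
    'data: {"choices": [{"delta": {"content": "Can"}}]}',
    'data: {"choices": [{"delta": {"content": "berra"}}]}',
    "data: [DONE]",
]

parts = []
for line in lines:
    if not line.startswith("data: "):
        continue  # skip blank keep-alives and SSE comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # stream terminator
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        parts.append(delta["content"])

text = "".join(parts)
```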

Error codes

| Code | Meaning | Action |
| --- | --- | --- |
| 400 | Bad request | Check request body format and required fields. |
| 401 | Unauthorized | Check your API key is correct and active. |
| 403 | Forbidden | Your API key does not have access to this resource. |
| 429 | Rate limited | Too many requests. Implement exponential backoff. |
| 500 | Server error | Retry after a brief delay. Contact support if persistent. |
| 503 | Service unavailable | The service is temporarily overloaded. Retry with backoff. |
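
For 429, 500, and 503 responses the table recommends retrying with backoff. One way to sketch that (the send callable stands in for a real HTTP POST and is a hypothetical placeholder):

```python
import random
import time

RETRYABLE = {429, 500, 503}

def post_with_backoff(send, max_retries=5, base_delay=1.0):
    """Call send() until it succeeds or retries are exhausted.

    send() returns an HTTP-like status code; a real implementation
    would perform the POST and return the response.
    """
    for attempt in range(max_retries):
        status = send()
        if status not in RETRYABLE:
            return status
        # Exponential backoff with jitter: 1s, 2s, 4s, ... plus noise.
        time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
    return status

# Fake transport: fails twice with 429, then succeeds.
responses = iter([429, 429, 200])
result = post_with_backoff(lambda: next(responses), base_delay=0.0)
```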

Ready to integrate?

Start with the quickstart guide, or explore tool calling, JSON output, and thinking modes.