API Reference

Complete reference for the chat completions endpoint. OpenAI-compatible request and response format.

POST https://api.continuum.au/v1/chat/completions

Creates a model response for the given chat conversation.

Authentication

All requests require an API key sent via the Authorization header.

| Header | Value | Required |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_KEY | Yes |
| Content-Type | application/json | Yes |
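
The two required headers can be assembled as follows (a minimal sketch; the key value is a placeholder you substitute with your real key):

```python
import json

API_KEY = "YOUR_API_KEY"  # placeholder; never hard-code a real key

# Both headers are required on every request.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

print(json.dumps(headers, indent=2))
```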

Request body

model (string, required)
ID of the model to use. Currently available: deepseek-v4-flash

messages (array, required)
A list of messages comprising the conversation. Each message is an object with role and content fields. Supported roles: system, user, assistant, tool.

temperature (number, default: 1)
Sampling temperature between 0 and 2. Higher values (e.g. 0.8) produce more random output; lower values (e.g. 0.2) produce more focused output. Not available in thinking mode.

top_p (number, default: 1)
Nucleus sampling. 0.1 means only tokens comprising the top 10% probability mass are considered. Alter this or temperature, not both. Not available in thinking mode.

max_tokens (integer)
Maximum number of tokens to generate. The total of input + output tokens is limited by the model context length (1,048,576 tokens).

stream (boolean, default: false)
If true, partial message deltas are sent as server-sent events (SSE). The stream is terminated by a data: [DONE] message. See the streaming guide.

tools (array)
A list of tools the model may call. Currently only functions are supported. Max 128 tools per request. Each tool requires type (always "function"), function.name, and function.parameters (JSON Schema). See the tool calling guide.

tool_choice (string | object, default: auto)
Controls tool selection. none prevents tool calls. auto lets the model decide. required forces a tool call. Specify a particular tool with {"type": "function", "function": {"name": "my_fn"}}.

response_format (object)
Set to {"type": "json_object"} to enable JSON output mode. You must also include the word "json" in your system or user message. See the JSON output guide.

thinking (object)
Controls reasoning mode. {"type": "enabled"} activates thinking mode. {"type": "disabled"} uses standard mode. See the thinking modes guide.

reasoning_effort (string)
Sets the depth of reasoning when thinking mode is enabled. high for extended reasoning; max for the maximum reasoning budget (384K+ context recommended).

stop (string | array)
Up to 16 sequences where the API will stop generating. The returned text will not contain the stop sequence.

frequency_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens based on their frequency in the text so far. Not available in thinking mode.

presence_penalty (number, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens that have already appeared in the text so far. Not available in thinking mode.
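
Putting the parameters above together, a request body with a single tool might be built like this (a sketch; the get_weather function is a hypothetical example for illustration, not part of the API):

```python
import json

# Hypothetical tool definition for illustration; any JSON Schema
# object works as function.parameters.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Canberra?"},
    ],
    "temperature": 0.2,      # focused output; omit in thinking mode
    "max_tokens": 1024,
    "tools": [get_weather],  # up to 128 tools per request
    "tool_choice": "auto",   # let the model decide
}

body = json.dumps(payload)  # send this as the POST body
```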

Response

response.json

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714387200,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Australia is Canberra.",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
```
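
The sample body can be picked apart in a few lines (a sketch using the sample payload inlined as a string; real code would read the HTTP response body instead):

```python
import json

# Sample response body, as returned by the API.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714387200,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Australia is Canberra.",
        "reasoning_content": null,
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33}
}
"""

resp = json.loads(raw)
choice = resp["choices"][0]

if choice["finish_reason"] == "tool_calls":
    # The model wants a tool run; content is null in this case.
    calls = choice["message"]["tool_calls"]
else:
    text = choice["message"]["content"]

billable = resp["usage"]["total_tokens"]  # the billable amount
```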

Response fields

id (string)
Unique identifier for the completion.

choices (array)
Array of completion choices. Standard requests return one choice.

choices[].message.content (string | null)
The model response text. Null when the model makes a tool call.

choices[].message.reasoning_content (string | null)
The chain-of-thought reasoning. Only present when thinking mode is enabled; null otherwise.

choices[].message.tool_calls (array | null)
Tool calls requested by the model. Each contains id, function.name, and function.arguments (a JSON string).

choices[].finish_reason (string)
stop — natural completion. length — hit max_tokens. tool_calls — the model wants to call a tool. content_filter — content was filtered.

usage.prompt_tokens (integer)
Number of tokens in the input messages.

usage.completion_tokens (integer)
Number of tokens generated (includes reasoning tokens when thinking mode is enabled).

usage.total_tokens (integer)
Total tokens consumed. This is the billable amount.
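
When stream is true, the body arrives as SSE lines rather than a single JSON object. A minimal parser sketch, assuming OpenAI-compatible chunks where incremental text arrives under choices[0].delta.content (the exact chunk shape is an assumption here; see the streaming guide):

```python
import json

# Simulated SSE lines as they would arrive on the wire; a real client
# would iterate over the HTTP response stream instead.
lines = [
    'data: {"choices": [{"delta": {"content": "Can"}}]}',
    'data: {"choices": [{"delta": {"content": "berra"}}]}',
    "data: [DONE]",
]

parts = []
for line in lines:
    if not line.startswith("data: "):
        continue  # skip blank keep-alives and SSE comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # stream terminator
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        parts.append(delta["content"])

text = "".join(parts)
```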

Error codes

| Code | Meaning | Action |
| --- | --- | --- |
| 400 | Bad request | Check request body format and required fields. |
| 401 | Unauthorized | Check your API key is correct and active. |
| 403 | Forbidden | Your API key does not have access to this resource. |
| 429 | Rate limited | Too many requests. Implement exponential backoff. |
| 500 | Server error | Retry after a brief delay. Contact support if persistent. |
| 503 | Service unavailable | The service is temporarily overloaded. Retry with backoff. |
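
For 429, 500, and 503 responses the table recommends retrying with backoff. One way to sketch that (the send callable stands in for a real HTTP POST and is a hypothetical placeholder):

```python
import random
import time

RETRYABLE = {429, 500, 503}

def post_with_backoff(send, max_retries=5, base_delay=1.0):
    """Call send() until it succeeds or retries are exhausted.

    send() returns an HTTP-like status code; a real implementation
    would perform the POST and return the response.
    """
    for attempt in range(max_retries):
        status = send()
        if status not in RETRYABLE:
            return status
        # Exponential backoff with jitter: 1s, 2s, 4s, ... plus noise.
        time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
    return status

# Fake transport: fails twice with 429, then succeeds.
responses = iter([429, 429, 200])
result = post_with_backoff(lambda: next(responses), base_delay=0.0)
```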

Ready to integrate?

Start with the quickstart guide, or explore tool calling, JSON output, and thinking modes.