Frontier AI inference. Sovereign. 80% cheaper.
Open-weight models on Australian infrastructure. OpenAI-compatible API. Your data never leaves the country. One line of code to switch.
One line to switch
If you are using the OpenAI SDK, migration is a single line change. Your prompts, tools, and response handling stay identical.
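As a sketch of what that looks like, using only the Python standard library so the request shape is explicit. The endpoint path, model id, and environment variable name here are assumptions; with the official `openai` SDK the equivalent change is the `base_url` argument to the client constructor.

```python
import json
import os
import urllib.request

# The only change from a stock OpenAI integration is the host:
BASE_URL = "https://api.continuum.au/v1"  # was: https://api.openai.com/v1

payload = {
    "model": "deepseek-v4-flash",  # assumed model id
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('CONTINUUM_API_KEY', '')}",
    },
)
# urllib.request.urlopen(req) would send it; the path and body are exactly
# what an OpenAI client would produce.
```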
Why Continuum Inference
Sovereign
Australian infrastructure. No data leaves the country. ISO 27001 and SOC 2 compliant hosting. Meets APRA CPS 234, SOCI Act, and PSPF requirements.
80% cheaper
Frontier-class models at AUD $3–4 per million tokens. Anthropic charges $15–25. OpenAI charges $10–30. Comparable capability at a fraction of the cost.
Drop-in compatible
OpenAI-compatible API. Change your base_url, keep your code. Function calling, JSON mode, streaming, thinking modes. Zero re-architecture.
Frontier models, benchmarked
DeepSeek V4-Flash delivers 85–95% of frontier closed-model capability at a fraction of the cost. Open-weight. MIT licensed. 1M token context.
DeepSeek V4-Flash
Available. MIT licence. 284B total parameters, 13B active per token. MoE architecture. 1M native context.
Benchmark comparison
| Benchmark | V4-Flash | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 79.0% | 80.8% | ~78% |
| LiveCodeBench | 91.6% | 88.8% | — |
| MMLU-Pro | Strong | Strong | Strong |
| Context Window | 1M | ~200K | 1M |
The cost comparison
Same API format. Same capability tier. Dramatically different price.
| Provider | AUD per 1M tokens | Capability | Infrastructure | Licence |
|---|---|---|---|---|
| Anthropic | $15–25 | Frontier | US | Proprietary |
| Continuum | $3–4 | Frontier-adjacent | Australian sovereign | MIT |
| OpenAI | $10–30 | Frontier | US | Proprietary |
A team spending $20,000/month on Anthropic Claude saves $192,000 per year with equivalent capability on sovereign Australian infrastructure.
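The arithmetic behind that figure, assuming the headline 80% saving applies to the whole bill:

```python
monthly_spend = 20_000   # current Anthropic spend, AUD/month
savings_rate = 0.80      # headline price difference

monthly_savings = monthly_spend * savings_rate  # 16,000 AUD/month
annual_savings = monthly_savings * 12           # 192,000 AUD/year
print(f"${annual_savings:,.0f} per year")       # $192,000 per year
```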
Three steps. Five minutes.
Get an API key
Sign up and receive your API key immediately. No credit card required for the first 1M tokens.
Change your base_url
Point your existing OpenAI SDK client at api.continuum.au. One line of code. Everything else stays the same.
Start saving
Your existing application works immediately. Same request format, same response format, 80% lower cost.
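"Same response format" means your parsing code does not change. The response below is abbreviated and illustrative, but the fields an OpenAI integration reads (`choices[0].message.content`, `usage`) are in the same places:

```python
import json

# An OpenAI-format chat completion response (abbreviated, illustrative):
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "deepseek-v4-flash",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "G'day!"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}"""

resp = json.loads(raw)
# Your existing handling code reads the same paths it always did:
answer = resp["choices"][0]["message"]["content"]
print(answer)  # G'day!
```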
Full API feature parity
Function calling
Up to 128 tools per request. Parallel calls. Strict schema validation. OpenAI-compatible format.
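A sketch of the request body, reusing the standard OpenAI tool schema unchanged (the model id is an assumption; `get_weather` is a hypothetical tool for illustration):

```python
# OpenAI-format tool definition; the same schema is accepted as-is.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "deepseek-v4-flash",  # assumed model id
    "messages": [{"role": "user", "content": "Weather in Sydney?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
```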
Streaming
Server-sent events streaming. Identical to OpenAI streaming format. Real-time token delivery.
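Because the wire format matches OpenAI's, an existing SSE parser keeps working. A minimal sketch against simulated chunks (the event bodies here are illustrative):

```python
import json

# Simulated server-sent events in the OpenAI streaming format:
sse = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]

tokens = []
for line in sse:
    body = line.removeprefix("data: ")
    if body == "[DONE]":          # same terminal sentinel as OpenAI
        break
    delta = json.loads(body)["choices"][0]["delta"]
    tokens.append(delta.get("content", ""))

print("".join(tokens))  # Hello
```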
JSON structured output
Native JSON mode via response_format. Guaranteed valid JSON for structured extraction pipelines.
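A sketch of JSON mode for an extraction pipeline, using the OpenAI-compatible `response_format` field (model id assumed; the response content shown is illustrative):

```python
import json

request_body = {
    "model": "deepseek-v4-flash",  # assumed model id
    "messages": [{"role": "user",
                  "content": "Extract company and year: 'Acme, founded 1999'"}],
    "response_format": {"type": "json_object"},  # OpenAI-compatible JSON mode
}

# In JSON mode the returned content is guaranteed to parse.
# Illustrative response content:
example_content = '{"company": "Acme", "year": 1999}'
extracted = json.loads(example_content)
```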
Thinking modes
Three reasoning levels: Non-think (fast), Think High (analytical), Think Max (deep reasoning). Per-request.
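The reasoning level is chosen per request, so a latency-sensitive path and a deep-analysis path can share one integration. The `thinking` field name and its values below are placeholder assumptions, not documented parameters (the OpenAI SDK can pass such provider-specific fields via `extra_body`); check the API reference for the real name:

```python
# "thinking" is an assumed field name, shown only to illustrate
# per-request selection of a reasoning level.
fast = {
    "model": "deepseek-v4-flash",  # assumed model id
    "messages": [{"role": "user", "content": "2 + 2?"}],
    "thinking": "none",            # Non-think: lowest latency
}
# Same request, deep reasoning instead:
deep = {**fast, "thinking": "max"}  # Think Max
```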
1M token context
Native million-token context window. No sliding window. No special configuration required.
Stateless inference
No prompts or completions stored. No training on your data. Process and forget. Zero data retention.
Built on proven infrastructure
Ready to cut your inference bill?
Same models. Same API. Sovereign infrastructure. 80% cheaper. Start in five minutes.