Thinking modes
Three levels of reasoning effort. Trade speed for depth on a per-request basis. Same model, different reasoning budgets.
Choose your reasoning level
Non-think
Default (no extra params)
Fast, direct responses. No chain-of-thought reasoning. Best for classification, extraction, routing, simple Q&A, and high-throughput batch processing.
Use when: Speed matters more than reasoning depth
Lowest token consumption
Think High
reasoning_effort: "high"
Extended chain-of-thought reasoning before generating a response. Best for analysis, complex coding, research tasks, and multi-step problem solving.
Use when: The task requires logical analysis
Moderate token overhead
Think Max
reasoning_effort: "max"
Maximum reasoning budget. The model will think extensively before answering. Best for competitive programming, deep research, and the hardest analytical problems.
Use when: Quality matters more than speed or cost
Highest token consumption
Enabling thinking mode
Pass the thinking and reasoning_effort parameters in the request body. When thinking is enabled, the model generates a chain-of-thought before producing the final answer.
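A minimal sketch of such a request, assuming an OpenAI-compatible chat completions endpoint. The URL, model name, and the boolean value passed to thinking are placeholders, not confirmed values; only the thinking and reasoning_effort parameter names come from this guide.

```python
import os
import requests

# Sketch only: the endpoint URL, model name, and the exact value accepted by
# "thinking" are assumptions. "thinking" and "reasoning_effort" are the
# parameters described above.
response = requests.post(
    "https://api.example.com/v1/chat/completions",   # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "your-model",                        # placeholder model name
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "thinking": True,                             # enable chain-of-thought
        "reasoning_effort": "high",                   # or "max"
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```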
Accessing the reasoning chain
When thinking mode is enabled, the response includes a reasoning_content field alongside the standard content field. The reasoning chain shows how the model arrived at its answer.
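A sketch of reading both fields, assuming the OpenAI-style choices[0].message response shape used in the request above:

```python
# reasoning_content appears only when thinking mode is enabled;
# content always carries the final answer.
message = response.json()["choices"][0]["message"]

reasoning = message.get("reasoning_content")
answer = message["content"]

print("Reasoning chain:", reasoning)
print("Final answer:   ", answer)
```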
Multi-turn conversations
How you handle reasoning_content in multi-turn conversations depends on whether the turn involved tool calls.
Turns without tool calls
The reasoning_content from the assistant message does not need to be passed back. If you include it, the API will ignore it. You can safely strip it from the conversation history.
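For example, a history-building helper might drop the field before appending the turn. This is a sketch that reuses the response object from the request above; the message shape is an assumption based on the OpenAI-style format.

```python
def strip_reasoning(message: dict) -> dict:
    """Remove reasoning_content before storing an assistant turn that made no tool calls."""
    return {k: v for k, v in message.items() if k != "reasoning_content"}

history = [{"role": "user", "content": "Why is the sky blue?"}]
assistant_message = response.json()["choices"][0]["message"]
history.append(strip_reasoning(assistant_message))   # reasoning safely dropped
history.append({"role": "user", "content": "Now explain it to a five-year-old."})
```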
Turns with tool calls
The reasoning_content must be passed back in subsequent requests. If your code strips it, the API returns a 400 error.
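A sketch of the branch logic, reusing strip_reasoning from the previous example. The tool_calls and tool-message shapes follow the common OpenAI-style convention and are assumptions here.

```python
assistant_message = response.json()["choices"][0]["message"]

if assistant_message.get("tool_calls"):
    # Tool-calling turn: append the message unmodified so reasoning_content
    # is sent back alongside the tool result (avoids the 400 error).
    history.append(assistant_message)
    history.append({
        "role": "tool",
        "tool_call_id": assistant_message["tool_calls"][0]["id"],
        "content": '{"temperature_c": 21}',   # your tool's actual output
    })
else:
    # Plain turn: safe to drop reasoning_content.
    history.append(strip_reasoning(assistant_message))
```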
Token billing
Thinking tokens are billed at the same per-token rate as output tokens. The difference is volume: Think High and Think Max generate additional reasoning tokens that increase the total output.
Token multipliers are approximate and vary by task complexity. Monitor usage.completion_tokens in responses to track reasoning token consumption.
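A sketch of logging that count after each request; completion_tokens is the field named above, while prompt_tokens is assumed from the usual usage block layout.

```python
usage = response.json()["usage"]

# completion_tokens includes reasoning tokens when thinking mode is enabled,
# so comparing runs with and without thinking shows the reasoning overhead.
print("prompt_tokens:    ", usage["prompt_tokens"])
print("completion_tokens:", usage["completion_tokens"])
```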
Parameter constraints
Certain sampling parameters are ignored when thinking mode is enabled: setting them does not produce an error, but they have no effect.
| Parameter | Non-think | Think High/Max |
|---|---|---|
| temperature | ✓ | ✗ (ignored) |
| top_p | ✓ | ✗ (ignored) |
| frequency_penalty | ✓ | ✗ (ignored) |
| presence_penalty | ✓ | ✗ (ignored) |
| max_tokens | ✓ | ✓ |
| tools | ✓ | ✓ |
| response_format | ✓ | ✓ |
| stream | ✓ | ✓ |
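A sketch of a thinking-mode payload that respects these constraints: the ignored sampling parameters are simply omitted, while max_tokens and stream remain in effect. The model name is a placeholder.

```python
payload = {
    "model": "your-model",     # placeholder
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "thinking": True,
    "reasoning_effort": "max",
    "max_tokens": 8192,        # still honored with thinking enabled
    "stream": False,           # still honored with thinking enabled
    # temperature / top_p / frequency_penalty / presence_penalty omitted:
    # they would be silently ignored in thinking mode.
}
```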
Think Max context: DeepSeek recommends allocating at least 384K tokens of context window when using Think Max, so the model has sufficient space for extended reasoning chains. The model automatically manages its reasoning budget within the available context.
Start reasoning
Add thinking mode to your existing requests with two parameters. No code changes beyond that.