Cogito 671B v2.1 is Deep Cogito’s massive 671B parameter Mixture-of-Experts (MoE) language model. It uses FP8 dynamic quantization for efficient inference while maintaining high-quality outputs across reasoning, coding, and general knowledge tasks.Documentation Index
Fetch the complete documentation index at: https://runpod-b18f5ded-promptless-remove-flash-beta-notification.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Try in playground
Test Cogito 671B v2.1 in the Runpod Hub playground.
| Endpoint | https://api.runpod.ai/v2/cogito-671b-v2-1-fp8-dynamic/runsync |
| Pricing | $0.50 per 1M tokens |
| Type | Text generation |
This endpoint is fully compatible with the OpenAI API. See the OpenAI compatibility examples below.
Request
All parameters are passed within theinput object in the request body.
Prompt for text generation.
Maximum number of tokens to output.
Randomness of the output. Lower values make output more predictable and deterministic. Range: 0.0-1.0.
Nucleus sampling threshold. Samples from the smallest set of words whose cumulative probability exceeds this threshold.
Restricts sampling to the top K most probable words.
Stops generation if the given string is encountered.
Response
Unique identifier for the request.
Request status. Returns
COMPLETED on success, FAILED on error.Time in milliseconds the request spent in queue before processing began.
Time in milliseconds the model took to generate the response.
OpenAI API compatibility
Cogito 671B v2.1 is fully compatible with the OpenAI API format. You can use the OpenAI Python client to interact with this endpoint.Python (OpenAI SDK)
stream=True:
Python (Streaming)
Cost calculation
Cogito 671B v2.1 charges $0.50 per 1M tokens. Example costs:| Tokens | Cost |
|---|---|
| 1,000 tokens | $0.0005 |
| 10,000 tokens | $0.005 |
| 100,000 tokens | $0.05 |
| 1,000,000 tokens | $0.50 |