Runpod offers custom pricing plans for large scale and enterprise workloads. Contact our sales team to learn more.
Serverless offers pay-per-second pricing with no upfront costs. You’re billed from when a worker starts until it fully stops, rounded up to the nearest second.
Worker types
| | Flex workers | Active workers |
|---|---|---|
| Behavior | Scale to zero when idle | Always running (24/7) |
| Pricing | Standard per-second rate | 20–30% discount |
| Best for | Variable workloads, cost optimization | Consistent traffic, low-latency requirements |
GPU pricing
| GPU type(s) | Memory | Flex cost per second | Active cost per second | Description |
|---|---|---|---|---|
| A4000, A4500, RTX 4000 | 16 GB | $0.00016 | $0.00011 | The most cost-effective for small models. |
| 4090 PRO | 24 GB | $0.00031 | $0.00021 | Extreme throughput for small-to-medium models. |
| L4, A5000, 3090 | 24 GB | $0.00019 | $0.00013 | Great for small-to-medium sized inference workloads. |
| L40, L40S, 6000 Ada PRO | 48 GB | $0.00053 | $0.00037 | Extreme inference throughput on LLMs like Llama 3 7B. |
| A6000, A40 | 48 GB | $0.00034 | $0.00024 | A cost-effective option for running big models. |
| H100 PRO | 80 GB | $0.00116 | $0.00093 | Extreme throughput for big models. |
| A100 | 80 GB | $0.00076 | $0.00060 | High throughput GPU, yet still very cost-effective. |
| H200 PRO | 141 GB | $0.00155 | $0.00124 | Extreme throughput for huge models. |
| B200 | 180 GB | $0.00240 | $0.00190 | Maximum throughput for huge models. |
For the latest pricing, visit the Runpod pricing page.
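To see how per-second billing plays out, here is a minimal sketch using flex rates copied from the table above. The `request_cost` helper is illustrative only, not a Runpod API; billing runs from worker start until full stop, rounded up to the nearest second.

```python
import math

# Flex cost per second for a few GPU types, taken from the pricing table above.
FLEX_RATE = {
    "A100": 0.00076,
    "H100 PRO": 0.00116,
    "L40S": 0.00053,
}

def request_cost(gpu: str, runtime_seconds: float) -> float:
    """Cost of one request on a flex worker.

    Runpod bills from worker start until full stop,
    rounded up to the nearest second.
    """
    billable_seconds = math.ceil(runtime_seconds)
    return billable_seconds * FLEX_RATE[gpu]

# A 12.3-second request on an A100 is billed as 13 seconds.
print(round(request_cost("A100", 12.3), 5))  # 13 * 0.00076 = 0.00988
```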
What you’re billed for
Your total cost includes compute time and storage:
| Cost component | Description | Rate |
|---|---|---|
| Compute | GPU time while workers run | See pricing table above |
| Container disk | Worker storage (5-min intervals) | ~$0.10/GB/month |
| Network volume | Shared persistent storage | $0.07/GB/month (< 1TB), $0.05/GB/month (> 1TB) |
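As a rough illustration of the storage rates above, the sketch below estimates a network volume's monthly cost. It assumes the per-GB rate is selected by the volume's total size (the lower rate applying once the volume reaches 1 TB); `network_volume_monthly_cost` is a hypothetical helper, not part of any Runpod SDK.

```python
def network_volume_monthly_cost(size_gb: float) -> float:
    """Estimated monthly cost of a network volume.

    Assumption: the per-GB rate is chosen by total size —
    $0.07/GB/month under 1 TB, $0.05/GB/month at 1 TB and above.
    """
    rate = 0.07 if size_gb < 1000 else 0.05
    return size_gb * rate

print(network_volume_monthly_cost(500))   # 500 GB at $0.07/GB -> 35.0
print(network_volume_monthly_cost(2000))  # 2 TB at $0.05/GB -> 100.0
```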
Compute cost breakdown
Workers incur charges during three phases:
- Start time: Initializing the container and loading models into GPU memory. Minimize with FlashBoot or model caching.
- Execution time: Processing requests. Set execution timeouts to prevent runaway jobs.
- Idle timeout: The time a worker remains active (running) after completing a request, waiting for additional requests before scaling down (default: 5 seconds). Configure in endpoint settings.
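The three phases above can be summed into a single per-request compute cost. A minimal sketch, assuming an L4 flex worker at the table rate of $0.00019/second; the function name and example durations are illustrative:

```python
FLEX_RATE_PER_SECOND = 0.00019  # L4 flex rate from the pricing table above

def compute_cost(start_s: float, execution_s: float, idle_s: float,
                 rate: float = FLEX_RATE_PER_SECOND) -> float:
    """Compute cost for one request lifecycle: start + execution + idle."""
    return (start_s + execution_s + idle_s) * rate

# 4 s cold start + 20 s execution + default 5 s idle timeout = 29 billed seconds.
print(round(compute_cost(4, 20, 5), 6))  # 29 * 0.00019 = 0.00551
```

Note that the idle timeout is billed on every scale-down, so lowering it (or raising it to absorb bursts) directly trades cost against cold-start frequency.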
For high-volume workloads with significant storage needs, use network volumes to share data across workers and reduce per-worker storage costs.
Account limits
Spend limit: Default limit of $80/hour across all resources. Contact support to increase.
Billing support
If you believe you’ve been billed incorrectly, contact support and include the following information in your ticket:
- Endpoint ID
- Request ID (if applicable)
- Approximate time of the issue