The `@Endpoint` decorator handles most use cases, letting you execute arbitrary Python code remotely without managing Docker images. For specialized environments, however, you can use `Endpoint(image=...)` to deploy your own Docker image.
## When to use custom Docker images

Use custom Docker images when you need:

- **Pre-built inference servers**: vLLM, TensorRT-LLM, or other specialized serving frameworks.
- **System-level dependencies**: Custom CUDA versions, cuDNN, or system libraries not installable via `pip`.
- **Baked-in models**: Large models pre-downloaded into the image to avoid runtime downloads.
- **Existing Serverless workers**: You already have a working Runpod Serverless Docker image.
## Available Docker images

### Official Runpod workers

Runpod provides pre-built worker images for common frameworks:

| Framework | Image name | Documentation |
|---|---|---|
| vLLM | `runpod/worker-vllm` | vLLM docs |
| Automatic1111 | `runpod/worker-a1111:stable` | Docker Hub |
| ComfyUI | `runpod/worker-comfy` | Docker Hub |
### Custom images

To create a custom Docker image:

1. Build a handler function to process requests.
2. Create a Dockerfile to build the image.
3. Push the image to a registry.
4. Reference the image with `Endpoint(image=...)`.
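As a sketch of step 1, a minimal Runpod Serverless handler looks like this (the file name and the echo logic are conventions for illustration, not requirements):

```python
# handler.py -- a minimal Runpod Serverless handler
import runpod

def handler(job):
    # Queue-based requests arrive as {"input": {...}}; the payload is under "input".
    prompt = job["input"].get("prompt", "")
    # Replace this echo with your actual inference logic.
    return {"echo": prompt}

# Start the Serverless worker loop with this handler.
runpod.serverless.start({"handler": handler})
```

From there, the Dockerfile (step 2) only needs to install the `runpod` package, copy `handler.py` into the image, and launch it, for example with `CMD ["python", "-u", "handler.py"]`, before you push the image to a registry (step 3).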
## Deploy a custom image
### Complete example: vLLM inference

This example deploys vLLM and makes an inference request:
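The snippet below is a sketch of that flow. The import path and the `MODEL_NAME` environment variable are assumptions; the `image`, `gpu`, `env`, and `{"input": {...}}` conventions are the ones documented on this page.

```python
from runpod_flash import Endpoint, GpuType  # import path is an assumption

# Deploy the official vLLM worker image as a Serverless endpoint.
endpoint = Endpoint(
    image="runpod/worker-vllm",
    gpu=GpuType.NVIDIA_A100_80GB_PCIe,
    env={
        "MODEL_NAME": "mistralai/Mistral-7B-Instruct-v0.2",  # served model (assumed env var)
        "MAX_MODEL_LEN": "4096",
    },
)

# Queue-based inference request: the payload must be wrapped as {"input": {...}}.
job = endpoint.run({
    "input": {
        "prompt": "Explain serverless GPUs in one sentence.",
        "max_tokens": 128,
    }
})

# See the EndpointJob reference below; output() assumes a blocking result fetch.
print(job.output())
```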
## Configuration options

All standard `Endpoint` parameters work with custom images:
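For instance, the parameters mentioned elsewhere on this page (`gpu`, `workers`, `env`) can be combined. A hedged sketch, with the import path again assumed:

```python
from runpod_flash import Endpoint, GpuType  # import path is an assumption

endpoint = Endpoint(
    image="yourusername/your-image:latest",  # your custom image
    gpu=GpuType.NVIDIA_A100_80GB_PCIe,       # GPU type
    workers=3,                               # cap on parallel workers
    env={"LOG_LEVEL": "info"},               # environment variables for the container
)
```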
### CPU endpoints

For CPU workloads, use the `cpu` parameter:
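A minimal sketch, assuming `cpu` accepts a CPU instance-type identifier (the identifier format shown here is an assumption):

```python
from runpod_flash import Endpoint  # import path is an assumption

endpoint = Endpoint(
    image="yourusername/cpu-worker:latest",
    cpu="cpu3c-2-4",  # CPU instance type; the identifier format is an assumption
)
```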
## Request/response format

### Queue-based requests

Use `.run()` with a dictionary payload in the format `{"input": {...}}`:
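A short sketch (import path assumed, image name a placeholder):

```python
from runpod_flash import Endpoint  # import path is an assumption

endpoint = Endpoint(image="yourusername/your-image:latest")

# The handler on the worker receives this payload as job["input"].
job = endpoint.run({"input": {"prompt": "Hello, world"}})
```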
### HTTP requests

Use `.get()`, `.post()`, `.put()`, and `.delete()` for direct HTTP calls:
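A sketch of two such calls; the method names come from this page, but the path-plus-JSON-body signature and the routes shown are assumptions:

```python
from runpod_flash import Endpoint  # import path is an assumption

endpoint = Endpoint(image="yourusername/your-image:latest")

# Direct HTTP calls against the endpoint (paths and signatures are assumptions).
health = endpoint.get("/health")
result = endpoint.post("/v1/completions", {"prompt": "Hello", "max_tokens": 16})
```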
## EndpointJob reference

The `.run()` method returns an `EndpointJob` for asynchronous operations:
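A hedged sketch of working with the returned job. The method names below mirror the classic Runpod SDK job interface and are assumptions for this SDK's `EndpointJob`:

```python
from runpod_flash import Endpoint  # import path is an assumption

endpoint = Endpoint(image="yourusername/your-image:latest")
job = endpoint.run({"input": {"prompt": "Hello"}})

# status() and output() are assumed names mirroring the classic Runpod SDK.
print(job.status())  # e.g. "IN_QUEUE", "IN_PROGRESS", or "COMPLETED"
print(job.output())  # assumed to block until the result is available
```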
## Limitations

- **Input format**: Queue-based calls require the `{"input": {...}}` format.
- **Code execution**: Custom-image endpoints cannot execute arbitrary Python code remotely; your Docker image must include all the logic.
- **@Endpoint decorator**: The decorator pattern doesn't work with `image=`. Use the instance pattern instead.
- **Handler required**: Your Docker image must implement a Runpod Serverless handler function.
## Troubleshooting

### Endpoint fails to initialize

**Problem**: Workers fail to start or crash immediately.

**Solutions**:

- Verify your Docker image is compatible with Runpod Serverless.
- Check that environment variables are correct.
- Ensure the image includes a valid handler function.
- Check worker logs in the Runpod console.
### Out of memory errors

**Problem**: Workers crash with CUDA OOM or RAM errors.

**Solutions** (a configuration sketch combining them follows this list):

- Use a larger GPU: `gpu=GpuType.NVIDIA_A100_80GB_PCIe`.
- Reduce `GPU_MEMORY_UTILIZATION` for vLLM.
- Lower `MAX_MODEL_LEN` or batch size.
- Reduce `workers` to limit parallel execution.
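Several of these fixes are endpoint configuration changes. A hedged sketch (import path assumed; parameter and environment variable names are the ones listed above):

```python
from runpod_flash import Endpoint, GpuType  # import path is an assumption

endpoint = Endpoint(
    image="runpod/worker-vllm",
    gpu=GpuType.NVIDIA_A100_80GB_PCIe,     # larger GPU
    workers=1,                             # limit parallel execution
    env={
        "GPU_MEMORY_UTILIZATION": "0.85",  # leave headroom for CUDA overhead
        "MAX_MODEL_LEN": "4096",           # smaller context shrinks the KV cache
    },
)
```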
### Authentication errors

**Problem**: Cannot download gated models or private images.

**Solutions**:

- Add `HF_TOKEN` to `env` for Hugging Face gated models.
- Configure Docker registry authentication in the Runpod console for private images.
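For the first case, a minimal sketch that forwards your token from the local environment (import path assumed):

```python
import os
from runpod_flash import Endpoint  # import path is an assumption

endpoint = Endpoint(
    image="runpod/worker-vllm",
    env={
        # Lets the worker download gated Hugging Face models at startup.
        "HF_TOKEN": os.environ["HF_TOKEN"],
    },
)
```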