## Deployment issues

### Worker fails to start
If your worker fails to start or initialize:
- Check logs: View endpoint logs in the Runpod console for error messages.
- Verify local testing: Ensure your handler works in local testing before deploying.
- Check dependencies: Verify all dependencies are installed in your Docker image.
- GPU compatibility: Ensure your Docker image is compatible with the selected GPU type.
- Input format: Verify your input format matches what your handler expects.
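The "verify local testing" step above can be exercised before deploying by calling your handler directly with a sample job payload. The handler body and input shape below are hypothetical placeholders for your own code:

```python
# Minimal sketch of exercising a handler locally before deploying.
# The handler logic and input fields are illustrative placeholders.

def handler(job):
    """Echo-style handler: expects {"input": {"prompt": ...}}."""
    prompt = job["input"]["prompt"]
    return {"output": prompt.upper()}

if __name__ == "__main__":
    # Simulate the job payload the endpoint would deliver to your worker.
    test_job = {"id": "local-test", "input": {"prompt": "hello"}}
    print(handler(test_job))
```

If the handler misbehaves here, it will also fail on the endpoint, so this catches input-format and dependency problems cheaply.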
### Worker initializes but fails on requests
| Issue | Solution |
|---|---|
| Input validation errors | Add input validation in your handler and check logs for the expected format |
| Missing dependencies | Verify all required packages are in your Dockerfile |
| Model loading failures | Check GPU memory requirements and model path |
| Permission errors | Ensure files are readable and directories are writable |
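One way to surface the input validation errors in the first row is to check required fields up front and return a structured error instead of raising. The field names here are illustrative, not part of any Runpod contract:

```python
REQUIRED_FIELDS = ["prompt"]  # illustrative; match your handler's actual contract

def handler(job):
    job_input = job.get("input") or {}
    missing = [f for f in REQUIRED_FIELDS if f not in job_input]
    if missing:
        # Returning an "error" key fails the job with a readable message in the logs.
        return {"error": f"Missing required input field(s): {', '.join(missing)}"}
    return {"output": f"Processed: {job_input['prompt']}"}
```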
## Job issues

### Jobs stuck in queue
If jobs remain in the `IN_QUEUE` state for extended periods:
- No workers available: Check if `max_workers` is set appropriately.
- Workers throttled: Your endpoint may be hitting rate limits. Check the Workers tab for throttled workers.
- Cold start delays: First requests after an idle period require worker initialization. Consider increasing `min_workers` or enabling FlashBoot.
### Jobs timing out
| Cause | Solution |
|---|---|
| Processing takes too long | Increase `executionTimeout` in your job policy |
| Model loading too slow | Use model caching or bake models into your image |
| TTL too short | Set `ttl` to cover both queue time and execution time |
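The policy values above travel with the request body. A sketch of a `/run` payload, assuming a `policy` object with `executionTimeout` and `ttl` in milliseconds; verify the field names against the current API reference before relying on them:

```python
# Sketch of a /run request body carrying an execution policy.
# The "policy" field names and millisecond units are assumptions to verify.
payload = {
    "input": {"prompt": "hello"},
    "policy": {
        "executionTimeout": 600_000,  # 10 minutes of execution time
        "ttl": 3_600_000,             # 1 hour: must cover queue + execution time
    },
}
```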
### Jobs failing
Check the job status response for error details. Common causes:
- Handler exceptions: Unhandled exceptions in your handler code. Add try/catch blocks and return structured errors.
- OOM (Out of Memory): Model or batch size exceeds GPU memory. Reduce batch size or use a larger GPU.
- Timeout: Job exceeded execution timeout. Increase timeout or optimize processing.
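The handler exceptions in the first bullet can be caught and converted into structured errors, so the job status carries a useful message instead of an opaque failure. A minimal sketch, where `process` stands in for your own logic:

```python
import traceback

def process(job_input):
    """Placeholder for your actual processing logic."""
    if "prompt" not in job_input:
        raise ValueError("missing 'prompt'")
    return job_input["prompt"][::-1]

def handler(job):
    try:
        return {"output": process(job["input"])}
    except Exception as exc:
        # Returning an "error" key fails the job with a readable message;
        # the traceback helps when reading endpoint logs later.
        return {"error": str(exc), "traceback": traceback.format_exc()}
```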
## Cold start issues

### Slow cold starts
Cold start time includes container startup, model loading, and initialization. To reduce cold starts:
- Use model caching: Store models on network volumes instead of downloading on each start.
- Enable FlashBoot: Use FlashBoot for faster container initialization.
- Optimize image size: Use smaller base images and remove unnecessary dependencies.
- Initialize outside handler: Load models at module level, not inside the handler function.
```python
# Good: Load the model once at startup
model = load_model()

def handler(job):
    return model.predict(job["input"])
```

```python
# Bad: Load the model on every request
def handler(job):
    model = load_model()  # Slow!
    return model.predict(job["input"])
```
### Too many cold starts
If you’re seeing frequent cold starts:
- Increase idle timeout: Set a longer `idle_timeout` to keep workers warm between requests.
- Set minimum workers: Configure `min_workers` > 0 to maintain warm workers.
- Check traffic patterns: Sporadic traffic causes more cold starts than steady traffic.
## Logging issues

### Missing logs
If logs aren’t appearing in the console:
- Check throttling: Excessive logging triggers throttling. Reduce log verbosity.
- Verify output streams: Ensure you’re writing to stdout/stderr, not just files.
- Check worker status: Logs only appear for successfully initialized workers.
- Retention period: Logs older than 90 days are automatically removed.
### Log throttling
To avoid log throttling:
- Reduce log verbosity in production.
- Use structured logging for efficiency.
- Store detailed logs on network volumes instead of console output.
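Structured logging, as suggested above, keeps each event to one compact stdout line that log pipelines can parse. A minimal sketch using only the standard library:

```python
import json
import sys
import time

def log_event(level, message, **fields):
    """Emit one JSON object per line to stdout (not to a local file),
    and return the record so callers can inspect it."""
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    sys.stdout.write(json.dumps(record) + "\n")
    return record

log_event("info", "job started", job_id="abc123")
```

One line per event with a bounded set of fields is both cheaper than verbose multi-line prints and easier to filter in the console.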
## vLLM-specific issues

### OOM errors
If your vLLM worker runs out of memory:
- Lower `GPU_MEMORY_UTILIZATION` from 0.90 to 0.85.
- Reduce `MAX_MODEL_LEN` to limit the context window.
- Use a GPU with more VRAM.
### Model not loading
| Issue | Solution |
|---|---|
| Model not found | Verify `MODEL_NAME` matches the Hugging Face model ID exactly |
| Gated model access denied | Set `HF_TOKEN` with a token that has access to the model |
| Incompatible model | Check the list of vLLM supported models |
### OpenAI API errors
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify `RUNPOD_API_KEY` is correct |
| 404 Not Found | Wrong endpoint URL | Use the format `https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1` |
| Connection refused | Endpoint not ready | Wait for workers to initialize |
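The 404 row above usually comes down to the base URL format. A small helper that builds it from the endpoint ID, using the URL pattern from the table (the helper itself and the commented client usage are illustrative):

```python
def openai_base_url(endpoint_id: str) -> str:
    """Build the OpenAI-compatible base URL for a Runpod endpoint."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"

# Assumed usage with the official OpenAI Python client:
#   from openai import OpenAI
#   client = OpenAI(api_key=RUNPOD_API_KEY,
#                   base_url=openai_base_url("your-endpoint-id"))
```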
## Load balancing endpoint issues

### "No workers available" error
This means workers didn’t initialize in time. Common causes:
- First request: Workers need time to start. Retry the request. (See Handling cold starts for more information.)
- All workers busy: Increase `max_workers` to handle more concurrent requests.
- Workers crashing: Check logs for initialization errors.
### Requests not reaching workers
Verify your HTTP server is:
- Listening on port 8000 (or the port specified in your configuration).
- Binding to `0.0.0.0`, not `127.0.0.1`.
- Returning proper HTTP responses.
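The checklist above boils down to binding your server to `0.0.0.0:8000` and answering with well-formed HTTP. A stdlib-only sketch; the response body and route handling are illustrative, so match whatever paths your configuration actually expects:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Return a proper HTTP response so the load balancer
        # sees the worker as healthy.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"status": "ok"}')

if __name__ == "__main__":
    # Bind to 0.0.0.0 (all interfaces), not 127.0.0.1,
    # so traffic from outside the container reaches the worker.
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```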
## Getting help
If you’re still experiencing issues:
- Check endpoint logs for detailed error messages.
- SSH into workers using SSH access to debug in real-time.
- Review metrics in the Metrics tab to identify patterns.
- Contact support at help@runpod.io with your endpoint ID and error details.