Flash runs your Python functions on remote GPU/CPU workers while you maintain local control flow. This page explains what happens when you call an `@Endpoint` function.
What runs where
The `@Endpoint` decorator marks functions for remote execution. Everything else runs locally.
| Code | Location |
|---|---|
| `@Endpoint` decorator | Your machine (marks function) |
| Inside `process_on_gpu` | Runpod worker |
| Everything else | Your machine |
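To make the split concrete, here is a minimal sketch. The stand-in decorator below and the `process_on_gpu` body are purely illustrative of which pieces run where; they are not Flash's actual implementation or import path:

```python
# Illustrative stand-in for Flash's @Endpoint decorator (NOT the real
# implementation); it only records the endpoint name and configuration.
def Endpoint(name=None, **config):
    def wrap(fn):
        fn.endpoint_name = name      # Flash identifies endpoints by name
        fn.endpoint_config = config
        return fn
    return wrap

@Endpoint(name="gpu-processor")      # the decorator runs on your machine
def process_on_gpu(data):
    # In a real Flash app, this body executes on a Runpod worker.
    return [x * 2 for x in data]

# Everything outside the decorated function body runs locally.
result = process_on_gpu([1, 2, 3])
```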
Flash apps

When you build a Flash app:

Development (`flash dev`):

- The FastAPI server runs locally.
- `@Endpoint` functions run on Runpod workers.

Production (`flash deploy`):

- Each endpoint configuration becomes a separate Serverless endpoint.
- All endpoints run on Runpod.
Execution flow
Here's what happens when you call an `@Endpoint` function: Flash packages the call, runs it on a Runpod worker, and returns the result to your local code.
Endpoint naming
Flash identifies endpoints by their `name` parameter:
- Same name, same config: Reuses the existing endpoint.
- Same name, different config: Updates the endpoint automatically.
- New name: Creates a new endpoint.
You can change an endpoint's configuration, such as its GPU type or worker counts, without creating a new endpoint; Flash detects the change and updates it.
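The naming rules above can be sketched as a small decision function. This is illustrative logic only, not Flash's actual code, and the `deployed` state and endpoint names are hypothetical:

```python
def plan_action(existing: dict, name: str, config: dict) -> str:
    """Decide how Flash treats an endpoint name, per the rules above
    (a sketch, not Flash internals)."""
    if name not in existing:
        return "create"          # new name: creates a new endpoint
    if existing[name] == config:
        return "reuse"           # same name, same config: reuse
    return "update"              # same name, different config: update

# Hypothetical deployed state, keyed by endpoint name.
deployed = {"embedder": {"gpu": "A100", "workers": (0, 3)}}
```

Calling `plan_action(deployed, "embedder", {"gpu": "H100", "workers": (0, 3)})` would return `"update"`, matching the rule that a config change updates the endpoint in place.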
Worker lifecycle
Workers scale up and down based on demand and your configuration.

Worker states
| State | Description | Billing |
|---|---|---|
| Initializing | Downloading image, loading code | Yes |
| Idle | Scaled down, waiting for requests | No |
| Running | Processing requests | Yes |
| Throttled | Temporarily unable to run due to host resource constraints | No |
| Outdated | Marked for replacement after update | Yes (while processing) |
| Unhealthy | Crashed; auto-retries for up to 7 days | No |
Scaling behavior
- First job arrives → Scale to 1 worker (cold start).
- More jobs arrive while worker busy → Scale up to max workers.
- Jobs complete → Workers stay running for `idle_timeout` seconds before scaling down to idle.
- No new jobs → Scale down to min workers.
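The scaling rules above can be approximated in a few lines. This is a simplified model for illustration, not Runpod's actual scheduler:

```python
def next_worker_count(current: int, queued_jobs: int,
                      min_workers: int, max_workers: int) -> int:
    """Approximate the scaling behavior described above: scale up toward
    max while jobs queue, back down to min when the queue is empty."""
    if queued_jobs > current:
        return min(max_workers, queued_jobs)   # scale up, capped at max
    if queued_jobs == 0:
        return min_workers                     # after idle_timeout elapses
    return current                             # enough workers already

# First job triggers a cold start to one worker; a burst scales up to max.
print(next_worker_count(0, 1, 0, 5))   # 1
print(next_worker_count(4, 9, 0, 5))   # 5
```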
Cold starts and warm starts
Understanding cold and warm starts helps you predict latency and set expectations.

Cold start

A cold start occurs when no workers are available to handle your job, because:

- You're calling an endpoint for the first time.
- All workers have been scaled down after not processing requests for `idle_timeout` seconds.
- All running workers are busy processing requests.

During a cold start:

1. Runpod provisions a new worker with your configured GPU/CPU.
2. The worker image starts (dependencies are pre-installed during build).
3. Your function executes.
When using `flash build` or `flash deploy`, dependencies are pre-installed in the worker image, eliminating pip installation at request time. When running standalone scripts with `@Endpoint` functions outside of a Flash app, dependencies may be installed on the worker at request time.

Warm start
A warm start occurs when a worker is already running and idle:

- The worker completed a previous job and is waiting for more work.
- The worker is within its `idle_timeout` period.

During a warm start:

1. The job is routed immediately to the idle worker.
2. Your function executes.
The relationship between configuration and starts
Your `workers` and `idle_timeout` settings directly affect cold start frequency:

- `workers=(0, n)`: Workers scale to zero when not processing. Every request after the `idle_timeout` period triggers a cold start.
- `workers=(1, n)`: At least one worker stays ready. The first concurrent request is warm; additional requests may cold start.
- Higher `idle_timeout`: Workers stay running longer before scaling down, reducing cold starts for sporadic traffic.
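These interactions can be summarized in a tiny predictor. It is a sketch of the rules above, not how Flash or Runpod decide internally, and it models a single request (ignoring concurrent load on busy workers):

```python
def start_type(seconds_since_last_job: float, idle_timeout: float,
               min_workers: int) -> str:
    """Predict whether the next single request is warm or cold,
    per the configuration rules above (illustrative only)."""
    if min_workers >= 1:
        return "warm"    # at least one worker always stays ready
    if seconds_since_last_job <= idle_timeout:
        return "warm"    # a worker is still within its idle_timeout
    return "cold"        # scaled to zero: the request cold starts

# workers=(0, n) with sporadic traffic: requests after idle_timeout are cold.
print(start_type(120, 60, 0))   # cold
# workers=(1, n): a single request always finds a ready worker.
print(start_type(120, 60, 1))   # warm
```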