Life of a run
Understanding what happens when you invoke flyte.run() is crucial for optimizing workflow performance and debugging issues. This guide walks through each phase of task execution from submission to completion.
Overview
When you execute flyte.run(), the system goes through several phases:
- Code analysis and preparation: Discover environments and images
- Image building: Build container images if changes are detected
- Code bundling: Package your Python code
- Upload: Transfer the code bundle to object storage
- Run creation: Submit the run to the backend
- Task execution: Execute the task in the data plane
- State management: Track and persist execution state
Phase 1: Code analysis and preparation
When flyte.run() is invoked:
- Environment discovery: Flyte analyzes your code and finds all relevant
flyte.TaskEnvironmentinstances by walking thedepends_onhierarchy. - Image identification: Discovers unique
flyte.Imageinstances used across all environments. - Image building: Starts the image building process. Images are only built if a change is detected.
If you invoke flyte.run() multiple times within the same Python process without changing code (such as in a notebook or script), the code bundling and image building steps are done only once. This can dramatically speed up iteration.
Phase 2: Image building
Container images provide the runtime environment for your tasks:
- Change detection: Images are only rebuilt if changes are detected in dependencies or configuration.
- Caching: Previously built images are reused when possible.
- Parallel builds: Multiple images can be built concurrently.
For more details on container images, see Container Images.
Phase 3: Code bundling
After images are built, your project files are bundled:
Default: copy_style="loaded_modules"
By default, all Python modules referenced by the invoked tasks through module-level import statements are automatically copied. This provides a good balance between completeness and efficiency.
Alternative: copy_style="none"
Skip bundling by setting copy_style="none" in flyte.with_runcontext() and adding all code into flyte.Image:
# Add code to image
image = flyte.Image().with_source_code("/path/to/code")
# Or use Dockerfile
image = flyte.Image.from_dockerfile("Dockerfile")
# Skip bundling
run = flyte.with_runcontext(copy_style="none").run(my_task, input_data=data)For more details on code packaging, see Packaging.
Phase 4: Upload code bundle
Once the code bundle is created:
- Negotiate signed URL: The SDK requests a signed URL from the backend.
- Upload: The code bundle is uploaded to the signed URL location in object storage.
- Reference stored: The backend stores a reference to the uploaded bundle.
Phase 5: Run creation and queuing
The CreateRun API is invoked:
- Copy inputs: Input data is copied to the object store.
- En-queue a run: The run is queued into the Union Control Plane.
- Hand off to executor: Union Control Plane hands the task to the Executor Service in your data plane.
- Create action: The parent task action (called
a0) is created.
Phase 6: Task execution in data plane
Container startup
- Container starts: The task container starts in your data plane.
- Download code bundle: The Flyte runtime downloads the code bundle from object storage.
- Inflate task: The task is inflated from the code bundle.
- Download inputs: Inline inputs are downloaded from the object store.
- Execute task: The task is executed with context and inputs.
Invoking downstream tasks
If the task invokes other tasks:
- Controller thread: A controller thread starts to communicate with the backend Queue Service.
- Monitor status: The controller monitors the status of downstream actions.
- Crash recovery: If the task crashes, the action identifier is deterministic, allowing the task to resurrect its state from Union Control Plane.
- Replay: The controller efficiently replays state (even at large scale) to find missing completions and resume monitoring.
Execution flow diagram
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#1f2937', 'primaryTextColor':'#e5e7eb', 'primaryBorderColor':'#6b7280', 'lineColor':'#9ca3af', 'secondaryColor':'#374151', 'tertiaryColor':'#1f2937', 'actorBorder':'#6b7280', 'actorTextColor':'#e5e7eb', 'signalColor':'#9ca3af', 'signalTextColor':'#e5e7eb'}}}%%
sequenceDiagram
participant Client as SDK/Client
participant Control as Control Plane<br/>(Queue Service)
participant Data as Data Plane<br/>(Executor)
participant ObjStore as Object Store
participant Container as Task Container
Client->>Client: Analyze code & discover environments
Client->>Client: Build images (if changed)
Client->>Client: Bundle code
Client->>Control: Upload code bundle
Control->>Data: Store code bundle
Data->>ObjStore: Write code bundle
Client->>Control: CreateRun API with inputs
Control->>Data: Copy inputs
Data->>ObjStore: Write inputs
Control->>Data: Queue task (create action a0)
Data->>Container: Start container
Container->>Data: Request code bundle
Data->>ObjStore: Read code bundle
ObjStore-->>Data: Code bundle
Data-->>Container: Code bundle
Container->>Container: Inflate task
Container->>Data: Request inputs
Data->>ObjStore: Read inputs
ObjStore-->>Data: Inputs
Data-->>Container: Inputs
Container->>Container: Execute task
alt Invokes downstream tasks
Container->>Container: Start controller thread
Container->>Control: Submit downstream tasks
Control->>Data: Queue downstream actions
Container->>Control: Monitor downstream status
Control-->>Container: Status updates
end
Container->>Data: Upload outputs
Data->>ObjStore: Write outputs
Container->>Control: Complete
Control-->>Client: Run complete
Action identifiers and crash recovery
Flyte uses deterministic action identifiers to enable robust crash recovery:
- Consistent identifiers: Action identifiers are consistently computed based on task and invocation context.
- Re-run identical: In any re-run, the action identifier is identical for the same invocation.
- Multiple invocations: Multiple invocations of the same task receive unique identifiers.
- Efficient resurrection: On crash, the
a0action resurrects its state from Union Control Plane efficiently, even at large scale. - Replay and resume: The controller replays execution until it finds missing completions and starts watching them.
Downstream task execution
When downstream tasks are invoked:
- Action creation: Downstream actions are created with unique identifiers.
- Queue assignment: Actions are handed to an executor, which can be selected using a queue or from the general pool.
- Parallel execution: Multiple downstream tasks can execute in parallel.
- Result aggregation: Results are aggregated and returned to the parent task.
Reusable containers
When using reusable containers, the execution model changes:
- Environment spin-up: The container environment is first spun up with configured replicas.
- Task allocation: Tasks are allocated to available replicas in the environment.
- Scaling: If all replicas are busy, new replicas are spun up (up to the configured maximum), or tasks are backlogged in queues.
- Container reuse: The same container handles multiple task executions, reducing startup overhead.
- Lifecycle management: Containers are managed according to
ReusePolicysettings (idle_ttl,scaledown_ttl, etc.).
Reusable container execution flow
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#1f2937', 'primaryTextColor':'#e5e7eb', 'primaryBorderColor':'#6b7280', 'lineColor':'#9ca3af', 'secondaryColor':'#374151', 'tertiaryColor':'#1f2937', 'actorBorder':'#6b7280', 'actorTextColor':'#e5e7eb', 'signalColor':'#9ca3af', 'signalTextColor':'#e5e7eb'}}}%%
sequenceDiagram
participant Control as Queue Service
participant Executor as Executor Service
participant Pool as Container Pool
participant Replica as Container Replica
Control->>Executor: Submit task
alt Reusable containers enabled
Executor->>Pool: Request available replica
alt Replica available
Pool->>Replica: Allocate task
Replica->>Replica: Execute task
Replica->>Pool: Task complete (ready for next)
else No replica available
alt Can scale up
Executor->>Pool: Create new replica
Pool->>Replica: Spin up new container
Replica->>Replica: Execute task
Replica->>Pool: Task complete
else At max replicas
Executor->>Pool: Queue task
Pool-->>Executor: Wait for available replica
Pool->>Replica: Allocate when available
Replica->>Replica: Execute task
Replica->>Pool: Task complete
end
end
else No reusable containers
Executor->>Replica: Create new container
Replica->>Replica: Execute task
Replica->>Executor: Complete & terminate
end
Replica-->>Control: Return results
State replication and visualization
Queue Service to Run Service
- Reliable replication: Queue Service reliably replicates execution state back to Run Service.
- Eventual consistency: The Run Service may be slightly behind the actual execution state.
- Visualization: Run Service paints the entire run onto the UI.
UI limitations
- Current limit: The UI is currently limited to displaying 50k actions per run.
- Future improvements: This limit will be increased in future releases. Contact the Union team if you need higher limits.
Optimization opportunities
Understanding the life of a run reveals several optimization opportunities:
- Reuse Python process: Run
flyte.run()multiple times in the same process to avoid re-bundling code. - Skip bundling: Use
copy_style="none"and bake code into images for faster startup. - Reusable containers: Use reusable containers to eliminate container startup overhead.
- Parallel execution: Invoke multiple downstream tasks concurrently using
flyte.map()orasyncio. - Efficient data flow: Minimize data transfer by using reference types (files, directories) instead of inline data.
- Caching: Enable task caching to avoid redundant computation.
For detailed performance tuning guidance, see Scale your workflows.