2.0.9

ReusePolicy

Package: flyte

Configure a task environment for container reuse across multiple task invocations.

When environment creation is expensive relative to task runtime, reusable containers keep a pool of warm containers ready, avoiding cold-start overhead. The Python process may be reused by subsequent task invocations.

Total concurrent capacity is max_replicas * concurrency. For example, ReusePolicy(replicas=(1, 3), concurrency=2) supports up to 6 concurrent tasks.

Caution: The environment is shared across invocations — manage memory and resources carefully.

Example:

env = flyte.TaskEnvironment(
    name="fast_env",
    reusable=flyte.ReusePolicy(replicas=(1, 3), concurrency=2),
)

Parameters

class ReusePolicy(
    replicas: typing.Union[int, typing.Tuple[int, int]],
    idle_ttl: typing.Union[int, datetime.timedelta],
    concurrency: int,
    scaledown_ttl: typing.Union[int, datetime.timedelta],
)
Parameter Type Description
replicas typing.Union[int, typing.Tuple[int, int]] Number of container replicas to maintain. - int: Fixed replica count, always running (e.g., replicas=3). - tuple(min, max): Auto-scaling range (e.g., replicas=(1, 5)). Scales between min and max based on demand. Default is 2. A minimum of 2 replicas is recommended to avoid starvation when the parent task occupies one replica.
idle_ttl typing.Union[int, datetime.timedelta] Environment-level idle timeout — shuts down all replicas when the entire environment has been idle for this duration. Specified as seconds (int) or timedelta. Minimum 30 seconds. Default is 30 seconds.
concurrency int Maximum concurrent tasks per replica. Values greater than 1 are only supported for async tasks. Default is 1.
scaledown_ttl typing.Union[int, datetime.timedelta] Per-replica scale-down delay — minimum time to wait before removing an individual idle replica. Prevents rapid scale-down when tasks arrive in bursts. Specified as seconds (int) or timedelta. Default is 30 seconds. Note the distinction: idle_ttl controls when the whole environment shuts down; scaledown_ttl controls when individual replicas are removed during auto-scaling.

Properties

Property Type Description
max_replicas None Returns the maximum number of replicas.
min_replicas None Returns the minimum number of replicas.

Methods

Method Description
get_scaledown_ttl() Returns the scaledown TTL as a timedelta.

get_scaledown_ttl()

def get_scaledown_ttl()

Returns the scaledown TTL as a timedelta. If scaledown_ttl is not set, returns None.