2.0.9

Scaling

Package: flyte.app

Controls replica count and autoscaling behavior for app environments.

Common scaling patterns:

  • Scale-to-zero (default): Scaling(replicas=(0, 1)) — no replicas when idle, scales to 1 on demand.
  • Always-on: Scaling(replicas=(1, 1)) — exactly 1 replica at all times.
  • Burstable: Scaling(replicas=(1, 5)) — 1 replica minimum, scales up to 5.
  • High-availability: Scaling(replicas=(2, 10)) — at least 2 replicas always running.
  • Fixed size: Scaling(replicas=3) — exactly 3 replicas.
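
The patterns above all reduce to a (min, max) pair. The sketch below is a pure-Python illustration of that reading and does not import flyte. The helper `replica_bounds` is hypothetical and not part of flyte.app, and the range validation it performs is an assumption, not documented behavior.

```python
from typing import Tuple, Union

# Same shape as the documented `replicas` parameter.
ReplicaSpec = Union[int, Tuple[int, int]]


def replica_bounds(replicas: ReplicaSpec = (0, 1)) -> Tuple[int, int]:
    """Return (min, max) for a replicas value.

    Hypothetical helper for illustration only; not part of flyte.app.
    """
    if isinstance(replicas, int):
        # A bare int means a fixed size: min and max are equal.
        lo = hi = replicas
    else:
        lo, hi = replicas
    # Assumed sanity check; the real class may validate differently.
    if lo < 0 or hi < lo:
        raise ValueError(f"invalid replica range: ({lo}, {hi})")
    return lo, hi


# The five patterns listed above, keyed by name.
patterns = {
    "scale-to-zero": (0, 1),
    "always-on": (1, 1),
    "burstable": (1, 5),
    "high-availability": (2, 10),
    "fixed size": 3,
}

for name, spec in patterns.items():
    lo, hi = replica_bounds(spec)
    idle = "scales to zero when idle" if lo == 0 else f"keeps at least {lo} running"
    print(f"{name}: min={lo}, max={hi}, {idle}")
```

With the real class, the same values would be passed directly, for example `Scaling(replicas=(1, 5))`.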

Parameters

class Scaling(
    replicas: typing.Union[int, typing.Tuple[int, int]],
    metric: typing.Union[flyte.app._types.Scaling.Concurrency, flyte.app._types.Scaling.RequestRate, NoneType],
    scaledown_after: int | datetime.timedelta | None,
)
Parameter: replicas
Type: typing.Union[int, typing.Tuple[int, int]]
Description: Number of replicas. An int for a fixed count, or a (min, max) tuple for autoscaling. Default (0, 1).

Parameter: metric
Type: typing.Union[flyte.app._types.Scaling.Concurrency, flyte.app._types.Scaling.RequestRate, NoneType]
Description: Autoscaling metric. Either Scaling.Concurrency(val) (scale when concurrent requests per replica exceed val) or Scaling.RequestRate(val) (scale when requests per second per replica exceed val). Default None.

Parameter: scaledown_after
Type: int | datetime.timedelta | None
Description: Time to wait after the last request before scaling down, given as seconds (int) or as a timedelta. Default None (platform default).
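
To make the metric and scaledown_after semantics concrete, here is a simplified pure-Python model. It is a sketch under stated assumptions, not the platform's autoscaler: `scaledown_seconds` and `desired_replicas` are hypothetical names, and the rule "smallest replica count that keeps per-replica load at or below the target, clamped to (min, max)" is one plausible reading of the descriptions above.

```python
import math
from datetime import timedelta
from typing import Optional, Union


def scaledown_seconds(value: Union[int, timedelta, None]) -> Optional[float]:
    """Normalize a scaledown_after value to seconds.

    Hypothetical helper. None is passed through, meaning "use the
    platform default", whose actual value is not stated in this page.
    """
    if value is None:
        return None
    if isinstance(value, timedelta):
        return value.total_seconds()
    # A bare int is interpreted as seconds.
    return float(value)


def desired_replicas(load: float, target: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Assumed scaling rule, for illustration only.

    `load` is total concurrent requests (Concurrency) or total requests
    per second (RequestRate); `target` is the per-replica threshold.
    """
    # Smallest replica count that keeps load per replica <= target.
    wanted = math.ceil(load / target) if load > 0 else 0
    # Clamp to the configured (min, max) range.
    return max(min_replicas, min(max_replicas, wanted))


# Burstable app (1 to 5 replicas) with a per-replica target of 5.
for load in (0, 4, 12, 100):
    print(load, "->", desired_replicas(load, target=5, min_replicas=1, max_replicas=5))

print(scaledown_seconds(300), scaledown_seconds(timedelta(minutes=5)))
```

Under these assumptions, the matching configuration would look like `Scaling(replicas=(1, 5), metric=Scaling.Concurrency(5), scaledown_after=300)`.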

Methods

Method Description
get_replicas()

get_replicas()

def get_replicas()