Hydra
Hydra is a framework for composing and overriding configuration trees from YAML files, dataclasses and the command line. The flyteplugins-hydra plugin makes Hydra a first-class submission layer for Flyte, so you can compose a config exactly as you would in any other Hydra app and have each composed run executed as a Flyte task, locally or as a remote execution on a Union.ai cluster.
The plugin offers three complementary entry points that share a single launcher implementation:
| Entry point | Use it when |
|---|---|
| `hydra/launcher=flyte` (Hydra Launcher plugin) | You already have a `@hydra.main` script and want standard Hydra CLI ergonomics, including `--multirun` and custom sweepers. |
| `flyte hydra run` (Flyte CLI extension) | You want a Flyte-style CLI that imports a task from a Python file and composes a Hydra config without requiring a `@hydra.main` wrapper. |
| `hydra_run` / `hydra_sweep` (Python SDK) | You want to submit runs directly from Python: notebooks, tests, examples, or another orchestration script. |
All three paths converge on the same FlyteLauncher.
Installation
```shell
pip install flyteplugins-hydra
```

The plugin depends on `flyteplugins-omegaconf`, which is installed automatically and provides the `DictConfig`/`ListConfig` type transformers that allow Hydra-composed configs to flow into Flyte tasks. Both packages must be available in the same environment as `flyte`.
If you call `apply_task_env` for child tasks (see Task environment overrides below), include `flyteplugins-hydra` in the task image as well.
Requirements on tasks
Every task launched through this plugin must accept an OmegaConf DictConfig input. Any other parameters are passed through as ordinary task arguments.
```python
from omegaconf import DictConfig

@env.task
async def pipeline(cfg: DictConfig, dataset: str) -> float:
    ...
```

The plugin auto-detects the `DictConfig` parameter name. If your parameter is `cfg`, app-level overrides are passed through `--cfg` on the CLI; if it is `config`, they are passed through `--config`; and so on.
A walkthrough config
The examples in this page assume a small project layout:
```
project/
├── train.py
└── conf/
    ├── training.yaml
    ├── model/
    │   ├── resnet.yaml
    │   └── vit.yaml
    ├── optimizer/
    │   ├── adam.yaml
    │   └── sgd.yaml
    └── task_env/
        ├── a100.yaml
        └── prebuilt_image.yaml
```

`conf/training.yaml`:
```yaml
defaults:
  - optimizer: adam
  - model: resnet
  - _self_

data:
  path: s3://my-bucket/imagenet
  dataset: imagenet

training:
  epochs: 30
  batch_size: 64
```
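The config-group files are plain Hydra YAML. For instance, `conf/optimizer/adam.yaml` might contain (illustrative values, consistent with the `optimizer.lr` and `optimizer.weight_decay` overrides used throughout this page):

```yaml
name: adam
lr: 0.001
weight_decay: 0.0001
```

`train.py` (abbreviated):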
```python
import flyte
from omegaconf import DictConfig
from flyteplugins.hydra import apply_task_env

env = flyte.TaskEnvironment(name="training", image=...)

@env.task
async def preprocess(cfg: DictConfig) -> flyte.io.Dir: ...

@env.task
async def train_model(cfg: DictConfig, data: flyte.io.Dir) -> tuple[flyte.io.Dir, float]: ...

@env.task
async def pipeline(cfg: DictConfig, dataset: str) -> float:
    data = await preprocess(cfg)
    train_task = apply_task_env(train_model, cfg)
    _, val_loss = await train_task(cfg, data)
    return val_loss
```

The same `pipeline` task is the target of every example below.
`config_path` is resolved relative to the current working directory. If you submit runs from a directory other than `project/`, pass an absolute path (or an absolute path on the CLI via `--config-path /abs/path/to/conf`). For structured-config-only setups (no YAML files), omit `config_path` / `--config-path` entirely.

Execution mode
Remote execution is the default. Every entry point exposes an explicit knob:
| Surface | Local | Remote |
|---|---|---|
| `@hydra.main` launcher | `hydra.launcher.mode=local` | `hydra.launcher.mode=remote` (default) |
| `flyte hydra run` | `--local` | `--mode remote` (default) |
| Python SDK | `mode="local"` | `mode="remote"` (default) |
For the @hydra.main launcher, the default applies as soon as hydra/launcher=flyte is selected.
Remote runs print the Flyte run URL immediately after submission, before any waiting. By default the plugin then waits for every submitted run to reach a terminal phase, capped at 32 worker threads. To tune or disable waiting:
| Surface | Tune wait threads | Fire and forget |
|---|---|---|
| `@hydra.main` launcher | `hydra.launcher.wait_max_workers=64` | `hydra.launcher.wait=false` |
| `flyte hydra run` | `--wait-max-workers 64` | `--no-wait` |
| Python SDK | `wait_max_workers=64` | `wait=False` |
For a sweep, every job is submitted first, and then the plugin waits for all runs concurrently. Submission is not blocked by earlier runs reaching a terminal phase.
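For example, a fire-and-forget sweep from the SDK that only records console URLs; a sketch using the `hydra_sweep` API introduced below, assuming (as documented for `hydra_run`) that `wait=False` yields the underlying `flyte.remote.Run` objects with their console URLs:

```python
from flyteplugins.hydra import hydra_sweep

# Submit all three runs, then return without waiting for completion.
runs = hydra_sweep(
    pipeline,
    config_path="conf",
    config_name="training",
    overrides=["optimizer.lr=0.001,0.01,0.1"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
    wait=False,  # fire and forget
)
for run in runs:
    print(run.url)  # console URL, available at submission time
```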
Hydra launcher (@hydra.main scripts)
Use this path when your script already has a @hydra.main entry point. Selecting hydra/launcher=flyte swaps Hydra’s built-in BasicLauncher for FlyteLauncher.
Single remote run:
```shell
python train.py hydra/launcher=flyte hydra.launcher.mode=remote
```

Single local run:
```shell
python train.py hydra/launcher=flyte hydra.launcher.mode=local
```

Remote grid sweep submission. Each comma-separated value expands into a separate Flyte execution; six executions (3 × 2) in this example:

```shell
python train.py --multirun \
  hydra/launcher=flyte hydra.launcher.mode=remote \
  hydra.launcher.wait_max_workers=64 \
  optimizer.lr=0.001,0.01,0.1 training.epochs=10,20
```

Fire-and-forget sweep submission:
```shell
python train.py --multirun \
  hydra/launcher=flyte hydra.launcher.wait=false \
  optimizer.lr=0.001,0.01,0.1
```

Custom sweepers (Optuna) work exactly as they do with the `BasicLauncher`: selecting `hydra/sweeper=...` activates the sweeper, and `FlyteLauncher` runs each trial as a Flyte execution:
```shell
python train.py --multirun \
  hydra/launcher=flyte hydra.launcher.mode=remote \
  hydra/sweeper=optuna hydra.sweeper.n_trials=20 \
  hydra.sweeper.n_jobs=4 \
  "optimizer.lr=interval(1e-4,1e-1)"
```

Inside `@hydra.main`, the standard pattern is:
```python
import flyte
import hydra
from omegaconf import DictConfig
from flyteplugins.hydra import apply_task_env

@hydra.main(version_base=None, config_path="conf", config_name="training")
def main(cfg: DictConfig):
    flyte.init_from_config()
    entry_task = apply_task_env(pipeline, cfg)
    return flyte.run(entry_task, cfg=cfg, dataset=cfg.data.dataset)

if __name__ == "__main__":
    main()
```

Python SDK
hydra_run composes one config and runs the task once. hydra_sweep expands sweep overrides and runs the task once per combination.
Single run
```python
from flyteplugins.hydra import hydra_run

run = hydra_run(
    pipeline,
    config_path="conf",
    config_name="training",
    overrides=["optimizer.lr=0.01"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
    wait=True,
    wait_max_workers=64,
)
```

For a remote run with `wait=True`, the return value is a wrapper exposing both `run.url` and `run.value` (the resolved task output). The wrapper is `float()`-castable so Hydra sweepers such as Optuna can consume scalar objectives directly. With `wait=False`, the return value is the underlying `flyte.remote.Run`.
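For instance, with the `run` object returned above:

```python
print(run.url)          # Flyte console URL, available right after submission
print(run.value)        # resolved task output (the pipeline's val_loss)
objective = float(run)  # __float__ yields the scalar output for sweepers
```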
Grid sweep
```python
from flyteplugins.hydra import hydra_sweep

runs = hydra_sweep(
    pipeline,
    config_path="conf",
    config_name="training",
    overrides=["optimizer.lr=0.001,0.01,0.1", "training.epochs=10,20"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
)
```

Six executions are submitted (3 × 2). `runs` is a list aligned with the Cartesian-product order Hydra's `BasicSweeper` produces.
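Because the ordering is deterministic, results can be paired back with the override combinations. A minimal sketch, assuming the default `wait=True` (so each element is the result wrapper) and that `itertools.product` with later overrides varying fastest matches the `BasicSweeper` expansion order:

```python
import itertools

lrs = [0.001, 0.01, 0.1]
epochs = [10, 20]
for (lr, ep), run in zip(itertools.product(lrs, epochs), runs):
    print(f"lr={lr} epochs={ep} -> loss={float(run)} ({run.url})")
```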
Custom sweepers
Custom sweeper plugins are activated by passing their selection in overrides:
```python
runs = hydra_sweep(
    pipeline,
    config_path="conf",
    config_name="training",
    overrides=[
        "hydra/sweeper=optuna",
        "hydra.sweeper.n_trials=20",
        "hydra.sweeper.n_jobs=4",
        "optimizer.lr=interval(1e-4,1e-1)",
    ],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
)
```

Whenever an override starts with `hydra/`, the plugin invokes the full Hydra runtime so plugin discovery (sweepers, launchers, callbacks) can run. Pure value overrides on the `hydra.*` namespace (for example `hydra.run.dir=...`) do not need the full runtime and are applied per-job by the launcher directly.
Forwarding flyte.with_runcontext options
Use run_options to pass Flyte runtime options through to every job:
```python
runs = hydra_sweep(
    pipeline,
    config_path="conf",
    config_name="training",
    overrides=["optimizer.lr=0.001,0.01,0.1"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
    run_options={
        "name": "my-training-sweep",
        "service_account": "default",
        "copy_style": "all",
        "raw_data_path": "s3://my-bucket/raw-data",
        "debug": True,
    },
)
```

Flyte CLI (flyte hydra run)
flyte hydra run is registered through the flyte.plugins.cli.commands entry point. It loads a task from a Python file, composes a Hydra config, and runs the task without requiring the script to have its own @hydra.main function. It also inherits the relevant flags from flyte run (--project, --domain, --image, --name, --service-account, --raw-data-path, --copy-style, --debug, --local, --follow).
Single run
Remote (default):
```shell
flyte hydra run --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet
```

Forced local:
```shell
flyte hydra run --local --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet
```

Grid sweep
```shell
flyte hydra run --multirun --config-path conf --config-name training \
  --wait-max-workers 64 \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --cfg "optimizer.lr=0.001,0.01,0.1" --cfg "training.epochs=10,20"
```

App-level vs Hydra-namespace overrides
The CLI keeps app-level overrides separate from Hydra runtime overrides so they do not collide with ordinary Flyte task arguments.
App-level overrides target the composed config and are passed through the task’s DictConfig parameter name. For pipeline(cfg: DictConfig, ...), use --cfg. For pipeline_with_config(config: DictConfig, ...), use --config:
```shell
flyte hydra run --config-path conf --config-name training \
  train.py pipeline \
  --cfg optimizer.lr=0.01 \
  --cfg training.epochs=20

flyte hydra run --config-path conf --config-name training \
  train.py pipeline_with_config \
  --config optimizer.lr=0.01
```

Hydra runtime overrides: anything in the `hydra.*` or `hydra/*` namespace goes through `--hydra-override`:
```shell
flyte hydra run --config-path conf --config-name training \
  train.py pipeline \
  --hydra-override hydra.run.dir=./outputs/exp1 \
  --hydra-override hydra/launcher=flyte
```

Custom sweepers combine the two:
```shell
flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --hydra-override hydra/sweeper=optuna \
  --hydra-override hydra.sweeper.n_trials=20 \
  --hydra-override hydra.sweeper.n_jobs=4 \
  --cfg "optimizer.lr=interval(1e-4,1e-1)" \
  --cfg "training.epochs=choice(10,20,50)"
```

--follow and --no-wait
--follow streams logs from the launched run after submission; it implies waiting and cannot be combined with --no-wait. --no-wait returns immediately after submission and skips log streaming.
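For example:

```shell
# Stream logs until the run reaches a terminal phase (implies waiting):
flyte hydra run --follow --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet

# Submit and return immediately, skipping log streaming:
flyte hydra run --no-wait --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet
```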
Shell completion
Install Click’s completion hook for the flyte executable. For zsh:
eval "$(_FLYTE_COMPLETE=zsh_source flyte)"For bash:
eval "$(_FLYTE_COMPLETE=bash_source flyte)"Once installed, flyte hydra run adds Hydra-aware completion after SCRIPT TASK_NAME. The command imports the script, inspects the task signature, and suggests:
- The app override flag matching the task's `DictConfig` parameter (`--cfg`, `--config`, …).
- Override values for that flag and `--hydra-override` via Hydra's own completion engine, including config keys, config-group selections, and sweep functions.
```shell
flyte hydra run --config-path conf --config-name training \
  train.py pipeline --cfg optimizer.<TAB>
# suggests optimizer.lr=, optimizer.weight_decay=, ...

flyte hydra run --config-path conf --config-name training \
  train.py pipeline --hydra-override hydra/launcher=<TAB>
# suggests hydra launcher choices
```

Because completion has to import the target script, keep task definitions and `ConfigStore` registration import-safe, and avoid expensive top-level work in scripts you reach via `flyte hydra run`.
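An import-safe script keeps registrations at module top level and anything expensive behind the main guard; a minimal sketch:

```python
# train.py: cheap to import, so completion can inspect it safely.
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class RootConf:
    epochs: int = 30


# Top-level registration runs at import time and is therefore visible
# to `flyte hydra run` and shell completion.
ConfigStore.instance().store(name="structured_training", node=RootConf)

if __name__ == "__main__":
    # Expensive work (data downloads, model construction) stays here;
    # it never runs when completion merely imports this module.
    ...
```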
Override grammar
The override grammar is identical to standard Hydra; what differs is only how you pass the strings (positional in python train.py ..., list entries in overrides=[...], repeated --cfg/--hydra-override on the Flyte CLI).
| Form | Meaning |
|---|---|
| `optimizer.lr=0.01` | Set an existing key. |
| `optimizer=sgd` | Select a config group (replaces the `optimizer` subtree with `conf/optimizer/sgd.yaml`). |
| `+task_env=a100` | Append a config group whose key is not currently in the config. |
| `+training.grad_clip=1.0` | Append a key that does not exist. |
| `++optimizer.lr=0.05` | Force-set a key, creating it if missing and overriding strict-schema errors. |
| `~training.warmup_steps` | Delete a key from the composed config. |
| `optimizer.lr=0.001,0.01,0.1` | Sweep value (with `--multirun`); expanded into one job per element. |
| `optimizer.lr=interval(1e-4,1e-1)` | Continuous sweep range; consumed by samplers like Optuna. |
| `optimizer=choice(adam,sgd)` | Categorical sweep; consumed by samplers. |
| `hydra.run.dir=./outputs/exp1` | Hydra-namespace value override (single-run output dir). |
| `hydra.sweep.dir=./outputs/sweep1` | Hydra-namespace sweep output dir. |
| `hydra/sweeper=optuna` | Hydra-namespace config group selection (activates the Optuna sweeper plugin). |
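Several forms can be combined in one invocation; for example (assuming `training.warmup_steps` exists in your composed config):

```shell
flyte hydra run --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --cfg optimizer=sgd \
  --cfg "+task_env=a100" \
  --cfg "++optimizer.lr=0.05" \
  --cfg "~training.warmup_steps"
```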
Sweeps
Grid sweeps (BasicSweeper)
Comma-separated overrides expand into a Cartesian product. The plugin uses Hydra’s BasicSweeper to expand them, then submits one Flyte execution per combination.
```python
from flyteplugins.hydra import hydra_sweep

runs = hydra_sweep(
    pipeline,
    config_path="conf", config_name="training",
    overrides=["model=resnet,vit", "optimizer.lr=0.001,0.01,0.1"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
)  # 6 executions
```

```shell
flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --cfg "model=resnet,vit" --cfg "optimizer.lr=0.001,0.01,0.1"
```

Hardware presets can sweep alongside hyperparameters:
```shell
flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --cfg "+task_env=a10g,a100" --cfg "optimizer.lr=0.001,0.01,0.1"
```

Bayesian / TPE sweeps (Optuna)
Install the sweeper, then activate it via hydra/sweeper=optuna. Continuous parameters use interval(...); categorical parameters use choice(...).
```shell
pip install hydra-optuna-sweeper
```

```shell
flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --hydra-override "hydra/sweeper=optuna" \
  --hydra-override "hydra.sweeper.n_trials=30" \
  --hydra-override "hydra.sweeper.n_jobs=5" \
  --cfg "optimizer.lr=interval(1e-4,1e-1)" \
  --cfg "optimizer.weight_decay=interval(1e-6,1e-2)" \
  --cfg "model=choice(resnet,vit)"
```

When `wait=True`, each remote run's wrapped result exposes the task output as a float (via `__float__`), so Optuna can use it directly as the trial objective. With `wait=False`, the sweeper sees the run URL but cannot read objective values; use this only for fire-and-forget submission.
Other sweepers that respect Hydra’s plugin protocol are activated the same way: install the package, select hydra/sweeper=<name>, and set the sweeper’s parameters under hydra.sweeper.*.
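For example, the Nevergrad sweeper follows the same pattern (illustrative; the `optim.budget` parameter name belongs to hydra-nevergrad-sweeper, not this plugin):

```shell
pip install hydra-nevergrad-sweeper

flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --hydra-override "hydra/sweeper=nevergrad" \
  --hydra-override "hydra.sweeper.optim.budget=30" \
  --cfg "optimizer.lr=interval(1e-4,1e-1)"
```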
Sweep output directories
Hydra-namespace overrides redirect where Hydra writes per-job logs and config snapshots:
```shell
flyte hydra run --multirun --config-path conf --config-name training \
  train.py pipeline --dataset s3://my-bucket/imagenet \
  --hydra-override "hydra.sweep.dir=./outputs/sweep1" \
  --hydra-override "hydra.sweep.subdir=\${hydra.job.num}" \
  --cfg "optimizer.lr=0.001,0.01,0.1"
```

Task environment overrides
Hydra is good at composing flat YAML; Flyte tasks need richer settings such as resources and container images. By default the plugin reserves a config key named `task_env` that maps task names to `task.override` kwargs.
```yaml
task_env:
  pipeline:
    resources:
      cpu: "2"
      memory: 8Gi
  train_model:
    resources:
      cpu: "16"
      memory: 64Gi
      gpu: "A100:1"
```

When the plugin launches a task, it looks up `task_env[<entry-task-name>]` (`pipeline` in this example) and applies the values via `task.override(...)`. Resource mappings are converted into `flyte.Resources(**values)` automatically.
Prebuilt images
To run a task in a prebuilt container image, set image (and optionally primary_container_name):
```yaml
task_env:
  pipeline:
    image: ghcr.io/acme/flyte-training:latest
    primary_container_name: main
    resources:
      cpu: "4"
      memory: 16Gi
```

`task.override` does not accept `image` directly; the task image is part of the task definition. Instead, the plugin lowers the override to a `flyte.PodTemplate` whose primary container uses the requested image (see the sketch after the list below):
- If the task has no inline pod template, a new one is created.
- If the task already has an inline `flyte.PodTemplate`, the plugin deep-copies it and sets only the image on the primary container.
- If the task references a pod template by name (a string), the plugin raises an error. You must patch a string-named template by editing it in cluster config rather than at submission time.
Applying overrides to child tasks
The launcher only controls the entry task it submits. Child tasks called from within the entry task are not patched automatically. Use apply_task_env to apply the same resources/image handling to a child task before invoking it:
```python
from flyteplugins.hydra import apply_task_env

@env.task
async def pipeline(cfg: DictConfig, dataset: str) -> float:
    data = await preprocess(cfg)
    train_task = apply_task_env(train_model, cfg)
    _, val_loss = await train_task(cfg, data)
    return val_loss
```

This keeps the override knobs in YAML/CLI surfaces while leaving each task in control of which children it patches.
Renaming the task-env key
If your config uses a different name for the task-env subtree, pass it explicitly:
```python
hydra_run(..., task_env_key="task_environment")
```

```shell
flyte hydra run --task-env-key task_environment ...
```

What task_env should not model
The YAML schema intentionally omits the full Kubernetes V1PodSpec. Keep advanced pod configuration (volumes, init containers, node selectors, etc.) in Python task/environment code where you have a real type. Use Hydra task_env presets for the common knobs only: image, primary container name and resources.
Structured configs (without YAML)
Structured configs work with this plugin as long as they are registered before the launcher composes the config. flyte hydra run imports the script first, so top-level ConfigStore.instance().store(...) calls run before composition.
```python
from dataclasses import dataclass, field

from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig

@dataclass
class TrainingConf:
    epochs: int = 30
    batch_size: int = 64

@dataclass
class RootConf:
    training: TrainingConf = field(default_factory=TrainingConf)

ConfigStore.instance().store(name="structured_training", node=RootConf)
```

Run a fully structured config without YAML:
```shell
flyte hydra run --config-name structured_training \
  train.py pipeline --dataset s3://my-bucket/imagenet
```

The same config also works through `@hydra.main`:

```shell
python train.py --config-name structured_training
```

If the structured config still references YAML config groups, keep `--config-path conf`. If everything is registered in `ConfigStore`, omit `--config-path`.
Do not place `ConfigStore` registrations under `if __name__ == "__main__":` or inside the `@hydra.main` function body. `flyte hydra run` and shell completion inspect the script at import time, before either of those blocks runs, and registrations placed there will not be visible.

Structured configs sweep just like YAML configs:
```python
runs = hydra_sweep(
    pipeline,
    config_path=None,
    config_name="structured_training",
    overrides=["training.epochs=10,20", "training.batch_size=32,64"],
    dataset="s3://my-bucket/imagenet",
    mode="remote",
)
```