Multiple environments
In many applications, different tasks within your workflow may require different configurations. Flyte enables you to manage this complexity by allowing multiple environments within a single workflow.
Multiple environments are useful when:
- Different tasks in your workflow need different dependencies.
- Some tasks require specific CPU/GPU or memory configurations.
- A task requires a secret that other tasks do not (and you want to limit exposure of the secret value).
- You’re integrating specialized tools that have conflicting requirements.
Constraints on multiple environments
To use multiple environments in your workflow, you define multiple TaskEnvironment instances, each with its own configuration, and then assign tasks to their respective environments.
There are, however, two additional constraints that you must take into account.
If task_1 in environment env_1 calls task_2 in environment env_2, then:
- env_1 must declare a deployment-time dependency on env_2 in the depends_on parameter of the TaskEnvironment that defines env_1.
- The image used in the TaskEnvironment of env_1 must include all dependencies of the module containing task_2 (unless task_2 is invoked as a remote task).
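The two constraints can be sketched as a small check in plain Python. This is only an illustration: EnvSpec and check_call are hypothetical names that model an environment as a name, a set of packages, and a list of declared dependencies. They are not part of the Flyte API.

```python
from dataclasses import dataclass, field


@dataclass
class EnvSpec:
    """Hypothetical stand-in for a task environment (not the Flyte API)."""
    name: str
    packages: set[str] = field(default_factory=set)      # packages in the image
    depends_on: list[str] = field(default_factory=list)  # declared dependencies


def check_call(caller: EnvSpec, callee: EnvSpec, remote: bool = False) -> list[str]:
    """Return the constraints violated when a task in `caller` calls one in `callee`."""
    problems = []
    # Constraint 1: the caller must declare the callee in depends_on.
    if callee.name not in caller.depends_on:
        problems.append(f"{caller.name} must list {callee.name} in depends_on")
    # Constraint 2: the caller's image must contain the callee's dependencies,
    # unless the callee is invoked as a remote task.
    missing = callee.packages - caller.packages
    if not remote and missing:
        problems.append(f"{caller.name} image is missing: {sorted(missing)}")
    return problems


env_2 = EnvSpec("env_2", packages={"numpy"})
env_1_bad = EnvSpec("env_1", packages={"requests"})
env_1_ok = EnvSpec("env_1", packages={"requests", "numpy"}, depends_on=["env_2"])

print(check_call(env_1_bad, env_2))  # reports both violations
print(check_call(env_1_ok, env_2))   # no violations
```

In this model, env_1_ok satisfies both constraints because it lists env_2 in depends_on and its package set is a superset of the packages of env_2.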
Task depends_on constraints
The depends_on parameter in TaskEnvironment is used to provide deployment-time dependencies by establishing a relationship between one TaskEnvironment and another.
The system uses this information to determine which environments (and, specifically, which images) need to be built in order to run the code.
On flyte run (or flyte deploy), the system walks the tree defined by the depends_on relationships, starting with the environment of the task being invoked (or the environment being deployed, in the case of flyte deploy), and prepares each required environment.
Most importantly, it ensures that the container images needed for all required environments are available (and if not, it builds them).
This deploy-time determination of what to build is important because it means that for any given run or deploy, only those environments that are actually required are built.
The alternative strategy of building all environments defined in the set of deployed code can lead to unnecessary and expensive builds, especially when iterating on code.
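The walk described above can be sketched in plain Python as a traversal over depends_on links. EnvNode and required_envs are hypothetical names used only for illustration, and the sketch makes no claim about how Flyte implements the walk internally.

```python
from dataclasses import dataclass, field


@dataclass
class EnvNode:
    """Hypothetical model of an environment and its depends_on list."""
    name: str
    depends_on: list["EnvNode"] = field(default_factory=list)


def required_envs(root: EnvNode) -> list[str]:
    """Collect the names of every environment reachable from `root`."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [root]
    while stack:
        env = stack.pop()
        if env.name in seen:
            continue  # a shared dependency is only collected once
        seen.add(env.name)
        order.append(env.name)
        stack.extend(env.depends_on)
    return order


env_2 = EnvNode("env_2")
env_1 = EnvNode("env_1", depends_on=[env_2])
env_3 = EnvNode("env_3")  # defined in the same code base, but unrelated to env_1

# Starting from env_1, only env_1 and env_2 are collected; env_3 is never visited.
print(required_envs(env_1))
```

Because the traversal starts at the environment being run or deployed, an environment such as env_3 that is not reachable through depends_on is never prepared.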
Dependency inclusion constraints
When a parent task invokes a child task in a different environment, the container image of the parent task environment must include all dependencies used by the child task. This is necessary because of the way task invocation works in Flyte:
- When a child task is invoked by function name, that function must be imported into the parent task's Python environment.
- This results in all the dependencies of the child task function also being imported.
- Nonetheless, the actual execution of the child task occurs in its own environment.
To avoid this requirement, you can invoke a task in another environment remotely.
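The import behavior behind this requirement is ordinary Python and can be shown without Flyte. In the sketch below, child_tasks.py and run_child are hypothetical names, and the standard-library module colorsys stands in for a dependency that only the child task needs.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a hypothetical child-task module whose top level imports a dependency.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "child_tasks.py").write_text(textwrap.dedent("""
    import colorsys  # stands in for a child-only dependency

    def run_child(x):
        return colorsys.rgb_to_hsv(x, x, x)
"""))
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# The parent only asks for the function, but Python executes the whole module,
# including its top-level imports.
from child_tasks import run_child

print("colorsys" in sys.modules)  # True
```

If the top-level import in the child module named a package that is missing from the parent's image, the `from child_tasks import run_child` line would fail with an ImportError before any task ran.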
Example
The following example is a (very) simple mock of an AlphaFold2 pipeline. It demonstrates a workflow with three tasks, each in its own environment.
The example project looks like this:
├── msa/
│   ├── __init__.py
│   └── run.py
├── fold/
│   ├── __init__.py
│   └── run.py
├── __init__.py
└── main.py
(The source code for this example can be found here: AlphaFold2 mock example)
In file msa/run.py we define the task run_msa, which mocks the multiple sequence alignment step of the process:
import flyte
from flyte.io import File

MSA_PACKAGES = ["pytest"]

msa_image = flyte.Image.from_debian_base().with_pip_packages(*MSA_PACKAGES)

msa_env = flyte.TaskEnvironment(name="msa_env", image=msa_image)


@msa_env.task
def run_msa(x: str) -> File:
    f = File.new_remote()
    with f.open_sync("w") as fp:
        fp.write(x)
    return f
- A dedicated image (msa_image) is built using the MSA_PACKAGES dependency list, on top of the standard base image.
- A dedicated environment (msa_env) is defined for the task, using msa_image.
- The task is defined within the context of the msa_env environment.
In file fold/run.py we define the task run_fold, which mocks the fold step of the process:
import flyte
from flyte.io import File

FOLD_PACKAGES = ["ruff"]

fold_image = flyte.Image.from_debian_base().with_pip_packages(*FOLD_PACKAGES)

fold_env = flyte.TaskEnvironment(name="fold_env", image=fold_image)


@fold_env.task
def run_fold(sequence: str, msa: File) -> list[str]:
    with msa.open_sync("r") as f:
        msa_content = f.read()
    return [msa_content, sequence]
- A dedicated image (fold_image) is built using the FOLD_PACKAGES dependency list, on top of the standard base image.
- A dedicated environment (fold_env) is defined for the task, using fold_image.
- The task is defined within the context of the fold_env environment.
Finally, in file main.py we define the task main that ties everything together into a workflow.
We import the required modules and functions:
import logging
import pathlib
from fold.run import fold_env, fold_image, run_fold
from msa.run import msa_env, MSA_PACKAGES, run_msa
import flyte
Notice that we import:
- The task functions that we will be calling: run_fold and run_msa.
- The environments of those tasks: fold_env and msa_env.
- The dependency list of the run_msa task: MSA_PACKAGES.
- The image of the run_fold task: fold_image.
We then assemble the image and the environment:
main_image = fold_image.with_pip_packages(*MSA_PACKAGES)

env = flyte.TaskEnvironment(
    name="multi_env",
    depends_on=[fold_env, msa_env],
    image=main_image,
)
The image for the main task (main_image) is built by starting with fold_image (the image for the run_fold task) and adding MSA_PACKAGES (the dependency list for the run_msa task).
This ensures that main_image includes all dependencies needed by both the run_fold and run_msa tasks.
The environment for the main task is defined with:
- The image main_image. This ensures that the main task has all the dependencies it needs.
- A depends_on list that includes both fold_env and msa_env. This establishes the deploy-time dependencies on those environments.
Finally, we define the main task itself:
@env.task
def main(sequence: str) -> list[str]:
    """Given a sequence, outputs files containing the protein structure

    This requires model weights + gpus + large database on aws fsx lustre
    """
    print(f"Running AlphaFold2 for sequence: {sequence}")
    msa = run_msa(sequence)
    print(f"MSA result: {msa}, passing to fold task")
    results = run_fold(sequence, msa)
    print(f"Fold results: {results}")
    return results
Here we call, in turn, the run_msa and run_fold tasks.
Since we call them directly rather than as remote tasks, we had to ensure that main_image includes all dependencies needed by both tasks.
The final piece of the puzzle is the if __name__ == "__main__": block that allows us to run the main task on the configured Flyte backend:
if __name__ == "__main__":
    flyte.init_from_config(root_dir=pathlib.Path(__file__).parent, log_level=logging.INFO)
    r = flyte.run(main, "AAGGTTCCAA")
    print(r.url)
Now you can run the workflow with:
python main.py