Multiple environments

In many applications, different tasks within your workflow may require different configurations. Flyte enables you to manage this complexity by allowing multiple environments within a single workflow.

Multiple environments are useful when:

  • Different tasks in your workflow need different dependencies.
  • Some tasks require specific CPU/GPU or memory configurations.
  • A task requires a secret that other tasks do not (and you want to limit exposure of the secret value).
  • You’re integrating specialized tools that have conflicting requirements.

Constraints on multiple environments

To use multiple environments in your workflow you define multiple TaskEnvironment instances, each with its own configuration, and then assign tasks to their respective environments.

There are, however, two additional constraints that you must take into account. If task_1 in environment env_1 calls task_2 in environment env_2, then:

  1. env_1 must declare a deployment-time dependency on env_2 in the depends_on parameter of the TaskEnvironment that defines env_1.
  2. The image used in the TaskEnvironment of env_1 must include all dependencies of the module containing task_2 (unless task_2 is invoked as a remote task).

Task depends_on constraints

The depends_on parameter of TaskEnvironment declares deployment-time dependencies by establishing a relationship between one TaskEnvironment and another. The system uses this information to determine which environments (and, specifically, which images) must be built to run the code.

On flyte run (or flyte deploy), the system walks the tree defined by the depends_on relationships, starting with the environment of the task being invoked (or the environment being deployed, in the case of flyte deploy), and prepares each required environment. Most importantly, it ensures that the container images needed for all required environments are available, building any that are missing.

This deploy-time determination of what to build is important because it means that for any given run or deploy, only those environments that are actually required are built. The alternative strategy of building all environments defined in the set of deployed code can lead to unnecessary and expensive builds, especially when iterating on code.
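
The traversal described above can be pictured with a minimal sketch that does not use Flyte at all. The Env dataclass and the required_environments function are hypothetical stand-ins introduced only for illustration; they are not part of the Flyte SDK and make no claim about how Flyte implements the walk. The sketch shows how following depends_on from a starting environment selects only the environments that are reachable, leaving unrelated ones untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Env:
    """Hypothetical stand-in for a task environment (not the Flyte class)."""

    name: str
    depends_on: list["Env"] = field(default_factory=list)


def required_environments(root: Env) -> list[str]:
    """Return the names of all environments reachable from root, root first."""
    ordered: list[str] = []
    seen: set[str] = set()
    stack = [root]
    while stack:
        env = stack.pop()
        if env.name in seen:
            continue
        seen.add(env.name)
        ordered.append(env.name)
        # Push in reverse so dependencies are visited in declaration order.
        stack.extend(reversed(env.depends_on))
    return ordered


# Illustrative environments; unrelated_env is defined but never depended on.
msa_env = Env("msa_env")
fold_env = Env("fold_env")
unrelated_env = Env("unrelated_env")
main_env = Env("multi_env", depends_on=[fold_env, msa_env])

print(required_environments(main_env))
```

In this sketch, unrelated_env never appears in the result because nothing reachable from multi_env depends on it, which mirrors the point that only the environments actually required are built.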

Dependency inclusion constraints

When a parent task invokes a child task in a different environment, the container image of the parent task environment must include all dependencies used by the child task. This is necessary because of the way task invocation works in Flyte:

  • When a child task is invoked by function name, that function must be imported into the parent task's Python environment.
  • Importing the function also imports every dependency of the module that defines the child task.
  • Nonetheless, the child task itself executes in its own environment.
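
The import behavior in the list above is ordinary Python and can be reproduced without Flyte. The following is a minimal sketch under stated assumptions: the module child_module, the function child_task, and the package heavy_child_only_dependency are all made-up names, and the package is assumed not to be installed. It shows that the failure happens when the parent imports the child module, before the child function is ever called.

```python
import importlib
import pathlib
import sys
import tempfile

# Source of a hypothetical child-task module. The package name
# "heavy_child_only_dependency" is made up and assumed not to be installed.
CHILD_MODULE_SOURCE = (
    "import heavy_child_only_dependency\n"
    "\n"
    "def child_task(x):\n"
    "    return heavy_child_only_dependency.process(x)\n"
)


def try_import_child() -> str:
    """Write the child module to a temp dir, try to import it, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / "child_module.py").write_text(CHILD_MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("child_module")
            return "imported"
        except ModuleNotFoundError as exc:
            # The parent fails here, at import time, not when child_task runs.
            return f"missing dependency: {exc.name}"
        finally:
            # Undo the path change and drop any partially imported module.
            sys.path.remove(tmp)
            sys.modules.pop("child_module", None)


print(try_import_child())
```

A parent image that lacks the child's packages would hit the same kind of import error, which is why the parent image must carry them even though the child runs elsewhere.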

To avoid this requirement, you can invoke the task in the other environment as a remote task.

Example

The following example is a (very) simple mock of an AlphaFold2 pipeline. It demonstrates a workflow with three tasks, each in its own environment.

The example project looks like this:

├── msa/
│   ├── __init__.py
│   └── run.py
├── fold/
│   ├── __init__.py
│   └── run.py
├── __init__.py
└── main.py

(The source code for this example is available in the AlphaFold2 mock example.)

In file msa/run.py we define the task run_msa, which mocks the multiple sequence alignment step of the process:

import flyte
from flyte.io import File

MSA_PACKAGES = ["pytest"]

msa_image = flyte.Image.from_debian_base().with_pip_packages(*MSA_PACKAGES)

msa_env = flyte.TaskEnvironment(name="msa_env", image=msa_image)


@msa_env.task
def run_msa(x: str) -> File:
    f = File.new_remote()
    with f.open_sync("w") as fp:
        fp.write(x)
    return f

  • A dedicated image (msa_image) is built using the MSA_PACKAGES dependency list, on top of the standard base image.
  • A dedicated environment (msa_env) is defined for the task, using msa_image.
  • The task is defined within the context of the msa_env environment.

In file fold/run.py we define the task run_fold, which mocks the fold step of the process:

import flyte
from flyte.io import File

FOLD_PACKAGES = ["ruff"]

fold_image = flyte.Image.from_debian_base().with_pip_packages(*FOLD_PACKAGES)

fold_env = flyte.TaskEnvironment(name="fold_env", image=fold_image)


@fold_env.task
def run_fold(sequence: str, msa: File) -> list[str]:
    with msa.open_sync("r") as f:
        msa_content = f.read()
    return [msa_content, sequence]

  • A dedicated image (fold_image) is built using the FOLD_PACKAGES dependency list, on top of the standard base image.
  • A dedicated environment (fold_env) is defined for the task, using fold_image.
  • The task is defined within the context of the fold_env environment.

Finally, in file main.py we define the task main that ties everything together into a workflow.

We import the required modules and functions:

import logging
import pathlib

from fold.run import fold_env, fold_image, run_fold
from msa.run import msa_env, MSA_PACKAGES, run_msa

import flyte

Notice that we import:

  • The task functions that we will be calling: run_fold and run_msa.
  • The environments of those tasks: fold_env and msa_env.
  • The dependency list of the run_msa task: MSA_PACKAGES.
  • The image of the run_fold task: fold_image.

We then assemble the image and the environment:

main_image = fold_image.with_pip_packages(*MSA_PACKAGES)

env = flyte.TaskEnvironment(
    name="multi_env",
    depends_on=[fold_env, msa_env],
    image=main_image,
)

The image for the main task (main_image) is built by starting with fold_image (the image for the run_fold task) and adding MSA_PACKAGES (the dependency list for the run_msa task). This ensures that main_image includes all dependencies needed by both the run_fold and run_msa tasks.
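
The same requirement can also be viewed as a union of dependency lists. The helper below is a hypothetical sketch, not a Flyte API: combined_packages is an illustrative name, and the two package lists are copied from the example modules. It merges the lists while dropping duplicates, which is the set of packages the parent image has to cover.

```python
def combined_packages(*package_lists: list[str]) -> list[str]:
    """Merge package lists, dropping duplicates but keeping first-seen order."""
    merged: dict[str, None] = {}
    for packages in package_lists:
        for package in packages:
            merged.setdefault(package, None)
    return list(merged)


# Values copied from msa/run.py and fold/run.py in the example.
MSA_PACKAGES = ["pytest"]
FOLD_PACKAGES = ["ruff"]

parent_packages = combined_packages(FOLD_PACKAGES, MSA_PACKAGES)
print(parent_packages)
```

The example itself layers MSA_PACKAGES onto fold_image instead of merging lists, which has the advantage of reusing anything else that fold_image already contains.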

The environment for the main task is defined with:

  • The image main_image. This ensures that the main task has all the dependencies it needs.
  • A depends_on list that includes both fold_env and msa_env. This establishes the deploy-time dependencies on those environments.

Finally, we define the main task itself:

@env.task
def main(sequence: str) -> list[str]:
    """Given a sequence, outputs files containing the protein structure.

    This requires model weights, GPUs, and a large database on AWS FSx for Lustre.
    """
    print(f"Running AlphaFold2 for sequence: {sequence}")
    msa = run_msa(sequence)
    print(f"MSA result: {msa}, passing to fold task")
    results = run_fold(sequence, msa)
    print(f"Fold results: {results}")
    return results

Here we call, in turn, the run_msa and run_fold tasks. Since we call them directly rather than as remote tasks, we had to ensure that main_image includes all dependencies needed by both tasks.

The final piece of the puzzle is the if __name__ == "__main__": block that allows us to run the main task on the configured Flyte backend:

if __name__ == "__main__":
    flyte.init_from_config(root_dir=pathlib.Path(__file__).parent, log_level=logging.INFO)
    r = flyte.run(main, "AAGGTTCCAA")
    print(r.url)

Now you can run the workflow with:

python main.py