=== PAGE: https://www.union.ai/docs/v1/selfmanaged ===

# Documentation

Welcome to the documentation.

## Subpages

- **Union.ai Self-managed**
- **Tutorials**
- **Integrations**
- **Reference**
- **Community**
- **Architecture**
- **Platform deployment**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide ===

# Union.ai Self-managed

Union.ai empowers AI development teams to rapidly ship high-quality code to production by offering optimized performance, unparalleled resource efficiency, and a delightful workflow authoring experience.

With Union.ai your team can:

* Run complex AI workloads with performance, scale, and efficiency.
* Achieve millisecond-level execution times with reusable containers.
* Scale out to multiple regions, clusters, and clouds as needed for resource availability, scale, or compliance.

> [!NOTE]
> Union.ai is built on top of the leading open-source workflow orchestrator, [Flyte](/docs/v1/flyte/).
>
> Union.ai Self-managed provides **all the features of Flyte, plus much more**
> while letting you keep your data and workflow code on your infrastructure and under your own management.
>
> You can switch to another product version with the selector above.

### 💡 **Introduction**

Union.ai builds on the leading open-source workflow orchestrator, Flyte, to provide a powerful, scalable, and flexible platform for AI applications.

### 🔢 **Getting started**

Build your first Union.ai workflow, exploring the major features of the platform along the way.

### 🔗 **Core concepts**

Understand the core concepts of the Union.ai platform.

### 🔗 **Development cycle**

Explore the Union.ai development cycle from experimentation to production.

### 🔗 **Data input/output**

Manage the input and output of data in your Union.ai workflow.

### 🔗 **Programming**

Learn about Union.ai-specific programming constructs.

### 🔗 **Administration**

Union.ai Self-managed administrators can manage users, projects, and resources.

### 🔗 [Integrations](integrations)

Union.ai Self-managed integrates with your cloud resources and external services.

### 🔗 **FAQ**

Frequently asked questions.

## Subpages

- **Introduction**
- **Getting started**
- **Core concepts**
- **Development cycle**
- **Data input/output**
- **Administration**
- **Programming**
- **FAQ**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/introduction ===

# Introduction

Union.ai unifies your AI development on a single end-to-end platform, bringing together data, models, compute, and workflow execution in a single pane of glass.

Union.ai builds on [Flyte](https://flyte.org), the open-source standard for orchestrating AI workflows. It offers all the features of Flyte while adding more capability to scale, control costs, and serve models.

There are three deployment options for Union.ai: **Serverless**, **BYOC** (Bring Your Own Cloud), and **Self-managed**.

## Flyte

Flyte provides the building blocks needed for an end-to-end AI platform:

* Reusable, immutable tasks and workflows
* Declarative task-level resource provisioning
* GitOps-style versioning and branching
* Strongly-typed interfaces between tasks, enabling more reliable code
* Caching, intra-task checkpointing, and spot instance provisioning
* Task parallelism with *map tasks*
* Dynamic workflows created at runtime for process flexibility

Flyte is open source and free to use. You can switch to the Flyte docs [here](/docs/v1/flyte/).

You can try out Flyte's technology:

* In the cloud with [Union.ai Serverless](https://signup.union.ai).
* On your machine in a local cluster (see **Development cycle > Running in a local cluster**).

For production use, see **Platform deployment**.
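Several of the building blocks listed above, such as declarative task-level resource provisioning and caching, are expressed directly as parameters of the task decorator. The snippet below is a minimal, hypothetical sketch of that idea; it assumes that `union.Resources`, `cache`, and `cache_version` are available on `@union.task`, as they are in the underlying Flytekit SDK:

```python
import union


# Hypothetical task illustrating declarative resource provisioning and caching.
# `union.Resources`, `cache`, and `cache_version` are assumed to behave as in Flytekit.
@union.task(
    requests=union.Resources(cpu="2", mem="2Gi"),  # resources requested for this task's pod
    cache=True,                                    # reuse previously computed outputs
    cache_version="1.0",                           # bump this string to invalidate the cache
)
def preprocess(values: list[int]) -> list[int]:
    return [v * 2 for v in values]


@union.workflow
def pipeline(values: list[int]) -> list[int]:
    return preprocess(values=values)
```

The same declarative style is used for other task-level settings, such as interruptible (spot) instances, covered later in this guide.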
## Union.ai Serverless

[Union.ai Serverless](/docs/v1/serverless/) is a turn-key solution that provides a fully managed cloud environment for running your workflows. There is zero infrastructure to manage, and you pay only for the resources you use. Your data and workflow code are stored safely and securely in Union.ai's cloud infrastructure.

Union.ai Serverless provides:

* **All the features of Flyte**
* Granular, task-level resource monitoring
* Fine-grained role-based access control (RBAC)
* Faster performance:
  * Launch plan caching: Cache launch plans, 10-100x speed-up
  * Optimized Propeller: more than 10 core optimizations
  * Faster cache: Revamped caching subsystem for 10x faster performance
  * Accelerated datasets: Retrieve repeated datasets and models more quickly
  * Faster launch plan resolution
* Reusable containers (do not pay the pod spin-up penalty)
* Interactive tasks:
  * Edit, debug and run tasks right in the pod through VS Code in the browser
* Artifacts discovery and lineage
* Reactive workflows:
  * Launch plans trigger (and kick off workflows) on artifact creation
* Smart defaults and automatic linking
* UI-based workflow builder

## Union.ai BYOC

[Union.ai BYOC](/docs/v1/byoc/) (Bring Your Own Cloud) lets you keep your data and workflow code on your infrastructure, while Union.ai takes care of the management.

Union.ai BYOC provides:

* **All the features of Flyte**
* **All the features of Union.ai Serverless**
* Accelerators and GPUs (including fractional GPUs)
* Managed Ray and Spark
* Multi-cluster and multi-cloud
* Single sign-on (SSO)
* SOC-2 Type 2 compliance

## Union.ai Self-managed

[Union.ai Self-managed](/docs/v1/selfmanaged/) lets you keep full control of your data, code, and infrastructure.

Union.ai Self-managed provides:

* **All the features of Flyte**
* **All the features of Union.ai Serverless**
* **All the features of Union.ai BYOC**

The only difference between Union.ai BYOC and Union.ai Self-managed is that with Self-managed you are responsible for the system infrastructure, either partially or fully, according to which option you choose:

* Deploy and manage your data plane yourself on your infrastructure while Union.ai manages the control plane on our infrastructure.
* Deploy and manage both your data plane and control plane on your infrastructure with support and guidance from Union.ai. This option is suitable for air-gapped deployments.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/getting-started ===

# Getting started

This section gives you a quick introduction to writing and running Union.ai workflows.

## Gather your credentials

After your administrator has onboarded you to Union.ai (see **Platform deployment**), you should have the following at hand:

- Your Union.ai credentials.
- The URL of your Union.ai instance. We will refer to this as `` below.

## Log into Union.ai

Navigate to the UI at `` and log in with your credentials.

Once you have logged in you should see the Union.ai UI. To get started, try selecting the default project, called `flytesnacks`, from the list of projects. This will take you to the `flytesnacks` project dashboard:

![Union.ai UI](../../_static/images/quick-start/byoc-dashboard.png)

This dashboard gives you an overview of the workflows and tasks in your project. Since you are just starting out, it will be empty.
To build and deploy your first workflow, the first step is to **Getting started > Local setup**. ## Subpages - **Getting started > Local setup** - **Getting started > First project** - **Getting started > Understanding the code** - **Getting started > Running your workflow** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/getting-started/local-setup === # Local setup In this section we will set up your local environment so that you can start building and deploying Union.ai workflows from your local machine. ## Install `uv` First, [install `uv`](https://docs.astral.sh/uv/#getting-started). > [!NOTE] Using `uv` as best practice > The `uv` tool is our [recommended package and project manager](https://docs.astral.sh/uv/). > It replaces `pip`, `pip-tools`, `pipx`, `poetry`, `pyenv`, `twine`, `virtualenv`, and more. > > You can, of course, use other tools, > but all discussion in these pages will use `uv`, > so you will have to adapt the directions as appropriate. ## Ensure the correct version of Python is installed Union requires Python `>=3.9,<3.13`. We recommend using `3.12`. You can install it with: ```shell $ uv python install 3.12 ``` > [!NOTE] Uninstall higher versions of Python > When installing Python packages "as tools" (as we do below with the `union`), > `uv` will default to the latest version of Python available on your system. > If you have a version `>=3.13` installed, you will need to uninstall it since `union` requires `>=3.9,<3.13`. ## Install the `union` CLI Once `uv` is installed, use it to install the `union` CLI by installing the `union` Python package: ```shell $ uv tool install union ``` This will make the `union` CLI globally available on your system. > [!NOTE] Add the installation location to your PATH > `uv` installs tools in `~/.local/bin` by default. > Make sure this location is in your `PATH`, so you can run the `union` command from anywhere. > `uv` provides a convenience command to do this: `uv tool update-shell`. > > Note that later in this guide we will be running the `union` CLI to run your workflows. > In those cases you will be running `union` within the Python virtual environment of your workflow project. > You will not be using this globally installed instance of `union`. > This instance of `union` is only used during the configuration step, below, when no projects yet exist. ## Configure the connection to your cluster Next, you need to create a configuration file that contains your Union.ai connection information: ```shell $ union create login --host ``` `` is the URL of your Union.ai instance, mentioned in **Getting started > Gather your credentials**. This will create the `~/.union/config.yaml` with the configuration information to connect to your Union.ai instance. > [!NOTE] > These directions apply to Union.ai BYOC and Self-managed, where you connect to your own dedicated Union.ai instance. > To configure a connection to Union.ai Serverless, see the > [Serverless version of this page](/docs/v1/serverless//user-guide/getting-started/local-setup#configure-the-connection-to-your-cluster). See **Development cycle > Running in a local cluster** for more details on the format of the `yaml` file. By default, the Union CLI will look for a configuration file at `~/.union/config.yaml`. (See **Union CLI** for more details.) 
You can override this behavior to specify a different configuration file by setting the `UNION_CONFIG` environment variable:

```shell
$ export UNION_CONFIG=~/.my-config-location/my-config.yaml
```

Alternatively, you can always specify the configuration file on the command line when invoking `union` by using the `--config` flag. For example:

```shell
$ union --config ~/.my-config-location/my-config.yaml run my_script.py my_workflow
```

> [!WARNING]
> If you have previously used Union.ai, you may have configuration files left over that will interfere with
> access to your Union.ai instance through the Union CLI tool.
> Make sure to remove any files in `~/.unionai/` or `~/.union/` and unset the environment
> variables `UNIONAI_CONFIG` and `UNION_CONFIG` to avoid conflicts.

## Check your CLI configuration

To check your CLI configuration, run:

```shell
$ union info
```

You should get a response like this:

```shell
$ union info
╭──────────────────────────────────── Union.ai CLI Info ────────────────────────────────────╮
│                                                                                            │
│ union is the CLI to interact with Union.ai. Use the CLI to register, create and track      │
│ task and workflow executions locally and remotely.                                         │
│                                                                                            │
│ Union.ai Version  : 0.1.132                                                                │
│ Flytekit Version  : 1.14.3                                                                 │
│ Union.ai Endpoint :                                                                        │
│ Config Source     : file                                                                   │
│                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
```

For more details on connection configuration see **Development cycle > Authentication**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/getting-started/first-project ===

# First project

In this section we will set up a new project. This involves creating a local project directory holding your project code and a corresponding Union.ai project to which you will deploy that code using the `union` CLI.

## Create a new Union.ai project

Create a new project in the Union.ai UI by clicking on the project breadcrumb at the top left and selecting **All projects**:

![Select all projects](../../_static/images/user-guide/getting-started/first-project/select-all-projects.png)

This will take you to the **Projects list**:

![Projects list](../../_static/images/user-guide/getting-started/first-project/projects-list.png)

Click on the **New Project** button and fill in the details for your new project. For this example, let's create a project called **My project**:

![Create new project](../../_static/images/user-guide/getting-started/first-project/create-new-project.png "small")

You now have a project on Union.ai named "My project" (and with project ID `my-project`) into which you can register your workflows.

> [!NOTE] Default project
> Union.ai provides a default project (called **flytesnacks**) where all your workflows will be registered unless you specify otherwise.
> In this section, however, we will be using the project we just created, not the default.

## Initialize a local project

We will use the `union init` command to initialize a new local project corresponding to the project created on your Union.ai instance:

```shell
$ union init --template union-simple my-project
```

The resulting directory will look like this:

```shell
├── LICENSE
├── README.md
├── hello_world.py
├── pyproject.toml
└── uv.lock
```

> [!NOTE] Local project directory name same as Union.ai project ID
> It is good practice to name your local project directory the same as your
> Union.ai project ID, as we have done here.

Next, let's look at the contents of the local project directory. Continue to **Getting started > Understanding the code**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/getting-started/understanding-the-code ===

# Understanding the code

This is a simple "Hello, world!" example consisting of a flat directory:

```shell
├── LICENSE
├── README.md
├── hello_world.py
├── pyproject.toml
└── uv.lock
```

## Python code

The `hello_world.py` file illustrates the essential components of a Union.ai workflow:

```python
# Hello World

import union

image_spec = union.ImageSpec(
    # The name of the image. This image will be used by the say_hello task.
    name="say-hello-image",
    # Lock file with dependencies to install in the image.
    requirements="uv.lock",
    # Build the image using Union's built-in cloud builder (not locally on your machine).
    builder="union",
)


@union.task(container_image=image_spec)
def say_hello(name: str) -> str:
    return f"Hello, {name}!"


@union.workflow
def hello_world_wf(name: str = "world") -> str:
    greeting = say_hello(name=name)
    return greeting
```

### ImageSpec

The `ImageSpec` object is used to define the container image that will run the tasks in the workflow. Here we have the simplest possible `ImageSpec` object, which specifies:

* The `name` of the image.
  * This name will be used to identify the image in the container registry.
* The `requirements` parameter.
  * We specify that the requirements should be read from the `uv.lock` file.
* The `builder` to use to build the image.
  * We specify `union` to indicate that the image is built using Union.ai's cloud image builder.

See **Development cycle > ImageSpec** for more information.

### Tasks

The `@union.task` decorator indicates a Python function that defines a **Core concepts > Tasks**. A task takes some input and produces an output. When deployed to a Union.ai cluster, each task runs in its own Kubernetes pod.

For a full list of task parameters, see **Core concepts > Tasks > Task parameters**.

### Workflow

The `@union.workflow` decorator indicates a function that defines a **Core concepts > Workflows**. This function contains references to the tasks defined elsewhere in the code.

A workflow appears to be a Python function but is actually written in a [DSL](https://en.wikipedia.org/wiki/Domain-specific_language) that only supports a subset of Python syntax and semantics. When deployed to Union.ai, the workflow function is compiled to construct the directed acyclic graph (DAG) of tasks, defining the order of execution of task pods and the data flow dependencies between them.

> [!NOTE] `@union.task` and `@union.workflow` syntax
> * The `@union.task` and `@union.workflow` decorators will only work on functions at the top-level
>   scope of the module.
> * You can invoke tasks and workflows as regular Python functions and even import and use them in
>   other Python modules or scripts.
> * Task and workflow function signatures must be type-annotated with Python type hints.
> * Task and workflow functions must be invoked with keyword arguments.
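Several of the points in the note above can be seen in a small, hypothetical helper script (not part of the template) that imports the entities from `hello_world.py` and calls them as ordinary Python functions, using keyword arguments:

```python
# local_check.py -- a hypothetical helper script, not part of the union-simple template.
# It imports the task and workflow from hello_world.py and calls them as regular
# Python functions, which runs them locally in the current Python environment.
from hello_world import say_hello, hello_world_wf

if __name__ == "__main__":
    # Calling the task directly just runs its Python function body.
    print(say_hello(name="Ada"))         # -> Hello, Ada!

    # Calling the workflow runs its tasks locally, in dependency order.
    print(hello_world_wf(name="world"))  # -> Hello, world!
```

Note that positional calls such as `say_hello("Ada")` are not supported; inputs must always be passed by keyword.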
## pyproject.toml

The `pyproject.toml` is the standard project configuration used by `uv`. It specifies the project dependencies and the Python version to use. The default `pyproject.toml` file created by `union init` from the `union-simple` template looks like this:

```toml
[project]
name = "union-simple"
version = "0.1.0"
description = "A simple Union.ai project"
readme = "README.md"
requires-python = ">=3.9,<3.13"
dependencies = ["union"]
```

(You can update the `name` and `description` to match the actual name of your project, `my-project`, if you like.)

The most important part of the file is the list of dependencies, in this case consisting of only one package, `union`.

See [uv > Configuration > Configuration files](https://docs.astral.sh/uv/configuration/files/) for details.

## uv.lock

The `uv.lock` file is generated from `pyproject.toml` by the `uv sync` command. It contains the exact versions of the dependencies required by the project. The `uv.lock` included in the `init` template may not reflect the latest versions of the dependencies, so you should update it by doing a fresh `uv sync`.

See [uv > Concepts > Projects > Locking and syncing](https://docs.astral.sh/uv/concepts/projects/sync/) for details.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/getting-started/running-your-workflow ===

# Running your workflow

## Python virtual environment

The first step is to ensure that your `uv.lock` file is properly generated from your `pyproject.toml` file and that your local Python virtual environment is properly set up.

Using `uv`, you can install the dependencies with the command:

```shell
$ uv sync
```

You can then activate the virtual environment with:

```shell
$ source .venv/bin/activate
```

> [!NOTE] `activate` vs `uv run`
> When running the `union` CLI within your local project you must run it in the virtual
> environment _associated with_ that project.
> This differs from our earlier usage of the tool in **Getting started > Local setup**,
> where we installed `union` globally in order to set up its configuration.
>
> To run `union` within your project's virtual environment using `uv`,
> you can prefix it with the `uv run` command. For example:
>
> `uv run union ...`
>
> Alternatively, you can activate the virtual environment with `source .venv/bin/activate` and then
> run the `union` command directly.
>
> In our examples we assume that you are doing the latter.

## Run the code locally

Because tasks and workflows are defined as regular Python functions, they can be executed in your local Python environment.

You can run the workflow locally with the `union run` command (see **Union CLI > `union` CLI commands**):

```shell
$ union run hello_world.py hello_world_wf
```

You should see output like this:

```shell
Running Execution on local.
Hello, world!
```

You can also pass in parameters to the workflow (assuming they are declared in the workflow function):

```shell
$ union run hello_world.py hello_world_wf --name="everybody"
```

You should see output like this:

```shell
Running Execution on local.
Hello, everybody!
```
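Because tasks and workflows are plain Python functions, you can also run the workflow locally without the `union` CLI. As a hypothetical variation (not part of the template), you could add a standard `__main__` guard to the bottom of `hello_world.py`:

```python
# Hypothetical addition to the bottom of hello_world.py:
# run the workflow locally with a plain Python function call.
if __name__ == "__main__":
    print(hello_world_wf(name="everybody"))  # prints "Hello, everybody!"
```

With this in place, running `python hello_world.py` inside the activated virtual environment produces the same greeting as the `union run` invocation above.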
## Running remotely on Union.ai in the cloud

Running your code in your local Python environment is useful for testing and debugging. But to run your tasks and workflows at scale, you will need to deploy them (or, as we say, "register" them) onto your Union.ai instance in the cloud.

When task and workflow code is registered:

* The `@union.task` function is loaded into a container defined by the `ImageSpec` object specified in the `container_image` parameter of the decorator.
* The `@union.workflow` function is compiled into a directed acyclic graph that controls the running of the tasks invoked within it.

To run the workflow on Union.ai in the cloud, use the `union run` command (see **Union CLI > `union` CLI commands**) with the `--remote` flag:

```shell
$ union run --remote --project my-project --domain development hello_world.py hello_world_wf
```

The output displays a URL that links to the workflow execution in the UI:

```shell
👍 Build submitted!
⏳ Waiting for build to finish at: https:///org/...
✅ Build completed in 0:01:57!
[✔] Go to https:///org/... to see execution in the UI.
```

Click the link to see the execution in the UI.

## Register the workflow without running

Above we used `union run --remote` to register and immediately run a workflow on Union.ai. This is useful for quick testing, but for more complex workflows you may want to register the workflow first and then run it from the Union.ai interface.

To do this, you can use the `union register` command to register the workflow code with Union.ai. The form of the command is:

```shell
$ union register []
```

In our case, from within the `my-project` directory, you would do:

```shell
$ union register --project my-project --domain development .
```

This registers all code in the current directory to Union.ai but does not immediately run anything. You should see the following output (or similar) in your terminal:

```shell
Running union register from /Users/my-user/scratch/my-project with images ImageConfig(default_image=Image(name='default', fqn='cr.flyte.org/flyteorg/flytekit', tag='py3.12-1.14.6', digest=None), images=[Image(name='default', fqn='cr.flyte.org/flyteorg/flytekit', tag='py3.12-1.14.6', digest=None)]) and image destination folder /root on 1 package(s) ('/Users/my-user/scratch/my-project',)
Registering against demo.hosted.unionai.cloud
Detected Root /Users/my-user/my-project, using this to create deployable package...
Loading packages ['my-project'] under source root /Users/my-user/my-project
No output path provided, using a temporary directory at /var/folders/vn/72xlcb5d5jbbb3kk_q71sqww0000gn/T/tmphdu9wf6_ instead
Computed version is sSFSdBXwUmM98sYv930bSQ
Image say-hello-image:lIpeqcBrlB8DlBq0NEMR3g found. Skip building.
Serializing and registering 3 flyte entities
[✔] Task: my-project.hello_world.say_hello
[✔] Workflow: my-project.hello_world.hello_world_wf
[✔] Launch Plan: my-project.hello_world.hello_world_wf
Successfully registered 3 entities
```

## Run the workflow from the Union.ai interface

To run the workflow, you need to go to the Union.ai interface:

1. Navigate to the Union.ai dashboard.
2. In the left sidebar, click **Workflows**.
3. Search for your workflow, then select the workflow from the search results.
4. On the workflow page, click **Launch Workflow**.
5. In the "Create New Execution" dialog, you can change the workflow version, launch plan, and inputs (if present). Click "Advanced options" to change the security context, labels, annotations, max parallelism, override the interruptible flag, and overwrite cached inputs.
6. To execute the workflow, click **Launch**.

You should see the workflow status change to "Running", then "Succeeded" as the execution progresses.
To view the workflow execution graph, click the **Graph** tab above the running workflow. ## View the workflow execution on Union.ai When you view the workflow execution graph, you will see the following: ![Graph](../../_static/images/user-guide/getting-started/running-your-workflow/graph.png) Above the graph, there is metadata that describes the workflow execution, such as the duration and the workflow version. Next, click on the `evaluate_model` node to open up a sidebar that contains additional information about the task: ![Sidebar](../../_static/images/user-guide/getting-started/running-your-workflow/sidebar.png) === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts === # Core concepts Union.ai is a platform for building and orchestrating the execution of interconnected software processes across machines in a computer cluster. In Union.ai terminology, the software processes are called *tasks* and the overall organization of connections between tasks is called a *workflow*. The tasks in a workflow are connected to each other by their inputs and outputs. The output of one task becomes the input of another. More precisely, a workflow in Union.ai is a *directed acyclic graph (DAG)* of *nodes* where each node is a unit of execution and the edges between nodes represent the flow of data between them. The most common type of node is a task node (which encapsulates a task), though there are also workflow nodes (which encapsulate subworkflows) and branch nodes. In most contexts we just say that a workflow is a DAG of tasks. You define tasks and workflows in Python using the Union SDK. The Union SDK provides a set of decorators and classes that allow you to define tasks and workflows in a way that is easy to understand and work with. Once defined, tasks and workflows are deployed to your Union.ai instance (we say they are *registered* to the instance), where they are compiled into a form that can be executed on your Union.ai cluster. In addition to tasks and workflows, another important concept in Union.ai is the **Core concepts > Launch plans**. A launch plan is like a template that can be used to define the inputs to a workflow. Triggering a launch plan will launch its associated workflow with the specified parameters. ## Defining tasks and workflows Using the Union SDK, tasks and workflows are defined as Python functions using the `@union.task` and `@union.workflow` decorators, respectively: ```python import union @union.task def task_1(a: int, b: int, c: int) -> int: return a + b + c @union.task def task_2(m: int, n: int) -> int: return m * n @union.task def task_3(x: int, y: int) -> int: return x - y @union.workflow def my_workflow(a: int, b: int, c: int, m: int, n: int) -> int: x = task_1(a=a, b=b, c=c) y = task_2(m=m, n=n) return task_3(x=x, y=y) ``` Here we see three tasks defined using the `@union.task` decorator and a workflow defined using the `@union.workflow` decorator. The workflow calls `task_1` and `task_2` and passes the results to `task_3` before finally outputting the result of `task_3`. When the workflow is registered, Union.ai compiles the workflow into a directed acyclic graph (DAG) based on the input/output dependencies between the tasks. The DAG is then used to execute the tasks in the correct order, taking advantage of any parallelism that is possible. 
For example, the workflow above results in the following DAG: ![Workflow DAG](../../_static/images/user-guide/core-concepts/workflow-dag.png) ### Type annotation is required One important difference between Union.ai and generic Python is that in Union.ai all inputs and outputs *must be type annotated*. This is because tasks are strongly typed, meaning that the types of the inputs and outputs are validated at deployment time. See **Core concepts > Tasks > Tasks are strongly typed** for more details. ### Workflows *are not* full Python functions The definition of a workflow must be a valid Python function, so it can be run locally as a normal Python function during development, but only *a subset of Python syntax is allowed*, because it must also be compiled into a DAG that is deployed and executed on Union.ai. *Technically then, the language of a workflow function is a domain-specific language (DSL) that is a subset of Python.* See **Core concepts > Workflows** for more details. ## Registering tasks and workflows ### Registering on the command line with `union` or `uctl` In most cases, workflows and tasks (and possibly other things, such as launch plans) are defined in your project code and registered as a bundle using `union` or `uctl` For example: ```shell $ union register ./workflows --project my_project --domain development ``` Tasks can also be registered individually, but it is more common to register alongside the workflow that uses them. See **Development cycle > Running your code**. ### Registering in Python with `UnionRemote` As with all Union.ai command line actions, you can also perform registration of workflows and tasks programmatically with [`UnionRemote`](), specifically, [`UnionRemote.register_script`](), [`UnionRemote.register_workflow`](), and [`UnionRemote.register_task`](). ## Results of registration When the code above is registered to Union.ai, it results in the creation of five objects: * The tasks `workflows.my_example.task_1`, `workflows.my_example.task_2`, and `workflows.my_example.task_3` (see **Core concepts > Tasks** for more details). * The workflow `workflows.my_example.my_workflow`. * The default launch plan `workflows.my_example.my_workflow` (see **Core concepts > Launch plans** for more details). Notice that the task and workflow names are derived from the path, file name and function name of the Python code that defines them: `..`. The default launch plan for a workflow always has the same name as its workflow. ## Changing tasks and workflows Tasks and workflows are changed by altering their definition in code and re-registering. When a task or workflow with the same project, domain, and name as a preexisting one is re-registered, a new version of that entity is created. ## Inspecting tasks and workflows ### Inspecting workflows in the UI Select **Workflows** in the sidebar to display a list of all the registered workflows in the project and domain. You can search the workflows by name. Click on a workflow in the list to see the **workflow view**. The sections in this view are as follows: * **Recent Workflow Versions**: A list of recent versions of this workflow. Select a version to see the **Workflow version view**. This view shows the DAG and a list of all version of the task. You can switch between versions with the radio buttons. * **All Executions in the Workflow**: A list of all executions of this workflow. Click on an execution to go to the **Core concepts > Workflows > Viewing workflow executions**. 
* **Launch Workflow button**: In the top right of the workflow view, you can click the **Launch Workflow** button to run the workflow with the default inputs.

### Inspecting tasks in the UI

Select **Tasks** in the sidebar to display a list of all the registered tasks in the project and domain. You can search the tasks by name. To filter for only those that are archived, check the **Show Only Archived Tasks** box.

Click on a task in the list to see the task view. The sections in the task view are as follows:

* **Inputs & Outputs**: The name and type of each input and output for the latest version of this task.
* **Recent Task Versions**: A list of recent versions of this task. Select a version to see the **Task version view**: This view shows the task details and a list of all versions of the task. You can switch between versions with the radio buttons. See **Core concepts > Tasks** for more information.
* **All Executions in the Task**: A list of all executions of this task. Click on an execution to go to the execution view.
* **Launch Task button**: In the top right of the task view, you can click the **Launch Task** button to run the task with the default inputs.

### Inspecting workflows on the command line with `uctl`

To view all workflows within a project and domain:

```shell
$ uctl get workflows \
    --project \
    --domain 
```

To view a specific workflow:

```shell
$ uctl get workflow \
    --project \
    --domain \
    
```

See **Uctl CLI** for more details.

### Inspecting tasks on the command line with `uctl`

To view all tasks within a project and domain:

```shell
$ uctl get tasks \
    --project \
    --domain 
```

To view a specific task:

```shell
$ uctl get task \
    --project \
    --domain \
    
```

See **Uctl CLI** for more details.

### Inspecting tasks and workflows in Python with `UnionRemote`

Use the method [`UnionRemote.fetch_workflow`]() or [`UnionRemote.client.get_workflow`]() to get a workflow. See [`UnionRemote`]() for more options and details.

Use the method [`UnionRemote.fetch_task`]() or [`UnionRemote.client.get_task`]() to get a task. See [`UnionRemote`]() for more options and details.

## Running tasks and workflows

### Running a task or workflow in the UI

To run a workflow in the UI, click the **Launch Workflow** button in the workflow view.

You can also run individual tasks in the UI by clicking the **Launch Task** button in the task view.

### Running a task or workflow locally on the command line with `union` or `python`

You can execute a Union.ai workflow or task locally simply by calling it just like any regular Python function. For example, you can add the following to the above code:

```python
if __name__ == "__main__":
    my_workflow(a=1, b=2, c=3, m=4, n=5)
```

If the file is saved as `my_example.py`, you can run it locally using the following command:

```shell
$ python my_example.py
```

Alternatively, you can run the workflow locally with the `union` command-line tool, using the following `union run` command:

```shell
$ union run my_example.py my_workflow --a 1 --b 2 --c 3 --m 4 --n 5
```

This has the advantage of allowing you to specify the input values as command line arguments.

For more details on running workflows and tasks, see **Development cycle**.
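As a programmatic complement to the UI and `uctl` methods above, here is a hedged sketch of fetching a registered workflow with `UnionRemote`. It assumes the import path `union.remote.UnionRemote`, that the default constructor picks up the connection settings in `~/.union/config.yaml`, and that the entities above were registered to the `my_project` project in the `development` domain:

```python
# A sketch of programmatic inspection with UnionRemote (assumptions noted above).
from union.remote import UnionRemote

remote = UnionRemote()  # assumed to read the default connection configuration

# Fetch a registered workflow by project, domain, and name.
wf = remote.fetch_workflow(
    project="my_project",
    domain="development",
    name="workflows.my_example.my_workflow",
)

# The fetched object carries its registered identifier, including the version.
print(wf.id.version)
```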
### Running a task or workflow remotely on the command line with `union` To run a workflow remotely on your Union.ai installation, use the following command (this assumes that you have your **Development cycle > Setting up a production project**): ```shell $ union run --remote my_example.py my_workflow --a 1 --b 2 --c 3 --m 4 --n 5 ``` ### Running a task or workflow remotely in Python with `UnionRemote` To run a workflow or task remotely in Python, use the method [`UnionRemote.execute`](). See [`UnionRemote`]() for more options and details. ## Subpages - **Core concepts > Workflows** - **Core concepts > Tasks** - **Core concepts > Launch plans** - **Core concepts > Actors** - **Core concepts > Artifacts** - **Core concepts > App Serving** - **Core concepts > Caching** - **Core concepts > Workspaces** - **Core concepts > Named outputs** - **Core concepts > ImageSpec** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows === # Workflows So far in our discussion of workflows, we have focused on top-level workflows decorated with `@union.workflow`. These are, in fact, more accurately termed **Core concepts > Workflows > Standard workflows** to differentiate them from the other types of workflows that exist in Union.ai: **Core concepts > Workflows > Subworkflows and sub-launch plans**, **Core concepts > Workflows > Dynamic workflows**, and **Core concepts > Workflows > Imperative workflows**. In this section, we will delve deeper into the fundamentals of all of these workflow types, including their syntax, structure, and behavior. ## Subpages - **Core concepts > Workflows > Standard workflows** - **Core concepts > Workflows > Subworkflows and sub-launch plans** - **Core concepts > Workflows > Dynamic workflows** - **Core concepts > Workflows > Imperative workflows** - **Core concepts > Workflows > Launching workflows** - **Core concepts > Workflows > Viewing workflows** - **Core concepts > Workflows > Viewing workflow executions** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/standard-workflows === # Standard workflows A standard workflow is defined by a Python function decorated with the `@union.workflow` decorator. The function is written in a domain specific language (DSL), a subset of Python syntax that describes the directed acyclic graph (DAG) that is deployed and executed on Union.ai. The syntax of a standard workflow definition can only include the following: * Calls to functions decorated with `@union.task` and assignment of variables to the returned values. * Calls to other functions decorated with `@union.workflow` and assignment of variables to the returned values (see **Core concepts > Workflows > Subworkflows and sub-launch plans**). * Calls to **Core concepts > Launch plans** (see **Core concepts > Workflows > Subworkflows and sub-launch plans > When to use sub-launch plans**) * Calls to functions decorated with `@union.dynamic` and assignment of variables to the returned values (see **Core concepts > Workflows > Dynamic workflows**). * The special **Programming > Conditionals**. * Statements using the **Programming > Chaining Entities**. ## Evaluation of a standard workflow When a standard workflow is **Core concepts > Workflows > Standard workflows > run locally in a Python environment** it is executed as a normal Python function. 
However, when it is registered to Union.ai, the top level `@union.workflow`-decorated function is evaluated as follows: * Inputs to the workflow are materialized as lazily-evaluated promises which are propagated to downstream tasks and subworkflows. * All values returned by calls to functions decorated with `@union.task` or `@union.dynamic` are also materialized as lazily-evaluated promises. The resulting structure is used to construct the Directed Acyclic Graph (DAG) and deploy the required containers to the cluster. The actual evaluation of these promises occurs when the tasks (or dynamic workflows) are executed in their respective containers. ## Conditional construct Because standard workflows cannot directly include Python `if` statements, a special `conditional` construct is provided that allows you to define conditional logic in a workflow. For details, see **Programming > Conditionals**. ## Chaining operator When Union.ai builds the DAG for a standard workflow, it uses the passing of values from one task to another to determine the dependency relationships between tasks. There may be cases where you want to define a dependency between two tasks that is not based on the output of one task being passed as an input to another. In that case, you can use the chaining operator `>>` to define the dependencies between tasks. For details, see **Programming > Chaining Entities**. ## Workflow decorator parameters The `@union.workflow` decorator can take the following parameters: * `failure_policy`: Use the options in **Flytekit SDK**. * `interruptible`: Indicates if tasks launched from this workflow are interruptible by default. See **Core concepts > Tasks > Task hardware environment > Interruptible instances**. * `on_failure`: Invoke this workflow or task on failure. The workflow specified must have the same parameter signature as the current workflow, with an additional parameter called `error`. * `docs`: A description entity for the workflow. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/subworkflows-and-sub-launch-plans === # Subworkflows and sub-launch plans In Union.ai it is possible to invoke one workflow from within another. A parent workflow can invoke a child workflow in two ways: as a **subworkflow** or via a **Core concepts > Launch plans > Running launch plans > Sub-launch plans**. In both cases the child workflow is defined and registered normally, exists in the system normally, and can be run independently. But, if the child workflow is invoked from within the parent **by directly calling the child's function**, then it becomes a **subworkflow**. The DAG of the subworkflow is embedded directly into the DAG of the parent and effectively become part of the parent workflow execution, sharing the same execution ID and execution context. On the other hand, if the child workflow is invoked from within the parent **Core concepts > Launch plans**, this is called a **sub-launch plan**. It results in a new top-level workflow execution being invoked with its own execution ID and execution context. It also appears as a separate top-level entity in the system. The only difference is that it happens to have been kicked off from within another workflow instead of from the command line or the UI. 
Here is an example:

```python
import union


@union.task
def t(a: int, b: int) -> int:
    # A simple task for the child workflow to call.
    return a + b


@union.workflow
def sub_wf(a: int, b: int) -> int:
    return t(a=a, b=b)


# Get the default launch plan of sub_wf, which we name sub_wf_lp
sub_wf_lp = union.LaunchPlan.get_or_create(sub_wf)


@union.workflow
def main_wf():
    # Invoke sub_wf directly.
    # An embedded subworkflow results.
    sub_wf(a=3, b=4)

    # Invoke sub_wf through its default launch plan, here called sub_wf_lp.
    # An independent top-level execution (a sub-launch plan) results.
    sub_wf_lp(a=1, b=2)
```

## When to use subworkflows

Subworkflows allow you to manage parallelism between a workflow and its launched sub-flows, as they execute within the same context as the parent workflow. Consequently, all nodes of a subworkflow adhere to the overall constraints imposed by the parent workflow.

Here's an example illustrating the calculation of slope, intercept and the corresponding y-value.

```python
import union


@union.task
def slope(x: list[int], y: list[int]) -> float:
    sum_xy = sum([x[i] * y[i] for i in range(len(x))])
    sum_x_squared = sum([x[i] ** 2 for i in range(len(x))])
    n = len(x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)


@union.task
def intercept(x: list[int], y: list[int], slope: float) -> float:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    intercept = mean_y - slope * mean_x
    return intercept


@union.workflow
def slope_intercept_wf(x: list[int], y: list[int]) -> (float, float):
    slope_value = slope(x=x, y=y)
    intercept_value = intercept(x=x, y=y, slope=slope_value)
    return (slope_value, intercept_value)


@union.task
def regression_line(val: int, slope_value: float, intercept_value: float) -> float:
    return (slope_value * val) + intercept_value  # y = mx + c


@union.workflow
def regression_line_wf(val: int = 5, x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> float:
    slope_value, intercept_value = slope_intercept_wf(x=x, y=y)
    return regression_line(val=val, slope_value=slope_value, intercept_value=intercept_value)
```

The `slope_intercept_wf` computes the slope and intercept of the regression line. Subsequently, the `regression_line_wf` triggers `slope_intercept_wf` and then computes the y-value.

It is possible to nest a workflow that contains a subworkflow within yet another workflow. Workflows can be easily constructed from other workflows, even if they also function as standalone entities. For example, each workflow in the example below has the capability to exist and run independently:

```python
import union


@union.workflow
def nested_regression_line_wf() -> float:
    return regression_line_wf()
```

## When to use sub-launch plans

Sub-launch plans can be useful for implementing exceptionally large or complicated workflows that can't be adequately implemented as **Core concepts > Workflows > Dynamic workflows** or **Core concepts > Workflows > Subworkflows and sub-launch plans > map tasks**. Dynamic workflows and map tasks share the same context and single underlying Kubernetes resource definitions. Workflows invoked via sub-launch plans do not share the parent's context. They are executed as separate top-level entities, allowing for better parallelism and scale.
Here is an example of invoking a workflow multiple times through its launch plan: ```python import union @union.task def my_task(a: int, b: int, c: int) -> int: return a + b + c @union.workflow def my_workflow(a: int, b: int, c: int) -> int: return my_task(a=a, b=b, c=c) my_workflow_lp = union.LaunchPlan.get_or_create(my_workflow) @union.workflow def wf() -> list[int]: return [my_workflow_lp(a=i, b=i, c=i) for i in [1, 2, 3]] ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/dynamic-workflows === # Dynamic workflows A workflow whose directed acyclic graph (DAG) is computed at run-time is a [`dynamic`]() workflow. The tasks in a dynamic workflow are executed at runtime using dynamic inputs. A dynamic workflow shares similarities with the [`workflow`](), as it uses a Python-esque domain-specific language to declare dependencies between the tasks or define new workflows. A key distinction lies in the dynamic workflow being assessed at runtime. This means that the inputs are initially materialized and forwarded to the dynamic workflow, resembling the behavior of a task. However, the return value from a dynamic workflow is a [`Promise`]() object, which can be materialized by the subsequent tasks. Think of a dynamic workflow as a combination of a task and a workflow. It is used to dynamically decide the parameters of a workflow at runtime and is both compiled and executed at run-time. Dynamic workflows become essential when you need to do the following: - Handle conditional logic - Modify the logic of the code at runtime - Change or decide on feature extraction parameters on the fly ## Defining a dynamic workflow You can define a dynamic workflow using the `@union.dynamic` decorator. Within the `@union.dynamic` context, each invocation of a [`task`]() or a derivative of the [`Task`]() class leads to deferred evaluation using a Promise, rather than the immediate materialization of the actual value. While nesting other `@union.dynamic` and `@union.workflow` constructs within this task is possible, direct interaction with the outputs of a task/workflow is limited, as they are lazily evaluated. If you need to interact with the outputs, we recommend separating the logic in a dynamic workflow and creating a new task to read and resolve the outputs. The example below uses a dynamic workflow to count the common characters between any two strings. We define a task that returns the index of a character, where A-Z/a-z is equivalent to 0-25: ```python import union @union.task def return_index(character: str) -> int: if character.islower(): return ord(character) - ord("a") else: return ord(character) - ord("A") ``` We also create a task that prepares a list of 26 characters by populating the frequency of each character: ```python @union.task def update_list(freq_list: list[int], list_index: int) -> list[int]: freq_list[list_index] += 1 return freq_list ``` We define a task to calculate the number of common characters between the two strings: ```python @union.task def derive_count(freq1: list[int], freq2: list[int]) -> int: count = 0 for i in range(26): count += min(freq1[i], freq2[i]) return count ``` We define a dynamic workflow to accomplish the following: 1. Initialize an empty 26-character list to be passed to the `update_list` task. 2. Iterate through each character of the first string (`s1`) and populate the frequency list. 3. Iterate through each character of the second string (`s2`) and populate the frequency list. 4. 
Determine the number of common characters by comparing the two frequency lists. The looping process depends on the number of characters in both strings, which is unknown until runtime: ```python @union.dynamic def count_characters(s1: str, s2: str) -> int: # s1 and s2 should be accessible # Initialize empty lists with 26 slots each, corresponding to every alphabet (lower and upper case) freq1 = [0] * 26 freq2 = [0] * 26 # Loop through characters in s1 for i in range(len(s1)): # Calculate the index for the current character in the alphabet index = return_index(character=s1[i]) # Update the frequency list for s1 freq1 = update_list(freq_list=freq1, list_index=index) # index and freq1 are not accessible as they are promises # looping through the string s2 for i in range(len(s2)): # Calculate the index for the current character in the alphabet index = return_index(character=s2[i]) # Update the frequency list for s2 freq2 = update_list(freq_list=freq2, list_index=index) # index and freq2 are not accessible as they are promises # Count the common characters between s1 and s2 return derive_count(freq1=freq1, freq2=freq2) ``` A dynamic workflow is modeled as a task in the Union.ai backend, but the body of the function is executed to produce a workflow at runtime. In both dynamic and static workflows, the output of tasks are Promise objects. Union.ai executes the dynamic workflow within its container, resulting in a compiled DAG, which is then accessible in the UI. It uses the information acquired during the dynamic task's execution to schedule and execute each task within the dynamic workflow. Visualization of the dynamic workflow's graph in the UI is only available after it has completed its execution. When a dynamic workflow is executed, it generates the entire workflow structure as its output, termed the *futures file*. This name reflects the fact that the workflow has yet to be executed, so all subsequent outputs are considered futures. > [!NOTE] > Local execution works when a `@union.dynamic` decorator is used because Union treats it as a task that runs with native Python inputs. Finally, we define a standard workflow that triggers the dynamic workflow: ```python @union.workflow def start_wf(s1: str, s2: str) -> int: return count_characters(s1=s1, s2=s2) ``` You can run the workflow locally as follows: ```python if __name__ == "__main__": print(start_wf(s1="Pear", s2="Earth")) ``` ## Advantages of dynamic workflows ### Flexibility Dynamic workflows streamline the process of building pipelines, offering the flexibility to design workflows according to the unique requirements of your project. This level of adaptability is not achievable with static workflows. ### Lower pressure on `etcd` The workflow Custom Resource Definition (CRD) and the states associated with static workflows are stored in `etcd`, the Kubernetes database. This database maintains Union.ai workflow CRDs as key-value pairs, tracking the status of each node's execution. However, `etcd` has a hard limit on data size, encompassing the workflow and node status sizes, so it is important to ensure that static workflows don't excessively consume memory. In contrast, dynamic workflows offload the workflow specification (including node/task definitions and connections) to the object store. Still, the statuses of nodes are stored in the workflow CRD within `etcd`. Dynamic workflows help alleviate some pressure on `etcd` storage space, providing a solution to mitigate storage constraints. ## Dynamic workflows vs. 
map tasks Dynamic tasks come with overhead for large fan-out tasks as they store metadata for the entire workflow. In contrast, **Core concepts > Workflows > Dynamic workflows > map tasks** prove efficient for such extensive fan-out scenarios since they refrain from storing metadata, resulting in less noticeable overhead. ## Using dynamic workflows to achieve recursion Merge sort is a perfect example to showcase how to seamlessly achieve recursion using dynamic workflows. Union.ai imposes limitations on the depth of recursion to prevent misuse and potential impacts on the overall stability of the system. ```python from typing import Tuple import union @union.task def split(numbers: list[int]) -> tuple[list[int], list[int]]: length = len(numbers) return ( numbers[0 : int(length / 2)], numbers[int(length / 2) :] ) @union.task def merge(sorted_list1: list[int], sorted_list2: list[int]) -> list[int]: result = [] while len(sorted_list1) > 0 and len(sorted_list2) > 0: # Compare the current element of the first array with the current element of the second array. # If the element in the first array is smaller, append it to the result and increment the first array index. # Otherwise, do the same with the second array. if sorted_list1[0] < sorted_list2[0]: result.append(sorted_list1.pop(0)) else: result.append(sorted_list2.pop(0)) # Extend the result with the remaining elements from both arrays result.extend(sorted_list1) result.extend(sorted_list2) return result @union.task def sort_locally(numbers: list[int]) -> list[int]: return sorted(numbers) @union.dynamic def merge_sort_remotely(numbers: list[int], threshold: int) -> list[int]: split1, split2 = split(numbers=numbers) sorted1 = merge_sort(numbers=split1, threshold=threshold) sorted2 = merge_sort(numbers=split2, threshold=threshold) return merge(sorted_list1=sorted1, sorted_list2=sorted2) @union.dynamic def merge_sort(numbers: list[int], threshold: int=5) -> list[int]: if len(numbers) <= threshold: return sort_locally(numbers=numbers) else: return merge_sort_remotely(numbers=numbers, threshold=threshold) ``` By simply adding the `@union.dynamic` annotation, the `merge_sort_remotely` function transforms into a plan of execution, generating a workflow with four distinct nodes. These nodes run remotely on potentially different hosts, with Union.ai ensuring proper data reference passing and maintaining execution order with maximum possible parallelism. `@union.dynamic` is essential in this context because the number of times `merge_sort` needs to be triggered is unknown at compile time. The dynamic workflow calls a static workflow, which subsequently calls the dynamic workflow again, creating a recursive and flexible execution structure. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/imperative-workflows === # Imperative workflows Workflows are commonly created by applying the `@union.workflow` decorator to Python functions. During compilation, this involves processing the function's body and utilizing subsequent calls to underlying tasks to establish and record the workflow structure. This is the *declarative* approach and is suitable when manually drafting the workflow. However, in cases where workflows are constructed programmatically, an imperative style is more appropriate. For instance, if tasks have been defined already, their sequence and dependencies might have been specified in textual form (perhaps during a transition from a legacy system). In such scenarios, you want to orchestrate these tasks. 
This is where Union.ai's imperative workflows come into play, allowing you to programmatically construct workflows.

## Example

To begin, we define the `slope` and `intercept` tasks:

```python
import union


@union.task
def slope(x: list[int], y: list[int]) -> float:
    sum_xy = sum([x[i] * y[i] for i in range(len(x))])
    sum_x_squared = sum([x[i] ** 2 for i in range(len(x))])
    n = len(x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)


@union.task
def intercept(x: list[int], y: list[int], slope: float) -> float:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    intercept = mean_y - slope * mean_x
    return intercept
```

Create an imperative workflow, using the `Workflow` class from the underlying Flytekit SDK:

```python
from flytekit import Workflow  # the imperative workflow builder

imperative_wf = Workflow(name="imperative_workflow")
```

Add the workflow inputs to the imperative workflow:

```python
imperative_wf.add_workflow_input("x", list[int])
imperative_wf.add_workflow_input("y", list[int])
```

> If you want to assign default values to the workflow inputs, you can create a launch plan (see **Core concepts > Launch plans**).

Add the tasks that need to be triggered from within the workflow:

```python
node_t1 = imperative_wf.add_entity(slope, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"])
node_t2 = imperative_wf.add_entity(
    intercept, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"], slope=node_t1.outputs["o0"]
)
```

Lastly, add the workflow output:

```python
imperative_wf.add_workflow_output("wf_output", node_t2.outputs["o0"])
```

You can execute the workflow locally as follows:

```python
if __name__ == "__main__":
    print(f"Running imperative_wf() {imperative_wf(x=[-3, 0, 3], y=[7, 4, -2])}")
```

You also have the option to provide a list of inputs and retrieve a list of outputs from the workflow. Schematically (with `some_task` standing in for any task that accepts a list input):

```python
wf_input_y = imperative_wf.add_workflow_input("y", list[str])
node_t3 = imperative_wf.add_entity(some_task, a=[imperative_wf.inputs["x"], wf_input_y])

imperative_wf.add_workflow_output(
    "list_of_outputs",
    [node_t1.outputs["o0"], node_t2.outputs["o0"]],
    python_type=list[str],
)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/launching-workflows ===

# Launching workflows

From the **Core concepts > Workflows > Viewing workflows > Workflow view** (accessed, for example, by selecting a workflow in the **Core concepts > Workflows > Viewing workflows > Workflows list**) you can select **Launch Workflow** in the top right. This opens the **New Execution** dialog for workflows:

![New execution dialog settings](../../../_static/images/user-guide/core-concepts/workflows/launching-workflows/new-execution-dialog-settings.png)

At the top you can select:

* The specific version of this workflow that you want to launch.
* The launch plan to be used to launch this workflow (by default it is set to the **Core concepts > Launch plans > Default launch plan**).

Along the left side the following sections are available:

* **Inputs**: The input parameters of the workflow function appear here as fields to be filled in.
* **Settings**:
  * **Execution name**: A custom name for this execution. If not specified, a name will be generated.
  * **Overwrite cached outputs**: A boolean. If set to `True`, this execution will overwrite any previously-computed cached outputs.
  * **Raw output data config**: Remote path prefix to store raw output data. By default, workflow output will be written to the built-in metadata storage. Alternatively, you can specify a custom location for output at the organization, project-domain, or individual execution levels.
This field is for specifying this setting at the workflow execution level. If this field is filled in it overrides any settings at higher levels. The parameter is expected to be a URL to a writable resource (for example, `http://s3.amazonaws.com/my-bucket/`). See **Data input/output > Task input and output > Raw data store**. * **Max parallelism**: Number of workflow nodes that can be executed in parallel. If not specified, project/domain defaults are used. If 0 then no limit is applied. * **Force interruptible**: A three valued setting for overriding the interruptible setting of the workflow for this particular execution. If not set, the workflow's interruptible setting is used. If set and **enabled** then `interruptible=True` is used for this execution. If set and **disabled** then `interruptible=False` is used for this execution. See **Core concepts > Tasks > Task hardware environment > Interruptible instances** * **Service account**: The service account to use for this execution. If not specified, the default is used. * **Environment variables**: Environment variables that will be available to tasks in this workflow execution. * **Labels**: Labels to apply to the execution resource. * **Notifications**: **Core concepts > Launch plans > Notifications** configured for this workflow execution. * **Debug**: The workflow execution details for debugging purposes. Select **Launch** to launch the workflow execution. This will take you to the **Core concepts > Workflows > Viewing workflow executions**. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/viewing-workflows === # Viewing workflows ## Workflows list The workflows list shows all workflows in the current project and domain: ![Workflows list](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflows/workflows-list.png) You can search the list by name and filter for only those that are archived. To archive a workflow, select the archive icon ![Archive icon](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflows/archive-icon.png). Each entry in the list provides some basic information about the workflow: * **Last execution time**: The time of the most recent execution of this workflow. * **Last 10 executions**: The status of the last 10 executions of this workflow. * **Inputs**: The input type for the workflow. * **Outputs**: The output type for the workflow. * **Description**: The description of the workflow. Select an entry on the list to go to that **Core concepts > Workflows > Viewing workflows > Workflow view**. ## Workflow view The workflow view provides details about a specific workflow. ![Workflow view](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflows/workflow-view.png) This view provides: * A list of recent workflow versions: Selecting a version will take you to the **Core concepts > Workflows > Viewing workflows > Workflow view > Workflow versions list**. * A list of recent executions: Selecting an execution will take you to the **Core concepts > Workflows > Viewing workflow executions**. ### Workflow versions list The workflow versions list shows the a list of all versions of this workflow along with a graph view of the workflow structure: ![Workflow version list](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflows/workflow-versions-list.png) ### Workflow and task descriptions Union.ai enables the use of docstrings to document your code. 
Docstrings are stored in the control plane and displayed on the UI for each workflow or task. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workflows/viewing-workflow-executions === # Viewing workflow executions The **Executions list** shows all executions in a project and domain combination. An execution represents a single run of all or part of a workflow (including subworkflows and individual tasks). You can access it from the **Executions** link in the left navigation. ![Executions list](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflow-executions/executions-list.png) ## Domain Settings This section displays any domain-level settings that have been configured for this project-domain combination. They are: * Security Context * Labels * Annotations * Raw output data config * Max parallelism ## All Executions in the Project For each execution in this project and domain you can see the following: * A graph of the **last 100 executions in the project**. * **Start time**: Select to view the **Core concepts > Workflows > Viewing workflow executions > Execution view**. * **Workflow/Task**: The **Core concepts > Workflows > Viewing workflows** or **Core concepts > Tasks > Viewing tasks** that ran in this execution. * **Version**: The version of the workflow or task that ran in this execution. * **Launch Plan**: The **Core concepts > Launch plans > Viewing launch plans** that was used to launch this execution. * **Schedule**: The schedule that was used to launch this execution (if any). * **Execution ID**: The ID of the execution. * **Status**: The status of the execution. One of **QUEUED**, **RUNNING**, **SUCCEEDED**, **FAILED** or **UNKNOWN**. * **Duration**: The duration of the execution. ## Execution view The execution view appears when you launch a workflow or task or select an already completed execution. An execution represents a single run of all or part of a workflow (including subworkflows and individual tasks). ![Execution view - nodes](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflow-executions/execution-view-nodes.png) > [!NOTE] > An execution usually represents the run of an entire workflow. > But, because workflows are composed of tasks (and sometimes subworkflows) and Union.ai caches the outputs of those independently of the workflows in which they participate, it sometimes makes sense to execute a task or subworkflow independently. The top part of execution view provides detailed general information about the execution. The bottom part provides three tabs displaying different aspects of the execution: **Nodes**, **Graph**, and **Timeline**. ### Nodes The default tab within the execution view is the **Nodes** tab. It shows a list of the Union.ai nodes that make up this execution (A node in Union.ai is either a task or a (sub-)workflow). Selecting an item in the list opens the right panel showing more details of that specific node: ![](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflow-executions/execution-view-node-side-panel.png) The top part of the side panel provides detailed information about the node as well as the **Rerun task** button. Below that, you have the following tabs: **Executions**, **Inputs**, **Outputs**, and **Task**. 
The **Executions** tab gives you details on the execution of this particular node as well as access to: * **Task level monitoring**: You can access the **Core concepts > Tasks > Task hardware environment > Task-level monitoring** information by selecting **View Utilization**. * **Logs**: You can access logs by clicking the text under **Logs**. See **Core concepts > Tasks > Viewing logs**. The **Inputs**, **Outputs** tabs display the data that was passed into and out of the node, respectively. If this node is a task (as opposed to a subworkflow) then the **Task** tab displays the Task definition structure. ### Graph The Graph tab displays a visual representation of the execution as a directed acyclic graph: ![](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflow-executions/execution-view-graph.png) ### Timeline The Timeline tab displays a visualization showing the timing of each task in the execution: ![](../../../_static/images/user-guide/core-concepts/workflows/viewing-workflow-executions/execution-view-timeline.png) === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks === # Tasks Tasks are the fundamental units of compute in Union.ai. They are independently executable, strongly typed, and containerized building blocks that make up workflows. Workflows are constructed by chaining together tasks, with the output of one task feeding into the input of the next to form a directed acyclic graph. ## Tasks are independently executable Tasks are designed to be independently executable, meaning that they can be run in isolation from other tasks. And since most tasks are just Python functions, they can be executed on your local machine, making it easy to unit test and debug tasks locally before deploying them to Union.ai. Because they are independently executable, tasks can also be shared and reused across multiple workflows and, as long as their logic is deterministic, their input and outputs can be **Core concepts > Caching** to save compute resources and execution time. ## Tasks are strongly typed Tasks have strongly typed inputs and outputs, which are validated at deployment time. This helps catch bugs early and ensures that the data passing through tasks and workflows is compatible with the explicitly stated types. Under the hood, Union.ai uses the [Flyte type system]() and translates between the Flyte types and the Python types. Python type annotations make sure that the data passing through tasks and workflows is compatible with the explicitly stated types defined through a function signature. The Union.ai type system is also used for caching, data lineage tracking, and automatic serialization and deserialization of data as itโ€™s passed from one task to another. ## Tasks are containerized While (most) tasks are locally executable, when a task is deployed to Union.ai as part of the registration process it is containerized and run in its own independent Kubernetes pod. This allows tasks to have their own independent set of [software dependencies](./task-software-environment/_index) and [hardware requirements](./task-hardware-environment/_index). For example, a task that requires a GPU can be deployed to Union.ai with a GPU-enabled container image, while a task that requires a specific version of a software library can be deployed with that version of the library installed. ## Tasks are named, versioned, and immutable The fully qualified name of a task is a combination of its project, domain, and name. 
To update a task, you change it and re-register it under the same fully qualified name. This creates a new version of the task while the old version remains available. At the version level, tasks are therefore immutable. This immutability is important for ensuring that workflows are reproducible and that the data lineage is accurate.

## Tasks are (usually) deterministic and cacheable

When deciding if a unit of execution is suitable to be encapsulated as a task, consider the following questions:

* Is there a well-defined, graceful/successful exit criterion for the task?
  * A task is expected to exit after completion of input processing.
* Is it deterministic and repeatable?
  * Under certain circumstances, a task might be cached or rerun with the same inputs. It is expected to produce the same output every time. You should, for example, avoid using random number generators with the current clock as seed.
* Is it a pure function? That is, does it have side effects that are unknown to the system?
  * It is recommended to avoid side effects in tasks.
  * When side effects are unavoidable, ensure that the operations are idempotent.

For details on task caching, see **Core concepts > Caching**.

## Workflows can contain many types of tasks

One of the most powerful features of Union.ai is the ability to run widely differing computational workloads as tasks within a single workflow. Because of the way that Union.ai is architected, tasks within a single workflow can differ along many dimensions.

While the total number of ways that tasks can be configured is quite large, the options fall into three categories:

* **Task type**: These include standard Python tasks, map tasks, raw container tasks, and many specialized plugin tasks. For more information, see **Core concepts > Tasks > Other task types**.
* **Software environment**: Define the task container image, dependencies, and even programming language. For more information, see [Task software environment](./task-software-environment/_index).
* **Hardware environment**: Define the resource requirements (processor numbers, storage amounts) and machine node characteristics (CPU and GPU type). For more information, see [Task hardware environment](./task-hardware-environment/_index).

### Mix and match task characteristics

Along these three dimensions, you can mix and match characteristics to build a task definition that performs exactly the job you want, while still taking advantage of all the features provided at the workflow level like output caching, versioning, and reproducibility.

Tasks with diverse characteristics can be combined into a single workflow. For example, a workflow might contain:

* A **Python task running on your default container image** with default dependencies and a default resource and hardware profile.
* A **Python task running on a container image with additional dependencies** configured to run on machine nodes with a specific type of GPU.
* A **raw container task** running a Java process.
* A **plugin task** running a Spark job that spawns its own cluster-in-a-cluster.
* A **map task** that runs multiple copies of a Python task in parallel.

The ability to build workflows from such a wide variety of heterogeneous tasks makes Union.ai uniquely flexible.

> [!NOTE]
> Not all parameters are compatible. For example, with specialized plugin task types, some configurations are
> not available (this depends on task plugin details).
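To make the idea of mixing task characteristics concrete, here is a minimal sketch that combines a task using the default image, a task with its own image and resource requests, and a map task in one workflow. The image name, registry, and resource values are illustrative placeholders, not recommendations:

```python
import union

# Hypothetical image with an extra dependency for the scoring task.
scoring_image = union.ImageSpec(
    name="scoring-image",
    registry="ghcr.io/my-org",    # placeholder registry
    packages=["scikit-learn"],    # extra dependency not in the default image
)


@union.task
def fetch(n: int) -> list[int]:
    # Runs on the default container image with default resources.
    return list(range(n))


@union.task(
    container_image=scoring_image,
    requests=union.Resources(cpu="2", mem="4Gi"),
)
def score(x: int) -> float:
    # Runs on the custom image with its own resource profile.
    return x * 0.5


@union.workflow
def mixed_wf(n: int = 10) -> list[float]:
    data = fetch(n=n)
    # Run `score` as a map task across the list produced by `fetch`.
    return union.map(score)(x=data)
```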
## Task configuration The `@union.task` decorator can take a number of parameters that allow you to configure the task's behavior. For example, you can specify the task's software dependencies, hardware requirements, caching behavior, retry behavior, and more. For more information, see **Core concepts > Tasks > Task parameters**. ## Subpages - **Core concepts > Tasks > Map Tasks** - **Core concepts > Tasks > Other task types** - **Core concepts > Tasks > Task parameters** - **Core concepts > Tasks > Launching tasks** - **Core concepts > Tasks > Viewing tasks** - **Core concepts > Tasks > Task software environment** - **Core concepts > Tasks > Viewing logs** - **Core concepts > Tasks > Reference tasks** - **Core concepts > Tasks > Task hardware environment** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/map-tasks === ## Map tasks A map task allows you to execute many instances of a task within a single workflow node. This enables you to execute a task across a set of inputs without having to create a node for each input, resulting in significant performance improvements. Map tasks find application in various scenarios, including: * When multiple inputs require running through the same code logic. * Processing multiple data batches concurrently. Just like normal tasks, map tasks are automatically parallelized to the extent possible given resources available in the cluster. ```python THRESHOLD = 11 @union.task def detect_anomalies(data_point: int) -> bool: return data_point > THRESHOLD @union.workflow def map_workflow(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: # Use the map task to apply the anomaly detection function to each data point return union.map(detect_anomalies)(data_point=data) ``` > [!NOTE] > Map tasks can also map over launch plans. For more information and example code, see **Core concepts > Launch plans > Mapping over launch plans**. To customize resource allocations, such as memory usage for individual map tasks, you can leverage `with_overrides`. Hereโ€™s an example using the `detect_anomalies` map task within a workflow: ```python import union @union.workflow def map_workflow_with_resource_overrides( data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10] ) -> list[bool]: return ( union.map(detect_anomalies)(data_point=data) .with_overrides(requests=union.Resources(mem="2Gi")) ) ``` You can also configure `concurrency` and `min_success_ratio` for a map task: - `concurrency` limits the number of mapped tasks that can run in parallel to the specified batch size. If the input size exceeds the concurrency value, multiple batches will run serially until all inputs are processed. If left unspecified, it implies unbounded concurrency. - `min_success_ratio` determines the minimum fraction of total jobs that must complete successfully before terminating the map task and marking it as successful. ```python @union.workflow def map_workflow_with_additional_params( data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10] ) -> list[typing.Optional[bool]]: return union.map( detect_anomalies, concurrency=1, min_success_ratio=0.75 )(data_point=data) ``` For more details see [Map Task example](https://github.com/unionai-oss/union-cloud-docs-examples/tree/main/map_task) in the `unionai-examples` repository and [Map Tasks]() section. 
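One additional pattern worth knowing: if the mapped task takes more than one input, you can bind the fixed inputs with `functools.partial` and map over the remaining one. This is how partial binding works for Flyte map tasks, and it is assumed here that `union.map` behaves the same way; treat the snippet as a sketch rather than a definitive reference:

```python
import functools

import union


@union.task
def detect_anomalies_with_threshold(data_point: int, threshold: int) -> bool:
    return data_point > threshold


@union.workflow
def map_partial_wf(data: list[int] = [10, 12, 100, 11]) -> list[bool]:
    # Bind `threshold` so that only `data_point` is mapped over.
    bound_task = functools.partial(detect_anomalies_with_threshold, threshold=11)
    return union.map(bound_task)(data_point=data)
```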
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-types ===

# Other task types

Task types include:

* **`PythonFunctionTask`**: This Python class represents the standard default task. It is the type that is created when you use the `@union.task` decorator.
* **`ContainerTask`**: This Python class represents a raw container. It allows you to install any image you like, giving you complete control of the task.
* **Shell tasks**: Use them to execute `bash` scripts within Union.ai.
* **Specialized plugin tasks**: These include both specialized classes and specialized configurations of the `PythonFunctionTask`. They implement integrations with third-party systems.

## PythonFunctionTask

This is the task type that is created when you add the `@union.task` decorator to a Python function. It represents a Python function that will be run within a single container. For example:

```python
import pandas as pd
import union
from sklearn.datasets import load_wine


@union.task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame
```

See the [Python Function Task example](https://github.com/unionai-oss/union-cloud-docs-examples/tree/main/python_function_task).

This is the most common task variant and the one that, thus far, we have focused on in this documentation.

## ContainerTask

This task variant represents a raw container, with no assumptions made about what is running within it. Here is an example of declaring a `ContainerTask`:

```python
from flytekit import ContainerTask, kwtypes

greeting_task = ContainerTask(
    name="echo_and_return_greeting",
    image="alpine:latest",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(name=str),
    outputs=kwtypes(greeting=str),
    command=["/bin/sh", "-c", "echo 'Hello, my name is {{.inputs.name}}.' | tee -a /var/outputs/greeting"],
)
```

The `ContainerTask` enables you to include a task in your workflow that executes arbitrary code in any language, not just Python. In the following example, the tasks calculate the area of an ellipse. The key `ContainerTask` parameters are:

* `name`: The name of the task. It has to be unique within the entire project.
* `input_data_dir`: The directory where inputs will be written to.
* `output_data_dir`: The directory where Union.ai will expect the outputs to exist.
* `inputs` and `outputs`: Specify the interface of the task; each should be an ordered dictionary of typed input and output variables.
* `image`: The container image for the task, either an image name or an `ImageSpec`. To access a file that is not included in the image, use `ImageSpec` to copy files or directories into the container's `/root` directory.
* `metadata`: Caching can be enabled in a `ContainerTask` by configuring the cache settings in a `TaskMetadata` object passed to this parameter.
```python calculate_ellipse_area_haskell = ContainerTask( name="ellipse-area-metadata-haskell", input_data_dir="/var/inputs", output_data_dir="/var/outputs", inputs=kwtypes(a=float, b=float), outputs=kwtypes(area=float, metadata=str), image="ghcr.io/flyteorg/rawcontainers-haskell:v2", command=[ "./calculate-ellipse-area", "{{.inputs.a}}", "{{.inputs.b}}", "/var/outputs", ], metadata=TaskMetadata(cache=True, cache_version="1.0"), ) calculate_ellipse_area_julia = ContainerTask( name="ellipse-area-metadata-julia", input_data_dir="/var/inputs", output_data_dir="/var/outputs", inputs=kwtypes(a=float, b=float), outputs=kwtypes(area=float, metadata=str), image="ghcr.io/flyteorg/rawcontainers-julia:v2", command=[ "julia", "calculate-ellipse-area.jl", "{{.inputs.a}}", "{{.inputs.b}}", "/var/outputs", ], metadata=TaskMetadata(cache=True, cache_version="1.0"), ) @workflow def wf(a: float, b: float): area_haskell, metadata_haskell = calculate_ellipse_area_haskell(a=a, b=b) area_julia, metadata_julia = calculate_ellipse_area_julia(a=a, b=b) ``` See the [Container Task example](https://github.com/unionai-oss/union-cloud-docs-examples/tree/main/container_task). ## Shell tasks Shell tasks enable the execution of shell scripts within Union.ai. To create a shell task, provide a name for it, specify the bash script to be executed, and define inputs and outputs if needed: ### Example ```python from pathlib import Path from typing import Tuple import union from flytekit import kwtypes from flytekit.extras.tasks.shell import OutputLocation, ShellTask t1 = ShellTask( name="task_1", debug=True, script=""" set -ex echo "Hey there! Let's run some bash scripts using a shell task." echo "Showcasing shell tasks." >> {inputs.x} if grep "shell" {inputs.x} then echo "Found it!" >> {inputs.x} else echo "Not found!" fi """, inputs=kwtypes(x=FlyteFile), output_locs=[OutputLocation(var="i", var_type=FlyteFile, location="{inputs.x}")], ) t2 = ShellTask( name="task_2", debug=True, script=""" set -ex cp {inputs.x} {inputs.y} tar -zcvf {outputs.j} {inputs.y} """, inputs=kwtypes(x=FlyteFile, y=FlyteDirectory), output_locs=[OutputLocation(var="j", var_type=FlyteFile, location="{inputs.y}.tar.gz")], ) t3 = ShellTask( name="task_3", debug=True, script=""" set -ex tar -zxvf {inputs.z} cat {inputs.y}/$(basename {inputs.x}) | wc -m > {outputs.k} """, inputs=kwtypes(x=FlyteFile, y=FlyteDirectory, z=FlyteFile), output_locs=[OutputLocation(var="k", var_type=FlyteFile, location="output.txt")], ) ``` Here's a breakdown of the parameters of the `ShellTask`: - The `inputs` parameter allows you to specify the types of inputs that the task will accept - The `output_locs` parameter is used to define the output locations, which can be `FlyteFile` or `FlyteDirectory` - The `script` parameter contains the actual bash script that will be executed (`{inputs.x}`, `{outputs.j}`, etc. will be replaced with the actual input and output values). - The `debug` parameter is helpful for debugging purposes We define a task to instantiate `FlyteFile` and `FlyteDirectory`. 
A `.gitkeep` file is created in the `FlyteDirectory` as a placeholder to ensure the directory exists: ```python @union.task def create_entities() -> Tuple[union.FlyteFile, union.FlyteDirectory]: working_dir = Path(union.current_context().working_directory) flytefile = working_dir / "test.txt" flytefile.touch() flytedir = working_dir / "testdata" flytedir.mkdir(exist_ok=True) flytedir_file = flytedir / ".gitkeep" flytedir_file.touch() return flytefile, flytedir ``` We create a workflow to define the dependencies between the tasks: ```python @union.workflow def shell_task_wf() -> union.FlyteFile: x, y = create_entities() t1_out = t1(x=x) t2_out = t2(x=t1_out, y=y) t3_out = t3(x=x, y=y, z=t2_out) return t3_out ``` You can run the workflow locally: ```python if __name__ == "__main__": print(f"Running shell_task_wf() {shell_task_wf()}") ``` ## Specialized plugin task classes and configs Union.ai supports a wide variety of plugin tasks. Some of these are enabled as specialized task classes, others as specialized configurations of the default `@union.task` (`PythonFunctionTask`). They enable things like: * Querying external databases (AWS Athena, BigQuery, DuckDB, SQL, Snowflake, Hive). * Executing specialized processing right in Union.ai (Spark in virtual cluster, Dask in Virtual cluster, Sagemaker, Airflow, Modin, Ray, MPI and Horovod). * Handing off processing to external services(AWS Batch, Spark on Databricks, Ray on external cluster). * Data transformation (Great Expectations, DBT, Dolt, ONNX, Pandera). * Data tracking and presentation (MLFlow, Papermill). See the [Integration section]() for examples. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-parameters === # Task parameters You pass the following parameters to the `@union.task` decorator: * `accelerator`: The accelerator to use for this task. For more information, see [Specifying accelerators](). * `cache`: See **Core concepts > Caching**. * `cache_serialize`: See **Core concepts > Caching**. * `cache_version`: See **Core concepts > Caching**. * `cache_ignore_input_vars`: Input variables that should not be included when calculating the hash for the cache. * `container_image`: See **Core concepts > ImageSpec**. * `deprecated`: A string that can be used to provide a warning message for deprecated task. The absence of a string, or an empty string, indicates that the task is active and not deprecated. * `docs`: Documentation about this task. * `enable_deck`: If true, this task will output a Deck which can be used to visualize the task execution. See **Development cycle > Decks**. ```python @union.task(enable_deck=True) def my_task(my_str: str): print("hello {my_str}") ``` * `environment`: See **Core concepts > Tasks > Task software environment > Environment variables**. * `interruptible`: See **Core concepts > Tasks > Task hardware environment > Interruptible instances**. * `limits`: See **Core concepts > Tasks > Task hardware environment > Customizing task resources**. * `node_dependency_hints`: A list of tasks, launch plans, or workflows that this task depends on. This is only for dynamic tasks/workflows, where Union.ai cannot automatically determine the dependencies prior to runtime. Even on dynamic tasks this is optional, but in some scenarios it will make registering the workflow easier, because it allows registration to be done the same as for static tasks/workflows. For example this is useful to run launch plans dynamically, because launch plans must be registered before they can be run. 
Tasks and workflows do not have this requirement.

```python
import union
from flytekit import LaunchPlan


@union.workflow
def workflow0():
    ...


launchplan0 = LaunchPlan.get_or_create(workflow0)


# Specify node_dependency_hints so that launchplan0
# will be registered on flyteadmin, despite this being a dynamic task.
@union.dynamic(node_dependency_hints=[launchplan0])
def launch_dynamically():
    # To run a sub-launchplan it must have previously been registered on flyteadmin.
    return [launchplan0] * 10
```

* `pod_template`: See **Core concepts > Tasks > Task hardware environment**.
* `pod_template_name`: See **Core concepts > Tasks > Task hardware environment**.
* `requests`: See **Core concepts > Tasks > Task hardware environment > Customizing task resources**.
* `retries`: Number of times to retry this task during a workflow execution. Tasks can define a retry strategy to let the system know how to handle failures (for example, retry 3 times on any kind of error). For more information, see **Core concepts > Tasks > Task hardware environment > Interruptible instances**. There are two kinds of retries: *system retries* and *user retries*.
* `secret_requests`: See **Development cycle > Managing secrets**.
* `task_config`: Configuration for a specific task type. See the [Union.ai Connectors documentation](../../integrations/connectors) and [Union.ai plugins documentation]() for the right object to use.
* `task_resolver`: Provide a custom task resolver.
* `timeout`: The maximum amount of time for which a single execution of this task may run. The execution will be terminated if the runtime exceeds the given timeout (approximately). To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as `failure`. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the [TaskMetadata]().

## Use `partial` to provide default arguments to tasks

You can use the `functools.partial` function to assign default or constant values to the parameters of your tasks:

```python
import functools

import union


@union.task
def slope(x: list[int], y: list[int]) -> float:
    sum_xy = sum([x[i] * y[i] for i in range(len(x))])
    sum_x_squared = sum([x[i] ** 2 for i in range(len(x))])
    n = len(x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)


@union.workflow
def simple_wf_with_partial(x: list[int], y: list[int]) -> float:
    partial_task = functools.partial(slope, x=x)
    return partial_task(y=y)
```

## Named outputs

By default, Union.ai employs a standardized convention to assign names to the outputs of tasks or workflows. Each output is sequentially labeled `o0`, `o1`, `o2`, and so on, where `o` serves as the standard prefix and the number indicates the positional index within the returned values.

However, Union.ai allows the customization of output names for tasks or workflows. This customization becomes beneficial when you're returning multiple outputs and you wish to assign a distinct name to each of them.

The following example illustrates the process of assigning names to outputs for both a task and a workflow.
Define a `NamedTuple` and assign it as an output to a task: ```python import union from typing import NamedTuple slope_value = NamedTuple("slope_value", [("slope", float)]) @union.task def slope(x: list[int], y: list[int]) -> slope_value: sum_xy = sum([x[i] * y[i] for i in range(len(x))]) sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) n = len(x) return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) ``` Likewise, assign a `NamedTuple` to the output of `intercept` task: ```python intercept_value = NamedTuple("intercept_value", [("intercept", float)]) @union.task def intercept(x: list[int], y: list[int], slope: float) -> intercept_value: mean_x = sum(x) / len(x) mean_y = sum(y) / len(y) intercept = mean_y - slope * mean_x return intercept ``` > [!NOTE] > While it's possible to create `NamedTuple`s directly within the code, > it's often better to declare them explicitly. This helps prevent potential linting errors in tools like mypy. > > ```python > def slope() -> NamedTuple("slope_value", slope=float): > pass > ``` You can easily unpack the `NamedTuple` outputs directly within a workflow. Additionally, you can also have the workflow return a `NamedTuple` as an output. > [!NOTE] > Remember that we are extracting individual task execution outputs by dereferencing them. > This is necessary because `NamedTuple`s function as tuples and require this dereferencing: ```python slope_and_intercept_values = NamedTuple("slope_and_intercept_values", [("slope", float), ("intercept", float)]) @union.workflow def simple_wf_with_named_outputs(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> slope_and_intercept_values: slope_value = slope(x=x, y=y) intercept_value = intercept(x=x, y=y, slope=slope_value.slope) return slope_and_intercept_values(slope=slope_value.slope, intercept=intercept_value.intercept) ``` You can run the workflow locally as follows: ```python if __name__ == "__main__": print(f"Running simple_wf_with_named_outputs() {simple_wf_with_named_outputs()}") ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/launching-tasks === # Launching tasks From the **Core concepts > Tasks > Viewing tasks > Task view** (accessed, for example, by selecting a task in the **Core concepts > Tasks > Viewing tasks > Tasks list**) you can select **Launch Task** in the top right: This opens the **New Execution** dialog for tasks: ![](../../../_static/images/user-guide/core-concepts/tasks/launching-tasks/new-execution-dialog.png) The settings are similar to those for workflows. At the top you can select: * The specific version of this task that you want to launch. Along the left side the following sections are available: * **Inputs**: The input parameters of the task function appear here as fields to be filled in. * **Settings**: * **Execution name**: A custom name for this execution. If not specified, a name will be generated. * **Overwrite cached outputs**: A boolean. If set to `True`, this execution will overwrite any previously-computed cached outputs. * **Raw output data config**: Remote path prefix to store raw output data. By default, workflow output will be written to the built-in metadata storage. Alternatively, you can specify a custom location for output at the organization, project-domain, or individual execution levels. This field is for specifying this setting at the workflow execution level. If this field is filled in it overrides any settings at higher levels. 
The parameter is expected to be a URL to a writable resource (for example, `http://s3.amazonaws.com/my-bucket/`). See **Data input/output > Task input and output > Raw data store** **Max parallelism**: Number of workflow nodes that can be executed in parallel. If not specified, project/domain defaults are used. If 0 then no limit is applied. * **Force interruptible**: A three valued setting for overriding the interruptible setting of the workflow for this particular execution. If not set, the workflow's interruptible setting is used. If set and **enabled** then `interruptible=True` is used for this execution. If set and **disabled** then `interruptible=False` is used for this execution. See **Core concepts > Tasks > Task hardware environment > Interruptible instances** * **Service account**: The service account to use for this execution. If not specified, the default is used. * **Environment variables**: Environment variables that will be available to tasks in this workflow execution. * **Labels**: Labels to apply to the execution resource. * **Notifications**: **Core concepts > Launch plans > Notifications** configured for this workflow execution. * **Debug**: The workflow execution details for debugging purposes. Select **Launch** to launch the task execution. This will take you to the **Core concepts > Workflows > Viewing workflow executions**. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/viewing-tasks === # Viewing tasks ## Tasks list Selecting **Tasks** in the sidebar displays a list of all the registered tasks: ![Tasks list](../../../_static/images/user-guide/core-concepts/tasks/viewing-tasks/tasks-list.png) You can search the tasks by name and filter for only those that are archived. Each task in the list displays some basic information about the task: * **Inputs**: The input type for the task. * **Outputs**: The output type for the task. * **Description**: A description of the task. Select an entry on the list to go to that **Core concepts > Tasks > Viewing tasks > Task view**. ## Task view Selecting an individual task from the **Core concepts > Tasks > Viewing tasks > Tasks list** will take you to the task view: ![Task view](../../../_static/images/user-guide/core-concepts/tasks/viewing-tasks/task-view.png) Here you can see: * **Inputs & Outputs**: The input and output types for the task. * Recent task versions. Selecting one of these takes you to the **Core concepts > Tasks > Viewing tasks > Task view > Task versions list** * Recent executions of this task. Selecting one of these takes you to the **Core concepts > Workflows > Viewing workflow executions**. ### Task versions list The task versions list give you detailed information about a specific version of a task: ![Task versions list](../../../_static/images/user-guide/core-concepts/tasks/viewing-tasks/task-versions-list.png) * **Image**: The Docker image used to run this task. * **Env Vars**: The environment variables used by this task. * **Commands**: The JSON object defining this task. At the bottom is a list of all versions of the task with the current one selected. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment === # Task software environment The @union.task decorator provides the following parameters to specify the software environment in which a task runs: * `container_image`: Can be either a string referencing a specific image on a container repository, or an ImageSpec defining a build. 
See **Core concepts > Tasks > Task software environment > Local image building** for details. * `environment`: See **Core concepts > Tasks > Task software environment > Environment variables** for details. ## Subpages - **Core concepts > Tasks > Task software environment > Local image building** - **Core concepts > Tasks > Task software environment > ImageSpec with ECR** - **Core concepts > Tasks > Task software environment > ImageSpec with GAR** - **Core concepts > Tasks > Task software environment > ImageSpec with ACR** - **Core concepts > Tasks > Task software environment > Environment variables** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment/image-spec === # Local image building With Union.ai, every task in a workflow runs within its own dedicated container. Since a container requires a container image to run, every task in Union.ai must have a container image associated with it. You can specify the container image to be used by a task by defining an `ImageSpec` object and passing it to the `container_image` parameter of the `@union.task` decorator. When you register the workflow, the container image is built locally and pushed to the container registry that you specify. When the workflow is executed, the container image is pulled from that registry and used to run the task. > [!NOTE] > See the [ImageSpec API documentation]() for full documentation of `ImageSpec` class parameters and methods. To illustrate the process, we will walk through an example. ## Project structure ```shell โ”œโ”€โ”€ requirements.txt โ””โ”€โ”€ workflows โ”œโ”€โ”€ __init__.py โ””โ”€โ”€ imagespec-simple-example.py ``` ### requirements.txt ```shell union pandas ``` ### imagespec-simple-example.py ```python import typing import pandas as pd import union image_spec = union.ImageSpec( registry="ghcr.io/", name="simple-example-image", base_image="ghcr.io/flyteorg/flytekit:py3.11-latest", requirements="requirements.txt" ) @union.task(container_image=image_spec) def get_pandas_dataframe() -> typing.Tuple[pd.DataFrame, pd.Series]: df = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/data/heart.csv") print(df.head()) return df[["age", "thalach", "trestbps", "chol", "oldpeak"]], df.pop("target") @union.workflow() def wf() -> typing.Tuple[pd.DataFrame, pd.Series]: return get_pandas_dataframe() ``` ## Install and configure `union` and Docker To install Docker, see **Getting started > Local setup > Install Docker and get access to a container registry**. To configure `union` to connect to your Union.ai instance, see [Getting started](../../../getting-started/_index). ## Set up an image registry You will need an image registry where the container image can be stored and pulled by Union.ai when the task is executed. You can use any image registry that you have access to, including public registries like Docker Hub or GitHub Container Registry. Alternatively, you can use a registry that is part of your organization's infrastructure such as AWS Elastic Container Registry (ECR) or Google Artifact Registry (GAR). The registry that you choose must be one that is accessible to the Union.ai instance where the workflow will be executed. Additionally, you will need to ensure that the specific image, once pushed to the registry, is itself publicly accessible. In this example, we use GitHub's `ghcr.io` container registry. 
See [Working with the Container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry) for more information. * For an example using Amazon ECR see **Core concepts > Tasks > Task software environment > ImageSpec with ECR**. * For an example using Google Artifact Registry see **Core concepts > Tasks > Task software environment > ImageSpec with GAR**. * For an example using Azure Container Registry see **Core concepts > Tasks > Task software environment > ImageSpec with ACR**. ## Authenticate to the registry You will need to set up your local Docker client to authenticate with GHCR. This is needed for `union` CLI to be able to push the image built according to the `ImageSpec` to GHCR. Follow the directions [Working with the Container registry > Authenticating to the Container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry). ## Set up your project and domain on Union.ai You will need to set up a project on your Union.ai instance to which you can register your workflow. See **Development cycle > Setting up a production project**. ## Understand the requirements The `requirements.txt` file contains the `union` package and the `pandas` package, both of which are needed by the task. ## Set up a virtual Python environment Set up a virtual Python environment and install the dependencies defined in the `requirements.txt` file. Assuming you are in the local project root, run `pip install -r requirements.txt`. ## Run the workflow locally You can now run the workflow locally. In the project root directory, run: `union run workflows/imagespec-simple-example.py wf`. See **Development cycle > Running your code** for more details. > [!NOTE] > When you run the workflow in your local Python environment, the image is not built or pushed (in fact, no container image is used at all). ## Register the workflow To register the workflow to Union.ai, in the local project root, run: ```shell $ union register workflows/imagespec-simple-example.py ``` `union` will build the container image and push it to the registry that you specified in the `ImageSpec` object. It will then register the workflow to Union.ai. To see the registered workflow, go to the UI and navigate to the project and domain that you created above. ## Ensure that the image is publicly accessible If you are using the `ghcr.io` image registry, you must switch the visibility of your container image to Public before you can run your workflow on Union.ai. See [Configuring a package's access control and visibility](https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility#about-inheritance-of-access-permissions-and-visibility). ## Run the workflow on Union.ai Assuming your image is publicly accessible, you can now run the workflow on Union.ai by clicking **Launch Workflow**. > [!WARNING] Make sure your image is accessible > If you try to run a workflow that uses a private container image or an image that is inaccessible for some other reason, the system will return an error: > > ``` > ... Failed to pull image ... > ... Error: ErrImagePull > ... Back-off pulling image ... > ... Error: ImagePullBackOff > ``` ## Multi-image workflows You can also specify different images per task within the same workflow. 
This is particularly useful if some tasks in your workflow have a different set of dependencies where most of the other tasks can use another image. In this example we specify two tasks: one that uses CPUs and another that uses GPUs. For the former task, we use the default image that ships with union while for the latter task, we specify a pre-built image that enables distributed training with the Kubeflow Pytorch integration. ```python import numpy as np import torch.nn as nn @task( requests=Resources(cpu="2", mem="16Gi"), container_image="ghcr.io/flyteorg/flytekit:py3.9-latest", ) def get_data() -> Tuple[np.ndarray, np.ndarray]: ... # get dataset as numpy ndarrays @task( requests=Resources(cpu="4", gpu="1", mem="16Gi"), container_image="ghcr.io/flyteorg/flytecookbook:kfpytorch-latest", ) def train_model(features: np.ndarray, target: np.ndarray) -> nn.Module: ... # train a model using gpus ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment/image-spec-with-ecr === # ImageSpec with ECR In this section we explain how to set up and use AWS Elastic Container Registry (ECR) to build and deploy task container images using `ImageSpec`. ## Prerequisites If you are using ECR in the same AWS account as your Union.ai data plane, then you do not need to configure anything. Access to ECR in the same account is enabled by default. If you want to store your task container images in an ECR instance in an AWS account _other than the one that holds your data plane_, then you will have to configure that ECR instance to permit access from your data plane. See **Enabling AWS resources > Enabling AWS ECR** for details. ## Set up the image repository Unlike GitHub Container Registry, ECR does not allow you to simply push an arbitrarily named image to the registry. Instead, you must first create a repository in the ECR instance and then push the image to that repository. > [!NOTE] Registry, repository, and image > In ECR terminology the **registry** is the top-level storage service. The registry holds a collection of **repositories**. > Each repository corresponds to a named image and holds all versions of that image. > > When you push an image to a registry, you are actually pushing it to a repository within that registry. > Strictly speaking, the term *image* refers to a specific *image version* within that repository. This means that you have to decide on the name of your image and create a repository by that name first, before registering your workflow. We will assume the following: * The ECR instance you will be using has the base URL `123456789012.dkr.ecr.us-east-1.amazonaws.com`. * Your image will be called `simple-example-image`. In the AWS console, go to **Amazon ECR > Repositories** and find the correct ECR registry If you are in the same account as your Union.ai data plane you should go directly to the ECR registry that was set up for you by Union.ai. If there are multiple ECR registries present, consult with your Union.ai administrator to find out which one to use. Under **Create a Repository**, click **Get Started**: ![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-ecr/create-repository-1.png) On the **Create repository** page: * Select **Private** for the repository visibility, assuming you want to make it private. You can, alternatively, select **Public**, but in most cases, the main reason for using ECR is to keep your images private. 
* Enter the name of the repository:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-ecr/create-repository-2.png)

and then scroll down to click **Create repository**:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-ecr/create-repository-3.png)

Your repository is now created.

## Authenticate to the registry

You will need to set up your local Docker client to authenticate with ECR. This is needed for `union` to be able to push the image built according to the `ImageSpec` to ECR.

To do this, you will need to [install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html), use it to run the `aws ecr get-login-password` command to get the appropriate password, then perform a `docker login` with that password. See [Private registry authentication](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html) for details.

## Register your workflow to Union.ai

You can register tasks with `ImageSpec` declarations that reference this repository. For example, to use the example repository shown here, we would alter the Python code in **Core concepts > Tasks > Task software environment** to use the following `ImageSpec` declaration:

```python
image_spec = union.ImageSpec(
    registry="123456789012.dkr.ecr.us-east-1.amazonaws.com",
    name="simple-example-image",
    base_image="ghcr.io/flyteorg/flytekit:py3.11-latest",
    requirements="image-requirements.txt"
)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment/image-spec-with-gar ===

# ImageSpec with GAR

In this section we explain how to set up and use Google Artifact Registry (GAR) to build and deploy task container images using `ImageSpec`.

## Prerequisites

If you are using GAR in the same Google Cloud Platform (GCP) project as your Union.ai data plane, then you do not need to configure anything. Access to GAR in the same project is enabled by default.

If you want to store your task container images in a GAR repository in a GCP project _other than the one that holds your data plane_, you must enable the node pool of your data plane to access that GAR. See **Enabling GCP resources > Enabling Google Artifact Registry** for details.

## Set up the image repository

Unlike GitHub Container Registry, GAR does not allow you to simply push an arbitrarily named image to the registry. Instead, you must first create a repository in the GAR instance and then push the image to that repository.

> [!NOTE] Registry, repository, and image
> In GAR terminology the **registry** is the top-level storage service. The registry holds a collection of **repositories**.
> Each repository in turn holds some number of images, and each specific image name can have different versions.
>
> Note that this differs from the arrangement in AWS ECR where the repository name and image name are essentially the same.
>
> When you push an image to GAR, you are actually pushing it to an image name within a repository within that registry.
> Strictly speaking, the term *image* refers to a specific *image version* within that repository.

This means that you have to decide on the name of your repository and create it, before registering your workflow. You can, however, decide on the image name later, when you push the image to the repository.

We will assume the following:

* The GAR instance you will be using has the base URL `us-east1-docker.pkg.dev/my-union-dataplane/my-registry/`.
* Your repository will be called `my-image-repository`. * Your image will be called `simple-example-image`. In the GCP console, within your Union.ai data plane project, go to **Artifact Registry**. You should see a list of repositories. The existing ones are used internally by Union.ai. For your own work you should create a new one. Click **Create Repository**: ![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-gar/gar-create-repository-1.png) On the **Create repository** page, * Enter the name of the repository. In this example it would be `my-image-repository`. * Select **Docker** for the artifact type. * Select the region. If you want to access the GAR without further configuration, make sure this the same region as your Union.ai data plane. * Click **Create**: ![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-gar/gar-create-repository-2.png) Your GAR repository is now created. ## Authenticate to the registry You will need to set up your local Docker client to authenticate with GAR. This is needed for `union` to be able to push the image built according to the `ImageSpec` to GAR. Directions can be found in the GAR console interface. Click on **Setup Instructions**: ![](../../../../_static/images/user-guide/core-concepts/tasks/task-software-environment/imagespec-with-gar/gar-setup-instructions.png) The directions are also reproduced below. (We show the directions for the `us-east1` region. You may need to adjust the command accordingly): > [!NOTE] Setup Instructions > Follow the steps below to configure your client to push and pull packages using this repository. > You can also [view more detailed instructions here](https://cloud.google.com/artifact-registry/docs/docker/authentication?authuser=1). > For more information about working with artifacts in this repository, see the [documentation](https://cloud.google.com/artifact-registry/docs/docker?authuser=1). > > **Initialize gcloud** > > The [Google Cloud SDK](https://cloud.google.com/sdk/docs/?authuser=1) is used to generate an access token when authenticating with Artifact Registry. > Make sure that it is installed and initialized with [Application Default Credentials](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login?authuser=1) before proceeding. > > **Configure Docker** > > Run the following command to configure `gcloud` as the credential helper for the Artifact Registry domain associated with this repository's location: > > ```shell > $ gcloud auth configure-docker us-east1-docker.pkg.dev > ``` ## Register your workflow to Union.ai You can now register tasks with `ImageSpec` declarations that reference this repository. 
For example, to use the example GAR repository shown here, we would alter the Python code in **Core concepts > Tasks > Task software environment** to use the following `ImageSpec` declaration:

```python
image_spec = union.ImageSpec(
    registry="us-east1-docker.pkg.dev/my-union-dataplane/my-registry/my-image-repository",
    name="simple-example-image",
    base_image="ghcr.io/flyteorg/flytekit:py3.11-latest",
    requirements="image-requirements.txt"
)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment/image-spec-with-acr ===

# ImageSpec with ACR

In this section we explain how to use [Azure Container Registry (ACR)](https://azure.microsoft.com/en-us/products/container-registry) to build and deploy task container images using `ImageSpec`.

Before proceeding, make sure that you have [enabled Azure Container Registry](../../../integrations/enabling-azure-resources/enabling-azure-container-registry) for your Union.ai installation.

## Authenticate to the registry

Authenticate with the container registry:

```bash
az login
az acr login --name <registry-name>
```

Refer to [Individual login with Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#individual-login-with-microsoft-entra-id) in the Azure documentation for additional details.

## Register your workflow to Union.ai

You can now register tasks with `ImageSpec` declarations that reference this repository. For example, to use an existing ACR repository, we would alter the Python code in **Core concepts > Tasks > Task software environment** to use the following `ImageSpec` declaration:

```python
image_spec = union.ImageSpec(
    registry="<registry-name>.azurecr.io",
    name="my-repository/simple-example-image",
    base_image="ghcr.io/flyteorg/flytekit:py3.11-latest",
    requirements="image-requirements.txt"
)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-software-environment/environment-variables ===

# Environment variables

The `environment` parameter lets you specify the values of any variables that you want to be present within the task container execution environment. For example:

```python
import os

import union


@union.task(environment={"MY_ENV_VAR": "my_value"})
def my_task() -> str:
    return os.environ["MY_ENV_VAR"]
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/viewing-logs ===

# Viewing logs

In the **Core concepts > Workflows > Viewing workflow executions**, selecting a task from the list in the **Nodes** tab will open the task details in the right panel.

## Cloud provider logs

In addition to the **Task Logs** link, you will also see a link to your cloud provider's logs (**Cloudwatch Logs** for AWS, **Stackdriver Logs** for GCP, and **Azure Logs** for Azure):

![Cloud provider logs link](../../../_static/images/user-guide/core-concepts/tasks/viewing-logs/cloud-provider-logs-link.png)

Assuming you are logged into your cloud provider account with the appropriate permissions, this link will take you to the logs specific to the container in which this particular task execution is running.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/reference-tasks ===

# Reference tasks

A `reference_task` references tasks that have already been defined, serialized, and registered. You can reference tasks from other projects and create workflows that use tasks declared by others.
These tasks can live in their own containers, use different Python runtimes and flytekit versions, and even be written in different languages.

> [!NOTE]
> Reference tasks cannot be run locally. To test locally, mock them out.

## Example

1. Create a file called `task.py` and insert this content into it:

    ```python
    import union


    @union.task
    def add_two_numbers(a: int, b: int) -> int:
        return a + b
    ```

2. Register the task:

    ```shell
    $ union register --project flytesnacks --domain development --version v1 task.py
    ```

3. Create a separate file `wf_ref_task.py` and copy the following code into it:

    ```python
    import union
    from flytekit import reference_task


    @reference_task(
        project="flytesnacks",
        domain="development",
        name="task.add_two_numbers",
        version="v1",
    )
    def add_two_numbers(a: int, b: int) -> int:
        ...


    @union.workflow
    def wf(a: int, b: int) -> int:
        return add_two_numbers(a=a, b=b)
    ```

4. Register the `wf` workflow:

    ```shell
    $ union register --project flytesnacks --domain development wf_ref_task.py
    ```

5. In the Union.ai UI, run the workflow `wf_ref_task.wf`.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment ===

# Task hardware environment

## Customizing task resources

You can customize the hardware environment in which your task code executes. Depending on your needs, there are two different ways to define and register tasks with their own custom hardware requirements:

* Configuration in the `@union.task` decorator
* Defining a `PodTemplate`

### Using the `@union.task` decorator

You can specify `requests` and `limits` on:

* CPU number
* GPU number
* Memory size
* Ephemeral storage size

See **Core concepts > Tasks > Task hardware environment > Customizing task resources** for details.

### Using PodTemplate

If your needs are more complex, you can use Kubernetes-level configuration to constrain a task to only run on a specific machine type.

This requires that you coordinate with Union.ai to set up the required machine types and node groups with the appropriate node assignment configuration (node selector labels, node affinities, taints, tolerations, etc.).

In your task definition you then use a `PodTemplate` that uses the matching node assignment configuration to make sure that the task will only be scheduled on the appropriate machine type.

### `pod_template` and `pod_template_name` `@union.task` parameters

The `pod_template` parameter can be used to supply a custom Kubernetes `PodTemplate` to the task. This can be used to define details about node selectors, affinity, tolerations, and other Kubernetes-specific settings (see the sketch below).

The `pod_template_name` is a related parameter that can be used to specify the name of an already existing `PodTemplate` resource which will be used in this task.

For details see [Configuring task pods with Kubernetes PodTemplates]().

## Accelerators

If you specify GPUs, you can also specify the type of GPU to be used by setting the `accelerator` parameter. See **Core concepts > Tasks > Task hardware environment > Accelerators** for more information.

## Task-level monitoring

You can also monitor the hardware resources used by a task. See **Core concepts > Tasks > Task hardware environment > Task-level monitoring** for details.
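As a rough illustration of the `pod_template` parameter described above, the following sketch attaches a node selector and a toleration to a task. It assumes the `kubernetes` Python client is installed, and the node label and toleration key are placeholders that would need to match how your cluster's node groups are actually configured:

```python
import union
from flytekit import PodTemplate
from kubernetes.client import V1PodSpec, V1Toleration

gpu_pod_template = PodTemplate(
    pod_spec=V1PodSpec(
        containers=[],  # the primary task container is injected by the platform
        node_selector={"node-group": "gpu-nodes"},  # placeholder node label
        tolerations=[
            V1Toleration(
                key="nvidia.com/gpu",  # placeholder taint key
                operator="Exists",
                effect="NoSchedule",
            ),
        ],
    ),
)


@union.task(pod_template=gpu_pod_template)
def train_on_gpu_nodes() -> None:
    ...
```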
## Subpages

- **Core concepts > Tasks > Task hardware environment > Customizing task resources**
- **Core concepts > Tasks > Task hardware environment > Accelerators**
- **Core concepts > Tasks > Task hardware environment > Retries and timeouts**
- **Core concepts > Tasks > Task hardware environment > Interruptible instances**
- **Core concepts > Tasks > Task hardware environment > Task-level monitoring**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment/customizing-task-resources ===

# Customizing task resources

When defining a task function, you can specify resource requirements for the pod that runs the task. Union.ai will take this into account to ensure that the task pod is scheduled to run on a Kubernetes node that meets the specified resource profile.

Resources are specified in the `@union.task` decorator. Here is an example:

```python
import union
from flytekit import Resources
from flytekit.extras.accelerators import GPUAccelerator


@union.task(
    requests=Resources(mem="120Gi", cpu="44", ephemeral_storage="100Gi"),
    limits=Resources(mem="200Gi", cpu="100", gpu="12", ephemeral_storage="200Gi"),
    accelerator=GPUAccelerator("nvidia-tesla-a100")
)
def my_task():
    ...
```

There are three separate resource-related settings:

* `requests`
* `limits`
* `accelerator`

## The `requests` and `limits` settings

The `requests` and `limits` settings each take a `Resources` object, which itself has four possible attributes:

* `cpu`: Number of CPU cores (in whole numbers or millicores (`m`)).
* `gpu`: Number of GPU cores (in whole numbers or millicores (`m`)).
* `mem`: Main memory (in `Mi`, `Gi`, etc.).
* `ephemeral_storage`: Ephemeral storage (in `Mi`, `Gi`, etc.).

Note that CPU and GPU allocations can be specified either as whole numbers or in millicores (`m`). For example, `cpu="2500m"` means two and a half CPU cores, and `gpu="3000m"` means three GPU cores.

The type of ephemeral storage used depends on the node type and configuration you request from the Union.ai team. By default, all nodes will use network-attached storage for ephemeral storage. However, if a node type has attached NVMe SSD storage, you can request that the Union.ai team configure your cluster to use the attached NVMe as ephemeral storage for that node type.

The `requests` setting tells the system that the task requires _at least_ the resources specified and therefore the pod running this task should be scheduled only on a node that meets or exceeds the resource profile specified.

The `limits` setting serves as a hard upper bound on the resource profile of nodes to be scheduled to run the task. The task will not be scheduled on a node that exceeds the resource profile specified (in any of the specified attributes).

> [!NOTE] GPUs take only `limits`
> GPUs should only be specified in the `limits` section of the task decorator:
> * You should specify GPU requirements only in `limits`, not in `requests`, because Kubernetes will use the `limits` value as the `requests` value anyway.
> * You _can_ specify GPU in both `limits` and `requests` but the two values must be equal.
> * You cannot specify GPU `requests` without specifying `limits`.

## The `accelerator` setting

The `accelerator` setting further specifies the *type* of specialized hardware required for the task. This can be a GPU, a specific variation of a GPU, a fractional GPU, or a different hardware device, such as a TPU.

See **Core concepts > Tasks > Task hardware environment > Accelerators** for more information.
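To tie together the `requests`, `limits`, and GPU guidance above, here is a small sketch (an illustrative example, not a recommendation of particular values) that requests fractional CPU in millicores and specifies the GPU count only under `limits`:

```python
import union
from flytekit import Resources


@union.task(
    requests=Resources(cpu="500m", mem="2Gi"),      # at least half a CPU core and 2 GiB of memory
    limits=Resources(cpu="2", mem="4Gi", gpu="1"),  # hard caps; note that the GPU count appears only in limits
)
def preprocess(n: int) -> int:
    return n * 2
```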
## Execution defaults and resource quotas

The execution defaults and resource quotas can be found on the right sidebar of the Dashboard. They can be edited by selecting the gear icon:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/customizing-task-resources/execution-defaults-gear.png)

This will open a dialog:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/customizing-task-resources/execution-defaults-dialog.png)

> [!NOTE]
> An ephemeral storage default value of zero means that the task pod will consume storage on the node as needed.
> This makes it possible for a pod to get evicted if a node doesn't have enough storage. If your tasks rely on
> ephemeral storage, we recommend being explicit about the ephemeral storage you request to avoid pod eviction.

## Task resource validation

If you attempt to execute a workflow with unsatisfiable resource requests, the execution will fail immediately rather than being allowed to queue forever.

To remedy such a failure, you should make sure that the appropriate node types are:

* Physically available in your cluster, meaning you have arranged with the Union.ai team to include them when **Configuring your data plane**.
* Specified in the task decorator (via the `requests`, `limits`, `accelerator`, or other parameters).

Go to the **Resources > Compute** dashboard to find the available node types and their resource profiles. To make changes to your cluster configuration, go to the [Union.ai Support Portal](https://get.support.union.ai/servicedesk/customer/portal/1/group/6/create/30).

## The `with_overrides` method

When `requests`, `limits`, or `accelerator` are specified in the `@union.task` decorator, they apply every time that the task is invoked from a workflow. In some cases, you may wish to change the resources specified from one invocation to another.

To do that, use the [`with_overrides` method](../../../../api-reference/flytekit-sdk/packages/flytekit.core.node#with_overrides) of the task function. For example:

```python
import union
from flytekit import Resources
from flytekit.types.file import FlyteFile


@union.task
def my_task(ff: FlyteFile):
    ...


@union.workflow
def my_workflow(smallFile: FlyteFile, bigFile: FlyteFile):
    my_task(ff=smallFile)
    my_task(ff=bigFile).with_overrides(requests=Resources(mem="120Gi", cpu="10"))
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment/accelerators ===

# Accelerators

> [!NOTE] _Accelerators_ and _Accelerated datasets_ are entirely different things
> An accelerator, in Union.ai, is a specialized hardware device that is used to accelerate the execution of a task.
> **Data input/output > Accelerated datasets**, on the other hand, is a Union.ai feature that enables quick access to large datasets from within a task.
> These concepts are entirely different and should not be confused.

Union.ai allows you to specify the number of GPUs available for a given task (see **Core concepts > Tasks > Task hardware environment > Customizing task resources**). However, in some cases, you may want to be more specific about the type of GPU or other specialized device to be used.

You can use the `accelerator` parameter to specify specific GPU types, variations of GPU types, fractional GPUs, or other specialized hardware devices such as TPUs.

Your Union.ai installation will come pre-configured with the GPUs and other hardware that you requested during onboarding. Each device type has a constant name that you can use to specify the device in the `accelerator` parameter.
For example:

```python
import union
from flytekit import Resources
from flytekit.extras.accelerators import A100


@union.task(
    limits=Resources(gpu="1"),
    accelerator=A100,
)
def my_task():
    ...
```

## Finding your available accelerators

You can find the accelerators available in your Union.ai installation by going to the **Usage > Compute** dashboard in the UI. In the **Accelerators** section, you will see a list of available accelerators and the named constants to be used in code to refer to them.

## Requesting the provisioning of accelerators

If you need a specific accelerator that is not available in your Union.ai installation, you can request it by contacting the Union.ai team. Just click on the **Adjust Configuration** button under **Usage** in the UI (or go [here](https://get.support.union.ai/servicedesk/customer/portal/1/group/6/create/30)).

## Using predefined accelerator constants

There are a number of predefined accelerator constants available in the `flytekit.extras.accelerators` module. The predefined list is not exhaustive, but it includes the most common accelerators. If you know the name of the accelerator, but there is no predefined constant for it, you can simply pass the string name to the task decorator directly. Note that in order for a specific accelerator to be available in your Union.ai installation, it must have been provisioned by the Union.ai team.

If using the constants, you can import them directly from the module, e.g.:

```python
import union
from flytekit import Resources
from flytekit.extras.accelerators import T4


@union.task(
    limits=Resources(gpu="1"),
    accelerator=T4,
)
def my_task():
    ...
```

If you want to use a fractional GPU, you can use one of the partition attributes on the accelerator constant, e.g.:

```python
import union
from flytekit import Resources
from flytekit.extras.accelerators import A100


@union.task(
    limits=Resources(gpu="1"),
    accelerator=A100.partition_2g_10gb,
)
def my_task():
    ...
```

## List of predefined accelerator constants

* `A10G`: [NVIDIA A10 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/products/a10-gpu/)
* `L4`: [NVIDIA L4 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/l4/)
* `K80`: [NVIDIA Tesla K80 GPU](https://www.nvidia.com/en-gb/data-center/tesla-k80/)
* `M60`: [NVIDIA Tesla M60 GPU](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/solutions/resources/documents1/nvidia-m60-datasheet.pdf)
* `P4`: [NVIDIA Tesla P4 GPU](https://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf)
* `P100`: [NVIDIA Tesla P100 GPU](https://www.nvidia.com/en-us/data-center/tesla-p100/)
* `T4`: [NVIDIA T4 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/tesla-t4/)
* `V100`: [NVIDIA Tesla V100 GPU](https://www.nvidia.com/en-us/data-center/tesla-v100/)
* `A100`: An entire [NVIDIA A100 GPU](https://www.nvidia.com/en-us/data-center/a100/). Fractional partitions are also available:
* `A100.partition_1g_5gb`: 5GB partition of an A100 GPU.
* `A100.partition_2g_10gb`: 10GB partition of an A100 GPU - 2x5GB slices with 2/7th of the SM (streaming multiprocessor).
* `A100.partition_3g_20gb`: 20GB partition of an A100 GPU - 4x5GB slices, with 3/7th fraction of the SM.
* `A100.partition_4g_20gb`: 20GB partition of an A100 GPU - 4x5GB slices, with 4/7th fraction of the SM.
* `A100.partition_7g_40gb`: 40GB partition of an A100 GPU - 8x5GB slices, with 7/7th fraction of the SM.
* `A100_80GB`: An entire [NVIDIA A100 80GB GPU](https://www.nvidia.com/en-us/data-center/a100/).
Fractional partitions are also available:

* `A100_80GB.partition_1g_10gb`: 10GB partition of an A100 80GB GPU with 1/7th of the SM (streaming multiprocessor).
* `A100_80GB.partition_2g_20gb`: 20GB partition of an A100 80GB GPU with 2/7th of the SM.
* `A100_80GB.partition_3g_40gb`: 40GB partition of an A100 80GB GPU with 3/7th of the SM.
* `A100_80GB.partition_4g_40gb`: 40GB partition of an A100 80GB GPU with 4/7th of the SM.
* `A100_80GB.partition_7g_80gb`: 80GB partition of an A100 80GB GPU with 7/7th of the SM.

For more information on partitioning, see [Partitioned GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#partitioning).

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment/retries-and-timeouts ===

# Retries and timeouts

## Retry types

Union.ai allows you to automatically retry failing tasks. This section explains the configuration and application of retries.

Errors causing task failure are categorized into two main types, influencing the retry logic differently:

* `SYSTEM`: These errors arise from infrastructure-related failures, such as hardware malfunctions or network issues. They are typically transient and can often be resolved with a retry.
* `USER`: These errors are due to issues in the user-defined code, like a value error or a logic mistake, which usually require code modifications to resolve.

## Configuring retries

Retries in Union.ai are configurable to address both `USER` and `SYSTEM` errors, allowing for tailored fault tolerance strategies.

`USER` errors can be handled by setting the `retries` attribute in the task decorator to define how many times a task should retry. This requires a `FlyteRecoverableException` to be raised in the task definition; any other exception will not be retried:

```python
from random import random
from typing import List

from flytekit import task
from flytekit.exceptions.user import FlyteRecoverableException


@task(retries=3)
def compute_mean(data: List[float]) -> float:
    if random() < 0.05:
        raise FlyteRecoverableException("Something bad happened 🔥")
    return sum(data) / len(data)
```

## Retrying interruptible tasks

Tasks marked as interruptible can be preempted and retried without counting against the `USER` error budget. This is useful for tasks running on preemptible compute resources like spot instances. See **Core concepts > Tasks > Task hardware environment > Interruptible instances**.

## Retrying map tasks

For map tasks, the interruptible behavior aligns with that of regular tasks. The `retries` field in the task decorator is not necessary for handling `SYSTEM` errors, as these are managed by the platform's configuration. The `USER` retry budget, however, is set by defining `retries` in the task decorator. See **Core concepts > Tasks > Map Tasks**.

## Timeouts

To protect against zombie tasks that hang due to system-level issues, you can supply the `timeout` argument to the task decorator to make sure that problematic tasks adhere to a maximum runtime.

In this example, we make sure that the task is terminated after it has been running for more than one hour:

```python
from datetime import timedelta
from typing import List

from flytekit import task


@task(timeout=timedelta(hours=1))
def compute_mean(data: List[float]) -> float:
    return sum(data) / len(data)
```

Notice that the `timeout` argument takes a built-in Python `timedelta` object.
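The `retries` and `timeout` settings can also be combined. The following sketch (an illustrative example, not taken from the sections above) retries a task up to twice on recoverable errors and terminates it if it runs for more than 30 minutes:

```python
from datetime import timedelta

import union
from flytekit.exceptions.user import FlyteRecoverableException


@union.task(retries=2, timeout=timedelta(minutes=30))
def fetch_batch(url: str) -> str:
    # A transient failure that is safe to retry.
    if url.startswith("flaky://"):
        raise FlyteRecoverableException("Temporary connection problem, retrying")
    return f"fetched {url}"
```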
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment/interruptible-instances ===

# Interruptible instances

> [!NOTE]
> In AWS, the term *spot instance* is used.
> In GCP, the equivalent term is *Spot VM*.
> Here we use the term *interruptible instance* generically for both providers.

An interruptible instance is a machine instance made available to your cluster by your cloud provider that is not guaranteed to be always available. As a result, interruptible instances are cheaper than regular instances.

In order to use an interruptible instance for a compute workload you have to be prepared for the possibility that an attempt to run the workload could fail due to lack of available resources and will need to be retried.

When onboarding your organization onto Union.ai, you configure your data plane (see **Configuring your data plane**). Among the options available is the choice of whether to use interruptible instances.

For each interruptible instance node group that you specify, an additional on-demand node group (though identical in every other respect to the interruptible one) will also be configured. This on-demand node group is used as a fallback when attempts to complete the task on the interruptible instance have failed.

## Configuring tasks to use interruptible instances

To schedule tasks on interruptible instances and retry them if they fail, specify the `interruptible` and `retries` parameters in the `@union.task` decorator. For example:

```python
@union.task(interruptible=True, retries=3)
```

* A task will only be scheduled on an interruptible instance if it has the parameter `interruptible=True` (or if its workflow has the parameter `interruptible=True` and the task does not have an explicit `interruptible` parameter).
* An interruptible task, like any other task, can have a `retries` parameter.
* If an interruptible task does not have an explicitly set `retries` parameter, then the `retries` value defaults to `1`.
* An interruptible task with `retries=n` will be attempted `n` times on an interruptible instance. If it still fails after `n` attempts, the final (`n+1`th) attempt will be made on the fallback on-demand instance.

## Workflow level interruptible

Interruptible is also available at the workflow level (see **Core concepts > Workflows**). If you set it there, it will apply to all tasks in the workflow that do not themselves have an explicit value set. A task-level `interruptible` setting always overrides the workflow-level setting.

## Advantages and disadvantages of interruptible instances

The advantage of using an interruptible instance for a task is simply that it is less costly than using an on-demand instance (all other parameters being equal).

However, there are two main disadvantages:

1. The task may be successfully scheduled on an interruptible instance but then be interrupted. In the worst-case scenario, for `retries=n` the task may be interrupted `n` times until, finally, the fallback on-demand instance is used. Clearly, this may be a problem for time-critical tasks.
2. Interruptible instances of the selected node type may simply be unavailable on the initial attempt to schedule. When this happens, the task may hang indefinitely until an interruptible instance becomes available. Note that this is a distinct failure mode from the previous one, where an interruptible node is successfully scheduled but then interrupted.

In general, we recommend that you use interruptible instances whenever available, but only for tasks that are not time-critical.
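As a minimal sketch of the task-level and workflow-level settings described above (the task and workflow names are illustrative), the workflow below defaults its tasks to interruptible instances while one time-critical task opts out:

```python
import union


@union.task(retries=3)
def preprocess(x: int) -> int:
    # Inherits interruptible=True from the workflow-level setting below.
    return x + 1


@union.task(interruptible=False)
def publish(x: int) -> int:
    # The explicit task-level setting overrides the workflow-level default,
    # so this task is always scheduled on an on-demand instance.
    return x


@union.workflow(interruptible=True)
def pipeline(x: int) -> int:
    return publish(x=preprocess(x=x))
```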
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring ===

# Task-level monitoring

In the **Core concepts > Workflows > Viewing workflow executions** view, selecting a task within the list will open the right panel. In that panel, you will find the **View Utilization** button:

![View Utilization](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/execution-view-right-panel-executions-view-util.png)

Clicking this will take you to the **task-level monitoring** page:

![Task-level monitoring](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring.png)

## Execution Resources

This tab displays details about the resources used by this specific task.

As an example, let's say that the definition of this task in your Python code has the following task decorator:

```python
@union.task(
    requests=Resources(cpu="44", mem="120Gi"),
    limits=Resources(cpu="44", mem="120Gi")
)
```

These parameters are reflected in the displayed **Memory Quota** and **CPU Cores Quota** charts as explained below:

### Memory Quota

![Memory Quota](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring-memory-quota.png)

This chart shows the memory consumption of the task.

* **Limit** refers to the value of the `limits.mem` parameter (the `mem` parameter within the `Resources` object assigned to `limits`).
* **Allocated** refers to the maximum of the value of the `requests.mem` parameter (the `mem` parameter within the `Resources` object assigned to `requests`) and the amount of memory actually used by the task.
* **Used** refers to the actual memory used by the task.

This chart displays the ratio of memory used over memory requested, as a percentage. Since the memory used can sometimes exceed the memory requested, this percentage may exceed 100.

### CPU Cores Quota

![CPU Cores Quota](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring-cpu-cores-quota.png)

This chart displays the number of CPU cores being used.

* **Limit** refers to the value of the `limits.cpu` parameter (the `cpu` parameter within the `Resources` object assigned to `limits`).
* **Allocated** refers to the value of the `requests.cpu` parameter (the `cpu` parameter within the `Resources` object assigned to `requests`).
* **Used** refers to the actual number of CPUs used by the task.

### GPU Memory Utilization

![GPU Memory Utilization](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring-gpu-memory-utilization.png)

This chart displays the amount of GPU memory used for each GPU.

### GPU Utilization

![GPU Utilization](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring-gpu-utilization.png)

This chart displays the GPU core utilization as a percentage of the GPUs allocated (the `requests.gpu` parameter).

## Execution Logs (Preview)

![Execution Logs (Preview)](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/task-level-monitoring-execution-logs.png)

This tab is a preview feature that displays the `stdout` (the standard output) of the container running the task. Currently, it only shows content while the task is actually running.
## Map Tasks

When the task you want to monitor is a **map task**, accessing the utilization data is a bit different.

Here is the task execution view of a map task. Open the drop-down to reveal each subtask within the map task:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/map-task-1.png)

Drill down by clicking on one of the subtasks:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/map-task-2.png)

This will bring you to the individual subtask information panel, where the **View Utilization** button for the subtask can be found:

![](../../../../_static/images/user-guide/core-concepts/tasks/task-hardware-environment/task-level-monitoring/map-task-3.png)

Clicking on **View Utilization** will take you to the task-level monitoring page for the subtask, which has the same structure and features as the task-level monitoring page for a standard task (see above).

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans ===

# Launch plans

A launch plan is a template for a workflow invocation. It brings together:

* A workflow (see **Core concepts > Workflows**)
* A (possibly partial) set of inputs required to initiate that workflow
* Optionally, **Core concepts > Launch plans > Notifications** and **Core concepts > Launch plans > Schedules**

When invoked, the launch plan starts the workflow, passing the inputs as parameters. If the launch plan does not contain the entire set of required workflow inputs, additional input arguments must be provided at execution time.

## Default launch plan

Every workflow automatically comes with a *default launch plan*. This launch plan does not define any default inputs, so they must all be provided at execution time. A default launch plan always has the same name as its workflow.

## Launch plans are versioned

Like tasks and workflows, launch plans are versioned. A launch plan can be updated to change, for example, the set of inputs, the schedule, or the notifications. Each update creates a new version of the launch plan.

## Custom launch plans

Additional launch plans, other than the default one, can be defined for any workflow. In general, a given workflow can be associated with multiple launch plans, but a given launch plan is always associated with exactly one workflow.

## Viewing launch plans for a workflow

To view the launch plans for a given workflow, in the UI, navigate to the workflow's page and click **Launch Workflow**. You can choose which launch plan to use to launch the workflow from the **Launch Plan** dropdown menu.

The default launch plan will be selected by default. If you have not defined any custom launch plans for the workflow, only the default plan will be available. If you have defined one or more custom launch plans, they will be available in the dropdown menu along with the default launch plan.

For more details, see **Core concepts > Launch plans > Running launch plans**.

## Registering a launch plan

### Registering a launch plan on the command line

In most cases, launch plans are defined alongside the workflows and tasks in your project code and registered as a bundle with the other entities using the CLI (see **Development cycle > Running your code**).
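For example, a project file might contain a task, a workflow, and a custom launch plan. The sketch below is illustrative; the module path (`workflows/launch_plan_example.py`) and entity names are assumed so that they match the registration results described in the following sections:

```python
# workflows/launch_plan_example.py
import union


@union.task
def my_task(a: int, b: int) -> int:
    return a + b


@union.workflow
def my_workflow(a: int, b: int) -> int:
    return my_task(a=a, b=b)


# A custom launch plan that pre-sets a default value for `b`.
my_workflow_custom_lp = union.LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="my_workflow_custom_lp",
    default_inputs={"b": 7},
)
```

Registering this file with `union register` creates the entities listed under "Results of registration" below.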
### Registering a launch plan in Python with `UnionRemote`

As with all Union.ai command line actions, you can also perform registration of launch plans programmatically with [`UnionRemote`](../../development-cycle/union-remote), specifically, `UnionRemote.register_launch_plan`.

### Results of registration

When the code above is registered to Union.ai, it results in the creation of four objects:

* The task `workflows.launch_plan_example.my_task`
* The workflow `workflows.launch_plan_example.my_workflow`
* The default launch plan `workflows.launch_plan_example.my_workflow` (notice that it has the same name as the workflow)
* The custom launch plan `my_workflow_custom_lp` (this is the one we defined in the code above)

### Changing a launch plan

Launch plans are changed by altering their definition in code and re-registering. When a launch plan with the same project, domain, and name as a preexisting one is re-registered, a new version of that launch plan is created.

## Subpages

- **Core concepts > Launch plans > Defining launch plans**
- **Core concepts > Launch plans > Viewing launch plans**
- **Core concepts > Launch plans > Notifications**
- **Core concepts > Launch plans > Schedules**
- **Core concepts > Launch plans > Activating and deactivating**
- **Core concepts > Launch plans > Running launch plans**
- **Core concepts > Launch plans > Reference launch plans**
- **Core concepts > Launch plans > Mapping over launch plans**
- **Core concepts > Launch plans > Reactive workflows**
- **Core concepts > Launch plans > Concurrency control**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/defining-launch-plans ===

# Defining launch plans

You can define a launch plan with the [`LaunchPlan` class](../../../api-reference/flytekit-sdk/packages/flytekit.core.launch_plan).

This is a simple example of defining a launch plan:

```python
import union


@union.workflow
def my_workflow(a: int, b: str) -> str:
    return f"Result: {a} and {b}"


# Create a default launch plan
default_lp = union.LaunchPlan.get_or_create(workflow=my_workflow)

# Create a named launch plan
named_lp = union.LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="my_custom_launch_plan"
)
```

## Default and Fixed Inputs

Default inputs can be overridden at execution time, while fixed inputs cannot be changed.
```python import union # Launch plan with default inputs lp_with_defaults = union.LaunchPlan.get_or_create( workflow=my_workflow, name="with_defaults", default_inputs={"a": 42, "b": "default_value"} ) # Launch plan with fixed inputs lp_with_fixed = union.LaunchPlan.get_or_create( workflow=my_workflow, name="with_fixed", fixed_inputs={"a": 100} # 'a' will always be 100, only 'b' can be specified ) # Combining default and fixed inputs lp_combined = union.LaunchPlan.get_or_create( workflow=my_workflow, name="combined_inputs", default_inputs={"b": "default_string"}, fixed_inputs={"a": 200} ) ``` ## Scheduled Execution ```python import union from datetime import timedelta from flytekit.core.schedule import CronSchedule, FixedRate # Using a cron schedule (runs at 10:00 AM UTC every Monday) cron_lp = union.LaunchPlan.get_or_create( workflow=my_workflow, name="weekly_monday", default_inputs={"a": 1, "b": "weekly"}, schedule=CronSchedule( schedule="0 10 * * 1", # Cron expression: minute hour day-of-month month day-of-week kickoff_time_input_arg=None ) ) # Using a fixed rate schedule (runs every 6 hours) fixed_rate_lp = union.LaunchPlan.get_or_create( workflow=my_workflow, name="every_six_hours", default_inputs={"a": 1, "b": "periodic"}, schedule=FixedRate( duration=timedelta(hours=6) ) ) ``` ## Labels and Annotations Labels and annotations help with organization and can be used for filtering or adding metadata. ```python import union from flytekit.models.common import Labels, Annotations # Adding labels and annotations lp_with_metadata = union.LaunchPlan.get_or_create( workflow=my_workflow, name="with_metadata", default_inputs={"a": 1, "b": "metadata"}, labels=Labels({"team": "data-science", "env": "staging"}), annotations=Annotations({"description": "Launch plan for testing", "owner": "jane.doe"}) ) ``` ## Execution Parameters ```python import union # Setting max parallelism to limit concurrent task execution lp_with_parallelism = union.LaunchPlan.get_or_create( workflow=my_workflow, name="with_parallelism", default_inputs={"a": 1, "b": "parallel"}, max_parallelism=10 # Only 10 task nodes can run concurrently ) # Disable caching for this launch plan's executions lp_no_cache = union.LaunchPlan.get_or_create( workflow=my_workflow, name="no_cache", default_inputs={"a": 1, "b": "fresh"}, overwrite_cache=True # Always execute fresh, ignoring cached results ) # Auto-activate on registration lp_auto_activate = union.LaunchPlan.get_or_create( workflow=my_workflow, name="auto_active", default_inputs={"a": 1, "b": "active"}, auto_activate=True # Launch plan will be active immediately after registration ) ``` ## Security and Authentication We can also override the auth role (either an iam role or a kubernetes service account) used to execute a launch plan. 
```python
import union
from flytekit import SecurityContext
from flytekit.models.common import AuthRole
from flytekit.models.security import Identity

# Setting auth role for the launch plan
lp_with_auth = union.LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="with_auth",
    default_inputs={"a": 1, "b": "secure"},
    auth_role=AuthRole(
        assumable_iam_role="arn:aws:iam::12345678:role/my-execution-role"
    )
)

# Setting security context
lp_with_security = union.LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="with_security",
    default_inputs={"a": 1, "b": "context"},
    security_context=SecurityContext(
        run_as=Identity(k8s_service_account="my-service-account")
    )
)
```

## Raw Output Data Configuration

```python
from flytekit import LaunchPlan
from flytekit.models.common import RawOutputDataConfig

# Configure where large outputs should be stored
lp_with_output_config = LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="with_output_config",
    default_inputs={"a": 1, "b": "output"},
    raw_output_data_config=RawOutputDataConfig(
        output_location_prefix="s3://my-bucket/workflow-outputs/"
    )
)
```

## Putting It All Together

A pretty comprehensive example follows below. This custom launch plan combines default and fixed inputs, a cron schedule, notifications, labels and annotations, execution parameters, an auth role, and a raw output data configuration (it builds on the imports shown in the preceding sections):

```python
comprehensive_lp = LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="comprehensive_example",
    default_inputs={"b": "configurable"},
    fixed_inputs={"a": 42},
    schedule=CronSchedule(schedule="0 9 * * *"),  # Daily at 9 AM UTC
    notifications=[
        Notification(
            phases=["SUCCEEDED", "FAILED"],
            email=EmailNotification(recipients_email=["team@example.com"])
        )
    ],
    labels=Labels({"env": "production", "team": "data"}),
    annotations=Annotations({"description": "Daily data processing"}),
    max_parallelism=20,
    overwrite_cache=False,
    auto_activate=True,
    auth_role=AuthRole(assumable_iam_role="arn:aws:iam::12345678:role/workflow-role"),
    raw_output_data_config=RawOutputDataConfig(
        output_location_prefix="s3://results-bucket/daily-run/"
    )
)
```

These examples demonstrate the flexibility of launch plans, allowing you to customize execution parameters, inputs, schedules, and more to suit your workflow requirements.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/viewing-launch-plans ===

# Viewing launch plans

## Viewing launch plans in the UI

Select **Launch Plans** in the sidebar to display a list of all the registered launch plans in the project and domain:

![Launch plans list](../../../_static/images/user-guide/core-concepts/launch-plans/viewing-launch-plans/launch-plans-list.png)

You can search the launch plans by name and filter for only those that are archived.

The columns in the launch plans table are defined as follows:

* **Name**: The name of the launch plan. Click to inspect a specific launch plan in detail.
* **Triggers**:
  * If the launch plan is active, a green **Active** badge is shown. When a launch plan is active, any attached schedule will be in effect and the launch plan will be invoked according to that schedule.
  * Shows whether the launch plan has a trigger (see **Core concepts > Launch plans > Reactive workflows**). To filter for only those launch plans with a trigger, check the **Has Triggers** box in the top right.
* **Last Execution**: The last execution timestamp of this launch plan, irrespective of how the last execution was invoked (by schedule, by trigger, or manually).
* **Last 10 Executions**: A visual representation of the last 10 executions of this launch plan, irrespective of how these executions were invoked (by schedule, by trigger, or manually).
Select an entry on the list to go to that specific launch plan:

![Launch plan view](../../../_static/images/user-guide/core-concepts/launch-plans/viewing-launch-plans/launch-plan-view.png)

Here you can see:

* **Launch Plan Detail (Latest Version)**:
  * **Expected Inputs**: The input and output types for the launch plan.
  * **Fixed Inputs**: If the launch plan includes predefined input values, they are shown here.
* **Launch Plan Versions**: A list of all versions of this launch plan.
* **All executions in the Launch Plan**: A list of all executions of this launch plan.

In the top right you can see whether this launch plan is active (and if it is, which version, specifically, is active). There is also a control for changing the active version or deactivating the launch plan entirely. See **Core concepts > Launch plans > Activating and deactivating** for more details.

## Viewing launch plans on the command line with `uctl`

To view all launch plans within a project and domain:

```shell
$ uctl get launchplans \
    --project <project> \
    --domain <domain>
```

To view a specific launch plan:

```shell
$ uctl get launchplan \
    --project <project> \
    --domain <domain> \
    <launch-plan-name>
```

See **Uctl CLI** for more details.

## Viewing launch plans in Python with `UnionRemote`

Use the method `UnionRemote.client.list_launch_plans_paginated` to get the list of launch plans.
A launch plan can be associated with one or more schedules, where at most one schedule is active at any one time. If a schedule is activated on a launch plan, the workflow will be invoked automatically by the system at the scheduled time with the inputs provided by the launch plan. Schedules can be either fixed-rate or `cron`-based. To set up a schedule, you can use the `schedule` parameter of the `LaunchPlan.get_or_create()` method. ## Fixed-rate schedules In the following example we add a [FixedRate](../../../api-reference/flytekit-sdk/packages/flytekit.core.schedule#flytekitcoreschedulefixedrate) that will invoke the workflow every 10 minutes. ```python from datetime import timedelta import union from flytekit import FixedRate @union.task def my_task(a: int, b: int, c: int) -> int: return a + b + c @union.workflow def my_workflow(a: int, b: int, c: int) -> int: return my_task(a=a, b=b, c=c) union.LaunchPlan.get_or_create( workflow=my_workflow, name="my_workflow_custom_lp", fixed_inputs={"a": 3}, default_inputs={"b": 4, "c": 5}, schedule=FixedRate( duration=timedelta(minutes=10) ) ) ``` Above, we defined the duration of the `FixedRate` schedule using `minutes`. Fixed rate schedules can also be defined using `days` or `hours`. ## Cron schedules A [`CronSchedule`](../../../api-reference/flytekit-sdk/packages/flytekit.core.schedule#flytekitcoreschedulecronschedule) allows you to specify a schedule using a `cron` expression: ```python import union from flytekit import CronSchedule @union.task def my_task(a: int, b: int, c: int) -> int: return a + b + c @union.workflow def my_workflow(a: int, b: int, c: int) -> int: return my_task(a=a, b=b, c=c) union.LaunchPlan.get_or_create( workflow=my_workflow, name="my_workflow_custom_lp", fixed_inputs={"a": 3}, default_inputs={"b": 4, "c": 5}, schedule=CronSchedule( schedule="*/10 * * * *" ) ) ``` ### Cron expression format A `cron` expression is a string that defines a schedule using five space-separated fields, each representing a time unit. The format of the string is: ``` minute hour day-of-month month day-of-week ``` Each field can contain values and special characters. The fields are defined as follows: | Field | Values | Special characters | |----------------|---------------------|--------------------| | `minute` | `0-59` | `* / , -` | | `hour` | `0-23` | `* / , -` | | `day-of-month` | `1-31` | `* / , - ?` | | `month` | `1-12` or `JAN-DEC` | `* / , -` | | `day-of-week` | `0-6` or` SUN-SAT` | `* / , - ?` | * The `month` and `day-of-week` abbreviations are not case-sensitive. * The `,` (comma) is used to specify multiple values. For example, in the `month` field, `JAN,FEB,MAR` means every January, February, and March. * The `-` (dash) specifies a range of values. For example, in the `day-of-month` field, `1-15` means every day from `1` through `15` of the specified month. * The `*` (asterisk) specifies all values of the field. For example, in the `hour` field, `*` means every hour (on the hour), from `0` to `23`. You cannot use `*` in both the `day-of-month` and `day-of-week` fields in the same `cron` expression. If you use it in one, you must use `?` in the other. * The `/` (slash) specifies increments. For example, in the `minute` field, `1/10` means every tenth minute, starting from the first minute of the hour (that is, the 11th, 21st, and 31st minute, and so on). * The `?` (question mark) specifies any value of the field. 
For example, in the `day-of-month` field you could enter `7` and, if any day of the week was acceptable, you would enter `?` in the `day-of-week` field. ### Cron expression examples | Expression | Description | |--------------------|-------------------------------------------| | `0 0 * * *` | Midnight every day. | | `0 12 * * MON-FRI` | Noon every weekday. | | `0 0 1 * *` | Midnight on the first day of every month. | | `0 0 * JAN,JUL *` | Midnight every day in January and July. | | `*/5 * * * *` | Every five minutes. | | `30 2 * * 1` | At 2:30 AM every Monday. | | `0 0 15 * ?` | Midnight on the 15th of every month. | ### Cron aliases The following aliases are also available. An alias is used in place of an entire `cron` expression. | Alias | Description | Equivalent to | |------------|------------------------------------------------------------------|-----------------| | `@yearly` | Once a year at midnight at the start of 1 January. | `0 0 1 1 *` | | `@monthly` | Once a month at midnight at the start of first day of the month. | `0 0 1 * *` | | `@weekly` | Once a week at midnight at the start of Sunday. | `0 0 * * 0` | | `@daily` | Once a day at midnight. | `0 0 * * *` | | `@hourly` | Once an hour at the beginning of the hour. | `0 * * * *` | ## kickoff_time_input_arg Both `FixedRate` and `CronSchedule` can take an optional parameter called `kickoff_time_input_arg` This parameter is used to specify the name of a workflow input argument. Each time the system invokes the workflow via this schedule, the time of the invocation will be passed to the workflow through the specified parameter. For example: ```python from datetime import datetime, timedelta import union from flytekit import FixedRate @union.task def my_task(a: int, b: int, c: int) -> int: return a + b + c @union.workflow def my_workflow(a: int, b: int, c: int, kickoff_time: datetime ) -> str: return f"sum: {my_task(a=a, b=b, c=c)} at {kickoff_time}" union.LaunchPlan.get_or_create( workflow=my_workflow, name="my_workflow_custom_lp", fixed_inputs={"a": 3}, default_inputs={"b": 4, "c": 5}, schedule=FixedRate( duration=timedelta(minutes=10), kickoff_time_input_arg="kickoff_time" ) ) ``` Here, each time the schedule calls `my_workflow`, the invocation time is passed in the `kickoff_time` argument. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/activating-and-deactivating === # Activating and deactivating You can set an active/inactive status on launch plans. Specifically: * Among the versions of a given launch plan (as defined by name), at most one can be set to active. All others are inactive. * If a launch plan version that has a schedule attached is activated, then its schedule also becomes active and its workflow will be invoked automatically according to that schedule. * When a launch plan version with a schedule is inactive, its schedule is inactive and will not be used to invoke its workflow. Launch plans that do not have schedules attached can also have an active version. For such non-scheduled launch plans, this status serves as a flag that can be used to distinguish one version from among the others. It can, for example, be used by management logic to determine which version of a launch plan to use for new invocations. Upon registration of a new launch plan, the first version is automatically inactive. If it has a schedule attached, the schedule is also inactive. Once activated, a launch plan version remains active even as new, later, versions are registered. 
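The aliases from the table above can be used anywhere a `cron` expression is accepted. As a small sketch (the launch plan name is illustrative, and the workflow is the three-input `my_workflow` from the cron schedule example earlier on this page), this launch plan runs once a day at midnight:

```python
import union
from flytekit import CronSchedule

union.LaunchPlan.get_or_create(
    workflow=my_workflow,
    name="my_workflow_daily_lp",
    fixed_inputs={"a": 3},
    default_inputs={"b": 4, "c": 5},
    schedule=CronSchedule(
        schedule="@daily"  # Equivalent to "0 0 * * *"
    )
)
```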
A launch plan version with a schedule attached can be activated through the UI, `uctl`, or [`UnionRemote`](../../../user-guide/development-cycle/union-remote).

## Activating and deactivating a launch plan in the UI

To activate a launch plan, go to the launch plan view and click **Add active launch plan** in the top right corner of the screen:

![Activate schedule](../../../_static/images/user-guide/core-concepts/launch-plans/activating-and-deactivating/add-active-launch-plan.png)

A modal will appear that lets you select which launch plan version to activate:

![Activate schedule](../../../_static/images/user-guide/core-concepts/launch-plans/activating-and-deactivating/update-active-launch-plan-dialog.png)

This modal will contain all versions of the launch plan that have an attached schedule. Note that at most one version (and therefore at most one schedule) of a launch plan can be active at any given time.

Selecting the launch plan version and clicking **Update** activates the launch plan version and schedule. The launch plan will be triggered according to the schedule going forward.

> [!WARNING]
> Non-scheduled launch plans cannot be activated via the UI.
> The UI does not support activating launch plans that do not have schedules attached.
> You can activate them with `uctl` or `UnionRemote`.

To deactivate a launch plan, navigate to a launch plan with an active schedule, click the **...** icon in the top-right corner of the screen beside **Active launch plan**, and click **Deactivate**:

![Deactivate schedule](../../../_static/images/user-guide/core-concepts/launch-plans/activating-and-deactivating/deactivate-launch-plan.png)

A confirmation modal will appear, allowing you to deactivate the launch plan and its schedule.

> [!WARNING]
> Non-scheduled launch plans cannot be deactivated via the UI.
> The UI does not support deactivating launch plans that do not have schedules attached.
> You can deactivate them with `uctl` or `UnionRemote`.

## Activating and deactivating a launch plan on the command line with `uctl`

To activate a launch plan version with `uctl`, execute the following command:

```shell
$ uctl update launchplan \
    --activate \
    --project <project> \
    --domain <domain> \
    <launch-plan-name> \
    --version <version>
```

To deactivate a launch plan version with `uctl`, execute the following command:

```shell
$ uctl update launchplan \
    --deactivate \
    --project <project> \
    --domain <domain> \
    <launch-plan-name> \
    --version <version>
```

See **Uctl CLI** for more details.

## Activating and deactivating a launch plan in Python with `UnionRemote`

To activate a launch plan version using `UnionRemote`:

```python
from union.remote import UnionRemote
from flytekit.configuration import Config

remote = UnionRemote(config=Config.auto(), default_project=<project>, default_domain=<domain>)
launch_plan = remote.fetch_launch_plan(name=<launch-plan-name>, version=<version>)
remote.client.update_launch_plan(launch_plan.id, "ACTIVE")
```

To deactivate a launch plan version using `UnionRemote`:

```python
from union.remote import UnionRemote
from flytekit.configuration import Config

remote = UnionRemote(config=Config.auto(), default_project=<project>, default_domain=<domain>)
launch_plan = remote.fetch_launch_plan(name=<launch-plan-name>, version=<version>)
remote.client.update_launch_plan(launch_plan.id, "INACTIVE")
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/running-launch-plans ===

# Running launch plans

## Running a launch plan in the UI

To invoke a launch plan, go to the **Workflows** list, select the desired workflow, and click **Launch Workflow**.
In the new execution dialog, select the desired launch plan from the **Launch Plan** dropdown menu and click **Launch**.

## Running a launch plan on the command line with `uctl`

To invoke a launch plan via the command line, first generate the execution spec file for the launch plan:

```shell
$ uctl get launchplan \
    --project <project> \
    --domain <domain> \
    <launch-plan-name> \
    --execFile <execution-spec-file>.yaml
```

Then you can execute the launch plan with the following command:

```shell
$ uctl create execution \
    --project <project> \
    --domain <domain> \
    --execFile <execution-spec-file>.yaml
```

See **Uctl CLI** for more details.

## Running a launch plan in Python with `UnionRemote`

The following code executes a launch plan using `UnionRemote`:

```python
import union
from flytekit.configuration import Config

remote = union.UnionRemote(config=Config.auto(), default_project=<project>, default_domain=<domain>)
launch_plan = remote.fetch_launch_plan(name=<launch-plan-name>, version=<version>)
remote.execute(launch_plan, inputs=<inputs>)
```

See [UnionRemote](../../development-cycle/union-remote) for more details.

## Sub-launch plans

The above invocation examples assume you want to run your launch plan as a top-level entity within your project. However, you can also invoke a launch plan from *within a workflow*, creating a *sub-launch plan*. This causes the invoked launch plan to kick off its workflow, passing any specified parameters to that workflow.

This differs from the case of **Core concepts > Workflows > Subworkflows and sub-launch plans**, where you invoke one workflow function from within another. A subworkflow becomes part of the execution graph of the parent workflow and shares the same execution ID and context. On the other hand, when a sub-launch plan is invoked, a full, top-level workflow is kicked off with its own execution ID and context.

See **Core concepts > Workflows > Subworkflows and sub-launch plans** for more details.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/reference-launch-plans ===

# Reference launch plans

A reference launch plan references previously defined, serialized, and registered launch plans. You can reference launch plans from other projects and create workflows that use launch plans declared by others.

When you create a reference launch plan, be sure to verify that the workflow interface corresponds to that of the referenced workflow.

> [!NOTE]
> Reference launch plans cannot be run locally. To test locally, mock them out.

## Example

In this example, we create a reference launch plan for the [`simple_wf`](https://github.com/flyteorg/flytesnacks/blob/master/examples/basics/basics/workflow.py#L25) workflow from the [Flytesnacks repository](https://github.com/flyteorg/flytesnacks).

1. Clone the Flytesnacks repository:

```shell
$ git clone git@github.com:flyteorg/flytesnacks.git
```

2. Navigate to the `basics` directory:

```shell
$ cd flytesnacks/examples/basics
```

3. Register the `simple_wf` workflow:

```shell
$ union register --project flytesnacks --domain development --version v1 basics/workflow.py
```

4. Create a file called `simple_wf_ref_lp.py` and copy the following code into it:

```python
import union
from flytekit import reference_launch_plan


@reference_launch_plan(
    project="flytesnacks",
    domain="development",
    name="basics.workflow.simple_wf",
    version="v1",
)
def simple_wf_lp(
    x: list[int], y: list[int]
) -> float:
    return 1.0


@union.workflow
def run_simple_wf() -> float:
    x = [-8, 2, 4]
    y = [-2, 4, 7]
    return simple_wf_lp(x=x, y=y)
```

5. Register the `run_simple_wf` workflow:

```shell
$ union register simple_wf_ref_lp.py
```

6. In the Union.ai UI, run the workflow `run_simple_wf`.
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/mapping-over-launch-plans ===

# Mapping over launch plans

You can map over launch plans the same way you can **Core concepts > Launch plans > Mapping over launch plans > map over tasks** to execute workflows in parallel across a series of inputs.

You can either map over a `LaunchPlan` object defined in one of your Python modules or a **Core concepts > Launch plans > Reference launch plans** that points to a previously registered launch plan.

## Launch plan defined in your code

Here we define a workflow called `interest_workflow` that we want to parallelize, along with a launch plan called `interest_workflow_lp`, in a file we'll call `map_interest_wf.py`. We then write a separate workflow, `map_interest_wf`, that uses a `map` to parallelize `interest_workflow` over a list of inputs.

```python
import union


# Task to calculate monthly interest payment on a loan
@union.task
def calculate_interest(principal: int, rate: float, time: int) -> float:
    return (principal * rate * time) / 12


# Workflow using the calculate_interest task
@union.workflow
def interest_workflow(principal: int, rate: float, time: int) -> float:
    return calculate_interest(principal=principal, rate=rate, time=time)


# Create LaunchPlan for interest_workflow
lp = union.LaunchPlan.get_or_create(
    workflow=interest_workflow,
    name="interest_workflow_lp",
)


# Mapping over the launch plan to calculate interest for multiple loans
@union.workflow
def map_interest_wf() -> list[float]:
    principal = [1000, 5000, 10000]
    rate = [0.05, 0.04, 0.03]  # Different interest rates for each loan
    time = [12, 24, 36]  # Loan periods in months
    return union.map(lp)(principal=principal, rate=rate, time=time)


# Mapping over the launch plan to calculate interest for multiple loans while fixing an input
@union.workflow
def map_interest_fixed_principal_wf() -> list[float]:
    rate = [0.05, 0.04, 0.03]  # Different interest rates for each loan
    time = [12, 24, 36]  # Loan periods in months
    # Note: principal is set to 1000 for all the calculations
    return union.map(lp, bound_inputs={'principal': 1000})(rate=rate, time=time)
```

You can run the `map_interest_wf` workflow locally:

```shell
$ union run map_interest_wf.py map_interest_wf
```

You can also run the `map_interest_wf` workflow remotely on Union.ai:

```shell
$ union run --remote map_interest_wf.py map_interest_wf
```

## Previously registered launch plan

To demonstrate the ability to map over previously registered launch plans, in this example, we map over the [`simple_wf`](https://github.com/flyteorg/flytesnacks/blob/master/examples/basics/basics/workflow.py#L25) launch plan from the basic workflow example in the [Flytesnacks repository](https://github.com/flyteorg/flytesnacks).

Recall that when a workflow is registered, an associated launch plan is created automatically. One of these launch plans will be leveraged in this example, though custom launch plans can also be used.

1. Clone the Flytesnacks repository:

```shell
$ git clone git@github.com:flyteorg/flytesnacks.git
```

2. Navigate to the `basics` directory:

```shell
$ cd flytesnacks/examples/basics
```

3.
Register the `simple_wf` workflow: ```shell $ union register --project flytesnacks --domain development --version v1 basics/workflow.py ``` Note that the `simple_wf` workflow is defined as follows: ```python @union.workflow def simple_wf(x: list[int], y: list[int]) -> float: slope_value = slope(x=x, y=y) intercept_value = intercept(x=x, y=y, slope=slope_value) return intercept_value ``` 4. Create a file called `map_simple_wf.py` and copy the following code into it: ```python import union from flytekit import reference_launch_plan @reference_launch_plan( project="flytesnacks", domain="development", name="basics.workflow.simple_wf", version="v1", ) def simple_wf_lp( x: list[int], y: list[int] ) -> float: pass @union.workflow def map_simple_wf() -> list[float]: x = [[-3, 0, 3], [-8, 2, 4], [7, 3, 1]] y = [[7, 4, -2], [-2, 4, 7], [3, 6, 4]] return union.map(simple_wf_lp)(x=x, y=y) ``` Note the fact that the reference launch plan has an interface that corresponds exactly to the registered `simple_wf` we wish to map over. 5. Register the `map_simple_wf` workflow. Reference launch plans cannot be run locally, so we will register the `map_simple_wf` workflow to Union.ai and run it remotely. ```shell $ union register map_simple_wf.py ``` 6. In the Union.ai UI, run the `map_simple_wf` workflow. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/reactive-workflows === # Reactive workflows Reactive workflows leverage **Core concepts > Artifacts** as the medium of exchange between workflows, such that when an upstream workflow emits an artifact, an artifact-driven trigger in a downstream workflow passes the artifact to a new downstream workflow execution. A trigger is a rule defined in a launch plan that specifies that when a certain event occurs -- for instance, a new version of a particular artifact is materialized -- a particular launch plan will be executed. Triggers allow downstream data consumers, such as machine learning engineers, to automate their workflows to react to the output of upstream data producers, such as data engineers, while maintaining separation of concerns and eliminating the need for staggered schedules and manual executions. Updating any trigger associated with a launch plan will create a new version of the launch plan, similar to how schedules are handled today. This means that multiple launch plans, each with different triggers, can be created to act on the same underlying workflow. Launch plans with triggers must be activated in order for the trigger to work. > [!NOTE] > Currently, there are only artifact event-based triggers, but in the future, triggers will be expanded to include other event-based workflow triggering mechanisms. ## Scope Since a trigger is part of a launch plan, it is scoped as follows: * Project * Domain * Launch plan name * Launch plan version ## Trigger types ### Artifact events An artifact event definition contains the following: * Exactly one artifact that will activate the trigger when a new version of the artifact is created * A workflow that is the target of the trigger * (Optionally) Inputs to the workflow that will be executed by the trigger. It is possible to pass information from the source artifact, the source artifact itself, and other artifacts to the workflow that will be triggered. For more information, see **Core concepts > Artifacts > Connecting workflows with artifact event triggers**. 
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/launch-plans/concurrency-control === # Concurrency control Concurrency control allows you to limit the number of concurrently running workflow executions for a specific launch plan, identified by its unique `project`, `domain`, and `name`. This control is applied across all versions of that launch plan. > [!NOTE] > To clone and run the example code on this page, see the [Flytesnacks repo](https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/). ## How it works When a new execution for a launch plan with a `ConcurrencyPolicy` is requested, Flyte performs a check to count the number of currently active executions for that same launch plan (`project/domain/name`), irrespective of their versions. This check is done using a database query that joins the `executions` table with the `launch_plans` table. It filters for executions that are in an active phase (e.g., `QUEUED`, `RUNNING`, `ABORTING`, etc.) and belong to the launch plan name being triggered. If the number of active executions is already at or above the `max_concurrency` limit defined in the policy of the launch plan version being triggered, the new execution will be handled according to the specified `behavior`. ## Basic usage Here's an example of how to define a launch plan with concurrency control: ```python from flytekit import ConcurrencyPolicy, ConcurrencyLimitBehavior, LaunchPlan, workflow @workflow def my_workflow() -> str: return "Hello, World!" # Create a launch plan with concurrency control concurrency_limited_lp = LaunchPlan.get_or_create( name="my_concurrent_lp", workflow=my_workflow, concurrency=ConcurrencyPolicy( max_concurrency=3, behavior=ConcurrencyLimitBehavior.SKIP, ), ) ``` ## Scheduled workflows with concurrency control Concurrency control is particularly useful for scheduled workflows to prevent overlapping executions: ```python from flytekit import ConcurrencyPolicy, ConcurrencyLimitBehavior, CronSchedule, LaunchPlan, workflow @workflow def scheduled_workflow() -> str: # This workflow might take a long time to complete return "Processing complete" # Create a scheduled launch plan with concurrency control scheduled_lp = LaunchPlan.get_or_create( name="my_scheduled_concurrent_lp", workflow=scheduled_workflow, concurrency=ConcurrencyPolicy( max_concurrency=1, # Only allow one execution at a time behavior=ConcurrencyLimitBehavior.SKIP, ), schedule=CronSchedule(schedule="*/5 * * * *"), # Runs every 5 minutes ) ``` ## Defining the policy A `ConcurrencyPolicy` is defined with two main parameters: - `max_concurrency` (integer): The maximum number of workflows that can be running concurrently for this launch plan name. - `behavior` (enum): What to do when the `max_concurrency` limit is reached. Currently, only `SKIP` is supported, which means new executions will not be created if the limit is hit. ```python from flytekit import ConcurrencyPolicy, ConcurrencyLimitBehavior policy = ConcurrencyPolicy( max_concurrency=5, behavior=ConcurrencyLimitBehavior.SKIP ) ``` ## Key behaviors and considerations ### Version-agnostic check, version-specific enforcement The concurrency check counts all active workflow executions of a given launch plan (`project/domain/name`). However, the enforcement (i.e., the `max_concurrency` limit and `behavior`) is based on the `ConcurrencyPolicy` defined in the specific version of the launch plan you are trying to launch. **Example scenario:** 1. 
Launch plan `MyLP` version `v1` has a `ConcurrencyPolicy` with `max_concurrency = 3`. 2. Three executions of `MyLP` (they could be `v1` or any other version) are currently running. 3. You try to launch `MyLP` version `v2`, which has a `ConcurrencyPolicy` with `max_concurrency = 10`. - **Result**: This `v2` execution will launch successfully because its own limit (10) is not breached by the current 3 active executions. 4. Now, with 4 total active executions (3 original + the new `v2`), you try to launch `MyLP` version `v1` again. - **Result**: This `v1` execution will **fail**. The check sees 4 active executions, and `v1`'s policy only allows a maximum of 3. ### Concurrency limit on manual trigger If you manually trigger an execution (via `pyflyte`, for example) that would breach the concurrency limit, you should see this error in the console: ```bash _InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.RESOURCE_EXHAUSTED details = "Concurrency limit (1) reached for launch plan my_workflow_lp. Skipping execution." > ``` ### Scheduled execution behavior When the scheduler attempts to trigger an execution and the concurrency limit is met, the creation will fail and the error message from FlyteAdmin will be logged in FlyteScheduler logs. **This happens silently from the user's perspective: a skipped execution will not appear as skipped in the UI or on the project execution page.** ## Limitations ### "At most" enforcement While the system aims to respect `max_concurrency`, it acts as an "at most" limit. Due to the nature of scheduling, workflow execution durations, and the timing of the concurrency check (at launch time), there might be periods where the number of active executions is below `max_concurrency` even if the system could theoretically run more. For example, if `max_concurrency` is 5 and all 5 workflows finish before the next scheduled check/trigger, the count will drop. The system prevents exceeding the limit but doesn't actively try to always maintain `max_concurrency` running instances. ### Notifications for skipped executions Currently, there is no built-in notification system for skipped executions. When a scheduled execution is skipped due to concurrency limits, it will be logged in FlyteScheduler but no user notification will be sent. This is an area for future enhancement. ## Best practices 1. **Use with scheduled workflows**: Concurrency control is most beneficial for scheduled workflows that might take longer than the schedule interval to complete. 2. **Set appropriate limits**: Consider your system resources and the resource requirements of your workflows when setting `max_concurrency`. 3. **Monitor skipped executions**: Regularly check FlyteAdmin logs to monitor if executions are being skipped due to concurrency limits. 4. **Version management**: Be aware that different versions of the same launch plan can have different concurrency policies, but the check is performed across all versions. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/actors === # Actors Actors allow you to reuse a container and environment between tasks, avoiding the cost of starting a new container for each task. This can be useful when you have a task that requires a lot of setup or has a long startup time. To create an actor, instantiate the **Core concepts > Actors > `ActorEnvironment`** class, then use the instance's `task` method as a decorator on the task that requires that environment.
### `ActorEnvironment` parameters * **container_image:** The container image to use for the task. This container must have the `union` python package installed, so this must be updated from the default (i.e. `cr.flyte.org/flyteorg/flytekit:py3.11-latest`). * **environment:** Environment variables as key, value pairs in a Python dictionary. * **limits:** Compute resource limits. * **replica_count:** The number of workers to provision that are able to accept tasks. * **requests:** Compute resource requests per task. * **secret_requests:** Keys (ideally descriptive) that can identify the secrets supplied at runtime. For more information, see **Development cycle > Managing secrets**. * **ttl_seconds:** How long to keep the Actor alive while no tasks are being run. The following example shows how to create a basic `ActorEnvironment` and use it for one task: ```python # hello_world.py import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", replica_count=1, ttl_seconds=30, requests=union.Resources( cpu="2", mem="300Mi", ), container_image=image, ) @actor.task def say_hello() -> str: return "hello" @union.workflow def wf(): say_hello() ``` You can learn more about the trade-offs between actors and regular tasks, as well as the efficiency gains you can expect, in **Core concepts > Actors > Actors and regular tasks**. ## Caching on Actor Replicas The `@actor_cache` decorator provides a powerful mechanism to cache the results of Python callables on individual actor replicas. This is particularly beneficial for workflows involving repetitive tasks, such as data preprocessing, model loading, or initialization of shared resources, where caching can minimize redundant operations and improve overall efficiency. Once a callable is cached on a replica, subsequent tasks that use the same actor can access the cached result, significantly improving performance and efficiency. ### When to Use `@actor_cache` - **Shared Initialization Costs:** For expensive, shared initialization processes that multiple tasks rely on. - **Repetitive Task Execution:** When tasks repeatedly require the same resource or computation on the same actor replica. - **Complex Object Caching:** Use custom Python objects as keys to define unique cache entries. Below is a simplified example showcasing the use of `@actor_cache` for caching repetitive tasks. This dummy example demonstrates caching a model that is loaded by the `load_model` function. ```python # caching_basic.py from time import sleep import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", container_image=image, replica_count=1, ) @union.actor_cache def load_model(state: int) -> callable: sleep(4) # simulate model loading return lambda value: state + value @actor.task def evaluate(value: int, state: int) -> int: model = load_model(state=state) return model(value) @union.workflow def wf(init_value: int = 1, state: int = 3) -> int: out = evaluate(value=init_value, state=state) out = evaluate(value=out, state=state) out = evaluate(value=out, state=state) out = evaluate(value=out, state=state) return out ``` > [!NOTE] > To use the `@actor_cache` functionality, you must use `union` version `0.1.121` or later.
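Assuming the caching example above is saved as `caching_basic.py`, you could run it on Union.ai with a command like:

```shell
$ union run --remote caching_basic.py wf
```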
![Actor caching example 1](../../../_static/images/user-guide/core-concepts/actors/caching/actor-cache-example-1.png) You can see that the first call of `evaluate` took considerable time as it involves allocating a node for the task, creating a container, and loading the model. The subsequent calls of `evaluate` execute in a fraction of the time. You can see examples of more advanced actor usage in **Core concepts > Actors > Actor examples**. ## Subpages - **Core concepts > Actors > Actors and regular tasks** - **Core concepts > Actors > Actor examples** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/actors/actors-and-regular-tasks === # Actors and regular tasks When deciding whether to use actors or traditional tasks in your workflows, it's important to consider the benefits and trade-offs. This page outlines key scenarios where actors shine and where they may not be the best fit. | When to Use Actors | When Not to Use Actors | | ------------------ | ---------------------- | | **Short Running Tasks** Traditional tasks spin up a new container and pod for each task, which adds overhead. Actors allow tasks to run on the same container, removing the repeated cost of pod creation, image pulling, and initialization. Actors offer the most benefit for short running tasks where the startup overhead is a larger component of total task runtime. | **Long Running Tasks** For long running tasks, container initialization overhead is minimal, therefore the performance benefits of actors become negligible when task runtime significantly exceeds startup time. | | **Map Tasks with Large Input Arrays** Map tasks by default share the same image and resource definitions, making them a great use case for actors. Actors provide the greatest benefit when the input array is larger than the desired concurrency. For example, consider an input array with 2,000 entries and a concurrency level of 50. Without actors, map tasks would spin up 2,000 individual containers, one for each entry. With actors, only 50 containers are needed, corresponding to the number of replicas, dramatically reducing overhead. | **Map Tasks with Small Input Arrays** When the number of actor replicas matches the input array size, a map task with actors initializes the same number of pods and containers as a map task without actors. For example, if there are 10 inputs and 10 replicas, 10 pods are created, resulting in no reduction in overhead. | | **State Management and Efficient Initialization** Actors excel when state persistence between tasks is valuable. You can use `@actor_cache` to cache Python objects. For example, this lets you load a large model or dataset into memory once per replica, and access it across tasks run on that replica. You can also serve a model or initialize shared resources in an init container. Each task directed to that actor replica can then reuse the same model or resource. | **Strict Task Isolation Is Critical** While actors clear Python caches, global variables, and custom environment variables after each task, they still share the same container. The shared environment introduces edge cases where you could intentionally or unintentionally impact downstream tasks. For example, if you write to a file in one task, that file will remain mutated for the next task that is run on that actor replica. If strict isolation between tasks is a hard requirement, regular tasks provide a safer option.
| | **Shared Dependencies and Resources** If multiple tasks can use the same container image and have consistent resource requirements, actors are a natural fit. | | ## Efficiency Gains from Actors with Map Tasks Let's see how using Actors with map tasks can cut runtime in half! We compare three scenarios: 1. **Regular map tasks without specifying concurrency.** This is the fastest expected configuration, as Flyte will spawn as many pods as there are elements in the input array, allowing Kubernetes to manage scheduling based on available resources. 2. **Regular map tasks with fixed concurrency.** This limits the number of pods that are alive at any given time. 3. **Map tasks with Actors.** Here we set the number of replicas to match the concurrency of the previous example. This allows us to compare actors to vanilla map tasks both when speed is maximized and when live pods are matched one-to-one. ### "Hello World" Benchmark This benchmark simply runs a task that returns "Hello World", which is a near instantaneous task. | Task Type | Concurrency/Replicas | Duration (seconds) | | -------------- | -------------------- | ------------------ | | Without Actors | unbound | 111 | | Without Actors | 25 | 1195 | | With Actors | 25 | 42 | **Key Takeaway:** For near instantaneous tasks, using a 25-replica Actor with map tasks reduces runtime by 96% if live pods are matched, and 62% when map task concurrency is unbounded. ### "5s Sleep" Benchmark This benchmark simply runs a task that sleeps for five seconds. | Task Type | Concurrency/Replicas | Duration (seconds) | | -------------- | -------------------- | ------------------ | | Without Actors | unbound | 174 | | Without Actors | 100 | 507 | | With Actors | 100 | 87 | **Key Takeaway:** For five-second long tasks, using a 100-replica Actor with map tasks reduces runtime by 83% if live pods are matched, and 50% when map task concurrency is unbounded. If you have short running map tasks, you can cut your runtime in half. If you are already using concurrency limits on your map tasks, you can expect even better improvements! === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/actors/actor-examples === # Actor examples ## Refactoring from Regular Tasks to Actors Notice that converting a non-actor workflow to use actors is as simple as replacing the `@union.task` decorator with the `@actor.task` decorator. Additionally, task decorator arguments can be moved either to the actor environment or the actor task decorator, depending on whether they apply to the entire environment (e.g. resource specifications) or to a single task execution (e.g. caching arguments).
```diff import union + actor = union.ActorEnvironment( + name = "myenv", + replica_count = 10, + ttl_seconds = 120, + requests = union.Resources(mem="1Gi"), + container_image = "myrepo/myimage-with-scipy:latest", +) + - @union.task(requests=union.Resources(mem="1Gi")) + @actor.task def add_numbers(a: float, b: float) -> float: return a + b - @union.task(container_image="myrepo/myimage-with-scipy:latest") + @actor.task def calculate_distance(point_a: list[int], point_b: list[int]) -> float: from scipy.spatial.distance import euclidean return euclidean(point_a, point_b) - @union.task(cache=True, cache_version="v1") + @actor.task(cache=True, cache_version="v1") def is_even(number: int) -> bool: return number % 2 == 0 @union.workflow def distance_add_wf(point_a: list[int], point_b: list[int]) -> float: distance = calculate_distance(point_a=point_a, point_b=point_b) return add_numbers(a=distance, b=1.5) @union.workflow def is_even_wf(point_a: list[int]) -> list[bool]: return union.map(is_even)(number=point_a) ``` ## Multiple instances of the same task In this example, the `actor.task`-decorated task is invoked multiple times in one workflow, and will use the same `ActorEnvironment` on each invocation: ```python # plus_one.py import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", replica_count=1, ttl_seconds=300, requests=union.Resources(cpu="2", mem="500Mi"), container_image=image, ) @actor.task def plus_one(input: int) -> int: return input + 1 @union.workflow def wf(input: int = 0) -> int: a = plus_one(input=input) b = plus_one(input=a) c = plus_one(input=b) return plus_one(input=c) ``` ## Multiple tasks Every task execution in the following example will execute in the same `ActorEnvironment`. You can use the same environment for multiple tasks in the same workflow and tasks across workflow definitions, using both subworkflows and launch plans: ```python # multiple_tasks.py import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", replica_count=1, ttl_seconds=30, requests=union.Resources(cpu="1", mem="450Mi"), container_image=image, ) @actor.task def say_hello(name: str) -> str: return f"hello {name}" @actor.task def scream_hello(name: str) -> str: return f"HELLO {name}" @union.workflow def my_child_wf(name: str) -> str: return scream_hello(name=name) my_child_wf_lp = union.LaunchPlan.get_default_launch_plan(union.current_context(), my_child_wf) @union.workflow def my_parent_wf(name: str) -> str: a = say_hello(name=name) b = my_child_wf(name=a) return my_child_wf_lp(name=b) ``` ## Custom PodTemplates Both tasks in the following example will be executed in the same `ActorEnvironment`, which is created with a `PodTemplate` for additional configuration. 
```python # pod_template.py import os from kubernetes.client.models import ( V1Container, V1PodSpec, V1ResourceRequirements, V1EnvVar, ) import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union", "flytekitplugins-pod"], ) pod_template = union.PodTemplate( primary_container_name="primary", pod_spec=V1PodSpec( containers=[ V1Container( name="primary", image=image, resources=V1ResourceRequirements( requests={ "cpu": "1", "memory": "1Gi", }, limits={ "cpu": "1", "memory": "1Gi", }, ), env=[V1EnvVar(name="COMP_KEY_EX", value="compile_time")], ), ], ), ) actor = union.ActorEnvironment( name="my-actor", replica_count=1, ttl_seconds=30, pod_template=pod_template, ) @actor.task def get_and_set() -> str: os.environ["RUN_KEY_EX"] = "run_time" return os.getenv("COMP_KEY_EX") @actor.task def check_set() -> str: return os.getenv("RUN_KEY_EX") @union.workflow def wf() -> tuple[str,str]: return get_and_set(), check_set() ``` ## Example: `@actor_cache` with `map` With map tasks, each task is executed within the same environment, making actors a natural fit for this pattern. If a task has an expensive operation, like model loading, caching it with `@actor_cache` can improve performance. This example shows how to cache model loading in a mapped task to avoid redundant work and save resources. ```python # caching_map_task.py from functools import partial from pathlib import Path from time import sleep import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", container_image=image, replica_count=2, ) class MyModel: """Simple model that multiplies value with model_state.""" def __init__(self, model_state: int): self.model_state = model_state def __call__(self, value: int): return self.model_state * value @union.task(container_image=image, cache=True, cache_version="v1") def create_model_state() -> union.FlyteFile: working_dir = Path(union.current_context().working_directory) model_state_path = working_dir / "model_state.txt" model_state_path.write_text("4") return model_state_path @union.actor_cache def load_model(model_state_path: union.FlyteFile) -> MyModel: # Simulate model loading time. This can take a long time # because the FlyteFile download is large, or when the # model is loaded onto the GPU. sleep(10) with model_state_path.open("r") as f: model_state = int(f.read()) return MyModel(model_state=model_state) @actor.task def inference(value: int, model_state_path: union.FlyteFile) -> int: model = load_model(model_state_path) return model(value) @union.workflow def run_inference(values: list[int] = list(range(20))) -> list[int]: model_state = create_model_state() inference_ = partial(inference, model_state_path=model_state) return union.map(inference_)(value=values) ``` ## Example: Caching with Custom Objects Finally, we can cache custom objects by defining the `__hash__` and `__eq__` methods. These methods allow `@actor_cache` to determine if an object is the same between runs, ensuring that expensive operations are skipped if the object hasn't changed.
```python # caching_custom_object.py from time import sleep import os import union image = union.ImageSpec( registry=os.environ.get("DOCKER_REGISTRY", None), packages=["union"], ) actor = union.ActorEnvironment( name="my-actor", container_image=image, replica_count=1, ) class MyObj: def __init__(self, state: int): self.state = state def __hash__(self): return hash(self.state) def __eq__(self, other): return self.state == other.state @union.actor_cache def get_state(obj: MyObj) -> int: sleep(2) return obj.state @actor.task def construct_and_get_value(state: int) -> int: obj = MyObj(state=state) return get_state(obj) @union.workflow def wf(state: int = 2) -> int: value = construct_and_get_value(state=state) value = construct_and_get_value(state=value) value = construct_and_get_value(state=value) value = construct_and_get_value(state=value) return value ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts === # Artifacts Union.ai produces many intermediate outputs when running tasks and workflows. These outputs are stored internally in Union.ai and are accessible through the relevant executions, but are not usually directly accessible to users. The Artifact service indexes and adds semantic meaning to outputs of all Union.ai task and workflow executions, such as models, files, or any other kinds of data, enabling you to directly access, track, and orchestrate pipelines through the outputs themselves. Artifacts allow you to store additional metadata for these outputs in the form of **Core concepts > Artifacts > Partitions**, which are key-value pairs that describe the artifact and which can be used to query the Artifact Service to locate artifacts. Artifacts allow for loose coupling of workflows -- for example, a downstream workflow can be configured to consume the latest result of an upstream workflow. With this higher-order abstraction, Union.ai aims to ease collaboration across teams, provide for reactivity and automation, and give you a broader view of how artifacts move across executions. ## Versioning Artifacts are uniquely identified and versioned by the following information: * Project * Domain * Artifact name * Artifact version You can set an artifact's name in code when you **Core concepts > Artifacts > Declaring artifacts** and the artifact version is automatically generated when the artifact is materialized as part of any task or workflow execution that emits an artifact with this name. Any execution of a task or workflow that emits an artifact creates a new version of that artifact. ## Partitions When you declare an artifact, you can define partitions for it that enable semantic grouping of artifacts. Partitions are metadata that take the form of key-value pairs, with the keys defined at registration time and the values supplied at runtime. You can specify up to 10 partition keys for an artifact. You can set an optional partition called `time_partition` to capture information about the execution timestamp to your desired level of granularity. For more information, see **Core concepts > Artifacts > Declaring artifacts**. > [!NOTE] > The `time_partition` partition is not enabled by default. To enable it, set `time_partitioned=True` in the artifact declaration. > For more information, see the **Core concepts > Artifacts > Declaring artifacts > Time-partitioned artifact**. ## Queries To consume an artifact in a workflow, you can define a query containing the artifact's name as well as any required partition values.
You then supply the query as an input value to the workflow definition. At execution time, the query will return the most recent version of the artifact that meets the criteria by default. You can also query for a specific artifact version. For more information on querying for and consuming artifacts in workflows, see **Core concepts > Artifacts > Consuming artifacts in workflows**. To query for artifacts programmatically in a Python script using `UnionRemote`, see [UnionRemote](../../development-cycle/union-remote). > [!NOTE] `UnionRemote` vs `FlyteRemote` > `UnionRemote` is identical to `FlyteRemote`, with additional functionality to handle artifacts. > You cannot interact with artifacts using `FlyteRemote`. ## Lineage Once an artifact is materialized, its lineage is visible in the UI. For more information, see **Core concepts > Artifacts > Viewing artifacts**. ## Subpages - **Core concepts > Artifacts > Declaring artifacts** - **Core concepts > Artifacts > Materializing artifacts** - **Core concepts > Artifacts > Consuming artifacts in workflows** - **Core concepts > Artifacts > Connecting workflows with artifact event triggers** - **Core concepts > Artifacts > Viewing artifacts** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts/declaring-artifacts === # Declaring artifacts In order to define a task or workflow that emits an artifact, you must first declare the artifact and the keys for any **Core concepts > Artifacts > Partitions** you wish for it to have. For the `Artifact` class parameters and methods, see the [Artifact API documentation](). ## Basic artifact In the following example, an artifact called `BasicTaskData` is declared, along with a task that emits that artifact. Since it is a basic artifact, it doesn't have any partitions. > [!NOTE] > To use the example code on this page, you will need to add your `registry` > to the `pandas_image` ImageSpec block. ```python # basic.py import pandas as pd import union from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicTaskData = union.Artifact( name="my_basic_artifact" ) @union.task(container_image=pandas_image) def t1() -> Annotated[pd.DataFrame, BasicTaskData]: my_df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicTaskData.create_from(my_df) @union.workflow def wf() -> pd.DataFrame: return t1() ``` ## Time-partitioned artifact By default, time partitioning is not enabled for artifacts. To enable it, declare the artifact with `time_partitioned` set to `True`. You can optionally set the granularity for the time partition to `MINUTE`, `HOUR`, `DAY`, or `MONTH`; the default is `DAY`. You must also pass a value to `time_partition`, which you can do at runtime or by binding `time_partition` to an input. 
### Passing a value to `time_partition` at runtime ```python # time_partition_runtime.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Granularity from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact", time_partitioned=True, time_partition_granularity=Granularity.HOUR ) @union.task(container_image=pandas_image) def t1() -> Annotated[pd.DataFrame, BasicArtifact]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) dt = datetime.now() return BasicArtifact.create_from(df, time_partition=dt) @union.workflow def wf() -> pd.DataFrame: return t1() ``` ### Passing a value to `time_partition` by input ```python # time_partition_input.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Granularity from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact", time_partitioned=True, time_partition_granularity=Granularity.HOUR ) @union.task(container_image=pandas_image) def t1(date: datetime) -> Annotated[pd.DataFrame, BasicArtifact]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicArtifact.create_from(df, time_partition=date) @union.workflow def wf(run_date: datetime): return t1(date=run_date) ``` ## Artifact with custom partition keys You can specify up to 10 custom partition keys when declaring an artifact. Custom partition keys can be set at runtime or be passed as inputs. ### Passing a value to a custom partition key at runtime ```python # partition_keys_runtime.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Inputs, Granularity from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact", time_partitioned=True, time_partition_granularity=Granularity.HOUR, partition_keys=["key1"] ) @union.task(container_image=pandas_image) def t1( key1: str, date: datetime ) -> Annotated[pd.DataFrame, BasicArtifact(key1=Inputs.key1)]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicArtifact.create_from( df, time_partition=date ) @union.workflow def wf(): run_date = datetime.now() values = ["value1", "value2", "value3"] for value in values: t1(key1=value, date=run_date) ``` ### Passing a value to a custom partition key by input ```python # partition_keys_input.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Inputs, Granularity from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact", time_partitioned=True, time_partition_granularity=Granularity.HOUR, partition_keys=["key1"] ) @union.task(container_image=pandas_image) def t1( key1: str, dt: datetime ) -> Annotated[pd.DataFrame, BasicArtifact(key1=Inputs.key1)]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicArtifact.create_from( df, time_partition=dt, key1=key1 ) @union.workflow def wf(dt: datetime, val: str): t1(key1=val, dt=dt) ``` ## Artifact with model card example You can attach a model card with additional metadata to your artifact, formatted in Markdown: ```python # model_card.py import pandas as pd import union from union.artifacts import ModelCard from 
typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact(name="my_basic_artifact") def generate_md_contents(df: pd.DataFrame) -> str: contents = "# Dataset Card\n" "\n" "## Tabular Data\n" contents = contents + df.to_markdown() return contents @union.task(container_image=pandas_image) def t1() -> Annotated[pd.DataFrame, BasicArtifact]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicArtifact.create_from( df, ModelCard(generate_md_contents(df)) ) @union.workflow def wf(): t1() ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts/materializing-artifacts === # Materializing artifacts You can materialize an artifact by executing the task or workflow that emits the artifact. In the example below, to materialize the `BasicArtifact` artifact, the `t1` task must be executed. The `wf` workflow runs the `t1` task three times with different values for the `key1` partition each time. Note that each time `t1` is executed, it emits a new version of the `BasicArtifact` artifact. > [!NOTE] > To use the example code on this page, you will need to add your `registry` to the `pandas_image` ImageSpec block. ```python # partition_keys_runtime.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Inputs, Granularity from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact", time_partitioned=True, time_partition_granularity=Granularity.HOUR, partition_keys=["key1"] ) @union.task(container_image=pandas_image) def t1( key1: str, date: datetime ) -> Annotated[pd.DataFrame, BasicArtifact(key1=Inputs.key1)]: df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return BasicArtifact.create_from( df, time_partition=date ) @union.workflow def wf(): run_date = datetime.now() values = ["value1", "value2", "value3"] for value in values: t1(key1=value, date=run_date) ``` > [!NOTE] > You can also materialize an artifact by executing the `create_artifact` method of `UnionRemote`. > For more information, see the [UnionRemote documentation](../../development-cycle/union-remote). === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts/consuming-artifacts-in-workflows === # Consuming artifacts in workflows ## Defining a workflow that consumes an artifact You can define a workflow that consumes an artifact by defining a query and passing it as an input to the consuming workflow. The following code defines a query, `data_query`, that searches across all versions of `BasicArtifact` that match the partition values. This query binds parameters to the workflow's `key1` and `time_partition` inputs and returns the most recent version of the artifact. > [!NOTE] > To use the example code on this page, you will need to add your `registry` to the `pandas_image` ImageSpec block. 
```python # query.py from datetime import datetime import pandas as pd import union from flytekit.core.artifact import Inputs pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) BasicArtifact = union.Artifact( name="my_basic_artifact" ) @union.task(container_image=pandas_image) def t1(key1: str, dt: datetime, data: pd.DataFrame): print(f"key1: {key1}") print(f"Date: {dt}") print(f"Data retrieved from query: {data}") data_query = BasicArtifact.query( time_partition=Inputs.dt, key1=Inputs.key1, ) @union.workflow def query_wf( key1: str, dt: datetime, data: pd.DataFrame = data_query ): t1(key1=key1, dt=dt, data=data) ``` You can also directly reference a particular artifact version in a query using the `get()` method: ```python data = BasicArtifact.get(//BasicArtifact@) ``` > [!NOTE] > For a full list of Artifact class methods, see the [Artifact API documentation](). ## Launching a workflow that consumes an artifact To launch a workflow that consumes an artifact as one of its inputs, navigate to the workflow in the UI and click **Launch Workflow**: ![Launch workflow UI with artifact query](../../../_static/images/user-guide/core-concepts/artifacts/consuming-artifacts-in-workflows/launch-workflow-artifact-query.png) In the `query_wf` example, the workflow takes three inputs: `key1`, `dt`, and a `BasicArtifact` artifact query. In order to create the workflow execution, you would enter values for `key1` and `dt` and click **Launch**. The artifacts service will supply the latest version of the `BasicArtifact` artifact that meets the partition query criteria. You can also override the artifact query from the launch form by clicking **Override**, directly supplying the input that the artifact references (in this case, a blob store URI), and clicking **Launch**: ![Launch workflow UI with artifact query override](../../../_static/images/user-guide/core-concepts/artifacts/consuming-artifacts-in-workflows/launch-workflow-artifact-query-override.png) === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts/connecting-workflows-with-artifact-event-triggers === # Connecting workflows with artifact event triggers In the following example, we define an upstream workflow and a downstream workflow, and define a trigger (see **Core concepts > Launch plans > Reactive workflows**) in a launch plan to connect the two workflows via an artifact event (see **Core concepts > Launch plans > Reactive workflows > Trigger types > Artifact events**). ## Imports > [!NOTE] > To use the example code on this page, you will need to add your `registry` to the `pandas_image` ImageSpec block.
First we import the required packages: ```python from datetime import datetime import pandas as pd import union from union.artifacts import OnArtifact from flytekit.core.artifact import Inputs from typing_extensions import Annotated ``` ## Upstream artifact and workflow definition Then we define an upstream artifact and a workflow that emits a new version of `UpstreamArtifact` when executed: ```python UpstreamArtifact = union.Artifact( name="my_upstream_artifact", time_partitioned=True, partition_keys=["key1"], ) @union.task(container_image=pandas_image) def upstream_t1(key1: str) -> Annotated[pd.DataFrame, UpstreamArtifact(key1=Inputs.key1)]: dt = datetime.now() my_df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return UpstreamArtifact.create_from(my_df, key1=key1, time_partition=dt) @union.workflow def upstream_wf() -> pd.DataFrame: return upstream_t1(key1="value1") ``` ## Artifact event definition Next we define the artifact event that will link the upstream and downstream workflows together: ```python on_upstream_artifact = OnArtifact( trigger_on=UpstreamArtifact, ) ``` ## Downstream workflow definition Then we define the downstream task and workflow that will be triggered when the upstream artifact is created: ```python @union.task def downstream_t1(): print("Downstream task triggered") @union.workflow def downstream_wf(): downstream_t1() ``` ## Launch plan with trigger definition Finally, we create a launch plan with a trigger set to an `OnArtifact` object to link the two workflows via the `UpstreamArtifact` artifact. The trigger will initiate an execution of the downstream `downstream_wf` workflow upon the creation of a new version of the `UpstreamArtifact` artifact. ```python downstream_triggered = union.LaunchPlan.create( "downstream_with_trigger_lp", downstream_wf, trigger=on_upstream_artifact ) ``` > [!NOTE] > The `OnArtifact` object must be attached to a launch plan in order for the launch plan to be triggered by the creation of a new version of the artifact.
## Full example code Here is the full example code file: ```python # trigger_on_artifact.py from datetime import datetime import pandas as pd import union from union.artifacts import OnArtifact from flytekit.core.artifact import Inputs from typing_extensions import Annotated pandas_image = union.ImageSpec( packages=["pandas==2.2.2"] ) UpstreamArtifact = union.Artifact( name="my_upstream_artifact", time_partitioned=True, partition_keys=["key1"], ) @union.task(container_image=pandas_image) def upstream_t1(key1: str) -> Annotated[pd.DataFrame, UpstreamArtifact(key1=Inputs.key1)]: dt = datetime.now() my_df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]}) return UpstreamArtifact.create_from(my_df, key1=key1, time_partition=dt) @union.workflow def upstream_wf() -> pd.DataFrame: return upstream_t1(key1="value1") on_upstream_artifact = OnArtifact( trigger_on=UpstreamArtifact, ) @union.task def downstream_t1(): print("Downstream task triggered") @union.workflow def downstream_wf(): downstream_t1() downstream_triggered = union.LaunchPlan.create( "downstream_with_trigger_lp", downstream_wf, trigger=on_upstream_artifact ) ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/artifacts/viewing-artifacts === # Viewing artifacts ## Artifacts list Artifacts can be viewed in the UI by navigating to the artifacts app in the left sidebar: ![Artifacts overview](../../../_static/images/user-guide/core-concepts/artifacts/viewing-artifacts/artifacts-list.png) ## Artifact view Selecting a specific artifact from the artifact list will take you to that artifact's **Overview** page: ![Single artifact overview](../../../_static/images/user-guide/core-concepts/artifacts/viewing-artifacts/artifact-view.png) Here you can see relevant metadata about the artifact, including: * Its version * Its partitions * The task or workflow that produced it * Its creation time * Its object store URI * Code for accessing the artifact via [UnionRemote](../../development-cycle/union-remote) You can also view the artifact's object structure, model card, and lineage graph. ### Artifact lineage graph Once an artifact is materialized, you can view its lineage in the UI, including the specific upstream task or workflow execution that created it, and any downstream workflows that consumed it. You can traverse the lineage graph by clicking between artifacts and inspecting any relevant workflow executions in order to understand and reproduce any step in the AI development process. ![Artifact lineage overview](../../../_static/images/user-guide/core-concepts/artifacts/viewing-artifacts/artifact-lineage.png) You can navigate through the lineage graph by clicking from artifact to artifact. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving === # App Serving Union.ai lets you build and serve your own web apps, enabling you to build: - **Model endpoints** with generic web frameworks like FastAPI or optimized inference frameworks like vLLM and SGLang. - **AI inference-time** components like MCP servers, ephemeral agent memory state stores, etc. - **Interactive dashboards** and other interfaces to interact with and visualize data and models from your workflows using frameworks like Streamlit, Gradio, Tensorboard, FastHTML, Dash, Panel, Voila, FiftyOne. - **Flyte Connectors**, which are **Core concepts > App Serving > light-weight, long running services** that connect to external services like OpenAI, BigQuery, and Snowflake. 
- **Any other web services** like **Serving > Custom Webhooks** that can be implemented via web frameworks like FastAPI, Starlette. ## Example app We will start with a simple Streamlit app. In this case we will use the default Streamlit "Hello, World!" app. In a local directory, create the following file: ```shell └── app.py ``` ## App declaration The file `app.py` contains the app declaration: ```python """A simple Union.ai app using Streamlit""" import union import os # The `ImageSpec` for the container that will run the `App`. # `union-runtime` must be declared as a dependency, # in addition to any other dependencies needed by the app code. # Use Union remote Image builder to build the app container image image = union.ImageSpec( name="streamlit-app", packages=["union-runtime>=0.1.18", "streamlit==1.51.0"], builder="union" ) # The `App` declaration. # Uses the `ImageSpec` declared above. # In this case we do not need to supply any app code # as we are using the built-in Streamlit `hello` app. app = union.app.App( name="streamlit-hello", container_image=image, args="streamlit hello --server.port 8080", port=8080, limits=union.Resources(cpu="1", mem="1Gi"), ) ``` Here the `App` constructor is initialized with the following parameters: * `name`: The name of the app. This name will be displayed in app listings (via CLI and UI) and used to refer to the app when deploying and stopping. * `container_image`: The container image that will be used for the container that will run the app. Here we use a prebuilt container provided by Union.ai that supports Streamlit. * `args`: The command that will be used within the container to start the app. The individual strings in this array will be concatenated and then invoked as a single command. * `port`: The port of the app container from which the app will be served. * `limits`: A `union.Resources` object defining the resource limits for the app container. The same object is used for the same purpose in the `@union.task` decorator in Union.ai workflows. See **Core concepts > Tasks > Task hardware environment > Customizing task resources > The `requests` and `limits` settings** for details. The parameters above are the minimum needed to initialize the app. There are a few additional available parameters that we do not use in this example (but we will cover later): * `include`: A list of files to be added to the container at deployment time, containing the custom code that defines the specific functionality of your app. * `inputs`: A `List` of `union.app.Input` objects. Used to provide default inputs to the app on startup. * `requests`: A `union.Resources` object defining the resource requests for the app container. The same object is used for the same purpose in the `@union.task` decorator in Union.ai workflows (see **Core concepts > Tasks > Task hardware environment > Customizing task resources > The `requests` and `limits` settings** for details). * `min_replicas`: The minimum number of replica containers permitted for this app. This defines the lower bound for auto-scaling the app. The default is 0. * `max_replicas`: The maximum number of replica containers permitted for this app. This defines the upper bound for auto-scaling the app. The default is 1. ## Deploy the app Deploy the app with: ```shell $ union deploy apps APP_FILE APP_NAME ``` * `APP_FILE` is the Python file that contains one or more app declarations. * `APP_NAME` is the name of (one of) the declared apps in APP_FILE.
The name of an app is the value of the `name` parameter passed into the `App` constructor. If an app with the name `APP_NAME` does not yet exist on the system, then this command creates that app and starts it. If an app by that name already exists, then this command stops the app, updates its code, and restarts it. In this case, we do the following: ```shell $ union deploy apps app.py streamlit-hello ``` This will return output like the following: ```shell ✨ Creating Application: streamlit-hello Created Endpoint at: https://withered--firefly--8ca31.apps.demo.hosted.unionai.cloud/ ``` Click on the displayed endpoint to go to the app: ![A simple app](../../../_static/images/user-guide/core-concepts/serving/streamlit-hello.png) ## Viewing deployed apps Go to **Apps** in the left sidebar in Union.ai to see a list of all your deployed apps: ![Apps list](../../../_static/images/user-guide/core-concepts/serving/apps-list.png) To connect to an app, click on its **Endpoint**. To see more information about the app, click on its **Name**. This will take you to the **App view**: ![App view](../../../_static/images/user-guide/core-concepts/serving/app-view.png) Buttons to **Copy Endpoint** and **Start app** are available at the top of the view. You can also view all apps deployed in your Union.ai instance from the command-line with: ```shell $ union get apps ``` This will display the app list: ```shell ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━┓ ┃ Name ┃ Link ┃ Status ┃ Desired State ┃ CPU ┃ Memory ┃ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━┩ │ streamlit-query-2 │ Click Here │ Started │ Stopped │ 2 │ 2Gi │ │ streamlit-demo-1 │ Click Here │ Started │ Started │ 3 │ 2Gi │ │ streamlit-query-3 │ Click Here │ Started │ Started │ 2 │ 2Gi │ │ streamlit-demo │ Click Here │ Unassigned │ Started │ 2 │ 2Gi │ └───────────────────┴────────────┴────────────┴───────────────┴─────┴────────┘ ``` ## Stopping apps To stop an app from the command-line, run the following command: ```shell $ union stop apps --name APP_NAME ``` `APP_NAME` is the name of an app deployed on the Union.ai instance.
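For example, to stop the `streamlit-hello` app deployed earlier, you might run:

```shell
$ union stop apps --name streamlit-hello
```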
## Subpages - **Core concepts > App Serving > Serving custom code** - **Core concepts > App Serving > Serving a Model from a Workflow With FastAPI** - **Core concepts > App Serving > API Key Authentication with FastAPI** - **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact** - **Core concepts > App Serving > Deploy Optimized LLM Endpoints with vLLM and SGLang** - **Core concepts > App Serving > Deploying Custom Flyte Connectors** === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/adding-your-own-code === # Serving custom code In the introductory section we saw how to define and deploy a simple Streamlit app. The app deployed was the default hello world Streamlit example app. In this section, we will expand on this by adding our own custom code to the app. ## Example app We will initialize the app in `app.py` as before, but now we will add two files containing our own code, `main.py` and `utils.py`. In a local directory, create the following files: ```shell ├── app.py ├── main.py └── utils.py ``` ## App declaration The file `app.py` contains the app declaration: ```python """A Union.ai app with custom code""" import os import union # The `ImageSpec` for the container that will run the `App`. # `union-runtime` must be declared as a dependency, # in addition to any other dependencies needed by the app code. # Set the environment variable `REGISTRY` to be the URI for your container registry. # If you are using `ghcr.io` as your registry, make sure the image is public. image = union.ImageSpec( name="streamlit-app", packages=["streamlit==1.51.0", "union-runtime>=0.1.18", "pandas==2.2.3", "numpy==2.2.3"], builder="union" ) # The `App` declaration. # Uses the `ImageSpec` declared above. # The core logic of your app resides in the files declared # in the `include` parameter, in this case, `main.py` and `utils.py`. app = union.app.App( name="streamlit-custom-code", container_image=image, args="streamlit run main.py --server.port 8080", port=8080, include=["main.py", "utils.py"], limits=union.Resources(cpu="1", mem="1Gi"), ) ``` Compared to the first example we have added one more parameter: * `include`: A list of files to be added to the container at deployment time, containing the custom code that defines the specific functionality of your app. ## Custom code In this example we include two files containing custom logic: `main.py` and `utils.py`.
The file `main.py` contains the bulk of our custom code: ```python """Streamlit App that plots data""" import streamlit as st from utils import generate_data all_columns = ["Apples", "Orange", "Pineapple"] with st.container(border=True): columns = st.multiselect("Columns", all_columns, default=all_columns) all_data = st.cache_data(generate_data)(columns=all_columns, seed=101) data = all_data[columns] tab1, tab2 = st.tabs(["Chart", "Dataframe"]) tab1.line_chart(data, height=250) tab2.dataframe(data, height=250, use_container_width=True) ``` The file `utils.py` contains a supporting data generating function that is imported into the file above: ```python """Function to generate sample data.""" import numpy as np import pandas as pd def generate_data(columns: list[str], seed: int = 42): rng = np.random.default_rng(seed) data = pd.DataFrame(rng.random(size=(20, len(columns))), columns=columns) return data ``` ## Deploy the app Deploy the app with: ```shell $ union deploy apps app.py streamlit-custom-code ``` The output displays the console URL and endpoint for the Streamlit app: ```shell ✨ Deploying Application: streamlit-custom-code 🔎 Console URL: https:///org/... [Status] Pending: OutOfDate: The Configuration is still working to reflect the latest desired specification. [Status] Started: Service is ready 🚀 Deployed Endpoint: https://.apps. ``` Navigate to the endpoint to see the Streamlit App! ![Streamlit App](../../../_static/images/user-guide/core-concepts/serving/custom-code-streamlit.png) ## App deployment with included files When a new app is deployed for the first time (i.e., there is no app registered with the specified `name`), a container is spun up using the specified `container_image` and the files specified in `include` are copied into the container. The `args` command is then executed in the container, starting the app. If you alter the `include` code you need to re-deploy your app. When `union deploy apps` is called using an app name that corresponds to an already existing app, the app code is updated in the container and the app is restarted. You can iterate on your app easily by changing your `include` code and re-deploying. Because there is a slight performance penalty involved in copying the `include` files into the container, you may wish to consolidate your code directly into a custom-built image once you have successfully iterated to production quality. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/serving-a-model === # Serving a Model from a Workflow With FastAPI In this section, we create a Union.ai app to serve a scikit-learn model created by a Union.ai workflow using `FastAPI`. ## Example app In this example, we first use a Union.ai workflow to train a model and output it as a Union.ai `Artifact`. We then use a Union.ai app to serve the model using `FastAPI`. In a local directory, create the following files: ```shell ├── app.py ├── main.py └── train_wf.py ``` ## App configuration In the code below, we declare the resources, runtime image, and FastAPI app that exposes a `/predict` endpoint. ```python """A Union.ai app that uses FastAPI to serve a model created by a Union.ai workflow.""" import os import union import joblib from contextlib import asynccontextmanager from fastapi import FastAPI SklearnModel = union.Artifact(name="sklearn-model") # The `ImageSpec` for the container that will run the `App`, where `union-runtime` # must be declared as a dependency. In addition to any other dependencies needed # by the app code.
Set the environment variable `REGISTRY` to be the URI for your # container registry. If you are using `ghcr.io` as your registry, make sure the # image is public. image_spec = union.ImageSpec( name="union-serve-sklearn-fastapi", packages=["union-runtime>=0.1.18", "scikit-learn==1.5.2", "joblib==1.5.1", "fastapi[standard]"], builder="union" ) ml_models = {} @asynccontextmanager async def lifespan(app: FastAPI): model_file = os.getenv("SKLEARN_MODEL") ml_models["model"] = joblib.load(model_file) yield app = FastAPI(lifespan=lifespan) # The `App` declaration, which uses the `ImageSpec` declared above and serves the # FastAPI app via the `framework_app` parameter. Input artifacts are declared in the # `inputs` parameter fast_api_app = union.app.App( name="simple-fastapi-sklearn", inputs=[ union.app.Input( value=SklearnModel.query(), download=True, env_var="SKLEARN_MODEL", ) ], container_image=image_spec, framework_app=app, limits=union.Resources(cpu="1", mem="1Gi"), port=8082, ) @app.get("/predict") async def predict(x: float, y: float) -> float: result = ml_models["model"].predict([[x, y]]) return float(result[0]) ``` Note that the Artifact is provided as an `Input` to the App definition. With `download=True`, the model is downloaded to the container's working directory. The full local path to the model is set to `SKLEARN_MODEL` by the runtime. During startup, the FastAPI app loads the model using the `SKLEARN_MODEL` environment variable. Then it serves an endpoint at `/predict` that takes two float inputs and returns a float result. ## Training workflow The training workflow trains a random forest regression model and saves it to a Union.ai `Artifact`. ```python """A Union.ai workflow that trains a model.""" import os from pathlib import Path from typing import Annotated import joblib from sklearn.datasets import make_regression from sklearn.ensemble import RandomForestRegressor import union # Declare the `Artifact`. SklearnModel = union.Artifact(name="sklearn-model") # The `ImageSpec` for the container that runs the tasks. # Set the environment variable `REGISTRY` to be the URI for your container registry. # If you are using `ghcr.io` as your registry, make sure the image is public. image_spec = union.ImageSpec( packages=["scikit-learn==1.5.2", "joblib==1.5.1"], builder="union" ) # The `task` that trains a `RandomForestRegressor` model. @union.task( limits=union.Resources(cpu="2", mem="2Gi"), container_image=image_spec, ) def train_model() -> Annotated[union.FlyteFile, SklearnModel]: """Train a RandomForestRegressor model and save it as a file.""" X, y = make_regression(n_features=2, random_state=42) working_dir = Path(union.current_context().working_directory) model_file = working_dir / "model.joblib" rf = RandomForestRegressor().fit(X, y) joblib.dump(rf, model_file) return model_file ``` ## Run the example To run this example, you will need to register and run the training task first: ```shell $ union run --remote train_wf.py train_model ``` This task trains a `RandomForestRegressor`, saves it to a file, and uploads it to a Union.ai `Artifact`. This artifact is retrieved by the FastAPI app for serving the model.
![scikit-learn Artifact](../../../_static/images/user-guide/core-concepts/serving/fastapi-sklearn/sklearn-artifact.png)

Once the workflow has completed, you can deploy the app:

```shell
$ union deploy apps app.py simple-fastapi-sklearn
```

The output displays the console URL and endpoint for the FastAPI App:

```shell
✨ Deploying Application: simple-fastapi-sklearn
🔎 Console URL: https:///org/...
[Status] Pending: OutOfDate: The Configuration is still working to reflect the latest desired specification.
[Status] Pending: IngressNotConfigured: Ingress has not yet been reconciled.
[Status] Pending: Uninitialized: Waiting for load balancer to be ready
[Status] Started: Service is ready
🚀 Deployed Endpoint: https://.apps.
```

You can see the Swagger docs of the FastAPI endpoint by going to `/docs`:

![scikit-learn FastAPI App](../../../_static/images/user-guide/core-concepts/serving/fastapi-sklearn/sklearn-fastapi.png)

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/fast-api-auth ===

# API Key Authentication with FastAPI

In this guide, we'll deploy a FastAPI app that uses API key authentication. This allows you to invoke the endpoint from the public internet in a secure manner.

## Define the FastAPI app

First, we define the `ImageSpec` for the runtime image:

```python
import os

from union import ImageSpec, Resources, Secret
from union.app import App

image_spec = ImageSpec(
    name="fastapi-with-auth-image",
    builder="union",
    packages=["union-runtime>=0.1.18", "fastapi[standard]==0.115.11", "union>=0.1.150"],
)
```

Then we define a simple FastAPI app that uses `HTTPAuthorizationCredentials` to authenticate requests.

```python
import os
from typing import Annotated

import union
from fastapi import FastAPI, HTTPException, Security, status, Depends
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from union import UnionRemote

app = FastAPI()

fast_api_app = union.app.App(
    name="fastapi-with-auth",
    secrets=[
        union.Secret(key="AUTH_API_KEY", env_var="AUTH_API_KEY"),
        union.Secret(key="MY_UNION_API_KEY", env_var="UNION_API_KEY"),
    ],
    container_image=image_spec,
    framework_app=app,
    limits=union.Resources(cpu="1", mem="1Gi"),
    port=8082,
    requires_auth=False,
)


async def verify_token(
    credentials: HTTPAuthorizationCredentials = Security(HTTPBearer()),
) -> HTTPAuthorizationCredentials:
    auth_api_key = os.getenv("AUTH_API_KEY")
    if credentials.credentials != auth_api_key:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Could not validate credentials",
        )
    return credentials


@app.get("/")
def root(
    credentials: Annotated[HTTPAuthorizationCredentials, Depends(verify_token)],
):
    return {"message": "Hello, World!"}
```

As you can see, we define a `FastAPI` app and provide it as an input to the `union.app.App` definition. Then, we define a `verify_token` function that verifies the API key. Finally, we define a root endpoint that uses the `verify_token` function to authenticate requests.

Note that we are also requesting two secrets:

- The `AUTH_API_KEY` is used by the FastAPI app to authenticate the webhook.
- The `MY_UNION_API_KEY` is used to authenticate `UnionRemote` with Union.

With `requires_auth=False`, you can reach the endpoint without going through Union's authentication, which is okay since we are rolling our own authentication with `AUTH_API_KEY`.
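The example requests the `MY_UNION_API_KEY` secret so that `UnionRemote` can authenticate with Union from inside the app, but does not show it in use. Below is a minimal, illustrative sketch of what such a call could look like; the workflow name `workflows.example.wf`, its `name` input, and the project/domain values are placeholders rather than part of the original example.

```python
from union import UnionRemote


# Hypothetical endpoint: launch a registered workflow from inside the app.
# UnionRemote authenticates using the UNION_API_KEY environment variable,
# which is populated from the MY_UNION_API_KEY secret declared on the App.
@app.get("/launch")
def launch(
    credentials: Annotated[HTTPAuthorizationCredentials, Depends(verify_token)],
):
    remote = UnionRemote(default_project="flytesnacks", default_domain="development")
    wf = remote.fetch_workflow(name="workflows.example.wf")  # placeholder workflow
    execution = remote.execute(wf, inputs={"name": "Kermit"})
    return {"execution": execution.id.name}
```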
Before we can deploy the app, we create the secrets required by the application:

```bash
union create secret --name AUTH_API_KEY
```

Next, to create the `MY_UNION_API_KEY` secret, we first need to create an admin API key:

```bash
union create admin-api-key --name MY_UNION_API_KEY
```

## Deploy the FastAPI app

Finally, you can now deploy the FastAPI app:

```bash
union deploy apps app.py fastapi-with-auth
```

Deploying the application will stream the status to the console:

```
Image ghcr.io/.../webhook-serving:KXwIrIyoU_Decb0wgPy23A found. Skip building.
✨ Deploying Application: fastapi-with-auth
🔎 Console URL: https:///console/projects/thomasjpfan/domains/development/apps/fastapi-with-auth
[Status] Pending: App is pending deployment
[Status] Pending: RevisionMissing: Configuration "fastapi-with-auth" is waiting for a Revision to become ready.
[Status] Pending: IngressNotConfigured: Ingress has not yet been reconciled.
[Status] Pending: Uninitialized: Waiting for load balancer to be ready
[Status] Started: Service is ready
🚀 Deployed Endpoint: https://rough-meadow-97cf5.apps.
```

Then, to invoke the endpoint, you can use the following curl command:

```bash
curl -X GET "https://rough-meadow-97cf5.apps./" \
  -H "Authorization: Bearer "
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/cache-huggingface-model ===

# Cache a HuggingFace Model as an Artifact

This guide shows you how to cache HuggingFace models as Union Artifacts. The **Union CLI > `union` CLI commands > `cache` > `model-from-hf`** command allows you to automatically download and cache models from HuggingFace Hub as Union Artifacts. This is particularly useful for serving large language models (LLMs) and other AI models efficiently in production environments.

## Why Cache Models from HuggingFace?

Caching models from HuggingFace Hub as Union Artifacts provides several key benefits:

- **Faster Model Downloads**: Once cached, models load much faster since they're stored in Union's optimized blob storage.
- **Stream model weights into GPU memory**: Union's **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact > `SGLangApp`** and **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact > `VLLMApp`** classes also allow you to load model weights directly into GPU memory instead of downloading the weights to disk first, then loading to GPU memory.
- **Reliability**: Eliminates dependency on HuggingFace Hub availability during model serving.
- **Cost Efficiency**: Reduces repeated downloads and bandwidth costs from HuggingFace Hub.
- **Version Control**: Each cached model gets a unique artifact ID for reproducible deployments.
- **Sharding Support**: Large models can be automatically sharded for distributed inference.
- **Streaming**: Models can be streamed directly from blob storage to GPU memory.

## Prerequisites

Before using the `union cache model-from-hf` command, you need to set up authentication:

1. **Create a HuggingFace API Token**:
   - Go to [HuggingFace Settings](https://huggingface.co/settings/tokens)
   - Create a new token with read access
   - Store it as a Union secret:

   ```bash
   union create secret --name HUGGINGFACE_TOKEN
   ```

2.
**Create a Union API Key** (optional):

   ```bash
   union create api-key admin --name MY_API_KEY
   union create secret --name MY_API_KEY
   ```

   If you don't want to create a Union API key, you can use the `EAGER_API_KEY` secret that Union tenants typically ship with, which is an internally provisioned Union API key suitable for caching HuggingFace models.

## Basic Example: Cache a Model As-Is

The simplest way to cache a model is to download it directly from HuggingFace without any modifications:

```bash
union cache model-from-hf Qwen/Qwen2.5-0.5B-Instruct \
  --hf-token-key HUGGINGFACE_TOKEN \
  --union-api-key EAGER_API_KEY \
  --artifact-name qwen2-5-0-5b-instruct \
  --cpu 2 \
  --mem 8Gi \
  --ephemeral-storage 10Gi \
  --wait
```

### Command Breakdown

- `Qwen/Qwen2.5-0.5B-Instruct`: The HuggingFace model repository
- `--hf-token-key HUGGINGFACE_TOKEN`: Union secret containing your HuggingFace API token
- `--union-api-key EAGER_API_KEY`: Union secret with admin permissions
- `--artifact-name qwen2-5-0-5b-instruct`: Custom name for the cached artifact. If not provided, the model repository name is lower-cased and `.` characters are replaced with `-`.
- `--cpu 2`: CPU resources for downloading and caching
- `--mem 8Gi`: Memory resources for downloading and caching
- `--ephemeral-storage 10Gi`: Temporary storage for the download process
- `--wait`: Wait for the caching process to complete

### Output

When the command runs, you'll see output like this:

```
🔄 Started background process to cache model from Hugging Face repo Qwen/Qwen2.5-0.5B-Instruct.
Check the console for status at https://acme.union.ai/console/projects/flytesnacks/domains/development/executions/a5nr2g79xb9rtnzczqtp
```

You can then visit the URL to see the model caching workflow on the Union UI.

If you provide the `--wait` flag to the `union cache model-from-hf` command, the command will wait for the model to be cached and then output additional information:

```
Cached model at: /tmp/flyte-axk70dc8/sandbox/local_flytekit/50b27158c2bb42efef8e60622a4d2b6d/model_snapshot
Model Artifact ID: flyte://av0.2/acme/flytesnacks/development/qwen2-5-0-5b-instruct@322a60c7ba4df41621be528a053f3b1a

To deploy this model run:
union deploy model --project None --domain development flyte://av0.2/acme/flytesnacks/development/qwen2-5-0-5b-instruct@322a60c7ba4df41621be528a053f3b1a
```

## Using Cached Models in Applications

Once you have cached a model, you can use it in your Union serving apps:

### VLLM App Example

```python
import os

from union import Artifact, Resources
from union.app.llm import VLLMApp
from flytekit.extras.accelerators import L4

# Use the cached model artifact
Model = Artifact(name="qwen2-5-0-5b-instruct")

vllm_app = VLLMApp(
    name="vllm-app-3",
    requests=Resources(cpu="12", mem="24Gi", gpu="1"),
    accelerator=L4,
    model=Model.query(),  # Query the cached artifact
    model_id="qwen2",
    scaledown_after=300,
    stream_model=True,
    port=8084,
)
```

### SGLang App Example

```python
import os

from union import Artifact, Resources
from union.app.llm import SGLangApp
from flytekit.extras.accelerators import L4

# Use the cached model artifact
Model = Artifact(name="qwen2-5-0-5b-instruct")

sglang_app = SGLangApp(
    name="sglang-app-3",
    requests=Resources(cpu="12", mem="24Gi", gpu="1"),
    accelerator=L4,
    model=Model.query(),  # Query the cached artifact
    model_id="qwen2",
    scaledown_after=300,
    stream_model=True,
    port=8000,
)
```

## Advanced Example: Sharding a Model with the vLLM Engine

For large models that require distributed inference, you can use the `--shard-config`
option to automatically shard the model using the [vLLM](https://docs.vllm.ai/en/latest/) inference engine.

### Create a Shard Configuration File

Create a YAML file (e.g., `shard_config.yaml`) with the sharding parameters:

```yaml
engine: vllm
args:
  model: unsloth/Llama-3.3-70B-Instruct
  tensor_parallel_size: 4
  gpu_memory_utilization: 0.9
  extra_args:
    max_model_len: 16384
```

The `shard_config.yaml` file is a YAML file that should conform to the **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact > `remote.ShardConfig`** dataclass, where the `args` field contains configuration that's forwarded to the underlying inference engine. Currently, only the `vLLM` engine is supported for sharding, so the `args` field should conform to the **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact > `remote.VLLMShardArgs`** dataclass.

### Cache the Sharded Model

```bash
union cache model-from-hf unsloth/Llama-3.3-70B-Instruct \
  --hf-token-key HUGGINGFACE_TOKEN \
  --union-api-key EAGER_API_KEY \
  --artifact-name llama-3-3-70b-instruct-sharded \
  --cpu 36 \
  --gpu 4 \
  --mem 300Gi \
  --ephemeral-storage 300Gi \
  --accelerator nvidia-l40s \
  --shard-config shard_config.yaml \
  --project flytesnacks \
  --domain development \
  --wait
```

## Best Practices

1. **Resource Sizing** (when caching models without sharding): Allocate sufficient resources for the model size:
   - Small models (< 1B): 2-4 CPU, 4-8Gi memory
   - Medium models (1-7B): 4-8 CPU, 8-16Gi memory
   - Large models (7B+): 8+ CPU, 16Gi+ memory
2. **Sharding for Large Models**: Use tensor parallelism for models > 7B parameters:
   - 7-13B models: 2-4 GPUs
   - 13-70B models: 4-8 GPUs
   - 70B+ models: 8+ GPUs
3. **Storage Considerations**: Ensure sufficient ephemeral storage for the download process

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/deploy-optimized-llm-endpoints ===

# Deploy Optimized LLM Endpoints with vLLM and SGLang

This guide shows you how to deploy high-performance LLM endpoints using SGLang and vLLM. It also shows how to use Union's optimized serving images that are designed to reduce cold start times and provide efficient model serving capabilities.

For information on how to cache models from HuggingFace Hub as Union Artifacts, see the **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact** guide.

## Overview

Union provides two specialized app classes for serving high-performance LLM endpoints:

- **Core concepts > App Serving > Deploy Optimized LLM Endpoints with vLLM and SGLang > `SGLangApp`**: uses [SGLang](https://docs.sglang.ai/), a fast serving framework for large language models and vision language models.
- **Core concepts > App Serving > Deploy Optimized LLM Endpoints with vLLM and SGLang > `VLLMApp`**: uses [vLLM](https://docs.vllm.ai/en/latest/), a fast and easy-to-use library for LLM inference and serving.

By default, both classes provide:

- **Reduced cold start times** through optimized image loading.
- **Fast model loading** by streaming model weights directly from blob storage to GPU memory.
- **Distributed inference** with options for shared memory and tensor parallelism.

You can also serve models with other frameworks like **Core concepts > App Serving > Serving a Model from a Workflow With FastAPI**, but doing so would require more effort to achieve high performance, whereas vLLM and SGLang provide highly performant LLM endpoints out of the box.
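Both vLLM and SGLang expose an OpenAI-compatible HTTP API, so once one of the apps below is deployed you can exercise it with any OpenAI-compatible client. The following sketch is illustrative only: the base URL is a placeholder for the deployed endpoint printed by `union deploy apps`, the `model` value must match the `model_id` you configure, and the API key is only meaningful if you add API key authentication as described later in this guide.

```python
from openai import OpenAI

# Placeholder endpoint: substitute the "Deployed Endpoint" printed at deploy time.
client = OpenAI(
    base_url="https://<your-app-endpoint>/v1",
    api_key="<api-key-or-placeholder>",  # the client requires a value even if auth is disabled
)

response = client.chat.completions.create(
    model="qwen2",  # must match the `model_id` configured on the app
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```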
## Basic Example: Deploy a Non-Sharded Model ### Deploy with vLLM Assuming that you have followed the guide to **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact** and have a model artifact named `qwen2-5-0-5b-instruct`, you can deploy a simple LLM endpoint with the following code: ```python # vllm_app.py import union from union.app.llm import VLLMApp from flytekit.extras.accelerators import L4 # Reference the cached model artifact Model = union.Artifact(name="qwen2-5-0-5b-instruct") # Deploy with default image vllm_app = VLLMApp( name="vllm-app", requests=union.Resources(cpu="12", mem="24Gi", gpu="1"), accelerator=L4, model=Model.query(), # Query the cached artifact model_id="qwen2", scaledown_after=300, stream_model=True, # Enable streaming for faster loading port=8084, requires_auth=False, ) ``` To use the optimized image, use the `OPTIMIZED_VLLM_IMAGE` variable: ```python from union.app.llm import OPTIMIZED_VLLM_IMAGE vllm_app = VLLMApp( name="vllm-app", container_image=OPTIMIZED_VLLM_IMAGE, ... ) ``` Here we're using a single L4 GPU to serve the model and specifying `stream_model=True` to stream the model weights directly to GPU memory. Deploy the app: ```bash union deploy apps vllm_app.py vllm-app ``` ### Deploy with SGLang ```python # sglang_app.py import union from union.app.llm import SGLangApp from flytekit.extras.accelerators import L4 # Reference the cached model artifact Model = union.Artifact(name="qwen2-5-0-5b-instruct") # Deploy with default image sglang_app = SGLangApp( name="sglang-app", requests=union.Resources(cpu="12", mem="24Gi", gpu="1"), accelerator=L4, model=Model.query(), # Query the cached artifact model_id="qwen2", scaledown_after=300, stream_model=True, # Enable streaming for faster loading port=8000, requires_auth=False, ) ``` To use the optimized image, use the `OPTIMIZED_SGLANG_IMAGE` variable: ```python from union.app.llm import OPTIMIZED_SGLANG_IMAGE sglang_app = SGLangApp( name="sglang-app", container_image=OPTIMIZED_SGLANG_IMAGE, ... ) ``` Deploy the app: ```bash union deploy apps sglang_app.py sglang-app ``` ## Custom Image Example: Deploy with Your Own Image If you need more control over the serving environment, you can define a custom `ImageSpec`. For vLLM apps, that would look like this: ```python import union from union.app.llm import VLLMApp from flytekit.extras.accelerators import L4 # Reference the cached model artifact Model = union.Artifact(name="qwen2-5-0-5b-instruct") # Define custom optimized image image = union.ImageSpec( name="vllm-serving-custom", builder="union", apt_packages=["build-essential"], packages=["union[vllm]>=0.1.189"], env={ "NCCL_DEBUG": "INFO", "CUDA_LAUNCH_BLOCKING": "1", }, ) # Deploy with custom image vllm_app = VLLMApp( name="vllm-app-custom", container_image=image, ... ) ``` And for SGLang apps, it would look like this: ```python # sglang_app.py import union from union.app.llm import SGLangApp from flytekit.extras.accelerators import L4 # Reference the cached model artifact Model = union.Artifact(name="qwen2-5-0-5b-instruct") # Define custom optimized image image = union.ImageSpec( name="sglang-serving-custom", builder="union", python_version="3.12", apt_packages=["build-essential"], packages=["union[sglang]>=0.1.189"], ) # Deploy with custom image sglang_app = SGLangApp( name="sglang-app-custom", container_image=image, ... ) ``` This allows you to control the exact package versions in the image, but at the cost of increased cold start times. 
This is because the Union images are optimized with [Nydus](https://github.com/dragonflyoss/nydus), which reduces the cold start time by streaming container image layers. This allows the container to start before the image is fully downloaded.

## Advanced Example: Deploy a Sharded Model

For large models that require distributed inference, deploy using a sharded model artifact:

### Cache a Sharded Model

First, cache a large model with sharding (see **Core concepts > App Serving > Cache a HuggingFace Model as an Artifact > Advanced Example: Sharding a Model with the vLLM Engine** for details). Start by creating a shard configuration file:

```yaml
# shard_config.yaml
engine: vllm
args:
  model: unsloth/Llama-3.3-70B-Instruct
  tensor_parallel_size: 4
  gpu_memory_utilization: 0.9
  extra_args:
    max_model_len: 16384
```

Then cache the model:

```bash
union cache model-from-hf unsloth/Llama-3.3-70B-Instruct \
  --hf-token-key HUGGINGFACE_TOKEN \
  --union-api-key EAGER_API_KEY \
  --artifact-name llama-3-3-70b-instruct-sharded \
  --cpu 36 \
  --gpu 4 \
  --mem 300Gi \
  --ephemeral-storage 300Gi \
  --accelerator nvidia-l40s \
  --shard-config shard_config.yaml \
  --project flytesnacks \
  --domain development \
  --wait
```

### Deploy with VLLMApp

Once the model is cached, you can deploy it to a vLLM app:

```python
# vllm_app_sharded.py
from flytekit.extras.accelerators import L40S
from union import Artifact, Resources
from union.app.llm import VLLMApp

# Reference the sharded model artifact
LLMArtifact = Artifact(name="llama-3-3-70b-instruct-sharded")

# Deploy sharded model with optimized configuration
vllm_app = VLLMApp(
    name="vllm-app-sharded",
    requests=Resources(
        cpu="36",
        mem="300Gi",
        gpu="4",
        ephemeral_storage="300Gi",
    ),
    accelerator=L40S,
    model=LLMArtifact.query(),
    model_id="llama3",
    # Additional arguments to pass into the vLLM engine:
    # see https://docs.vllm.ai/en/stable/serving/engine_args.html
    # or run `vllm serve --help` to see all available arguments
    extra_args=[
        "--tensor-parallel-size", "4",
        "--gpu-memory-utilization", "0.8",
        "--max-model-len", "4096",
        "--max-num-seqs", "256",
        "--enforce-eager",
    ],
    env={
        "NCCL_DEBUG": "INFO",
        "CUDA_LAUNCH_BLOCKING": "1",
        "VLLM_SKIP_P2P_CHECK": "1",
    },
    shared_memory=True,  # Enable shared memory for multi-GPU
    scaledown_after=300,
    stream_model=True,
    port=8084,
    requires_auth=False,
)
```

Then deploy the app:

```bash
union deploy apps vllm_app_sharded.py vllm-app-sharded
```

### Deploy with SGLangApp

You can also deploy the sharded model to an SGLang app:

```python
import os

from flytekit.extras.accelerators import GPUAccelerator
from union import Artifact, Resources
from union.app.llm import SGLangApp

# Reference the sharded model artifact
LLMArtifact = Artifact(name="llama-3-3-70b-instruct-sharded")

# Deploy sharded model with SGLang
sglang_app = SGLangApp(
    name="sglang-app-sharded",
    requests=Resources(
        cpu="36",
        mem="300Gi",
        gpu="4",
        ephemeral_storage="300Gi",
    ),
    accelerator=GPUAccelerator("nvidia-l40s"),
    model=LLMArtifact.query(),
    model_id="llama3",
    # Additional arguments to pass into the SGLang engine:
    # See https://docs.sglang.ai/backend/server_arguments.html for details.
    extra_args=[
        "--tensor-parallel-size", "4",
        "--mem-fraction-static", "0.8",
    ],
    env={
        "NCCL_DEBUG": "INFO",
        "CUDA_LAUNCH_BLOCKING": "1",
    },
    shared_memory=True,
    scaledown_after=300,
    stream_model=True,
    port=8084,
    requires_auth=False,
)
```

Then deploy the app:

```bash
union deploy apps sglang_app_sharded.py sglang-app-sharded
```

## Authentication via API Key

To secure your `SGLangApp`s and `VLLMApp`s with API key authentication, you can specify a secret in the `extra_args` parameter.

First, create a secret:

```bash
union create secret --name AUTH_SECRET
```

Add the secret value to the input field and save the secret.

Then, add the secret to the `extra_args` parameter. For SGLang, do the following:

```python
from union import Secret

sglang_app = SGLangApp(
    name="sglang-app",
    ...,
    # Disable Union's platform-level authentication so you can access the
    # endpoint on the public internet
    requires_auth=False,
    secrets=[Secret(key="AUTH_SECRET", env_var="AUTH_SECRET")],
    extra_args=[
        ...,
        "--api-key", "$AUTH_SECRET",  # Use the secret in the extra_args
    ],
)
```

And similarly for vLLM, do the following:

```python
from union import Secret

vllm_app = VLLMApp(
    name="vllm-app",
    ...,
    # Disable Union's platform-level authentication so you can access the
    # endpoint on the public internet
    requires_auth=False,
    secrets=[Secret(key="AUTH_SECRET", env_var="AUTH_SECRET")],
    extra_args=[
        ...,
        "--api-key", "$AUTH_SECRET",  # Use the secret in the extra_args
    ],
)
```

## Performance Tuning

You can refer to the corresponding documentation for vLLM and SGLang for more information on how to tune the performance of your app.

- **vLLM**: see the [optimization and tuning](https://docs.vllm.ai/en/latest/configuration/optimization.html) and [engine arguments](https://docs.vllm.ai/en/latest/configuration/engine_args.html) pages to learn about how to tune the performance of your app. You can also look at the [distributed inference and serving](https://docs.vllm.ai/en/latest/serving/distributed_serving.html) page to learn more about distributed inference.
- **SGLang**: see the [environment variables](https://docs.sglang.ai/references/environment_variables.html#performance-tuning) and [server arguments](https://docs.sglang.ai/backend/server_arguments.html) pages to learn about all of the available serving options in SGLang.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/serving/deploying-your-connector ===

# Deploying Custom Flyte Connectors

**Core concepts > App Serving > Deploying Custom Flyte Connectors > Flyte connectors** allow you to extend Union's capabilities by integrating with external services. This guide explains how to deploy custom connectors that can be used in your Flyte workflows.

## Overview

Connectors enable your workflows to interact with third-party services or systems. Union.ai supports deploying connectors as services using the `FlyteConnectorApp` class.

You can deploy connectors in two ways:

1. **Module-based deployment**: Include your connector code directly in the deployment
2. **ImageSpec-based deployment**: Use pre-built images with connectors already installed

## Prerequisites

Before deploying a connector, ensure you have:

- A Union.ai account
- Any required API keys or credentials for your connector
- Docker registry access (if using custom images)

## Connector Deployment Options

### Module-based Deployment

Module-based deployment is ideal when you want to iterate quickly on connector development.
With this approach, you include your connector code directly using the `include` parameter.

```python
# app.py
from union import ImageSpec, Resources, Secret
from union.app import FlyteConnectorApp

image = ImageSpec(
    name="flyteconnector",
    packages=[
        "flytekit[connector]",
        "union",
        "union-runtime",
        "openai",  # ChatGPT connector needs openai SDK
    ],
    env={"FLYTE_SDK_LOGGING_LEVEL": "10"},
    builder="union",
)

openai_connector_app = FlyteConnectorApp(
    name="openai-connector-app",
    container_image=image,
    secrets=[Secret(key="flyte_openai_api_key")],
    limits=Resources(cpu="1", mem="1Gi"),
    include=["./chatgpt"],  # Include the connector module directory
)
```

With this approach, you organize your connector code in a module structure:

```bash
chatgpt/
├── __init__.py
├── connector.py
└── constants.py
```

The `include` parameter takes a list of files or directories to include in the deployment.

### ImageSpec-based Deployment

ImageSpec-based deployment is preferred for production environments where you have stable connector implementations. In this approach, your connector code is pre-installed in a container image.

```python
# app.py
from union import ImageSpec, Resources, Secret
from union.app import FlyteConnectorApp

image = ImageSpec(
    name="flyteconnector",
    packages=[
        "flytekit[connector]",
        "flytekitplugins-slurm",
        "union",
        "union-runtime",
    ],
    apt_packages=["build-essential", "libmagic1", "vim", "openssh-client", "ca-certificates"],
    env={"FLYTE_SDK_LOGGING_LEVEL": "10"},
    builder="union",
)

slurm_connector_app = FlyteConnectorApp(
    name="slurm-connector-app",
    container_image=image,
    secrets=[Secret(key="flyte_slurm_private_key")],
    limits=Resources(cpu="1", mem="1Gi"),
)
```

## Managing Secrets

Most connectors require credentials to authenticate with external services. Union.ai allows you to manage these securely:

```bash
# Create a secret for OpenAI API key
union create secret flyte_openai_api_key -f /etc/secrets/flyte_openai_api_key --project flytesnacks --domain development

# Create a secret for SLURM access
union create secret flyte_slurm_private_key -f /etc/secrets/flyte_slurm_private_key --project flytesnacks --domain development
```

Reference these secrets in your connector app:

```python
from union import Secret

# In your app definition
secrets=[Secret(key="flyte_openai_api_key")]
```

Inside your connector code, access these secrets using:

```python
from flytekit.extend.backend.utils import get_connector_secret

api_key = get_connector_secret(secret_key="FLYTE_OPENAI_API_KEY")
```

## Example: Creating a ChatGPT Connector

Here's how to implement a ChatGPT connector:

1.
Create a connector class: ```python # chatgpt/connector.py import asyncio import logging from typing import Optional import openai from flyteidl.core.execution_pb2 import TaskExecution from flytekit import FlyteContextManager from flytekit.core.type_engine import TypeEngine from flytekit.extend.backend.base_connector import ConnectorRegistry, Resource, SyncConnectorBase from flytekit.extend.backend.utils import get_connector_secret from flytekit.models.literals import LiteralMap from flytekit.models.task import TaskTemplate from .constants import OPENAI_API_KEY, TIMEOUT_SECONDS class ChatGPTConnector(SyncConnectorBase): name = "ChatGPT Connector" def __init__(self): super().__init__(task_type_name="chatgpt") async def do( self, task_template: TaskTemplate, inputs: Optional[LiteralMap] = None, **kwargs, ) -> Resource: ctx = FlyteContextManager.current_context() input_python_value = TypeEngine.literal_map_to_kwargs(ctx, inputs, {"message": str}) message = input_python_value["message"] custom = task_template.custom custom["chatgpt_config"]["messages"] = [{"role": "user", "content": message}] client = openai.AsyncOpenAI( organization=custom["openai_organization"], api_key=get_connector_secret(secret_key=OPENAI_API_KEY), ) logger = logging.getLogger("httpx") logger.setLevel(logging.WARNING) completion = await asyncio.wait_for(client.chat.completions.create(**custom["chatgpt_config"]), TIMEOUT_SECONDS) message = completion.choices[0].message.content outputs = {"o0": message} return Resource(phase=TaskExecution.SUCCEEDED, outputs=outputs) ConnectorRegistry.register(ChatGPTConnector()) ``` 2. Define constants: ```python # chatgpt/constants.py # Constants for ChatGPT connector TIMEOUT_SECONDS = 10 OPENAI_API_KEY = "FLYTE_OPENAI_API_KEY" ``` 3. Create an `__init__.py` file: ```python # chatgpt/__init__.py from .connector import ChatGPTConnector __all__ = ["ChatGPTConnector"] ``` ## Using the Connector in a Workflow After deploying your connector, you can use it in your workflows: ```python # workflow.py from flytekit import workflow from flytekitplugins.openai import ChatGPTTask chatgpt_small_job = ChatGPTTask( name="3.5-turbo", chatgpt_config={ "model": "gpt-3.5-turbo", "temperature": 0.7, }, ) chatgpt_big_job = ChatGPTTask( name="gpt-4", chatgpt_config={ "model": "gpt-4", "temperature": 0.7, }, ) @workflow def wf(message: str) -> str: message = chatgpt_small_job(message=message) message = chatgpt_big_job(message=message) return message ``` Run the workflow: ```bash union run --remote workflow.py wf --message "Tell me about Union.ai" ``` ## Creating Your Own Connector To create a custom connector: 1. Inherit from `SyncConnectorBase` or `AsyncConnectorBase` 2. Implement the required methods (`do` for synchronous connectors, `create`, `get`, and `delete` for asynchronous connectors) 3. Register your connector with `ConnectorRegistry.register(YourConnector())` 4. Deploy your connector using one of the methods above ## Deployment Commands Deploy your connector app: ```bash # Module-based deployment union deploy apps app_module_deployment/app.py openai-connector-app # ImageSpec-based deployment union deploy apps app_image_spec_deployment/app.py slurm-connector-app ``` ## Best Practices 1. **Security**: Never hardcode credentials; always use Union.ai secrets 2. **Error Handling**: Include robust error handling in your connector implementation 3. **Timeouts**: Set appropriate timeouts for external API calls 4. **Logging**: Implement detailed logging for debugging 5. 
**Testing**: Test your connector thoroughly before deploying to production

By following this guide, you can create and deploy custom connectors that extend Union.ai's capabilities to integrate with any external service or system your workflows need to interact with.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/caching ===

# Caching

Union.ai allows you to cache the output of nodes (**Core concepts > Tasks**, **Core concepts > Workflows > Subworkflows and sub-launch plans**) to make subsequent executions faster. Caching is useful when many executions of identical code with the same input may occur.

Here's a video with a brief explanation and demo, focused on task caching:

📺 [Watch on YouTube](https://www.youtube.com/watch?v=WNkThCp-gqo)

> [!NOTE]
> * Caching is available and can be individually enabled for all nodes *within* a workflow directed acyclic graph (DAG).
> * Nodes in this sense include tasks, subworkflows (workflows called directly within another workflow), and sub-launch plans (launch plans called within a workflow).
> * Caching is *not available* for top-level workflows or launch plans (that is, those invoked from UI or CLI).
> * By default, caching is *disabled* on all tasks, subworkflows and sub-launch plans, to avoid unintended consequences when caching executions with side effects. It must be explicitly enabled on any node where caching is desired.

## Enabling and configuring caching

Caching can be enabled by setting the `cache` parameter of the `@union.task` decorator (for tasks) or `with_overrides` method (for subworkflows or sub-launch plans) to a `Cache` object. The parameters of the `Cache` object are used to configure the caching behavior. For example:

```python
import union


# Define a task and enable caching for it
@union.task(cache=union.Cache(version="1.0", serialize=True, ignored_inputs=["a"]))
def sum(a: int, b: int, c: int) -> int:
    return a + b + c


# Define a workflow to be used as a subworkflow
@union.workflow
def child_wf(a: int, b: int, c: int) -> list[int]:
    return [
        sum(a=a, b=b, c=c)
        for _ in range(5)
    ]


# Define a launch plan to be used as a sub-launch plan
child_lp = union.LaunchPlan.get_or_create(child_wf)


# Define a parent workflow that uses the subworkflow
@union.workflow
def parent_wf_with_subwf(input: int = 0):
    return [
        # Enable caching on the subworkflow
        child_wf(a=input, b=3, c=4).with_overrides(cache=union.Cache(version="1.0", serialize=True, ignored_inputs=["a"]))
        for i in [1, 2, 3]
    ]


# Define a parent workflow that uses the sub-launch plan
@union.workflow
def parent_wf_with_sublp(input: int = 0):
    return [
        child_lp(a=input, b=1, c=2).with_overrides(cache=union.Cache(version="1.0", serialize=True, ignored_inputs=["a"]))
        for i in [1, 2, 3]
    ]
```

In the above example, caching is enabled at multiple levels:

* At the task level, in the `@union.task` decorator of the task `sum`.
* At the workflow level, in the `with_overrides` method of the invocation of the workflow `child_wf`.
* At the launch plan level, in the `with_overrides` method of the invocation of the launch plan `child_lp`.

In each case, the result of the execution is cached and reused in subsequent executions. Here the reuse is demonstrated by calling the `child_wf` and `child_lp` workflows multiple times with the same inputs. Additionally, if the same node is invoked again with the same inputs (excluding input `a`, as it is ignored when computing the cache key), the cached result is returned immediately instead of re-executing the process.
This applies even if the cached node is invoked externally through the UI or CLI.

## The `Cache` object

The `Cache` object takes the following parameters:

* `version` (`str`): An explicit version string that forms part of the cache key. Changing it invalidates existing cache entries for the node (see **Explicit cache version** below).
* `serialize` (`bool`): Enables or disables **Core concepts > Caching > Cache serialization**. When enabled, Union.ai ensures that a single instance of the node is run before any other instances that would otherwise run concurrently. This allows the initial instance to cache its result and lets the later instances reuse the resulting cached outputs. If not set, cache serialization is disabled.
* `ignored_inputs` (`Union[Tuple[str, ...], str]`): Input variables that should not be included when calculating the hash for the cache. If not set, no inputs are ignored.
* `salt` (`str`): A salt used in the hash generation. A salt is a random value that is combined with the input values before hashing.

## The `overwrite-cache` flag

When launching the execution of a workflow, launch plan or task, you can use the `overwrite-cache` flag to invalidate the cache and force re-execution.

### Overwrite cache on the command line

The `overwrite-cache` flag can be used from the command line with the `union run` command. For example:

```shell
$ union run --remote --overwrite-cache example.py wf
```

### Overwrite cache in the UI

You can also trigger cache invalidation when launching an execution from the UI by checking **Override** in the launch dialog:

![Overwrite cache flag in the UI](../../_static/images/user-guide/core-concepts/caching/overwrite-cached-outputs.png)

### Overwrite cache programmatically

When using `UnionRemote`, you can use the `overwrite_cache` parameter in the `UnionRemote.execute` method:

```python
from flytekit.configuration import Config
from union.remote import UnionRemote

remote = UnionRemote(
    config=Config.auto(), default_project="flytesnacks", default_domain="development"
)
wf = remote.fetch_workflow(name="workflows.example.wf")
execution = remote.execute(wf, inputs={"name": "Kermit"}, overwrite_cache=True)
```

## How caching works

When a node (with caching enabled) completes on Union.ai, a **key-value entry** is created in the **caching table**. The **value** of the entry is the output. The **key** is composed of:

* **Project:** A task run under one project cannot use the cached task execution from another project. Sharing cached results across projects could cause inadvertent interactions between project teams and even data corruption.
* **Domain:** To separate test, staging, and production data, task executions are not shared across these environments.
* **Node signature:** The cache is specific to the signature associated with the execution. The signature comprises the name, input parameter names/types, and the output parameter name/type of the node. If the signature changes, the cache entry is invalidated.
* **Input values:** A well-formed Union.ai node always produces deterministic outputs. This means that, given a set of input values, every execution should have identical outputs. When an execution is cached, the input values are part of the cache key. If a node is run with a new set of inputs, a new cache entry is created for the combination of that particular entity with those particular inputs.

The result is that within a given project and domain, a cache entry is created for each distinct combination of name, signature, cache version, and input set for every node that has caching enabled. If the same node with the same input values is encountered again, the cached output is used instead of running the process again.
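To make the key composition concrete, here is a minimal illustrative sketch (not taken from the examples above) of the same cached task being called with repeated and with new inputs inside one workflow:

```python
import union


# Illustrative task: the same inputs reuse the cached output, a new input does not.
@union.task(cache=union.Cache(version="1.0"))
def square(n: int) -> int:
    return n * n


@union.workflow
def cache_demo_wf() -> int:
    first = square(n=2)   # executes and caches under (project, domain, signature, version "1.0", n=2)
    second = square(n=2)  # identical key: cache hit, the stored output is reused
    third = square(n=3)   # different input value: new cache entry, the task runs again
    return third
```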
### Explicit cache version

When a change to code is made that should invalidate the cache for that node, you can explicitly indicate this by incrementing the `version` parameter value. For a task example, see below. (For workflows and launch plans, the parameter would be specified in the `with_overrides` method.)

```python
@union.task(cache=union.Cache(version="1.1"))
def t(n: int) -> int:
    return n * n + 1
```

Here the `version` parameter has been bumped from `1.0` to `1.1`, invalidating the existing cache. The next time the task is called it will be executed and the result re-cached under an updated key. However, if you change the version back to `1.0`, you will get a "cache hit" again and skip the execution of the task code.

If used, the `version` parameter must be explicitly changed in order to invalidate the cache. Not every Git revision of a node will necessarily invalidate the cache. A change in Git SHA does not necessarily correlate to a change in functionality. You can refine your code without invalidating the cache as long as you explicitly use, and don't change, the `version` parameter (or the signature, see below) of the node.

The idea behind this is to decouple cosmetic changes (for example, changed documentation or renamed variables) from changes to logic that can affect the process's result. When you use Git (or any version control system), you have a new version per code change. Since the behavior of most nodes in a Git repository will remain unchanged, you don't want their cached outputs to be lost. When a node's behavior does change though, you can bump `version` to invalidate the cache entry and make the system recompute the outputs.

### Node signature

If you modify the signature of a node by adding, removing, or editing input parameters or output return types, Union.ai invalidates the cache entries for that node. During the next execution, Union.ai executes the process again and caches the outputs as new values stored under an updated key.

### Caching when running locally

The description above applies to caching when executing a node remotely on your Union.ai cluster. Caching is also available when running locally (see **Development cycle > Running in a local cluster**).

When running locally the caching mechanism is the same except that the cache key does not include **project** or **domain** (since there are none). The cache key is composed only of **cache version**, **signature**, and **inputs**. The results of local executions are stored under `~/.flyte/local-cache/`.

Similar to the remote case, a local cache entry for a node will be invalidated if either the `cache_version` or the signature is modified. In addition, the local cache can also be emptied by running:

```shell
$ union local-cache clear
```

This removes the contents of the `~/.flyte/local-cache/` directory.

Occasionally, you may want to disable the local cache for testing purposes, without making any code changes to your task decorators. You can set the `FLYTE_LOCAL_CACHE_ENABLED` environment variable to `false` in your terminal in order to bypass caching temporarily.

## Cache serialization

Cache serialization means only executing a single instance of a unique cacheable task (determined by the `cache_version` parameter and task signature) at a time. Using this mechanism, Union.ai ensures that during multiple concurrent executions of a task only a single instance is evaluated, and all others wait until completion and reuse the resulting cached outputs.
Ensuring serialized evaluation requires a small degree of overhead to coordinate executions using a lightweight artifact reservation system. Therefore, this should be viewed as an extension to, rather than a replacement for, non-serialized cacheable tasks. It is particularly well suited for long-running or otherwise computationally expensive tasks executed in scenarios similar to the following examples:

* Periodically scheduled workflows where a single task evaluation duration may span multiple scheduled executions.
* Running a commonly shared task within different workflows (which receive the same inputs).

### Enabling cache serialization

Task cache serialization is disabled by default to avoid unexpected behavior for task executions. To enable it, set `serialize=True` in the `@union.task` decorator. The cache key definitions follow the same rules as non-serialized cache tasks.

```python
@union.task(cache=union.Cache(version="1.1", serialize=True))
def t(n: int) -> int:
    return n * n
```

In the above example, calling `t(n=2)` multiple times concurrently (even in different executions or workflows) will only execute the multiplication operation once. Concurrently evaluated tasks will wait for completion of the first instance before reusing the cached results, and subsequent evaluations will instantly reuse existing cache results.

### How does cache serialization work?

The cache serialization paradigm introduces a new artifact reservation system. Executions with cache serialization enabled use this reservation system to acquire an artifact reservation, indicating that they are actively evaluating a node, and release the reservation once the execution is completed. Union.ai uses a clock-skew algorithm to define reservation timeouts. Therefore, executions are required to periodically extend the reservation during their run.

The first execution of a serializable node will successfully acquire the artifact reservation. Execution will be performed as usual and upon completion, the results are written to the cache, and the reservation is released. Concurrently executed node instances (those that would otherwise run in parallel with the initial execution) will observe an active reservation, in which case these instances will wait until the next reevaluation and perform another check. Once the initial execution completes, they will reuse the cached results as will any subsequent instances of the same node.

Union.ai handles execution failures using a timeout on the reservation. If the execution currently holding the reservation fails to extend it before it times out, another execution may acquire the reservation and begin processing.

## Caching of offloaded objects

In some cases, the default behavior displayed by Union.ai's caching feature might not match the user's intuition. For example, this code makes use of pandas dataframes:

```python
@union.task
def foo(a: int, b: str) -> pandas.DataFrame:
    df = pandas.DataFrame(...)
    ...
    return df


@union.task(cache=True)
def bar(df: pandas.DataFrame) -> int:
    ...


@union.workflow
def wf(a: int, b: str):
    df = foo(a=a, b=b)
    v = bar(df=df)
```

If run twice with the same inputs, one would expect that `bar` would trigger a cache hit, but that's not the case because of the way dataframes are represented in Union.ai. However, Union.ai provides a way to control the caching behavior of literals. This is done via a `typing.Annotated` call on the node signature.
For example, in order to cache the result of calls to `bar`, you can rewrite the code above like this:

```python
def hash_pandas_dataframe(df: pandas.DataFrame) -> str:
    return str(pandas.util.hash_pandas_object(df))


@union.task
def foo_1(a: int, b: str) -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]:
    df = pandas.DataFrame(...)
    ...
    return df


@union.task(cache=True)
def bar_1(df: pandas.DataFrame) -> int:
    ...


@union.workflow
def wf_1(a: int, b: str):
    df = foo_1(a=a, b=b)
    v = bar_1(df=df)
```

Note how the output of the task `foo_1` is annotated with an object of type `HashMethod`. Essentially, it represents a function that produces a hash that is used as part of the cache key calculation when calling the task `bar_1`.

### How does caching of offloaded objects work?

Recall how input values are taken into account to derive a cache key. This is done by turning the literal representation into a string and using that string as part of the cache key. In the case of dataframes annotated with `HashMethod`, we use the hash as the representation of the literal. In other words, the literal hash is used in the cache key.

This feature also works in local execution.

Here's a complete example of the feature:

```python
import time
from typing import Annotated

import pandas
import union
from flytekit import HashMethod
from flytekit.core.node_creation import create_node


def hash_pandas_dataframe(df: pandas.DataFrame) -> str:
    return str(pandas.util.hash_pandas_object(df))


@union.task
def uncached_data_reading_task() -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]:
    return pandas.DataFrame({"column_1": [1, 2, 3]})


@union.task(cache=True)
def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame:
    time.sleep(1)
    return df * 2


@union.task
def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame):
    assert df1.equals(df2)


@union.workflow
def cached_dataframe_wf():
    raw_data = uncached_data_reading_task()

    # Execute `cached_data_processing_task` twice, but force those
    # two executions to happen serially to demonstrate how the second run
    # hits the cache.
    t1_node = create_node(cached_data_processing_task, df=raw_data)
    t2_node = create_node(cached_data_processing_task, df=raw_data)
    t1_node >> t2_node

    # Confirm that the dataframes actually match
    compare_dataframes(df1=t1_node.o0, df2=t2_node.o0)


if __name__ == "__main__":
    df1 = cached_dataframe_wf()
    print(f"Running cached_dataframe_wf once : {df1}")
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/workspaces ===

# Workspaces

Workspaces provide a convenient VSCode development environment for iterating on your Union.ai tasks, workflows, and apps. With workspaces, you can:

* Develop and debug your tasks, workflows, or code in general
* Run your tasks and workflows in a way that matches your production environment
* Deploy your workflows and apps to development, staging, or production environments
* Persist files across workspace restarts to save your work
* Specify secrets and resources for your workspace
* Specify custom container images
* Specify custom `on_startup` commands
* Adjust the idle time-to-live (TTL) for your workspace to avoid unneeded expenses
* Authenticate with GitHub to clone private repositories

## Creating a workspace

To create a workspace, click on the **Workspace** tab on the left navbar and click on the **New Workspace** button on the top right.

![Create Workspace](../../_static/images/user-guide/core-concepts/workspaces/create-new-workspace-1.png)

Provide a name for your workspace, set an **Idle TTL** (time to live), and click **Create**.
![Create Workspace](../../_static/images/user-guide/core-concepts/workspaces/create-new-workspace-2.png) > [!NOTE] > The Idle TTL is the amount of time a workspace will be idle before it is > automatically stopped. Workspaces have a global TTL of 1 day, but you can set > the idle TTL field to a shorter duration to stop the workspace sooner. You should see a new workspace created in the Workspaces view: ![Create Workspace](../../_static/images/user-guide/core-concepts/workspaces/create-new-workspace-3.png) ## Running a workspace To run a workspace, click on the switch on the workspace item: ![Run Workspace](../../_static/images/user-guide/core-concepts/workspaces/run-workspace-1.png) Once the workspace has started, you can click on the **Open in VSCode** button: ![Run Workspace](../../_static/images/user-guide/core-concepts/workspaces/run-workspace-2.png) Once the startup commands have completed, you'll see a browser-based VSCode IDE: ![Run Workspace](../../_static/images/user-guide/core-concepts/workspaces/run-workspace-3.png) To stop a workspace, click on the toggle switch on the workspace item. ## Filesystem persistence Any changes to the filesystem that you make in the working directory of your workspace (the directory you find yourself in when you first open the workspace) are persisted across workspace restarts. This allows you to save data, code, models, and other files in your workspace. > [!NOTE] > Storing large datasets, models, and other files in your workspace may slow down > the start and stop times of your workspace. This is because the workspace > instance needs time to download/upload the files from persistent storage. ## Editing a workspace Change the workspace configuration by clicking on the **Edit** button: ![Edit Workspace](../../_static/images/user-guide/core-concepts/workspaces/edit-workspace-1.png) Note that you can change everything except the workspace name. ![Edit Workspace](../../_static/images/user-guide/core-concepts/workspaces/edit-workspace-2.png) ## The workspace detail view Clicking on the workspace item on the list view will reveal the workspace detail view, which provides all the information about the workspace. ![Workspace Detail](../../_static/images/user-guide/core-concepts/workspaces/workspace-detail.png) ## Archiving a workspace Archive a workspace by clicking on the **Archive** button: ![Archive Workspace](../../_static/images/user-guide/core-concepts/workspaces/archive-workspace.png) Show archived workspaces by clicking on the **Show archived** toggle on the top right of the workspaces list view. Unarchive a workspace by clicking on the **Unarchive** button: ![Unarchive Workspace](../../_static/images/user-guide/core-concepts/workspaces/unarchive-workspace.png) ## Workspace CLI commands The `union` CLI also provides commands for managing workspaces. ### Create a workspace configuration The first step is to create a yaml file that describes the workspace. ```shell $ union create workspace-config --init base_image workspace.yaml ``` This will create a `workspace.yaml` file in the current directory, with the default configuration values that you can edit for your needs: ```yaml name: my-workspace description: my workspace description project: domain: container_image: public.ecr.aws/unionai/workspace-base:py3.11-latest resources: cpu: "2" mem: "4Gi" gpu: null accelerator: null on_startup: null ttl_seconds: 1200 ``` Note that the yaml file contains a `project` and `domain` field that you can set to create a workspace in a specific project and domain. 
### Create a workspace

Then, create a workspace using the `union create workspace` command:

```shell
$ union create workspace workspace.yaml
```

This command will also start your workspace, and will print out the workspace link that you click on to open the workspace in your browser:

```shell
Created: workspace_definition {
...
}
Starting workspace 'my-workspace'
🚀 Workspace started: Open VSCode in Browser
```

### Stop a workspace

When you want to stop a workspace, use the `union stop workspace` command:

```shell
$ union stop workspace --name my-workspace
```

This will print out a message indicating that the workspace has been stopped:

```shell
Workspace instance stopped:
org: "org"
...
```

### Update a workspace

To update a workspace, modify the `workspace.yaml` file and run the `union update workspace` command:

```shell
$ union update workspace workspace.yaml
```

This will print out a message that looks something like:

```shell
Updated: workspace_definition {
...
}
```

### Get existing workspaces

To get existing workspaces, use the `union get workspace` command:

```shell
$ union get workspace
```

This will print out a table of all the workspaces you have access to in the specified project and domain (the command uses the default project and domain if you don't provide them).

```shell
┏━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━┳━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Workspace name ┃ CPU ┃ Memory ┃ GPU ┃ Accelerator ┃ TTL Seconds ┃ Active URL ┃
┡━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━╇━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ my-workspace   │ 2   │ 4Gi    │ -   │ -           │ 1200        │ -          │
└────────────────┴─────┴────────┴─────┴─────────────┴─────────────┴────────────┘
```

To get the details of a specific workspace, provide the workspace name with the `--name` flag.

### Start a workspace

To start a workspace, use the `union start workspace` command, specifying the name of the workspace you want to start in the `--name` flag.

```shell
$ union start workspace --name my-workspace
```

You should see a message that looks like:

```shell
Starting workspace 'my-workspace'
🚀 Workspace started: Open VSCode in Browser
```

## Customizing a workspace

There are several settings that you can customize for a workspace in the UI or the CLI.

### Setting secrets

If you don't have any secrets yet, create them with the `union create secret` command:

```shell
$ union create secret --project my_project --domain my_domain --name my_secret
```

You'll be prompted to enter a secret value in the terminal:

```shell
Enter secret value: ...
```

> [!NOTE]
> You can learn more about secrets management in **Development cycle > Managing secrets**.

Set secrets for your workspace by clicking on the **Secrets** tab in the sidebar. Provide the `my_secret` key and, optionally, the environment variable you want to assign it to in the workspace.
![Secrets](../../_static/images/user-guide/core-concepts/workspaces/setting-secrets.png)

#### Setting secrets via the CLI

Set secrets via the CLI using the `secrets` key, which is a list of objects with a `key` and `env_var` (optional) field:

```yaml
name: my-workspace
description: my workspace description
project: flytesnacks
domain: development
container_image: public.ecr.aws/unionai/workspace-base:py3.11-latest
secrets:
- key: my_secret      # this is the secret key you set when you create the secret
  env_var: MY_SECRET  # this is an optional environment variable that you
                      # can bind the secret value onto.
...
```

### Setting CPU, memory, and GPU resources

You can also set the resources for your workspace:

![Resources](../../_static/images/user-guide/core-concepts/workspaces/setting-resources.png)

### Specifying custom `on_startup` commands

If you need to run commands such as installing additional dependencies or downloading a file from the web with `wget`, specify custom `on_startup` commands:

![On Startup](../../_static/images/user-guide/core-concepts/workspaces/customize-onstartup.png)

### Specifying custom container images

By default, the workspace will use a Union.ai-provided container image which contains the following Python libraries:

- `union`
- `flytekit`
- `uv`
- `ipykernel`
- `pandas`
- `pyarrow`
- `scikit-learn`
- `matplotlib`

#### Specifying a custom container image in the UI

You can specify a pre-built custom container image by clicking on the **Container** tab in the sidebar and providing the image name in the workspace creation form.

> [!NOTE]
> The minimum requirement for custom images is that they have `union>=0.1.166`
> installed.

![Custom Container](../../_static/images/user-guide/core-concepts/workspaces/customize-container-image.png)

In many cases, you may want to use the same container image as a task execution that you want to debug. You can find the container image URI by going to the task execution details page:

![Task Execution](../../_static/images/user-guide/core-concepts/workspaces/customize-container-image-get-uri.png)

#### Specifying a custom container image in the CLI

The `union` CLI provides a way to specify a custom container image that's built by Union's image builder service. To do this, run the following command:

```shell
union create workspace-config --init custom_image workspace.yaml
```

This will create a `workspace.yaml` file with a `container_image` key that supports the **Development cycle > ImageSpec** arguments. When you run the `union create workspace` command with this `workspace.yaml` file, it will first build the image before creating the workspace definition.

#### Example: Specifying a workspace with GPUs

The following example shows a `workspace.yaml` file that specifies a workspace with a GPU accelerator.

```yaml
# workspace.yaml
name: workspace-with-gpu
description: Workspace that uses GPUs
# Make sure that the project and domain exist
project:
domain:
container_image:
  name: custom-image
  builder: union
  packages:
  - torch
resources:
  cpu: "2"
  mem: "4Gi"
  gpu: "1"
  accelerator: nvidia-l4
on_startup: null
ttl_seconds: 1200
```

Then run the following command to create the workspace:

```shell
union create workspace workspace.yaml
```

The configuration above will first build a custom container with `torch` installed. Then, it will create a workspace definition with a single `nvidia-l4` GPU accelerator. Finally, it will start a workspace session.

In the VSCode browser IDE, you can quickly verify that `torch` has access to GPUs by running the following in a Python REPL:

```python
import torch

print(torch.cuda.is_available())
```

> [!NOTE]
> See the **Core concepts > Workspaces > Customizing a workspace > Setting CPU, memory, and GPU resources**
> section for more details on how to configure specific GPU accelerators.

## Authenticating with GitHub

If you want to clone a private GitHub repository into your workspace, you can use the pre-installed `gh` CLI to authenticate your workspace session:

```shell
gh auth login
```

You'll be prompted to enter either a GitHub personal access token (PAT) or authenticate via the browser.

> [!NOTE]
> You can create and set a `GITHUB_TOKEN` secret to set the access token for your
> workspace, but you'll need to authenticate via `gh auth login` in every new
> workspace session:

* Create a secret with the `union create secret` command.
* Create a workspace or update an existing one with the `GITHUB_TOKEN` secret, setting the environment variable to e.g. `GITHUB_TOKEN`.
* In the workspace session, run `gh auth login` to authenticate with GitHub and use the `$GITHUB_TOKEN` environment variable as the personal access token.

## Sorting and filtering workspaces

You can filter workspaces to only the active ones by clicking on the **Active** toggle on the top left of the workspaces list view.

![Active Workspaces](../../_static/images/user-guide/core-concepts/workspaces/active-workspaces.png)

Sort by recently updated by clicking on the **Recently updated** toggle on the top right of the workspaces list view.

![Filtering and Sorting Workspaces](../../_static/images/user-guide/core-concepts/workspaces/filtering-sorting-workspaces.png)

## Troubleshooting

You may come across issues starting up a workspace due to various reasons, including:

* Resource requests not being available on your Union cluster.
* Secret key typos, or secrets not being defined on the project/domain.
* Container image typos, or container images not existing.

Under the hood, workspaces are powered by Union.ai tasks, so to debug these kinds of issues, the workspace detail page provides a link to the underlying task that's hosting the VSCode IDE:

![Workspace Detail](../../_static/images/user-guide/core-concepts/workspaces/failed-workspace-detail.png)

Clicking on the link will open the task details page, where you can see the underlying task definition, pod events, and logs to debug further.

![Task Detail](../../_static/images/user-guide/core-concepts/workspaces/failed-task-detail.png)

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/named-outputs ===

# Named outputs

By default, Union.ai employs a standardized convention to assign names to the outputs of tasks or workflows.
Each output is sequentially labeled as `o1`, `o2`, `o3`, and so on.
You can, however, customize these output names by using a `NamedTuple`.

To begin, import the required dependencies:

```python
# basics/named_outputs.py
from typing import NamedTuple

import union
```

Here we define a `NamedTuple` and assign it as an output to a task called `slope`:

```python
slope_value = NamedTuple("slope_value", [("slope", float)])

@union.task
def slope(x: list[int], y: list[int]) -> slope_value:
    sum_xy = sum([x[i] * y[i] for i in range(len(x))])
    sum_x_squared = sum([x[i] ** 2 for i in range(len(x))])
    n = len(x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2)
```

Similarly, we define another `NamedTuple` and assign it to the output of another task, `intercept`:

```python
intercept_value = NamedTuple("intercept_value", [("intercept", float)])

@union.task
def intercept(x: list[int], y: list[int], slope: float) -> intercept_value:
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    intercept = mean_y - slope * mean_x
    return intercept
```

> [!Note]
> While it's possible to create `NamedTuples` directly within the code,
> it's often better to declare them explicitly.
> This helps prevent potential linting errors in tools like `mypy`.
>
> ```python
> def slope() -> NamedTuple("slope_value", slope=float):
>     pass
> ```

You can easily unpack the `NamedTuple` outputs directly within a workflow.
Additionally, you can also have the workflow return a `NamedTuple` as an output.

> [!Note]
> Remember that we are extracting individual task execution outputs by dereferencing them.
> This is necessary because `NamedTuples` function as tuples and require dereferencing.

```python
slope_and_intercept_values = NamedTuple("slope_and_intercept_values", [("slope", float), ("intercept", float)])

@union.workflow
def simple_wf_with_named_outputs(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> slope_and_intercept_values:
    slope_value = slope(x=x, y=y)
    intercept_value = intercept(x=x, y=y, slope=slope_value.slope)
    return slope_and_intercept_values(slope=slope_value.slope, intercept=intercept_value.intercept)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/core-concepts/image-spec ===

# ImageSpec

In this section, you will uncover how Union.ai utilizes Docker images to construct containers under the hood, and you'll learn how to craft your own images to encompass all the necessary dependencies for your tasks or workflows.
You will explore how to execute a raw container with custom commands, indicate multiple container images within a single workflow, and get familiar with the ins and outs of `ImageSpec`!

`ImageSpec` allows you to customize the container image for your Union.ai tasks without a Dockerfile.
`ImageSpec` speeds up the build process by allowing you to reuse previously downloaded packages from the PyPI and APT caches.

By default, the `ImageSpec` will be built using the **Core concepts > ImageSpec > remote builder**, but you can always specify your own e.g. local Docker.

For every `union.PythonFunctionTask` task or a task decorated with the `@task` decorator, you can specify rules for binding container images.
By default, union binds a single container image, i.e., the [default Docker image](https://ghcr.io/flyteorg/flytekit), to all tasks.
To modify this behavior, use the `container_image` parameter available in the `union.task` decorator, and pass an `ImageSpec` definition.

Before building the image, union checks the container registry to see if the image already exists.
If the image does not exist, union will build the image before registering the workflow and replace the image name in the task template with the newly built image name.

## Install Python or APT packages

You can specify Python packages and APT packages in the `ImageSpec`.
These specified packages will be added on top of the [default image](https://github.com/flyteorg/flytekit/blob/master/Dockerfile), which can be found in the union Dockerfile.
More specifically, union invokes the [DefaultImages.default_image()](https://github.com/flyteorg/flytekit/blob/master/flytekit/configuration/default_images.py#L26-L27) function.
This function determines and returns the default image based on the Python version and union version.
For example, if you are using Python 3.8 and flytekit 1.6.0, the default image assigned will be `ghcr.io/flyteorg/flytekit:py3.8-1.6.0`.

```python
from union import ImageSpec

sklearn_image_spec = ImageSpec(
    packages=["scikit-learn", "tensorflow==2.5.0"],
    apt_packages=["curl", "wget"],
)
```

## Install Conda packages

Define the `ImageSpec` to install packages from a specific conda channel.

```python
image_spec = ImageSpec(
    conda_packages=["langchain"],
    conda_channels=["conda-forge"],  # List of channels to pull packages from.
)
```

## Use different Python versions in the image

You can specify the Python version in the `ImageSpec` to build the image with a different Python version.

```python
image_spec = ImageSpec(
    packages=["pandas"],
    python_version="3.9",
)
```

## Import modules only in a specific ImageSpec environment

The `is_container()` method is used to determine whether the task is utilizing the image constructed from the `ImageSpec`.
If the task is indeed using the image built from the `ImageSpec`, it will return `True`.
This approach helps minimize module loading time and prevents unnecessary dependency installation within a single image.

In the following example, both `task1` and `task2` will import the `pandas` module. However, `tensorflow` will only be imported in `task2`.

```python
from flytekit import ImageSpec, task

import pandas as pd

pandas_image_spec = ImageSpec(
    packages=["pandas"],
    registry="ghcr.io/flyteorg",
)

tensorflow_image_spec = ImageSpec(
    packages=["tensorflow", "pandas"],
    registry="ghcr.io/flyteorg",
)

# Runs if and only if the task is using the image built from tensorflow_image_spec.
if tensorflow_image_spec.is_container():
    import tensorflow as tf

@task(container_image=pandas_image_spec)
def task1() -> pd.DataFrame:
    return pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [1, 22]})

@task(container_image=tensorflow_image_spec)
def task2() -> int:
    num_gpus = len(tf.config.list_physical_devices('GPU'))
    print("Num GPUs Available: ", num_gpus)
    return num_gpus
```

## Install CUDA in the image

There are a few ways to install CUDA in the image.

### Use the Nvidia docker image

CUDA is pre-installed in the Nvidia docker image. You can specify the base image in the `ImageSpec`.

```python
image_spec = ImageSpec(
    base_image="nvidia/cuda:12.6.1-cudnn-devel-ubuntu22.04",
    packages=["tensorflow", "pandas"],
    python_version="3.9",
)
```

### Install packages from extra index

CUDA can be installed by specifying the `pip_extra_index_url` in the `ImageSpec`.

```python
image_spec = ImageSpec(
    name="pytorch-mnist",
    packages=["torch", "torchvision", "flytekitplugins-kfpytorch"],
    pip_extra_index_url=["https://download.pytorch.org/whl/cu118"],
)
```

## Build an image in a different architecture

You can specify the platform in the `ImageSpec` to build the image in a different architecture, such as `linux/arm64` or `darwin/arm64`.

```python
image_spec = ImageSpec(
    packages=["pandas"],
    platform="linux/arm64",
)
```

## Customize the tag of the image

You can customize the tag of the image by specifying the `tag_format` in the `ImageSpec`.
In the following example, the tag will be the image's spec hash followed by `-dev`.

```python
image_spec = ImageSpec(
    name="my-image",
    packages=["pandas"],
    tag_format="{spec_hash}-dev",
)
```

## Copy additional files or directories

You can specify files or directories to be copied into the container `/root`, allowing users to access the required files.
The directory structure will match the relative path.
Since Docker only supports relative paths, absolute paths and paths outside the current working directory (e.g., paths with "../") are not allowed.

```python
from union import task, workflow, ImageSpec

image_spec = ImageSpec(
    name="image_with_copy",
    copy=["files/input.txt"],
)

@task(container_image=image_spec)
def my_task() -> str:
    with open("/root/files/input.txt", "r") as f:
        return f.read()
```

## Define ImageSpec in a YAML File

You can override the container image by providing an ImageSpec YAML file to the `union run` or `union register` command.
This allows for greater flexibility in specifying a custom container image. For example:

```yaml
# imageSpec.yaml
python_version: 3.11
packages:
  - sklearn
env:
  Debug: "True"
```

Use union to register the workflow:

```shell
$ union run --remote --image imageSpec.yaml image_spec.py wf
```

## Build the image without registering the workflow

If you only want to build the image without registering the workflow, you can use the `union build` command.

```shell
$ union build --remote image_spec.py wf
```

## Force push an image

In some cases, you may want to force an image to rebuild, even if the ImageSpec hasn't changed.
To overwrite an existing image, pass `FLYTE_FORCE_PUSH_IMAGE_SPEC=True` to the `union` command.

```bash
FLYTE_FORCE_PUSH_IMAGE_SPEC=True union run --remote image_spec.py wf
```

You can also force push an image in the Python code by calling the `force_push()` method.

```python
image = ImageSpec(packages=["pandas"]).force_push()
```

## Getting source files into ImageSpec

Typically, getting source code files into a task's image at run time on a live Union.ai backend is done through the fast registration mechanism.
However, if your `ImageSpec` constructor specifies a `source_root` and the `copy` argument is set to something other than `CopyFileDetection.NO_COPY`, then files will be copied regardless of fast registration status.

If the `source_root` and `copy` fields of an `ImageSpec` are left blank, then whether or not your source files are copied into the built `ImageSpec` image depends on whether or not you use fast registration.
Please see **Development cycle > Running your code** for the full explanation.

Since files are sometimes copied into the built image, the tag that is published for an ImageSpec will change based on whether fast register is enabled, and the contents of any files copied.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle ===

# Development cycle

This section covers developing production-ready workflows for Union.ai.

## Subpages

- **Development cycle > Authentication**
- **Development cycle > Project structure**
- **Development cycle > Projects and domains**
- **Development cycle > Building workflows**
- **Development cycle > Setting up a production project**
- **Development cycle > Local dependencies**
- **Development cycle > ImageSpec**
- **Development cycle > Running your code**
- **Development cycle > Overriding parameters**
- **Development cycle > Run details**
- **Development cycle > Debugging with interactive tasks**
- **Development cycle > Managing secrets**
- **Development cycle > Managing API keys**
- **Development cycle > Accessing AWS S3 buckets**
- **Development cycle > Task resource validation**
- **Development cycle > Running in a local cluster**
- **Development cycle > CI/CD deployment**
- **Development cycle > Jupyter notebooks**
- **Development cycle > Decks**
- **Development cycle > UnionRemote**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/authentication ===

# Authentication

Authentication is required to interact with Union.ai using the command-line interface (CLI).
The authentication method depends on whether you are working on a local or remote machine.
This guide walks you through different authentication mechanisms and helps you choose the best one for your use case.

Before diving into authentication, ensure you have installed the Union CLI. See **Getting started > Local setup** for details.

## Authentication Methods

The Union CLI supports three authentication mechanisms:

| Authentication Method | Works on Local? | Works on Remote? | Use Case                                                          |
|-----------------------|-----------------|------------------|-------------------------------------------------------------------|
| PKCE (default)        | ✅ Yes          | ❌ No            | Best on local machines with a browser.                            |
| DeviceFlow            | ✅ Yes          | ✅ Yes           | Best on remote machines without a browser, like an ssh session.   |
| ClientSecret          | ✅ Yes          | ✅ Yes           | Best for CI/CD or automation.                                     |

> [!NOTE]
> If you used `union create login --host `, this used PKCE by default.

## 1. PKCE (Proof Key for Code Exchange)

PKCE is the default authentication method. When you run a Union CLI command, it opens a browser window for authentication.

Authentication Flow:

- Run a Union CLI command.
- You are redirected to your default browser and log in.

Example Configuration:

```yaml
admin:
  endpoint: https://.hosted.unionai.cloud
  insecure: false
  authType: Pkce
logger:
  show-source: true
  level: 0
```

> [!NOTE]
> PKCE requires a local browser, making it unsuitable for using the Union CLI on remote machines within an ssh session.

## 2. DeviceFlow (Best for Remote Machines)

If you are working with the Union CLI on a remote machine without a browser, use DeviceFlow.
This method provides a URL that you can open in your local browser.

Authentication Flow:

- Run a Union CLI command.
- The CLI returns a URL.
- Open the URL in your local browser and log in.

Example Configuration:

```yaml
admin:
  endpoint: dns:///.hosted.unionai.cloud
  insecure: false
  authType: DeviceFlow
logger:
  show-source: true
  level: 0
```

> [!NOTE]
> During authentication, Union.ai attempts to store an authentication token on the keyring service of the operating system. If you are authenticating from within an SSH session on a Linux-based machine, there may not be a keyring service by default.

If you find that browser-based authentication is required every time you run or register your workflows, you may need to run `pip install keyring` or `pip install keyrings.alt` to install a keyring service on your machine.

## 3. ClientSecret (Best for CI/CD and Automation)

The ClientSecret method is a headless authentication option, ideal for automation and CI/CD pipelines.

Steps to Set Up ClientSecret Authentication:

1. Create an API Key:

   ```shell
   $ union create api-key admin --name my-custom-name
   ```

   The output provides a Client ID and API Key. Store the API Key securely, as it will not be shown again.

2. Set the Environment Variable:

   ```shell
   export UNION_API_KEY=""
   ```

   With this environment variable set, `union` commands do not require a configuration yaml to be referenced.

3. Give the API Key admin permissions with a **Uctl CLI** command:

   ```shell
   uctl --config ~/path/to/a/pkce/config.yaml append identityassignment --application my-custom-name --policy admin --org 
   ```

Let's note a couple of things here.
First, the config file here must be a **Development cycle > Authentication > 1. PKCE (Proof Key for Code Exchange)** config, which will require you to authenticate through your browser.
If you don't know where your config file is, check `~/.union/config.yaml`.
This is where the automatically generated config would have been saved if you followed the **Getting started** guide.
Second, your org name can be found from your endpoint.
For example, if your endpoint is `https://my-org.hosted.unionai.cloud`, then your org name is `my-org`.

Now, with your `UNION_API_KEY` environment variable set, your `union` command will use the API key to authenticate automatically - no need to pass in a config file anymore!

> [!NOTE]
> Never commit API keys to version control. Use environment variables or a secure vault.

## Managing Authentication Configuration

By default, the Union CLI looks for configuration files in `~/.union/config.yaml`. You can override this by:

- Setting the `UNION_CONFIG` environment variable:

  ```shell
  export UNION_CONFIG=~/.my-config-location/my-config.yaml
  ```

- Using the `--config` flag:

  ```shell
  $ union --config ~/.my-config-location/my-config.yaml run my_script.py my_workflow
  ```

## Troubleshooting Authentication Issues

- Old configuration files causing conflicts? Remove the deprecated directory from `~/.unionai/`.
- Need to switch authentication methods? Update `~/.union/config.yaml` or use a different config file.
- Getting prompted for login every time? If using DeviceFlow on Linux, install a `keyring` service (`pip install keyring keyrings.alt`).

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/project-structure ===

# Project structure

Organizing a workflow project repository effectively is key for ensuring scalability, collaboration, and easy maintenance.
Here are best practices for structuring a Union.ai workflow project repo, covering task organization, workflow management, dependency handling, and documentation.

## Recommended Directory Structure

A typical Union.ai workflow project structure could look like this:

```shell
├── .github/workflows/
├── .gitignore
├── docs/
│   └── README.md
├── src/
│   ├── core/               # Core logic specific to the use case
│   │   ├── __init__.py
│   │   ├── model.py
│   │   ├── data.py
│   │   └── structs.py
│   ├── tasks/              # Contains individual tasks
│   │   ├── __init__.py
│   │   ├── preprocess.py
│   │   ├── fit.py
│   │   ├── test.py
│   │   └── plot.py
│   ├── workflows/          # Contains workflow definitions
│   │   ├── __init__.py
│   │   ├── inference.py
│   │   └── train.py
│   └── orchestration/      # For helper constructs (e.g., secrets, images)
│       ├── __init__.py
│       └── constants.py
├── uv.lock
└── pyproject.toml
```

This structure is designed to ensure each project component has a clear, logical home, making it easy for team members to find and modify files.

## Organizing Tasks and Workflows

In Union.ai, tasks are the building blocks of workflows, so it's important to structure them intuitively:

* **Tasks**: Store each task in its own file within the `tasks/` directory. If multiple tasks are closely related, consider grouping them within a module. Alternatively, each task can have its own module to allow more granular organization and sub-directories could be used to group similar tasks.
* **Workflows**: Store workflows, which combine tasks into end-to-end processes, in the `workflows/` directory. This separation ensures workflows are organized independently from core task logic, promoting modularity and reuse.

## Orchestration Directory for Helper Constructs

Include a directory, such as `orchestration/` or `union_utils/`, for constructs that facilitate workflow orchestration. This can house helper files like:

* **Secrets**: Definitions for accessing secrets (e.g., API keys) in Union.ai.
* **ImageSpec**: A tool that simplifies container management, allowing you to avoid writing Dockerfiles directly.

## Core Logic for Workflow-Specific Functionality

Use a `core/` directory for business logic specific to your workflows.
This keeps the core application code separate from workflow orchestration code, improving maintainability and making it easier for new team members to understand core functionality.

## Importance of `__init__.py`

Adding `__init__.py` files within each directory is essential:

* **For Imports**: These files make the directory a Python package, enabling proper imports across modules.
* **For Union.ai's Fast Registration**: When performing fast registration, Union.ai considers the first directory without an `__init__.py` as the root. Union.ai will then package the root and its contents into a tarball, streamlining the registration process and avoiding the need to rebuild the container image every time you make code changes.

## Monorepo vs Multi-repo: Choosing a structure

When working with multiple teams, you have two main options:

* **Monorepo**: A single repository shared across all teams, which can simplify dependency management and allow for shared constructs. However, it can introduce complexity in permissions and version control for different teams.
* **Multi-repo**: Separate repositories for each team or project can improve isolation and control. In this case, consider creating shared, installable packages for constructs that multiple teams use, ensuring consistency without merging codebases.

## CI/CD

The GitHub action should:

* Register (and promote if needed) on merge to a domain branch.
* Execute on merge of input YAML.
* Inject the git SHA as the entity version.

## Documentation and Docstrings

Writing clear docstrings is encouraged, as they are automatically propagated to the Union.ai UI.
This provides useful context for anyone viewing the workflows and tasks in the UI, reducing the need to consult source code for explanations.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/projects-and-domains ===

# Projects and domains

Projects and domains are the principal organizational categories into which you group your workflows in Union.ai.

Projects define groups of tasks, workflows, launch plans, and other entities that share a functional purpose.

Domains represent distinct steps through which the entities in a project transition as they proceed through the development cycle.
By default, Union.ai provides three domains: `development`, `staging`, and `production`.
During onboarding, you can configure your Union.ai instance to have different domains.
Speak to the Union.ai team for more information.

Projects and domains are orthogonal to each other, meaning that a project has multiple domains and a domain has multiple projects.
Here is an example arrangement:

|           | Development       | Staging           | Production        |
|-----------|-------------------|-------------------|-------------------|
| Project 1 | workflow_1 (v2.0) | workflow_1 (v1.0) | workflow_1 (v1.0) |
| Project 2 | workflow_2 (v2.0) | workflow_2 (v1.0) | workflow_2 (v1.0) |

## Projects

Projects represent independent workflows related to specific teams, business areas, or applications.
Each project is isolated from others, but workflows can reference entities (workflows or tasks) from other projects to reuse generalizable resources.

## Domains

Domains represent distinct environments orthogonal to the set of projects in your org within Union.ai, such as development, staging, and production.
These enable dedicated configurations, permissions, secrets, cached execution history, and resource allocations for each environment, preventing unintended impact on other projects and/or domains.

Using domains allows for a clear separation between environments, helping ensure that development and testing don't interfere with production workflows.
A production domain ensures a "clean slate" so that cached development executions do not result in unexpected behavior.
Additionally, secrets may be configured for external production data sources.

## When to use different Union.ai projects?

Projects help group independent workflows related to specific teams, business areas, or applications.
Generally speaking, each independent team or ML product should have its own Union.ai project.
Even though these are isolated from one another, teams may reference entities (workflows or tasks) from other Union.ai projects to reuse generalizable resources.
For example, one team may create a generalizable task to train common model types.
However, this requires advanced collaboration and common coding standards.

When setting up workflows in Union.ai, effective use of **projects** and **domains** is key to managing environments, permissions, and resource allocation.
Below are best practices to consider when organizing workflows in Union.ai.

## Projects and Domains: The Power of the Project-Domain Pair

Union.ai uses a project-domain pair to create isolated configurations for workflows.
This pairing allows for:

* **Dedicated Permissions**: Through Role-Based Access Control (RBAC), users can be assigned roles with tailored permissions, such as contributor or admin, specific to individual project-domain pairs. This allows fine-grained control over who can manage or execute workflows within each pair, ensuring that permissions are both targeted and secure. More details in **Administration > User management > Custom roles and policies**.
* **Resource and Execution Monitoring**: Track and monitor resource utilization, executions, and performance metrics on a dashboard unique to each project-domain pair. This helps maintain visibility over workflow execution and ensures optimal performance. More details in **Administration > Resources**.
* **Resource Allocations and Quotas**: By setting quotas for each project-domain pair, Union.ai can ensure that workflows do not exceed designated limits, preventing any project or domain from unintentionally impacting resources available to others. Additionally, you can configure unique resource defaults, such as memory, CPU, and storage allocations, for each project-domain pair. This allows each pair to meet the specific requirements of its workflows, which is particularly valuable given the unique needs across different projects. More details in **Core concepts > Tasks > Task hardware environment > Customizing task resources > Execution defaults and resource quotas** and **Administration > Resources**.
* **Configuring Secrets**: Union.ai allows you to configure secrets at the project-domain level, ensuring sensitive information, such as API keys and tokens, is accessible only within the specific workflows that need them. This enhances security by isolating secrets according to the project and domain, reducing the risk of unauthorized access across environments. More details in **Development cycle > Managing secrets**.

## Domains: Clear Environment Separation

Domains represent distinct environments within Union.ai, allowing clear separation between development, staging, and production.
This structure helps prevent cross-environment interference, ensuring that changes made in development or testing do not affect production workflows.
Using domains for this separation ensures that workflows can evolve in a controlled manner across different stages, from initial development through to production deployment.

## Projects: Organizing Workflows by Teams, Business Areas, or Applications

Projects in Union.ai are designed to group independent workflows around specific teams, business functions, or applications.
By aligning projects to organizational structure, you can simplify access control and permissions while encouraging a clean separation of workflows across different teams or use cases.
Although workflows can reference each other across projects, it's generally cleaner to maintain independent workflows within each project to avoid complexity.

Union.ai's CLI tools and SDKs provide options to specify projects and domains easily:

* **CLI Commands**: In most commands within the `union` and `uctl` CLIs, you can specify the project and domain by using the `--project` and `--domain` flags, enabling precise control over which project-domain pair a command applies to. More details in **Union CLI** and **Uctl CLI**.
* **Python SDK**: When working with the `union` SDK, you can leverage `UnionRemote` to define the project and domain for workflow interactions programmatically, ensuring that all actions occur in the intended environment.
More details [here](union-remote).

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/building-workflows ===

# Building workflows

## When should I decompose tasks?

There are several reasons why one may choose to decompose a task into smaller tasks.
Doing so may result in better computational performance, improved cache performance, and the ability to take advantage of interruptible tasks.
However, decomposition comes at the cost of the overhead among tasks, including spinning up nodes and downloading data.
In some cases, these costs may be mitigated by using **Core concepts > Actors**.

### Differing runtime requirements

Firstly, decomposition provides support for heterogeneous environments among the operations in the task.
For example, you may have some large task that trains a machine learning model and then uses the model to run batch inference on your test data.
However, training a model typically requires significantly more memory than inference.
For that reason, given large enough scale, it could actually be beneficial to decompose this large task into two tasks that (1) train a model and then (2) run batch inference.
By doing so, you could request significantly less memory for the second task in order to save on the expense of this workflow.
If you are working with even more data, then you might benefit from decomposing the batch inference task via `map_task` such that you may further parallelize this operation, substantially reducing the runtime of this step.
Generally speaking, decomposition provides infrastructural flexibility regarding the ability to define resources, dependencies, and execution parallelism.

### Improved cache performance

Secondly, you may decompose large tasks into smaller tasks to enable "fine-grained" caching.
In other words, each unique task provides an automated "checkpoint" system.
Thus, by breaking down a large workflow into its many natural tasks, one may minimize redundant work among multiple serial workflow executions.
This is especially useful during rapid, iterative development, during which a user may attempt to run the same workflow multiple times in a short period of time.
"Fine-grained" caching will dramatically improve productivity while executing workflows both locally and remotely.

### Take advantage of interruptible tasks

Lastly, one may utilize "fine-grained" caching to leverage interruptible tasks.
Interruptible tasks will attempt to run on spot instances or spot VMs, where possible.
These nodes are interruptible, meaning that the task may occasionally fail because another organization is willing to pay more to use the instance.
However, these spot instances can be substantially cheaper than their non-interruptible counterparts (on-demand instances / VMs).
By utilizing "fine-grained" caching, one may reap the significant cost savings on interruptible tasks while minimizing the effects of having their tasks interrupted.

## When should I parallelize tasks?

In general, parallelize early and often.
A lot of Union.ai's powerful ergonomics like caching and workflow recovery happen at the task level, as mentioned above.
Decomposing into smaller tasks and parallelizing makes for a performant and fault-tolerant workflow.
One caveat is for very short-duration tasks, where the overhead of spinning up a pod and cleaning it up negates any benefits of parallelism.
With reusable containers via **Core concepts > Actors**, however, these overheads are transparently obviated, providing the best of both worlds at the cost of some up-front work in setting up that environment.
In any case, it may be useful to batch the inputs and outputs to amortize any overheads.
Please be mindful to keep the sequencing of inputs within a batch, and of the batches themselves, to ensure reliable cache hits.

### Parallelization constructs

The two main parallelization constructs in Union.ai are the **Development cycle > Building workflows > map task** and the **Core concepts > Workflows > Dynamic workflows**.
They accomplish roughly the same goal but are implemented quite differently and have different advantages.
Dynamic tasks are more akin to a `for` loop, iterating over inputs sequentially.
The parallelism is controlled by the overall workflow parallelism.
Map tasks are more efficient and have no such sequencing guarantees.
They also have their own concurrency setting separate from the overall workflow and can have a minimum failure threshold for their constituent tasks.
A deeper explanation of their differences is available [here](), while examples of how to use them together can be found [here]().

## When should I use caching?

Caching should be enabled once the body of a task has stabilized.
Cache keys are implicitly derived from the task signature, most notably the inputs and outputs.
If the body of a task changes without a modification to the signature, and the same inputs are used, it will produce a cache hit.
This can result in unexpected behavior when iterating on the core functionality of the task and expecting different inputs downstream.
Moreover, caching will not introspect the contents of a `FlyteFile`, for example.
If the same URI is used as input with completely different contents, it will also produce a cache hit.
For these reasons, it's wise to add an explicit cache key so that it can be invalidated at any time.

Despite these caveats, caching is a huge time saver during workflow development.
Caching upstream tasks enables a rapid run-through of the workflow up to the node you're iterating on.
Additionally, caching can be valuable in complex parallelization scenarios where you're debugging the failure state of large map tasks, for example.
In production, if your cluster is under heavy resource constraints, caching can allow a workflow to complete across re-runs as more and more tasks are able to return successfully with each run.
While not an ideal scenario, caching can help soften the blow of production failures.
With these caveats in mind, there are very few scenarios where caching isn't warranted.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/setting-up-a-project ===

# Setting up a production project

In Union.ai, your work is organized in a hierarchy with the following structure:

* **Organization**: Your Union.ai instance, accessible at a specific URL like `union.my-company.com`.
* **Domains**: Within an organization there are (typically) three domains, `development`, `staging`, and `production`, used to organize your code during the development process. You can configure a custom set of domains to suit your needs during **Configuring your data plane**.
* **Projects**: Orthogonal to domains, projects are used to organize your code into logical groups. You can create as many projects as you need. A given workflow will reside in a specific project. For example, let's say `my_workflow` is a workflow in `my_project`.

When you start work on `my_workflow` you would typically register it in the project-domain `my_project/development`.
As you work on successive iterations of the workflow you might promote `my_workflow` to `my_project/staging` and eventually `my_project/production`.
Promotion is done simply by **Development cycle > Running your code**.

## Terminology

In everyday use, the term "project" is often used to refer not just to the Union.ai entity that holds a set of workflows, but also to the local directory in which you are developing those workflows, and to the GitHub (or other SCM) repository that you are using to store the same workflow code.

To avoid confusion, in this guide we will stick to the following naming conventions:

* **Union.ai project**: The entity in your Union.ai instance that holds a set of workflows, as described above. Often referred to simply as a **project**.
* **Local project**: The local directory (usually the working directory of a GitHub repository) in which you are developing workflows.

## Create a Union.ai project

You can create a new project in the Union.ai UI by clicking on the project breadcrumb at the top left and selecting **All projects**:

![Select all projects](../../_static/images/user-guide/development-cycle/setting-up-a-project/select-all-projects.png)

This will take you to the **Projects list**:

![Projects list](../../_static/images/user-guide/development-cycle/setting-up-a-project/projects-list.png)

Click on the **New Project** button and fill in the details for your new project.
You now have a project on Union.ai into which you can register your workflows.
The next step is to set up a local workflow directory.

## Creating a local production project directory using `union init`

Earlier, in the [Getting started](../getting-started/_index) section we used `union init` to create a new local project based on the `union-simple` template.
Here, we will do the same, but use the `union-production` template.
Perform the following command:

```shell
$ union init --template union-production my-project
```

## Directory structure

In the `my-project` directory you'll see the following file structure:

```shell
├── LICENSE
├── README.md
├── docs
│   └── docs.md
├── pyproject.toml
├── src
│   ├── core
│   │   ├── __init__.py
│   │   └── core.py
│   ├── orchestration
│   │   ├── __init__.py
│   │   └── orchestration.py
│   ├── tasks
│   │   ├── __init__.py
│   │   └── say_hello.py
│   └── workflows
│       ├── __init__.py
│       └── hello_world.py
└── uv.lock
```

You can create your own conventions and file structure for your production projects, but this template provides a good starting point.
However, the separate `workflows` subdirectory and the contained `__init__.py` file are significant.
We will discuss them when we cover **Development cycle > Running your code**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/local-dependencies ===

# Local dependencies

During the development cycle you will want to be able to run your workflows both locally on your machine and remotely on Union.ai.
To enable this, you need to ensure that the required dependencies are installed in both places.

Here we will explain how to install your dependencies locally.
For information on how to make your dependencies available on Union.ai, see **Development cycle > ImageSpec**.

## Define your dependencies in your `pyproject.toml`

We recommend using the [`uv` tool](https://docs.astral.sh/uv/) for project and dependency management.
When using `uv`, the best way to declare your dependencies is to list them under `dependencies` in your `pyproject.toml` file, like this:

```toml
[project]
name = "union-simple"
version = "0.1.0"
description = "A simple Union.ai project"
readme = "README.md"
requires-python = ">=3.9,<3.13"
dependencies = ["union"]
```

## Create a Python virtual environment

Ensure that your Python virtual environment is properly set up with the required dependencies.
Using `uv`, you can install the dependencies with the command:

```shell
$ uv sync
```

You can then activate the virtual environment with:

```shell
$ source .venv/bin/activate
```

> [!NOTE] `activate` vs `uv run`
> When running the Union CLI within your local project you must run it in the virtual environment _associated with_ that project.
>
> To run `union` within your project's virtual environment using `uv`, you can prefix it with the `uv run` command. For example:
>
> `uv run union ...`
>
> Alternatively, you can activate the virtual environment with `source .venv/bin/activate` and then run the `union` command directly.
> In our examples we assume that you are doing the latter.

Having installed your dependencies in your local environment, you can now **Development cycle > Running your code**.
The next step is to ensure that the same dependencies are also **Development cycle > ImageSpec**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/image-spec ===

# ImageSpec

During the development cycle you will want to be able to run your workflows both locally on your machine and remotely on Union.ai, so you will need to ensure that the required dependencies are installed in both environments.

Here we will explain how to set up the dependencies for your workflow to run remotely on Union.ai.
For information on how to make your dependencies available locally, see **Development cycle > Local dependencies**.

When a workflow is deployed to Union.ai, each task is set up to run in its own container in the Kubernetes cluster.
You specify the dependencies as part of the definition of the container image to be used for each task using the `ImageSpec` class. For example:

```python
import union

image_spec = union.ImageSpec(
    builder="union",
    name="say-hello-image",
    requirements="uv.lock",
)

@union.task(container_image=image_spec)
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@union.workflow
def hello_world_wf(name: str = "world") -> str:
    greeting = say_hello(name=name)
    return greeting
```

Here, the `ImageSpec` class is used to specify the container image to be used for the `say_hello` task.

* The `builder` parameter specifies how the image should be built. The value `union` means that the image will be built using Union.ai's built-in cloud builder. In some cases you may want to build the image locally on your machine and push it to a container registry. In that case, you would remove the `builder` parameter (or set it to `envd`) and add a `registry` parameter with the URL of the registry to push the image to. See below for more details.
* The `name` parameter specifies the name of the image. This name will be used to identify the image in the container registry.
* The `requirements` parameter specifies the path to a file (relative to the directory in which the `union run` or `union register` command is invoked) that specifies the dependencies to be installed in the image.

  The file may be:

  * A `requirements.txt` file.
  * A `uv.lock` file generated by the `uv sync` command.
  * A `poetry.lock` file generated by the `poetry install` command.
  * A `pyproject.toml` file.

When you execute the `union run` or `union register` command, Union.ai will build the container image defined in the `ImageSpec` block (as well as registering the tasks and workflows defined in your code).

## Union.ai cloud image builder

If you have specified `builder="union"` in the `ImageSpec`, Union.ai will build the image using its `ImageBuilder` service in the cloud and register the image in Union.ai's own container registry.
From there it will be pulled and installed in the task container when it spins up.
All this is done transparently and does not require any setup by the user.

## Local image builder

> [!NOTE] Local image build in BYOC
> In Union.ai BYOC, you can build images from ImageSpec either using the Union.ai cloud image builder (by specifying `builder="union"`) or on your local machine
> (by omitting the `builder` parameter or specifying `builder="envd"`).
> In Union.ai Serverless, images defined by `ImageSpec` are always built using the Union.ai cloud image builder.
> Local image building is not supported in Serverless.

If you have not specified a `builder` or have specified `builder="envd"`, Union.ai will build the image locally on your machine and push it to the registry you specify.
This also requires that you specify a `registry` parameter in the `ImageSpec`. For example:

```python
image_spec = union.ImageSpec(
    builder="envd",
    name="say-hello-image",
    requirements="uv.lock",
    registry="https://ghcr.io/",
)
```

Here we assume you are using GitHub's GHCR, and that you substitute your GitHub organization name for ``.

### Local container engine

To enable local image building you must have an [OCI-compatible](https://opencontainers.org/) container engine, like [Docker](https://docs.docker.com/get-docker/), installed and running locally.
Other options include [Podman](https://podman.io/), [LXD](https://linuxcontainers.org/lxd/introduction/), or [Containerd](https://containerd.io/).

### Access to a container registry

You will also need access to a container registry.
You must specify the URL of the registry in the `registry` parameter of the `ImageSpec`.
Above we used the GitHub Container Registry (GHCR) that comes as part of your GitHub account.
For more information, see [Working with the Container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry).
You may use another container registry if you prefer, such as [Docker Hub](https://hub.docker.com/), [Amazon Elastic Container Registry (ECR)](../integrations/enabling-aws-resources/enabling-aws-ecr), or [Google Artifact Registry (GAR)](../integrations/enabling-gcp-resources/enabling-google-artifact-registry).

You will need to set up your local Docker client to authenticate to GHCR in order for `union` to be able to push the image built according to the `ImageSpec` to GHCR.
Follow the directions in [Working with the Container registry > Authenticating to the Container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry).

### Make your image accessible to Union.ai

In addition to making sure your registry is accessible from your local machine, you will need to ensure that the specific image, once pushed to the registry, is itself publicly accessible.

> [!NOTE] Make your image public
> Note that in the case of our example registry (GHCR), making the image public can only be done once the image _has been_ pushed.
> This means that you will need to register your workflow first, then make the image public and then run the workflow from the Union.ai UI.
> If you try to run the workflow before making the image public (for example by doing a `union run` which both registers and runs immediately)
> the workflow execution will fail with an `ImagePullBackOff` error.

In the GitHub Container Registry, switch the visibility of your container image to Public.
For more information, see [Configuring a package's access control and visibility](https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility#about-inheritance-of-access-permissions-and-visibility).

At this point, you can run the workflow from the Union.ai interface.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/running-your-code ===

# Running your code

## Set up your development environment

If you have not already done so, follow the **Getting started** section to sign in to Union.ai, and set up your local environment.

## CLI commands for running your code

The Union CLI and Uctl CLI provide commands that allow you to deploy and run your code at different stages of the development cycle:

1. `union run`: For deploying and running a single script immediately in your local Python environment.
2. `union run --remote`: For deploying and running a single script immediately in the cloud on Union.ai.
3. `union register`: For deploying multiple scripts to Union.ai and running them from the Web interface.
4. `union package` and `uctl register`: For deploying workflows to production and for scripting within a CI/CD pipeline.

> [!NOTE]
> In some cases, you may want to test your code in a local cluster before deploying it to Union.ai.
> This step corresponds to using the commands 2, 3, or 4, but targeting your local cluster instead of Union.ai.
> For more details, see **Development cycle > Running in a local cluster**.

## Running a script in local Python with `union run` {#running-a-script-in-local-python}

During the development cycle you will want to run a specific workflow or task in your local Python environment to test it.
To quickly try out the code locally use `union run`:

```shell
$ union run workflows/example.py wf --name 'Albert'
```

Here you are invoking `union run` and passing the name of the Python file and the name of the workflow within that file that you want to run.
In addition, you are passing the named parameter `name` and its value.

This command is useful for quickly testing a workflow locally to check for basic errors.
For more details see [union run details](./details-of-union-run).

## Running a script on Union.ai with `union run --remote`

To quickly run a workflow on Union.ai, use `union run --remote`:

```shell
$ union run --remote --project basic-example --domain development workflows/example.py wf --name 'Albert'
```

Here we are invoking `union run --remote` and passing:

* The project, `basic-example`
* The domain, `development`
* The Python file, `workflows/example.py`
* The workflow within that file that you want to run, `wf`
* The named parameter `name`, and its value

This command will:

* Build the container image defined in your `ImageSpec`.
* Package up your code and deploy it to the specified project and domain in Union.ai.
* Run the workflow on Union.ai.

This command is useful for quickly deploying and running a specific workflow on Union.ai.
For more details see [union run details](./details-of-union-run).

## Running tasks through uctl

This is a multi-step process where we create an execution spec file, update the spec file, and then create the execution.

### Generate execution spec file

```shell
$ uctl launch task --project flytesnacks --domain development --name workflows.example.generate_normal_df --version v1
```

### Update the input spec file for arguments to the workflow

```yaml
iamRoleARN: 'arn:aws:iam::12345678:role/defaultrole'
inputs:
  n: 200
  mean: 0.0
  sigma: 1.0
kubeServiceAcct: ""
targetDomain: ""
targetProject: ""
task: workflows.example.generate_normal_df
version: "v1"
```

### Create execution using the exec spec file

```shell
$ uctl create execution -p flytesnacks -d development --execFile exec_spec.yaml
```

### Monitor the execution by providing the execution id from create command

```shell
$ uctl get execution -p flytesnacks -d development 
```

## Running workflows through uctl

Workflows on their own are not runnable directly.
However, a launchplan is always bound to a workflow (at least the auto-created default launch plan) and you can use launchplans to `launch` a workflow.
The `default launchplan` for a workflow has the same name as its workflow and all argument defaults are also identical.

Tasks can also be executed using the launch command.
One difference between running a task and a workflow via launchplans is that launchplans cannot be associated with a task.
This is to avoid triggers and scheduling.

## Running launchplans through uctl

This is a multi-step process where we create an execution spec file, update the spec file and then create the execution.
More details can be found in **Uctl CLI > uctl create > uctl create execution**.

### Generate an execution spec file

```shell
$ uctl get launchplan -p flytesnacks -d development myapp.workflows.example.my_wf --execFile exec_spec.yaml
```

### Update the input spec file for arguments to the workflow

```yaml
inputs:
  name: "adam"
```

### Create execution using the exec spec file

```shell
$ uctl create execution -p flytesnacks -d development --execFile exec_spec.yaml
```

### Monitor the execution by providing the execution id from create command

```shell
$ uctl get execution -p flytesnacks -d development 
```

## Deploying your code to Union.ai with `union register`

```shell
$ union register workflows --project basic-example --domain development
```

Here we are registering all the code in the `workflows` directory to the project `basic-example` in the domain `development`.

This command will:

* Build the container image defined in your `ImageSpec`.
* Package up your code and deploy it to the specified project and domain in Union.ai.

The package will contain the code in the Python package located in the `workflows` directory.
Note that the presence of the `__init__.py` file in this directory is necessary in order to make it a Python package.

The command will not run the workflow. You can run it from the Web interface.
This command is useful for deploying your full set of workflows to Union.ai for testing.

### Fast registration

`union register` packages up your code through a mechanism called fast registration.
Fast registration is useful when you already have a container image that's hosted in your container registry of choice, and you change your workflow/task code without any changes in your system-level/Python dependencies.

At a high level, fast registration:

* Packages and zips up the directory/file that you specify as the argument to `union register`, along with any files in the root directory of your project. The result of this is a tarball that is packaged into a `.tar.gz` file, which also includes the serialized task (in `protobuf` format) and workflow specifications defined in your workflow code.
* Registers the package to the specified cluster and uploads the tarball containing the user-defined code into the configured blob store (e.g. S3, GCS).

At workflow execution time, Union.ai knows to automatically inject the zipped up task/workflow code into the running container, thereby overriding the user-defined tasks/workflows that were originally baked into the image.

> [!NOTE] `WORKDIR`, `PYTHONPATH`, and `PATH`
> When executing any of the above commands, the archive that gets created is extracted wherever the `WORKDIR` is set.
> This can be handled directly via the `WORKDIR` directive in a `Dockerfile`, or specified via `source_root` if using `ImageSpec`.
> This is important for discovering code and executables via `PATH` or `PYTHONPATH`.
> A common pattern for making your Python packages fully discoverable is to have a top-level `src` folder, adding that to your `PYTHONPATH`,
> and making all your imports absolute.
> This avoids having to "install" your Python project in the image at any point, e.g. via `pip install -e`.

## Inspecting executions

Uctl supports inspecting an execution by retrieving its details.
For a deeper dive, refer to the [Reference](../../api-reference/uctl-cli/_index) guide.

Monitor the execution by providing the execution ID from the create command, which can be a task or workflow execution.

```shell
$ uctl get execution -p flytesnacks -d development 
```

For more details use the `--details` flag, which shows node executions along with the task executions on them.

```shell
$ uctl get execution -p flytesnacks -d development --details
```

If you prefer to see a YAML/JSON view of the details, change the output format using the `-o` flag.

```shell
$ uctl get execution -p flytesnacks -d development --details -o yaml
```

To see the results of the execution, you can inspect the node closure `outputUri` in the detailed YAML output.

```shell
"outputUri": "s3://my-s3-bucket/metadata/propeller/flytesnacks-development-/n0/data/0/outputs.pb"
```

## Deploying your code to production

### Package your code with `union package`

The combination of `union package` and `uctl register` is the standard way of deploying your code to production.
This method is often used in scripts for **Development cycle > CI/CD deployment**.

First, package your workflows:

```shell
$ union --pkgs workflows package
```

This will create a tar file called `flyte-package.tgz` of the Python package located in the `workflows` directory.
Note that the presence of the `__init__.py` file in this directory is necessary in order to make it a Python package.

> [!NOTE]
> You can specify multiple workflow directories using the following command:
>
> `union --pkgs DIR1 --pkgs DIR2 package ...`
>
> This is useful in cases where you want to register two different projects that you maintain in a single place.
>
> If you encounter a `ModuleNotFoundError` when packaging, use the `--source` option to include the correct source paths.
For instance: > > `union --pkgs package --source ./src -f` ### Register the package with `uctl register` Once the code is packaged you register it using the `uctl` CLI: ```shell $ uctl register files \ --project basic-example \ --domain development \ --archive flyte-package.tgz \ --version "$(git rev-parse HEAD)" ``` Letโ€™s break down what each flag is doing here: * `--project`: The target Union.ai project. * `--domain`: The target domain. Usually one of `development`, `staging`, or `production`. * `--archive`: This argument allows you to pass in a package file, which in this case is the `flyte-package.tgz` produced earlier. * `--version`: This is a version string that can be any string, but we recommend using the Git SHA in general, especially in production use cases. See [Uctl CLI](../../api-reference/uctl-cli/_index) for more details. ## Using union register versus union package + uctl register As a rule of thumb, `union register` works well when you are working on a single cluster and iterating quickly on your task/workflow code. On the other hand, `union package` and `uctl register` is appropriate if you are: * Working with multiple clusters, since it uses a portable package * Deploying workflows to a production context * Testing your workflows in your CI/CD infrastructure. > [!NOTE] Programmatic Python API > You can also perform the equivalent of the three methods of registration using a [UnionRemote object](../development-cycle/union-remote/_index). ## Image management and registration method The `ImageSpec` construct available in `union` also has a mechanism to copy files into the image being built. Its behavior depends on the type of registration used: * If fast register is used, then itโ€™s assumed that you donโ€™t also want to copy source files into the built image. * If fast register is not used (which is the default for `union package`, or if `union register --copy none` is specified), then itโ€™s assumed that you do want source files copied into the built image. * If your `ImageSpec` constructor specifies a `source_root` and the `copy` argument is set to something other than `CopyFileDetection.NO_COPY`, then files will be copied regardless of fast registration status. ## Building your own images While we recommend that you use `ImageSpec` and the `union` cloud image builder, you can, if you wish build and deploy your own images. You can start with `union init --template basic-template-dockerfile`, the resulting template project includes a `docker_build.sh` script that you can use to build and tag a container according to the recommended practice: ```shell $ ./docker_build.sh ``` By default, the `docker_build.sh` script: * Uses the `PROJECT_NAME` specified in the union command, which in this case is my_project. * Will not use any remote registry. * Uses the Git SHA to version your tasks and workflows. You can override the default values with the following flags: ```shell $ ./docker_build.sh -p PROJECT_NAME -r REGISTRY -v VERSION ``` For example, if you want to push your Docker image to Githubโ€™s container registry you can specify the `-r ghcr.io` flag. > [!NOTE] > The `docker_build.sh` script is purely for convenience; you can always roll your own way of building Docker containers. Once youโ€™ve built the image, you can push it to the specified registry. 
For example, if youโ€™re using Github container registry, do the following: ```shell $ docker login ghcr.io $ docker push TAG ``` ## CI/CD with Flyte and GitHub Actions You can use any of the commands we learned in this guide to register, execute, or test Union.ai workflows in your CI/CD process. Union.ai provides two GitHub actions that facilitate this: * `flyte-setup-action`: This action handles the installation of uctl in your action runner. * `flyte-register-action`: This action uses `uctl register` under the hood to handle registration of packages, for example, the `.tgz` archives that are created by `union package`. ### Some CI/CD best practices In the case where workflows are registered on each commit in your build pipelines, you can consider the following recommendations and approach: * **Versioning Strategy** : Determining the version of the build for different types of commits makes them consistent and identifiable. For commits on feature branches, use `{branch-name}-{short-commit-hash}` and for the ones on main branches, use `main-{short-commit-hash}`. Use version numbers for the released (tagged) versions. * **Workflow Serialization and Registration** : Workflows should be serialized and registered based on the versioning of the build and the container image. Depending on whether the build is for a feature branch or `main`, the registration domain should be adjusted accordingly. * **Container Image Specification** : When managing multiple images across tasks within a workflow, use the `--image` flag during registration to specify which image to use. This avoids hardcoding the image within the task definition, promoting reusability and flexibility in workflows. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/overriding-parameters === # Overriding parameters The `with_overrides` method allows you to specify parameter overrides on [tasks](../core-concepts/tasks/_index), **Core concepts > Workflows > Subworkflows and sub-launch plans** at execution time. This is useful when you want to change the behavior of a task, subworkflow, or sub-launch plan without modifying the original definition. ## Task parameters When calling a task, you can specify the following parameters in `with_overrides`: * `accelerator`: Specify **Core concepts > Tasks > Task hardware environment > Accelerators**. * `cache_serialize`: Enable **Core concepts > Caching**. * `cache_version`: Specify the **Core concepts > Caching**. * `cache`: Enable **Core concepts > Caching**. * `container_image`: Specify a **Core concepts > Tasks > Task software environment > Local image building**. * `interruptible`: Specify whether the task is **Core concepts > Tasks > Task hardware environment > Interruptible instances**. * `limits`: Specify **Core concepts > Tasks > Task hardware environment > Customizing task resources**. * `name`: Give a specific name to this task execution. This will appear in the workflow flowchart in the UI (see **Development cycle > Overriding parameters > below**). * `node_name`: Give a specific name to the DAG node for this task. This will appear in the workflow flowchart in the UI (see **Development cycle > Overriding parameters > below**). * `requests`: Specify **Core concepts > Tasks > Task hardware environment > Customizing task resources**. * `retries`: Specify the **Core concepts > Tasks > Task parameters**. * `task_config`: Specify a **Core concepts > Tasks > Task parameters**. * `timeout`: Specify the **Core concepts > Tasks > Task parameters**. 
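Resource parameters can be overridden in the same way at invocation time. The following is a minimal sketch (the task `my_task` and the resource values are illustrative, and `Resources` is imported from `flytekit`):

```python
from flytekit import Resources

@union.workflow
def resource_override_wf() -> int:
    # Raise the resource requests and limits for this one invocation only;
    # the task definition itself is left unchanged.
    return my_task(a=1, b=2, c=3).with_overrides(
        requests=Resources(cpu="2", mem="4Gi"),
        limits=Resources(cpu="4", mem="8Gi"),
    )
```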
For example, if you have a task that does not have caching enabled, you can use `with_overrides` to enable caching at execution time as follows:

```python
my_task(a=1, b=2, c=3).with_overrides(cache=True)
```

### Using `with_overrides` with `name` and `node_name`

Using `with_overrides` with `name` on a task is a particularly useful feature. For example, you can use `with_overrides(name="my_task")` to give a specific name to a task execution, which will appear in the UI. The name specified can be chosen or generated at invocation time without modifying the task definition.

```python
@union.workflow
def wf() -> int:
    my_task(a=1, b=1, c=1).with_overrides(name="my_task_1")
    my_task(a=2, b=2, c=2).with_overrides(name="my_task_2", node_name="my_node_2")
    return my_task(a=1, b=1, c=1)
```

The above code would produce the following workflow display in the UI:

![Overriding name](../../_static/images/user-guide/development-cycle/overriding-parameters/override-name.png)

There is also a related parameter called `node_name` that can be used to give a specific name to the DAG node for this task. The DAG node name is usually autogenerated as `n0`, `n1`, `n2`, etc. It appears in the `node` column of the workflow table. Overriding `node_name` results in the autogenerated name being replaced by the specified name:

![Overriding node name](../../_static/images/user-guide/development-cycle/overriding-parameters/override-node-name.png)

Note that the `node_name` was specified as `my_node_2` in the code but appears as `my-node-2` in the UI. This is due to the fact that Kubernetes node names cannot contain underscores. Union.ai automatically alters the name to be Kubernetes-compliant.

## Subworkflow and sub-launch plan parameters

When calling a workflow or launch plan from within a high-level workflow (in other words, when invoking a subworkflow or sub-launch plan), you can specify the following parameters in `with_overrides`:

* `cache_serialize`: Enable **Core concepts > Caching**.
* `cache_version`: Specify the **Core concepts > Caching**.
* `cache`: Enable **Core concepts > Caching**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/run-details ===

# Run details

The `union run` command is used to run a specific workflow or task in your local Python environment or on Union.ai. In this section we will discuss some details of how and why to use it.

## Passing parameters

`union run` enables you to execute a specific workflow using the syntax:

```shell
$ union run
```

Keyword arguments can be supplied to `union run` by passing them in like this:

```shell
--
```

For example, above we invoked `union run` with the script `example.py`, the workflow `wf`, and the named parameter `name`:

```shell
$ union run example.py wf --name 'Albert'
```

The value `Albert` is passed for the parameter `name`.

With `snake_case` argument names, you have to convert them to `kebab-case`. For example, if the code were altered to accept a `last_name` parameter, then the following command:

```shell
$ union run example.py wf --last-name 'Einstein'
```

would pass the value `Einstein` for that parameter.

## Why `union run` rather than `python`?

You could add a `main` guard at the end of the script like this:

```python
if __name__ == "__main__":
    training_workflow(hyperparameters={"C": 0.1})
```

This would let you run it with `python example.py`, though you would have to hard-code your arguments.
It would become even more verbose if you wanted to pass in your arguments:

```python
if __name__ == "__main__":
    import json
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument("--hyperparameters", type=json.loads)
    ...  # add the other options

    args = parser.parse_args()
    training_workflow(hyperparameters=args.hyperparameters)
```

`union run` is less verbose and more convenient for running workflows with arguments.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/debugging-with-interactive-tasks ===

# Debugging with interactive tasks

With interactive tasks you can inspect and debug live task code directly in the UI in an embedded Visual Studio Code IDE.

## Enabling interactive tasks in your code

To enable interactive tasks, you need to:

* Include `flytekitplugins-flyteinteractive` as a dependency
* Use the `@vscode` decorator on the tasks you want to make interactive.

The `@vscode` decorator, when applied, converts a task into a Visual Studio Code server at runtime. This process overrides the standard execution of the task's function body, initiating a command to start a Visual Studio Code server instead.

> [!NOTE] No need for ingress or port forwarding
> The Union.ai interactive tasks feature is an adaptation of the open-source
> **External service backend plugins > FlyteInteractive**.
> It improves on the open-source version by removing the need for ingress
> configuration or port forwarding, providing a more seamless debugging
> experience.

## Basic example

The following example demonstrates interactive tasks in a simple workflow.

### requirements.txt

This `requirements.txt` file is used by all the examples in this section:

```text
flytekit
flytekitplugins-flyteinteractive
```

### example.py

```python
"""Union.ai workflow example of interactive tasks (@vscode)"""

import union
from flytekitplugins.flyteinteractive import vscode

image = union.ImageSpec(
    registry="",
    name="interactive-tasks-example",
    base_image="ghcr.io/flyteorg/flytekit:py3.11-latest",
    requirements="requirements.txt"
)


@union.task(container_image=image)
@vscode
def say_hello(name: str) -> str:
    s = f"Hello, {name}!"
    return s


@union.workflow
def wf(name: str = "world") -> str:
    greeting = say_hello(name=name)
    return greeting
```

## Register and run the workflow

To register the code to a project on Union.ai and run the workflow, follow the directions in **Development cycle > Running your code**.

## Access the IDE

1. Select the first task in the workflow page (in this example the task is called `say_hello`). The task info pane will appear on the right side of the page.
2. Wait until the task is in the **Running** state and the **VSCode (User)** link appears.
3. Click the **VSCode (User)** link.

![VSCode link](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/vscode-link.png)

## Inspect the task code

Once the IDE opens, you will be able to see your task code in the editor.

![Inspect code](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/inspect-code.png)

## Interactive debugging

To run the task in VSCode, click the _Run and debug_ symbol on the left rail of the IDE and select the **Interactive Debugging** configuration.

![Interactive debugging](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/interactive-debugging.png)

Click the **Play** button beside the configuration drop-down to run the task. This will run your task with inputs from the previous task.
To inspect intermediate states, set breakpoints in the Python code and use the debugger for tracing. > [!NOTE] No task output written to Union.ai storage > Itโ€™s important to note that during the debugging phase the task runs entirely within VSCode and does not write the output to Union.ai storage. ## Update your code You can edit your code in the VSCode environment and run the task again to see the changes. Note, however, that the changes will not be automatically persisted anywhere. You will have to manually copy and paste the changes back to your local environment. ## Resume task After you finish debugging, you can resume your task with updated code by executing the **Resume Task** configuration. This will terminate the code server, run the task with inputs from the previous task, and write the output to Union.ai storage. > [!NOTE] Remember to persist your code > Remember to persist your code (for example, by checking it into GitHub) before resuming the task, since you will lose the connection to the VSCode server afterwards. ![Resume task](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/resume-task.png) ## Auxiliary Python files You will notice that aside from your code, there are some additional files in the VSCode file explorer that have been automatically generated by the system: ### flyteinteractive_interactive_entrypoint.py The `flyteinteractive_interactive_entrypoint.py` script implements the **Interactive Debugging** action that we used above: ![Interactive entrypoint](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/flyteinteractive-interactive-entrypoint-py.png) ### flyteinteractive_resume_task.py The `flyteinteractive_resume_task.py` script implements the **Resume Task** action that we used above: ![Resume task](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/flyteinteractive-resume-task-py.png) ### launch.json The `launch.json` file in the `.vscode` directory configures the **Interactive Debugging** and **Resume Task** actions. ![launch.json](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/launch-json.png) ## Integrated terminal In addition to using the convenience functions defined by the auxiliary files, you can also run your Python code script directly from the integrated terminal using `python .py` (in this example, `python hello.py`). ![Interactive terminal](../../_static/images/user-guide/development-cycle/debugging-with-interactive-tasks/interactive-terminal.png) ## Install extensions As with local VSCode, you can install a variety of extensions to assist development. Available extensions differ from official VSCode for legal reasons and are hosted on the [Open VSX Registry](https://open-vsx.org/). Python and Jupyter extensions are installed by default. 
Additional extensions can be added by defining a configuration object and passing it to the `@vscode` decorator, as shown below: ### example-extensions.py ```python """Union.ai workflow example of interactive tasks (@vscode) with extensions""" import union from flytekitplugins.flyteinteractive import COPILOT_EXTENSION, VscodeConfig, vscode image = union.ImageSpec( registry="", name="interactive-tasks-example", base_image="ghcr.io/flyteorg/flytekit:py3.11-latest", requirements="requirements.txt" ) config = VscodeConfig() config.add_extensions(COPILOT_EXTENSION) # Use predefined URL config.add_extensions( "https://open-vsx.org/api/vscodevim/vim/1.27.0/file/vscodevim.vim-1.27.0.vsix" ) # Copy raw URL from Open VSX @union.task(container_image=image) @vscode(config=config) def say_hello(name: str) -> str: s = f"Hello, {name}!" return s @union.workflow def wf(name: str = "world") -> str: greeting = say_hello(name=name) return greeting ``` ## Manage resources To manage resources, the VSCode server is terminated after a period of idleness (no active HTTP connections). Idleness is monitored via a heartbeat file. The `max_idle_seconds` parameter can be used to set the maximum number of seconds the VSCode server can be idle before it is terminated. ### example-manage-resources.py ```python """Union.ai workflow example of interactive tasks (@vscode) with max_idle_seconds""" import union from flytekitplugins.flyteinteractive import vscode image = union.ImageSpec( registry="", name="interactive-tasks-example", base_image="ghcr.io/flyteorg/flytekit:py3.11-latest", requirements="requirements.txt" ) @union.task(container_image=image) @vscode(max_idle_seconds=60000) def say_hello(name: str) -> str: s = f"Hello, {name}!" return s @union.workflow def wf(name: str = "world") -> str: greeting = say_hello(name=name) return greeting ``` ## Pre and post hooks Interactive tasks also allow the registration of functions to be executed both before and after VSCode starts. This can be used for tasks requiring setup or cleanup. ### example-pre-post-hooks.py ```python """Union.ai workflow example of interactive tasks (@vscode) with pre and post hooks""" import union from flytekitplugins.flyteinteractive import vscode image = union.ImageSpec( registry="", name="interactive-tasks-example", base_image="ghcr.io/flyteorg/flytekit:py3.11-latest", requirements="requirements.txt" ) def set_up_proxy(): print("set up") def push_code(): print("push code") @union.task(container_image=image) @vscode(pre_execute=set_up_proxy, post_execute=push_code) def say_hello(name: str) -> str: s = f"Hello, {name}!" return s @union.workflow def wf(name: str = "world") -> str: greeting = say_hello(name=name) return greeting ``` ## Only initiate VSCode on task failure The system can also be set to only initiate VSCode _after a task failure_, preventing task termination and thus enabling inspection. This is done by setting the `run_task_first` parameter to `True`. ### example-run-task-first.py ```python """Union.ai workflow example of interactive tasks (@vscode) with run_task_first""" import union from flytekitplugins.flyteinteractive import vscode image = union.ImageSpec( registry="", name="interactive-tasks-example", base_image="ghcr.io/flyteorg/flytekit:py3.11-latest", requirements="requirements.txt" ) @union.task(container_image=image) @vscode(run_task_first=True) def say_hello(name: str) -> str: s = f"Hello, {name}!" 
    return s


@union.workflow
def wf(name: str = "world") -> str:
    greeting = say_hello(name=name)
    return greeting
```

## Debugging execution issues

The inspection of task and workflow executions provides log links to debug things further. Using the `--details` flag you can view node executions with log links.

```shell
└── n1 - FAILED - 2021-06-30 08:51:07.3111846 +0000 UTC - 2021-06-30 08:51:17.192852 +0000 UTC
    └── Attempt :0
        └── Task - FAILED - 2021-06-30 08:51:07.3111846 +0000 UTC - 2021-06-30 08:51:17.192852 +0000 UTC
            └── Logs :
                └── Name :Kubernetes Logs (User)
                └── URI :http://localhost:30082/#/log/flytectldemo-development/f3a5a4034960f4aa1a09-n1-0/pod?namespace=flytectldemo-development
```

Additionally, you can check the pods launched in the `<project>-<domain>` namespace:

```shell
$ kubectl get pods -n <project>-<domain>
```

The launched pods will have a prefix of the execution name along with a suffix of the `nodeId`:

```shell
NAME                        READY   STATUS         RESTARTS   AGE
f65009af77f284e50959-n0-0   0/1     ErrImagePull   0          18h
```

For example, above we see that the `STATUS` indicates an issue with pulling the image.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/managing-secrets ===

# Managing secrets

You can use secrets to interact with external services.

## Creating secrets

### Creating a secret on the command line

To create a secret, use the `union create secret` command:

```shell
$ union create secret my_secret_name
```

You'll be prompted to enter a secret value in the terminal:

```
Enter secret value: ...
```

### Creating a secret from a file

To create a secret from a file, run the following command:

```shell
$ union create secret my_secret_name -f /path/to/secret_file
```

### Scoping secrets

* When you create a secret without specifying a project or domain, as we did above, the secret will be available across all project-domain combinations.
* If you specify only a domain, the secret will be available across all projects, but only in that domain.
* If you specify both a project and a domain, the secret will be available in that project-domain combination only.
* If you specify only a project, you will get an error.

For example, to create a secret so that it is only available in `my_project-development`, you would run:

```shell
$ union create secret my_secret_name --project my_project --domain development
```

## Listing secrets

You can list existing secrets with the `union get secret` command. For example, the following command will list all secrets in the organization:

```shell
$ union get secret
```

Specifying either or both of the `--project` and `--domain` flags will list the secrets that are **only** available in that project and/or domain. For example, to list the secrets that are only available in `my_project` and domain `development`, you would run:

```shell
$ union get secret --project my_project --domain development
```

## Using secrets in workflow code

Note that a workflow can only access secrets whose scope includes the project and domain of the workflow.

### Using a secret created on the command line

To use a secret created on the command line, see the example code below. To run the example code:

1. **Development cycle > Managing secrets > Creating secrets > Creating a secret on the command line** with the key `my_secret`.
2. Copy the following example code to a new file and save it as `using_secrets.py`.
3. Run the script with `union run --remote using_secrets.py main`.
```python
import union


@union.task(secret_requests=[union.Secret(key="my_secret")])
def t1():
    secret_value = union.current_context().secrets.get(key="my_secret")
    # do something with the secret. For example, communication with an external API.
    ...
```

> [!WARNING]
> Do not return secret values from tasks, as this will expose secrets to the control plane.

With `env_var`, you can automatically load the secret into the environment. This is useful with libraries that expect the secret to have a specific name:

```python
import union


@union.task(secret_requests=[union.Secret(key="my_union_api_key", env_var="UNION_API_KEY")])
def t1():
    # Authenticates the remote with UNION_API_KEY
    remote = union.UnionRemote(default_project="flytesnacks", default_domain="development")
```

### Using a secret created from a file

To use a secret created from a file in your workflow code, you must mount it as a file. To run the example code below:

1. **Development cycle > Managing secrets > Creating secrets > Creating a secret from a file** with the key `my_file_secret`.
2. Copy the example code below to a new file and save it as `using_secrets_file.py`.
3. Run the script with `union run --remote using_secrets_file.py main`.

```python
import union


@union.task(
    secret_requests=[
        union.Secret(key="my_file_secret", mount_requirement=union.Secret.MountType.FILE),
    ]
)
def t1():
    path_to_secret_file = union.current_context().secrets.get_secrets_file("my_file_secret")
    with open(path_to_secret_file, "r") as f:
        secret_value = f.read()
    # do something with the secret. For example, communication with an external API.
    ...
```

> [!WARNING]
> Do not return secret values from tasks, as this will expose secrets to the control plane.

> [!NOTE]
> The `get_secrets_file` method takes the secret key and returns the path to the secret file.

## Updating secrets

To update a secret, run the `union update secret` command. You will be prompted to enter a new value:

```shell
$ union update secret --project my_project --domain my_domain my_secret
```

## Deleting secrets

To delete a secret, use the `union delete secret` command:

```shell
$ union delete secret --project my_project --domain my_domain my_secret
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/managing-api-keys ===

# Managing API keys

You need to create an API key to allow external systems to run compute on Union.ai, e.g. a GitHub action that registers or runs workflows.

## Creating an API key

To create an API key, run the following command with the Union CLI, using any name you like:

```shell
$ union create api-key admin --name my-custom-name

Client ID: my-custom-name
The following API key will only be shown once. Be sure to keep it safe!
Configure your headless CLI by setting the following environment variable:

export UNION_API_KEY=""
```

Store the API key in a secure location. For `git` development, make sure not to check the API key into your repository. Within a GitHub action, you can use [GitHub Secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions) to store the secret.

For this example, copy the following workflow into a file called `hello.py`:

```python
import union


@union.task
def welcome(name: str) -> str:
    return f"Welcome to Union.ai! {name}"


@union.workflow
def main(name: str) -> str:
    return welcome(name=name)
```
You can run this workflow from any machine by setting the `UNION_API_KEY` environment variable:

```shell
$ export UNION_API_KEY=""
$ union run --remote hello.py main --name "Union.ai"
```

## Listing and deleting applications

You can list all your applications by running:

```shell
$ union get api-key admin
```

```shell
┏━━━━━━━━━━━━━━━━┓
┃ client_id      ┃
┡━━━━━━━━━━━━━━━━┩
│ my-custom-name │
└────────────────┘
```

The `client_id` contains your custom application name and a prefix that contains your username.

Finally, you can delete your application by running:

```shell
$ union delete api-key admin --name my-custom-name
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/accessing-aws-s3 ===

# Accessing AWS S3 buckets

Here we will take a look at how to access data on AWS S3 buckets from Union.ai. As a prerequisite, we assume that our AWS S3 bucket is accessible with API keys: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

## Creating secrets on Union.ai

First, we create secrets on Union.ai by running the following command:

```shell
$ union create secret AWS_ACCESS_KEY_ID
```

This will open a prompt where we paste in our AWS credentials:

```shell
Enter secret value: 🗝️
```

Repeat this process for all other AWS credentials, such as `AWS_SECRET_ACCESS_KEY`.

## Using secrets in a task

Next, we can use the secrets directly in a task! With the AWS CLI, we create a small text file and move it to an AWS bucket:

```shell
$ aws s3 mb s3://test_bucket
$ echo "Hello Union.ai" > my_file.txt
$ aws s3 cp my_file.txt s3://test_bucket/my_file.txt
```

Next, we give a task access to our AWS secrets by supplying them through `secret_requests`. For this guide, save the following snippet as `aws-s3-access.py`:

```python
import union


@union.task(
    secret_requests=[
        union.Secret(key="AWS_ACCESS_KEY_ID"),
        union.Secret(key="AWS_SECRET_ACCESS_KEY"),
    ],
)
def read_s3_data() -> str:
    import s3fs

    secrets = union.current_context().secrets
    s3 = s3fs.S3FileSystem(
        secret=secrets.get(key="AWS_SECRET_ACCESS_KEY"),
        key=secrets.get(key="AWS_ACCESS_KEY_ID"),
    )
    with s3.open("test_bucket/my_file.txt") as f:
        content = f.read().decode("utf-8")
    return content


@union.workflow
def main():
    read_s3_data()
```

Within the task, the secrets are available through `current_context().secrets` and are passed to `s3fs`. Run the following command to execute the workflow:

```shell
$ union run --remote aws-s3-access.py main
```

## Conclusion

You can easily access your AWS S3 buckets by running `union create secret` and configuring your tasks to access the secrets!

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/task-resource-validation ===

# Task resource validation

In Union.ai, when you attempt to execute a workflow with unsatisfiable resource requests, we fail the execution immediately rather than allowing it to queue forever. We intercept execution creation requests in the executions service to validate that their resource requirements can be met and fast-fail if not.
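For instance, a task that requests far more memory than any node in your cluster can supply will be rejected when you attempt to create the execution. The following is a minimal sketch (the 10 TiB figure is deliberately unrealistic and purely illustrative; `Resources` comes from `flytekit`):

```python
import union
from flytekit import Resources

# Assuming no node in the cluster offers 10 TiB of memory, attempting to
# execute this workflow on Union.ai should fail validation immediately.
@union.task(requests=Resources(mem="10Ti"))
def memory_hungry_task() -> str:
    return "this never runs"

@union.workflow
def memory_hungry_wf() -> str:
    return memory_hungry_task()
```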
A failed validation returns a message similar to ```text Request failed with status code 400 rpc error: code = InvalidArgument desc = no node satisfies task 'workflows.fotd.fotd_directory' resource requests ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/running-in-a-local-cluster === # Running in a local cluster ## Running in a local Kubernetes cluster Ultimately you will be running your workflows in a Kubernetes cluster in Union.ai. But it can be handy to try out a workflow in a cluster on your local machine. First, ensure that you have [Docker](https://www.docker.com/products/docker-desktop/) (or a similar OCI-compliant container engine) installed locally and that _the daemon is running_. Then start the demo cluster using `uctl`: ```shell $ uctl demo start ``` ### Configuration When `uctl` starts the cluster in your local container engine it also writes configuration information to the directory `~/.union/`. Most importantly, it creates the file `~/.union/config-sandbox.yaml`. This file holds (among other things) the location of the Kubernetes cluster to which we will be deploying the workflow: ```yaml admin: endpoint: localhost:30080 authType: Pkce insecure: true console: endpoint: http://localhost:30080 logger: show-source: true level: 0 ``` Right now this file indicates that the target cluster is your local Docker instance (`localhost:30080`), but later we will change it to point to your Union.ai cluster. Later invocations of `uctl` or `union` will need to know the location of the target cluster. This can be provided in two ways: 1. Explicitly passing the location of the config file on the command line * `uctl --config ~/.union/config-sandbox.yaml ` * `union --config ~/.union/config-sandbox.yaml ` 2. Setting the environment variable `UNION_CONFIG`to the location of the config file: * `export UNION_CONFIG=~/.union/config-sandbox.yaml` > [!NOTE] > In this guide, we assume that you have set the `UNION_CONFIG` environment variable in your shell to the location of the configuration file. ### Start the workflow Now you can run your workflow in the local cluster simply by adding the `--remote` flag to your `union` command: ```shell $ union run --remote \ workflows/example.py \ training_workflow \ --hyperparameters '{"C": 0.1}' ``` The output supplies a URL to your workflow execution in the UI. ### Inspect the results Navigate to the URL produced by `union run` to see your workflow in the Union.ai UI. ## Local cluster with default image ```shell $ union run --remote my_file.py my_workflow ``` _Where `union` is configured to point to the local cluster started with `uctl demo start`._ * Task code runs in the environment of the default image in your local cluster. * Python code is dynamically overlaid into the container at runtime. * Only supports Python code whose dependencies are installed in the default image (see here). * Includes a local S3. * Supports some plugins but not all. * Single workflow runs immediately. * Workflow is registered to a default project. * Useful for demos. ## Local cluster with custom image ```shell $ union run --remote \ --image my_cr.io/my_org/my_image:latest \ my_file.py \ my_workflow ``` _Where `union` is configured to point to the local cluster started with `uctl demo start`._ * Task code runs in the environment of your custom image (`my_cr.io/my_org/my_image:latest`) in your local cluster. 
* Python code is dynamically overlaid into the container at runtime * Supports any Python dependencies you wish, since you have full control of the image. * Includes a local S3. * Supports some plugins but not all. * Single workflow runs immediately. * Workflow is registered to a default project. * Useful for advanced testing during the development cycle. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/ci-cd-deployment === # CI/CD deployment So far we have covered the steps of deploying a project manually from the command line. In many cases, you will want to automate this process through a CI/CD system. In this section, we explain how to set up a CI/CD system to register, execute and promote workflows on Union.ai. We will use GitHub Actions as the example CI/CD system. ## Create a Union.ai API key An API key is registered in your Union.ai control plane to enable external systems to perform actions on your behalf. To allow your CI/CD system to authenticate with Union.ai, create a Union.ai API key. See **Development cycle > Managing API keys** for details. ```shell $ union create api-key admin --name my-cicd-key ``` Copy the `UNION_API_KEY` value for later use; this is the only time the secret is displayed. ## Store the secret in your CI/CD secrets store Store the secret in your CI/CD secrets store. In GitHub, from the repository page: 1. Select **Settings > Secrets and variables > Actions**. 2. Select the **Secrets** tab and click **New repository secret**. 3. Give a meaningful name to the secret, like `UNION_CICD_API_KEY`. 4. Paste in the string from above as the value. 5. Click **Add secret**. ## Configure your CI/CD workflow file Create the CI/CD workflow file. For GitHub Actions, you might add `example-project/.github/workflows/deploy.yaml` similar to: ```yaml name: Deploy on: push: branches: - main env: PROJECT: flytesnacks DOMAIN: production jobs: build_and_register: runs-on: ubuntu-latest permissions: contents: read packages: write steps: - name: Checkout repository uses: actions/checkout@v3 - name: Install python & uv run: | sudo apt-get install python3 curl -LsSf https://astral.sh/uv/install.sh | sh - name: Install dependencies run: uv sync - name: Register to Union env: UNION_API_KEY: ${{ secrets.CICD_API_KEY }} run: | source .venv/bin/activate union register --version ${{ github.sha }} -p ${{ env.PROJECT }} \ -d ${{ env.DOMAIN }} --activate-launchplans ./launchplans ``` > [!NOTE] > The `Register to Union` step registers the launch plans and related Flyte entities in the `launchplans` directory. It sets the project and domain, activates launch plans automatically, and pins the version to the Git commit SHA for traceability across all registered Flyte entities. See union **Union CLI > `union` CLI commands > `register`** for additional options. Once this is set up, every push to the main branch in your repository will build and deploy your project to Union.ai. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/jupyter-notebooks === # Jupyter notebooks Union.ai supports the development, running, and debugging of tasks and workflows in an interactive Jupyter notebook environment, which accelerates the iteration speed when building data- or machine learning-driven applications. ## Write your workflows and tasks in cells When building tasks and workflows in a notebook, you write the code in cells as you normally would. 
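For example, a single cell might define a task and a workflow exactly as you would in a `.py` file. This is a minimal sketch; the `my_task` and `my_wf` names are placeholders that the registration example below refers to:

```python
import union

@union.task
def my_task(name: str) -> str:
    return f"Hello, {name}!"

@union.workflow
def my_wf(name: str) -> str:
    return my_task(name=name)
```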
From those cells you can run the code locally (i.e., in the notebook itself, not on Union.ai) by clicking the run button, as you would in any notebook.

## Enable the notebook to register workflows to Union.ai

To enable the tasks and workflows in your notebook to be easily registered and run on your Union.ai instance, you need to set up an _interactive_ `UnionRemote` object and then use it to invoke the remote executions.

First, in a cell, create an interactive `UnionRemote` object:

```python
import union
from flytekit.configuration import Config

remote = union.UnionRemote(
    config=Config.auto(),
    default_project="default",
    default_domain="development",
    interactive_mode_enabled=True,
)
```

The `interactive_mode_enabled` flag must be set to `True` when running in a Jupyter notebook environment, enabling interactive registration and execution of workflows.

Next, set up the execution invocation in another cell:

```python
execution = remote.execute(my_task, inputs={"name": "Joe"})
execution = remote.execute(my_wf, inputs={"name": "Anne"})
```

The interactive `UnionRemote` client re-registers an entity whenever it's redefined in the notebook, including when you re-execute a cell containing the entity definition, even if the entity remains unchanged. This behavior facilitates iterative development and debugging of tasks and workflows in a Jupyter notebook.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/decks ===

# Decks

Decks lets you display customized data visualizations from within your task code. Decks are rendered as HTML and appear right in the Union.ai UI when you run your workflow.

> [!NOTE]
> Decks is an opt-in feature; to enable it, set `enable_deck` to `True` in the task parameters.

To begin, import the dependencies:

```python
import union
from flytekit.deck.renderer import MarkdownRenderer
from sklearn.decomposition import PCA
import plotly.express as px
import plotly
```

> [!NOTE]
> The renderers are packaged separately from `flytekit` itself.
> To enable the `MarkdownRenderer` imported above
> you first have to install the package `flytekitplugins-deck-standard`
> in your local Python environment and include it in your `ImageSpec` (as shown below).

We create a new deck named `pca` and render Markdown content along with a [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) plot.

Now, declare the required dependencies in an `ImageSpec`:

```python
custom_image = union.ImageSpec(
    packages=[
        "flytekitplugins-deck-standard",
        "markdown",
        "pandas",
        "pillow",
        "plotly",
        "pyarrow",
        "scikit-learn",
        "ydata_profiling",
    ],
    builder="union",
)
```

Next, we define the task that will construct the figure and create the Deck:

```python
@union.task(enable_deck=True, container_image=custom_image)
def pca_plot():
    iris_df = px.data.iris()
    X = iris_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
    pca = PCA(n_components=3)
    components = pca.fit_transform(X)
    total_var = pca.explained_variance_ratio_.sum() * 100
    fig = px.scatter_3d(
        components,
        x=0,
        y=1,
        z=2,
        color=iris_df["species"],
        title=f"Total Explained Variance: {total_var:.2f}%",
        labels={"0": "PC 1", "1": "PC 2", "2": "PC 3"},
    )
    main_deck = union.Deck("pca", MarkdownRenderer().to_html("### Principal Component Analysis"))
    main_deck.append(plotly.io.to_html(fig))
```

Note the usage of `append` to append the Plotly figure to the Markdown deck.
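To try the example end to end, you can wrap the task in a small workflow (the workflow name below is arbitrary; the task can also be run on its own):

```python
@union.workflow
def pca_wf() -> None:
    pca_plot()
```

Running the file locally with `union run` (for example, `union run decks_example.py pca_wf`, where the file name is hypothetical) executes the task in your local Python environment and logs where the deck HTML was written.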
The following is the expected output, containing the path to the `deck.html` file:

```
{"asctime": "2023-07-11 13:16:04,558", "name": "flytekit", "levelname": "INFO", "message": "pca_plot task creates flyte deck html to file:///var/folders/6f/xcgm46ds59j7g__gfxmkgdf80000gn/T/flyte-0_8qfjdd/sandbox/local_flytekit/c085853af5a175edb17b11cd338cbd61/deck.html"}
```

![Union deck plot](../../_static/images/user-guide/development-cycle/decks/flyte-deck-plot-local.webp)

Once you execute this task on the Union.ai instance, you can access the deck by going to the task view and clicking the _Deck_ button:

![Union deck button](../../_static/images/user-guide/development-cycle/decks/flyte-deck-button.png)

## Deck tabs

Each Deck has a minimum of three tabs: input, output and default. The input and output tabs are used to render the input and output data of the task, while the default deck can be used to create custom renderings such as line plots, scatter plots, Markdown text, etc. Additionally, you can create other tabs as well.

## Deck renderers

> [!NOTE]
> The renderers are packaged separately from `flytekit` itself.
> To enable them you first have to install the package `flytekitplugins-deck-standard`
> in your local Python environment and include it in your `ImageSpec`.

### Frame profiling renderer

The frame profiling renderer creates a profile report from a Pandas DataFrame.

```python
import union
import pandas as pd
from flytekitplugins.deck.renderer import FrameProfilingRenderer


@union.task(enable_deck=True, container_image=custom_image)
def frame_renderer() -> None:
    df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
    union.Deck("Frame Renderer", FrameProfilingRenderer().to_html(df=df))
```

![Frame renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-frame-renderer.png)

### Top-frame renderer

The top-frame renderer renders a DataFrame as an HTML table.

```python
import union
import pandas as pd
from typing import Annotated
from flytekit.deck import TopFrameRenderer


@union.task(enable_deck=True, container_image=custom_image)
def top_frame_renderer() -> Annotated[pd.DataFrame, TopFrameRenderer(1)]:
    return pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
```

![Top frame renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-top-frame-renderer.png)

### Markdown renderer

The Markdown renderer converts a Markdown string into HTML.

```python
import union
from flytekit.deck import MarkdownRenderer


@union.task(enable_deck=True, container_image=custom_image)
def markdown_renderer() -> None:
    union.current_context().default_deck.append(
        MarkdownRenderer().to_html("You can install flytekit using this command: ```import flytekit```")
    )
```

![Markdown renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-markdown-renderer.png)

### Box renderer

The box renderer groups rows of a DataFrame together into a box-and-whisker mark to visualize their distribution. Each box extends from the first quartile (Q1) to the third quartile (Q3). The median (Q2) is indicated by a line within the box. Typically, the whiskers extend to the edges of the box, plus or minus 1.5 times the interquartile range (IQR: Q3-Q1).
```python
import union
import plotly.express as px
from flytekitplugins.deck.renderer import BoxRenderer


@union.task(enable_deck=True, container_image=custom_image)
def box_renderer() -> None:
    iris_df = px.data.iris()
    union.Deck("Box Plot", BoxRenderer("sepal_length").to_html(iris_df))
```

![Box renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-box-renderer.png)

### Image renderer

The image renderer converts a `FlyteFile` or `PIL.Image.Image` object into an HTML displayable image, where the image data is encoded as a base64 string.

```python
import union
from flytekitplugins.deck.renderer import ImageRenderer


@union.task(enable_deck=True, container_image=custom_image)
def image_renderer(image: union.FlyteFile) -> None:
    union.Deck("Image Renderer", ImageRenderer().to_html(image_src=image))


@union.workflow
def image_renderer_wf(image: union.FlyteFile = "https://bit.ly/3KZ95q4") -> None:
    image_renderer(image=image)
```

![Image renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-image-renderer.png)

### Table renderer

The table renderer converts a Pandas DataFrame into an HTML table.

```python
import union
import pandas as pd
from flytekitplugins.deck.renderer import TableRenderer


@union.task(enable_deck=True, container_image=custom_image)
def table_renderer() -> None:
    union.Deck(
        "Table Renderer",
        TableRenderer().to_html(df=pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}), table_width=50),
    )
```

![Table renderer](../../_static/images/user-guide/development-cycle/decks/flyte-decks-table-renderer.png)

### Custom renderers

You can also create your own custom renderer. A renderer is essentially a class with a `to_html` method. Here we create a custom renderer that summarizes the data from a Pandas `DataFrame` instead of showing raw values.

```python
import pandas as pd


class DataFrameSummaryRenderer:
    def to_html(self, df: pd.DataFrame) -> str:
        assert isinstance(df, pd.DataFrame)
        return df.describe().to_html()
```

Then we can use the `Annotated` type to override the default renderer of the `pandas.DataFrame` type:

```python
from typing import Optional

try:
    from typing import Annotated
except ImportError:
    from typing_extensions import Annotated

import pandas as pd
import plotly.express as px

import flytekit
from flytekit import task
from flytekit.deck import MarkdownRenderer
from flytekitplugins.deck.renderer import BoxRenderer


@task(enable_deck=True)
def iris_data(
    sample_frac: Optional[float] = None,
    random_state: Optional[int] = None,
) -> Annotated[pd.DataFrame, DataFrameSummaryRenderer()]:
    data = px.data.iris()
    if sample_frac is not None:
        data = data.sample(frac=sample_frac, random_state=random_state)
    md_text = (
        "# Iris Dataset\n"
        "This task loads the iris dataset using the `plotly` package."
    )
    flytekit.current_context().default_deck.append(MarkdownRenderer().to_html(md_text))
    flytekit.Deck("box plot", BoxRenderer("sepal_length").to_html(data))
    return data
```

## Streaming Decks

You can stream a Deck directly using `Deck.publish()`:

```python
import union


@union.task(enable_deck=True)
def t_deck():
    union.Deck.publish()
```

This will create a live deck where you can click the refresh button and see the deck update until the task succeeds.

### Union Deck Succeed Video

📺 [Watch on YouTube](https://www.youtube.com/watch?v=LJaBP0mdFeE)

### Union Deck Fail Video

📺 [Watch on YouTube](https://www.youtube.com/watch?v=xaBF6Jlzjq0)

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/remote-management ===

# UnionRemote

The `UnionRemote` Python API supports functionality similar to that of the Union CLI, enabling you to manage Union.ai workflows, tasks, launch plans and artifacts from within your Python code.
> [!NOTE]
> The primary use case of `UnionRemote` is to automate the deployment of Union.ai entities. As such, it is intended for use within scripts *external* to actual Union.ai workflow and task code, for example CI/CD pipeline scripts.
>
> In other words: _Do not use `UnionRemote` within task code._

## Creating a `UnionRemote` object

Ensure that you have the Union SDK installed, import the `UnionRemote` class and create the object like this:

```python
import union

remote = union.UnionRemote()
```

By default, when created with a no-argument constructor, `UnionRemote` will use the prevailing configuration in the local environment to connect to Union.ai, that is, the same configuration as would be used by the Union CLI in that environment (see **Development cycle > UnionRemote > Union CLI configuration search path**). In the default case, as with the Union CLI, all operations will be applied to the default project, `flytesnacks`, and the default domain, `development`.

Alternatively, you can initialize `UnionRemote` by explicitly specifying a `flytekit.configuration.Config` object with connection information to a Union.ai instance, a project, and a domain. Additionally, the constructor supports specifying a file upload location (equivalent to a default raw data prefix):

```python
import union
from flytekit.configuration import Config

remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="union.example.com"),
    default_project="my-project",
    default_domain="my-domain",
    data_upload_location="://my-bucket/my-prefix",
)
```

Here we use the `Config.for_endpoint` method to specify the URL to connect to. There are other ways to configure the `Config` object. In general, you have all the same options as you would when specifying a connection for the Union CLI using a `config.yaml` file.

### Authenticating using a client secret

In some cases, you may be running a script with `UnionRemote` in a CI/CD pipeline or via SSH, where you don't have access to a browser for the default authentication flow. In such scenarios, you can use the **Development cycle > Authentication > 3. ClientSecret (Best for CI/CD and Automation)** authentication method to establish a connection to Union.ai.
After creating an API key (see **Development cycle > Managing API keys**), you can initialize `UnionRemote` as follows:

```python
import union
from flytekit.configuration import Config, PlatformConfig

remote = union.UnionRemote(
    config=Config(
        platform=PlatformConfig(
            endpoint="union.example.com",
            insecure=False,
            client_id="",  # this is the api-key name
            client_credentials_secret="",  # this is the api-key
            auth_mode="client_credentials",
        )
    ),
)
```

For details see **Development cycle > UnionRemote > the API docs for `flytekit.configuration.Config`**.

## Subpages

- **Development cycle > UnionRemote > UnionRemote examples**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/development-cycle/remote-management/remote-examples ===

# UnionRemote examples

## Registering and running a workflow

In the following example we register and run a workflow and retrieve its output:

```shell
├── remote.py
└── workflow
    ├── __init__.py
    └── example.py
```

The workflow code that will be registered and run on Union.ai resides in the `workflow` directory and consists of an empty `__init__.py` file and the workflow and task code in `example.py`:

```python
import os

import union


@union.task()
def create_file(message: str) -> union.FlyteFile:
    with open("data.txt", "w") as f:
        f.write(message)
    return union.FlyteFile(path="data.txt")


@union.workflow
def my_workflow(message: str) -> union.FlyteFile:
    f = create_file(message=message)
    return f
```

The file `remote.py` contains the `UnionRemote` logic. It is not part of the workflow code, and is meant to be run on your local machine.

```python
import union

from workflow.example import my_workflow


def run_workflow():
    remote = union.UnionRemote()
    remote.fast_register_workflow(entity=my_workflow)
    execution = remote.execute(entity=my_workflow, inputs={"message": "Hello, world!"}, wait=True)
    output = execution.outputs["o0"]
    print(output)
    with open(output, "r") as f:
        read_lines = f.readlines()
    print(read_lines)
```

The `my_workflow` workflow and the `create_file` task are registered and run. Once the workflow completes, the output is passed back to the `run_workflow` function and printed out.

The output is also available via the UI, in the **Outputs** tab of the `create_file` task details view:

![Outputs](../../../_static/images/user-guide/development-cycle/union-remote/outputs.png)

The steps above demonstrate the simplest way of registering and running a workflow with `UnionRemote`. For more options and details see **Union SDK > Packages > union**.

## Fetching outputs

By default, `UnionRemote.execute` is non-blocking, but you can also pass in `wait=True` to make it synchronously wait for the task or workflow to complete, as we did above.

You can print out the Union.ai console URL corresponding to your execution with:

```python
print(f"Execution url: {remote.generate_console_url(execution)}")
```

And you can synchronize the state of the execution object with the remote state with the `sync()` method:

```python
synced_execution = remote.sync(execution)
print(synced_execution.inputs)  # print out the inputs
```

You can also wait for the execution after you've launched it and access the outputs:

```python
completed_execution = remote.wait(execution)
print(completed_execution.outputs)  # print out the outputs
```

## Terminating all running executions for a workflow

This example shows how to terminate all running executions for a given workflow name.
```python import union from dataclasses import dataclass import json from flytekit.configuration import Config from flytekit.models.core.execution import NodeExecutionPhase @dataclass class Execution: name: str link: str SOME_LARGE_LIMIT = 5000 PHASE = NodeExecutionPhase.RUNNING WF_NAME = "your_workflow_name" EXECUTIONS_TO_IGNORE = ["some_execution_name_to_ignore"] PROJECT = "your_project" DOMAIN = "production" ENDPOINT = "union.example.com" remote = union.UnionRemote( config=Config.for_endpoint(endpoint=ENDPOINT), default_project=PROJECT, default_domain=DOMAIN, ) executions_of_interest = [] executions = remote.recent_executions(limit=SOME_LARGE_LIMIT) for e in executions: if e.closure.phase == PHASE: if e.spec.launch_plan.name == WF_NAME: if e.id.name not in EXECUTIONS_TO_IGNORE: execution_on_interest = Execution(name=e.id.name, link=f"https://{ENDPOINT}/console/projects/{PROJECT}/domains/{DOMAIN}/executions/{e.id.name}") executions_of_interest.append(execution_on_interest) remote.terminate(e, cause="Terminated manually via script.") with open('terminated_executions.json', 'w') as f: json.dump([{'name': e.name, 'link': e.link} for e in executions_of_interest], f, indent=2) print(f"Terminated {len(executions_of_interest)} executions.") ``` ## Rerunning all failed executions of a workflow This example shows how to identify all failed executions from a given workflow since a certain time, and re-run them with the same inputs and a pinned workflow version. ```python import datetime import pytz import union from flytekit.models.core.execution import NodeExecutionPhase SOME_LARGE_LIMIT = 5000 WF_NAME = "your_workflow_name" PROJECT = "your_project" DOMAIN = "production" ENDPOINT = "union.example.com" VERSION = "your_target_workflow_version" remote = union.UnionRemote( config=Config.for_endpoint(endpoint=ENDPOINT), default_project=PROJECT, default_domain=DOMAIN, ) executions = remote.recent_executions(limit=SOME_LARGE_LIMIT) failures = [ NodeExecutionPhase.FAILED, NodeExecutionPhase.ABORTED, NodeExecutionPhase.FAILING, ] # time of the last successful execution date = datetime.datetime(2024, 10, 30, tzinfo=pytz.UTC) # filter executions by name filtered = [execution for execution in executions if execution.spec.launch_plan.name == WF_NAME] # filter executions by phase failed = [execution for execution in filtered if execution.closure.phase in failures] # filter executions by time windowed = [execution for execution in failed if execution.closure.started_at > date] # get inputs for each execution inputs = [remote.sync(execution).inputs for execution in windowed] # get new workflow version entity workflow = remote.fetch_workflow(name=WF_NAME, version=VERSION) # execute new workflow for each failed previous execution [remote.execute(workflow, inputs=X) for X in inputs] ``` ## Filtering for executions using a `Filter` This example shows how to use a `Filter` to only query for the executions you want. 
```python
from flytekit.models import filters
import union

WF_NAME = "your_workflow_name"
LP_NAME = "your_launchplan_name"
PROJECT = "your_project"
DOMAIN = "production"
ENDPOINT = "union.example.com"

remote = union.UnionRemote.for_endpoint(ENDPOINT)

# Only query executions of the workflow you are interested in
project_filter = filters.Filter.from_python_std(f"eq(workflow.name,{WF_NAME})")
project_executions = remote.recent_executions(project=PROJECT, domain=DOMAIN, filters=[project_filter])

# Query for the latest execution that succeeded and was between 8 and 16 minutes
latest_success = remote.recent_executions(
    limit=1,
    filters=[
        filters.Equal("launch_plan.name", LP_NAME),
        filters.Equal("phase", "SUCCEEDED"),
        filters.GreaterThan("duration", 8 * 60),
        filters.LessThan("duration", 16 * 60),
    ],
)
```

## Launch task via UnionRemote with a new version

```python
import union
from flytekit.configuration import Config, SerializationSettings

# UnionRemote object is the main entrypoint to the API
remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="flyte.example.net"),
    default_project="flytesnacks",
    default_domain="development",
)

# Get the task
task = remote.fetch_task(name="workflows.example.generate_normal_df", version="v1")

# Register the fetched task under a new version
task = remote.register_task(
    entity=task,
    serialization_settings=SerializationSettings(image_config=None),
    version="v2",
)

# Run the task
execution = remote.execute(
    task, inputs={"n": 200, "mean": 0.0, "sigma": 1.0}, execution_name="task-execution", wait=True
)

# Or use execution_name_prefix to avoid repeated execution names
execution = remote.execute(
    task, inputs={"n": 200, "mean": 0.0, "sigma": 1.0}, execution_name_prefix="flyte", wait=True
)

# Inspecting the execution
# The 'inputs' and 'outputs' correspond to the task execution.
input_keys = execution.inputs.keys()
output_keys = execution.outputs.keys()
```

## Launch workflow via UnionRemote

Workflows can be executed with `UnionRemote` because under the hood it fetches and triggers a default launch plan.

```python
import union
from flytekit.configuration import Config

# UnionRemote object is the main entrypoint to the API
remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="flyte.example.net"),
    default_project="flytesnacks",
    default_domain="development",
)

# Fetch the workflow
workflow = remote.fetch_workflow(name="workflows.example.wf", version="v1")

# Execute
execution = remote.execute(
    workflow, inputs={"mean": 1}, execution_name="workflow-execution", wait=True
)

# Or use execution_name_prefix to avoid repeated execution names
execution = remote.execute(
    workflow, inputs={"mean": 1}, execution_name_prefix="flyte", wait=True
)
```

## Launch launchplan via UnionRemote

A launch plan can be launched via UnionRemote programmatically.
```python
import union
from flytekit.configuration import Config

# The UnionRemote object is the main entrypoint to the API
remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="flyte.example.net"),
    default_project="flytesnacks",
    default_domain="development",
)

# Fetch launch plan
lp = remote.fetch_launch_plan(
    name="workflows.example.wf", version="v1", project="flytesnacks", domain="development"
)

# Execute
execution = remote.execute(
    lp, inputs={"mean": 1}, execution_name="lp-execution", wait=True
)

# Or use execution_name_prefix to avoid repeated execution names
execution = remote.execute(
    lp, inputs={"mean": 1}, execution_name_prefix="flyte", wait=True
)
```

## Inspecting executions

With `UnionRemote`, you can fetch the inputs and outputs of executions and inspect them.

```python
import union
from flytekit.configuration import Config

# The UnionRemote object is the main entrypoint to the API
remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="flyte.example.net"),
    default_project="flytesnacks",
    default_domain="development",
)

execution = remote.fetch_execution(
    name="fb22e306a0d91e1c6000", project="flytesnacks", domain="development"
)

input_keys = execution.inputs.keys()
output_keys = execution.outputs.keys()

# The inputs and outputs correspond to the top-level execution or the workflow itself.
# To fetch a specific output, say, a model file:
model_file = execution.outputs["model_file"]
with open(model_file) as f:
    ...

# You can use UnionRemote.sync() to sync the entity object's state with the remote state during the execution run.
synced_execution = remote.sync(execution, sync_nodes=True)
node_keys = synced_execution.node_executions.keys()

# node_executions will fetch all the underlying node executions recursively.
# To fetch the output of a specific node execution:
node_execution_output = synced_execution.node_executions["n1"].outputs["model_file"]
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output ===

# Data input/output

This section covers how to manage data input and output in Union.ai.

Union.ai also supports all the [Data input/output features of Flyte](https://docs-builder.pages.dev/docs/flyte/user-guide/data-input-output/).

| Section | Description |
|---------|-------------|
| **Data input/output > FlyteFile and FlyteDirectory** | Use `FlyteFile` to easily pass files across tasks. |
| **Data input/output > FlyteFile and FlyteDirectory** | Use `FlyteDirectory` to easily pass directories across tasks. |
| **Data input/output > Downloading with FlyteFile and FlyteDirectory** | Details on how files and directories are downloaded with `FlyteFile` and `FlyteDirectory`. |
| **Data input/output > StructuredDataset** | Details on how `StructuredDataset` is used as a general dataframe type. |
| **Data input/output > Dataclass** | Details on how to use dataclasses across tasks. |
| **Data input/output > Pydantic BaseModel** | Details on how to use Pydantic models across tasks. |
| **Data input/output > Accessing attributes** | Details on how to directly access attributes on output promises for lists, dictionaries, and dataclasses. |
| **Data input/output > Enum type** | Details on how to use Enums across tasks. |
| **Data input/output > Pickle type** | Details on how to use pickled objects across tasks for generalized types. |
| **Data input/output > PyTorch type** | Details on how to use torch tensors and models across tasks. |
| **Data input/output > TensorFlow types** | Details on how to use TensorFlow tensors and models across tasks. |
| **Data input/output > Accelerated datasets** | Upload your data once and access it from any task. |

## Subpages

- **Data input/output > FlyteFile and FlyteDirectory**
- **Data input/output > Downloading with FlyteFile and FlyteDirectory**
- **Data input/output > Task input and output**
- **Data input/output > Accelerated datasets**
- **Data input/output > Accessing attributes**
- **Data input/output > Dataclass**
- **Data input/output > Enum type**
- **Data input/output > Pickle type**
- **Data input/output > Pydantic BaseModel**
- **Data input/output > PyTorch type**
- **Data input/output > StructuredDataset**
- **Data input/output > TensorFlow types**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/flyte-file-and-flyte-directory ===

# FlyteFile and FlyteDirectory

In Union.ai, each task runs in its own container. This means that a file or directory created locally in one task will not automatically be available in other tasks.

The natural way to solve this problem is for the source task to upload the file or directory to a common location (like the Union.ai object store) and then pass a reference to that location to the destination task, which then downloads or streams the data.

Since this is such a common use case, the Union SDK provides the **Flytekit SDK > Packages > flytekit.types.file** and **Data input/output > FlyteFile and FlyteDirectory > `FlyteDirectory`** classes, which automate this process.

## How the classes work

The classes work by wrapping a file or directory location path and, if necessary, maintaining the persistence of the referenced file or directory across task containers.

When you return a `FlyteFile` (or `FlyteDirectory`) object from a task, Union.ai checks to see if the underlying file or directory is local to the task container or if it already exists in a remote location. If it is local to the source container, then Union.ai automatically uploads it to an object store so that it is not lost when the task container is discarded on task completion. If the file or directory is already remote, then no upload is performed.

When the `FlyteFile` (or `FlyteDirectory`) is passed into the next task, the location of the source file (or directory) is available within the object, and it can be downloaded or streamed.

## Local examples

> [!NOTE] Local means local to the container
> The terms _local file_ and _local directory_ in this section refer to a file or directory local to the container running a task in Union.ai.
> They do not refer to a file or directory on your local machine.

### Local file example

Let's say you have a local file in the container running `task_1` that you want to make accessible in the next task, `task_2`. To do this, you create a `FlyteFile` object using the local path of the file, and then pass the `FlyteFile` object as part of your workflow, like this:

```python
@union.task
def task_1() -> union.FlyteFile:
    local_path = os.path.join(current_context().working_directory, "data.txt")
    with open(local_path, mode="w") as f:
        f.write("Here is some sample data.")
    return union.FlyteFile(path=local_path)


@union.task
def task_2(ff: union.FlyteFile):
    with ff.open(mode="r") as f:
        file_contents = f.read()


@union.workflow
def wf():
    ff = task_1()
    task_2(ff=ff)
```

Union.ai handles the passing of the `FlyteFile` `ff` in the workflow `wf` from `task_1` to `task_2`:

* The `FlyteFile` object is initialized with the path (local to the `task_1` container) of the file you wish to share.
* When the `FlyteFile` is passed out of `task_1`, Union.ai uploads the local file to a unique location in the Union.ai object store. A randomly generated, universally unique location is used to ensure that subsequent uploads of other files never overwrite each other.
* The object store location is used to initialize the URI attribute of a Flyte `Blob` object. Note that Flyte objects are not Python objects. They exist at the workflow level and are used to pass data between task containers. For more details, see **Flyteidl > flyteidl/core/types.proto**.
* The `Blob` object is passed to `task_2`.
* Because the type of the input parameter of `task_2` is `FlyteFile`, Union.ai converts the `Blob` back into a `FlyteFile` and sets the `remote_source` attribute of that `FlyteFile` to the URI of the `Blob` object.
* Inside `task_2` you can now perform a **Data input/output > FlyteFile and FlyteDirectory > `FlyteFile.open()`** and read the file contents.

### Local directory example

Below is an equivalent local example for `FlyteDirectory`. The process of passing the `FlyteDirectory` between tasks is essentially identical to the `FlyteFile` example above.

```python
@union.task
def task1() -> union.FlyteDirectory:
    # Create new local directory
    p = os.path.join(current_context().working_directory, "my_new_directory")
    os.makedirs(p)

    # Create and write to two files
    with open(os.path.join(p, "file_1.txt"), 'w') as file1:
        file1.write("This is file 1.")
    with open(os.path.join(p, "file_2.txt"), 'w') as file2:
        file2.write("This is file 2.")

    return union.FlyteDirectory(p)


@union.task
def task2(fd: union.FlyteDirectory):
    # Get a list of the directory contents using os to return strings
    items = os.listdir(fd)
    print(type(items[0]))

    # Get a list of the directory contents using FlyteDirectory to return FlyteFiles
    files = union.FlyteDirectory.listdir(fd)
    print(type(files[0]))
    with open(files[0], mode="r") as f:
        d = f.read()
    print(f"The first line in the first file is: {d}")


@union.workflow
def workflow():
    fd = task1()
    task2(fd=fd)
```

## Changing the data upload location

> [!NOTE] Upload location
> With Union.ai Serverless, the remote location to which `FlyteFile` and `FlyteDirectory` upload container-local
> files is always a randomly generated (universally unique) location in Union.ai's internal object store. It cannot be changed.
>
> With Union.ai BYOC, the upload location is configurable.

By default, Union.ai uploads local files or directories to the default **raw data store** (Union.ai's dedicated internal object store). However, you can change the upload location by setting the raw data prefix to your own bucket or by specifying the `remote_path` for a `FlyteFile` or `FlyteDirectory`.

> [!NOTE] Setting up your own object store bucket
> For details on how to set up your own object store bucket, consult the directions for your cloud provider:
>
> * **Enabling AWS resources > Enabling AWS S3**
> * **Enabling GCP resources > Enabling Google Cloud Storage**
> * **Enabling Azure resources > Enabling Azure Blob Storage**

### Changing the raw data prefix

If you would like files or directories to be uploaded to your own bucket, you can specify the AWS, GCS, or Azure bucket in the **raw data prefix** parameter. This setting can be applied at the workflow level on registration, or per execution on the command line or in the UI.
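If you launch executions programmatically, the same setting can likely be applied through `UnionRemote`. The sketch below is an illustration only, assuming your installed flytekit version provides the `Options` dataclass (its import path has varied between releases) and the `RawOutputDataConfig` model; the workflow name and bucket URI are placeholders.

```python
import union
from flytekit.configuration import Config
from flytekit.models.common import RawOutputDataConfig
from flytekit.tools.translator import Options  # assumption: import path may differ in your flytekit version

remote = union.UnionRemote(
    config=Config.for_endpoint(endpoint="union.example.com"),
    default_project="flytesnacks",
    default_domain="development",
)

workflow = remote.fetch_workflow(name="workflows.example.wf", version="v1")

# Direct raw data (FlyteFile/FlyteDirectory uploads, DataFrames, etc.) for this
# execution to your own bucket instead of the default raw data store.
execution = remote.execute(
    workflow,
    inputs={"mean": 1},
    options=Options(
        raw_output_data_config=RawOutputDataConfig(output_location_prefix="s3://my-own-bucket/raw-data"),
    ),
)
```

As with the command-line and UI settings, the bucket must be one that you have set up yourself and to which the platform has been granted access (see the note above).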
Union.ai will create a directory with a unique, random name in your bucket for each `FlyteFile` or `FlyteDirectory` write, to guarantee that you never overwrite your data.

### Specifying `remote_path` for a `FlyteFile` or `FlyteDirectory`

If you specify the `remote_path` when initializing your `FlyteFile` (or `FlyteDirectory`), the underlying data is written to that precise location with no randomization.

> [!NOTE] Using remote_path will overwrite data
> If you set `remote_path` to a static string, subsequent runs of the same task will overwrite the file.
> If you want to use a dynamically generated path, you will have to generate it yourself.

## Remote examples

### Remote file example

In the example above, we started with a local file. To preserve that file across the task boundary, Union.ai uploaded it to the Union.ai object store before passing it to the next task.

You can also _start with a remote file_, simply by initializing the `FlyteFile` object with a URI pointing to a remote source. For example:

```python
@union.task
def task_1() -> union.FlyteFile:
    remote_path = "https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv"
    return union.FlyteFile(path=remote_path)
```

In this case, no uploading is needed because the source file is already in a remote location. When the object is passed out of the task, it is converted into a `Blob` with the remote path as the URI.

After the `FlyteFile` is passed to the next task, you can call `FlyteFile.open()` on it, just as before.

If you don't intend to pass the `FlyteFile` to the next task, but rather want to open the contents of the remote file within the same task, you can use `from_source`:

```python
@union.task
def load_json():
    uri = "gs://my-bucket/my-directory/example.json"
    my_json = FlyteFile.from_source(uri)

    # Load the JSON file into a dictionary and print it
    with open(my_json, "r") as json_file:
        data = json.load(json_file)
    print(data)
```

When initializing a `FlyteFile` with a remote file location, all URI schemes supported by `fsspec` are supported, including `http` and `https` (Web), `gs` (Google Cloud Storage), `s3` (AWS S3), and `abfs` and `abfss` (Azure Blob Filesystem).

### Remote directory example

Below is an equivalent remote example for `FlyteDirectory`. The process of passing the `FlyteDirectory` between tasks is essentially identical to the `FlyteFile` example above.

```python
@union.task
def task1() -> union.FlyteDirectory:
    p = "https://people.sc.fsu.edu/~jburkardt/data/csv/"
    return union.FlyteDirectory(p)


@union.task
def task2(fd: union.FlyteDirectory):
    # Get a list of the directory contents and display the first csv
    files = union.FlyteDirectory.listdir(fd)
    with open(files[0], mode="r") as f:
        d = f.read()
    print(f"The first csv is: \n{d}")


@union.workflow
def workflow():
    fd = task1()
    task2(fd=fd)
```

## Streaming

In the above examples, we showed how to access the contents of a `FlyteFile` by calling `FlyteFile.open()`. The object returned by `FlyteFile.open()` is a stream. In the above examples, the files were small, so a simple `read()` was used. But for large files, you can iterate through the contents of the stream:

```python
@union.task
def task_1() -> union.FlyteFile:
    remote_path = "https://sample-videos.com/csv/Sample-Spreadsheet-100000-rows.csv"
    return union.FlyteFile(path=remote_path)


@union.task
def task_2(ff: union.FlyteFile):
    with ff.open(mode="r") as f:
        for row in f:
            do_something(row)
```

## Downloading

Alternatively, you can download the contents of a `FlyteFile` object to a local file in the task container.
There are two ways to do this: **implicitly** and **explicitly**.

### Implicit downloading

The source file of a `FlyteFile` object is downloaded to the local container file system automatically whenever a function is called that takes the `FlyteFile` object and then calls `FlyteFile`'s `__fspath__()` method.

`FlyteFile` implements the `os.PathLike` interface and therefore the `__fspath__()` method. `FlyteFile`'s implementation of `__fspath__()` performs a download of the source file to the local container storage and returns the path to that local file. This enables many common file-related operations in Python to be performed on the `FlyteFile` object.

The most prominent example of such an operation is calling Python's built-in `open()` function with a `FlyteFile`:

```python
@union.task
def task_2(ff: union.FlyteFile):
    with open(ff, mode="r") as f:
        file_contents = f.read()
```

> [!NOTE] open() vs ff.open()
> Note the difference between
>
> `ff.open(mode="r")`
>
> and
>
> `open(ff, mode="r")`
>
> The former calls the `FlyteFile.open()` method and returns an iterator without downloading the file.
> The latter calls the built-in Python function `open()`, downloads the specified `FlyteFile` to the local container file system,
> and returns a handle to that file.
>
> Many other Python file operations (essentially, any that accept an `os.PathLike` object) can also be performed on a `FlyteFile`
> object and result in an automatic download.
>
> See **Data input/output > Downloading with FlyteFile and FlyteDirectory** for more information.

### Explicit downloading

You can also explicitly download a `FlyteFile` to the local container file system by calling `FlyteFile.download()`:

```python
@union.task
def task_2(ff: union.FlyteFile):
    local_path = ff.download()
```

This method is typically used when you want to download the file without immediately reading it.

## Typed aliases

The **Union SDK** defines some aliases of `FlyteFile` with specific type annotations. Specifically, `FlyteFile` has the following aliases, defined in **Flytekit SDK > Packages > flytekit.types.file**:

* `HDF5EncodedFile`
* `HTMLPage`
* `JoblibSerializedFile`
* `JPEGImageFile`
* `PDFFile`
* `PNGImageFile`
* `PythonPickledFile`
* `PythonNotebook`
* `SVGImageFile`

Similarly, `FlyteDirectory` has the following aliases, defined in **Flytekit SDK > Packages > flytekit.types.directory**:

* `TensorboardLogs`
* `TFRecordsDirectory`

These aliases can optionally be used when handling a file or directory of the specified type, although the object itself will still be a `FlyteFile` or `FlyteDirectory`. The aliased versions of the classes are syntactic markers that enforce agreement between type annotations in the signatures of task functions, but they do not perform any checks on the actual contents of the file. An example is sketched below.
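For instance, a task that produces a PNG image could annotate its output with the `PNGImageFile` alias to document the expected content, while still handling it exactly like a `FlyteFile`. This is a minimal sketch; the task names and file contents are illustrative and not taken from the documentation above.

```python
import os

import union
from flytekit import current_context
from flytekit.types.file import PNGImageFile  # FlyteFile alias for PNG files


@union.task
def render_chart() -> PNGImageFile:
    # Write a placeholder PNG to the container-local working directory.
    local_path = os.path.join(current_context().working_directory, "chart.png")
    with open(local_path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")  # PNG signature only; stands in for real image bytes
    return PNGImageFile(local_path)


@union.task
def publish_chart(image: PNGImageFile):
    # The alias is still a FlyteFile, so the same open/download APIs apply.
    with image.open(mode="rb") as f:
        print(f"Chart is {len(f.read())} bytes")


@union.workflow
def chart_wf():
    publish_chart(image=render_chart())
```

Whether you use the alias or plain `FlyteFile` is purely a signature-level choice; no validation of the file's bytes is performed either way.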
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/downloading-with-ff-and-fd ===

# Downloading with FlyteFile and FlyteDirectory

The basic idea behind `FlyteFile` and `FlyteDirectory` is that they represent files and directories in remote storage. When you work with these objects in your tasks, you are working with references to the remote files and directories. Of course, at some point you will need to access the actual contents of these files and directories, which means that they have to be downloaded to the local file system of the task container.

The actual files and directories of a `FlyteFile` or `FlyteDirectory` are downloaded to the local file system of the task container in two ways:

* Explicitly, through a call to the `download` method.
* Implicitly, through automatic downloading. This occurs when an external function is called on the `FlyteFile` or `FlyteDirectory` that itself calls the `__fspath__` method.

To write efficient and performant task and workflow code, it is particularly important to have a solid understanding of when exactly downloading occurs. Let's look at some examples showing when the contents of `FlyteFile` and `FlyteDirectory` objects are downloaded to the local task container file system.

## FlyteFile

**Calling `download` on a FlyteFile**

```python
@union.task
def my_task(ff: FlyteFile):
    print(os.path.isfile(ff.path))  # This will print False as nothing has been downloaded
    ff.download()
    print(os.path.isfile(ff.path))  # This will print True as the FlyteFile was downloaded
```

Note that we use `ff.path`, which is of type `typing.Union[str, os.PathLike]`, rather than using `ff` in `os.path.isfile` directly. In the next example, we will see that using `os.path.isfile(ff)` invokes `__fspath__`, which downloads the file.

**Implicit downloading by `__fspath__`**

In order to make use of functions like `os.path.isfile` that you may be accustomed to using with regular file paths, `FlyteFile` implements a `__fspath__` method that downloads the remote contents to the `path` of the `FlyteFile` local to the container.

```python
@union.task
def my_task(ff: FlyteFile):
    print(os.path.isfile(ff.path))  # This will print False as nothing has been downloaded
    print(os.path.isfile(ff))       # This will print True as os.path.isfile(ff) downloads via __fspath__
    print(os.path.isfile(ff.path))  # This will again print True as the file was downloaded
```

It is important to be aware of any operations on your `FlyteFile` that might call `__fspath__` and result in downloading. Some examples include calling `open(ff, mode="r")` directly on a `FlyteFile` (rather than on the `path` attribute) to get the contents of the path, or similarly calling `shutil.copy` or `pathlib.Path` directly on a `FlyteFile`.

## FlyteDirectory

**Calling `download` on a FlyteDirectory**

```python
@union.task
def my_task(fd: FlyteDirectory):
    print(os.listdir(fd.path))  # This will print nothing as the directory has not been downloaded
    fd.download()
    print(os.listdir(fd.path))  # This will print the files present in the directory as it has been downloaded
```

Similar to how the `path` attribute was used above for the `FlyteFile`, note that we use `fd.path`, which is of type `typing.Union[str, os.PathLike]`, rather than using `fd` in `os.listdir` directly. Again, we will see that this is because of the invocation of `__fspath__` when `os.listdir(fd)` is called.

**Implicit downloading by `__fspath__`**

In order to make use of functions like `os.listdir` that you may be accustomed to using with directories, `FlyteDirectory` implements a `__fspath__` method that downloads the remote contents to the `path` of the `FlyteDirectory` local to the container.
```python @union.task def my_task(fd: FlyteDirectory): print(os.listdir(fd.path)) # This will print nothing as the directory has not been downloaded print(os.listdir(fd)) # This will print the files present in the directory as os.listdir(fd) downloads via __fspath__ print(os.listdir(fd.path)) # This will again print the files present in the directory as it has been downloaded ``` It is important to be aware of any operations on your `FlyteDirectory` that might call `__fspath__` and result in downloading. Some other examples include, calling `os.stat` directly on a `FlyteDirectory` (rather than on the `path` attribute) to get the status of the path, or similarly calling `os.path.isdir` on a `FlyteDirectory` to check if a directory exists. **Inspecting the contents of a directory without downloading using `crawl`** As we saw above, using `os.listdir` on a `FlyteDirectory` to view the contents in remote blob storage results in the contents being downloaded to the task container. If this should be avoided, the `crawl` method offers a means of inspecting the contents of the directory without calling `__fspath__` and therefore downloading the directory contents. ```python @union.task def task1() -> FlyteDirectory: p = os.path.join(current_context().working_directory, "my_new_directory") os.makedirs(p) # Create and write to two files with open(os.path.join(p, "file_1.txt"), 'w') as file1: file1.write("This is file 1.") with open(os.path.join(p, "file_2.txt"), 'w') as file2: file2.write("This is file 2.") return FlyteDirectory(p) @union.task def task2(fd: FlyteDirectory): print(os.listdir(fd.path)) # This will print nothing as the directory has not been downloaded print(list(fd.crawl())) # This will print the files present in the remote blob storage # e.g. [('s3://union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80', 'file_1.txt'), ('s3://union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80', 'file_2.txt')] print(list(fd.crawl(detail=True))) # This will print the files present in the remote blob storage with details including type, the time it was created, and more # e.g. [('s3://union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80', {'file_1.txt': {'Key': 'union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80/file_1.txt', 'LastModified': datetime.datetime(2024, 7, 9, 16, 16, 21, tzinfo=tzlocal()), 'ETag': '"cfb2a3740155c041d2c3e13ad1d66644"', 'Size': 15, 'StorageClass': 'STANDARD', 'type': 'file', 'size': 15, 'name': 'union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80/file_1.txt'}}), ('s3://union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80', {'file_2.txt': {'Key': 'union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80/file_2.txt', 'LastModified': datetime.datetime(2024, 7, 9, 16, 16, 21, tzinfo=tzlocal()), 'ETag': '"500d703f270d4bc034e159480c83d329"', 'Size': 15, 'StorageClass': 'STANDARD', 'type': 'file', 'size': 15, 'name': 'union-contoso/ke/fe503def6ebe04fa7bba-n0-0/160e7266dcaffe79df85489771458d80/file_2.txt'}})] print(os.listdir(fd.path)) # This will again print nothing as the directory has not been downloaded ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/task-input-and-output === # Task input and output The Union.ai workflow engine automatically manages the passing of data from task to task, and to the workflow output. 
This mechanism relies on enforcing strong typing of task function parameters and return values. This enables the workflow engine to efficiently marshal and unmarshal values from one task container to the next. The actual data is temporarily stored in Union.ai's internal object store within your data plane (AWS S3, Google Cloud Storage, or Azure Blob Storage, depending on your cloud provider).

## Metadata and raw data

Union.ai distinguishes between metadata and raw data. Primitive values (`int`, `str`, etc.) are stored directly in the metadata store, while complex data objects (`pandas.DataFrame`, `FlyteFile`, etc.) are stored by reference, with the reference pointer in the metadata store and the actual data in the raw data store.

## Metadata store

The metadata store is located in the dedicated Union.ai object store in your data plane. Depending on your cloud provider, this may be an AWS S3, Google Cloud Storage, or Azure Blob Storage bucket.

This data is accessible to the control plane. It is used to run and manage workflows and is surfaced in the UI.

## Raw data store

The raw data store is, by default, also located in the dedicated Union.ai object store in your data plane. However, this location can be overridden per workflow or per execution using the **raw data prefix** parameter.

The data in the raw data store is not accessible to the control plane and will only be surfaced in the UI if your code explicitly does so (for example, in a Deck).

For more details, see **Data handling**.

## Changing the raw data storage location

There are a number of ways to change the raw data location:

* When registering your workflow:
  * With `uctl register`, use the flag `--files.outputLocationPrefix`.
  * With `union register`, use the flag `--raw-data-prefix`.
* At the execution level:
  * In the UI, set the **Raw output data config** parameter in the execution dialog.

These options change the raw data location for **all large types** (`FlyteFile`, `FlyteDirectory`, `DataFrame`, and any other large data object).

If you are only concerned with controlling where raw data used by `FlyteFile` or `FlyteDirectory` is stored, you can **Data input/output > Task input and output > set the `remote_path` parameter** in your task code when initializing objects of those types.

### Setting up your own object store

By default, when Union.ai marshals values across tasks, it stores both metadata and raw data in its own dedicated object store bucket. While this bucket is located in your Union.ai BYOC data plane and is therefore under your control, it is part of the Union.ai implementation and should not be accessed or modified directly by your task code.

When changing the default raw data location, the target should therefore be a bucket that you set up, separate from the Union.ai-implemented bucket. For information on setting up your own bucket and enabling access to it, see [Enabling AWS S3](../integrations/enabling-aws-resources/enabling-aws-s3), [Enabling Google Cloud Storage](../integrations/enabling-gcp-resources/enabling-google-cloud-storage), or [Enabling Azure Blob Storage](../integrations/enabling-azure-resources/enabling-azure-blob-storage), depending on your cloud provider.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/accelerated-datasets ===

# Accelerated datasets

> [!NOTE] *Accelerated datasets* and *Accelerators* are entirely different things
> Accelerated datasets is a Union.ai feature that enables quick access to large datasets from within a task.
> An **Core concepts > Tasks > Task hardware environment > Accelerators**, on the other hand, is a specialized hardware device that is used to accelerate the execution of a task.
> These concepts are entirely different and should not be confused.

Many of the workflows that you may want to run in Union.ai will involve tasks that use large static assets such as reference genomes, training datasets, or pre-trained models. These assets are often stored in an object store and need to be downloaded to the task pod each time before the task can run. This can be a significant bottleneck, especially if the data must be loaded into memory to be randomly accessed and therefore cannot be streamed.

To remedy this, Union.ai provides a way to preload large static assets into a shared object store that is mounted to all machine nodes in your cluster by default. This allows you to upload your data once and then access it from any task without needing to download it each time. Data items stored in this way are called *accelerated datasets*.

> [!NOTE] Only on S3
> Currently, this feature is only available for AWS S3.

## How it works

* Each customer has a dedicated S3 bucket where they can store their accelerated datasets.
* The naming and setup of this bucket must be coordinated with the Union.ai team, so that a suitable name is chosen. In general it will usually be something like `s3://union-<your-org>-persistent`.
* You can upload any data you wish to this bucket.
* The bucket will be automatically mounted into every node in your cluster.
* To your task logic, it will appear to be a local directory in the task container.
* To use it, initialize a `FlyteFile` object with the path to the data file and pass it into a task as an input.
* Note that in order for the system to recognize the file as an accelerated dataset, it must be created as a `FlyteFile` and that `FlyteFile` must be passed *into* a task. If you try to access the file directly from the object store, it will not be recognized as an accelerated dataset and the data will not be found.

## Example usage

Assuming that your organization is called `my-company` and the file you want to access is called `my_data.csv`, you would first need to upload the file to the persistent bucket. See [Upload a File to Your Amazon S3 Bucket](https://docs.aws.amazon.com/quickstarts/latest/s3backup/step-2-upload-file.html).

The code to access the data looks like this:

```python
import union


@union.task
def my_task(f: union.FlyteFile) -> int:
    with open(f, newline="\n") as input_file:
        data = input_file.read()
    # Do something with the data and return a result
    return len(data)


@union.workflow
def my_wf():
    my_task(f=union.FlyteFile("s3://union-my-company-persistent/my_data.csv"))
```

Note that you do not have to invoke `FlyteFile.download()` because the file will already have been made available locally within the container.

## Considerations

### Caching

While the persistent bucket appears to your task as a locally mounted volume, the data itself will not be resident in the local file system until after the first access. After the first access it will be cached locally. This fact should be taken into account when using this feature.

### Storage consumption

Data cached during the use of accelerated datasets will consume local storage on the nodes in your cluster. This should be taken into account when selecting and sizing your cluster nodes.
=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/accessing-attributes === # Accessing attributes You can directly access attributes on output promises for lists, dictionaries, dataclasses, and combinations of these types in Union.ai. Note that while this functionality may appear to be the normal behavior of Python, code in `@workflow` functions is not actually Python, but rather a Python-like DSL that is compiled by Union.ai. Consequently, accessing attributes in this manner is, in fact, a specially implemented feature. This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures. To begin, import the required dependencies and define a common task for subsequent use: ```python from dataclasses import dataclass import union @union.task def print_message(message: str): print(message) return ``` ## List You can access an output list using index notation. > [!NOTE] > Union.ai currently does not support output promise access through list slicing. ```python @union.task def list_task() -> list[str]: return ["apple", "banana"] @union.workflow def list_wf(): items = list_task() first_item = items[0] print_message(message=first_item) ``` ## Dictionary Access the output dictionary by specifying the key. ```python @union.task def dict_task() -> dict[str, str]: return {"fruit": "banana"} @union.workflow def dict_wf(): fruit_dict = dict_task() print_message(message=fruit_dict["fruit"]) ``` ## Data class Directly access an attribute of a dataclass. ```python @dataclass class Fruit: name: str @union.task def dataclass_task() -> Fruit: return Fruit(name="banana") @union.workflow def dataclass_wf(): fruit_instance = dataclass_task() print_message(message=fruit_instance.name) ``` ## Complex type Combinations of list, dict and dataclass also work effectively. 
```python @union.task def advance_task() -> (dict[str, list[str]], list[dict[str, str]], dict[str, Fruit]): return {"fruits": ["banana"]}, [{"fruit": "banana"}], {"fruit": Fruit(name="banana")} @union.task def print_list(fruits: list[str]): print(fruits) @union.task def print_dict(fruit_dict: dict[str, str]): print(fruit_dict) @union.workflow def advanced_workflow(): dictionary_list, list_dict, dict_dataclass = advance_task() print_message(message=dictionary_list["fruits"][0]) print_message(message=list_dict[0]["fruit"]) print_message(message=dict_dataclass["fruit"].name) print_list(fruits=dictionary_list["fruits"]) print_dict(fruit_dict=list_dict[0]) ``` You can run all the workflows locally as follows: ```python if __name__ == "__main__": list_wf() dict_wf() dataclass_wf() advanced_workflow() ``` ## Failure scenario The following workflow fails because it attempts to access indices and keys that are out of range: ```python from flytekit import WorkflowFailurePolicy @union.task def failed_task() -> (list[str], dict[str, str], Fruit): return ["apple", "banana"], {"fruit": "banana"}, Fruit(name="banana") @union.workflow( # The workflow remains unaffected if one of the nodes encounters an error, as long as other executable nodes are still available failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE ) def failed_workflow(): fruits_list, fruit_dict, fruit_instance = failed_task() print_message(message=fruits_list[100]) # Accessing an index that doesn't exist print_message(message=fruit_dict["fruits"]) # Accessing a non-existent key print_message(message=fruit_instance.fruit) # Accessing a non-existent param ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/dataclass === # Dataclass When you've multiple values that you want to send across Union.ai entities, you can use a `dataclass`. To begin, import the necessary dependencies: ```python import os import tempfile from dataclasses import dataclass import pandas as pd import union from flytekit.types.structured import StructuredDataset ``` Build your custom image with ImageSpec: ```python image_spec = union.ImageSpec( registry="ghcr.io/flyteorg", packages=["pandas", "pyarrow"], ) ``` ## Python types We define a `dataclass` with `int`, `str` and `dict` as the data types. ```python @dataclass class Datum: x: int y: str z: dict[int, str] ``` You can send a `dataclass` between different tasks written in various languages, and input it through the Union.ai UI as raw JSON. > [!NOTE] > All variables in a data class should be **annotated with their type**. Failure to do will result in an error. Once declared, a dataclass can be returned as an output or accepted as an input. ```python @union.task(container_image=image_spec) def stringify(s: int) -> Datum: """ A dataclass return will be treated as a single complex JSON return. """ return Datum(x=s, y=str(s), z={s: str(s)}) @union.task(container_image=image_spec) def add(x: Datum, y: Datum) -> Datum: x.z.update(y.z) return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z) ``` ## Union.ai types We also define a data class that accepts `StructuredDataset`, `FlyteFile` and `FlyteDirectory`. 
```python
@dataclass
class UnionTypes:
    dataframe: StructuredDataset
    file: union.FlyteFile
    directory: union.FlyteDirectory


@union.task(container_image=image_spec)
def upload_data() -> UnionTypes:
    df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]})
    temp_dir = tempfile.mkdtemp(prefix="union-")
    df.to_parquet(temp_dir + "/df.parquet")

    file_path = tempfile.NamedTemporaryFile(delete=False)
    file_path.write(b"Hello, World!")

    fs = UnionTypes(
        dataframe=StructuredDataset(dataframe=df),
        file=union.FlyteFile(file_path.name),
        directory=union.FlyteDirectory(temp_dir),
    )
    return fs


@union.task(container_image=image_spec)
def download_data(res: UnionTypes):
    assert pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}).equals(res.dataframe.open(pd.DataFrame).all())
    f = open(res.file, "r")
    assert f.read() == "Hello, World!"
    assert os.listdir(res.directory) == ["df.parquet"]
```

A data class supports the usage of data associated with Python types, data classes, FlyteFile, FlyteDirectory and StructuredDataset.

We define a workflow that calls the tasks created above.

```python
@union.workflow
def dataclass_wf(x: int, y: int) -> (Datum, UnionTypes):
    o1 = add(x=stringify(s=x), y=stringify(s=y))
    o2 = upload_data()
    download_data(res=o2)
    return o1, o2
```

To trigger the above task that accepts a dataclass as an input with `union run`, you can provide a JSON file as an input:

```shell
$ union run dataclass.py add --x dataclass_input.json --y dataclass_input.json
```

Here is another example of triggering a task that accepts a dataclass as an input with `union run`, this time running the example directly from a remote file while again providing a JSON file as the input:

```shell
$ union run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/dataclass.py \
add --x dataclass_input.json --y dataclass_input.json
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/enum ===

# Enum type

At times, you might need to limit the acceptable values for inputs or outputs to a predefined set. This common requirement is usually met by using `Enum` types in programming languages.

You can create a Python `Enum` type and utilize it as an input or output for a task. Union will automatically convert it and constrain the inputs and outputs to the predefined set of values.

> [!NOTE]
> Currently, only string values are supported as valid `Enum` values.
> Union.ai assumes the first value in the list as the default, and `Enum` types cannot be optional.
> Therefore, when defining `Enum`s, it's important to design them with the first value as a valid default.

We define an `Enum` and a simple coffee maker workflow that accepts an order and brews coffee ☕ accordingly. The assumption is that the coffee maker only understands `Enum` inputs:

```python
# coffee_maker.py
from enum import Enum

import union


class Coffee(Enum):
    ESPRESSO = "espresso"
    AMERICANO = "americano"
    LATTE = "latte"
    CAPPUCCINO = "cappuccino"


@union.task
def take_order(coffee: str) -> Coffee:
    return Coffee(coffee)


@union.task
def prep_order(coffee_enum: Coffee) -> str:
    return f"Preparing {coffee_enum.value} ..."

@union.workflow
def coffee_maker(coffee: str) -> str:
    coffee_enum = take_order(coffee=coffee)
    return prep_order(coffee_enum=coffee_enum)


# The workflow can also accept an enum value
@union.workflow
def coffee_maker_enum(coffee_enum: Coffee) -> str:
    return prep_order(coffee_enum=coffee_enum)
```

You can specify a value for the `coffee_enum` parameter when running the workflow:

```shell
$ union run coffee_maker.py coffee_maker_enum --coffee_enum="latte"
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/pickle ===

# Pickle type

Union.ai enforces type safety by utilizing type information for compiling tasks and workflows, enabling various features such as static analysis and conditional branching.

However, we also strive to offer flexibility to end-users, so they don't have to invest heavily in understanding their data structures upfront before experiencing the value Union.ai has to offer.

Union.ai supports the `FlytePickle` transformer, which converts any unrecognized type hint into `FlytePickle`, enabling the serialization/deserialization of Python values to/from a pickle file.

> [!NOTE]
> Pickle can only be used to send objects between the exact same Python version.
> For optimal performance, it's advisable to either employ Python types that are supported by Union.ai
> or register a custom transformer, as using pickle types can result in lower performance.

This example demonstrates how you can utilize custom objects without registering a transformer.

```python
import union
```

`Superhero` represents a user-defined complex type that can be serialized to a pickle file by Union and transferred between tasks as both input and output data.

> [!NOTE]
> Alternatively, you can use a **Data input/output > Dataclass** for improved performance.
> We have used a simple object here for demonstration purposes.

```python
class Superhero:
    def __init__(self, name, power):
        self.name = name
        self.power = power


@union.task
def welcome_superhero(name: str, power: str) -> Superhero:
    return Superhero(name, power)


@union.task
def greet_superhero(superhero: Superhero) -> str:
    return f"👋 Hello {superhero.name}! Your superpower is {superhero.power}."


@union.workflow
def superhero_wf(name: str = "Thor", power: str = "Flight") -> str:
    superhero = welcome_superhero(name=name, power=power)
    return greet_superhero(superhero=superhero)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/pydantic ===

# Pydantic BaseModel

> [!NOTE]
> You can put dataclasses and Union.ai types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) in a Pydantic `BaseModel`.

To begin, import the necessary dependencies:

```python
import os
import tempfile

import pandas as pd
import union
from flytekit.types.structured import StructuredDataset
from pydantic import BaseModel
```

Build your custom image with ImageSpec:

```python
image_spec = union.ImageSpec(
    registry="ghcr.io/flyteorg",
    packages=["pandas", "pyarrow", "pydantic"],
)
```

## Python types

We define a Pydantic `BaseModel` with `int`, `str` and `dict` as the data types.

```python
class Datum(BaseModel):
    x: int
    y: str
    z: dict[int, str]
```

You can send a Pydantic `BaseModel` between different tasks written in various languages, and input it through the Union.ai console as raw JSON.

> [!NOTE]
> All variables in a model should be **annotated with their type**. Failure to do so will result in an error.

Once declared, a `BaseModel` can be returned as an output or accepted as an input.
```python
@union.task(container_image=image_spec)
def stringify(s: int) -> Datum:
    """
    A Pydantic model return will be treated as a single complex JSON return.
    """
    return Datum(x=s, y=str(s), z={s: str(s)})


@union.task(container_image=image_spec)
def add(x: Datum, y: Datum) -> Datum:
    x.z.update(y.z)
    return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z)
```

## Union.ai types

We also define a model that accepts `StructuredDataset`, `FlyteFile` and `FlyteDirectory`.

```python
class UnionTypes(BaseModel):
    dataframe: StructuredDataset
    file: union.FlyteFile
    directory: union.FlyteDirectory


@union.task(container_image=image_spec)
def upload_data() -> UnionTypes:
    df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]})
    temp_dir = tempfile.mkdtemp(prefix="flyte-")
    df.to_parquet(os.path.join(temp_dir, "df.parquet"))

    file_path = tempfile.NamedTemporaryFile(delete=False)
    file_path.write(b"Hello, World!")
    file_path.close()

    fs = UnionTypes(
        dataframe=StructuredDataset(dataframe=df),
        file=union.FlyteFile(file_path.name),
        directory=union.FlyteDirectory(temp_dir),
    )
    return fs


@union.task(container_image=image_spec)
def download_data(res: UnionTypes):
    expected_df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]})
    actual_df = res.dataframe.open(pd.DataFrame).all()
    assert expected_df.equals(actual_df), "DataFrames do not match!"

    with open(res.file, "r") as f:
        assert f.read() == "Hello, World!", "File contents do not match!"

    assert os.listdir(res.directory) == ["df.parquet"], "Directory contents do not match!"
```

A `BaseModel` supports the usage of data associated with Python types, data classes, FlyteFile, FlyteDirectory and StructuredDataset.

We define a workflow that calls the tasks created above.

```python
@union.workflow
def basemodel_wf(x: int, y: int) -> tuple[Datum, UnionTypes]:
    o1 = add(x=stringify(s=x), y=stringify(s=y))
    o2 = upload_data()
    download_data(res=o2)
    return o1, o2
```

To trigger the `basemodel_wf` workflow with `union run`, pass values for `x` and `y` on the command line:

```
$ union run dataclass.py basemodel_wf --x 1 --y 2
```

You can also run the same workflow directly from the remote example file:

```
union run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
basemodel_wf --x 1 --y 2
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/pytorch ===

# PyTorch type

Union.ai advocates for the use of strongly-typed data to simplify the development of robust and testable pipelines. In addition to its application in data engineering, Union.ai is primarily used for machine learning.

To streamline the communication between Union.ai tasks, particularly when dealing with tensors and models, we have introduced support for PyTorch types.

## Tensors and modules

At times, you may find the need to pass tensors and modules (models) within your workflow. Without native support for PyTorch tensors and modules, Union relies on [pickle](https://docs-builder.pages.dev/docs/byoc/user-guide/data-input-output/pickle/) for serializing and deserializing these entities, as well as any unknown types. However, this approach isn't the most efficient. As a result, we've integrated PyTorch's serialization and deserialization support into the Union.ai type system.
```python @union.task def generate_tensor_2d() -> torch.Tensor: return torch.tensor([[1.0, -1.0, 2], [1.0, -1.0, 9], [0, 7.0, 3]]) @union.task def reshape_tensor(tensor: torch.Tensor) -> torch.Tensor: # convert 2D to 3D tensor.unsqueeze_(-1) return tensor.expand(3, 3, 2) @union.task def generate_module() -> torch.nn.Module: bn = torch.nn.BatchNorm1d(3, track_running_stats=True) return bn @union.task def get_model_weight(model: torch.nn.Module) -> torch.Tensor: return model.weight class MyModel(torch.nn.Module): def __init__(self): super(MyModel, self).__init__() self.l0 = torch.nn.Linear(4, 2) self.l1 = torch.nn.Linear(2, 1) def forward(self, input): out0 = self.l0(input) out0_relu = torch.nn.functional.relu(out0) return self.l1(out0_relu) @union.task def get_l1() -> torch.nn.Module: model = MyModel() return model.l1 @union.workflow def pytorch_native_wf(): reshape_tensor(tensor=generate_tensor_2d()) get_model_weight(model=generate_module()) get_l1() ``` Passing around tensors and modules is no more a hassle! ## Checkpoint `PyTorchCheckpoint` is a specialized checkpoint used for serializing and deserializing PyTorch models. It checkpoints `torch.nn.Module`'s state, hyperparameters and optimizer state. This module checkpoint differs from the standard checkpoint as it specifically captures the module's `state_dict`. Therefore, when restoring the module, the module's `state_dict` must be used in conjunction with the actual module. According to the PyTorch [docs](https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model), it's recommended to store the module's `state_dict` rather than the module itself, although the serialization should work in either case. ```python from dataclasses import dataclass import torch.nn as nn import torch.nn.functional as F import torch.optim as optim from dataclasses_json import dataclass_json from flytekit.extras.pytorch import PyTorchCheckpoint @dataclass_json @dataclass class Hyperparameters: epochs: int loss: float class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv1 = nn.Conv2d(3, 6, 5) self.pool = nn.MaxPool2d(2, 2) self.conv2 = nn.Conv2d(6, 16, 5) self.fc1 = nn.Linear(16 * 5 * 5, 120) self.fc2 = nn.Linear(120, 84) self.fc3 = nn.Linear(84, 10) def forward(self, x): x = self.pool(F.relu(self.conv1(x))) x = self.pool(F.relu(self.conv2(x))) x = x.view(-1, 16 * 5 * 5) x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return x @union.task def generate_model(hyperparameters: Hyperparameters) -> PyTorchCheckpoint: bn = Net() optimizer = optim.SGD(bn.parameters(), lr=0.001, momentum=0.9) return PyTorchCheckpoint(module=bn, hyperparameters=hyperparameters, optimizer=optimizer) @union.task def load(checkpoint: PyTorchCheckpoint): new_bn = Net() new_bn.load_state_dict(checkpoint["module_state_dict"]) optimizer = optim.SGD(new_bn.parameters(), lr=0.001, momentum=0.9) optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) @union.workflow def pytorch_checkpoint_wf(): checkpoint = generate_model(hyperparameters=Hyperparameters(epochs=10, loss=0.1)) load(checkpoint=checkpoint) ``` > [!NOTE] > `PyTorchCheckpoint` supports serializing hyperparameters of types `dict`, `NamedTuple` and `dataclass`. ## Auto GPU to CPU and CPU to GPU conversion Not all PyTorch computations require a GPU. In some cases, it can be advantageous to transfer the computation to a CPU, especially after training the model on a GPU. To utilize the power of a GPU, the typical construct to use is: `to(torch.device("cuda"))`. 
When working with GPU variables on a CPU, variables need to be transferred to the CPU using the `to(torch.device("cpu"))` construct. However, this manual conversion recommended by PyTorch may not be very user-friendly. To address this, we added support for automatic GPU to CPU conversion (and vice versa) for PyTorch types.

```python
import union
from typing import Tuple


@union.task(requests=union.Resources(gpu="1"))
def train() -> Tuple[PyTorchCheckpoint, torch.Tensor, torch.Tensor, torch.Tensor]:
    ...
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Model(X_train.shape[1])
    model.to(device)
    ...
    X_train, X_test = X_train.to(device), X_test.to(device)
    y_train, y_test = y_train.to(device), y_test.to(device)
    ...
    return PyTorchCheckpoint(module=model), X_train, X_test, y_test


@union.task
def predict(
    checkpoint: PyTorchCheckpoint,
    X_train: torch.Tensor,
    X_test: torch.Tensor,
    y_test: torch.Tensor,
):
    new_bn = Model(X_train.shape[1])
    new_bn.load_state_dict(checkpoint["module_state_dict"])

    accuracy_list = np.zeros((5,))

    with torch.no_grad():
        y_pred = new_bn(X_test)
        correct = (torch.argmax(y_pred, dim=1) == y_test).type(torch.FloatTensor)
        accuracy_list = correct.mean()
```

The `predict` task will run on a CPU, and the device conversion from GPU to CPU will be automatically handled by Union.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/structured-dataset ===

# StructuredDataset

As with most type systems, Python has primitives, container types like maps and tuples, and support for user-defined structures. However, while there's a rich variety of DataFrame classes (Pandas, Spark, Pandera, etc.), there's no native Python type that represents a DataFrame in the abstract. This is the gap that the `StructuredDataset` type is meant to fill. It offers the following benefits:

- Eliminate boilerplate code you would otherwise need to write to serialize/deserialize from file objects into DataFrame instances,
- Eliminate additional inputs/outputs that convey metadata around the format of the tabular data held in those files,
- Add flexibility around how DataFrame files are loaded,
- Offer a range of DataFrame-specific functionality: enforce compatibility of different schemas (not only at compile time, but also at runtime, since type information is carried along in the literal), store third-party schema definitions, and potentially in the future, render sample data, provide summary statistics, etc.

## Usage

To use the `StructuredDataset` type, import `pandas` and define a task that returns a Pandas DataFrame. Union will detect the Pandas DataFrame return signature and convert the interface for the task to the `StructuredDataset` type.

## Example

This example demonstrates how to work with a structured dataset using Union.ai entities.

> [!NOTE]
> To use the `StructuredDataset` type, you only need to import `pandas`. The other imports specified below are only necessary for this specific example.
To begin, import the dependencies for the example: ```python import typing from dataclasses import dataclass from pathlib import Path import numpy as np import pandas as pd import pyarrow as pa import pyarrow.parquet as pq import union from flytekit.models import literals from flytekit.models.literals import StructuredDatasetMetadata from flytekit.types.structured.structured_dataset import ( PARQUET, StructuredDataset, StructuredDatasetDecoder, StructuredDatasetEncoder, StructuredDatasetTransformerEngine, ) from typing_extensions import Annotated ``` Define a task that returns a Pandas DataFrame. ```python @union.task(container_image=image_spec) def generate_pandas_df(a: int) -> pd.DataFrame: return pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [a, 22], "Height": [160, 178]}) ``` Using this simplest form, however, the user is not able to set the additional DataFrame information alluded to above, - Column type information - Serialized byte format - Storage driver and location - Additional third party schema information This is by design as we wanted the default case to suffice for the majority of use-cases, and to require as few changes to existing code as possible. Specifying these is simple, however, and relies on Python variable annotations, which is designed explicitly to supplement types with arbitrary metadata. ## Column type information If you want to extract a subset of actual columns of the DataFrame and specify their types for type validation, you can just specify the column names and their types in the structured dataset type annotation. First, initialize column types you want to extract from the `StructuredDataset`. ```python all_cols = union.kwtypes(Name=str, Age=int, Height=int) col = union.kwtypes(Age=int) ``` Define a task that opens a structured dataset by calling `all()`. When you invoke `all()` with ``pandas.DataFrame``, the Union.ai engine downloads the parquet file on S3, and deserializes it to `pandas.DataFrame`. Keep in mind that you can invoke ``open()`` with any DataFrame type that's supported or added to structured dataset. For instance, you can use ``pa.Table`` to convert the Pandas DataFrame to a PyArrow table. ```python @union.task(container_image=image_spec) def get_subset_pandas_df(df: Annotated[StructuredDataset, all_cols]) -> Annotated[StructuredDataset, col]: df = df.open(pd.DataFrame).all() df = pd.concat([df, pd.DataFrame([[30]], columns=["Age"])]) return StructuredDataset(dataframe=df) @union.workflow def simple_sd_wf(a: int = 19) -> Annotated[StructuredDataset, col]: pandas_df = generate_pandas_df(a=a) return get_subset_pandas_df(df=pandas_df) ``` The code may result in runtime failures if the columns do not match. The input ``df`` has ``Name``, ``Age`` and ``Height`` columns, whereas the output structured dataset will only have the ``Age`` column. ## Serialized byte format You can use a custom serialization format to serialize your DataFrames. 
Here's how you can register the Pandas to CSV handler, which is already available, and enable the CSV serialization by annotating the structured dataset with the CSV format:

```python
from flytekit.types.structured import register_csv_handlers
from flytekit.types.structured.structured_dataset import CSV

register_csv_handlers()


@union.task(container_image=image_spec)
def pandas_to_csv(df: pd.DataFrame) -> Annotated[StructuredDataset, CSV]:
    return StructuredDataset(dataframe=df)


@union.workflow
def pandas_to_csv_wf() -> Annotated[StructuredDataset, CSV]:
    pandas_df = generate_pandas_df(a=19)
    return pandas_to_csv(df=pandas_df)
```

## Storage driver and location

By default, the data will be written to the same place that all other pointer-types (FlyteFile, FlyteDirectory, etc.) are written to. This is controlled by the output data prefix option in Union.ai, which is configurable on multiple levels.

That is to say, in the simple default case, Union will:

- Look up the default format for, say, Pandas DataFrames,
- Look up the default storage location based on the raw output prefix setting,
- Use these two settings to select an encoder and invoke it.

So what's an encoder? To understand that, let's look into how the structured dataset plugin works.

## Inner workings of a structured dataset plugin

Two things need to happen with any DataFrame instance when interacting with Union.ai:

- Serialization/deserialization from/to the Python instance to bytes (in the format specified above).
- Transmission/retrieval of those bits to/from somewhere.

Each structured dataset plugin (called an encoder or decoder) needs to perform both of these steps. Union decides which of the loaded plugins to invoke based on three attributes:

- The byte format
- The storage location
- The Python type in the task or workflow signature.

These three keys uniquely identify which encoder (used when converting a DataFrame in Python memory to a Union.ai value, e.g. when a task finishes and returns a DataFrame) or decoder (used when hydrating a DataFrame in memory from a Union.ai value, e.g. when a task starts and has a DataFrame input) to invoke.

However, it is awkward to require users to use `typing.Annotated` on every signature. Therefore, Union has a default byte format for every registered Python DataFrame type.

## The `uri` argument

The `uri` argument allows you to load and retrieve data from cloud storage. The `uri` consists of the bucket name and the filename, prefixed with `gs://`. If you specify a BigQuery `uri` for a structured dataset, BigQuery creates a table in the location specified by the `uri`. The `uri` of a structured dataset can read from or write to S3, GCS, BigQuery, or any other supported storage.

Before writing a DataFrame to a BigQuery table:

1. Create a [GCP account](https://cloud.google.com/docs/authentication/getting-started) and create a service account.
2. Create a project and add the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to your `.bashrc` file.
3. Create a dataset in your project.

Here's how you can define a task that converts a pandas DataFrame to a BigQuery table:

```python
@union.task
def pandas_to_bq() -> StructuredDataset:
    df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]})
    return StructuredDataset(dataframe=df, uri="gs://<BUCKET_NAME>/<FILE_NAME>")
```

Replace `BUCKET_NAME` with the name of your GCS bucket and `FILE_NAME` with the name of the file the DataFrame should be copied to.

Note that no format was specified in the structured dataset constructor or in the signature.
So how did the BigQuery encoder get invoked? This is because the stock BigQuery encoder is loaded into Union with an empty format. The Union `StructuredDatasetTransformerEngine` interprets that to mean that it is a generic encoder (or decoder) and can work across formats, if a more specific format is not found. And here's how you can define a task that converts the BigQuery table to a pandas DataFrame: ```python @union.task def bq_to_pandas(sd: StructuredDataset) -> pd.DataFrame: return sd.open(pd.DataFrame).all() ``` > [!NOTE] > Union.ai creates a table inside the dataset in the project upon BigQuery query execution. ## How to return multiple DataFrames from a task? For instance, how would a task return say two DataFrames: - The first DataFrame be written to BigQuery and serialized by one of their libraries, - The second needs to be serialized to CSV and written at a specific location in GCS different from the generic pointer-data bucket If you want the default behavior (which is itself configurable based on which plugins are loaded), you can work just with your current raw DataFrame classes. ```python @union.task def t1() -> typing.Tuple[StructuredDataset, StructuredDataset]: ... return StructuredDataset(df1, uri="bq://project:flyte.table"), \ StructuredDataset(df2, uri="gs://auxiliary-bucket/data") ``` If you want to customize the Union.ai interaction behavior, you'll need to wrap your DataFrame in a `StructuredDataset` wrapper object. ## How to define a custom structured dataset plugin? `StructuredDataset` ships with an encoder and a decoder that handles the conversion of a Python value to a Union.ai literal and vice-versa, respectively. Here is a quick demo showcasing how one might build a NumPy encoder and decoder, enabling the use of a 2D NumPy array as a valid type within structured datasets. ### NumPy encoder Extend `StructuredDatasetEncoder` and implement the `encode` function. The `encode` function converts NumPy array to an intermediate format (parquet file format in this case). ```python class NumpyEncodingHandler(StructuredDatasetEncoder): def encode( self, ctx: union.FlyteContext, structured_dataset: StructuredDataset, structured_dataset_type: union.StructuredDatasetType, ) -> literals.StructuredDataset: df = typing.cast(np.ndarray, structured_dataset.dataframe) name = ["col" + str(i) for i in range(len(df))] table = pa.Table.from_arrays(df, name) path = ctx.file_access.get_random_remote_directory() local_dir = ctx.file_access.get_random_local_directory() local_path = Path(local_dir) / f"{0:05}" pq.write_table(table, str(local_path)) ctx.file_access.upload_directory(local_dir, path) return literals.StructuredDataset( uri=path, metadata=StructuredDatasetMetadata(structured_dataset_type=union.StructuredDatasetType(format=PARQUET)), ) ``` ### NumPy decoder Extend `StructuredDatasetDecoder` and implement the `StructuredDatasetDecoder.decode` function. The `StructuredDatasetDecoder.decode` function converts the parquet file to a `numpy.ndarray`. 
```python
class NumpyDecodingHandler(StructuredDatasetDecoder):
    def decode(
        self,
        ctx: union.FlyteContext,
        flyte_value: literals.StructuredDataset,
        current_task_metadata: StructuredDatasetMetadata,
    ) -> np.ndarray:
        # Download the multipart parquet data to a local directory,
        # then read it and convert it to a NumPy array.
        local_dir = ctx.file_access.get_random_local_directory()
        ctx.file_access.get_data(flyte_value.uri, local_dir, is_multipart=True)
        table = pq.read_table(local_dir)
        return table.to_pandas().to_numpy()
```

### NumPy renderer

Create a default renderer for the NumPy array; Union will then use this renderer to display the schema of the NumPy array on the Deck.

```python
class NumpyRenderer:
    def to_html(self, df: np.ndarray) -> str:
        assert isinstance(df, np.ndarray)
        name = ["col" + str(i) for i in range(len(df))]
        table = pa.Table.from_arrays(df, name)
        return pd.DataFrame(table.schema).to_html(index=False)
```

Finally, register the encoder, decoder and renderer with the `StructuredDatasetTransformerEngine`. Specify the Python type you want to register this encoder with (`np.ndarray`), the storage engine to register this against (if not specified, it is assumed to work for all the storage backends), and the byte format, which in this case is `PARQUET`.

```python
StructuredDatasetTransformerEngine.register(NumpyEncodingHandler(np.ndarray, None, PARQUET))
StructuredDatasetTransformerEngine.register(NumpyDecodingHandler(np.ndarray, None, PARQUET))
StructuredDatasetTransformerEngine.register_renderer(np.ndarray, NumpyRenderer())
```

You can now use `numpy.ndarray` to deserialize the parquet file to NumPy and serialize a task's output (NumPy array) to a parquet file.

```python
@union.task(container_image=image_spec)
def generate_pd_df_with_str() -> pd.DataFrame:
    return pd.DataFrame({"Name": ["Tom", "Joseph"]})


@union.task(container_image=image_spec)
def to_numpy(sd: StructuredDataset) -> Annotated[StructuredDataset, None, PARQUET]:
    numpy_array = sd.open(np.ndarray).all()
    return StructuredDataset(dataframe=numpy_array)


@union.workflow
def numpy_wf() -> Annotated[StructuredDataset, None, PARQUET]:
    return to_numpy(sd=generate_pd_df_with_str())
```

> [!NOTE]
> `pyarrow` raises an `Expected bytes, got a 'int' object` error when the DataFrame contains integers.

You can run the code locally as follows:

```python
if __name__ == "__main__":
    sd = simple_sd_wf()
    print(f"A simple Pandas DataFrame workflow: {sd.open(pd.DataFrame).all()}")
    print(f"Using CSV as the serializer: {pandas_to_csv_wf().open(pd.DataFrame).all()}")
    print(f"NumPy encoder and decoder: {numpy_wf().open(np.ndarray).all()}")
```

### Nested typed columns

Like most storage formats (for example, Avro, Parquet, and BigQuery), `StructuredDataset` supports nested field structures.
```python data = [ { "company": "XYZ pvt ltd", "location": "London", "info": {"president": "Rakesh Kapoor", "contacts": {"email": "contact@xyz.com", "tel": "9876543210"}}, }, { "company": "ABC pvt ltd", "location": "USA", "info": {"president": "Kapoor Rakesh", "contacts": {"email": "contact@abc.com", "tel": "0123456789"}}, }, ] @dataclass class ContactsField: email: str tel: str @dataclass class InfoField: president: str contacts: ContactsField @dataclass class CompanyField: location: str info: InfoField company: str MyArgDataset = Annotated[StructuredDataset, union.kwtypes(company=str)] MyTopDataClassDataset = Annotated[StructuredDataset, CompanyField] MyTopDictDataset = Annotated[StructuredDataset, {"company": str, "location": str}] MyDictDataset = Annotated[StructuredDataset, union.kwtypes(info={"contacts": {"tel": str}})] MyDictListDataset = Annotated[StructuredDataset, union.kwtypes(info={"contacts": {"tel": str, "email": str}})] MySecondDataClassDataset = Annotated[StructuredDataset, union.kwtypes(info=InfoField)] MyNestedDataClassDataset = Annotated[StructuredDataset, union.kwtypes(info=union.kwtypes(contacts=ContactsField))] image = union.ImageSpec(packages=["pandas", "pyarrow", "pandas", "tabulate"], registry="ghcr.io/flyteorg") @union.task(container_image=image) def create_parquet_file() -> StructuredDataset: from tabulate import tabulate df = pd.json_normalize(data, max_level=0) print("original DataFrame: \n", tabulate(df, headers="keys", tablefmt="psql")) return StructuredDataset(dataframe=df) @union.task(container_image=image) def print_table_by_arg(sd: MyArgDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyArgDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_dict(sd: MyDictDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyDictDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_list_dict(sd: MyDictListDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyDictListDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_top_dataclass(sd: MyTopDataClassDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyTopDataClassDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_top_dict(sd: MyTopDictDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyTopDictDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_second_dataclass(sd: MySecondDataClassDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MySecondDataClassDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.task(container_image=image) def print_table_by_nested_dataclass(sd: MyNestedDataClassDataset) -> pd.DataFrame: from tabulate import tabulate t = sd.open(pd.DataFrame).all() print("MyNestedDataClassDataset DataFrame: \n", tabulate(t, headers="keys", tablefmt="psql")) return t @union.workflow def contacts_wf(): sd = create_parquet_file() print_table_by_arg(sd=sd) print_table_by_dict(sd=sd) print_table_by_list_dict(sd=sd) 
print_table_by_top_dataclass(sd=sd) print_table_by_top_dict(sd=sd) print_table_by_second_dataclass(sd=sd) print_table_by_nested_dataclass(sd=sd) ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/data-input-output/tensorflow === # TensorFlow types This document outlines the TensorFlow types available in Union.ai, which facilitate the integration of TensorFlow models and datasets in Union.ai workflows. ### Import necessary libraries and modules ```python import union from flytekit.types.directory import TFRecordsDirectory from flytekit.types.file import TFRecordFile custom_image = union.ImageSpec( packages=["tensorflow", "tensorflow-datasets", "flytekitplugins-kftensorflow"], registry="ghcr.io/flyteorg", ) import tensorflow as tf ``` ## Tensorflow model Union.ai supports the TensorFlow SavedModel format for serializing and deserializing `tf.keras.Model` instances. The `TensorFlowModelTransformer` is responsible for handling these transformations. ### Transformer - **Name:** TensorFlow Model - **Class:** `TensorFlowModelTransformer` - **Python Type:** `tf.keras.Model` - **Blob Format:** `TensorFlowModel` - **Dimensionality:** `MULTIPART` ### Usage The `TensorFlowModelTransformer` allows you to save a TensorFlow model to a remote location and retrieve it later in your Union.ai workflows. ```python @union.task(container_image=custom_image) def train_model() -> tf.keras.Model: model = tf.keras.Sequential( [tf.keras.layers.Dense(128, activation="relu"), tf.keras.layers.Dense(10, activation="softmax")] ) model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]) return model @union.task(container_image=custom_image) def evaluate_model(model: tf.keras.Model, x: tf.Tensor, y: tf.Tensor) -> float: loss, accuracy = model.evaluate(x, y) return accuracy @union.workflow def training_workflow(x: tf.Tensor, y: tf.Tensor) -> float: model = train_model() return evaluate_model(model=model, x=x, y=y) ``` ## TFRecord files Union.ai supports TFRecord files through the `TFRecordFile` type, which can handle serialized TensorFlow records. The `TensorFlowRecordFileTransformer` manages the conversion of TFRecord files to and from Union.ai literals. ### Transformer - **Name:** TensorFlow Record File - **Class:** `TensorFlowRecordFileTransformer` - **Blob Format:** `TensorFlowRecord` - **Dimensionality:** `SINGLE` ### Usage The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord files, making it easy to read and write data in TensorFlow's TFRecord format. ```python @union.task(container_image=custom_image) def process_tfrecord(file: TFRecordFile) -> int: count = 0 for record in tf.data.TFRecordDataset(file): count += 1 return count @union.workflow def tfrecord_workflow(file: TFRecordFile) -> int: return process_tfrecord(file=file) ``` ## TFRecord directories Union.ai supports directories containing multiple TFRecord files through the `TFRecordsDirectory` type. The `TensorFlowRecordsDirTransformer` manages the conversion of TFRecord directories to and from Union.ai literals. ### Transformer - **Name:** TensorFlow Record Directory - **Class:** `TensorFlowRecordsDirTransformer` - **Python Type:** `TFRecordsDirectory` - **Blob Format:** `TensorFlowRecord` - **Dimensionality:** `MULTIPART` ### Usage The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFRecord files, which is useful for handling large datasets that are split across multiple files. 
#### Example

```python
@union.task(container_image=custom_image)
def process_tfrecords_dir(dir: TFRecordsDirectory) -> int:
    count = 0
    for record in tf.data.TFRecordDataset(dir.path):
        count += 1
    return count


@union.workflow
def tfrecords_dir_workflow(dir: TFRecordsDirectory) -> int:
    return process_tfrecords_dir(dir=dir)
```

## Configuration class: `TFRecordDatasetConfig`

The `TFRecordDatasetConfig` class is a data structure used to configure the parameters for creating a `tf.data.TFRecordDataset`, which allows for efficient reading of TFRecord files. This class uses the `DataClassJsonMixin` for easy JSON serialization.

### Attributes

- **compression_type**: (Optional) Specifies the compression method used for the TFRecord files. Possible values include an empty string (no compression), "ZLIB", or "GZIP".
- **buffer_size**: (Optional) Defines the size of the read buffer in bytes. If not set, defaults will be used based on the local or remote file system.
- **num_parallel_reads**: (Optional) Determines the number of files to read in parallel. A value greater than one outputs records in an interleaved order.
- **name**: (Optional) Assigns a name to the operation for easier identification in the pipeline.

This configuration is crucial for optimizing the reading process of TFRecord datasets, especially when dealing with large datasets or when specific performance tuning is required.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/administration ===

# Administration

This section covers the administration of Union.ai.

## Subpages

- **Administration > Resources**
- **Administration > Cost allocation**
- **Administration > User management**
- **Administration > Applications**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/administration/resources ===

# Resources

Select **Resources** in the top right of the Union.ai interface to open a view showing the overall health and utilization of your Union.ai installation.

![Resources link](../../_static/images/user-guide/administration/resources/resources-link.png)

The following tabs are available: **Administration > Resources > Executions**, **Administration > Resources > Resource Quotas**, and **Administration > Resources > Compute**.

## Executions

![Usage Executions](../../_static/images/user-guide/administration/resources/resources-executions.png)

This tab displays information about workflows, tasks, resource consumption, and resource utilization.

### Filter

The drop-downs at the top let you filter the charts below by project, domain, and time period:

![](../../_static/images/user-guide/administration/resources/filter.png)

* **Project**: Dropdown with multi-select over all projects. Making a selection recalculates the charts accordingly. Defaults to **All Projects**.
* **Domain**: Dropdown with multi-select over all domains (for example, **development**, **staging**, **production**). Making a selection recalculates the charts accordingly. Defaults to **All Domains**.
* **Time Period Selector**: Dropdown to select the period over which the charts are plotted. Making a selection recalculates the charts accordingly. Defaults to **24 Hours**. All times are expressed in UTC.

### Workflow Executions in Final State

This chart shows the overall status of workflows at the project-domain level.
![](../../_static/images/user-guide/administration/resources/workflow-executions-in-final-state.png) For all workflows in the selected project and domain which reached their final state during the selected time period, the chart shows: * The number of successful workflows. * The number of aborted workflows. * The number of failed workflows. See [Workflow States](/docs/v1/flyte//architecture/content/workflow-state-transitions#workflow-states) for the precise definitions of these states. ### Task Executions in Final State This chart shows the overall status of tasks at the project-domain level. ![](../../_static/images/user-guide/administration/resources/task-executions-in-final-state.png) For all tasks in the selected project and domain which reached their final state during the selected time period, the chart shows: * The number of successful tasks. * The number of aborted tasks. * The number of failed tasks. See [Task States](/docs/v1/flyte//architecture/content/workflow-state-transitions#task-states) for the precise definitions of these states. ### Running Pods This chart shows the absolute resource consumption for * Memory (MiB) * CPU (number of cores) * GPU (number of cores) You can select which parameter to show by clicking on the corresponding button at the top of the chart. You can also select whether to show **Requested**, **Used**, or both. ![Running Pods](../../_static/images/user-guide/administration/resources/running-pods.png) ### Utilization This chart shows the percent resource utilization for * Memory * CPU You can select which parameter to show by clicking on the corresponding button at the top of the chart. ![Utilization](../../_static/images/user-guide/administration/resources/utilization.png) ## Resource Quotas This dashboard displays the resource quotas for projects and domains in the organization. ![Resource Quotas](../../_static/images/user-guide/administration/resources/resources-resource-quotas.png) ### Namespaces and Quotas Under the hood, Union.ai uses Kubernetes to run workloads. To deliver multi-tenancy, the system uses Kubernetes [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/). In AWS based installations, each project-domain pair is mapped to a namespace. In GCP-based installations each domain is mapped to a namespace. Within each namespace, a [resource quota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) is set for each resource type (memory, CPU, GPU). This dashboard displays the current point-in-time quota consumption for memory, CPU, and GPU. Quotas are defined as part of the set-up of the instance types in your data plane. To change them, talk to the Union.ai team. ### Examples Resource requests and limits are set at the task level like this (see **Core concepts > Tasks > Task hardware environment > Customizing task resources**): ```python @union.task(requests=Resources(cpu="1", mem="1Gi"), limits=Resources(cpu="10", mem="10Gi")) ``` This task requests 1 CPU and 1 gibibyte of memory. It sets a limit of 10 CPUs and 10 gibibytes of memory. If a task requesting the above resources (1 CPU and 1Gi) is executed in a project (for example **cluster-observability**) and domain (for example, **development**) with 10 CPU and 10Gi of quota for CPU and memory respectively, the dashboard will show that 10% of both memory and CPU quotas have been consumed. Likewise, if a task requesting 10 CPU and 10 Gi of memory is executed, the dashboard will show that 100% of both memory and CPU quotas have been consumed. 
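For reference, here is a minimal, self-contained sketch of what a complete task definition using these request and limit settings might look like. The `from flytekit import Resources` import and the `preprocess` function are illustrative assumptions, not part of the examples above.

```python
import union
from flytekit import Resources  # assumed import; the snippets above show Resources unqualified


# Hypothetical task that requests 1 CPU / 1Gi and is limited to 10 CPU / 10Gi,
# mirroring the decorator settings shown earlier in this section.
@union.task(requests=Resources(cpu="1", mem="1Gi"), limits=Resources(cpu="10", mem="10Gi"))
def preprocess(n: int) -> int:
    # Placeholder workload; replace with real logic.
    return n * 2
```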
### Quota Consumption

For each resource type, the sum of all the `limits` parameters set on all the tasks in a namespace determines quota consumption for that resource. Within a namespace, a given resource's consumption can never exceed that resource's quota.

## Compute

This dashboard displays information about configured node pools in the organization.

![Resources compute](../../_static/images/user-guide/administration/resources/resources-compute.png)

Union.ai will schedule tasks on a node pool that meets the requirements of the task (as defined by the `requests` and `limits` parameters in the task definition) and can scale these node pools between the minimum and maximum configured sizes.

This dashboard shows all currently-configured node pools, whether they are interruptible, labels and taints, minimum and maximum sizes, and allocatable resources.

The allocatable resource values already account for the compute needed by Union.ai services to function. This is why the value may be slightly lower than the quoted value from the cloud provider. This value, however, does not account for any overhead that may be used by third-party services, like Ray, for example.

### Information displayed

The dashboard provides the following information:

* **Instance Type**: The type of instance/VM/node as defined by your cloud provider.
* **Interruptible:** A boolean. True if the instance is interruptible.
* **Labels:** Node pool labels which can be used to target tasks at specific node types.
* **Taints:** Node pool taints which can be used to avoid tasks landing on a node if they do not have the appropriate toleration.
* **Minimum:** Minimum node pool size. Note that if this is set to zero, the node pool will scale down completely when not in use.
* **Maximum:** Maximum node pool size.
* **Allocatable Resources:**
  * **CPU**: The maximum CPU you can request in a task definition after accounting for overheads and other factors.
  * **Memory**: The maximum memory you can request in a task definition after accounting for overheads and other factors.
  * **GPU**: The maximum number of GPUs you can request in a task definition after accounting for overheads and other factors.
  * **Ephemeral Storage**: The maximum storage you can request in a task definition after accounting for overheads and other factors.
  * Note that these values are estimates and may not reflect the exact allocatable resources on any node in your cluster.

### Examples

In the screenshot above, there is a `t3a.xlarge` with `3670m` (3670 millicores) of allocatable CPU, and a larger `c5.4xlarge` with `15640m` of allocatable CPU. In order to schedule a workload on the smaller node, you could specify the following in a task definition:

```python
@union.task(requests=Resources(cpu="3670m", mem="1Gi"), limits=Resources(cpu="3670m", mem="1Gi"))
```

In the absence of confounding factors (for example, other workloads fully utilizing all `t3a.xlarge` instances), this task will spin up a `t3a.xlarge` instance and run the execution on it, taking all available allocatable CPU resources.
Conversely, if a user requests the following:

```python
@union.task(requests=Resources(cpu="4000m", mem="1Gi"), limits=Resources(cpu="4000m", mem="1Gi"))
```

the workload will be scheduled on a larger instance (like the `c5.4xlarge`) because `4000m` exceeds the allocatable CPU on the `t3a.xlarge`, despite the fact that this instance type is [marketed](https://instances.vantage.sh/aws/ec2/t3a.xlarge) as having 4 CPU cores. The discrepancy is due to overheads and holdbacks introduced by Kubernetes to ensure adequate resources to schedule pods on the node.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/administration/cost-allocation ===

# Cost allocation

Cost allocation allows you to track costs and resource utilization for your task and workflow executions. It provides the following breakdowns by project, domain, workflow/task name, and execution ID:

* **Total Cost**: An estimate of the total cost. Total cost is composed of the estimated costs of each container's allocated memory, CPU, and GPU, plus that container's proportion of unused compute resources on the nodes it occupies.
* **Allocated Memory**: The aggregate allocated memory (gigabyte-hours) across all containers in the selection. Allocated memory is calculated as max(requested memory, used memory).
* **Memory Utilization**: The aggregate used memory divided by the aggregate allocated memory across all containers in the selection.
* **Allocated CPU**: The aggregate allocated CPU (core-hours) across all containers in the selection. Allocated CPU is calculated as max(requested CPU, used CPU).
* **CPU Utilization**: The aggregate used CPU divided by the aggregate allocated CPU across all containers in the selection.
* **Allocated GPU**: The aggregate allocated GPU (GPU-hours) across all containers in the selection (allocated GPU equals requested GPU).
* **GPU SM Occupancy**: The weighted average SM occupancy (a measure of GPU usage efficiency) across all GPU containers in the selection.

Additionally, it provides a stacked bar chart of the cost over time grouped by workflow/task name. The height of each bar is the sum of costs across each 15-minute interval.

## Suggested Usage

Cost allocation is designed to show where costs are being incurred and to highlight opportunities for cost reduction through right-sizing resource requests. All tables are sorted in descending order of total cost, so users can scan across the rows to quickly identify expensive workloads with low memory, CPU, or GPU utilization. Steps can then be taken to reduce the resource requests for particular workflows. Union.ai's task-level monitoring functionality can be used to view granular resource usage for individual tasks, making this exercise straightforward.

## Accessing Cost Data

Cost data is accessed by selecting the **Cost** button in the top right of the Union.ai interface:

![Cost link](../../_static/images/user-guide/administration/cost-allocation/cost-link.png)

The **Cost** view displays three top-level tabs: **Workload Costs**, **Compute Costs**, and **Invoices**.

### Workload Costs

This tab provides a detailed breakdown of workflow/task costs and resource utilization, allowing you to filter by project, domain, workflow/task name, and execution ID. It offers views showing total cost, allocated memory, memory utilization, allocated CPU, CPU utilization, allocated GPU, and average GPU SM occupancy. Additionally, the time series shows total cost per workflow/task in a stacked bar format with 15-minute bars.
![Workload costs 1](../../_static/images/user-guide/administration/cost-allocation/workload-costs-1.png)

![Workload costs 2](../../_static/images/user-guide/administration/cost-allocation/workload-costs-2.png)

### Compute Costs

This tab provides a summary of the cluster's overall compute costs. It includes information on the total cost of worker nodes, total uptime by node type, and total cost by node type.

![Compute costs](../../_static/images/user-guide/administration/cost-allocation/compute-costs.png)

### Invoices

This tab displays the total cost of running workflows and tasks in your Union.ai installation, broken out by invoice.

![Invoices](../../_static/images/user-guide/administration/cost-allocation/invoices.png)

## Data collection and cost calculation

The system collects container-level usage metrics, such as resource allocation and usage, node scaling information, and compute node pricing data from each cloud provider. These metrics are then processed to calculate the cost of each workflow execution and individual task.

## Total cost calculation

The total cost per workflow or task execution is the sum of allocated cost and overhead cost:

* **Allocated Cost**: Cost directly attributable to your workflow's resource usage (memory, CPU, and GPU). This is calculated based on the resources requested or consumed (whichever is higher) by the containers running your workloads.
* **Overhead Cost**: Cost associated with the underlying cluster infrastructure that cannot be directly allocated to specific workflows or tasks. This is calculated by proportionally assigning a share of the unallocated node costs to each entity based on its consumption of allocated resources.

## Allocated cost calculation

The cost of CPU, memory, and GPU resources is calculated using the following approach:

* **Resource consumption**: For CPU and memory, the system determines the maximum of requested and used resources for each container. GPU consumption is determined by a container's allocated GPU resources. Resource consumption is measured every 15 seconds.
* **Node-level cost**: Hourly costs for CPU, memory, and GPU are calculated using a statistical model based on a regression of node types on their resource specs. These hourly costs are converted to a 15-second cost for consistency with the data collection interval. For node costs, the total hourly cost of each node type is used.
* **Allocation to Entities**: The resource costs from each container are then allocated to the corresponding workflow or task execution.

## Overhead Cost Calculation

Overhead costs represent the portion of the cluster's infrastructure cost not directly attributable to individual workflows or tasks. These costs are proportionally allocated to workflows/tasks and applications based on their use of allocated resources. Specifically:

* The total allocated cost per node is calculated by summing the allocated costs (memory, CPU, and GPU) for all entities running on that node.
* The overhead cost per node is the difference between the total node cost and the total allocated cost on that node.
* The overhead cost is then proportionally allocated to each entity running on that node according to its share of the total allocated cost on that node.

## Limitations

The system currently assumes that all nodes in the cluster are using on-demand pricing. Therefore, cost will be overestimated for spot and reserved instances, as well as special pricing arrangements with cloud providers.
Overhead cost allocation is an approximation and might not perfectly reflect the true distribution of overhead costs. In particular, overhead costs are only evaluated within the scope of a single 15-second scrape interval. This means that the system can still fail to allocate costs to nodes which are left running after a given execution completes.

Union.ai services and fees, such as platform fees, are not reflected in the dashboards.

Cost is scoped to nodes that have been used in running executions.

The accuracy of cost allocation depends on the accuracy of the underlying resource metrics as well as per-node pricing information.

This feature limits lookback to 60 days and allows picking any time range within the past 60 days to assess cost.

## Future Enhancements

Future enhancements may include:

* Longer lookback period (for example, 90 days)
* Customizable pricing per node type
* Data export
* Per-task cost allocation granularity

If you have an idea for what you and your business would like to see, please reach out to the Union.ai team.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/administration/user-management ===

# User management

Union.ai comes with role-based access control management out of the box. The system is based on the following concepts:

* **Action**: An action that can be performed by a **user** or **application**. For example, `register_flyte_inventory` is the action of registering tasks and workflows.
* **Role**: A set of **actions**. The system includes built-in roles out of the box (see below) and also enables administrators to define custom roles.
* **Policy**: A set of bindings between a **role** and an **organization**, **project**, **domain**, or **project-domain pair**.
* **User** or **application**: An actor to which **policies** can be assigned. Through the assigned policies, the user or application acquires permission to perform the specified **actions** on the designated resources. A user is a person, registered and identified by **email address**. An application is an automated process (a bot, service, or other type of program), registered and identified by **application ID**.
* **Organization**: A set of projects associated with a company, department, or other organization.
* **Project**: A set of associated workflows, tasks, launch plans, and other Union.ai entities.
* **Domain**: Categories representing the standard environments used in the development process: **development**, **staging**, and **production**.
* **Project-domain pair**: The set of projects is divided orthogonally by the three **domains**. The result is a set of project-domain pairs. For example: `flytesnacks/development`, `flytesnacks/staging`, and `flytesnacks/production`.

## Actions

The following is the full list of actions available in the Union.ai system:

* `administer_project`: Permission to archive and update a project and manage customizable resources.
* `manage_permissions`: Permission to manage user and machine applications and their policy assignments.
* `create_flyte_executions`: Permission to launch new Flyte executions.
* `register_flyte_inventory`: Permission to register workflows, tasks, and launch plans.
* `view_flyte_executions`: Permission to view historical Flyte execution data.
* `view_flyte_inventory`: Permission to view registered workflows, tasks, and launch plans.

## Built-in policies

Union.ai ships with three built-in policies: **Admin**, **Contributor**, and **Viewer**.
* An **Admin** has permission to perform all actions (`administer_project`, `manage_permissions`, `create_flyte_executions`, `register_flyte_inventory`, `view_flyte_executions`, `view_flyte_inventory`) across the organization (in all projects and domains). In other words:
  * Invite users and assign roles.
  * View the **Monitoring** and **Billing** dashboards.
  * Do everything a **Contributor** can do.
* A **Contributor** has permission to perform the actions `create_flyte_executions`, `register_flyte_inventory`, `view_flyte_executions`, and `view_flyte_inventory` across the organization (in all projects and domains). In other words:
  * Register and execute workflows, tasks and launch plans.
  * Do everything a **Viewer** can do.
* A **Viewer** has permission to perform the actions `view_flyte_executions` and `view_flyte_inventory` across the organization (in all projects and domains). In other words:
  * View workflows, tasks, launch plans, and executions.

## Multiple policies

Users and applications are assigned zero or more policies. A user or application with no policies will have no permissions but will not be removed (see **Administration > User management > Managing users and assigning policies > Removing a user**). For example, in the case of users, they will still appear in the user list (see **Administration > User management > Managing users and assigning policies**).

A user or application with multiple policies will have the logical union of the permission sets of those policies.

> [!NOTE]
> The default roles that come out of the box are hierarchical.
> The **Admin** permission set is a superset of the **Contributor** permission set, and the **Contributor** permission set is a superset of the **Viewer** permission set.
> This means, for example, that if you make a user an **Admin**, then additionally assigning them **Contributor** or **Viewer** will make no difference.
> But this is only the case due to how these particular roles are defined.
> In general, it is possible to create roles where assigning multiple ones is meaningful.

## Custom roles and policies

It is possible to create new custom roles and policies. Custom roles and policies can, for example, be used to mix and match permissions at the organization, project, or domain level.

Roles and policies are created using the **Uctl CLI** (not the **Union CLI**). Make sure you have the **Uctl CLI** installed.

### Create a role

Create a role spec file `my_role.yaml` that defines a set of actions:

```yaml
# my_role.yaml
name: Workflow Runner
actions:
- view_flyte_inventory
- view_flyte_executions
- create_flyte_executions
```

Create the role from the command line:

```shell
$ uctl create role --roleFile my_role.yaml
```

### Create a policy

Create a policy spec file `my_policy.yaml` that binds roles to project/domain pairs. Here we create a policy that binds the **Contributor** role to `flytesnacks/development` and binds the **Workflow Runner** role (defined above) to `flytesnacks/production`:

```yaml
# my_policy.yaml
name: Workflow Developer Policy
bindings:
- role: Workflow Runner
  resource:
    project: flytesnacks
    domain: production
- role: contributor # Built-in system role
  resource:
    project: flytesnacks
    domain: development
```

Create the policy from the command line:

```shell
$ uctl create policy --policyFile my_policy.yaml
```

Any user or application to which this policy is assigned will be granted **Contributor** permissions to `flytesnacks/development` while being granted the (more restrictive) **Workflow Runner** permissions to `flytesnacks/production`.
### Assign the policy to a user

Once the policy is created, you can assign it to a user using the **User Management** interface in the UI (see **Administration > User management > Managing users and assigning policies > Changing assigned policies** below) or using the command line:

```shell
$ uctl append identityassignments \
    --user "bob@contoso.com" \
    --policy "Workflow Developer Policy"
```

Similarly, you can assign the policy to an application through the command line (there is currently no facility to assign policies to applications in the UI):

```shell
$ uctl append identityassignments \
    --application "contoso-operator" \
    --policy "Workflow Developer Policy"
```

## Initial onboarding

The initial Union.ai onboarding process will set up your organization with at least one **Admin** user, who will have permission to invite teammates and manage their roles.

## Managing users and assigning policies

To add and remove users and to assign and unassign roles, you have to be an **Admin**. As an **Admin**, you should see a **Users** icon at the top-right of the UI:

![](../../_static/images/user-guide/administration/user-management/users-button.png)

Select this icon to display the **User Management** dialog, which lists the users:

![](../../_static/images/user-guide/administration/user-management/user-management.png "medium")

Each user is listed with their assigned policies. You can search the list and filter by policy.

### Adding a user

To add a new user, select **ADD USER**. In the **Add User** dialog, fill in the name and email of the new user and select the policies to assign, then select either **SUBMIT** or **SUBMIT AND ADD ANOTHER USER**:

![](../../_static/images/user-guide/administration/user-management/add-user.png "medium")

The new user should expect to see an email invite from Okta after they have been added through this dialog. They should accept the invite and set up a password. At that point, they will be able to access the Union.ai UI.

### Changing assigned policies

To change a user's assigned policies, go to the **User Management** dialog and select the user. The **Edit User** dialog will appear:

![](../../_static/images/user-guide/administration/user-management/edit-user.png "medium")

To adjust the assigned policies, simply toggle the appropriate buttons and select **SUBMIT**.

### Removing a user

To remove a user, in the **Edit User** dialog (above), select **REMOVE USER**.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/administration/applications ===

# Applications

A Union.ai application is an identity through which external systems can perform actions in the system. An application can be bound to policies and granted permissions just like a human user.

Applications are managed through the **Uctl CLI**.
## List existing apps

```shell
$ uctl get apps
```

Output:

```text
 -------------------- -------------------- ---------------- -----------------------------------------
| ID (4)             | CLIENT NAME        | RESPONSE TYPES | GRANT TYPES                             |
 -------------------- -------------------- ---------------- -----------------------------------------
| contoso-flyteadmin | contoso flyteadmin | [CODE]         | [CLIENT_CREDENTIALS AUTHORIZATION_CODE] |
 -------------------- -------------------- ---------------- -----------------------------------------
| contoso-uctl       | contoso uctl       | [CODE]         | [AUTHORIZATION_CODE]                    |
 -------------------- -------------------- ---------------- -----------------------------------------
| contoso-operator   | contoso operator   | [CODE]         | [CLIENT_CREDENTIALS AUTHORIZATION_CODE] |
 -------------------- -------------------- ---------------- -----------------------------------------
```

> [!NOTE]
> These 3 apps are built into the system.
> Modifying these by editing, deleting or recreating them will disrupt the system.

## Exporting the spec of an existing app

```shell
$ uctl get apps contoso-operator --appSpecFile app.yaml
```

Output:

```yaml
clientId: contoso-operator
clientName: contoso operator
grantTypes:
- CLIENT_CREDENTIALS
- AUTHORIZATION_CODE
redirectUris:
- http://localhost:8080/authorization-code/callback
responseTypes:
- CODE
tokenEndpointAuthMethod: CLIENT_SECRET_BASIC
```

## Creating a new app

First, create a specification file called `app.yaml` (for example) with the following contents (you can adjust the `clientId` and `clientName` to your requirements):

```yaml
clientId: example-operator
clientName: Example Operator
grantTypes:
- CLIENT_CREDENTIALS
- AUTHORIZATION_CODE
redirectUris:
- http://localhost:8080/authorization-code/callback
responseTypes:
- CODE
tokenEndpointAuthMethod: CLIENT_SECRET_BASIC
```

Now, create the app using the specification file:

```shell
$ uctl create app --appSpecFile app.yaml
```

The response should look something like this:

```text
 ------------------ ------------------ -------- ---------
| NAME             | CLIENT NAME      | SECRET | CREATED |
 ------------------ ------------------ -------- ---------
| example-operator | Example Operator |        |         |
 ------------------ ------------------ -------- ---------
```

Copy the secret to an editor for later use. This is the only time that the secret will be displayed. The secret is not stored by Union.ai.

## Update an existing app

To update an existing app, update its specification file as desired while leaving the `clientId` the same (to identify which app is to be updated), and then do:

```shell
$ uctl apply app --appSpecFile app.yaml
```

## Delete an app

To delete an app, use the `uctl delete app` command and specify the app by ID:

```shell
$ uctl delete app example-operator
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming ===

# Programming

This section covers the general programming features of Union.ai.

## Subpages

- **Programming > Chaining Entities**
- **Programming > Conditionals**
- **Programming > Decorating tasks**
- **Programming > Decorating workflows**
- **Programming > Intratask checkpoints**
- **Programming > Waiting for external inputs**
- **Programming > Nested parallelism**
- **Programming > Failure node**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/chaining-entities ===

# Chaining Entities

Union.ai offers a mechanism for chaining entities using the `>>` operator.
This is particularly valuable when chaining tasks and subworkflows without the need for data flow between the entities. ## Tasks Letโ€™s establish a sequence where `t1()` occurs after `t0()`, and `t2()` follows `t1()`. ```python import union @union.task def t2(): print("Running t2") return @union.task def t1(): print("Running t1") return @union.task def t0(): print("Running t0") return # Chaining tasks @union.workflow def chain_tasks_wf(): t2_promise = t2() t1_promise = t1() t0_promise = t0() t0_promise >> t1_promise t1_promise >> t2_promise ``` ## Subworkflows Just like tasks, you can chain subworkflows. ```python @union.workflow def sub_workflow_1(): t1() @union.workflow def sub_workflow_0(): t0() @union.workflow def chain_workflows_wf(): sub_wf1 = sub_workflow_1() sub_wf0 = sub_workflow_0() sub_wf0 >> sub_wf1 ``` > [!NOTE] > Chaining tasks and subworkflows is not supported in local Python environments. === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/conditionals === # Conditionals Union elevates conditions to a first-class construct named `conditional`, providing a powerful mechanism for selectively executing branches in a workflow. Conditions leverage static or dynamic data generated by tasks or received as workflow inputs. While conditions are highly performant in their evaluation, it's important to note that they are restricted to specific binary and logical operators and are applicable only to primitive values. To begin, import the necessary libraries. ```python import random import union from flytekit import conditional from flytekit.core.task import Echo ``` ## Simple branch In this example, we introduce two tasks, `calculate_circle_circumference` and `calculate_circle_area`. The workflow dynamically chooses between these tasks based on whether the input falls within the fraction range (0-1) or not. ```python @union.task def calculate_circle_circumference(radius: float) -> float: return 2 * 3.14 * radius # Task to calculate the circumference of a circle @union.task def calculate_circle_area(radius: float) -> float: return 3.14 * radius * radius # Task to calculate the area of a circle @union.workflow def shape_properties(radius: float) -> float: return ( conditional("shape_properties") .if_((radius >= 0.1) & (radius < 1.0)) .then(calculate_circle_circumference(radius=radius)) .else_() .then(calculate_circle_area(radius=radius)) ) if __name__ == "__main__": radius_small = 0.5 print(f"Circumference of circle (radius={radius_small}): {shape_properties(radius=radius_small)}") radius_large = 3.0 print(f"Area of circle (radius={radius_large}): {shape_properties(radius=radius_large)}") ``` ## Multiple branches We establish an `if` condition with multiple branches, which will result in a failure if none of the conditions is met. It's important to note that any `conditional` statement in Flyte is expected to be complete, meaning that all possible branches must be accounted for. ```python @union.workflow def shape_properties_with_multiple_branches(radius: float) -> float: return ( conditional("shape_properties_with_multiple_branches") .if_((radius >= 0.1) & (radius < 1.0)) .then(calculate_circle_circumference(radius=radius)) .elif_((radius >= 1.0) & (radius <= 10.0)) .then(calculate_circle_area(radius=radius)) .else_() .fail("The input must be within the range of 0 to 10.") ) ``` > [!NOTE] > Take note of the usage of bitwise operators (`&`). Due to Python's PEP-335, > the logical `and`, `or` and `not` operators cannot be overloaded. 
> Flytekit employs bitwise `&` and `|` as equivalents for the logical `and` and `or` operators,
> a convention also observed in other libraries.

## Consuming the output of a conditional

Here, we write a task that consumes the output returned by a `conditional`.

```python
@union.workflow
def shape_properties_accept_conditional_output(radius: float) -> float:
    result = (
        conditional("shape_properties_accept_conditional_output")
        .if_((radius >= 0.1) & (radius < 1.0))
        .then(calculate_circle_circumference(radius=radius))
        .elif_((radius >= 1.0) & (radius <= 10.0))
        .then(calculate_circle_area(radius=radius))
        .else_()
        .fail("The input must exist between 0 and 10.")
    )
    return calculate_circle_area(radius=result)


if __name__ == "__main__":
    radius_small = 0.5
    print(
        f"Circumference of circle (radius={radius_small}) x Area of circle (radius={calculate_circle_circumference(radius=radius_small)}): {shape_properties_accept_conditional_output(radius=radius_small)}"
    )
```

## Using the output of a previous task in a conditional

You can check if a boolean returned from the previous task is `True`, but unary operations are not supported directly. Instead, use the `is_true`, `is_false` and `is_none` methods on the result.

```python
@union.task
def coin_toss(seed: int) -> bool:
    """
    Mimic a condition to verify the successful execution of an operation
    """
    r = random.Random(seed)
    if r.random() < 0.5:
        return True
    return False


@union.task
def failed() -> int:
    """
    Mimic a task that handles failure
    """
    return -1


@union.task
def success() -> int:
    """
    Mimic a task that handles success
    """
    return 0


@union.workflow
def boolean_wf(seed: int = 5) -> int:
    result = coin_toss(seed=seed)
    return conditional("coin_toss").if_(result.is_true()).then(success()).else_().then(failed())
```

> [!NOTE]
> *How do output values acquire these methods?* In a workflow, direct access to outputs is not permitted.
> Inputs and outputs are automatically encapsulated in a special object known as `flytekit.extend.Promise`.

## Using boolean workflow inputs in a conditional

You can directly pass a boolean to a workflow.

```python
@union.workflow
def boolean_input_wf(boolean_input: bool) -> int:
    return conditional("boolean_input_conditional").if_(boolean_input.is_true()).then(success()).else_().then(failed())
```

> [!NOTE]
> Observe that the passed boolean possesses a method called `is_true`.
> This boolean resides within the workflow context and is encapsulated in a specialized Flytekit object.
> This special object enables it to exhibit additional behavior.

You can run the workflows locally as follows:

```python
if __name__ == "__main__":
    print("Running boolean_wf a few times...")
    for index in range(0, 5):
        print(f"The output generated by boolean_wf = {boolean_wf(seed=index)}")
        print(
            f"Boolean input: {True if index < 2 else False}; workflow output: {boolean_input_wf(boolean_input=True if index < 2 else False)}"
        )
```

## Nested conditionals

You can nest conditional sections arbitrarily inside other conditional sections. However, these nested sections can only be in the `then` part of a `conditional` block.
```python @union.workflow def nested_conditions(radius: float) -> float: return ( conditional("nested_conditions") .if_((radius >= 0.1) & (radius < 1.0)) .then( conditional("inner_nested_conditions") .if_(radius < 0.5) .then(calculate_circle_circumference(radius=radius)) .elif_((radius >= 0.5) & (radius < 0.9)) .then(calculate_circle_area(radius=radius)) .else_() .fail("0.9 is an outlier.") ) .elif_((radius >= 1.0) & (radius <= 10.0)) .then(calculate_circle_area(radius=radius)) .else_() .fail("The input must be within the range of 0 to 10.") ) if __name__ == "__main__": print(f"nested_conditions(0.4): {nested_conditions(radius=0.4)}") ``` ## Using the output of a task in a conditional Let's write a fun workflow that triggers the `calculate_circle_circumference` task in the event of a "heads" outcome, and alternatively, runs the `calculate_circle_area` task in the event of a "tail" outcome. ```python @union.workflow def consume_task_output(radius: float, seed: int = 5) -> float: is_heads = coin_toss(seed=seed) return ( conditional("double_or_square") .if_(is_heads.is_true()) .then(calculate_circle_circumference(radius=radius)) .else_() .then(calculate_circle_area(radius=radius)) ) ``` You can run the workflow locally as follows: ```python if __name__ == "__main__": default_seed_output = consume_task_output(radius=0.4) print( f"Executing consume_task_output(0.4) with default seed=5. Expected output: calculate_circle_area => {default_seed_output}" ) custom_seed_output = consume_task_output(radius=0.4, seed=7) print( f"Executing consume_task_output(0.4, seed=7). Expected output: calculate_circle_circumference => {custom_seed_output}" ) ``` ## Running a noop task in a conditional In some cases, you may want to skip the execution of a conditional workflow if a certain condition is not met. You can achieve this by using the `echo` task, which simply returns the input value. > [!NOTE] > To enable the echo plugin in the backend, add the plugin to Flyte's configuration file. 
> ```yaml > task-plugins: > enabled-plugins: > - echo > ``` ```python echo = Echo(name="echo", inputs={"radius": float}) @union.workflow def noop_in_conditional(radius: float, seed: int = 5) -> float: is_heads = coin_toss(seed=seed) return ( conditional("noop_in_conditional") .if_(is_heads.is_true()) .then(calculate_circle_circumference(radius=radius)) .else_() .then(echo(radius=radius)) ) ``` ## Run the example on the Flyte cluster To run the provided workflows on the Flyte cluster, use the following commands: ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ shape_properties --radius 3.0 ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ shape_properties_with_multiple_branches --radius 11.0 ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ shape_properties_accept_conditional_output --radius 0.5 ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ boolean_wf ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ boolean_input_wf --boolean_input ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ nested_conditions --radius 0.7 ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ consume_task_output --radius 0.4 --seed 7 ``` ```shell $ union run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/656e63d1c8dded3e9e7161c7af6425e9fcd43f56/examples/advanced_composition/advanced_composition/conditional.py \ noop_in_conditional --radius 0.4 --seed 5 ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/decorating_tasks === # Decorating tasks You can easily change how tasks behave by using decorators to wrap your task functions. In order to make sure that your decorated function contains all the type annotation and docstring information that Flyte needs, you will need to use the built-in `functools.wraps` decorator. To begin, create a file called `decorating_tasks.py`. Add the imports: ```python import logging import union from functools import partial, wraps ``` Create a logger to monitor the execution's progress. ```python logger = logging.getLogger(__file__) ``` ## Using a single decorator We define a decorator that logs the input and output details for a decorated task. ```python def log_io(fn): @wraps(fn) def wrapper(*args, **kwargs): logger.info(f"task {fn.__name__} called with args: {args}, kwargs: {kwargs}") out = fn(*args, **kwargs) logger.info(f"task {fn.__name__} output: {out}") return out return wrapper ``` We create a task named `t1` that is decorated with `log_io`. > [!NOTE] > The order of invoking the decorators is important. 
> `@union.task` should always be the outer-most decorator.

```python
@union.task
@log_io
def t1(x: int) -> int:
    return x + 1
```

## Stacking multiple decorators

You can also stack multiple decorators on top of each other as long as `@union.task` is the outer-most decorator.

We define a decorator that verifies that the output from the decorated function is a positive number before it's returned. If this assumption is violated, it raises a `ValueError` exception.

```python
def validate_output(fn=None, *, floor=0):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if out <= floor:
            raise ValueError(f"output of task {fn.__name__} must be a positive number, found {out}")
        return out

    if fn is None:
        return partial(validate_output, floor=floor)

    return wrapper
```

> [!NOTE]
> The `validate_output` decorator uses `functools.partial` to implement parameterized decorators.

We define a function that uses both the logging and validator decorators.

```python
@union.task
@log_io
@validate_output(floor=10)
def t2(x: int) -> int:
    return x + 10
```

Finally, we compose a workflow that calls `t1` and `t2`.

```python
@union.workflow
def decorating_task_wf(x: int) -> int:
    return t2(x=t1(x=x))
```

## Run the example on Union.ai

To run the workflow, execute the following command:

```bash
union run --remote decorating_tasks.py decorating_task_wf --x 10
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/decorating_workflows ===

# Decorating workflows

The behavior of workflows can be modified in a lightweight fashion by using the built-in `functools.wraps` decorator pattern, similar to using decorators to customize task behavior (see **Programming > Decorating tasks**). However, unlike in the case of tasks, we need to do a little extra work to make sure that the DAG underlying the workflow executes tasks in the correct order.

## Setup-teardown pattern

The main use case of decorating `@union.workflow`-decorated functions is to establish a setup-teardown pattern that executes tasks before and after your main workflow logic. This is useful when integrating with other external services like [wandb](https://wandb.ai/site) or [clearml](https://clear.ml/), which enable you to track metrics of model training runs.

To begin, create a file called `decorating_workflows.py`. Import the necessary libraries:

```python
from functools import partial, wraps
from unittest.mock import MagicMock

import flytekit
import union
from flytekit import FlyteContextManager
from flytekit.core.node_creation import create_node
```

Let's define the tasks we need for setup and teardown. In this example, we use the `unittest.mock.MagicMock` class to create a fake external service that we want to initialize at the beginning of our workflow and finish at the end.

```python
external_service = MagicMock()


@union.task
def setup():
    print("initializing external service")
    external_service.initialize(id=flytekit.current_context().execution_id)


@union.task
def teardown():
    print("finish external service")
    external_service.complete(id=flytekit.current_context().execution_id)
```

As you can see, you can even use Flytekit's current context to access the `execution_id` of the current workflow if you need to link Flyte with the external service, so that you reference the same unique identifier in both the external service and Flyte.

## Workflow decorator

We create a decorator that we want to use to wrap our workflow function.
```python
def setup_teardown(fn=None, *, before, after):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        # get the current flyte context to obtain access to the
        # compilation state of the workflow DAG.
        ctx = FlyteContextManager.current_context()

        # defines before node
        before_node = create_node(before)
        # ctx.compilation_state.nodes == [before_node]

        # under the hood, flytekit compiler defines and threads
        # together nodes within the `my_workflow` function body
        outputs = fn(*args, **kwargs)
        # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn]

        # defines the after node
        after_node = create_node(after)
        # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn, after_node]

        # compile the workflow correctly by making sure `before_node`
        # runs before the first workflow node and `after_node`
        # runs after the last workflow node.
        if ctx.compilation_state is not None:
            # ctx.compilation_state.nodes is a list of nodes defined in the
            # order of execution above
            workflow_node0 = ctx.compilation_state.nodes[1]
            workflow_node1 = ctx.compilation_state.nodes[-2]
            before_node >> workflow_node0
            workflow_node1 >> after_node
        return outputs

    if fn is None:
        return partial(setup_teardown, before=before, after=after)

    return wrapper
```

There are a few key pieces to note in the `setup_teardown` decorator above:

1. It takes a `before` and an `after` argument, both of which need to be `@union.task`-decorated functions. These tasks will run before and after the main workflow function body.
2. The [create_node](https://github.com/flyteorg/flytekit/blob/9e156bb0cf3d1441c7d1727729e8f9b4bbc3f168/flytekit/core/node_creation.py#L18) function is used to create nodes associated with the `before` and `after` tasks.
3. When `fn` is called, under the hood the system creates all the nodes associated with the workflow function body.
4. The code within the `if ctx.compilation_state is not None:` conditional is executed at compile time, which is where we extract the first and last nodes associated with the workflow function body at indices `1` and `-2`.
5. The `>>` right-shift operator ensures that `before_node` executes before the first node and `after_node` executes after the last node of the main workflow function body.

## Defining the DAG

We define two tasks that will constitute the workflow.

```python
@union.task
def t1(x: float) -> float:
    return x - 1


@union.task
def t2(x: float) -> float:
    return x**2
```

And then create our decorated workflow:

```python
@union.workflow
@setup_teardown(before=setup, after=teardown)
def decorating_workflow(x: float) -> float:
    return t2(x=t1(x=x))
```

## Run the example on the Flyte cluster

To run the provided workflow on the Flyte cluster, use the following command:

```bash
union run --remote decorating_workflows.py decorating_workflow --x 10.0
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/intratask_checkpoints ===

# Intratask checkpoints

A checkpoint in Flyte serves to recover a task from a previous failure by preserving the task's state before the failure and resuming from the latest recorded state.

## Why intratask checkpoints?

The inherent design of Flyte, being a workflow engine, allows users to break down operations, programs or ideas into smaller tasks within workflows. In the event of a task failure, the workflow doesn't need to rerun the previously completed tasks. Instead, it can retry the specific task that encountered an issue. Once the problematic task succeeds, it won't be rerun. Consequently, the natural boundaries between tasks act as implicit checkpoints.
However, there are scenarios where breaking a task into smaller tasks is either challenging or undesirable due to the associated overhead. This is especially true when running a substantial computation in a tight loop. In such cases, users may consider splitting each loop iteration into individual tasks using dynamic workflows. Yet, the overhead of spawning new tasks, recording intermediate results, and reconstructing the state can incur additional expenses. ### Use case: Model training An exemplary scenario illustrating the utility of intra-task checkpointing is during model training. In situations where executing multiple epochs or iterations with the same dataset might be time-consuming, setting task boundaries can incur a high bootstrap time and be costly. Flyte addresses this challenge by providing a mechanism to checkpoint progress within a task execution, saving it as a file or set of files. In the event of a failure, the checkpoint file can be re-read to resume most of the state without rerunning the entire task. This feature opens up possibilities to leverage alternate, more cost-effective compute systems, such as [AWS spot instances](https://aws.amazon.com/ec2/spot/), [GCP pre-emptible instances](https://cloud.google.com/compute/docs/instances/preemptible) and others. These instances offer great performance at significantly lower price points compared to their on-demand or reserved counterparts. This becomes feasible when tasks are constructed in a fault-tolerant manner. For tasks running within a short duration, e.g., less than 10 minutes, the likelihood of failure is negligible, and task-boundary-based recovery provides substantial fault tolerance for successful completion. However, as the task execution time increases, the cost of re-running it also increases, reducing the chances of successful completion. This is precisely where Flyte's intra-task checkpointing proves to be highly beneficial. Here's an example illustrating how to develop tasks that leverage intra-task checkpointing. It's important to note that Flyte currently offers the low-level API for checkpointing. Future integrations aim to incorporate higher-level checkpointing APIs from popular training frameworks like Keras, PyTorch, Scikit-learn, and big-data frameworks such as Spark and Flink, enhancing their fault-tolerance capabilities. Create a file called `checkpoint.py`: Import the required libraries: ```python import union from flytekit.exceptions.user import FlyteRecoverableException RETRIES = 3 ``` We define a task to iterate precisely `n_iterations`, checkpoint its state, and recover from simulated failures: ```python # Define a task to iterate precisely `n_iterations`, checkpoint its state, and recover from simulated failures. @union.task(retries=RETRIES) def use_checkpoint(n_iterations: int) -> int: cp = union.current_context().checkpoint prev = cp.read() start = 0 if prev: start = int(prev.decode()) # Create a failure interval to simulate failures across 'n' iterations and then succeed after configured retries failure_interval = n_iterations // RETRIES index = 0 for index in range(start, n_iterations): # Simulate a deterministic failure for demonstration. Showcasing how it eventually completes within the given retries if index > start and index % failure_interval == 0: raise FlyteRecoverableException(f"Failed at iteration {index}, failure_interval {failure_interval}.") # Save progress state. 
It is also entirely possible to save state every few intervals cp.write(f"{index + 1}".encode()) return index ``` The checkpoint system offers additional APIs. The code can be found at [checkpointer code](https://github.com/flyteorg/flytekit/blob/master/flytekit/core/checkpointer.py). Create a workflow that invokes the task: The task will automatically undergo retries in the event of a [FlyteRecoverableException](../../api-reference/flytekit-sdk/packages/flytekit.exceptions.base#flytekitexceptionsbaseflyterecoverableexception) ```python @union.workflow def checkpointing_example(n_iterations: int) -> int: return use_checkpoint(n_iterations=n_iterations) ``` The local checkpoint is not utilized here because retries are not supported: ```python if __name__ == "__main__": try: checkpointing_example(n_iterations=10) except RuntimeError as e: # noqa : F841 # Since no retries are performed, an exception is expected when run locally pass ``` ## Run the example on the Flyte cluster To run the provided workflow on the Flyte cluster, use the following command: ```bash pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/advanced_composition/advanced_composition/checkpoint.py \ checkpointing_example --n_iterations 10 ``` ```bash union run --remote checkpoint.py checkpointing_example --n_iterations 10 ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/waiting_for_external_inputs === # Waiting for external inputs There are use cases where you may want a workflow execution to pause, only to continue when some time has passed or when it receives some inputs that are external to the workflow execution inputs. You can think of these as execution-time inputs, since they need to be supplied to the workflow after it's launched. Examples of this use case would be: 1. **Model Deployment**: A hyperparameter-tuning workflow that trains `n` models, where a human needs to inspect a report before approving the model for downstream deployment to some serving layer. 2. **Data Labeling**: A workflow that iterates through an image dataset, presenting individual images to a human annotator for them to label. 3. **Active Learning**: An [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)) workflow that trains a model, shows examples for a human annotator to label based on which examples it's least/most certain about or would provide the most information to the model. These use cases can be achieved in Flyte with the `flytekit.sleep`, `flytekit.wait_for_input`, and `flytekit.approve` workflow nodes. Although all of the examples above are human-in-the-loop processes, these constructs allow you to pass inputs into a workflow from some arbitrary external process (human or machine) in order to continue. > [!NOTE] > These functions can only be used inside `@union.workflow`-decorated > functions, `@union.dynamic`-decorated functions, or > imperative workflows. ## Pause executions with the `sleep` node The simplest case is when you want your workflow to `flytekit.sleep` for some specified amount of time before continuing. Though this type of node may not be used often in a production setting, you might want to use it, for example, if you want to simulate a delay in your workflow to mock out the behavior of some long-running computation. 
```python
from datetime import timedelta

import union
from flytekit import sleep


@union.task
def long_running_computation(num: int) -> int:
    """A mock task pretending to be a long-running computation."""
    return num


@union.workflow
def sleep_wf(num: int) -> int:
    """Simulate a "long-running" computation with sleep."""
    # increase the sleep duration to actually make it long-running
    sleeping = sleep(timedelta(seconds=10))
    result = long_running_computation(num=num)
    sleeping >> result
    return result
```

As you can see above, we define a simple `long_running_computation` task and a `sleep_wf` workflow. We first create a `sleeping` and a `result` node, then order the dependencies with the `>>` operator such that the workflow sleeps for 10 seconds before kicking off the `result` computation. Finally, we return the `result`.

> [!NOTE]
> You can learn more about the `>>` chaining operator in **Programming > Chaining Entities**.

Now that you have a general sense of how this works, let's move on to the `flytekit.wait_for_input` workflow node.

## Supply external inputs with `wait_for_input`

With the `flytekit.wait_for_input` node, you can pause a workflow execution that requires some external input signal. For example, suppose that you have a workflow that publishes an automated analytics report, but before publishing it you want to give it a custom title. You can achieve this by defining a `wait_for_input` node that takes a `str` input and finalizes the report:

```python
import typing

from flytekit import wait_for_input


@union.task
def create_report(data: typing.List[float]) -> dict:
    """A toy report task."""
    return {
        "mean": sum(data) / len(data),
        "length": len(data),
        "max": max(data),
        "min": min(data),
    }


@union.task
def finalize_report(report: dict, title: str) -> dict:
    return {"title": title, **report}


@union.workflow
def reporting_wf(data: typing.List[float]) -> dict:
    report = create_report(data=data)
    title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str)
    return finalize_report(report=report, title=title_input)
```

Let's break down what's happening in the code above:

- In `reporting_wf` we first create the raw `report`.
- Then, we define a `title` node that will wait for a string to be provided through the Flyte API, which can be done through the Flyte UI or through `FlyteRemote` (more on that later). This node will time out after 1 hour.
- Finally, we pass the `title_input` promise into `finalize_report`, which attaches the custom title to the report.

> [!NOTE]
> The `create_report` task is just a toy example. In a realistic example, this report might be an HTML file or set of visualizations. This can be rendered in the Flyte UI with **Development cycle > Decks**.

As mentioned in the beginning of this page, this construct can be used for selecting the best-performing model in cases where there isn't a clear single metric to determine the best model, or if you're doing data labeling using a Flyte workflow.

## Continue executions with `approve`

Finally, the `flytekit.approve` workflow node allows you to wait on an explicit approval signal before continuing execution. Going back to our report-publishing use case, suppose that we want to block the publishing of a report for some reason (for example, if it doesn't appear to be valid):

```python
from flytekit import approve


@union.workflow
def reporting_with_approval_wf(data: typing.List[float]) -> dict:
    report = create_report(data=data)
    title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str)
    final_report = finalize_report(report=report, title=title_input)

    # approve the final report, where the output of approve is the final_report
    # dictionary.
    return approve(final_report, "approve-final-report", timeout=timedelta(hours=2))
```

The `approve` node will pass the `final_report` promise through as the output of the workflow, provided that the `approve-final-report` node gets an approval input via the Flyte UI or Flyte API.

You can also use the output of the `approve` function as a promise, feeding it to a subsequent task. Let's create a version of our report-publishing workflow where the approval happens after `create_report`:

```python
@union.workflow
def approval_as_promise_wf(data: typing.List[float]) -> dict:
    report = create_report(data=data)
    title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str)

    # wait for report to run so that the user can view it before adding a custom
    # title to the report
    report >> title_input

    final_report = finalize_report(
        report=approve(report, "raw-report-approval", timeout=timedelta(hours=2)),
        title=title_input,
    )
    return final_report
```

## Working with conditionals

The node constructs by themselves are useful, but they become even more useful when we combine them with other Flyte constructs, like **Programming > Conditionals**.

To illustrate this, let's extend the report-publishing use case so that we produce an "invalid report" output in case we don't approve the final report:

```python
from flytekit import conditional


@union.task
def invalid_report() -> dict:
    return {"invalid_report": True}


@union.workflow
def conditional_wf(data: typing.List[float]) -> dict:
    report = create_report(data=data)
    title_input = wait_for_input("title-input", timeout=timedelta(hours=1), expected_type=str)

    # Define a "review-passes" wait_for_input node so that a human can review
    # the report before finalizing it.
    review_passed = wait_for_input("review-passes", timeout=timedelta(hours=2), expected_type=bool)
    report >> review_passed

    # This conditional returns the finalized report if the review passes,
    # otherwise it returns an invalid report output.
    return (
        conditional("final-report-condition")
        .if_(review_passed.is_true())
        .then(finalize_report(report=report, title=title_input))
        .else_()
        .then(invalid_report())
    )
```

Here, the `review_passed` gate node (the `"review-passes"` `wait_for_input` node) determines which branch of the `conditional` executes: if the review passes, the workflow returns the finalized report; otherwise it returns the output of the `invalid_report` task.

## Sending inputs to `wait_for_input` and `approve` nodes

Assuming that you've registered the above workflows on a Flyte cluster that's been started with **Programming > Waiting for external inputs > flytectl demo start**, there are two ways of using `wait_for_input` and `approve` nodes:

### Using the Flyte UI

If you launch the `reporting_wf` workflow on the Flyte UI, you'll see a **Graph** view of the workflow execution like this:

![Reporting workflow wait for input graph](../../_static/images/user-guide/programming/waiting-for-external-inputs/wait-for-input-graph.png)

Clicking on the play-circle icon of the `title` task node or the **Resume** button on the sidebar will create a modal form that you can use to provide the custom title input.
![Reporting workflow wait for input form](../../_static/images/user-guide/programming/waiting-for-external-inputs/wait-for-input-form.png)

### Using `FlyteRemote`

For many cases it's enough to use the Flyte UI to provide inputs/approvals on gate nodes. However, if you want to pass inputs to `wait_for_input` and `approve` nodes programmatically, you can use the `FlyteRemote.set_signal` method. Using the `conditional_wf` workflow defined above, the example below allows you to set values for the `title-input` and `review-passes` nodes.

```python
from flytekit.configuration import Config
from flytekit.remote.remote import FlyteRemote

remote = FlyteRemote(
    Config.for_sandbox(),
    default_project="flytesnacks",
    default_domain="development",
)

# First kick off the workflow
flyte_workflow = remote.fetch_workflow(
    name="core.control_flow.waiting_for_external_inputs.conditional_wf"
)

# Execute the workflow
execution = remote.execute(flyte_workflow, inputs={"data": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Get a list of signals available for the execution
signals = remote.list_signals(execution.id.name)

# Set a signal value for the "title-input" node. Make sure that the "title-input"
# node is in the `signals` list above
remote.set_signal("title-input", execution.id.name, "my report")

# Set a signal value for the "review-passes" node. Make sure that the "review-passes"
# node is in the `signals` list above
remote.set_signal("review-passes", execution.id.name, True)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/nested-parallelism ===

# Nested parallelism

For exceptionally large or complicated workflows that can't be adequately implemented as dynamic workflows or map tasks, it can be beneficial to have multiple levels of workflow parallelization.

This is useful for multiple reasons:

- Better code organization
- Better code reuse
- Better testing
- Better debugging
- Better monitoring, since each subworkflow can be run and monitored independently
- Better performance and scale, since each subworkflow is executed as a separate workflow and thus can be distributed among different propeller workers and shards, allowing for better parallelism and scale

## Nested dynamic workflows

You can use nested dynamic workflows to break down a large workflow into smaller workflows and then compose them together to form a hierarchy. In this example, a top-level workflow uses two levels of dynamic workflows to process a list through some simple addition tasks and then flatten the list again.
### Example code ```python """ A core workflow parallelized as six items with a chunk size of two will be structured as follows: multi_wf -> level1 -> level2 -> core_wf -> step1 -> step2 -> core_wf -> step1 -> step2 level2 -> core_wf -> step1 -> step2 -> core_wf -> step1 -> step2 level2 -> core_wf -> step1 -> step2 -> core_wf -> step1 -> step2 """ import union @union.task def step1(a: int) -> int: return a + 1 @union.task def step2(a: int) -> int: return a + 2 @union.workflow def core_wf(a: int) -> int: return step2(a=step1(a=a)) core_wf_lp = union.LaunchPlan.get_or_create(core_wf) @union.dynamic def level2(l: list[int]) -> list[int]: return [core_wf_lp(a=a) for a in l] @union.task def reduce(l: list[list[int]]) -> list[int]: f = [] for i in l: f.extend(i) return f @union.dynamic def level1(l: list[int], chunk: int) -> list[int]: v = [] for i in range(0, len(l), chunk): v.append(level2(l=l[i:i + chunk])) return reduce(l=v) @union.workflow def multi_wf(l: list[int], chunk: int) -> list[int]: return level1(l=l, chunk=chunk) ``` Overrides let you add additional arguments to the launch plan you are looping over in the dynamic. Here we add caching: ```python @union.task def increment(num: int) -> int: return num + 1 @union.workflow def child(num: int) -> int: return increment(num=num) child_lp = union.LaunchPlan.get_or_create(child) @union.dynamic def spawn(n: int) -> list[int]: l = [] for i in [1,2,3,4,5]: l.append(child_lp(num=i).with_overrides(cache=True, cache_version="1.0.0")) # you can also pass l to another task if you want return l ``` ## Mixed parallelism This example is similar to nested dynamic workflows, but instead of using a dynamic workflow to parallelize a core workflow with serial tasks, we use a core workflow to call a map task, which processes both inputs in parallel. This workflow has one less layer of parallelism, so the outputs wonโ€™t be the same as those of the nested parallelization example, but it does still demonstrate how you can mix these different approaches to achieve concurrency. ### Example code ```python """ A core workflow parallelized as six items with a chunk size of two will be structured as follows: multi_wf -> level1 -> level2 -> mappable -> mappable level2 -> mappable -> mappable level2 -> mappable -> mappable """ import union @union.task def mappable(a: int) -> int: return a + 2 @union.workflow def level2(l: list[int]) -> list[int]: return union.map(mappable)(a=l) @union.task def reduce(l: list[list[int]]) -> list[int]: f = [] for i in l: f.extend(i) return f @union.dynamic def level1(l: list[int], chunk: int) -> list[int]: v = [] for i in range(0, len(l), chunk): v.append(level2(l=l[i : i + chunk])) return reduce(l=v) @union.workflow def multi_wf(l: list[int], chunk: int) -> list[int]: return level1(l=l, chunk=chunk) ``` ## Design considerations While you can nest even further if needed, or incorporate map tasks if your inputs are all the same type, the design of your workflow should be informed by the actual data youโ€™re processing. For example, if you have a big library of music from which youโ€™d like to extract the lyrics, the first level could loop through all the albums, and the second level could process each song. If youโ€™re just processing an enormous list of the same input, itโ€™s best to keep your code simple and let the scheduler handle optimizing the execution. 
Additionally, unless you need dynamic workflow features like mixing and matching inputs and outputs, it's usually most efficient to use a map task, which has the added benefit of keeping the UI clean.

You can also choose to limit the scale of parallel execution at a few levels. The `max_parallelism` attribute can be applied at the workflow level and limits the number of parallel tasks being executed (it is set to 25 by default). Within map tasks, you can specify a `concurrency` argument, which limits the number of mapped tasks that can run in parallel at any given time.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/programming/failure-node ===

# Failure node

The failure node feature enables you to designate a specific node to execute in the event of a failure within your workflow.

For example, a workflow might create a cluster at the beginning, execute its tasks, and delete the cluster once all the tasks are completed. However, if any task within the workflow encounters an error, the system aborts the entire workflow and never deletes the cluster. This poses a challenge if you need to clean up the cluster even in the event of a task failure.

To address this issue, you can add a failure node to your workflow. This ensures that critical actions, such as deleting the cluster, are executed even when failures occur during the workflow execution.

```python
import typing

import union
from flytekit import WorkflowFailurePolicy
from flytekit.types.error.error import FlyteError


@union.task
def create_cluster(name: str):
    print(f"Creating cluster: {name}")
```

Create a task that will fail during execution:

```python
# Create a task that will fail during execution
@union.task
def t1(a: int, b: str):
    print(f"{a} {b}")
    raise ValueError("Fail!")
```

Create a task that will be executed if any of the tasks in the workflow fail:

```python
@union.task
def clean_up(name: str, err: typing.Optional[FlyteError] = None):
    print(f"Deleting cluster {name} due to {err}")
```

Set `on_failure` to the cleanup task. This task will be executed if any of the tasks in the workflow fail. The inputs of `clean_up` must exactly match the workflow's inputs. Additionally, the `err` parameter will be populated with the error message encountered during execution.

```python
@union.workflow(on_failure=clean_up)
def wf(a: int, b: str):
    create_cluster(name=f"cluster-{a}")
    t1(a=a, b=b)
```

Set the failure policy to `FAIL_AFTER_EXECUTABLE_NODES_COMPLETE` to ensure that the remaining nodes of `wf1` are executed even if the subworkflow fails. In this case, both the parent and child workflows will fail, resulting in the `clean_up` task being executed twice:

```python
# In this case, both parent and child workflows will fail,
# resulting in the `clean_up` task being executed twice.
@union.workflow(on_failure=clean_up, failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
def wf1(name: str = "my_cluster"):
    c = create_cluster(name=name)
    subwf(name="another_cluster")
    t = t1(a=1, b="2")
    d = delete_cluster(name=name)
    c >> t >> d
```

You can also set the `on_failure` to a workflow.
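The examples above and below reference a few entities that are not defined in this excerpt: the `delete_cluster` task, the `subwf` subworkflow, and the `clean_up_wf` failure-handler workflow. Minimal sketches consistent with how they are used (not necessarily the original definitions) might look like this:

```python
# Hedged sketches of the helpers referenced in this example but not shown here.

@union.task
def delete_cluster(name: str, err: typing.Optional[FlyteError] = None):
    print(f"Deleting cluster: {name}")


# A child workflow that also fails and has its own `clean_up` handler, so that
# `clean_up` runs for both the parent and the child, as described above.
@union.workflow(on_failure=clean_up)
def subwf(name: str):
    c = create_cluster(name=name)
    t = t1(a=2, b="3")
    d = delete_cluster(name=name)
    c >> t >> d


# A failure handler defined as a workflow rather than a task; its inputs must
# match the inputs of the workflow it is attached to.
@union.workflow
def clean_up_wf(name: str):
    clean_up(name=name)
```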
This workflow will be executed if any of the tasks in the workflow fail: ```python @union.workflow(on_failure=clean_up_wf) def wf2(name: str = "my_cluster"): c = create_cluster(name=name) t = t1(a=1, b="2") d = delete_cluster(name=name) c >> t >> d ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/user-guide/faq === # FAQ ## Onboarding my organization to Union.ai ### What information does Union.ai need to set up my service? When you initially onboard your organization to Union.ai you must specify which cloud provider(s) you wish to use and the configuration of the machine types you want. For details, see **Configuring your data plane**. ### How do I change the machine types in my cluster? If you have already been onboarded and wish to change your machine types, Union.ai will need to re-configure your node groups (in AWS) or instance groups (in GCP). To initiate the process, submit the [Node Group Configuration Change form](https://wkf.ms/3pGNJqh). ## Data storage and handling ### How does Union.ai store my data? When data is passed from task to task in a workflow (and output at the end of the workflow), the workflow engine manages the transfer of these values. The system distinguishes between metadata and raw data. Primitive values (`int`, `str`, etc.) are stored directly in the metadata store while complex data objects (`pandas.DataFrame`, `FlyteFile`, etc.) are stored by reference with the reference in metadata and the actual data in the raw data store. By default, both metadata and raw data are stored in Union.ai's internal object store, located in your data plane in a pre-configured S3/GCS bucket. For more details see **Data input/output > Task input and output**. ### Can I change the raw data storage location? Yes. See **Data input/output > Task input and output > Changing the raw data storage location**. ### Can I use my own blob store for data storage that I handle myself? Yes. You can certainly configure your own blob storage and then use your chosen library (like `boto3`, for example) to interact with that storage within your task code. The only caveat is that you must ensure that your task code has access to the storage. See **Enabling AWS resources > Enabling AWS S3**, **Enabling GCP resources > Enabling Google Cloud Storage**, or **Enabling Azure resources > Enabling Azure Blob Storage**. ### Can I control access to my own blob store? Yes. As with all resources used by your task code, the storage must be accessible from within the cluster running that code on your data plane. However, the data plane is your own, and you have full control over access. See **Enabling AWS resources > Enabling AWS S3**, **Enabling GCP resources > Enabling Google Cloud Storage**, or **Enabling Azure resources > Enabling Azure Blob Storage**. ### Could someone maliciously delete or otherwise access my raw data? No. Your raw data resides in your data plane and is stored either in the default raw data storage or in storage that you set up yourself. In either case, you control access to it. The Union.ai team does have access to your data plane for purposes of maintenance but does not have access to your raw data, secrets in secret managers, database, etc. unless you choose to permit such access. Having said that, since the data plane is yours, you are ultimately responsible for preventing access by malicious third parties. ### Can I use s3fs from within a task? Yes, but you probably don't need to. [`s3fs`](https://github.com/s3fs-fuse/s3fs-fuse) is a FUSE-based file system backed by Amazon S3. 
It is possible to set up `s3fs` in your task container image and use it from within your task code. However, in most cases using either `FlyteFile`/`FlyteDirectory` or a library like `boto3` to access an S3 bucket directly is preferred (and easier). If you do need to use `s3fs`, here are the basic steps: * Set up the S3 bucket that you wish to access. * Enable access to the bucket from your task code by configuring an appropriate IAM policy. See **Enabling AWS resources > Enabling AWS S3**. * Specify your task container image to have `s3fs` correctly installed and configured. * In the task decorator, configure a `PodTemplate` to run the task container in privileged mode (see links below). * In your task code, invoke the `s3fs` command line tool to mount the S3-backed volume. For example: ```python subprocess.run(['s3fs', bucket_and_path, mount_point, '-o', 'iam_role=auto'], check=True) ``` See also: * [Configure a Security Context for a Pod or Container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) ### Can I use BigQuery from within a task? If your Union.ai data plane is running on GCP, access to BigQuery should be enabled by default and bound to the default Google Service Account (referred to in this documentation as **\**). For details see **Enabling GCP resources**. If you want to bind it to a different GSA, follow the instructions in **Enabling GCP resources > Enabling BigQuery**. To actually access your BigQuery instance from your code, you will need to use a `BigQueryTask`. For details see **Connectors > BigQuery connector**. ## `FlyteFile` and `FlyteDirectory` ### Where do `FlyteFile` and `FlyteDirectory` store their data? **Data input/output > FlyteFile and FlyteDirectory** are two Python classes provided by Union.ai to make it easy to pass files from one task to the next within a workflow. They do this by wrapping a file or directory location path and, if necessary, uploading the referenced file to Union.ai's internal object store to persist it across task containers. ### Can I accidentally overwrite `FlyteFil`e data? In general, no. When a task returns a **Data input/output > FlyteFile and FlyteDirectory** whose source is local to the origin container, Union.ai automatically uploads it to a location with a randomized path in the raw data store. This ensures that subsequent runs will not overwrite earlier data. ### Can I use my own blob store for `FlyteFile` and `FlyteDirectory` data storage? Yes. If you do not want to use the default raw output store that is provided with your data plane you can configure your own storage. ### How do the typed aliases of `FlyteFile` and `FlyteDirectory` work? `FlyteFile` and `FlyteDirectory` include specific type annotations such as `PDFFile`, `JPEGImageFile`, and so forth. These aliases can be used when handling a file or directory of the specified type. For details see **Data input/output > FlyteFile and FlyteDirectory > Typed aliases**. ## Building and running workflows ### What SDK should I download and use in workflow code? You should install the `union` package, which will install the Union and Flytekit SDKs and the `union` command-line tool. You will need to use theย Flytekitย SDK the majority of the time in the code to import core features and use theย Unionย SDK for Union.ai-specific features, such as artifacts. ### How do I authenticate `uctl` and `union` CLIs to Union.ai? 
The command-line tools `uctl` and `union` need to authenticate in order to connect with your Union.ai instance (for example, when registering a workflow). There are three ways to set up authentication:

1. **PKCE**: This is the default method. When using this method, a browser pops up to authenticate the user.
2. **DeviceFlow**: A URL is output to your terminal. Navigate to it in your browser and follow the directions.
3. **ClientSecret**: This is the headless option. It can be used, for example, by CI bots. With this method, you create a Union.ai application and configure your tools to pass the Client ID and App Secret to Union.ai.

These methods are all configured in the `config.yaml` that your `uctl` or `union` command uses. See **Development cycle > Authentication** for full details.

Note that if you wish to run or register workflows in a remote SSH session, you will need to authenticate using the DeviceFlow or ClientSecret methods, since PKCE attempts to open a local browser from the CLI.

### How do I specify resource requirements for a task?

You can specify either `requests` or `limits` (or both) on the resources that will be used by a specific task when it runs in its container. This is done by setting the `requests` or `limits` property in the `@union.task` decorator to a `Resources` configuration object. Within the `Resources` object you can specify the number of CPU cores, the number of GPU cores, the amount of main memory, the amount of persistent storage, and the amount of ephemeral storage.

You can also override the settings in the `@union.task` decorator for more fine-grained control by using the `with_overrides` method when invoking the task function.

See **Core concepts > Tasks > Task hardware environment > Customizing task resources** for more information.

### What command-line tools should I use to register and run workflows?

You should use the `union` CLI to register and run workflows and perform other operations on the command line. The `union` CLI is installed when you install the `union` package, which also installs the Union.ai and Flytekit SDKs.

### How do I fix import errors when running workflows remotely?

If you run your workflows with `union run --remote ...`, you may encounter import errors when importing functions, classes, or variables from other modules in your project repository.

For example, if you have the following repository structure, and you want to import a model from `my_model.py`, some constants from `constants.py`, and a helper function from `utils.py` in a task that is defined in `my_workflow.py`, you will encounter import errors unless these Python modules were explicitly added to the image used by the task, since the container running the task does not recognize these modules by default.

```shell
├── requirements.txt
└── my_lib
    ├── __init__.py
    ├── models
    │   ├── __init__.py
    │   └── my_model.py
    └── workflows
        ├── __init__.py
        ├── constants.py
        ├── my_workflow.py
        └── workflow_helper_functions
            ├── __init__.py
            └── utils.py
```

Instead of building a custom Dockerfile that copies all the files and modules in your repository structure, you can do one of the following (see the example commands below):

1. Use the `--copy-all` flag in `union run --remote ...`
2. Use `union register` to register your workflow and run it later.

Both of these methods work by adding all the files within your local project root to the container running your tasks.
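For the layout above, either approach might look like the following (the workflow name `my_wf` is a hypothetical placeholder for whatever workflow is defined in `my_workflow.py`):

```shell
# Option 1: copy the whole project root into the task container at run time
union run --remote --copy-all my_lib/workflows/my_workflow.py my_wf

# Option 2: register the project first, then launch the workflow from the UI or CLI
union register my_lib
```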
The project root is defined as the directory immediately above the highest-level directory containing an `__init__.py` file. ### What happens if an automated process launches a very large number of workflows? By default, Union.ai has a built-in limiting mechanism that prevents more than 10,000 concurrent workflow executions per data plane cluster (equivalently, per organization). This limit can be adjusted on a per-customer basis (talk to the Union.ai team). Executions beyond the limit will be executed as soon as resources become available. While waiting, the workflow execution will be reported as in the UNKNOWN state. This limit prevents workflow requests from overwhelming the cluster and, in effect, performing a self-caused denial of service attack. ### How can I constrain the number of parallel executions for large, complex workflows? Workflows can quickly get complex when dynamic workflows iterate over varying length inputs, workflows call subworkflows, and map tasks iterate over a series of inputs. There are two levers to control the parallelism of a workflow: `max_parallelism` which controls parallelism at a workflow level, and `concurrency` which controls parallelism at a map task level. Another way of thinking about this is that `max_parallelism` controls the number of simultaneous executions of all tasks _except_ for map tasks which are controlled separately. This means that the total number of simultaneous executions during a workflow run cannot exceed `max_parallelism * concurrency` which would be the case if each parallel execution at the workflow level had its own map task. By default, `max_parallelism` is set to 25. If `concurrency` is not set for a map task, the current default behavior is to execute over all inputs to the map task. The trade-off that must be balanced when setting `max_parallelism` and `concurrency` is with resource availability at a workflow level. If parallelism is too high, tasks can time out before resources can be allocated to them, making it important to consider the resource requirements of your tasks that will run in parallel. When interpreting parallelism in the UI, it is important to note that dynamic workflows will immediately list all planned executions, even if the number exceeds `max_parallelism`. However, this does not mean that all the executions are running. By toggling any embedded tasks or subworkflows, you should see an UNKNOWN status for any tasks that have not yet been processes due to the limitations of `max_parallelism`. === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials === # Tutorials This section provides tutorials that walk you through the process of building AI/ML applications on Union.ai. The example applications range from training XGBoost models in tabular datasets to fine-tuning large language models for text generation tasks. ### ๐Ÿ”— **Language Models > Sentiment Classifier** Fine-tune a pre-trained language model in the IMDB dataset for sentiment classification. ### ๐Ÿ”— [Agentic Retrieval Augmented Generation](language-models/agentic-rag) Build an agentic retrieval augmented generation system with ChromaDB and Langchain. ### ๐Ÿ”— **Language Models > Soft Clustering Hdbscan** Use HDBSCAN soft clustering with headline embeddings and UMAP on GPUs. ### ๐Ÿ”— [Deploy a Fine-Tuned Llama Model to an iOS App with MLC-LLM](language-models/llama_edge_deployment) Fine-tune a Llama 3 model on the Cohere Aya Telugu subset and generate a model artifact for deployment as an iOS app. 
### ๐Ÿ”— **Parallel Processing and Job Scheduling > Reddit Slack Bot** Securely store Reddit and Slack authentication data while pushing relevant Reddit posts to slack on a consistent basis. ### ๐Ÿ”— **Parallel Processing and Job Scheduling > Wikipedia Embeddings** Create embeddings for the Wikipedia dataset, powered by Union.ai actors. ### ๐Ÿ”— **Time Series > Time Series Forecaster Comparison** Visually compare the output of various time series forecasters while maintaining lineage of the training and forecasted data. ### ๐Ÿ”— **Time Series > Gluonts Time Series** Train and evaluate a time series forecasting model with GluonTS. ### ๐Ÿ”— **Finance > Credit Default Xgboost** Use NVIDIA RAPIDS `cuDF` DataFrame library and `cuML` machine learning to predict credit default. ### ๐Ÿ”— **Bioinformatics > Alignment** Pre-process raw sequencing reads, build an index, and perform alignment to a reference genome using the Bowtie2 aligner. ### ๐Ÿ”— [Video Dubbing with Open-Source Models](multimodal-ai/video-dubbing) Use open-source models to dub videos. ### ๐Ÿ”— [Efficient Named Entity Recognition with vLLM](language-models/vllm-serving-on-actor) Serve a vLLM model on a warm container and trigger inference automatically with artifacts. ### ๐Ÿ”— **Diffusion models > Mochi Video Generation** Run the Mochi 1 text-to-video generation model by Genmo on Union.ai. ### ๐Ÿ”— **Compound AI Systems > Pdf To Podcast Blueprint** Leverage Union.ai to productionize NVIDIA blueprint workflows. ### ๐Ÿ”— **Retrieval Augmented Generation > Building a Contextual RAG Workflow with Together AI** Build a contextual RAG workflow for enterprise use. ## Subpages - **Bioinformatics** - **Compound AI Systems** - **Diffusion models** - **Finance** - **Language Models** - **Parallel Processing and Job Scheduling** - **Retrieval Augmented Generation** - **Serving** - **Time Series** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/bioinformatics === # Bioinformatics Bioinformatics encompasses all the ways we aim to solve biological problems by computational means. Union.ai provides a number of excellent abstractions and features for solving such problems in a reliable, reproducible and ergonomic way. ## Subpages - **Bioinformatics > Alignment** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/bioinformatics/alignment === --- **Source**: tutorials/bioinformatics/alignment.md **URL**: /docs/v1/selfmanaged/tutorials/bioinformatics/alignment/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems === # Compound AI Systems Compound AI Systems refer to artificial intelligence systems that combine multiple AI and software components to create a more complex and powerful system. Instead of focusing on a single model or data type, Compound AI Systems combine models with different modalities and software components like databases, vector stores, and more to solve a given task or problem. In the following examples, youโ€™ll explore how Compound AI Systems can be applied to manipulate and analyze various types of data. 
## Subpages - **Compound AI Systems > Video Dubbing** - **Compound AI Systems > Text To Sql Agent** - **Compound AI Systems > Pdf To Podcast Blueprint** - **Compound AI Systems > Llama Index Rag** - **Compound AI Systems > Enterprise Rag Blueprint** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems/video-dubbing === --- **Source**: tutorials/compound-ai-systems/video-dubbing.md **URL**: /docs/v1/selfmanaged/tutorials/compound-ai-systems/video-dubbing/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems/text_to_sql_agent === --- **Source**: tutorials/compound-ai-systems/text_to_sql_agent.md **URL**: /docs/v1/selfmanaged/tutorials/compound-ai-systems/text_to_sql_agent/ **Weight**: 9 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems/pdf-to-podcast-blueprint === --- **Source**: tutorials/compound-ai-systems/pdf-to-podcast-blueprint.md **URL**: /docs/v1/selfmanaged/tutorials/compound-ai-systems/pdf-to-podcast-blueprint/ **Weight**: 9 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems/llama_index_rag === --- **Source**: tutorials/compound-ai-systems/llama_index_rag.md **URL**: /docs/v1/selfmanaged/tutorials/compound-ai-systems/llama_index_rag/ **Weight**: 9 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/compound-ai-systems/enterprise-rag-blueprint === --- **Source**: tutorials/compound-ai-systems/enterprise-rag-blueprint.md **URL**: /docs/v1/selfmanaged/tutorials/compound-ai-systems/enterprise-rag-blueprint/ **Weight**: 9 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/diffusion-models === # Diffusion models Diffusion models are a class of generative models widely used in image generation and other computer vision tasks. They are at the forefront of generative AI, powering popular text-to-image tools such as Stability AIโ€™s Stable Diffusion, OpenAIโ€™s DALL-E (starting from DALL-E 2), MidJourney, and Googleโ€™s Imagen. These models offer significant improvements in performance and stability over earlier architectures for image synthesis, including variational autoencoders (VAEs), generative adversarial networks (GANs), and autoregressive models like PixelCNN. In the examples provided, you'll explore how to apply diffusion models to various use cases. ## Subpages - **Diffusion models > Mochi Video Generation** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/diffusion-models/mochi-video-generation === --- **Source**: tutorials/diffusion-models/mochi-video-generation.md **URL**: /docs/v1/selfmanaged/tutorials/diffusion-models/mochi-video-generation/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/finance === # Finance Machine learning (ML) and artificial intelligence (AI) are revolutionizing the finance industry. By processing vast amounts of data, these technologies enable applications such as: risk assessment, fraud detection, and customer segmentation. In these examples, you'll learn how to use Union.ai for finance applications. 
## Subpages - **Finance > Credit Default Xgboost** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/finance/credit-default-xgboost === --- **Source**: tutorials/finance/credit-default-xgboost.md **URL**: /docs/v1/selfmanaged/tutorials/finance/credit-default-xgboost/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/language-models === # Language Models Language models (LMs) are a type of deep learning model that fundamentally predicts tokens within some context window, either in a [masked](https://huggingface.co/docs/transformers/main/en/tasks/masked_language_modeling) or [causal](https://huggingface.co/docs/transformers/en/tasks/language_modeling) manner. Large language models (LLMs) are a type of language model that have many trainable parameters, which in recent times can be hundreds of millions to trillions of parameters. LMs can also perform a wider range of inference-time tasks compared to traditional ML methods because they can operate on structured and unstructured text data. This means they can perform tasks like text generation, API function calling, summarization, and question-answering. In these examples, you'll learn how to use LMs of different sizes for different use cases, from sentiment analysis to retrieval augmented generation (RAG). ## Subpages - **Language Models > Sentiment Classifier** - **Language Models > Soft Clustering Hdbscan** - **Language Models > Liger Kernel Finetuning** - **Language Models > Data Streaming** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/language-models/sentiment-classifier === --- **Source**: tutorials/language-models/sentiment-classifier.md **URL**: /docs/v1/selfmanaged/tutorials/language-models/sentiment-classifier/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/language-models/soft-clustering-hdbscan === --- **Source**: tutorials/language-models/soft-clustering-hdbscan.md **URL**: /docs/v1/selfmanaged/tutorials/language-models/soft-clustering-hdbscan/ **Weight**: 4 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/language-models/liger-kernel-finetuning === --- **Source**: tutorials/language-models/liger-kernel-finetuning.md **URL**: /docs/v1/selfmanaged/tutorials/language-models/liger-kernel-finetuning/ **Weight**: 5 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/language-models/data-streaming === --- **Source**: tutorials/language-models/data-streaming.md **URL**: /docs/v1/selfmanaged/tutorials/language-models/data-streaming/ **Weight**: 5 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/parallel-processing-and-job-scheduling === # Parallel Processing and Job Scheduling Union.ai offers robust capabilities for parallel processing, providing various parallelization strategies allowing for the efficient execution of tasks across multiple nodes. Union.ai also has a flexible job scheduling system. You can schedule workflows to run at specific intervals, or based on external events, ensuring that processes are executed exactly when needed. In this section, we will see some examples demonstrating these features and capabilities. 
## Subpages - **Parallel Processing and Job Scheduling > Reddit Slack Bot** - **Parallel Processing and Job Scheduling > Wikipedia Embeddings** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/parallel-processing-and-job-scheduling/reddit-slack-bot === --- **Source**: tutorials/parallel-processing-and-job-scheduling/reddit-slack-bot.md **URL**: /docs/v1/selfmanaged/tutorials/parallel-processing-and-job-scheduling/reddit-slack-bot/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/parallel-processing-and-job-scheduling/wikipedia-embeddings === --- **Source**: tutorials/parallel-processing-and-job-scheduling/wikipedia-embeddings.md **URL**: /docs/v1/selfmanaged/tutorials/parallel-processing-and-job-scheduling/wikipedia-embeddings/ **Weight**: 3 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/retrieval-augmented-generation === # Retrieval Augmented Generation Union.ai enables production-grade RAG pipelines with a focus on performance, scalability, and ease of use. In this section, we will see some examples demonstrating how to extract documents from various data sources, create in-memory vector databases, and use them to implement RAG pipelines using LLM providers and Union-hosted LLMs. ## Subpages - **Retrieval Augmented Generation > Agentic Rag** - **Retrieval Augmented Generation > Lance Db Rag** - **Retrieval Augmented Generation > Building a Contextual RAG Workflow with Together AI** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/retrieval-augmented-generation/agentic-rag === --- **Source**: tutorials/retrieval-augmented-generation/agentic-rag.md **URL**: /docs/v1/selfmanaged/tutorials/retrieval-augmented-generation/agentic-rag/ **Weight**: 3 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/retrieval-augmented-generation/lance-db-rag === --- **Source**: tutorials/retrieval-augmented-generation/lance-db-rag.md **URL**: /docs/v1/selfmanaged/tutorials/retrieval-augmented-generation/lance-db-rag/ **Weight**: 3 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/retrieval-augmented-generation/contextual-rag === # Building a Contextual RAG Workflow with Together AI This notebook walks you through building a Contextual RAG (Retrieval-Augmented Generation) workflow using Together's embedding, reranker, and chat models. It ties together web scraping, embedding generation, and serving into one cohesive application. We take the [existing Contextual RAG Together app](https://docs.together.ai/docs/how-to-implement-contextual-rag-from-anthropic) and make it "production-grade" with Union.ai โ€” ready for enterprise deployment. ![Contextual RAG App](../../_static/images/tutorials/retrieval-augmented-generation/contextual-rag/contextual_rag.png) ## Workflow overview The workflow follows these steps: 1. Fetches all links to Paul Graham's essays. 2. Scrapes web content to retrieve the full text of the essays. 3. Splits the text into smaller chunks for processing. 4. Appends context from the relevant essay to each chunk. 5. Generates embeddings and stores them in a hosted vector database. 6. Creates a keyword index for efficient retrieval. 7. Serves a FastAPI app to expose the RAG functionality. 8. Provides a Gradio app, using the FastAPI endpoint, for an easy-to-use RAG interface. ## Execution approach This workflow is designed for local execution first, allowing you to test and validate it before deploying and scaling it on a Union.ai cluster. This staged approach ensures smooth transitions from development to production. 
Before running the workflow, make sure to install `union`: ``` pip install union ``` ### Local execution First, we import the required dependencies to ensure the workflow runs smoothly. Next, we define an actor environment, as the workflow relies on actor tasks throughout the process. **Core concepts > Actors** let us reuse a container and its environment across tasks, avoiding the overhead of starting a new container for each task. In this workflow, we define a single actor and reuse it consistently since the underlying components donโ€™t require independent scaling or separate environments. Within the actor environment, we specify the `ImageSpec`, which defines the container image that tasks in the workflow will use. With Union.ai, every task runs in its own dedicated container, requiring a container image. Instead of manually creating a Dockerfile, we define the image specification in Python. When run on Union.ai Serverless, the container image is built remotely, simplifying the setup. We also configure the actorโ€™s replica count to 10, meaning 10 workers are provisioned to handle tasks, allowing up to 10 tasks to run in parallel, provided sufficient resources. The TTL (time to live) is set to 120 seconds, ensuring the actor remains active for this period when no tasks are being processed. Finally, we create a Pydantic `BaseModel` named `Document` to capture metadata for each document used by the RAG app. This model ensures consistent data structuring and smooth integration throughout the workflow. NOTE: Add your Together AI API key (`TOGETHER_API_KEY`) to the `.env` file before running the notebook. ```python import os from pathlib import Path from typing import Annotated, Optional from urllib.parse import urljoin import numpy as np import requests import union from flytekit.core.artifact import Artifact from flytekit.exceptions.base import FlyteRecoverableException from flytekit.types.directory import FlyteDirectory from flytekit.types.file import FlyteFile from pydantic import BaseModel from union.actor import ActorEnvironment import union actor = ActorEnvironment( name="contextual-rag", replica_count=10, ttl_seconds=120, container_image=union.ImageSpec( name="contextual-rag", packages=[ "together==1.3.10", "beautifulsoup4==4.12.3", "bm25s==0.2.5", "pydantic>2", "pymilvus>=2.5.4", "union>=0.1.139", "flytekit>=1.15.0b5", ], ), secret_requests=[ union.Secret( key="together-api-key", env_var="TOGETHER_API_KEY", mount_requirement=union.Secret.MountType.ENV_VAR, ), union.Secret( key="milvus-uri", env_var="MILVUS_URI", mount_requirement=union.Secret.MountType.ENV_VAR, ), union.Secret( key="milvus-token", env_var="MILVUS_TOKEN", mount_requirement=union.Secret.MountType.ENV_VAR, ) ], ) class Document(BaseModel): idx: int title: str url: str content: Optional[str] = None chunks: Optional[list[str]] = None prompts: Optional[list[str]] = None contextual_chunks: Optional[list[str]] = None tokens: Optional[list[list[int]]] = None ``` We begin by defining an actor task to parse the main page of Paul Graham's essays. This task extracts a list of document titles and their respective URLs. Since actor tasks run within the shared actor environment we set up earlier, they efficiently reuse the same container and environment. 
```python @actor.task def parse_main_page( base_url: str, articles_url: str, local: bool = False ) -> list[Document]: from bs4 import BeautifulSoup assert base_url.endswith("/"), f"Base URL must end with a slash: {base_url}" response = requests.get(urljoin(base_url, articles_url)) soup = BeautifulSoup(response.text, "html.parser") td_cells = soup.select("table > tr > td > table > tr > td") documents = [] idx = 0 for td in td_cells: img = td.find("img") if img and int(img.get("width", 0)) <= 15 and int(img.get("height", 0)) <= 15: a_tag = td.find("font").find("a") if td.find("font") else None if a_tag: documents.append( Document( idx=idx, title=a_tag.text, url=urljoin(base_url, a_tag["href"]) ) ) idx += 1 if local: return documents[:3] return documents ``` Next, we define an actor task to scrape the content of each document. Using the list of URLs gathered in the previous step, this task extracts the full text of the essays, ensuring that all relevant content is retrieved for further processing. We also set `retries` to `3`, meaning the task will be retried three times before the error is propagated. ```python @actor.task(retries=3) def scrape_pg_essays(document: Document) -> Document: from bs4 import BeautifulSoup try: response = requests.get(document.url) except Exception as e: raise FlyteRecoverableException(f"Failed to scrape {document.url}: {str(e)}") response.raise_for_status() soup = BeautifulSoup(response.text, "html.parser") content = soup.find("font") text = None if content: text = " ".join(content.get_text().split()) document.content = text return document ``` Then, define an actor task to create chunks for each document. Chunks are necessary because we need to append context to each chunk, ensuring the RAG app can process the information effectively. ```python @actor.task(cache=True, cache_version="0.2") def create_chunks(document: Document, chunk_size: int, overlap: int) -> Document: if document.content: content_chunks = [ document.content[i : i + chunk_size] for i in range(0, len(document.content), chunk_size - overlap) ] document.chunks = content_chunks return document ``` Next, we use Together AI to generate context for each chunk of text, using the secret we initialized earlier. The system retrieves relevant context based on the entire document, ensuring accurate and meaningful outputs. Notice that we set **Core concepts > Caching** to `True` for this task to avoid re-running the execution for the same inputs. This ensures that if the document and model remain unchanged, the outputs are retrieved directly from the cache, improving efficiency. Once the context is generated, we map the chunks back to their respective documents. ```python @actor.task(cache=True, cache_version="0.4") def generate_context(document: Document, model: str) -> Document: from together import Together CONTEXTUAL_RAG_PROMPT = """ Given the document below, we want to explain what the chunk captures in the document. {WHOLE_DOCUMENT} Here is the chunk we want to explain: {CHUNK_CONTENT} Answer ONLY with a succinct explanation of the meaning of the chunk in the context of the whole document above. 
""" client = Together(api_key=os.getenv("TOGETHER_API_KEY")) contextual_chunks = [ f"{response.choices[0].message.content} {chunk}" for chunk in (document.chunks or []) for response in [ client.chat.completions.create( model=model, messages=[ { "role": "user", "content": CONTEXTUAL_RAG_PROMPT.format( WHOLE_DOCUMENT=document.content, CHUNK_CONTENT=chunk, ), } ], temperature=1, ) ] ] # Assign the contextual chunks back to the document document.contextual_chunks = contextual_chunks if contextual_chunks else None return document ``` We define an embedding function to generate embeddings for each chunk. This function converts the chunks into vector representations, which we can store in a vector database for efficient retrieval and processing. Next, we create a vector index and store the embeddings in the [Milvus](https://milvus.io/) vector database. For each embedding, we store the ID, document, and document title. These details ensure the embeddings are ready for efficient retrieval during the RAG process. By setting `cache` to `True`, we avoid redundant upserts or inserts for the same document. Instead, we can add new records or update existing ones only if the content has changed. This approach keeps the vector database up-to-date efficiently, minimizing resource usage while maintaining accuracy. Note: We're using the Milvus hosted vector database to store the embeddings. However, you can replace it with any vector database of your choice based on your requirements. ```python from together import Together def get_embedding(chunk: str, embedding_model: str): client = Together( api_key=os.getenv("TOGETHER_API_KEY") ) outputs = client.embeddings.create( input=chunk, model=embedding_model, ) return outputs.data[0].embedding @actor.task(cache=True, cache_version="0.19", retries=5) def create_vector_index( document: Document, embedding_model: str, local: bool = False ) -> Document: from pymilvus import DataType, MilvusClient if local: client = MilvusClient("test_milvus.db") else: try: client = MilvusClient(uri=os.getenv("MILVUS_URI"), token=os.getenv("MILVUS_TOKEN")) except Exception as e: raise FlyteRecoverableException( f"Failed to connect to Milvus: {e}" ) collection_name = "paul_graham_collection" if not client.has_collection(collection_name): schema = client.create_schema() schema.add_field( "id", DataType.INT64, is_primary=True, auto_id=True ) schema.add_field("document_index", DataType.VARCHAR, max_length=255) schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=1024) schema.add_field("title", DataType.VARCHAR, max_length=255) index_params = client.prepare_index_params() index_params.add_index("embedding", metric_type="COSINE") client.create_collection(collection_name, dimension=512, schema=schema, index_params=index_params) if not document.contextual_chunks: return document # Exit early if there are no contextual chunks # Generate embeddings for chunks embeddings = [get_embedding(chunk[:512], embedding_model) for chunk in document.contextual_chunks] # NOTE: Trimming the chunk for the embedding model's context window embeddings_np = np.array(embeddings, dtype=np.float32) ids = [ f"id{document.idx}_{chunk_idx}" for chunk_idx, _ in enumerate(document.contextual_chunks) ] titles = [document.title] * len(document.contextual_chunks) client.upsert( collection_name, [ {"id": index, "document_index": document_index, "embedding": embedding, "title": title} for index, (document_index, embedding, title) in enumerate(zip(ids, embeddings_np.tolist(), titles)) ] ) return document ``` Lastly, we 
create a BM25S keyword index to organize the document chunks. This index is great for keyword-based searches and works well alongside vector indexing. We also store a mapping between document IDs and their corresponding contextual chunk data, making it easier to retrieve content during the RAG process. ```python @actor.task(cache=True, cache_version="0.5") def create_bm25s_index(documents: list[Document]) -> tuple[FlyteDirectory, FlyteFile]: import json import bm25s # Prepare data for JSON data = { f"id{doc_idx}_{chunk_idx}": contextual_chunk for doc_idx, document in enumerate(documents) if document.contextual_chunks for chunk_idx, contextual_chunk in enumerate(document.contextual_chunks) } retriever = bm25s.BM25(corpus=list(data.values())) retriever.index(bm25s.tokenize(list(data.values()))) ctx = union.current_context() working_dir = Path(ctx.working_directory) bm25s_index_dir = working_dir / "bm25s_index" contextual_chunks_json = working_dir / "contextual_chunks.json" retriever.save(str(bm25s_index_dir)) # Write the data to a JSON file with open(contextual_chunks_json, "w", encoding="utf-8") as json_file: json.dump(data, json_file, indent=4, ensure_ascii=False) return FlyteDirectory(path=bm25s_index_dir), FlyteFile(contextual_chunks_json) ``` We define a **Core concepts > Workflows > Standard workflows** to execute these tasks in sequence. By using **Retrieval Augmented Generation > Building a Contextual RAG Workflow with Together AI > map tasks**, we run operations in parallel while respecting the resource constraints of each task. This approach **significantly improves execution speed**. We set the concurrency to 2, meaning two tasks will run in parallel. Note that the replica count for actors is set to 10, but this can be overridden at the map task level. We're doing this because having too many parallel clients could cause server availability issues. The final output of this workflow includes the BM25S keyword index and the contextual chunks mapping file, both returned as **Core concepts > Artifacts**. The Artifact Service automatically indexes and assigns semantic meaning to all outputs from Union.ai tasks and workflow executions, such as models, files, or other data. This makes it easy to track, access, and orchestrate pipelines directly through their outputs. In this case, the keyword index and file artifacts are directly used during app serving. We also set up a retrieval task to fetch embeddings for local execution. Once everythingโ€™s in place, we run the workflow and the retrieval task locally, producing a set of relevant chunks. One advantage of running locally is that all tasks and workflows are Python functions, making it easy to test everything before moving to production. This approach allows you to experiment locally and then deploy the same workflow in a production environment, ensuring itโ€™s production-ready. You get the flexibility to test and refine your workflow without compromising on the capabilities needed for deployment. 
```python import functools from dataclasses import dataclass from dotenv import load_dotenv load_dotenv() # Ensure the secret (together API key) is present in the .env file BM25Index = Artifact(name="bm25s-index") ContextualChunksJSON = Artifact(name="contextual-chunks-json") @union.workflow def build_indices_wf( base_url: str = "https://paulgraham.com/", articles_url: str = "articles.html", embedding_model: str = "BAAI/bge-large-en-v1.5", chunk_size: int = 250, overlap: int = 30, model: str = "deepseek-ai/DeepSeek-R1", local: bool = True, ) -> tuple[ Annotated[FlyteDirectory, BM25Index], Annotated[FlyteFile, ContextualChunksJSON] ]: tocs = parse_main_page(base_url=base_url, articles_url=articles_url, local=local) scraped_content = union.map(scrape_pg_essays, concurrency=2)(document=tocs) chunks = union.map( functools.partial(create_chunks, chunk_size=chunk_size, overlap=overlap) )(document=scraped_content) contextual_chunks = union.map(functools.partial(generate_context, model=model))( document=chunks ) union.map( functools.partial( create_vector_index, embedding_model=embedding_model, local=local ), concurrency=2 )(document=contextual_chunks) bm25s_index, contextual_chunks_json_file = create_bm25s_index( documents=contextual_chunks ) return bm25s_index, contextual_chunks_json_file @dataclass class RetrievalResults: vector_results: list[list[str]] bm25s_results: list[list[str]] @union.task def retrieve( bm25s_index: FlyteDirectory, contextual_chunks_data: FlyteFile, embedding_model: str = "BAAI/bge-large-en-v1.5", queries: list[str] = [ "What to do in the face of uncertainty?", "Why won't people write?", ], ) -> RetrievalResults: import json import bm25s import numpy as np from pymilvus import MilvusClient client = MilvusClient("test_milvus.db") # Generate embeddings for the queries using Together query_embeddings = [ get_embedding(query, embedding_model) for query in queries ] query_embeddings_np = np.array(query_embeddings, dtype=np.float32) collection_name = "paul_graham_collection" results = client.search( collection_name, query_embeddings_np, limit=5, search_params={"metric_type": "COSINE"}, anns_field="embedding", output_fields=["document_index", "title"] ) # Load BM25S index retriever = bm25s.BM25() bm25_index = retriever.load(save_dir=bm25s_index.download()) # Load contextual chunk data with open(contextual_chunks_data, "r", encoding="utf-8") as json_file: contextual_chunks_data_dict = json.load(json_file) # Perform BM25S-based retrieval bm25s_idx_result = bm25_index.retrieve( query_tokens=bm25s.tokenize(queries), k=5, corpus=np.array(list(contextual_chunks_data_dict.values())), ) # Return results as a dataclass return RetrievalResults( vector_results=results, bm25s_results=bm25s_idx_result.documents.tolist(), ) if __name__ == "__main__": bm25s_index, contextual_chunks_data = build_indices_wf() results = retrieve( bm25s_index=bm25s_index, contextual_chunks_data=contextual_chunks_data ) print(results) ``` ### Remote execution To provide the Together AI API key to the actor during remote execution, we send it as a **Development cycle > Managing secrets > Creating secrets**. We can create this secret using the Union.ai CLI before running the workflow. Simply run the following commands: ``` union create secret together-api-key ``` To run the workflow remotely on a Union.ai cluster, we start by logging into the cluster. ```python !union create login --serverless ``` Then, we initialize a Union.ai remote object to execute the workflow on the cluster. 
The [UnionRemote](../../user-guide/development-cycle/union-remote) Python API supports functionality similar to that of the Union CLI, enabling you to manage Union.ai workflows, tasks, launch plans and artifacts from within your Python code.

```python
from union.remote import UnionRemote

remote = UnionRemote(default_project="default", default_domain="development")
```

```python
indices_execution = remote.execute(build_indices_wf, inputs={"local": False})
print(indices_execution.execution_url)
```

We define a launch plan to run the workflow daily. A launch plan (**Core concepts > Launch plans**) serves as a template for invoking the workflow. The scheduled launch plan ensures that the vector database and keyword index are regularly updated, keeping the data fresh and synchronized.

Be sure to note the `version` field when registering the launch plan: every Union.ai entity (task, workflow, launch plan) is automatically assigned a version.

```python
lp = union.LaunchPlan.get_or_create(
    workflow=build_indices_wf,
    name="vector_db_ingestion_activate",
    schedule=union.CronSchedule(
        schedule="0 1 * * *"
    ),  # Run every day to update the databases
    auto_activate=True,
)

registered_lp = remote.register_launch_plan(entity=lp)
```

## Deploy apps

We deploy the FastAPI and Gradio applications to serve the RAG app with Union.ai. FastAPI is used to define the endpoint for serving the app, while Gradio is used to create the user interface.

When defining the app, we can specify inputs, images (using `ImageSpec`), resources to assign to the app, secrets, replicas, and more. We can organize the app specs into separate files. The FastAPI app spec is available in the `fastapi_app.py` file, and the Gradio app spec is in the `gradio_app.py` file.

We retrieve the artifacts and send them as inputs to the FastAPI app. We can then retrieve the app's endpoint to use in the other app. Finally, we either create the app if it doesn't already exist or update it if it does.

While we're using FastAPI and Gradio here, you can use any Python-based front-end or API framework to define your apps.
```python import os from union.app import App, Input fastapi_app = App( name="contextual-rag-fastapi", inputs=[ Input( name="bm25s_index", value=BM25Index.query(), download=True, env_var="BM25S_INDEX", ), Input( name="contextual_chunks_json", value=ContextualChunksJSON.query(), download=True, env_var="CONTEXTUAL_CHUNKS_JSON", ), ], container_image=union.ImageSpec( name="contextual-rag-fastapi", packages=[ "together", "bm25s", "pymilvus", "uvicorn[standard]", "fastapi[standard]", "union-runtime>=0.1.10", "flytekit>=1.15.0b5", ], ), limits=union.Resources(cpu="1", mem="3Gi"), port=8080, include=["fastapi_app.py"], args=["uvicorn", "fastapi_app:app", "--port", "8080"], min_replicas=1, max_replicas=1, secrets=[ union.Secret( key="together-api-key", env_var="TOGETHER_API_KEY", mount_requirement=union.Secret.MountType.ENV_VAR ), union.Secret( key="milvus-uri", env_var="MILVUS_URI", mount_requirement=union.Secret.MountType.ENV_VAR, ), union.Secret( key="milvus-token", env_var="MILVUS_TOKEN", mount_requirement=union.Secret.MountType.ENV_VAR, ), ], ) gradio_app = App( name="contextual-rag-gradio", inputs=[ Input( name="fastapi_endpoint", value=fastapi_app.query_endpoint(public=False), env_var="FASTAPI_ENDPOINT", ) ], container_image=union.ImageSpec( name="contextual-rag-gradio", packages=["gradio", "union-runtime>=0.1.5"], ), limits=union.Resources(cpu="1", mem="1Gi"), port=8080, include=["gradio_app.py"], args=[ "python", "gradio_app.py", ], min_replicas=1, max_replicas=1, ) ``` ```python from union.remote._app_remote import AppRemote app_remote = AppRemote(project="default", domain="development") app_remote.create_or_update(fastapi_app) app_remote.create_or_update(gradio_app) ``` The apps will be deployed at the URLs provided in the output, which you can access. Below are some example queries to test the Gradio application: - What did Paul Graham do growing up? - What did the author do during their time in art school? - Can you give me a summary of the author's life? - What did the author do during their time at Yale? - What did the author do during their time at YC? ```python # If you want to stop the apps, hereโ€™s how you can do it: # app_remote.stop(name="contextual-rag-fastapi-app") # app_remote.stop(name="contextual-rag-gradio-app") ``` === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving === # Serving Union.ai enables you to implement serving in various contexts: - High throughput batch inference with NIMs, vLLM, and Actors - Low latency online inference using frameworks vLLM, SGLang. - Web endpoints using frameworks like FastAPI and Flask. - Interactive web apps using your favorite Python-based front-end frameworks like Streamlit, Gradio, and more. - Edge inference using MLC-LLM. In this section, we will see examples demonstrating how to implement serving in these contexts using constructs like Union Actors, Serving Apps, and Artifacts. 
## Subpages - **Serving > Custom Webhooks** - **Serving > Marimo Wasm** - **Serving > Finetune Unsloth Serve** - **Serving > Modular Max Qwen** - **Serving > Weave** - **Serving > Arize** - **Serving > Llama Edge Deployment** - **Serving > Vllm Serving On Actor** - **Serving > Nim On Actor** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/custom-webhooks === --- **Source**: tutorials/serving/custom-webhooks.md **URL**: /docs/v1/selfmanaged/tutorials/serving/custom-webhooks/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/marimo-wasm === --- **Source**: tutorials/serving/marimo-wasm.md **URL**: /docs/v1/selfmanaged/tutorials/serving/marimo-wasm/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/finetune-unsloth-serve === --- **Source**: tutorials/serving/finetune-unsloth-serve.md **URL**: /docs/v1/selfmanaged/tutorials/serving/finetune-unsloth-serve/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/modular-max-qwen === --- **Source**: tutorials/serving/modular-max-qwen.md **URL**: /docs/v1/selfmanaged/tutorials/serving/modular-max-qwen/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/weave === --- **Source**: tutorials/serving/weave.md **URL**: /docs/v1/selfmanaged/tutorials/serving/weave/ **Weight**: 5 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/arize === --- **Source**: tutorials/serving/arize.md **URL**: /docs/v1/selfmanaged/tutorials/serving/arize/ **Weight**: 5 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/llama_edge_deployment === --- **Source**: tutorials/serving/llama_edge_deployment.md **URL**: /docs/v1/selfmanaged/tutorials/serving/llama_edge_deployment/ **Weight**: 6 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/vllm-serving-on-actor === --- **Source**: tutorials/serving/vllm-serving-on-actor.md **URL**: /docs/v1/selfmanaged/tutorials/serving/vllm-serving-on-actor/ **Weight**: 7 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/serving/nim-on-actor === --- **Source**: tutorials/serving/nim-on-actor.md **URL**: /docs/v1/selfmanaged/tutorials/serving/nim-on-actor/ **Weight**: 8 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/time-series === # Time Series Time series analysis is a statistical method used to analyze data points collected over time. Unlike other data types, time series data has a specific order, and the position of each data point is crucial. This allows us to study patterns, trends, and cycles within the data. In these examples, you'll learn how to use Union.ai to forecast and analyze time series data. ## Subpages - **Time Series > Gluonts Time Series** - **Time Series > Time Series Forecaster Comparison** === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/time-series/gluonts-time-series === --- **Source**: tutorials/time-series/gluonts-time-series.md **URL**: /docs/v1/selfmanaged/tutorials/time-series/gluonts-time-series/ **Weight**: 2 === PAGE: https://www.union.ai/docs/v1/selfmanaged/tutorials/time-series/time-series-forecaster-comparison === --- **Source**: tutorials/time-series/time-series-forecaster-comparison.md **URL**: /docs/v1/selfmanaged/tutorials/time-series/time-series-forecaster-comparison/ **Weight**: 3 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations === # Integrations Union supports integration with a variety of third-party services and systems. 
## Connectors

Union.ai supports [the following connectors out-of-the-box](./connectors/_index). If you don't see the connector you need below, have a look at **Connectors > Creating a new connector**.

| Connector | Description |
|-----------|-------------|
| [SageMaker connector](./connectors/sagemaker-inference-connector/_index) | Deploy models and create and trigger inference endpoints on AWS SageMaker. |
| [Airflow connector](./connectors/airflow-connector/_index) | Run Airflow jobs in your workflows with the Airflow connector. |
| [BigQuery connector](./connectors/bigquery-connector/_index) | Run BigQuery jobs in your workflows with the BigQuery connector. |
| [ChatGPT connector](./connectors/chatgpt-connector/_index) | Run ChatGPT jobs in your workflows with the ChatGPT connector. |
| [Databricks connector](./connectors/databricks-connector/_index) | Run Databricks jobs in your workflows with the Databricks connector. |
| [Memory Machine Cloud connector](./connectors/mmcloud-connector/_index) | Execute tasks using the MemVerge Memory Machine Cloud connector. |
| [OpenAI Batch connector](./connectors/openai-batch-connector/_index) | Submit requests for asynchronous batch processing on OpenAI. |
| [Perian connector](./connectors/perian-connector/_index) | Execute tasks on the Perian Job Platform. |
| [Sensor connector](./connectors/sensor/_index) | Run sensor jobs in your workflows with the sensor connector. |
| [Slurm connector](./connectors/slurm-connector/_index) | Run Slurm jobs in your workflows with the Slurm connector. |
| [Snowflake connector](./connectors/snowflake-connector/_index) | Run Snowflake jobs in your workflows with the Snowflake connector. |

## Subpages

- **Connectors**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors ===

# Connectors

Connectors are long-running, stateless services that receive execution requests via gRPC and initiate jobs with appropriate external or internal services. Each connector service is a Kubernetes deployment that receives gRPC requests when users trigger a particular type of task. (For example, the BigQuery connector is triggered by the invocation of a BigQuery task.) The connector service then initiates a job with the appropriate service.

Connectors can be run locally as long as the appropriate connection secrets are locally available, since they are spawned in-process.

Connectors are designed to be scalable, handle large workloads efficiently, and decrease load on the core system, since they run outside it. You can also test connectors locally without having to change the backend configuration, streamlining workflow development.

Connectors enable two key use cases:

* **Asynchronously** launching jobs on hosted platforms (e.g. Databricks or Snowflake), as illustrated in the sketch below.
* Calling external **synchronous** services, such as access control, data retrieval, or model inferencing.
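To make the asynchronous case concrete, here is a rough sketch of what a connector-backed task looks like from the workflow author's side, using the BigQuery connector as an example. The project, dataset, and query are placeholders, and the exact task parameters can vary by plugin version; see the connector pages below for the maintained examples.

```python
from flytekit import kwtypes, workflow
from flytekit.types.structured import StructuredDataset
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask

# A task backed by the BigQuery connector. The query runs on BigQuery itself;
# the connector only submits the job and polls it until completion.
bigquery_task = BigQueryTask(
    name="example.bigquery.version_query",
    inputs=kwtypes(version=int),
    output_structured_dataset_type=StructuredDataset,
    query_template=(
        "SELECT * FROM `my-gcp-project.my_dataset.my_table` "  # placeholder table
        "WHERE version = @version LIMIT 100;"
    ),
    task_config=BigQueryConfig(ProjectID="my-gcp-project"),  # placeholder project
)


@workflow
def bigquery_wf(version: int = 1) -> StructuredDataset:
    # From the workflow's point of view this is just another task call.
    return bigquery_task(version=version)
```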
This section covers all currently available connectors: * [Airflow connector](./airflow-connector/_index) * [BigQuery connector](./bigquery-connector/_index) * [OpenAI ChatGPT connector](./chatgpt-connector/_index) * [OpenAI Batch connector](./openai-batch-connector/_index) * [Databricks connector](./databricks-connector/_index) * [Memory Machine Cloud connector](./mmcloud-connector/_index) * [Perian connector](./perian-connector/_index) * [Sagemaker connector](./sagemaker-inference-connector/_index) * [File sensor connector](./sensor/_index) * [Slurm connector](./slurm-connector/_index) * [Snowflake connector](./snowflake-connector/_index) * **Connectors > DGX connector** ## Creating a new connector If none of the existing connectors meet your needs, you can implement your own connector. There are two types of connectors: **async** and **sync**. * **Async connectors** enable long-running jobs that execute on an external platform over time. They communicate with external services that have asynchronous APIs that support `create`, `get`, and `delete` operations. The vast majority of connectors are async connectors. * **Sync connectors** enable request/response services that return immediate outputs (e.g. calling an internal API to fetch data or communicating with the OpenAI API). > [!NOTE] > While connectors can be written in any programming language since they use a protobuf interface, > we currently only support Python connectors. > We may support other languages in the future. ### Async connector interface specification To create a new async connector, extend the `AsyncConnectorBase` and implement `create`, `get`, and `delete` methods. These methods must be idempotent. - `create`: This method is used to initiate a new job. Users have the flexibility to use gRPC, REST, or an SDK to create a job. - `get`: This method retrieves the job resource (job ID or output literal) associated with the task, such as a BigQuery job ID or Databricks task ID. - `delete`: Invoking this method will send a request to delete the corresponding job. For an example implementation, see the [BigQuery connector code](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-bigquery/flytekitplugins/bigquery/connector.py). ### Sync connector interface specification To create a new sync connector, extend the `SyncConnectorBase` class and implement a `do` method. This method must be idempotent. - `do`: This method is used to execute the synchronous task, and the worker in Union.ai will be blocked until the method returns. For an example implementation, see the [ChatGPT connector code](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-openai/flytekitplugins/openai/chatgpt/connector.py). ### Testing your connector locally To test your connector locally, create a class for the connector task that inherits from [`AsyncConnectorExecutorMixin`](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). This mixin can handle both asynchronous tasks and synchronous tasks and allows Union to mimic the system's behavior in calling the connector. For testing examples, see the **Connectors > BigQuery connector > Local testing** and **Connectors > Databricks connector > Local testing** documentation. ## Enabling a connector in your Union.ai deployment To enable a connector in your Union.ai deployment, contact the Union.ai team. 
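Returning to the interfaces described under **Creating a new connector**, here is a minimal, illustrative sketch of an async connector. The import path and method signatures are simplified assumptions based on the descriptions above, and the in-memory "job store" stands in for a real external service; consult the BigQuery connector code linked above for the authoritative interface, including how connectors register a task type and metadata class.

```python
from dataclasses import dataclass

# Assumed import path, based on the flytekit sources linked above.
from flytekit.extend.backend.base_connector import AsyncConnectorBase

# Stand-in for an external batch platform, so this sketch is self-contained.
_FAKE_JOBS: dict[str, str] = {}


@dataclass
class ExampleJobMetadata:
    """Hypothetical metadata identifying a job on the external platform."""
    job_id: str


class ExamplePlatformConnector(AsyncConnectorBase):
    """Illustrative async connector; signatures and registration are simplified."""

    def create(self, task_template, inputs=None, **kwargs) -> ExampleJobMetadata:
        # Submit the job to the external service (via gRPC, REST, or an SDK)
        # and return metadata that identifies it.
        job_id = f"job-{len(_FAKE_JOBS)}"
        _FAKE_JOBS[job_id] = "RUNNING"
        return ExampleJobMetadata(job_id=job_id)

    def get(self, resource_meta: ExampleJobMetadata, **kwargs) -> str:
        # Poll the external service and report the job's current state.
        return _FAKE_JOBS.get(resource_meta.job_id, "UNKNOWN")

    def delete(self, resource_meta: ExampleJobMetadata, **kwargs) -> None:
        # Cancel or clean up the job. Like create and get, this must be idempotent.
        _FAKE_JOBS.pop(resource_meta.job_id, None)
```

A sync connector follows the same pattern, but implements a single `do` method that returns the task's outputs directly.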
## Subpages

- **Connectors > Airflow connector**
- **Connectors > BigQuery connector**
- **Connectors > ChatGPT connector**
- **Connectors > Databricks connector**
- **Connectors > Memory Machine Cloud connector**
- **Connectors > OpenAI Batch connector**
- **Connectors > Perian connector**
- **Connectors > SageMaker connector**
- **Connectors > Sensor connector**
- **Connectors > Slurm connector**
- **Connectors > Snowflake connector**
- **Connectors > DGX connector**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/airflow-connector ===

# Airflow connector

[Apache Airflow](https://airflow.apache.org) is a widely used open source platform for managing workflows with a robust ecosystem. Union.ai provides an Airflow plugin that allows you to run Airflow tasks as Union.ai tasks. This allows you to use the Airflow plugin ecosystem in conjunction with Union.ai's powerful task execution and orchestration capabilities.

> [!NOTE]
> The Airflow connector does not support all [Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html).
> We have tested many, but if you run into issues,
> please [file a bug report](https://github.com/flyteorg/flyte/issues/new?assignees=&labels=bug%2Cuntriaged&projects=&template=bug_report.yaml&title=%5BBUG%5D+).

## Installation

To install the plugin, run the following command:

`pip install flytekitplugins-airflow`

This plugin has two components:

* **Airflow compiler:** This component compiles Airflow tasks to Union.ai tasks, so Airflow tasks can be directly used inside the Union.ai workflow.
* **Airflow connector:** This component allows you to execute Airflow tasks either locally or on a Union.ai cluster.

> [!NOTE]
> You don't need an Airflow cluster to run Airflow tasks, since flytekit will
> automatically compile Airflow tasks to Union.ai tasks and execute them on the Airflow connector.

## Example usage

For an example query, see **Connectors > Airflow connector > Airflow Connector Example Usage Union**

## Local testing

Airflow doesn't support local execution natively. However, Union.ai compiles Airflow tasks to Union.ai tasks, which enables you to test Airflow tasks locally in flytekit's local execution mode.

> [!NOTE]
> In some cases, you will need to store credentials in your local environment when testing locally.

## Subpages

- **Connectors > Airflow connector > Airflow Connector Example Usage Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/airflow-connector/airflow-connector-example-usage-union ===

--- **Source**: integrations/connectors/airflow-connector/airflow-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/airflow-connector/airflow-connector-example-usage-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/bigquery-connector ===

# BigQuery connector

## Installation

To install the BigQuery connector, run the following command:

`pip install flytekitplugins-bigquery`

This connector is purely a spec. Since SQL is completely portable, there is no need to build a Docker container.

## Example usage

For an example query, see **Connectors > BigQuery connector > Bigquery Connector Example Usage Union**

## Local testing

To test the BigQuery connector locally, create a class for the connector task that inherits from [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354).
This mixin can handle asynchronous tasks and allows the SDK to mimic the system's behavior in calling the connector. For more information, see **Connectors > Creating a new connector > Testing your connector locally**. > [!NOTE] > In some cases, you will need to store credentials in your local environment when testing locally. ## Subpages - **Connectors > BigQuery connector > Bigquery Connector Example Usage Union** === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/bigquery-connector/bigquery-connector-example-usage-union === --- **Source**: integrations/connectors/bigquery-connector/bigquery-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/bigquery-connector/bigquery-connector-example-usage-union/ **Weight**: 1 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/chatgpt-connector === # ChatGPT connector ## Installation To install the ChatGPT connector, run the following command: ```shell $ pip install flytekitplugins-openai ``` ## Example usage For an example query, see **Connectors > ChatGPT connector > Chatgpt Connector Example Usage Union** ## Local testing To test the ChatGPT connector locally, create a class for the connector task that inherits from [SyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L304). This mixin can handle synchronous tasks and allows the SDK to mimic the system's behavior in calling the connector. For more information, see **Connectors > Creating a new connector > Testing your connector locally**. > [!NOTE] > In some cases, you will need to store credentials in your local environment when testing locally. ## Subpages - **Connectors > ChatGPT connector > Chatgpt Connector Example Usage Union** === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/chatgpt-connector/chatgpt-connector-example-usage-union === --- **Source**: integrations/connectors/chatgpt-connector/chatgpt-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/chatgpt-connector/chatgpt-connector-example-usage-union/ **Weight**: 1 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/databricks-connector === # Databricks connector Union.ai can be integrated with the [Databricks](https://www.databricks.com/) service, enabling you to submit Spark jobs to the Databricks platform. ## Installation The Databricks connector comes bundled with the Spark plugin. To install the Spark plugin, run the following command: ```shell $ pip install flytekitplugins-spark ``` ## Example usage For an example query, see **Connectors > Databricks connector > Databricks Connector Example Usage Union** ## Local testing To test the Databricks connector locally, create a class for the connector task that inherits from [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). This mixin can handle asynchronous tasks and allows the SDK to mimic the system's behavior in calling the connector. For more information, see **Connectors > Creating a new connector > Testing your connector locally**. > [!NOTE] > In some cases, you will need to store credentials in your local environment when testing locally. 
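As a rough illustration of how a Spark task is pointed at Databricks, the sketch below follows the shape of the public flytekitplugins-spark examples. The instance URL, cluster settings, and configuration class and field names are placeholders/assumptions and may differ across plugin versions; see the example usage page referenced above for the maintained version.

```python
import random

import flytekit
from flytekit import task
from flytekitplugins.spark import Databricks  # assumed import, per flytekitplugins-spark


@task(
    task_config=Databricks(
        # Standard Spark settings for the job.
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
        },
        # Passed through to the Databricks Jobs API; values are placeholders.
        databricks_conf={
            "run_name": "union databricks connector sketch",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "n2-highmem-4",
                "num_workers": 2,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
        databricks_instance="dbc-XXXXXXXX.cloud.databricks.com",  # placeholder
    ),
)
def estimate_pi(partitions: int) -> float:
    # Runs on the Databricks cluster; the Spark session is provided by the plugin.
    sess = flytekit.current_context().spark_session
    n = 100000 * partitions

    def inside(_: int) -> int:
        x, y = random.random(), random.random()
        return 1 if x * x + y * y <= 1 else 0

    count = sess.sparkContext.parallelize(range(n), partitions).map(inside).sum()
    return 4.0 * count / n
```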
## Subpages - **Connectors > Databricks connector > Databricks Connector Example Usage Union** === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/databricks-connector/databricks-connector-example-usage-union === --- **Source**: integrations/connectors/databricks-connector/databricks-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/databricks-connector/databricks-connector-example-usage-union/ **Weight**: 1 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/mmcloud-connector === # Memory Machine Cloud connector [MemVerge](https://memverge.com/) [Memory Machine Cloud](https://www.mmcloud.io/) (MMCloud)โ€”available on AWS, GCP, and AliCloudโ€”empowers users to continuously optimize cloud resources during runtime, safely execute stateful tasks on spot instances, and monitor resource usage in real time. These capabilities make it an excellent fit for long-running batch workloads. Union.ai can be integrated with MMCloud, allowing you to execute Union.ai tasks using MMCloud. ## Installation To install the connector, run the following command: ```shell $ pip install flytekitplugins-mmcloud ``` To get started with Memory Machine Cloud, see the [Memory Machine Cloud user guide](https://docs.memverge.com/MMCloud/latest/User%20Guide/about). ## Example usage For an example query, see **Connectors > Memory Machine Cloud connector > Mmcloud Connector Example Usage Union** ## Local testing To test the MMCloud connector locally, create a class for the connector task that inherits from [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). This mixin can handle asynchronous tasks and allows the SDK to mimic the system's behavior in calling the connector. For more information, see **Connectors > Creating a new connector > Testing your connector locally**. > [!NOTE] > In some cases, you will need to store credentials in your local environment when testing locally. ## Subpages - **Connectors > Memory Machine Cloud connector > Mmcloud Connector Example Usage Union** === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/mmcloud-connector/mmcloud-connector-example-usage-union === --- **Source**: integrations/connectors/mmcloud-connector/mmcloud-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/mmcloud-connector/mmcloud-connector-example-usage-union/ **Weight**: 1 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/openai-batch-connector === # OpenAI Batch connector The Batch API connector allows you to submit requests for asynchronous batch processing on OpenAI. You can provide either a JSONL file or a JSON iterator, and the connector handles the upload to OpenAI, creation of the batch, and downloading of the output and error files. 
## Installation

To use the OpenAI Batch connector, run the following command:

```shell
$ pip install flytekitplugins-openai
```

## Example usage

For an example query, see **Connectors > OpenAI Batch connector > Openai Batch Connector Example Usage Union**

## Local testing

To test a connector locally, create a class for the connector task that inherits from [SyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L304) or [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). These mixins can handle synchronous and asynchronous tasks, respectively, and allow the SDK to mimic the system's behavior in calling the connector.

For more information, see **Connectors > Creating a new connector > Testing your connector locally**.

## Subpages

- **Connectors > OpenAI Batch connector > Openai Batch Connector Example Usage Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/openai-batch-connector/openai-batch-connector-example-usage-union ===

--- **Source**: integrations/connectors/openai-batch-connector/openai-batch-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/openai-batch-connector/openai-batch-connector-example-usage-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/perian-connector ===

# Perian connector

The Perian connector enables you to execute Union.ai tasks on the [Perian Sky Platform](https://perian.io/). Perian allows the execution of any task on servers aggregated from multiple cloud providers.

To get started with Perian, see the [Perian documentation](https://perian.io/docs/overview) and the [Perian connector documentation](https://perian.io/docs/flyte-getting-started).

## Example usage

For an example, see **Connectors > Perian connector > Example Union**

## Connector setup

Consult the [PERIAN connector setup guide](https://perian.io/docs/flyte-setup-guide).

## Subpages

- **Connectors > Perian connector > Example Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/perian-connector/example-union ===

--- **Source**: integrations/connectors/perian-connector/example-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/perian-connector/example-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/sagemaker-inference-connector ===

# SageMaker connector

The SageMaker connector allows you to deploy models and create and trigger inference endpoints. You can also fully remove the SageMaker deployment.

## Installation

To use the SageMaker connector, run the following command:

```shell
$ pip install flytekitplugins-awssagemaker
```

## Example usage

For an example query, see **Connectors > SageMaker connector > Sagemaker Inference Connector Example Usage Union**

## Local testing

To test a connector locally, create a class for the connector task that inherits from [SyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L304) or [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354).
These mixins can handle synchronous and asynchronous tasks, respectively, and allow the SDK to mimic the system's behavior in calling the connector.

For more information, see **Connectors > Creating a new connector > Testing your connector locally**.

## Subpages

- **Connectors > SageMaker connector > Sagemaker Inference Connector Example Usage Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/sagemaker-inference-connector/sagemaker-inference-connector-example-usage-union ===

--- **Source**: integrations/connectors/sagemaker-inference-connector/sagemaker-inference-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/sagemaker-inference-connector/sagemaker-inference-connector-example-usage-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/sensor ===

# Sensor connector

## Example usage

For an example query, see **Connectors > Sensor connector > File Sensor Example Union**

## Subpages

- **Connectors > Sensor connector > File Sensor Example Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/sensor/file-sensor-example-union ===

--- **Source**: integrations/connectors/sensor/file-sensor-example-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/sensor/file-sensor-example-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/slurm-connector ===

# Slurm connector

## Installation

To install the Slurm connector, run the following command:

```shell
$ pip install flytekitplugins-slurm
```

## Example usage

For an example query, see **Connectors > Slurm connector > Slurm Connector Example Usage Union**

## Local testing

To test the Slurm connector locally, create a class for the connector task that inherits from [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). This mixin can handle asynchronous tasks and allows the SDK to mimic the system's behavior in calling the connector.

For more information, see **Connectors > Creating a new connector > Testing your connector locally**.

> [!NOTE]
> In some cases, you will need to store credentials in your local environment when testing locally.

## Subpages

- **Connectors > Slurm connector > Slurm Connector Example Usage Union**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/slurm-connector/slurm-connector-example-usage-union ===

--- **Source**: integrations/connectors/slurm-connector/slurm-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/slurm-connector/slurm-connector-example-usage-union/ **Weight**: 1

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/snowflake-connector ===

# Snowflake connector

Union.ai can be seamlessly integrated with the [Snowflake](https://www.snowflake.com) service, providing you with a straightforward means to query data in Snowflake.
## Installation To use the Snowflake connector, run the following command: ```shell $ pip install flytekitplugins-snowflake ``` ## Example usage For an example query, see **Connectors > Snowflake connector > Snowflake Connector Example Usage Union** ## Local testing To test the Snowflake connector locally, create a class for the connector task that inherits from [AsyncConnectorExecutorMixin](https://github.com/flyteorg/flytekit/blob/1bc8302bb7a6cf4c7048a7f93627ee25fc6b88c4/flytekit/extend/backend/base_connector.py#L354). This mixin can handle asynchronous tasks and allows the SDK to mimic the system's behavior in calling the connector. For more information, see **Connectors > Creating a new connector > Testing your connector locally**. > [!NOTE] > In some cases, you will need to store credentials in your local environment when testing locally. ## Subpages - **Connectors > Snowflake connector > Snowflake Connector Example Usage Union** === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/snowflake-connector/snowflake-connector-example-usage-union === --- **Source**: integrations/connectors/snowflake-connector/snowflake-connector-example-usage-union.md **URL**: /docs/v1/selfmanaged/integrations/connectors/snowflake-connector/snowflake-connector-example-usage-union/ **Weight**: 1 === PAGE: https://www.union.ai/docs/v1/selfmanaged/integrations/connectors/dgx-connector === # DGX connector You can run workflows on the [NVIDIA DGX platform](https://www.nvidia.com/en-us/data-center/dgx-platform/) with the DGX connector. ## Installation To install the DGX connector and have it enabled in your deployment, contact the Union.ai team. ## Example usage ```python from typing import List import union from flytekitplugins.dgx import DGXConfig dgx_image_spec = union.ImageSpec( base_image="my-image/dgx:v24", packages=["torch", "transformers", "accelerate", "bitsandbytes"], registry="my-registry", ) DEFAULT_CHAT_TEMPLATE = """ {% for message in messages %} {% if message['role'] == 'user' %} {{ '<<|user|>> ' + message['content'].strip() + ' <>' }} {% elif message['role'] == 'system' %} {{ '<<|system|>>\\n' + message['content'].strip() + '\\n<>\\n\\n' }} {% endif %} {% endfor %} {% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %} """.strip() @union.task(container_image=dgx_image_spec, cache_version="1.0", cache=True) def form_prompt(prompt: str, system_message: str) -> List[dict]: return [ {"role": "system", "content": system_message}, {"role": "user", "content": prompt}, ] @union.task( task_config=DGXConfig(instance="dgxa100.80g.8.norm"), container_image=dgx_image_spec, ) def inference(messages: List[dict], n_variations: int) -> List[str]: import torch import transformers from transformers import AutoTokenizer print(f"gpu is available: {torch.cuda.is_available()}") model = "mistralai/Mixtral-8x7B-Instruct-v0.1" tokenizer = AutoTokenizer.from_pretrained(model) pipeline = transformers.pipeline( "text-generation", tokenizer=tokenizer, model=model, model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True}, ) print(f"{messages=}") prompt = pipeline.tokenizer.apply_chat_template( messages, chat_template=DEFAULT_CHAT_TEMPLATE, tokenize=False, add_generation_prompt=True, ) outputs = pipeline( prompt, num_return_sequences=n_variations, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, return_full_text=False, ) print(f'generated text={outputs[0]["generated_text"]}') return [output["generated_text"] for output in outputs] @union.workflow def wf( 
    prompt: str = "Explain what a Mixture of Experts is in less than 100 words.",
    n_variations: int = 8,
    system_message: str = "You are a helpful and polite bot.",
) -> List[str]:
    messages = form_prompt(prompt=prompt, system_message=system_message)
    return inference(messages=messages, n_variations=n_variations)
```

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/api-reference ===

# Reference

This section provides the reference material for all Union.ai APIs, SDKs and CLIs.

To get started, add `union` to your project:

```shell
$ uv add union
```

This will install the Union and Flytekit SDKs and the `union` CLI.

### 🔗 **Flytekit SDK**

The Flytekit SDK provides the core Python API for building Union.ai workflows and apps.

### 🔗 **Union SDK**

The Union SDK provides additional Union.ai-specific capabilities, on top of the core Flytekit SDK.

### 🔗 **Union CLI**

The Union CLI is the command-line interface for interacting with your Union instance.

### 🔗 **Uctl CLI**

The Uctl CLI is an alternative CLI for performing administrative tasks and for use in CI/CD environments.

## Subpages

- **LLM context documents**
- **Union CLI**
- **Uctl CLI**
- **Flytekit SDK**
- **Union SDK**

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/api-reference/flyte-context ===

# LLM context documents

The following documents provide LLM context for authoring and running Flyte/Union workflows. They can serve as a reference for LLM-based AI assistants to understand how to properly write, configure, and execute Flyte/Union workflows.

* **Full documentation content**: The entire documentation (this site) for Union.ai version 1.0 in a single text file.
  * 📥 [llms-full.txt](/_static/public/llms-full.txt)
* **Concise context document**: A concise overview of Flyte 1.0 concepts.
  * 📥 [llms-concise.txt](/_static/public/llms-concise.txt)

You can then add either or both to the context window of your LLM-based AI assistant to help it better understand Flyte/Union development.

=== PAGE: https://www.union.ai/docs/v1/selfmanaged/api-reference/union-cli ===

# Union CLI

The `union` CLI is the main tool developers use to interact with Union.ai on the command line.

## Installation

The recommended way to install the `union` CLI outside a workflow project is to use [`uv`](https://docs.astral.sh/uv/):

```shell
$ uv tool install union
```

This will install the `union` CLI globally on your system [as a `uv` tool](https://docs.astral.sh/uv/concepts/tools/).

## Configure the `union` CLI

To configure the `union` CLI to connect to your Union.ai BYOC or Self-managed instance, run the following command:

```shell
$ union create login --host <union-host-url>
```

where `<union-host-url>` is the URL of your Union.ai instance.

This command will create the file `~/.union/config.yaml` with the configuration information to connect to the Union.ai instance. See **Getting started > Local setup** for more details.

## Overriding the configuration file location

By default, the `union` CLI will look for a configuration file at `~/.union/config.yaml` (a sample of this file is shown below).
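The exact contents of `~/.union/config.yaml` depend on your instance and how you logged in, but a minimal file typically looks roughly like the following sketch (the endpoint is a placeholder for your own host):

```yaml
admin:
  # Placeholder endpoint; `union create login --host <union-host-url>` records
  # the host of your Union.ai instance here.
  endpoint: dns:///union.example.com
  insecure: false
```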
You can override this behavior to specify a different configuration file by setting the `UNION_CONFIG` environment variable:

```shell
export UNION_CONFIG=~/.my-config-location/my-config.yaml
```

Alternatively, you can always specify the configuration file on the command line when invoking `union` by using the `--config` flag:

```shell
$ union --config ~/.my-config-location/my-config.yaml run my_script.py my_workflow
```

## `union` CLI configuration search path

The `union` CLI will check for configuration files as follows:

First, if a `--config` option is used, it will use the specified config file.

Second, the config files pointed to by the following environment variables (in this order):

* `UNION_CONFIG`
* `UNIONAI_CONFIG`
* `UCTL_CONFIG`

Third, the following hard-coded locations (in this order):

* `~/.union/config.yaml`
* `~/.uctl/config.yaml`

If none of these are present, the CLI will raise an error.

## `union` CLI commands

Entrypoint for all the user commands.

```shell
union [OPTIONS] COMMAND [ARGS]...
```

### Options

- `-v`, `--verbose` Show verbose messages and exception traces.
- `-k`, `--pkgs ` Dot-delineated python packages to operate on. Multiple may be specified (can use commas, or specify the switch multiple times). Please note that this option will override the option specified in the configuration file, or environment variable.
- `-c`, `--config ` Path to config file for use within container.

---

### `backfill`

The backfill command generates and registers a new workflow based on the input launchplan to run an automated backfill. The workflow can be managed using the UI and can be canceled, relaunched, and recovered.

> - `launchplan` refers to the name of the Launchplan.
> - `launchplan_version` is optional and should be a valid version for a Launchplan version.

```shell
union backfill [OPTIONS] LAUNCHPLAN [LAUNCHPLAN_VERSION]
```

#### Options

- `-p`, `--project ` Project for workflow/launchplan. Can also be set through envvar `FLYTE_DEFAULT_PROJECT`. **Default:** `flytesnacks`
- `-d`, `--domain ` Domain for workflow/launchplan, can also be set through envvar `FLYTE_DEFAULT_DOMAIN`. **Default:** `'development'`
- `-v`, `--version ` Version for the registered workflow. If not specified, it is auto-derived using the start and end date.
- `-n`, `--execution-name ` Create a named execution for the backfill. This can prevent launching multiple executions.
- `--dry-run` Just generate the workflow - do not register or execute. **Default:** `False`
- `--parallel`, `--serial` All backfill steps can be run in parallel (limited by max-parallelism) if using `--parallel`. Else all steps will be run sequentially (`--serial`). **Default:** `False`
- `--execute`, `--do-not-execute` Generate the workflow and register, do not execute. **Default:** `True`
- `--from-date ` Date from which the backfill should begin. Start date is inclusive.
- `--to-date ` Date to which the backfill should run until. End date is inclusive.
- `--backfill-window ` Timedelta for number of days, minutes, or hours after the from-date or before the to-date to compute the backfills between. This is needed with from-date / to-date. Optional if both from-date and to-date are provided.
- `--fail-fast`, `--no-fail-fast` If set to true, the backfill will fail immediately if any of the backfill steps fail. If set to false, the backfill will continue to run even if some of the backfill steps fail.
**Default:** `True` - `--overwrite-cache` Whether to overwrite the cache if it already exists. **Default:** `False` #### Arguments - `LAUNCHPLAN` Required argument. - `LAUNCHPLAN_VERSION` Optional argument. --- ### `build` This command can build an image for a workflow or a task from the command line, for fully self-contained scripts. ```shell union build [OPTIONS] COMMAND [ARGS]... ``` #### Options - `-p`, `--project ` Project to register and run this workflow in. Can also be set through envvar `FLYTE_DEFAULT_PROJECT`. **Default:** `flytesnacks` - `-d`, `--domain ` Domain to register and run this workflow in, can also be set through envvar `FLYTE_DEFAULT_DOMAIN`. **Default:** `'development'` - `--destination-dir ` Directory inside the image where the tar file containing the code will be copied to. **Default:** `'.'` - `--copy-all` [Deprecated, see `--copy`] Copy all files in the source root directory to the destination directory. You can specify `--copy all` instead. **Default:** `False` - `--copy ` Specifies how to detect which files to copy into the image. `all` will behave as the deprecated copy-all flag, `auto` copies only loaded Python modules. **Default:** `'auto'` **Options:** `all | auto` - `-i`, `--image ` Multiple values allowed. Image used to register and run. **Default:** `'cr.union.ai/union/unionai:py3.11-latest' (Serverless), 'cr.flyte.org/flyteorg/flytekit:py3.9-latest' (BYOC)` - `--service-account ` Service account used when executing this workflow. - `--wait`, `--wait-execution` Whether to wait for the execution to finish. **Default:** `False` - `--poll-interval ` Poll interval in seconds to check the status of the execution. - `--dump-snippet` Whether to dump a code snippet instructing how to load the workflow execution using UnionRemote. **Default:** `False` - `--overwrite-cache` Whether to overwrite the cache if it already exists. **Default:** `False` - `--envvars`, `--env ` Multiple values allowed. Environment variables to set in the container, of the format `ENV_NAME=ENV_VALUE`. - `--tags`, `--tag ` Multiple values allowed. Tags to set for the execution. - `--name ` Name to assign to this execution. - `--labels`, `--label ` Multiple values allowed. Labels to be attached to the execution of the format `label_key=label_value`. - `--annotations`, `--annotation ` Multiple values allowed. Annotations to be attached to the execution of the format `key=value`. - `--raw-output-data-prefix`, `--raw-data-prefix ` File Path prefix to store raw output data. Examples are `file://`, `s3://`, `gs://` etc., as supported by fsspec. If not specified, raw data will be stored in the default configured location in remote or locally to the temp file system. - `--max-parallelism ` Number of nodes of a workflow that can be executed in parallel. If not specified, project/domain defaults are used. If 0, then it is unlimited. - `--disable-notifications` Should notifications be disabled for this execution. **Default:** `False` - `-r`, `--remote` Whether to register and run the workflow on a Union deployment. **Default:** `False` - `--limit ` Use this to limit the number of entities to fetch. **Default:** `50` - `--cluster-pool ` Assign newly created execution to a given cluster pool. - `--execution-cluster-label`, `--ecl ` Assign newly created execution to a given execution cluster label. - `--fast` Use fast serialization. The image wonโ€™t contain the source code. 
**Default:** `False` --- ### `build.py` Build an image for [workflow|task] from build.py ```shell union build build.py [OPTIONS] COMMAND [ARGS]... ``` --- ### `cache` Cache certain artifacts from remote registries. ```shell union cache [OPTIONS] COMMAND [ARGS]... ``` #### `model-from-hf` Create a model with NAME from HuggingFace REPO ```shell union cache model-from-hf [OPTIONS] REPO ``` ##### Options - `--artifact-name ` Artifact name to use for the cached model. Must only contain alphanumeric characters, underscores, and hyphens. If not provided, the repo name will be used (replacing '.' with '-'). - `--architecture ` Model architecture, as given in HuggingFace config.json, For non transformer models use XGBoost, Custom etc. - `--task ` Model task, E.g, `generate`, `classify`, `embed`, `score` etc refer to VLLM docs, `auto` will try to discover this automatically - `--modality ` Modalities supported by Model, E.g, `text`, `image`, `audio`, `video` etc refer to VLLM Docs - `--format ` Model serialization format, e.g safetensors, onnx, torchscript, joblib, etc - `--model-type ` Model type, e.g, `transformer`, `xgboost`, `custom` etc. Model Type is important for non-transformer models.For huggingface models, this is auto determined from config.json['model_type'] - `--short-description ` Short description of the model - `--force ` Force caching of the model, pass `--force=1/2/3...` to force cache invalidation - `--wait` Wait for the model to be cached. - `--hf-token-key ` Union secret key with hugging face token - `--union-api-key ` Union secret key with admin permissions - `--cpu ` Amount of CPU to use for downloading, (optionally) sharding, and caching hugging face model - `--gpu ` Amount of GPU to use for downloading (optionally) sharding, and caching hugging face model - `--mem ` Amount of Memory to use for downloading, (optionally) sharding, and caching hugging face model - `--ephemeral-storage ` Amount of Ephemeral Storage to use for downloading, (optionally) sharding, and caching hugging face model - `--accelerator ` The accelerator to use for downloading, (optionally) sharding, and caching hugging face model. **Options:**: `nvidia-l4`, `nvidia-l4-vws`, `nvidia-l40s`, `nvidia-a100`, `nvidia-a100-80gb`, `nvidia-a10g`, `nvidia-tesla-k80`, `nvidia-tesla-m60`, `nvidia-tesla-p4`, `nvidia-tesla-p100`, `nvidia-tesla-t4`, `nvidia-tesla-v100` - `--shard-config ` The engine to shard the model with. A yaml configuration file conforming to [`remote.ShardConfig`](../union-sdk/packages/union.remote#unionremoteshardconfig). - `-p`, `--project ` Project to operate on - `-d`, `--domain ` Domain to operate on **Default:** `development` - `--help` Show this message and exit. ### `create` Create a resource. ```shell union create [OPTIONS] COMMAND [ARGS]... ``` #### `api-key` Manage API keys. ```shell union create api-key [OPTIONS] COMMAND [ARGS]... ``` ##### `admin` Create an api key. ```shell union create api-key admin [OPTIONS] ``` ###### Options - `--name ` Required Name for API key. --- #### `artifact` Create an artifact with NAME. ```shell union create artifact [OPTIONS] NAME ``` ##### Options - `--version ` Required Version of the artifact. - `-p`, `--partitions ` Partitions for the artifact. - `--short-description ` Short description of the artifact. - `-p`, `--project ` Project to operate on. **Default:** `functools.partial(, previous_default='default')` - `-d`, `--domain ` Domain to operate on. **Default:** `'development'` - `--from_float ` Create an artifact of type (float). 
---

### `create`

Create a resource.

```shell
union create [OPTIONS] COMMAND [ARGS]...
```

#### `api-key`

Manage API keys.

```shell
union create api-key [OPTIONS] COMMAND [ARGS]...
```

##### `admin`

Create an API key.

```shell
union create api-key admin [OPTIONS]
```

###### Options

- `--name <name>`: Required. Name for the API key.

---

#### `artifact`

Create an artifact with NAME.

```shell
union create artifact [OPTIONS] NAME
```

##### Options

- `--version <version>`: Required. Version of the artifact.
- `-p`, `--partitions <partitions>`: Partitions for the artifact.
- `--short-description <short_description>`: Short description of the artifact.
- `-p`, `--project <project>`: Project to operate on. **Default:** `functools.partial(..., previous_default='default')`
- `-d`, `--domain <domain>`: Domain to operate on. **Default:** `'development'`
- `--from_float <from_float>`: Create an artifact of type (float).
- `--from_int <from_int>`: Create an artifact of type (int).
- `--from_str <from_str>`: Create an artifact of type (str).
- `--from_bool <from_bool>`: Create an artifact of type (bool).
- `--from_datetime <from_datetime>`: Create an artifact of type (datetime).
- `--from_duration <from_duration>`: Create an artifact of type (duration).
- `--from_json <from_json>`: Create an artifact of type (struct).
- `--from_dataframe <from_dataframe>`: Create an artifact of type (parquet).
- `--from_file <from_file>`: Create an artifact of type (file).
- `--from_dir <from_dir>`: Create an artifact of type (dir).

###### Arguments

- `NAME`: Required argument.

---

#### `login`

Log into Union.

On Union Serverless run: `union create login --serverless`

On Union BYOC run: `union create login --host UNION_TENANT`

```shell
union create login [OPTIONS]
```

##### Options

- `--auth <auth>`: Authorization method to use. **Options:** `device-flow | pkce`
- `--host <host>`: Host to connect to. Mutually exclusive with `--serverless`.
- `--serverless`: Connect to Serverless. Mutually exclusive with `--host`.

---

#### `secret`

Create a secret with NAME.

```shell
union create secret [OPTIONS] NAME
```

##### Options

- `--value <value>`: Secret value. Mutually exclusive with `--value-file`.
- `-f`, `--value-file <value_file>`: Path to a file containing the secret. Mutually exclusive with `--value`.
- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.

###### Arguments

- `NAME`: Required argument.

---

#### `workspace`

Create a workspace.

```shell
union create workspace [OPTIONS] CONFIG_FILE
```

###### Arguments

- `CONFIG_FILE`: Required argument.

---

#### `workspace-config`

Create a workspace config at CONFIG_FILE.

```shell
union create workspace-config [OPTIONS] CONFIG_FILE
```

##### Options

- `--init <init>`: Required. **Options:** `base_image`

###### Arguments

- `CONFIG_FILE`: Required argument.

---

### `delete`

Delete a resource.

```shell
union delete [OPTIONS] COMMAND [ARGS]...
```

#### `api-key`

Manage API keys.

```shell
union delete api-key [OPTIONS] COMMAND [ARGS]...
```

##### `admin`

Delete an API key.

```shell
union delete api-key admin [OPTIONS]
```

###### Options

- `--name <name>`: Required. Name for the API key.

---

#### `login`

Delete login information.

```shell
union delete login [OPTIONS]
```

---

#### `secret`

Delete the secret with NAME.

```shell
union delete secret [OPTIONS] NAME
```

##### Options

- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.

###### Arguments

- `NAME`: Required argument.

---

#### `workspace`

Delete the workspace with NAME.

```shell
union delete workspace [OPTIONS] NAME
```

##### Options

- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.

###### Arguments

- `NAME`: Required argument.

---

### `deploy`

Deploy a resource.

```shell
union deploy [OPTIONS] COMMAND [ARGS]...
```

#### `apps`

Deploy an application on Union.

```shell
union deploy apps [OPTIONS] COMMAND [ARGS]...
```

##### Options

- `-p`, `--project <project>`: Project to run the deploy in. **Default:** `flytesnacks`
- `-d`, `--domain <domain>`: Domain to run the deploy in. **Default:** `'development'`
- `-n`, `--name <name>`: Application name to start.

---

#### `build.py`

Deploy the application defined in build.py.

```shell
union deploy apps build.py [OPTIONS] COMMAND [ARGS]...
```

---

### `execution`

The execution command allows you to interact with Union's execution system, such as recovering or relaunching a failed execution.

```shell
union execution [OPTIONS] COMMAND [ARGS]...
```

##### Options

- `-p`, `--project <project>`: Project for the workflow/launchplan. Can also be set through the envvar `FLYTE_DEFAULT_PROJECT`. **Default:** `flytesnacks`
- `-d`, `--domain <domain>`: Domain for the workflow/launchplan. Can also be set through the envvar `FLYTE_DEFAULT_DOMAIN`. **Default:** `'development'`
- `--execution-id <execution_id>`: Required. The execution id.

---

#### `recover`

Recover a failed execution.

```shell
union execution recover [OPTIONS]
```

---

#### `relaunch`

Relaunch a failed execution.

```shell
union execution relaunch [OPTIONS]
```
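For illustration, a hypothetical recovery of a failed execution is sketched below, passing the group-level options ahead of the subcommand as the usage string above suggests; the execution id `f3a1b2c4d` is a placeholder value taken from the UI.

```shell
# Hypothetical example: recover the failed execution f3a1b2c4d
# in the flytesnacks project, development domain.
union execution \
    --project flytesnacks \
    --domain development \
    --execution-id f3a1b2c4d \
    recover
```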
---

### `fetch`

Retrieve Inputs/Outputs for a Union execution or any of the inner node executions from the remote server. The URI can be retrieved from the UI, or by invoking the `get_data` API.

```shell
union fetch [OPTIONS] FLYTE-DATA-URI (format flyte://...) DOWNLOAD-TO Local path (optional)
```

##### Options

- `-r`, `--recursive`: Fetch recursively, all variables in the URI. This is not needed for directories as they are automatically recursively downloaded.

###### Arguments

- `FLYTE-DATA-URI (format flyte://...)`: Required argument.
- `DOWNLOAD-TO Local path (optional)`: Optional argument.

---

### `get`

Get a single or multiple remote objects.

```shell
union get [OPTIONS] COMMAND [ARGS]...
```

#### `api-key`

Manage API keys.

```shell
union get api-key [OPTIONS] COMMAND [ARGS]...
```

##### `admin`

Show existing API keys for admin.

```shell
union get api-key admin [OPTIONS]
```

---

#### `apps`

Get apps.

```shell
union get apps [OPTIONS]
```

##### Options

- `--name <name>`
- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.

---

#### `launchplan`

Interact with launchplans.

```shell
union get launchplan [OPTIONS] LAUNCHPLAN-NAME LAUNCHPLAN-VERSION
```

##### Options

- `--active-only`, `--scheduled`: Only return active launchplans.
- `-p`, `--project <project>`: Project for the workflow/launchplan. Can also be set through the envvar `FLYTE_DEFAULT_PROJECT`. **Default:** `flytesnacks`
- `-d`, `--domain <domain>`: Domain for the workflow/launchplan. Can also be set through the envvar `FLYTE_DEFAULT_DOMAIN`. **Default:** `'development'`
- `-l`, `--limit <limit>`: Limit the number of launchplans returned.

###### Arguments

- `LAUNCHPLAN-NAME`: Optional argument.
- `LAUNCHPLAN-VERSION`: Optional argument.

---

#### `secret`

Get secrets.

```shell
union get secret [OPTIONS]
```

##### Options

- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.

---

#### `workspace`

Get workspaces.

```shell
union get workspace [OPTIONS]
```

##### Options

- `--name <name>`
- `--project <project>`: Project name.
- `--domain <domain>`: Domain name.
- `--show-details`: Show additional details.

---

### `info`

```shell
union info [OPTIONS]
```

---

### `init`

Create Union-ready projects.

```shell
union init [OPTIONS] PROJECT_NAME
```

##### Options

- `--template