Overview
The Union.ai architecture consists of two components, referred to as planes: the control plane and the data plane.
Control plane
The control plane:
- Runs within the Union.ai AWS account.
- Provides the user interface through which users can access authentication, authorization, observation, and management functions.
- Is responsible for placing executions onto data plane clusters and performing other cluster control and management functions.
Data plane
Union.ai operates one control plane for each supported region, which supports all data planes within that region. You can choose the region in which to locate your data plane. Currently, Union.ai supports the us-west, us-east, eu-west, and eu-central regions, and more are being added.
Data plane nodes
Union.ai operator
The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.
Management of the data plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane. This operator is designed to perform its functions with only the minimum set of required permissions. It allows the control plane to spin clusters up and down, and it provides Union.ai’s support engineers with access to system-level logs and the ability to apply changes at the customer's request. It does not provide direct access to secrets or data.
In addition, communication is always initiated by the Union.ai operator in the data plane toward the Union.ai control plane, not the other way around. This further enhances the security of your data plane.
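The direction of communication described above can be illustrated with a toy model. This is only a sketch: the class names, the method names, and the use of polling are assumptions made for illustration, and the actual transport used by the Union.ai operator is not described here. The point it shows is that the control plane holds no handle to the data plane and only answers requests that the operator opens.

```python
from collections import deque


class ControlPlane:
    """Toy control plane: it queues work and answers requests it receives."""

    def __init__(self):
        self._pending = deque()

    def schedule(self, execution_id: str) -> None:
        # Placing an execution only records it; nothing is pushed to the data plane.
        self._pending.append(execution_id)

    def handle_poll(self, cluster: str) -> list[str]:
        # Runs only in response to a request that the operator initiated.
        work = list(self._pending)
        self._pending.clear()
        return work


class Operator:
    """Toy operator resident in the data plane; it initiates every exchange."""

    def __init__(self, cluster: str, control_plane: ControlPlane):
        self.cluster = cluster
        self._control_plane = control_plane
        self.started = []

    def poll_once(self) -> int:
        # Outbound call: data plane -> control plane.
        work = self._control_plane.handle_poll(self.cluster)
        self.started.extend(work)
        return len(work)


control_plane = ControlPlane()
operator = Operator("my-cluster", control_plane)  # hypothetical cluster name
control_plane.schedule("exec-1")
control_plane.schedule("exec-2")
picked_up = operator.poll_once()
```

Note that `ControlPlane` never stores a reference to `Operator`, so it has no way to reach into the data plane on its own.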
Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.
Registry data
Registry data is comprised of:
- Names of workflows, tasks, launch plans, and artifacts
- Input and output types for workflows and tasks
- Execution status, start time, end time, and duration of workflows and tasks
- Version information for workflows, tasks, launch plans, and artifacts
- Artifact definitions
This type of data is stored in the control plane and is used to manage the execution of your workflows. This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.
Execution data
Execution data is comprised of:
- Event data
- Workflow inputs
- Workflow outputs
- Data passed between tasks (task inputs and outputs)
This data is divided into two categories: raw data and literal data.
Raw data
Raw data is comprised of:
- Files and directories
- Dataframes
- Models
- Python-pickled types
These are passed by reference between tasks and are always stored in an object store in your data plane. This type of data is read (and may be temporarily cached) by the control plane as needed, but is never stored there.
Literal data
Literal data is comprised of:
- Primitive execution inputs (int, string, etc.)
- JSON-serializable dataclasses
These are passed by value, not by reference, and may be stored in the Union.ai control plane.
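As a rule of thumb, the split between the two categories can be sketched in plain Python. The `classify` function and the `TrainingConfig` dataclass below are hypothetical helpers written for this illustration, not part of any Union.ai API, and the real decision is made by the platform's type handling rather than by a check like this one.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path


@dataclass
class TrainingConfig:
    # Hypothetical JSON-serializable dataclass used as a task input.
    learning_rate: float
    epochs: int


def classify(value) -> str:
    """Return "literal" or "raw" for a value (rough heuristic only)."""
    # Primitives are passed by value.
    if isinstance(value, (bool, int, float, str)):
        return "literal"
    # Dataclass instances count as literal only if they serialize to JSON.
    if is_dataclass(value) and not isinstance(value, type):
        try:
            json.dumps(asdict(value))
            return "literal"
        except (TypeError, ValueError):
            return "raw"
    # Everything else (files, directories, dataframes, models, pickled
    # objects) is treated as raw data that is passed by reference.
    return "raw"


kinds = {
    "int": classify(42),
    "string": classify("hello"),
    "dataclass": classify(TrainingConfig(learning_rate=0.01, epochs=10)),
    "file": classify(Path("data/train.csv")),
    "arbitrary object": classify(object()),
}
```

Here the first three entries are classified as literal and the last two as raw, which mirrors the lists above.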
Data privacy
If you are concerned with maintaining strict data privacy, be sure not to pass private information in literal form between tasks.
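One way to follow this advice is to place a sensitive value in a file and pass the file between tasks instead of the string itself. The sketch below shows the idea in plain Python. The function names and the sample value are hypothetical, and the local path only stands in for a file type that your workflow would pass by reference; a plain path string would itself be a literal.

```python
import tempfile
from pathlib import Path


def write_private_value(value: str, directory: Path) -> Path:
    """Store a sensitive string in a file so it can travel as raw data."""
    path = directory / "private.txt"
    path.write_text(value, encoding="utf-8")
    return path


def read_private_value(path: Path) -> str:
    """Recover the sensitive string in the task that consumes it."""
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # The producing task writes the value and hands over only the file.
    ref = write_private_value("patient-1234", Path(tmp))
    # The consuming task reads the content back from the file.
    recovered = read_private_value(ref)
```

The sensitive string never appears in the value that is handed from one function to the other; only the location of the file does.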