/ better AI pipelines by design

Powerful AI orchestration on Kubernetes

For developers tasked with managing AI, ML, and data workflows in production, the challenges extend well beyond orchestrating DAGs. Union addresses these complexities by providing a scalable MLOps platform, designed to reduce costs and foster unmatched collaboration among team members, all backed by a Kubernetes-powered infrastructure.

Union optimizes resources across teams and implements cost-effective strategies that can reduce expenses by up to 66%. Moreover, it’s engineered to fit within your own cloud ecosystem, ensuring a robust and tailored infrastructure that scales with your technical demands.

/ bring your own compute (BYOC)

Flexibility, security, observability & cost-efficient engineering

Union is a fully managed platform deployed in your VPC. Built-in dashboards, live logging, and task-level resource monitoring help you identify resource bottlenecks and simplify debugging, which leads to better-tuned infrastructure and faster experimentation.

Maintain data locality and existing cloud vendor pricing. Union is multi-account and multi-cloud ready, and is available using AWS and GCP credits.

/ do more with less

Use fractional GPUs & target specific accelerators

GPU needs vary by workload. Targeting a specific accelerator with Union is as simple as adding an annotation to a function. You can also increase utilization and reduce cost by running multiple tasks on a single GPU with strict memory isolation.

Union users often need multiple GPU pools for different use cases, including training, LLM fine-tuning, and batch inference. You can use Nvidia GPUs, Google TPUs, AWS Silicon, and other accelerators to optimize performance, cost, and availability.

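import pandas as pd
from flytekit import Resources, task
from flytekit.extras.accelerators import A100
from flytekitplugins.spark import Spark
from sklearn.neural_network import MLPClassifier


# A plain task with no explicit resource requests.
@task
def my_simple_task() -> pd.DataFrame:
    ...


# Request one GPU and target a 2g.10gb partition of an A100.
@task(
    limits=Resources(mem="64Gi", gpu="1"),
    accelerator=A100.partition_2g_10gb,
)
def train_model(data: pd.DataFrame) -> MLPClassifier:
    ...


# Run the task on Spark; flytekit passes plugin settings via task_config.
@task(
    task_config=Spark(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        }
    )
)
def spark_task(partitions: int) -> float:
    ...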
/ scale up on demand

Declarative infrastructure

Use Union’s declarative infrastructure to express your requirements, and leave provisioning, configuring, and scaling the infrastructure to us. Ray, Spark, Dask, and distributed training are all available through a single platform.

/ bring your entire organization together on the same platform

Track lineage & build event-driven workflows

Automatically track end-to-end lineage between workflows and teams with Artifacts, a data catalog and model registry built on top of workflow inputs and outputs.

Seamlessly automate downstream workflows (such as model training) in response to the completion of upstream workflows (such as data processing) with Triggers.

/ better performance

Accelerated datasets, faster executions

Dramatically boost the performance of big file reads with Accelerated Datasets, which can reduce time to completion for some workflows by more than 90%.

The core engine at Union has been fine-tuned and optimized for faster executions. Some workflows see performance improvements of up to 95%.

The Union orchestration partner network

Available now on the AWS Marketplace

Read announcement
Available now on the GCP Marketplace

Read announcement
Member of the Nvidia Inception Program

Read announcement
/ the better replacement for Airflow & Kubeflow

Purpose-built for lineage-aware pipeline orchestration

Bring your own Airflow code (BYOAC) and take advantage of modern AI orchestration features out of the box. Get full reproducibility, auditability, experiment tracking, cross-team task sharing, compile-time error checking, and automatic artifact capture.

Explore features
Airflow
Union
Versioning

Easily experiment and iterate in isolation with versioned tasks and workflows.

Multi-tenancy

A centralized infrastructure for your team and organization enables multiple users to share the same platform while maintaining their own distinct data and configurations.

Type checking

Strongly typed inputs and outputs can simplify data validation and highlight incompatibilities between tasks, making it easier to identify and troubleshoot errors before launching the workflow.
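
The idea can be illustrated in plain Python. This is a standard-library sketch, not Union's or Flyte's implementation, and every name in it (`check_compatible`, `load_scores`, `mean`, `label`) is hypothetical. It compares one function's declared return type with the next function's parameter type before either is called:

```python
from typing import get_type_hints


def load_scores() -> list[float]:
    return [1.0, 2.0, 3.0]


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def label(name: str) -> str:
    return name.upper()


def check_compatible(upstream, downstream, param: str) -> None:
    """Raise TypeError if upstream's return type differs from downstream's parameter type."""
    produced = get_type_hints(upstream)["return"]
    expected = get_type_hints(downstream)[param]
    if produced != expected:
        raise TypeError(
            f"{upstream.__name__} returns {produced}, "
            f"but {downstream.__name__}.{param} expects {expected}"
        )


# Compatible pair: the check passes silently.
check_compatible(load_scores, mean, "values")

# Incompatible pair: the mismatch is caught before either function runs.
try:
    check_compatible(load_scores, label, "name")
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

The check only reads annotations, so it costs nothing at run time and fails before any compute is spent.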

Caching

Caching the output of task executions can accelerate subsequent executions and prevent wasted resources.
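
As a rough illustration of output caching, the sketch below memoizes a function by its name, a version string, and its arguments. It uses only the standard library and is an assumption about the general technique, not Union's caching mechanism; `cached_task`, `expensive_sum`, and the in-memory `_cache` are hypothetical.

```python
import hashlib
import json

_cache: dict[str, object] = {}
call_count = 0


def cached_task(version: str):
    """Memoize a function by its name, a version string, and its JSON-serialized arguments."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            payload = json.dumps([fn.__name__, version, args, kwargs], sort_keys=True)
            key = hashlib.sha256(payload.encode()).hexdigest()
            if key not in _cache:
                _cache[key] = fn(*args, **kwargs)
            return _cache[key]
        return wrapper
    return decorator


@cached_task(version="1")
def expensive_sum(n: int) -> int:
    global call_count
    call_count += 1  # counts real executions, not cache hits
    return sum(range(n))


first = expensive_sum(1000)
second = expensive_sum(1000)  # same name, version, and inputs: served from the cache
```

Including the version in the key means that changing the version string invalidates earlier results without clearing the cache by hand.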

Data lineage

As a data-aware platform, Union can simplify rollbacks and error tracking.

Immutability

Immutable executions help ensure reproducibility by preventing any changes to the state of an execution.

Recovery

Rerun only the failed tasks in a workflow to save time and resources and to debug more easily.
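
A minimal sketch of the recovery idea, assuming a linear pipeline and an in-memory record of finished tasks. The names `run_pipeline` and `flaky_train` are hypothetical, and this is not how Union stores execution state:

```python
def run_pipeline(tasks, completed):
    """Run tasks in order, skipping names already in `completed`.

    Returns the names executed in this attempt and stops at the first failure.
    """
    executed = []
    for name, fn in tasks:
        if name in completed:
            continue  # finished in an earlier attempt
        try:
            completed[name] = fn()
        except Exception:
            break  # leave the remaining tasks for the next attempt
        executed.append(name)
    return executed


attempts = {"train": 0}


def flaky_train():
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("simulated transient failure")
    return "model"


tasks = [("extract", lambda: "raw"), ("train", flaky_train), ("report", lambda: "done")]
completed = {}

first_run = run_pipeline(tasks, completed)   # "train" fails on this attempt
second_run = run_pipeline(tasks, completed)  # "extract" is not repeated
```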

Human-in-the-loop

Enable human intervention to supervise, tune, and test workflows, resulting in improved accuracy and safety.

Intra-task checkpointing

Checkpoint progress within a task execution in order to save time and resources in the event of task failure.
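
The sketch below shows the general pattern with the standard library only: progress is written to a file after each step, and a retry resumes from the last recorded step. The `train` function, the `fail_at` parameter, and the JSON file layout are hypothetical and stand in for whatever checkpoint store the platform provides.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def train(total_steps: int, checkpoint: Path, fail_at: Optional[int] = None) -> int:
    """Step through total_steps, persisting progress after each step.

    Returns the number of steps actually executed in this call.
    """
    start = json.loads(checkpoint.read_text())["step"] if checkpoint.exists() else 0
    executed = 0
    for step in range(start, total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        executed += 1
        checkpoint.write_text(json.dumps({"step": step + 1}))
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "progress.json"
    try:
        train(10, ckpt, fail_at=6)  # steps 0 to 5 complete, then the task fails
    except RuntimeError:
        pass
    resumed_steps = train(10, ckpt)  # resumes at step 6 instead of step 0
```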

Reproducibility

Every task is versioned and every dependency set is captured, making it easy to share workflows across teams and reproduce results.

/ AI orchestration: the essential fabric for rapid Data, ML, & AI development

The best teams choose Union & Flyte

Across Data, ML, and AI, Flyte has established a stellar reputation as the most scalable AI orchestrator. It manages and executes workflows with over 10,000 CPUs and tens of thousands of pipelines, all powered by Python code. Union brings the powerful Flyte platform to your team in a managed environment, so you don’t have to set it up. Discover why the Flyte-powered Union is a game-changer.

Faster time-to-market

In today’s fast-paced business environment, the ability to quickly develop and deploy machine learning models can be the difference between success and failure.

Union helps businesses accelerate their ML projects by automating many of the processes involved in model development and deployment, reducing the time and effort required to get models into production.

View Union features

Scalable ML workflows

Scaling machine learning efforts can be challenging due to the need for specialized infrastructure, in-house expertise in distributed systems management, and tools to handle large-scale data processing and model training.

Union enables reproducibility and observability at the workflow, task, and data levels, and provides plugins for model deployment and for distributed model training tools and frameworks.

Read MethaneSAT case study

Reduce ML technical debt

Without standardized operations and processes in place, many teams struggle to promote models to production, resulting in sunk costs and wasted compute resources.

Union enables more efficient and accurate workflows through automated validation and optimization throughout the development and deployment process.

Read ML use case

Integrate with existing tooling

Whether you are working with ML frameworks like TensorFlow and PyTorch, or using tools like Jupyter notebooks and Apache Spark, Union is designed with an extensible plugin system that spans both data science and infrastructure stacks.

This allows users to leverage the power of a managed platform without disrupting existing processes.

View Agents

Globally trusted & tested

10K+ Community Members
1M+ Monthly Downloads
30+ Fortune 100 Companies

Join our developer community

“Union Cloud solves our operational complexity problems across diverse workloads, whether it is running data cleaning & pre-processing workflows or protein structure ML predictions for low-volume, high-complexity scientific workloads to large-scale scientific simulations. Additionally, the platform can drive down the relative cost of protein production by orders of magnitude. With Union Cloud as our standardized workflow orchestration platform, we can stop managing our own systems and infrastructure, and instead focus on antibody discovery and development.”

Alex Ford, Head of Data Platform at AbCellera Biologics

“As engineers, a lot of this might be table stakes for us. But for data scientists, being able to get [financial analytics] up and running on Flyte™ and getting all of this stuff for free has been a really big win for them.”

Dylan Wilder, Engineering Manager at Spotify

“We’ve migrated about 50% of all training pipelines over to Flyte™ from Kubeflow. In several cases, we saw an 80% reduction in boilerplate between workflows and tasks vs. the Kubeflow pipeline and components. Overall, Flyte™ is a far simpler system to reason about with respect to how the code actually executes, and it’s more self-serve for our research team to handle.”

Rahul Mehta, ML Infrastructure/Platform Lead at Theorem LP

“Flyte has this concept of immutable transformation — it turns out the executions cannot be deleted, and so having immutable transformation is a really nice abstraction for our data-engineering stack.”

Jeev Balakrishnan, Software Engineer at Freenome

“It’s not an understatement to say that Flyte™ is really a workhorse at Freenome!”

Jeev Balakrishnan, Software Engineer at Freenome

“To my great surprise, the migration to Flyte™ was as smooth and easy as the development of our initial active learning pipeline in Airflow had been painful: It literally took just a few weeks to revamp our platform’s main pipeline entirely, to the delight of users and developers alike.”

Jennifer Prendki, Founder and CEO of Alectio

“Workflow versioning is quite important: When it comes to productionizing a pipeline, there are only a few platforms that provide this kind of versioning. To us, it’s critical to be able to roll back to a certain workflow version in case there is a bug introduced into our production pipeline.”

Pradithya Aria Pura, Principal Software Engineer at Gojek

“Because we are a spot-based company, a lot of our workflows run into the majority of issues. Thankfully, with Flyte™, we can debug and do quick iterations.”

Varsha Parthasarathy, Senior Software Engineer at Woven Planet

“The multi-tenancy that Flyte™ provides is obviously important in regulated spaces where you need to separate users and resources and things like amongst each other within the same organization.”

Jake Neyer, Software Engineer at Striveworks

A word from our dev team

Faster Airflow to Flyte migration powered by Flyte Airflow Agents
Kevin Su
Read the story
Pandera 0.19.0: Polars DataFrame Validation
Niels Bantilan
Read the story
The Essential Role of Vector Databases in LLMOps
Sage Elliott
Read the story