/ experience Union: streamline your AI with advanced orchestration

Production-Grade Data & ML Infrastructure without Hiring A Team

As a managed Flyte solution, Union empowers ML, Data, and BioTech teams to deliver increased value. Union streamlines your workflow for 10x productivity by eliminating infrastructure constraints and complex setup processes.

Available now on the AWS Marketplace

Read announcement

How Warner Bros. Discovery Keeps Its Media Streams Flowing

Read case study
Event: Mar 7, 10:00am PST

User Experimentation for AI Applications | Timothy Chan - Statsig

View event details
/ enable your code for production

Three steps to streamline & scale

Union is an easy-to-use orchestration platform for Data, ML, and BioTech teams. Utilize your existing code with added benefits of versioning, data lineage, and more, all powered by Kubernetes—no extensive learning required.

1

Supercharge your code

Turn your ETL and ML code into scalable tasks and workflows.

Flyte™ decorators identify your code as tasks and workflows.

2

Get ready to execute

Union is an out-of-the-box experience for machine learning engineers and data scientists who need to deliver at scale, often without any help from IT or infrastructure teams.

3

Observe & optimize

As your workflows become more complex, it becomes increasingly important to gain deeper insights into their performance.

Union’s built-in dashboards, logging, and task-level resource monitoring enable users to identify resource bottlenecks and long execution times and to simplify debugging, resulting in optimized resources and faster experimentation.

How Data, AI, & ML orchestration works

AI orchestration is the process of managing and automating tasks such as data preparation, model training, deployment, monitoring, and updating. As your operations scale up, orchestration becomes the essential fabric that runs your ML workflows and pipelines. Advanced orchestrators like Union manage your infrastructure and simplify integrations with various frameworks and platforms.
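
To make the idea concrete, here is a toy sketch of what an orchestrator does at its core: run each step only after the steps it depends on, and pass results along. It uses only the Python standard library, and it is not how Union or Flyte is implemented. The step names and data are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step after the steps it depends on; return order and results."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Pass the outputs of upstream steps as positional inputs.
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = steps[name](*inputs)
    return order, results


# Hypothetical stand-ins for data preparation, training, and evaluation.
steps = {
    "prepare": lambda: [1.0, 2.0, 3.0, 4.0],
    "train": lambda data: sum(data) / len(data),  # the "model" is just the mean
    "evaluate": lambda model: abs(model - 2.5),
}

# Each step maps to the steps that must finish before it starts.
dependencies = {
    "prepare": (),
    "train": ("prepare",),
    "evaluate": ("train",),
}

if __name__ == "__main__":
    order, results = run_pipeline(steps, dependencies)
    print(order, results)
```

A production orchestrator adds what this sketch leaves out, such as containers, retries, caching, and distributed execution.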

1. Create a Workflow

Begin by defining the workflow for your ML project. This involves outlining the sequence of tasks for the project, including data collection, preprocessing, model training, evaluation, and deployment. With Union, you can conveniently use Python to write and run your workflows.

2. Manage Resources

Declare resource requirements in your Python code, and Union allocates the computational resources for each task. This includes hardware resources such as CPUs, GPUs, and memory, ensuring that each stage of your pipeline has the resources it needs while minimizing waste.

Extract: Snowflake, SQLAlchemy, DoltHub, BigQuery, Hive, and DuckDB
Transform: Databricks, AWS Athena, and Apache Spark
Validate: Pandera and Great Expectations
Explore: Vaex, Polars, and Flyte
Train: Kubeflow, Dask, Ray, Hugging Face, DeepSpeed, UnionML, AWS SageMaker, and JAX
Test: W&B and whylogs
Deploy: ONNX, BentoML, AWS SageMaker, and Banana
Monitor: whylogs and MLflow
Retrain: Kubeflow, Dask, Ray, Hugging Face, DeepSpeed, AWS SageMaker, and JAX

3. Schedule & Execute

By leveraging Kubernetes, orchestrators like Union can efficiently schedule, version, and execute containerized tasks. This fully automated approach not only ensures reproducibility but also enhances scalability.

4. Monitor & Observe

Once a workflow is deployed, advanced orchestrators like Union offer dashboards, logs, and task-level monitoring. These tools are essential for troubleshooting and provide valuable insights into resource utilization.

Write your Python code locally, execute it remotely

Enjoy the freedom to write Python code that runs both locally and remotely in your Kubernetes cluster. Take advantage of full parallelization and utilization of all Kubernetes nodes without creating Docker files or writing YAML.

1. Run Local
2. Scale Remote
3. Deploy
import huggingface_hub as hh
import transformers as tr
from datasets import load_dataset

from flytekit import task
from flytekit.types.directory import FlyteDirectory

@task
def train(
    model_id: str,
    dataset_id: str,
    dataset_name: str,
) -> FlyteDirectory:

    # authenticate
    hh.login(token="...")

    # load the dataset, model, and tokenizer
    dataset = load_dataset(dataset_id, dataset_name)
    model = tr.AutoModelForCausalLM.from_pretrained(model_id, ...)
    tokenizer = tr.AutoTokenizer.from_pretrained(model_id, ...)

    # prepare dataset
    dataset = dataset["train"].shuffle().map(tokenizer, ...)

    # define and run the trainer
    trainer = tr.Trainer(model=model, train_dataset=dataset, ...)
    print("Training model")
    trainer.train()

    # save and return model directory
    output_path = "./model"
    print("Saving model")
    trainer.model.save_pretrained(output_path)
    return FlyteDirectory(path=output_path)
import transformers as tr
from datasets import load_dataset

from flytekit import task, ImageSpec, Resources
from flytekit.types.directory import FlyteDirectory


image_spec = ImageSpec(
    name="llm_training",
    registry="ghcr.io/unionai-oss",
    requirements="requirements.txt",
    python_version="3.9",
    cuda="11.7.1",
    env={"VENV": "/opt/venv"},
)

@task(
    cache=True,
    cache_version="0",
    requests=Resources(mem="100Gi", cpu="32", gpu="8"),
    container_image=image_spec,
)
def train(
    model_id: str,
    dataset_id: str,
    dataset_name: str,
) -> FlyteDirectory:
    ...
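
# additional imports used by the deploy task and the workflow
from io import BytesIO

import huggingface_hub as hh
from flytekit import workflow

@task(...)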
def train(...) -> FlyteDirectory:
    ...

@task(container_image=image_spec)
def deploy(model_dir: FlyteDirectory, repo_id: str) -> str:
    model_dir.download()
    hh.login(token="...")
    
    # upload readme and model files
    api = hh.HfApi()
    repo_url = api.create_repo(repo_id, exist_ok=True)
    readme = "..."
    api.upload_file(
        path_or_fileobj=BytesIO(readme.encode()),
        path_in_repo="README.md",
        repo_id=repo_id,
    )
    api.upload_folder(
        repo_id=repo_id,
        folder_path=model_dir.path,
    )
    return str(repo_url)

@workflow
def train_and_deploy(
    model_id: str,
    dataset_id: str,
    dataset_name: str,
    repo_id: str,
) -> str:
    model_dir = train(
        model_id=model_id,
        dataset_id=dataset_id,
        dataset_name=dataset_name,
    )
    return deploy(model_dir=model_dir, repo_id=repo_id)
$ pyflyte run llm_training.py train \
    --model_id EleutherAI/pythia-70m \
    --dataset_id togethercomputer/RedPajama-Data-V2 \
    --dataset_name sample

Running Execution on local.
Map: 100%|████████████| 1050391/1050391
Training model
{'train_runtime': 4.5401, ...}
100%|███████████████████| 100/100
Saving model
file:///var/folders/4q/frdnh9l10h53gggw1m59gr9m0000gp/T/flyte-f2qjyme6/raw/a888e295fefbdae4023ec2b35e53edcb

$ ls /var/folders/4q/frdnh9l10h53gggw1m59gr9m0000gp/T/flyte-f2qjyme6/raw/a888e295fefbdae4023ec2b35e53edcb

config.json
generation_config.json
model.safetensors
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
training_args.json
$ pyflyte run --remote llm_training.py train \
    --model_id meta-llama/Llama-2-7b-hf \
    --dataset_id togethercomputer/RedPajama-Data-V2 \
    --dataset_name default

Running Execution on Remote.
Image ghcr.io/unionai-oss/llm_training:5quiCD_S3VoDsP0Sr3ZWIA.. found. Skip building.

[✔] Go to https://org.unionai.cloud/console/projects/flytesnacks/domains/development/executions/fe661d1127e84438bb8e to see execution in the console.
$ pyflyte run --remote llm_training.py train_and_deploy \
    --model_id meta-llama/Llama-2-7b-hf \
    --dataset_id togethercomputer/RedPajama-Data-V2 \
    --dataset_name default \
    --repo_id unionai/Llama-2-7b-hf-finetuned

Running Execution on Remote.
Image ghcr.io/unionai-oss/llm_training:5quiCD_S3VoDsP0Sr3ZWIA.. found. Skip building.
Image ghcr.io/unionai-oss/llm_training:5quiCD_S3VoDsP0Sr3ZWIA.. found. Skip building.

[✔] Go to https://org.unionai.cloud/console/projects/flytesnacks/domains/development/executions/fe661d1127e84438bb8e to see execution in the console.

Get started with Union

Union is immediately available for AWS and GCP, with free trials for qualified users. To get started with Union, check out the documentation or sign up to be an early user below.

Get a demo

An open platform for your entire team & stacks

Union is designed for data and ML teams who don’t want the overhead of maintaining and managing Flyte™ deployments, setting up Kubernetes infrastructures, and provisioning security and data policies.

Data Engineer
Union.ai

for Data Engineers

Union, as a data orchestrator, empowers DataOps and data engineering in a modern data stack by providing advanced automation capabilities. Data and analytics professionals can leverage Union to create, deploy, and run fully automated and reproducible end-to-end data pipelines.

“Union runs my data pipelines and even connects with Apache Airflow.”

Data Scientist
Union.ai

for Data Scientists

Union helps data science teams run their workflows efficiently. With advanced automation features, Union allows data scientists to create, deploy, and execute end-to-end pipelines that are fully automated and reproducible.

“From my research work to effective collaboration on one platform.”

ML Engineer
Union.ai

for ML Engineers & MLOps

ML engineers need a streamlined and scalable approach to their machine learning workflows. With its comprehensive set of tools and efficient design, Union enables ML engineers to easily build, deploy, and manage complex workflows, accelerating their development and delivering more robust and accurate models.

“Deploying and monitoring all my workflows around the clock.”

MLOps/DevOps
Union.ai

for DevOps & Engineering

DevOps and engineers are responsible for managing a wide range of tools, frameworks, and services to build and maintain end-to-end data ecosystems. This can be a complex and time-consuming task, requiring significant expertise and resources. An orchestrator like Union greatly simplifies this process by providing a unified platform for managing and automating data and ML workflows, enabling DevOps and engineers to streamline their workflows and focus on delivering more value to their organization.

“Running our Kubernetes cluster while using Union to manage our data plane.”

Overcome the complex challenges of building scalable ML products

Union accelerates data processing and machine learning for businesses in every industry. It’s built on the trusted open-source project Flyte™, and combines the power and efficiency of Kubernetes with enhanced observability and enterprise features, all fully managed in your cloud account. Data and ML teams can more easily collaborate on optimized infrastructure, boosting their velocity.

Accelerate Experimentation

Break down siloed teams & infrastructure

When data and ML teams work with distributed tooling and infrastructure, communication and collaboration can become difficult. Siloed teams use different tools, processes, data formats, and infrastructure, which can lead to delays, errors, or even scrapping projects due to a lack of alignment.

With Union, simplify the process of sharing work across teams and environments with reusable tasks, versioned workflows, and an extensible plugin system.

Get a demo
Any Cloud

Infrastructure doesn’t have to be difficult

On-prem, hybrid cloud, multi-cloud, multi-region: the options for choosing the right infrastructure for your projects are nearly endless. These choices offer flexibility, but using multiple clouds can lead to issues with data consistency, networking, security, and service integrations. This can cause ML projects to fail and infrastructure and applications to break.

With Union, it is simple to consume resources and services across clouds in one unified platform.

Get a demo
Increased Control

Cost optimization for complex workflows

When infrastructure becomes distributed across different providers and instances, tracking and forecasting usage and spend can be a daunting task. All too often, this lack of visibility leads to large compute costs from underutilized resources, with little understanding of when and how they arose.

Union provides real-time visibility and monitoring at the workflow and task level as well as a resource dashboard that includes all of your projects.
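
The kind of question task-level visibility answers can be sketched in a few lines of plain Python. This is an illustration only, not Union's API: the `TaskUsage` record, the 50% threshold, and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    name: str
    cpu_requested: float  # cores requested for the task
    cpu_used: float       # average cores actually used


def underutilized(tasks, threshold=0.5):
    """Return names of tasks using less than `threshold` of their requested CPU."""
    return [t.name for t in tasks if t.cpu_used / t.cpu_requested < threshold]


# Made-up numbers for illustration only.
usage = [
    TaskUsage("train", cpu_requested=32.0, cpu_used=28.0),
    TaskUsage("preprocess", cpu_requested=16.0, cpu_used=2.0),
]

if __name__ == "__main__":
    print(underutilized(usage))
```

Tasks flagged this way are candidates for smaller resource requests.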

Get a demo
RBAC

Increase control & oversight across teams and projects

Without proper governance, teams might not adhere to consistent standards or regulatory requirements for data management, model development, testing or deployment.

With Union, you can simplify the management and security of your platform through enterprise-grade Role-Based Access Control—allowing you to scope individual users to view and execute on specific Projects and Workflows.
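
Conceptually, role-based scoping of this kind reduces to a lookup from user and project to a set of permitted actions. The sketch below is a generic illustration in plain Python, not Union's RBAC implementation; the role names, actions, and project names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Role:
    name: str
    actions: FrozenSet[str]


VIEWER = Role("viewer", frozenset({"view"}))
CONTRIBUTOR = Role("contributor", frozenset({"view", "execute"}))


@dataclass
class User:
    name: str
    # Maps a project name to the role the user holds in that project.
    grants: Dict[str, Role] = field(default_factory=dict)


def is_allowed(user: User, project: str, action: str) -> bool:
    """Allow an action only if the user's role in that project includes it."""
    role = user.grants.get(project)
    return role is not None and action in role.actions


if __name__ == "__main__":
    alice = User("alice", {"forecasting": CONTRIBUTOR, "billing": VIEWER})
    print(is_allowed(alice, "forecasting", "execute"))
    print(is_allowed(alice, "billing", "execute"))
```

A project with no grant is denied by default, so access has to be given explicitly.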

Get a demo

Data, ML, research & production

These fine companies, among many others, create data and ML products with Union’s Flyte™ engine.