Performance

Optimized performance

Run complex AI workloads with unparalleled performance, scale, and efficiency. Achieve millisecond-level execution times with reusable containers. Scale out to multiple regions, clusters, and clouds as needed for resource availability, scale, or compliance.

Reusable containers

Accelerate GenAI-style workloads by pre-loading models and data into “warm” execution environments. Leverage actors across multiple workflows to amortize container startup cost across many executions.

Flexible environment specification 

Just like Flyte tasks, actor environments allow you to customize infrastructure and dependencies. Load a custom model onto a GPU and use torch to run on-demand predictions with low latency.

Startup and teardown logic

Define operations which should take place before and after your main code logic runs. Download and load an LLM into memory as the actor comes online so subsequent prompts run quickly. Perform teardown operations like entering chat history into a database to monitor and improve performance over time.
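The lifecycle described above can be pictured with a small plain-Python sketch. It is not Union's actor API: `WarmActor`, `load_echo_model`, and the `sink` callback are hypothetical names used only to show startup running once, the main logic reusing the loaded model, and teardown flushing the chat history.

```python
class WarmActor:
    """Hypothetical stand-in for a reusable container (not Union's API)."""

    def __init__(self, load_model, sink):
        self._load_model = load_model  # expensive startup step
        self._sink = sink              # where teardown sends the history
        self.model = None
        self.history = []
        self.startups = 0

    def __enter__(self):
        # Startup: runs once, when the actor comes online.
        self.model = self._load_model()
        self.startups += 1
        return self

    def predict(self, prompt):
        # Main logic: every call reuses the already-loaded model.
        reply = self.model(prompt)
        self.history.append((prompt, reply))
        return reply

    def __exit__(self, exc_type, exc, tb):
        # Teardown: hand the accumulated history to the sink, then free the model.
        self._sink(list(self.history))
        self.model = None
        return False


def load_echo_model():
    # Placeholder for downloading an LLM and loading it into memory.
    return lambda prompt: prompt.upper()


saved_history = []  # stands in for a database table
with WarmActor(load_echo_model, saved_history.extend) as actor:
    replies = [actor.predict(p) for p in ["hello", "world"]]
```

Both prompts are served by a single model load, and the history reaches the sink only when the actor shuts down.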

Integration with Map Tasks

Run actors inside map tasks to speed up executions and use fewer resources. Instead of spinning up one environment per job (as is the case with regular map tasks), mapping over actors allows you to spread the cost of starting each environment over multiple jobs.
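A rough back-of-the-envelope model shows why this matters. The function names and the numbers (1,000 jobs, 30 seconds of startup, 2 seconds of work, 10 actors) are illustrative assumptions, not measurements from Union.

```python
def environment_starts(num_jobs, num_actors=None):
    """Count environment startups for a map over num_jobs jobs.

    With num_actors=None every job starts its own environment (a regular
    map task). Otherwise at most num_actors warm environments are started
    and shared by all jobs.
    """
    if num_actors is None:
        return num_jobs
    return min(num_jobs, num_actors)


def total_compute_seconds(num_jobs, startup_s, work_s, num_actors=None):
    """Total compute time: startup overhead plus the useful work."""
    starts = environment_starts(num_jobs, num_actors)
    return starts * startup_s + num_jobs * work_s


# Illustrative numbers only.
regular = total_compute_seconds(1000, startup_s=30, work_s=2)
with_actors = total_compute_seconds(1000, startup_s=30, work_s=2, num_actors=10)
```

Under these assumptions the regular map spends 32,000 compute seconds and the actor-backed map spends 2,300, because only 10 startups are paid instead of 1,000.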

Multi-cluster & multi-cloud

Scale out to multiple clusters

Go beyond the limits of a single cluster while maintaining one access point. Union automatically load balances across multiple clusters, allowing you to run workloads spanning hundreds of compute nodes and tens of thousands of concurrent jobs.

Leverage hard-to-find GPU resources

Route workloads to different cloud providers based on GPU availability or pricing. A team running primarily on AWS can use Union to send GPU-based workloads to a cluster running in GCP. Union currently supports AWS and GCP, with Azure coming soon.
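Routing on availability or price can be sketched as a simple selection rule. This is an illustration of the idea, not how Union schedules work: the `Cluster` fields, the `pick_cluster` helper, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    cloud: str
    free_gpus: int
    price_per_gpu_hour: float  # hypothetical price, in USD


def pick_cluster(clusters: List[Cluster], gpus_needed: int) -> Optional[Cluster]:
    """Return the cheapest cluster with enough free GPUs, or None."""
    candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.price_per_gpu_hour)


clusters = [
    Cluster("primary", "aws", free_gpus=0, price_per_gpu_hour=3.0),
    Cluster("overflow", "gcp", free_gpus=8, price_per_gpu_hour=2.5),
]
chosen = pick_cluster(clusters, gpus_needed=4)
```

Here the AWS cluster has no free GPUs, so the GPU workload is sent to the GCP cluster.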

Isolate important workloads

Union lets you create hard physical and network isolation between normal jobs and mission-critical workloads. Run development, staging, and production workflows in separate AWS accounts or GCP projects as suggested by the AWS and GCP cloud architecture frameworks.

Optimized parallelism

Execute workflows in parallel

Union allows you to map over entire workflows, dramatically raising the ceiling for the number of parallel jobs. Union supports workflows with nested parallelism of up to 100,000 tasks. Mapping over workflows allows users to write more logical code, with each sub-workflow greedily moving forward to completion and handling errors and retries independently.
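The pattern of mapping over whole workflows, each one retrying on its own, can be approximated in plain Python with a thread pool. This is a local sketch rather than Union's mapping API: `map_workflow`, `run_with_retries`, and the `pipeline` sub-workflow are hypothetical names.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(sub_workflow, item, retries=2):
    """Run one sub-workflow, retrying it without affecting its siblings."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return sub_workflow(item)
        except Exception as err:
            last_error = err
    raise last_error


def map_workflow(sub_workflow, items, max_workers=4, retries=2):
    """Apply sub_workflow to every item in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_with_retries, sub_workflow, item, retries)
            for item in items
        ]
        return [f.result() for f in futures]


attempts = Counter()
attempts_lock = threading.Lock()


def pipeline(x):
    """Toy two-step sub-workflow that fails once for the input 3."""
    with attempts_lock:
        attempts[x] += 1
        first_try = attempts[x] == 1
    if x == 3 and first_try:
        raise RuntimeError("transient failure")
    return (x + 1) * 2


results = map_workflow(pipeline, [1, 2, 3, 4])
```

Only the sub-workflow for input 3 is retried; the other three finish on their first attempt.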

Full-workflow caching

Leverage full-workflow caching to dramatically increase efficiency and execution speed. When using nested parallelism, caching the results of each sub-workflow ensures that no computation is repeated unnecessarily.
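One way to picture caching at the sub-workflow level is a store keyed by the workflow's name, a version string, and its inputs. The sketch below is an in-memory illustration under those assumptions, not Union's cache implementation; `WorkflowCache` and `featurize` are hypothetical names.

```python
import hashlib
import json


class WorkflowCache:
    """In-memory cache keyed by workflow name, version, and inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, workflow, version, inputs):
        # Inputs must be JSON-serializable so the key is deterministic.
        payload = json.dumps([workflow.__name__, version, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = workflow(**inputs)
        self._store[key] = result
        return result


executions = []


def featurize(x):
    """Toy sub-workflow; records each real execution."""
    executions.append(x)
    return x * x


cache = WorkflowCache()
outputs = [cache.run(featurize, "v1", {"x": x}) for x in [1, 2, 2, 3]]
```

The repeated input 2 is served from the cache, so the sub-workflow body runs three times for four requests. Changing the version string would produce new keys and force recomputation.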

Accelerated datasets

Perform large data reads in seconds, not minutes. Accelerated datasets are pre-mounted to each compute node, dramatically reducing startup time for tasks which would otherwise need to download data over the network before running.

Managed compute plugins

Run large data processing jobs on Spark, Dask, or Ray and distributed training jobs on Torch Elastic or TensorFlow natively on Union. Framework-specific logs and metrics are available in hosted Spark History Server, Ray Dashboard, and Dask Dashboard. Switch between running Spark jobs on Union and hosted providers such as Databricks with a single-line code change.

Build

Delightful to build on

Shorten the development loop from hours to seconds while writing production-ready code. See runtime logs directly in the UI and perform line-by-line debugging of remote jobs in a browser-based IDE. Automate and reactively trigger workflows across the platform.

Image builder

Union’s image builder makes pulling in ad-hoc dependencies seamless. Just add your dependency and run your workflow. Union will build your image in the cloud and automatically pull it down into your execution. Previously built images are automatically cached, so subsequent runs will skip building.

Self-serve secrets

Create and manage secrets on demand. Define secrets directly in the SDK and avoid the hassle of interfacing with cloud provider secrets managers. Secrets can be scoped at an organization-wide level or for specific projects. Union securely stores secrets in your cloud provider’s secrets manager, in your VPC.

UI-based logging

Union displays runtime logs directly in the UI, avoiding the need to authenticate and search through third-party logging services.

Interactive tasks

When a remotely running job fails, Union can automatically freeze the memory state and attach a browser-based VS Code debugger to the remote environment. You can then step through the code line by line, debug the failure, and continue the execution. Observe what is actually happening at runtime and avoid spending costly cycles trying to replicate remote environments.

Artifacts

A registry for models and data across the organization

Track important task and workflow outputs as artifacts to make them searchable, discoverable, and consumable by other teams. Store important runtime information such as hyperparameter values using model cards. Use Union’s UI to easily search across all the artifacts in a given project.

Enable separation of concerns between teams

Set up your workflows to automatically consume the latest version of an upstream team’s outputs using artifact queries. Maintain reproducibility while modularizing workflows into team-specific functions, allowing teams to iterate independently of each other. 

Understand lineage across the AI lifecycle

When you use artifacts to stitch workflows together, lineage is automatically tracked across the meta-workflow. Trace a given model’s predictions all the way back to the precursor datasets on which the model was trained.

Reactive workflows

Reactive workflows let you automatically trigger workflows in response to external events. Artifacts form the “contracts” between workflows, so a downstream team can automate their workflows in response to an upstream team’s output becoming available. Cross-workflow lineage is automatically tracked, and everything is versioned, giving you deep visibility into historical operations.

Efficiency

Unparalleled efficiency

Boost ROI by enabling teams to access the resources they need while sharing underlying infrastructure. Execute long-running jobs on spot instances with checkpointing and automatic preemption recovery. Provision specific resources for each task within a workflow and tune them over time with comprehensive observability.

Flexible infrastructure

Union enables you to efficiently utilize fractional GPUs, harness cost-effective Spot instances, and seamlessly integrate custom silicon like TPUs. Streamline your AI workflows to achieve peak performance with optimal resource allocation.

Efficient autoscaling

Union integrates latest-generation cluster management with Karpenter to optimize your resource allocation and scalability. This setup enables you to automatically adjust capacity and performance, ensuring operational efficiency. Experience streamlined management and reduced costs, letting you focus on innovation without infrastructure constraints.

Scale to zero

Automatically scale your operations to zero, effectively minimizing resource wastage by avoiding underutilized nodes. This capability ensures you only pay for the cloud resources you actually use, enhancing overall efficiency. Optimize your costs and environmental footprint seamlessly with Union's smart scaling.

Task-level monitoring

Leverage task-level monitoring for precise tracing of CPU, memory, and GPU usage per task. This feature allows you to right-size resources efficiently, ensuring optimal allocation based on actual workload demands. Streamline your operations and enhance cost efficiency by tailoring resource use to each specific task.

Cluster-wide observability

Union enhances your operational oversight with cluster-wide observability, allowing you to easily monitor resource consumption by different teams over time. This comprehensive visibility helps you optimize resource allocation and improve efficiency across your organization. Gain control and insight into your operations, ensuring each team uses resources effectively.

Enterprise

Enterprise ready

Leverage a robust platform that meets rigorous standards for security, compliance, and operational reliability. Map specific teams or individuals to specific projects with granular role-based access controls. Deploy Union into your cloud so your data and code never leave your VPC.

Deployed in your cloud

Union can be seamlessly deployed into your cloud environment, enabling you to maintain stringent security and data locality standards while benefiting from a managed platform. This integration allows you to leverage Union's powerful features without compromising on control or compliance. Enhance your operations with robust security and efficient data management, tailored to your organizational needs.

Role-based access control

Union provides robust role-based access control (RBAC), enabling you to precisely manage team access to specific projects while efficiently sharing underlying resources. This feature ensures that sensitive data and operations are safeguarded, maintaining organizational integrity. Customize permissions seamlessly, aligning with your team's structure and workflow requirements.

Global regions

Union's deployment across global regions ensures you can meet diverse regulatory requirements and optimize costs and resource availability, including GPUs. This geographical flexibility allows you to position your services closer to your data sources and users, enhancing performance and compliance. Tailor your deployment strategy with Union to maximize efficiency and adhere to local standards effortlessly.

High availability

Union ensures high availability for your mission-critical workloads, providing a highly resilient system that allows you to operate with confidence. This managed Flyte offering minimizes downtime and ensures your operations continue seamlessly, even under demanding conditions. Rely on Union for stability and peace of mind, knowing your crucial processes are always supported.

Secure

Union enhances your operational security by adhering to SOC 2 Type II compliance, ensuring rigorous oversight and management of data protection. This certification reflects Union’s commitment to high standards of security and confidentiality, giving you confidence in the integrity of your data. Rely on Union for a secure, managed platform that meets stringent industry requirements and safeguards your sensitive information.

Flyte experts

Leverage the full potential of your AI operations with Union, benefiting directly from the expertise of the team that created Flyte. This managed offering provides you with unparalleled insights and support, ensuring you maximize the capabilities of your AI platform. Work with the originators of Flyte to optimize, innovate, and excel in your AI initiatives.