July 28, 2025

Introducing Flyte 2.0: Dynamic, Crash-Proof, Resource-Aware AI Orchestration

Ketan Umare

This article charts the most pressing problems that orchestration can solve and introduces the vision for Flyte 2.0. We’ll publish technical deep dives on Flyte’s capabilities over the coming weeks.

<div class="button-group is-center"><a class="button" target="_blank" href="http://www.union.ai/beta">Try now with private Union beta</a></div>

Flyte 2.0 is available today as part of the Union.ai 2.0 private beta.

Why it’s time for Flyte 2.0

“You can’t hardcode a fixed path for exploring complex topics, as the process is inherently dynamic and path-dependent.” — Anthropic engineering team

Flyte has been the open-source AI/ML workflow orchestrator of choice for 8 years and has been downloaded tens of millions of times. We’ve succeeded by obsessing over what developers want (or, more often, what they don’t want). That obsession helped us build an orchestrator that outperforms old-school orchestrators like SageMaker and Airflow on developer happiness. And now, it has revealed where orchestration is at a breaking point.

As organizations scale their machine learning, LLM, and agentic workloads, the cracks in legacy orchestration frameworks are becoming impossible to ignore. Teams are blocked not by their models or data, but by the tools meant to coordinate them.

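Through research and development, we’ve measured the most pressing needs of engineers building AI systems today. Across industries, their pain points were strikingly consistent: having to learn a DSL, difficult debugging, the limits of static execution, trouble fanning out computation, brittle pipelines, and best practices that are hard to implement.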

These issues aren’t theoretical. They’re actively slowing down innovation in AI and ML, and innovators are recognizing that orchestration must fix them. In the race to build long-running, sustainable AI systems, that also creates an opportunity to set yourself and your team apart.

“Bringing together tools and data with a bunch of models—fine-tuning and calling them in the chain of thought—that’s the biggest skill that would move people forward.” — OpenAI product team

Flyte 2.0 is fully dynamic, crash-proof, resource-aware AI orchestration.

With the growing complexity of AI systems, choosing the right orchestration layer is foundational to success. Flyte 2.0 is engineered to address the practical challenges teams encounter when building, scaling, and maintaining production-grade AI workflows.

Let’s see how.

Need to learn a DSL → Pure Python authoring

Flyte 2.0 lets you author workflows in actual Python without learning a new DSL. You can write, test, and version your workflows locally in pure Python, then run them at scale in the cloud. Migrating existing scripts is simple, and there's no need to rewrite logic just to fit a framework.

*Dramatically simplified DevEx (tasks simply call tasks)*
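Flyte 2.0’s own SDK calls aren’t shown in this post, so the sketch below uses only the Python standard library to illustrate what “tasks simply call tasks” means. The `task` decorator here is hypothetical and only logs invocations; it stands in for an orchestrator’s decorator and is not Flyte’s API. The point is that the workflow is an ordinary function calling other decorated functions, with no separate DSL.

```python
import functools
from typing import Callable, List

# Hypothetical stand-in for an orchestrator's task decorator (NOT Flyte's API).
# It only records each call, so the call graph produced by ordinary Python
# control flow is visible.
CALL_LOG: List[str] = []


def task(fn: Callable) -> Callable:
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CALL_LOG.append(fn.__name__)  # record the invocation
        return fn(*args, **kwargs)    # then run the plain Python body
    return wrapper


@task
def load(n: int) -> List[int]:
    return list(range(n))


@task
def square(xs: List[int]) -> List[int]:
    return [x * x for x in xs]


@task
def total(xs: List[int]) -> int:
    return sum(xs)


@task
def pipeline(n: int) -> int:
    # The "workflow" is itself a task that calls other tasks directly.
    return total(square(load(n)))


if __name__ == "__main__":
    print(pipeline(4))  # 0 + 1 + 4 + 9 = 14
    print(CALL_LOG)     # ['pipeline', 'load', 'square', 'total']
```

Because `pipeline` is plain Python, it can be run and tested locally by calling it directly; in a real orchestrator, each task call would typically be dispatched to its own container instead of running in-process.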

Difficult debugging → Streamlined debugging

Debugging is fast and intuitive with Flyte 2.0's built-in visibility and control. You can observe execution state, logs, and failures at every step, catch errors as they happen, and even rerun workflows interactively in a live debugger.

*Dynamically rerun with more memory after OOM*
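As a hypothetical sketch of the retry policy that caption describes, the code below reruns a task with a doubled memory request each time it fails with an out-of-memory error. `OutOfMemory`, `train_step`, the 512 MiB starting point, and the doubling schedule are all illustrative assumptions, not Flyte’s built-in behavior or API.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class OutOfMemory(Exception):
    """Hypothetical signal that an attempt exceeded its memory limit."""


def retry_on_oom(
    run: Callable[[int], T],
    memory_mib: int = 512,
    max_memory_mib: int = 8192,
) -> Tuple[T, int]:
    """Call run(memory_mib), doubling the request after each OOM.

    Returns (result, memory that succeeded). Re-raises OutOfMemory when
    doubling again would exceed max_memory_mib.
    """
    while True:
        try:
            return run(memory_mib), memory_mib
        except OutOfMemory:
            if memory_mib * 2 > max_memory_mib:
                raise
            memory_mib *= 2


def train_step(memory_mib: int) -> str:
    # Hypothetical task that needs roughly 3000 MiB to finish.
    if memory_mib < 3000:
        raise OutOfMemory(f"{memory_mib} MiB is not enough")
    return "ok"


if __name__ == "__main__":
    print(retry_on_oom(train_step))  # ('ok', 4096): 512 -> 1024 -> 2048 -> 4096
```

Capping the request with `max_memory_mib` keeps a runaway task from requesting unbounded resources; the error still surfaces once the ceiling is reached.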

Limits of static execution → Dynamic, on-the-fly orchestration

Flyte 2.0 delivers fully dynamic workflows that adapt in real time. From branching logic and loops to dynamic resource allocation, your AI systems and agents can make decisions at runtime.

*You still have the option of static DAGs, too.*

*Plan and execute agentic workflows at runtime*
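To illustrate runtime decision-making, here is a plain-Python sketch in which the fan-out width, the number of loop iterations, and the final branch all depend on values computed during the run. The helper names and the placeholder math are hypothetical; in an orchestrator, each helper could be a separate task.

```python
import math
from typing import Dict, Union

# Plain-Python sketch of control flow decided at runtime. All helper names
# are hypothetical; in an orchestrator each could be a separate task.


def choose_shards(n_rows: int, rows_per_shard: int = 1000) -> int:
    # The fan-out width depends on the data, so it is only known at runtime.
    return max(1, math.ceil(n_rows / rows_per_shard))


def process_shard(shard: int) -> float:
    return shard * 0.5  # placeholder work


def refine(score: float) -> float:
    return score * 1.5  # placeholder improvement step


def dynamic_workflow(
    n_rows: int, target: float = 10.0, max_rounds: int = 5
) -> Dict[str, Union[int, float, str]]:
    shards = choose_shards(n_rows)
    score = sum(process_shard(i) for i in range(shards))

    # Loop whose length depends on intermediate results.
    rounds = 0
    while score < target and rounds < max_rounds:
        score = refine(score)
        rounds += 1

    # Branch chosen from a value computed during the run.
    outcome = "deploy" if score >= target else "escalate"
    return {"shards": shards, "rounds": rounds, "score": score, "outcome": outcome}


if __name__ == "__main__":
    print(dynamic_workflow(5000))
    # {'shards': 5, 'rounds': 2, 'score': 11.25, 'outcome': 'deploy'}
```

A static DAG would have to fix the shard count and the number of refinement rounds before execution starts; here both come from intermediate results.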

Hard to fan out computation → Efficient, scalable workflows

Flyte 2.0 handles large task fan-out and parallelism with ease, scaling with your AI systems as they grow. Pipelines can natively span multiple containers, and autoscaling (including scale-to-zero) means you pay only for the compute you need.

*Seamlessly scale out to thousands of containers*
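Flyte 2.0’s own fan-out API isn’t shown in this post, so the sketch below uses only Python’s standard `asyncio` to illustrate the pattern: launch many independent units of work at once and cap how many run concurrently. `score_item` is a hypothetical placeholder for a remote task, and the semaphore stands in for the concurrency limits or cluster capacity that bound fan-out in a real deployment.

```python
import asyncio
from typing import List


async def score_item(i: int) -> int:
    # Hypothetical placeholder for a remote task; the sleep simulates latency.
    await asyncio.sleep(0.001)
    return i * i


async def fan_out(n: int, max_concurrency: int = 50) -> List[int]:
    # Cap in-flight work, much as a concurrency limit or a cluster's
    # capacity would in a real deployment.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(i: int) -> int:
        async with sem:
            return await score_item(i)

    # gather returns results in the same order as its inputs.
    return list(await asyncio.gather(*(bounded(i) for i in range(n))))


if __name__ == "__main__":
    results = asyncio.run(fan_out(1000))
    print(len(results), results[:5])  # 1000 [0, 1, 4, 9, 16]
```

Because `asyncio.gather` preserves input order, downstream steps can match results back to their inputs without extra bookkeeping.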

Brittle pipelines → Crash-proof pipelines

Flyte 2.0 is built for resilience, so pipelines that fail can recover and continue where they left off. With built-in caching, retries, and support for custom error handling, Flyte ensures that failures are isolated and never bring down your entire workflow.

*Self-healing workflows automatically recover from failures*
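Picking up where a failed run left off can be sketched with output caching. This is a hypothetical, stdlib-only illustration, not Flyte’s caching implementation: each step’s result is stored under a hash of the step name and its inputs, so after a crash, rerunning the pipeline skips every step that already succeeded and re-executes only from the point of failure.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store standing in for an orchestrator's durable
# output cache. Keys hash the task name together with its inputs.
CACHE: Dict[str, Any] = {}
EXECUTED: List[str] = []        # steps actually attempted (not served from cache)
SETTINGS = {"fail_load": True}  # toggles a simulated crash


def cache_key(name: str, inputs: Dict[str, Any]) -> str:
    payload = json.dumps({"task": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def cached(fn: Callable) -> Callable:
    @functools.wraps(fn)
    def wrapper(**inputs: Any) -> Any:
        key = cache_key(fn.__name__, inputs)
        if key not in CACHE:
            EXECUTED.append(fn.__name__)
            CACHE[key] = fn(**inputs)  # only stored if fn returns normally
        return CACHE[key]
    return wrapper


@cached
def extract(n: int) -> List[int]:
    return list(range(n))


@cached
def transform(xs: List[int]) -> List[int]:
    return [x * 10 for x in xs]


@cached
def load(xs: List[int]) -> int:
    if SETTINGS["fail_load"]:
        raise RuntimeError("simulated crash in load")
    return sum(xs)


def pipeline(n: int) -> int:
    return load(xs=transform(xs=extract(n=n)))


if __name__ == "__main__":
    try:
        pipeline(3)
    except RuntimeError as err:
        # EXECUTED lists every step attempted, including the failed load.
        print("first run failed:", err, EXECUTED)
    SETTINGS["fail_load"] = False
    EXECUTED.clear()
    print(pipeline(3), EXECUTED)  # 30 ['load']: earlier steps came from the cache
```

Keying the cache on inputs, not just the step name, means a rerun with different inputs recomputes rather than returning stale results.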

Best practices hard to implement → Guardrails for rapid iteration

Flyte 2.0 makes it easy to follow software development best practices while iterating rapidly. It automatically versions your executions, supports multi-tenancy across dev/stage/prod environments, and provisions infrastructure only when needed, keeping costs and complexity low.

*Flyte provides native multi-tenancy through projects and domains*
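Automatic versioning and multi-tenancy through projects and domains can fit together as in the hypothetical sketch below, which is not Flyte’s implementation. Each registered task is identified by project, domain, name, and a version derived from a hash of the function’s compiled body, so editing the code yields a new version, while the same code registered in a different domain keeps its version. `TaskId`, `content_version`, `register`, and the example project name are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TaskId:
    project: str  # e.g. a team or product area
    domain: str   # e.g. "development", "staging", "production"
    name: str
    version: str


def content_version(fn: Callable) -> str:
    # Hash the compiled body so any change to the logic yields a new version.
    # (Bytecode is Python-version specific; a real system might hash source
    # files or a container image digest instead.)
    code = fn.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode())
    digest.update(repr(code.co_names).encode())
    return digest.hexdigest()[:12]


def register(fn: Callable, project: str, domain: str) -> TaskId:
    return TaskId(project, domain, fn.__name__, content_version(fn))


def preprocess(x: int) -> int:
    return x + 1


def preprocess_edited(x: int) -> int:
    return x + 2


if __name__ == "__main__":
    dev = register(preprocess, "recsys", "development")
    prod = register(preprocess, "recsys", "production")
    print(dev.version == prod.version, dev == prod)  # True False
    print(content_version(preprocess) == content_version(preprocess_edited))  # False
```

Promoting unchanged code from `development` to `production` keeps its version, while the differing domain keeps the two registrations separate.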

Flyte 2.0 is a fully functional agent runtime

If you’ve been paying attention, you may have realized that dynamic, crash-proof, long-running workflows are exactly what you need to build truly agentic systems. That makes Flyte 2.0 a fully functional agent runtime.

“AI orchestration […] is the key to developing reliable enterprise agentic AI systems.” — NVIDIA enterprise AI team

We promise Flyte isn’t the 101st agent framework. Instead, it integrates with any agent framework and delivers durability and resource-awareness, all open-source.
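Flyte’s agent integrations aren’t shown in this post. As a hypothetical, framework-agnostic illustration of why an agent needs a dynamic runtime, the loop below lets a stub planner (standing in for an LLM call) choose each tool call based on the results so far, so neither the number nor the order of steps is known before the run starts. `plan_next`, `search`, `summarize`, and `run_agent` are illustrative names, not part of any Flyte API.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tools; in an orchestrator each could be its own task with
# its own container and resources.
def search(query: str) -> str:
    return f"3 documents about {query}"


def summarize(text: str) -> str:
    return f"summary of ({text})"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}


def plan_next(goal: str, history: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Stub planner standing in for an LLM: returns (tool, argument),
    or None once it decides the goal is met."""
    if not history:
        return ("search", goal)
    if history[-1][0] == "search":
        return ("summarize", history[-1][1])
    return None


def run_agent(goal: str, max_steps: int = 10) -> List[Tuple[str, str]]:
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        step = plan_next(goal, history)
        if step is None:
            break
        tool, arg = step
        history.append((tool, TOOLS[tool](arg)))  # record (tool, result)
    return history


if __name__ == "__main__":
    for tool, result in run_agent("Flyte 2.0"):
        print(tool, "->", result)
    # search -> 3 documents about Flyte 2.0
    # summarize -> summary of (3 documents about Flyte 2.0)
```

`max_steps` bounds the loop so a planner that never returns `None` can’t run forever; a durable runtime would additionally persist `history` so a crashed agent could resume mid-plan.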

Get started

You can experience Flyte 2.0 today for free as part of our Union.ai 2.0 private beta.

Union.ai is the AI orchestration platform designed to bridge the gap between experimentation and production. Built on the same core engine as Flyte 2.0, Union adds enterprise features like real-time inference, observability, SSO, and RBAC. Customers can run it securely in their own cloud and get dedicated support from the team who built Flyte.

You can preview Flyte 2.0 OSS by installing the SDK.

Over the coming months, we’ll regularly provide development updates and share invites to office hours with our product team on LinkedIn and in our Slack community.

Conclusion

We want to end with a sincere thanks to everyone in the Flyte open source community who has helped bring this project to life.

From sharing ideas and reporting issues to contributing code and championing Flyte within your teams, we are eternally grateful to call you friends. Flyte 2.0 is a reflection of that shared effort. It's a big step forward for AI infrastructure, and we're genuinely excited to see how you’ll use it to build the next generation of AI systems. Thanks for being on this journey with us.

