Sara Gawlinski

Introducing the Modern AI Orchestration Resource Hub

Artificial intelligence pipelines are infrastructure intensive. They may entail machine- or deep-learning models that are more iterative than traditional data workflows; consume many compute-intensive resources like GPUs; and integrate with complex frameworks and tools for model training, feature engineering, distributed training and monitoring. 

When Union.ai interviewed engineers about building, deploying and managing AI pipelines, they described the pain points of orchestrating the processes in their workflows, particularly with their current orchestration platforms and homegrown solutions.

That prompted us to create resources for teams migrating workflows from the Airflow orchestrator to our Flyte orchestration engine. Those resources include the Airflow vs Flyte cheat sheet and the Flyte Airflow provider. We’re also sharing more tutorials and blogs for those with AI use cases, including How to Serve ML Models with Banana, Building FlyteGPT on Flyte and LangChain, and Introducing the FlyteCallback for HuggingFace.

That’s why the Union team is excited to launch our modern AI orchestration resource hub, a place to consolidate and share insights and content with anyone working to productionize and scale AI pipelines. Here are just a few stories from our users as they navigated the complex and tempestuous nature of AI pipelines.

In 2017 and 2018, Gojek was using a small abstraction on top of Airflow for its ML pipelines. While the platform was able to handle 1000+ DAGs, the abstraction wasn’t the easiest to use. Pradithya Aria Pura from the Data Science Platform team described a less-than-ideal development experience, a scheduler that did not scale, missing DAGs, and infrastructure overhead. To address these hurdles, Gojek evaluated other workflow orchestrators and ultimately landed on Flyte. “Some of the things we love so far is the dynamic workflow and flow control,” Pura said. “It’s quite powerful for building the pipeline. When it comes to productionizing a pipeline, there are only a few platforms that provide versioning which is critical for when we want to rollback a workflow version if a bug was introduced into the pipeline.”

Alectio is a DataPrepOps company providing an MLOps platform for data-centric AI. The company initially used Airflow to manage its workflows, which repeated the same operations with different versions of the model and subsets of data. These workflows required extensive versioning and stability throughout their iterations. Jennifer Prendki, founder and CEO of Alectio, said: “Airflow was the natural option for data engineering and we constantly used to run into problems where processes didn’t scale or errors that couldn’t be tracked, repeated or debugged. We would run into problems with versioning capabilities. Airflow doesn’t seem to play nicely with the communication of data from one block to another. For a data-centric AI approach, this was a disaster.” Alectio decided to start evaluating other orchestration platforms and chose Flyte. According to Prendki, “Flyte was not only better than Airflow, but it was better than any other option on the market.”

Our goal for the modern AI orchestration resource hub is to help demystify the challenges you might be facing and provide you with insights and helpful resources from engineers who have already faced them. In this space, you’ll find in-depth blog posts, conceptual videos, user stories, checklists and workshops, all designed to equip you with the knowledge you need to tackle your AI use cases and turn your ideas into production-ready workflows. Be sure to bookmark this page and check back often, as we will continue to update it with new posts and upcoming events.
