I entered the world of data science and machine learning almost 10 years ago, enamored with the idea that statistical models enable us to obtain insights about the world and foresight into it.
Coming from the fields of biology and public health, this was particularly appealing, since both disciplines operate squarely within the realm of complex adaptive systems: making progress in these fields increasingly requires both clever experimental design and large-scale data processing.
At the time, many of the tools that we know and love today – like Pandas, Pytorch, and Tensorflow – were non-existent, in their infancy, or yet to displace the incumbents of the day in a big way.
The ML Tooling Explosion
One set of tools that remained inaccessible to many DS and ML practitioners was the tooling for scaling, deploying, and monitoring models. Sure, there were adjacent tools in the software engineering space, like virtual compute infrastructure, database services, and application monitoring tools, but I’d often have to heavily customize them to tailor them to specific ML use cases.
I’ve been on teams where we had to roll our own hyperparameter optimization, data quality, and model serving solutions not because we wanted to, but because the higher-level frameworks and standardized tooling were simply not there yet.
Today, however, we have the opposite problem: it seems like every week there’s a new machine learning model, dataset, or training environment coming out of both big and small research labs. Similarly, we’ve seen the rise of MLOps as a specialization of DevOps, and an explosion in the number of libraries that aim to bridge the gap between research and production. This includes tools to annotate data, train models, deploy them to production, and monitor their behavior in the wild.
With all of the different choices one must make to compose together an ML stack, I’ve been thinking a lot about what a standard interface for machine learning on the web might look like, much like what the HTTP methods are to the HTTP protocol.
So what would it look like if ML-driven web apps were as easy to write as regular web apps?
Before going further in this exploration, I’d like to elaborate a little more on the problem underlying this question.
Bridging the Gap between Research and Production
The growing ML ecosystem has increased the complexity, friction, and boilerplate involved in building and maintaining a machine learning stack. There are many ways to combine all of these different tools, and this in turn leads to the following challenges:
- Porting code from a research setting to production is often time-consuming.
- The skills needed to train a performant model are different from those needed to deploy and maintain that model in production.
- The data-science-to-engineering hand-off of models often leads to duplication of effort and the need to re-implement logic in multiple places.
Which leads me to a category of solutions that I think hold a lot of promise…
Functional, Web-native Frameworks for Machine Learning
Is it possible to define a standard set of functions for machine learning that can be reused in many different contexts, from model training to prediction and beyond?
As an answer to this question, today I’m super excited to announce UnionML, an open source Python framework that empowers data scientists and ML engineers to easily build and deploy ML microservices!
Built on top of Flyte, UnionML provides a high-level functional interface for expressing common ML operations as Python functions that are automatically bundled into microservices. It empowers you to abstract away the infrastructure and MLOps tools needed to support your model in both the lab and production, so that you can focus on curating the best dataset and training the best model for the problem at hand.
Our vision with UnionML is to make building ML apps as easy as building regular web apps.
Simply install UnionML via pip:
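```shell
pip install unionml
```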
Along with our friends Scikit-learn and Pandas:
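```shell
pip install scikit-learn pandas
```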
Then we define an app.py file:
And that’s it 🙌 ! By defining these four functions, you have a minimal UnionML app!
With the first minor release of UnionML, we focused on the core use cases of offline model training and offline/online prediction. You can train models locally in the IDE of your choice, scale them up with a Flyte cluster, and serve them via a FastAPI web endpoint or AWS Lambda function.
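For instance, serving the model as a web endpoint can be as small as a stub like the following. This is a hedged sketch assuming the `app.py` above and UnionML's FastAPI integration via `Model.serve`; run it with a standard ASGI server such as `uvicorn serve:app`.

```python
# serve.py — attach prediction endpoints for a UnionML model to a FastAPI app.
from fastapi import FastAPI

from app import model  # the UnionML Model defined in app.py

app = FastAPI()
model.serve(app)  # mounts the model's prediction endpoints onto the app
```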
UnionML aims to provide the simplest implementation of the complete ML stack, but we also want to integrate with other projects that provide more feature-complete pieces to the ML puzzle:
UnionML is in its early days, and like with any open source project, the community is the project, so if you’re interested in contributing, we’re calling on you to: