Industry: Media & Entertainment
Use Case: Data

How Warner Bros. Discovery Keeps Its Media Streams Flowing

The company

Created by a merger of WarnerMedia and Discovery Inc. in 2022, Warner Bros. Discovery is growing ahead of schedule. The streaming company added 1.6 million subscribers and announced that it expects to turn a profit in 2023, two years ahead of its guidance to investors.

Getting there means putting WBD’s data about audience behavior to work engaging and retaining each of its 95.8 million subscribers with personalized offers and viewing suggestions. And productizing all that data requires lots of machine learning.

Keeping WBD’s streams on course is Machine Learning Platform Engineer Frank Shen, who works with several teams of data scientists to productize their models.

“We have different models for different purposes,” Shen said. “For customer lifecycles, we can predict the churn of active customers before their next subscription starts. We can put personalized messaging on their screens: ‘Hey, do you want to take this offer? We’re gonna give you this incentive.’ Besides those products, the company uses our scientists’ models to forecast our revenues, our subscription numbers — all kinds of things.”

Another major product area is WBD’s recommendation systems, Shen said, which perform tasks like personalizing what each viewer sees on the “rail” of suggested titles.

Each workflow entails up to 500 features, including viewership, subscriptions and metadata. Tracking those dynamically for almost 100 million active users runs into terabytes of data.

The challenge

In addition to customer products and recommendations, different WBD data science teams work on areas that include personalization, marketing, reporting and growth, Shen said. And each team works with its own tool sets. “Data scientists tend to do their work in notebooks, whether it’s Databricks or Jupyter or SageMaker notebooks,” he said, “and each has a development environment, integration testing, and product and production environments. We have to help them build a platform to automate deployment of their code using our CI/CD process so their products can be used in the production environment.”

WBD’s ML engineers also help reduce duplication of effort by different developers: “We’ll help use shared libraries and modules so they don’t have to duplicate their efforts. And for feature engineering, they can use shared features.”

The engineering team was using Airflow orchestration to run its ML workflows, but the platform presented a number of challenges. “Airflow is a good tool for data engineering, but it’s not perfect for machine learning workflows,” Shen said. “First of all, data scientists’ jobs use a lot of Python modules, and integrating their pure Python into Airflow is a challenge.

“Developing locally was another challenge. Data scientists would develop using notebooks, and notebooks are not compatible with Airflow. So they have to copy and paste their code into the Airflow system, which is a huge pain in the ass. They can’t debug locally, so they have to deploy it to Airflow and see if it works. If not, then they have to do it over again.”

The solution

To eliminate duplication and streamline compatibility, WBD turned to Flyte for orchestration. Flyte’s Python support immediately closed gaps between local development and deployment, Shen said. “Now data scientists start developing locally on their machines, not using notebooks but using pure Python and just adding the annotation provided by Flyte. It works everywhere; even if you develop in the notebook, you can just port that function over to Flyte.

“That’s the number one benefit,” he continued. “The second is you can compose workflows; one workflow can call other workflows, and you can chain them together and reuse them. Workflow compositions aren’t possible in Airflow.”
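The two benefits Shen describes can be illustrated with a minimal sketch. It assumes the open-source flytekit package and its `task` and `workflow` decorators. The function names and the toy feature logic are hypothetical and are not taken from WBD’s code. If flytekit is not installed, the sketch falls back to no-op decorators, which reflects the point that the underlying code is plain Python that runs locally.

```python
from typing import List

try:
    from flytekit import task, workflow
except ImportError:
    # Assumption: no-op stand-ins so the sketch still runs as plain Python
    # when flytekit is not installed. They are not part of Flyte.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


@task
def normalize(values: List[float]) -> List[float]:
    """Hypothetical feature step: scale positive values by their maximum."""
    top = max(values)
    return [v / top for v in values]


@task
def mean(values: List[float]) -> float:
    """Hypothetical scoring step: average the prepared features."""
    return sum(values) / len(values)


@workflow
def feature_wf(values: List[float]) -> List[float]:
    # A small workflow that other workflows can reuse.
    return normalize(values=values)


@workflow
def scoring_wf(values: List[float]) -> float:
    # Composition: this workflow calls feature_wf as a sub-workflow,
    # then chains a task onto its output.
    features = feature_wf(values=values)
    return mean(values=features)


if __name__ == "__main__":
    # Runs locally as ordinary Python, with no cluster required.
    print(scoring_wf(values=[2.0, 4.0]))
```

Inside a workflow body, tasks and sub-workflows are called with keyword arguments, and the body only wires outputs to inputs. The actual computation stays in the task functions, which is what lets a function written in a notebook be ported over with little change.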

Another benefit comes from Flyte’s foundation in Kubernetes. “There’s also the benefit of using the standard cloud computing resources versus maintaining the Airflow resources ourselves. You get scalability managed for you as well — with Airflow, you have to have a dedicated DevOps team to achieve the same benefits.”

The results

Shen said the transition to Flyte has been rapid and effective. “I’ve been working with Flyte for no more than six months,” he said, “and we have already productized several models. And we’re in the process of migrating all of the Airflow training pipelines into Flyte.”

Flyte has empowered Shen and the other ML engineers, he said. “A couple of us are sufficient to support a whole bunch of data scientists because we actually develop reusable Flyte workflows in our platform and publish them so all the teams can benefit. For example, XGBoost is very popular with data scientists for shallow prediction. And we’ve written reusable XGBoost workflows. The other data scientists don’t have to do it over again; they just need to call ‘import my workflow’ and plug in all the configuration parameters for their needs. That’s why they don’t need a whole lot of support!”
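The reuse pattern described above might look roughly like the following sketch. It is an illustration only: the training task is a toy stand-in (a constant model fitted by repeated shrunken residual updates), not XGBoost and not WBD’s published workflow, and every name and parameter here is hypothetical. In practice the shared workflow would live in a platform module that teams import. It is shown in one file here so the example is self-contained.

```python
from typing import List

try:
    from flytekit import task, workflow
except ImportError:
    # Assumption: no-op stand-ins so the sketch runs without flytekit.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


@task
def fit_constant(targets: List[float], learning_rate: float, n_rounds: int) -> float:
    """Toy stand-in for a training step (NOT XGBoost).

    Starts from a prediction of 0 and, each round, moves it toward the
    targets by learning_rate times the mean residual.
    """
    prediction = 0.0
    for _ in range(n_rounds):
        mean_residual = sum(t - prediction for t in targets) / len(targets)
        prediction += learning_rate * mean_residual
    return prediction


@workflow
def shared_train_wf(targets: List[float], learning_rate: float, n_rounds: int) -> float:
    # Hypothetical platform-owned workflow, written once and published.
    return fit_constant(targets=targets, learning_rate=learning_rate, n_rounds=n_rounds)


@workflow
def churn_team_wf(targets: List[float]) -> float:
    # A team reuses the shared workflow and plugs in its own configuration.
    return shared_train_wf(targets=targets, learning_rate=0.5, n_rounds=2)


@workflow
def marketing_team_wf(targets: List[float]) -> float:
    # Another team calls the same workflow with different parameters.
    return shared_train_wf(targets=targets, learning_rate=1.0, n_rounds=1)


if __name__ == "__main__":
    print(churn_team_wf(targets=[1.0, 3.0]))
    print(marketing_team_wf(targets=[1.0, 3.0]))
```

Keeping the configuration as explicit workflow inputs means each team changes only the values it passes in, while the training logic stays in one place.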

Shen credits the Flyte community with speeding the transition to Flyte. “With any open-source product, there’s a learning curve, but Flyte’s learning curve is not so high. And we have very good support from the Flyte community. When I post my questions there, I always get a response within a day or two. And those responses are very helpful. And sometimes they identify a need or a gap in their product, so they will quickly have a PR and release a beta version right for us. That’s why we can succeed so quickly: because of their support.”

How much time and effort has WBD saved thanks to Flyte? “There’s no serious study,” Shen said, “but my personal experience says there’s been at least 30% cost savings in efficiency.”