MLOps aka Operational AI (Part-1)

2 min read · May 24, 2021


Let's start with a brief story…

The story is about the struggle of two teams in production: Data Science and Data Engineering. The problem is how to automate the complex, iterative nature of a machine learning pipeline in a production environment while bringing the discipline of the SDLC to AI/ML practice. In other words, how do we operationalize AI in industry?

Enter the world of MLOps! It's a multidisciplinary subject that spans Machine Learning, Data Engineering, DevOps, and Infrastructure.

Building Blocks of MLOps

The origins of MLOps trace back to a 2015 paper titled “Hidden Technical Debt in Machine Learning Systems.”
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

The above diagram depicts two things:

A production ML system needs many important peripheral components besides the ML code.
ML code is a small part of the overall journey.

This is great! But can we get a better flow, and a quick insight into what each step contains and the problem it is trying to solve?

Here you go…

We will see all the details in subsequent posts. For now, let us understand how MLOps differs from DevOps and what the challenges of productionizing ML models are.

How is it different from DevOps?

In DevOps, we have code versioning. In MLOps, we also version the data and manage parameters, metadata, logs, and finally the model itself.
In DevOps, as in most software projects, build time is rarely a concern. In MLOps, the build process is compute-intensive: training a model can take anywhere from hours to weeks, even on GPUs.
In DevOps, software does not degrade on its own once deployed. In MLOps, a model's performance degrades over time because of drift, both concept drift and data drift.
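
To make the versioning point concrete, here is a minimal sketch in Python. It assumes a small in-memory dataset and a hypothetical run-record schema (`fingerprint`, `make_run_record`, and the field names are invented for illustration, not taken from any particular MLOps tool). The idea is only that a training run is identified by its data and parameters as well as its code.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a short, deterministic SHA-256 fingerprint of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def make_run_record(code_version: str, rows: list, params: dict) -> dict:
    """Bundle what identifies a training run (hypothetical schema, for illustration)."""
    return {
        "code_version": code_version,       # what DevOps already tracks
        "data_version": fingerprint(rows),  # what MLOps adds: the data
        "params": params,                   # ...and the training parameters
    }


# Toy data: the second dataset has one extra row, the code is unchanged.
rows_v1 = [{"age": 31, "churned": 0}, {"age": 45, "churned": 1}]
rows_v2 = rows_v1 + [{"age": 28, "churned": 0}]
params = {"learning_rate": 0.1, "max_depth": 3}

run_a = make_run_record("abc123", rows_v1, params)
run_b = make_run_record("abc123", rows_v2, params)

# Same code version, but the runs are distinguishable by their data version.
print(run_a["code_version"] == run_b["code_version"])
print(run_a["data_version"] == run_b["data_version"])
```

Real data versioning tools work on files and storage backends rather than in-memory lists, but the principle is the same: the same code trained on different data is a different run.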

What are the challenges of productionizing ML models?

Time to Production
Model Management
Lack of Resources
Governance challenge
Changing data and business impact on the model
Organizing ML experiments
Debugging model training jobs
Production deployment — Batch or online inference
Continuous monitoring for drift and retraining
Autoscaling ML Inference for hosted models
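
As a rough illustration of the "continuous monitoring for drift and retraining" item, the sketch below compares a feature's distribution at training time with its distribution in production using the Population Stability Index (PSI). This is one common choice, not the only one. The function names, the toy data, and the 0.2 threshold (a widely used rule of thumb) are assumptions for illustration.

```python
import math


def psi(reference, live, bins=5):
    """Population Stability Index between two numeric samples.

    Bin edges are derived from the reference (training-time) sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


def needs_retraining(reference, live, threshold=0.2):
    """Flag the model for retraining when drift exceeds the (assumed) threshold."""
    return psi(reference, live) > threshold


# Toy feature: customer age at training time vs. in production.
training_ages = [20 + (i % 40) for i in range(400)]  # ages 20..59
same_ages = [20 + (i % 40) for i in range(200)]      # same distribution
older_ages = [45 + (i % 40) for i in range(200)]     # shifted to ages 45..84

print(needs_retraining(training_ages, same_ages))
print(needs_retraining(training_ages, older_ages))
```

A check like this only detects data drift on a single feature. Concept drift, where the relationship between inputs and the target changes, usually requires monitoring model quality against fresh labels.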

Okay, that's it for this post. In the next post (Part 2), let's dive into the various tools used in MLOps.


Written by AI & Data Engineering