MLOps aka Operational AI (Part-1)

Let's start with a brief story…

The story is about two teams struggling in production: Data Science and Data Engineering. The problem is how to automate the complex, iterative nature of a Machine Learning pipeline in the production environment while bringing the discipline of the SDLC to AI/ML practice. In other words, how do we operationalize AI in industry?

Enter the world of MLOps! It's a multi-disciplinary subject that spans Machine Learning, Data Engineering, DevOps, and Infrastructure.

Building Blocks of MLOps

The origins of MLOps go back to 2015, to a paper entitled “Hidden Technical Debt in Machine Learning Systems.”
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

The above diagram depicts two things:

  1. Many important peripheral components surround the ML code itself.
  2. The ML code is only a small part of the overall journey.

This is great! But can we get a better flow, with a quick insight into what each step contains and the problem it is trying to solve?

Here you go…

Typical steps in MLOps

We will cover the details in subsequent posts. For now, let us understand how MLOps differs from DevOps and what the challenges of productionizing ML models are.

How is it different from DevOps?

  1. In DevOps, we version code. In MLOps, we also version data and manage parameters, metadata, logs, and finally the model.
  2. In DevOps and most software projects, build time is rarely a concern. In MLOps, the build process (training) is compute-intensive and can take hours to weeks, even on GPUs.
  3. In DevOps, the software doesn't degrade on its own. In MLOps, the model degrades over time due to drift, both concept drift and data drift.
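
To make the first point a bit more concrete, here is a minimal sketch of what "versioning more than code" could look like. It is not how any particular MLOps tool works. The `fingerprint` function, the sample CSV bytes, the parameter names, and the `"abc123"` code version are all made up for illustration. The idea is simply that a training run is identified by its data, its parameters, and its code together.

```python
import hashlib
import json


def fingerprint(data_bytes: bytes, params: dict, code_version: str) -> str:
    """Return a short ID that changes if the data, parameters, or code change."""
    digest = hashlib.sha256()
    digest.update(hashlib.sha256(data_bytes).digest())  # data version
    # sort_keys makes the hash independent of dictionary ordering
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    digest.update(code_version.encode("utf-8"))  # e.g. a Git commit hash
    return digest.hexdigest()[:12]


# Hypothetical runs: same inputs, reordered params, and one changed data value
run_a = fingerprint(b"age,income\n31,52000\n", {"lr": 0.01, "depth": 4}, "abc123")
run_b = fingerprint(b"age,income\n31,52000\n", {"depth": 4, "lr": 0.01}, "abc123")
run_c = fingerprint(b"age,income\n31,52001\n", {"lr": 0.01, "depth": 4}, "abc123")

print(run_a == run_b)  # same data, params, and code
print(run_a == run_c)  # a single value in the data changed
```

Here `run_a` and `run_b` get the same ID, while `run_c` gets a different one even though the code did not change at all. That is the gap code versioning alone leaves open.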

What are the challenges of productionizing ML models?

  1. Time to Production
  2. Model Management
  3. Lack of Resources
  4. Governance challenge
  5. Changing data and business impact on the model
  6. Organizing ML experiments
  7. Debugging model training jobs
  8. Production deployment: batch or online inference
  9. Continuous monitoring for drift and retraining
  10. Autoscaling ML Inference for hosted models
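
Item 9 in the list above, monitoring for drift, can be sketched in a few lines. The example below uses the Population Stability Index (PSI), one common way to compare a feature's distribution at training time with what the model sees in production. It is a rough illustration under assumptions, not a production monitor: the `psi` helper, the synthetic data, the bin count, and the 0.2 threshold (a popular rule of thumb, not a standard) are all choices made for this sketch.

```python
import math
import random


def psi(reference, live, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # a small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic data standing in for one feature (assumed for illustration)
rng = random.Random(0)
training = [rng.gauss(50, 10) for _ in range(5000)]
same = [rng.gauss(50, 10) for _ in range(5000)]      # no drift
shifted = [rng.gauss(65, 10) for _ in range(5000)]   # mean has moved

THRESHOLD = 0.2  # rule-of-thumb cutoff, an assumption
print("retrain?", psi(training, same) > THRESHOLD)
print("retrain?", psi(training, shifted) > THRESHOLD)
```

In a real setup, a check like this would run on a schedule against fresh production data, and crossing the threshold would raise an alert or trigger a retraining job.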

Okay, that's it for this post. In the next post (Part 2), we'll dive into the various tools used in MLOps.
