MLOps aka Operational AI (Part-3)
--
In this article, we will walk through the end-to-end (E2E) flow of operationalizing AI using popular open-source tools. The idea is to keep the architecture and toolset cloud-agnostic.
A typical Machine Learning lifecycle is broadly divided into three stages:
- Data Processing (Data Engineer Domain)
- Model Training (Data Scientist Domain)
- Model Serving (Data Engineer and DevOps Domain)
Data Processing:
In this stage, the focus is on building Data Workflow Management, Data Lineage, Data Labelling, Data Quality, and Feature generation.
Model Training:
In this stage, the focus is on Model Experiment tracking, Distributed training, Metadata Management, and Hyperparameter optimization.
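To make the experiment-tracking idea concrete, here is a minimal pure-Python sketch (a toy stand-in, not the MLflow API): each hyperparameter trial becomes a run with logged params and metrics, and the best run is selected by a metric, which is the core workflow that tools like MLflow or Weights & Biases provide at scale.

```python
import uuid

class ExperimentTracker:
    """Toy experiment tracker: records params and metrics per run,
    similar in spirit to what MLflow's tracking server does."""

    def __init__(self):
        self.runs = {}

    def start_run(self, params):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": params, "metrics": {}}
        return run_id

    def log_metric(self, run_id, name, value):
        self.runs[run_id]["metrics"][name] = value

    def best_run(self, metric):
        # Return (run_id, run) with the highest value of the metric.
        return max(self.runs.items(), key=lambda kv: kv[1]["metrics"][metric])

tracker = ExperimentTracker()
for lr in (0.1, 0.01, 0.001):
    run_id = tracker.start_run({"lr": lr})
    # In a real pipeline this metric would come from model evaluation.
    tracker.log_metric(run_id, "accuracy", 0.9 if lr == 0.01 else 0.8)

best_id, best = tracker.best_run("accuracy")
print(best["params"])  # hyperparameters of the winning run
```

In a real setup the tracker also stores artifacts (model files, plots) and metadata per run, which is what turns ad-hoc notebook experiments into reproducible, comparable training jobs.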
Model Serving:
In this stage, the focus is on Packaging, Monitoring, Model explainability, and server audit trails.
The below flow diagram is broken into:
- Business Understanding and Data handling
- Feature Engineering (Custom Feature Store with Versioning using Delta Lake's time travel feature)
- Model Training, Evaluation, and Validation
- Model Management (Model Registry, Experimentation, Versioning, Artifacts, and Metadata tracking)
- Model Explainability
- Model Deployment and Serving (Packaging, Scaling, throughput, and Monitoring)
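The versioning idea behind the feature store is worth illustrating. Below is a minimal pure-Python sketch (an in-memory toy, not the Delta Lake API) of the concept its time travel feature provides: every write commits a new immutable version, so a training job can read features "as of" an earlier version and reproduce its exact training data.

```python
import copy

class VersionedFeatureStore:
    """Toy feature store: each write creates a new immutable snapshot,
    mimicking the 'versionAsOf' reads that Delta Lake time travel offers."""

    def __init__(self):
        self._versions = []  # full snapshots; list index == version number

    def write(self, features):
        # Start from the latest snapshot and apply the new feature values.
        snapshot = copy.deepcopy(self._versions[-1]) if self._versions else {}
        snapshot.update(features)
        self._versions.append(snapshot)
        return len(self._versions) - 1  # version number of this commit

    def read(self, version_as_of=None):
        # Default: read the latest version (the table head).
        idx = len(self._versions) - 1 if version_as_of is None else version_as_of
        return self._versions[idx]

store = VersionedFeatureStore()
v0 = store.write({"user_42_avg_spend": 10.0})
v1 = store.write({"user_42_avg_spend": 12.5})  # feature refreshed

latest = store.read()                  # serving reads the newest values
as_of_v0 = store.read(version_as_of=v0)  # training reproduces version v0
```

The payoff is reproducibility: when a model trained last month needs to be audited or retrained, the pipeline can pin the feature table to the version it was trained on instead of silently picking up refreshed values.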
The preferred tools are listed below:
- Workflow Management: Apache Airflow 2.0 / Prefect
- Storage: any object or block storage, such as HDFS, S3, or Azure Blob
- Database: Delta Lake (cloud-agnostic)
- Data Quality: Deequ / Great Expectations (cloud-agnostic)
- Feature Store: custom-built on Delta Lake (cloud-agnostic) / FEAST
- Model Management: MLflow (cloud-agnostic) / Weights & Biases
- Versioning: Git, GitLab
- CI/CD: Jenkins / GitHub Actions
- Model Serving on Kubernetes: Seldon (recommended) / KFServing (cloud-agnostic)
- Model Monitoring: Seldon Alibi Detect (cloud-agnostic)
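The model-monitoring idea can be sketched in a few lines. The toy check below (pure Python, not the Alibi Detect API) flags data drift when the mean of a live feature window moves too far, in reference-standard-deviation units, from the training distribution; production detectors such as Alibi Detect use stronger statistical tests like Kolmogorov-Smirnov or MMD, but the alerting pattern is the same.

```python
from statistics import mean, stdev

def drift_score(reference, live):
    """Shift of the live mean from the reference mean,
    measured in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)

def check_drift(reference, live, threshold=3.0):
    # True means the live distribution has drifted: raise an alert.
    return drift_score(reference, live) > threshold

reference = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]  # feature values at training time
stable = [1.0, 0.98, 1.02]                    # live traffic, similar distribution
shifted = [5.0, 5.1, 4.9]                     # live traffic after a data change

check_drift(reference, stable)   # False: no alert
check_drift(reference, shifted)  # True: trigger retraining / investigation
```

In a serving stack, such a detector typically runs alongside the model endpoint, consuming the same request payloads and pushing alerts into the monitoring pipeline.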
That's it! In the next post, we will take a deep dive into model training, its challenges, and how to address them through MLOps practices.