MLOps aka Operational AI (Part-3)
In this article, we walk through the end-to-end (E2E) flow for operationalizing AI with popular open-source tools. The idea is to keep the architecture and toolset as cloud-agnostic as possible.
A typical machine learning lifecycle can be broadly divided into three stages:
- Data Processing (Data Engineer Domain)
- Model Training (Data Scientist Domain)
- Model Serving (Data Engineer and DevOps Domain)
In the Data Processing stage, the focus is on building Data Workflow Management, Data Lineage, Data Labelling, Data Quality, and Feature generation.
In the Model Training stage, the focus is on Model Experiment tracking, Distributed training, Metadata Management, and Hyperparameter optimization.
In the Model Serving stage, the focus is on Packaging, Monitoring, Model explainability, and server audit trails.
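To make the three stages concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names (`process_data`, `train`, `make_server`), the toy `width`/`height` data, and the in-memory lists standing in for an experiment tracker and an audit trail are all hypothetical, and a real setup would use the tools listed later in this article.

```python
from statistics import mean

def process_data(raw_rows):
    """Data Processing: apply a data-quality check, then generate a feature."""
    # Quality check: drop rows with any missing value.
    clean = [r for r in raw_rows if None not in r.values()]
    # Feature generation: derive `area` from the raw columns.
    return [{"area": r["width"] * r["height"], "y": r["y"]} for r in clean]

def train(features, learning_rates=(0.001, 0.01), epochs=200):
    """Model Training: fit y ~ w * area once per learning rate, tracking every run."""
    runs = []  # hypothetical stand-in for an experiment tracker
    for lr in learning_rates:
        w = 0.0
        for _ in range(epochs):
            # Gradient of the mean squared error with respect to w.
            grad = mean(2 * (w * f["area"] - f["y"]) * f["area"] for f in features)
            w -= lr * grad
        mse = mean((w * f["area"] - f["y"]) ** 2 for f in features)
        runs.append({"lr": lr, "w": w, "mse": mse})
    # Hyperparameter optimization: keep the run with the lowest error.
    best = min(runs, key=lambda run: run["mse"])
    return best, runs

def make_server(model):
    """Model Serving: wrap the model in a predict function that keeps an audit trail."""
    audit_log = []  # hypothetical stand-in for a serving audit trail
    def predict(area):
        prediction = model["w"] * area
        audit_log.append({"input": area, "output": prediction})
        return prediction
    return predict, audit_log

raw_rows = [
    {"width": 1, "height": 1, "y": 2},
    {"width": 1, "height": 2, "y": 4},
    {"width": 3, "height": 1, "y": 6},
    {"width": 2, "height": 2, "y": 8},
    {"width": None, "height": 5, "y": 1},  # fails the quality check
]
features = process_data(raw_rows)
best_run, all_runs = train(features)
predict, audit_log = make_server(best_run)
print(best_run["lr"], round(predict(3), 3))
```

Each function maps to one stage, so the hand-off points between the Data Engineer, Data Scientist, and DevOps domains are the function boundaries: features go in one direction, a tracked model in the next, and a callable endpoint with a log at the end.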
The end-to-end flow diagram is broken into the following parts:
- Business Understanding and Data handling
- Feature Engineering (Custom Feature Store with Versioning using Delta Lake's time travel feature)
- Model Training, Evaluation, and Validation
- Model Management (Model Registry, Experimentation, Versioning, Artifacts, and Metadata tracking)
- Model Explainability
- Model Deployment and Serving (Packaging, Scaling, throughput, and Monitoring)
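The feature store versioning idea in the list above can be sketched as follows. This is a toy, in-memory stand-in that only mimics the concept of time travel; it does not use Delta Lake, and the class name `VersionedFeatureTable` and the `version_as_of` parameter are hypothetical names chosen for illustration. In Delta Lake itself, the comparable read is done through the `versionAsOf` reader option.

```python
import copy

class VersionedFeatureTable:
    """Toy feature table: every write commits a new immutable snapshot."""

    def __init__(self):
        self._versions = []  # list index doubles as the version number

    def write(self, rows):
        """Commit a full snapshot of the features and return its version number."""
        self._versions.append(copy.deepcopy(rows))
        return len(self._versions) - 1

    def read(self, version_as_of=None):
        """Return the latest snapshot, or the snapshot committed as `version_as_of`."""
        if not self._versions:
            raise ValueError("table has no committed versions")
        latest = len(self._versions) - 1
        idx = latest if version_as_of is None else version_as_of
        if not 0 <= idx <= latest:
            raise ValueError(f"unknown version: {idx}")
        # Copy on the way out so callers cannot mutate history.
        return copy.deepcopy(self._versions[idx])

table = VersionedFeatureTable()
v0 = table.write([{"customer_id": 1, "avg_spend": 10.0}])
v1 = table.write([{"customer_id": 1, "avg_spend": 12.5}])

training_features = table.read(version_as_of=v0)  # features as they were at v0
serving_features = table.read()                   # latest features
print(training_features, serving_features)
```

Pinning a model to the feature version it was trained on is what makes a training run reproducible later, even after the features have been recomputed.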
The preferred tools are listed below:
- Workflow Management: Apache Airflow 2.0 / Prefect
- Storage: Any object or block storage like HDFS, S3, or Azure Blob
- Database: Delta Lake (Cloud agnostic)
- Data Quality: Deequ / Great Expectations (Cloud agnostic)
- Feature Store: Custom build on Delta Lake (Cloud agnostic) / Feast
- Model Management: MLflow (Cloud agnostic) / Weights & Biases
- Versioning: Git, GitLab
- CI/CD: Jenkins / GitHub Actions
- Model Serving on Kubernetes: Seldon (Recommended) / KFServing (Cloud agnostic)
- Model Monitoring: Seldon Alibi Detect (Cloud agnostic)
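To give a feel for what the model monitoring entry in the list above covers, here is a small sketch of drift detection using a two-sample Kolmogorov-Smirnov statistic written in plain Python. It is not the Alibi Detect API (that library ships its own detectors, such as `KSDrift`); the function names and the fixed `threshold` are hypothetical, and a production detector would normally decide on a p-value rather than a raw cutoff.

```python
from bisect import bisect_right

def ks_statistic(reference, production):
    """Largest gap between the empirical CDFs of the two samples."""
    ref = sorted(reference)
    prod = sorted(production)
    largest_gap = 0.0
    # The gap between two step-function CDFs can only change at sample points.
    for value in set(ref) | set(prod):
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_prod = bisect_right(prod, value) / len(prod)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_prod))
    return largest_gap

def has_drift(reference, production, threshold=0.5):
    """Flag drift when the KS statistic exceeds a (hypothetical) threshold."""
    return ks_statistic(reference, production) > threshold

training_sample = [1, 2, 3, 4]
same_distribution = [1, 2, 3, 4]
shifted_sample = [5, 6, 7, 8]

print(has_drift(training_sample, same_distribution))
print(has_drift(training_sample, shifted_sample))
```

The reference sample would typically be a slice of the training data, and the production sample a recent window of inputs seen by the served model.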
That's it! In the next post, we will take a deep dive into model training, its challenges, and how to address them with MLOps practices.