MLOps aka Operational AI (Part-3)
--
In this article, we will walk through the end-to-end (E2E) flow of operationalizing AI using popular open-source tools. The idea is to keep the architecture and toolset cloud-agnostic.
A typical Machine Learning lifecycle is broadly divided into three stages:
- Data Processing (Data Engineer Domain)
- Model Training (Data Scientist Domain)
- Model Serving (Data Engineer and DevOps Domain)
Data Processing:
In this stage, the focus is on building Data Workflow Management, Data Lineage, Data Labelling, Data Quality, and Feature generation.
Model Training:
In this stage, the focus is on Model Experiment tracking, Distributed training, Metadata Management, and Hyperparameter optimization.
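To make the experiment-tracking idea concrete, here is a minimal pure-Python sketch (a toy stand-in, not the MLflow API): each hyperparameter trial becomes a run with logged params and metrics, and the best run is selected by a metric, which is the core workflow that tools like MLflow or Weights & Biases provide at scale.

```python
import uuid

class ExperimentTracker:
    """Toy experiment tracker: records params and metrics per run,
    similar in spirit to what MLflow's tracking server does."""

    def __init__(self):
        self.runs = {}

    def start_run(self, params):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": params, "metrics": {}}
        return run_id

    def log_metric(self, run_id, name, value):
        self.runs[run_id]["metrics"][name] = value

    def best_run(self, metric):
        # Return (run_id, run) with the highest value of the metric.
        return max(self.runs.items(), key=lambda kv: kv[1]["metrics"][metric])

tracker = ExperimentTracker()
for lr in (0.1, 0.01, 0.001):
    run_id = tracker.start_run({"lr": lr})
    # In a real pipeline this metric would come from model evaluation.
    tracker.log_metric(run_id, "accuracy", 0.9 if lr == 0.01 else 0.8)

best_id, best = tracker.best_run("accuracy")
print(best["params"])  # hyperparameters of the winning run
```

In a real setup the tracker also stores artifacts (model files, plots) and metadata per run, which is what turns ad-hoc notebook experiments into reproducible, comparable training jobs.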
Model Serving:
In this stage, the focus is on Packaging, Monitoring, Model explainability, and server audit trails.
The below flow diagram is broken into:
- Business Understanding and Data handling
- Feature Engineering (Custom Feature Store with Versioning using Delta Lake's time travel feature)
- Model Training, Evaluation, and Validation
- Model Management (Model Registry, Experimentation, Versioning, Artifacts, and Metadata tracking)
- Model Explainability
- Model Deployment and Serving (Packaging, Scaling, throughput, and Monitoring)
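The versioning idea behind the feature store is worth illustrating. Below is a minimal pure-Python sketch (an in-memory toy, not the Delta Lake API) of the concept its time travel feature provides: every write commits a new immutable version, so a training job can read features "as of" an earlier version and reproduce its exact training data.

```python
import copy

class VersionedFeatureStore:
    """Toy feature store: each write creates a new immutable snapshot,
    mimicking the 'versionAsOf' reads that Delta Lake time travel offers."""

    def __init__(self):
        self._versions = []  # full snapshots; list index == version number

    def write(self, features):
        # Start from the latest snapshot and apply the new feature values.
        snapshot = copy.deepcopy(self._versions[-1]) if self._versions else {}
        snapshot.update(features)
        self._versions.append(snapshot)
        return len(self._versions) - 1  # version number of this commit

    def read(self, version_as_of=None):
        # Default: read the latest version (the table head).
        idx = len(self._versions) - 1 if version_as_of is None else version_as_of
        return self._versions[idx]

store = VersionedFeatureStore()
v0 = store.write({"user_42_avg_spend": 10.0})
v1 = store.write({"user_42_avg_spend": 12.5})  # feature refreshed

latest = store.read()                  # serving reads the newest values
as_of_v0 = store.read(version_as_of=v0)  # training reproduces version v0
```

The payoff is reproducibility: when a model trained last month needs to be audited or retrained, the pipeline can pin the feature table to the version it was trained on instead of silently picking up refreshed values.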
The preferred tools are listed below:
- Workflow Management: Apache Airflow 2.0 / Prefect
- Storage: any object or block storage, such as HDFS, S3, or Azure Blob
- Database: Delta Lake (cloud-agnostic)
- Data Quality: Deequ / Great Expectations (cloud-agnostic)
- Feature Store: custom-built on Delta Lake (cloud-agnostic) / FEAST
- Model Management: MLflow (cloud-agnostic) / Weights & Biases
- Versioning: Git, GitLab
- CI/CD: Jenkins / GitHub Actions
- Model Serving on Kubernetes: Seldon (recommended) / KFServing (cloud-agnostic)
- Model Monitoring: Seldon Alibi Detect (cloud-agnostic)
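The model-monitoring idea can be sketched in a few lines. The toy check below (pure Python, not the Alibi Detect API) flags data drift when the mean of a live feature window moves too far, in reference-standard-deviation units, from the training distribution; production detectors such as Alibi Detect use stronger statistical tests like Kolmogorov-Smirnov or MMD, but the alerting pattern is the same.

```python
from statistics import mean, stdev

def drift_score(reference, live):
    """Shift of the live mean from the reference mean,
    measured in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)

def check_drift(reference, live, threshold=3.0):
    # True means the live distribution has drifted: raise an alert.
    return drift_score(reference, live) > threshold

reference = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]  # feature values at training time
stable = [1.0, 0.98, 1.02]                    # live traffic, similar distribution
shifted = [5.0, 5.1, 4.9]                     # live traffic after a data change

check_drift(reference, stable)   # False: no alert
check_drift(reference, shifted)  # True: trigger retraining / investigation
```

In a serving stack, such a detector typically runs alongside the model endpoint, consuming the same request payloads and pushing alerts into the monitoring pipeline.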
That's it! In the next post, we will take a deep dive into model training, its challenges, and how to address them through MLOps practices.