ML Inference & MLflow Integration

This page walks through the ML inference step, which scores records from the Silver Delta table with a registered Isolation Forest model. MLflow handles run tracking, model versioning, and evaluation logging, which keeps the pipeline reproducible and production-ready.

Step 1: Registering the Model in MLflow

The Isolation Forest model was trained on features from the Silver table and logged to MLflow. It was then registered in the MLflow Model Registry so that inference can pin a specific model version.

Model registry showing version control and model ownership.
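
The registration step could be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's actual training code: the model name `isolation_forest_events`, the hyperparameters, and all helper names are hypothetical, and `features` stands for whatever numeric feature matrix is derived from the Silver table.

```python
"""Sketch: train an Isolation Forest, log it to MLflow, and register it."""


def registered_model_uri(name: str, version: int) -> str:
    """Build the MLflow URI that pins one registered model version."""
    return f"models:/{name}/{version}"


def train_and_register(features, model_name: str = "isolation_forest_events") -> str:
    """Fit the model, log it to MLflow, and register a new version.

    `features` is a 2-D array-like of numeric Silver-table features.
    The model name and hyperparameters are illustrative assumptions.
    Returns the MLflow run ID of the training run.
    """
    # Imported here so the URI helper above works without MLflow installed.
    import mlflow
    import mlflow.sklearn
    from sklearn.ensemble import IsolationForest

    params = {"n_estimators": 100, "random_state": 42}
    model = IsolationForest(**params).fit(features)

    with mlflow.start_run(run_name="train_isolation_forest") as run:
        mlflow.log_params(params)
        # registered_model_name creates the registry entry if needed,
        # otherwise it adds a new version to the existing entry.
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name=model_name,
        )
    return run.info.run_id


def load_for_inference(model_name: str, version: int):
    """Load a pinned version as a native scikit-learn estimator."""
    import mlflow.sklearn

    return mlflow.sklearn.load_model(registered_model_uri(model_name, version))
```

Pinning a version through a `models:/<name>/<version>` URI means a batch run keeps scoring with the same model even after newer versions are registered.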

Step 2: MLflow Run Metadata

This batch inference run logs metrics, parameters, and artifacts such as predictions and evaluation results.

Overview of MLflow experiment run tracking all pipeline metadata.

Step 3: Tracked Metrics & Artifacts

MLflow tracked the anomaly score distribution and the total number of events scored, so each run can be audited and monitored.

MLflow tracked metrics: 1,020 events scored, 0.80 average anomaly score.
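
As a rough sketch of how such metrics might be produced and logged, the helper below reduces a batch of scores to the two values shown above and records them in an MLflow run. The metric names `events_scored` and `avg_anomaly_score`, the run name, and the function names are assumptions made for illustration; the page does not state the names the pipeline uses.

```python
from statistics import fmean
from typing import Dict, Mapping, Sequence


def summarize_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Reduce a batch of anomaly scores to the metrics logged per run."""
    if len(scores) == 0:
        raise ValueError("no scores to summarize")
    return {
        # MLflow metrics are numeric, so the count is stored as a float.
        "events_scored": float(len(scores)),
        "avg_anomaly_score": fmean(scores),
    }


def log_inference_run(scores: Sequence[float], params: Mapping[str, object]) -> str:
    """Log parameters and summary metrics for one batch inference run.

    Returns the MLflow run ID so it can be attached to the scored records.
    """
    # Imported here so summarize_scores can be used without MLflow installed.
    import mlflow

    with mlflow.start_run(run_name="batch_inference") as run:
        mlflow.log_params(dict(params))
        mlflow.log_metrics(summarize_scores(scores))
    return run.info.run_id
```

A call such as `log_inference_run(scores, {"model_version": 3, "threshold": -0.5})` would record the run's inputs alongside its metrics.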

Step 4: Output to Delta Gold Table

Scored records are written to `gold_events_scored` with metadata like timestamp, run ID, and prediction flags.

Preview of enriched Delta table showing scored events with anomaly scores.
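
One way the write to `gold_events_scored` might look is sketched below. The column names `mlflow_run_id`, `model_version`, and `scored_at` are assumptions, as is the use of the Delta `userMetadata` commit option; `scored_df` is assumed to be a Spark DataFrame that already holds the anomaly scores and prediction flags.

```python
from typing import List


def metadata_exprs(run_id: str, model_version: str) -> List[str]:
    """Spark SQL select expressions: keep every column, add run metadata.

    The values are interpolated into SQL literals, so they are assumed to
    be trusted strings without quote characters (for example an MLflow run ID).
    """
    return [
        "*",
        f"'{run_id}' AS mlflow_run_id",
        f"'{model_version}' AS model_version",
        "current_timestamp() AS scored_at",
    ]


def write_gold(
    scored_df,
    run_id: str,
    model_version: str,
    table: str = "gold_events_scored",
) -> None:
    """Append scored records, enriched with run metadata, to the Gold table."""
    enriched = scored_df.selectExpr(*metadata_exprs(run_id, model_version))
    (
        enriched.write.format("delta")
        .mode("append")
        # userMetadata is stored with the Delta commit, which makes the
        # run ID visible in the table history.
        .option("userMetadata", f"mlflow_run_id={run_id}")
        .saveAsTable(table)
    )
```

Appending rather than overwriting keeps earlier inference runs in the table, so each record can still be traced to the run that produced it.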

Step 5: Anomaly Score Distribution

Visualizations highlight the distribution of prediction scores and flagged anomalies.

Confusion matrix and KDE plot used to choose the decision threshold for scoring.
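
Applying a decision threshold to the scores can be sketched as below. The direction of the comparison is an assumption: the sketch follows scikit-learn's `score_samples` convention, in which lower scores are more anomalous, and this page does not say which convention the pipeline uses. The function names are hypothetical.

```python
from typing import List, Sequence


def flag_anomalies(scores: Sequence[float], threshold: float = -0.5) -> List[bool]:
    """Return True for each score strictly below the threshold.

    Assumes lower scores mean "more anomalous"; flip the comparison if the
    pipeline's scores are oriented the other way.
    """
    return [score < threshold for score in scores]


def anomaly_rate(flags: Sequence[bool]) -> float:
    """Fraction of records flagged as anomalies (0.0 for an empty batch)."""
    return sum(flags) / len(flags) if len(flags) > 0 else 0.0
```

Comparing `anomaly_rate` across candidate thresholds is one simple way to sanity-check a threshold read off a density plot.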

Step 6: Summary of Inference Run

Records Scored: 1,020
Model Version: v3 (registered in MLflow)
Avg Anomaly Score: 0.80
Threshold Used: -0.5
Gold Table: gold_events_scored

Step 7: Delta Lake Audit History

Running `DESCRIBE HISTORY` on the Gold table lists every write as a versioned Delta Lake commit, which makes each ML run auditable.

Delta Lake history logs every inference write, including schema version and run ID.
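
A minimal sketch of querying that history from PySpark follows. It assumes `spark` is an active SparkSession with Delta Lake enabled; the helper names are hypothetical. The selected columns are part of the standard Delta history output, and `userMetadata` is only populated when the write set that option, which is one common way to attach a run ID to a commit.

```python
def history_query(table: str = "gold_events_scored", limit: int = 10) -> str:
    """Build the SQL statement that lists the most recent table commits."""
    return f"DESCRIBE HISTORY {table} LIMIT {limit}"


def inference_history(spark, table: str = "gold_events_scored", limit: int = 10):
    """Return the audit-relevant columns of the table history as a DataFrame.

    `spark` is assumed to be an active SparkSession with Delta Lake enabled.
    """
    return spark.sql(history_query(table, limit)).select(
        "version",              # Delta table version created by the commit
        "timestamp",            # when the commit happened
        "operation",            # e.g. WRITE or CREATE TABLE AS SELECT
        "operationParameters",  # write mode and related settings
        "userMetadata",         # free-form commit note, if the writer set one
    )
```

Calling `inference_history(spark).show(truncate=False)` would print one row per commit, newest first.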