This pipeline demonstrates a production-grade streaming system that ingests clickstream events from Kafka, refines them through the medallion layers of Delta Lake (Bronze → Silver → Gold), and performs anomaly detection with MLflow. The architecture runs on Databricks, where each medallion layer progressively cleans and enriches the streaming event data.
Diagram: Real-time events flow from Kafka → Delta Bronze → Silver → MLflow → Gold Delta Table
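A minimal sketch of the first hop, Kafka → Bronze, using Structured Streaming; the broker address, `clickstream` topic, checkpoint path, and table name are placeholder assumptions, not the pipeline's actual configuration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw clickstream events from Kafka as a streaming DataFrame.
# Broker address and topic name are illustrative assumptions.
raw_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .option("startingOffsets", "latest")
    .load()
)

# Land events as-is in the Bronze Delta table; the checkpoint location
# gives the stream exactly-once recovery semantics on restart.
(
    raw_events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze_clickstream")
    .outputMode("append")
    .toTable("bronze.clickstream_events")
)
```

Keeping Bronze as a near-verbatim copy of the Kafka payload means downstream Silver logic can be reworked and replayed without re-reading the source topic.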
The pipeline is orchestrated as a Databricks Workflows job with separate tasks for ingesting Kafka events, transforming data, running batch inference, and writing scored output to the Gold table.
Workflow DAG: Kafka ingestion → Transformation → ML Inference → Gold Output
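One way to express that DAG programmatically is with the Databricks SDK for Python; this is a sketch under assumptions, with illustrative notebook paths and task keys, and cluster configuration omitted for brevity:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

def task(key, path, depends_on=None):
    # Helper to build one notebook task; depends_on wires the DAG edges.
    # Paths and keys below are hypothetical, not the pipeline's real names.
    return jobs.Task(
        task_key=key,
        notebook_task=jobs.NotebookTask(notebook_path=path),
        depends_on=[jobs.TaskDependency(task_key=d) for d in (depends_on or [])],
    )

w.jobs.create(
    name="clickstream-anomaly-pipeline",
    tasks=[
        task("ingest_kafka", "/Pipelines/ingest_kafka"),
        task("transform_silver", "/Pipelines/transform_silver", ["ingest_kafka"]),
        task("ml_inference", "/Pipelines/ml_inference", ["transform_silver"]),
        task("write_gold", "/Pipelines/write_gold", ["ml_inference"]),
    ],
)
```

Expressing the DAG as linear task dependencies keeps failure handling simple: a failed task halts downstream tasks, and the job can be repaired and rerun from the point of failure.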
MLflow powers model lifecycle management and the batch inference logic. Each scoring run logs an `avg_anomaly_score` metric, records its MLflow `run_id` for traceability, and saves a `sample_predictions.json` artifact, as sketched below.
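A hedged sketch of how the inference task might log those three values; the registered model name `clickstream_anomaly`, its stage, and the table name are assumptions, not the pipeline's actual identifiers:

```python
import json

import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

with mlflow.start_run(run_name="batch_inference") as run:
    # Load the registered anomaly model; name and stage are assumptions.
    model = mlflow.pyfunc.load_model("models:/clickstream_anomaly/Production")

    # Score the Silver table (pulled to pandas here for simplicity; a real
    # pipeline would more likely score at scale with a Spark UDF).
    events = spark.table("silver.clickstream_events").toPandas()
    events["anomaly_score"] = model.predict(events)

    # avg_anomaly_score: logged as a metric so runs are comparable over time.
    mlflow.log_metric("avg_anomaly_score", float(events["anomaly_score"].mean()))

    # sample_predictions.json: a small sample of scored rows for spot checks.
    with open("sample_predictions.json", "w") as f:
        json.dump(events.head(20).to_dict(orient="records"), f, default=str)
    mlflow.log_artifact("sample_predictions.json")

    # run_id: ties the Gold output back to this specific inference run.
    print(f"run_id={run.info.run_id}")
```

Stamping each Gold row with the `run_id` makes any anomaly score traceable back to the exact model version and inference run that produced it.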
Because the Gold output is a Delta table, `DESCRIBE HISTORY` provides a built-in audit trail: every batch of scored results written by the pipeline appears as a versioned commit.
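For instance, the Gold table's commit log can be inspected from a notebook (the table name `gold.scored_events` is an assumption):

```python
# Each row of DESCRIBE HISTORY is one committed write to the Delta table,
# recording when it happened, which operation ran, and its parameters.
spark.sql("DESCRIBE HISTORY gold.scored_events").select(
    "version", "timestamp", "operation", "operationParameters"
).show(truncate=False)
```

The same version numbers also enable time travel, so any earlier snapshot of the scored output can be queried for debugging or backfill comparisons.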