System Architecture

This pipeline demonstrates a production-ready streaming system that ingests clickstream events from Kafka, processes and stores them in Delta Lake (Bronze → Silver → Gold), and performs anomaly detection using MLflow.

🧱 End-to-End Data Flow

This architecture is built on Databricks and leverages the medallion (Bronze/Silver/Gold) architecture pattern to progressively refine streaming event data.
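A minimal sketch of the Kafka → Bronze leg, assuming Spark Structured Streaming on Databricks; the broker address, topic name, checkpoint path, and table name are illustrative placeholders, not the project's actual values.

```python
# Sketch: ingest raw Kafka events into the Bronze Delta table.
# Broker, topic, checkpoint, and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

bronze_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "clickstream-events")         # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Persist the raw payload as-is; parsing is deferred to the Silver step.
(
    bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/bronze")       # placeholder path
    .outputMode("append")
    .toTable("clickstream.bronze_events")              # placeholder table
)
```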

Diagram: Real-time events flow from Kafka → Delta Bronze → Silver → MLflow → Gold Delta table
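As a rough sketch of the Bronze → Silver hop in that flow, assuming the raw Kafka `value` carries a JSON event payload; the event schema, checkpoint path, and table names are illustrative assumptions.

```python
# Sketch: refine Bronze into Silver by parsing and deduplicating events.
# The event schema and table names are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("page", StringType()),
    StructField("event_time", TimestampType()),
])

silver = (
    spark.readStream.table("clickstream.bronze_events")
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    # Bound dedup state: drop replays arriving within the watermark window.
    .withWatermark("event_time", "10 minutes")
    .dropDuplicates(["event_id", "event_time"])
)

(
    silver.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/silver")  # placeholder path
    .outputMode("append")
    .toTable("clickstream.silver_events")
)
```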

Orchestrated with Databricks Workflows

The pipeline is orchestrated with Databricks Workflows: separate tasks ingest Kafka events, transform the data, run batch inference, and write scored outputs to Gold. A sketch of the task graph follows.
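A hedged sketch of how that task graph might be expressed as a Databricks Jobs API 2.1 create payload; the job name, task keys, and notebook paths are hypothetical placeholders.

```python
# Sketch: the workflow's task graph as a Jobs API 2.1 payload
# (submitted via POST /api/2.1/jobs/create). Task keys and notebook
# paths are hypothetical; depends_on mirrors the DAG described above.
job_spec = {
    "name": "clickstream-pipeline",
    "tasks": [
        {"task_key": "ingest_kafka",
         "notebook_task": {"notebook_path": "/Pipelines/ingest"}},
        {"task_key": "transform",
         "depends_on": [{"task_key": "ingest_kafka"}],
         "notebook_task": {"notebook_path": "/Pipelines/transform"}},
        {"task_key": "ml_inference",
         "depends_on": [{"task_key": "transform"}],
         "notebook_task": {"notebook_path": "/Pipelines/inference"}},
        {"task_key": "write_gold",
         "depends_on": [{"task_key": "ml_inference"}],
         "notebook_task": {"notebook_path": "/Pipelines/gold"}},
    ],
}
```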

Workflow DAG: Kafka ingestion → Transformation → ML Inference → Gold Output

MLflow Integration

MLflow powers model lifecycle management and the batch inference logic. This includes:

- Registering and versioning the anomaly detection model in the MLflow Model Registry
- Loading the registered model to run batch inference over Silver data
- Writing scored (anomaly-flagged) outputs to the Gold Delta table, as sketched below
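As a rough sketch of that batch inference step, assuming the model is registered in the MLflow Model Registry under a hypothetical name `clickstream_anomaly`; the feature columns and anomaly threshold are also assumptions.

```python
# Sketch: load a registered model and score Silver events into Gold.
# Model name/stage, feature columns, and threshold are assumptions.
import mlflow
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, struct

spark = SparkSession.builder.getOrCreate()

# Wrap the registered model as a Spark UDF for distributed scoring.
score_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/clickstream_anomaly/Production"
)

silver = spark.read.table("clickstream.silver_events")

scored = silver.withColumn(
    "anomaly_score",
    score_udf(struct("user_id", "page")),  # assumed feature columns
)

(
    scored.withColumn("is_anomaly", col("anomaly_score") > 0.9)  # placeholder threshold
    .write.format("delta")
    .mode("append")
    .saveAsTable("clickstream.gold_scored_events")
)
```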

System Highlights