Data Mining and Machine Learning Series
Big Data Stream Processing and Iterative Learning
20th November 2019, 11:00
Gautam Pal
XJTLU
Abstract
Big data streaming is a process in which data is processed quickly in order to extract real-time insights. Processing is performed on data in motion, before it is even stored in a datastore. Big data streaming is a speed-focused approach in which a continuous stream of data is processed. When an application cannot wait until the entire dataset has been collected, a streaming approach makes it possible to train a model iteratively on the data available at hand. By re-training iteratively, the model can adapt much faster to new updates in the raw dataset.
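The iterative-training idea above can be sketched with a minimal online stochastic gradient descent loop. This is a pure-Python stand-in, not the method presented in the talk: the stream, the learning rate, and the linear model are all illustrative assumptions, chosen to show how each arriving sample updates the model without waiting for the full dataset.

```python
import random

def train_incrementally(stream, lr=0.01):
    """Fit y ~ w*x + b by online SGD: one small update per arriving sample."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y
        w -= lr * err * x  # gradient step for the weight
        b -= lr * err      # gradient step for the bias
    return w, b

random.seed(0)
# Simulated unbounded stream: samples arrive one at a time from y = 2x + 1.
stream = ((x, 2 * x + 1) for x in (random.uniform(-1, 1) for _ in range(5000)))
w, b = train_incrementally(stream)
print(round(w, 2), round(b, 2))
```

The model never sees the dataset as a whole; it is updated in place as records arrive, which is the property that lets a streaming system keep its training set in step with the raw data.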
This talk explores big data stream ingestion with Apache Kafka, and mini-batch processing and visualization with Apache Spark and Splunk. The talk will present a method for streaming machine learning.
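Mini-batch processing, as used by Spark-style stream engines, amounts to grouping an unbounded event stream into small fixed-size chunks that are then processed as ordinary batches. A minimal sketch, using a plain Python generator in place of an actual Kafka/Spark pipeline (the event names and batch size are illustrative assumptions):

```python
from itertools import islice

def mini_batches(stream, batch_size):
    """Group an unbounded event stream into fixed-size mini-batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical events standing in for records ingested from a Kafka topic.
events = (f"event-{i}" for i in range(10))
batch_sizes = [len(b) for b in mini_batches(events, 4)]
print(batch_sizes)  # → [4, 4, 2]
```

Each mini-batch can then be handed to a batch-oriented step, such as the incremental model update or a visualization refresh, trading a small amount of latency for per-batch throughput.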
Maintained by Danushka Bollegala