COMP529

Big Data Analysis

Aims

To introduce the student to middleware often used in Big Data analytics.

To introduce the student to implementing algorithms using such middleware.

Syllabus

Week 1: Introduction to Big Data, motivating real-world applications and assumed dependencies (including a discussion of the operating system).

Week 2: Setting up middleware for batch analytics, with a specific focus on installing Hadoop and running a MapReduce job.

Week 3: Introduction to probabilistic modelling of large datasets (e.g. Latent Dirichlet Allocation).

Week 4: Scalable algorithms for analysing large datasets (i.e. Bayesian network algorithms).

Week 5: Porting such algorithms to Hadoop.

Week 6: Real-world applications of batch analytics.

Week 7: Setting up middleware for streaming analytics, with a specific focus on installing IBM’s InfoSphere Streams and adding a streaming operator.

Week 8: Introduction to Sequential Bayesian Inference.

Week 9: Algorithms for analysing streaming data (e.g. the Kalman filter).

Week 10: Porting such algorithms to Streams.

Week 11: Real-world applications of streaming analytics.

Week 12: Beyond separate batch and streaming analytics.


Recommended Texts

Hadoop: The Definitive Guide, Tom White, Third Edition, 2012. ISBN-10: 1449311520.

Learning Outcomes

Understanding of algorithmic approaches for handling batch and streaming analysis.

Understanding of middleware that can be used to enable algorithms to scale up to analysis of large datasets.

Understanding of the impact of the middleware on how algorithms are articulated.

Learning Strategy

Lectures

Tutorials

Private study

Formal Lectures: In a typical week, students will be expected to attend three hours of formal lectures plus a one-hour supervised tutorial.

Private study: In a typical week, students will be expected to devote six hours of unsupervised time to private study. This will typically comprise three hours for reflection on lecture material and background reading, and three hours for completion of practical exercises.