Networks and Distributed Computing Series

Some Probabilistic and Combinatorial Techniques in Data Stream Analysis

15th July 2021, 12:00 add to calender
Conrado Martinez
Universitat Politecnica de Catalunya

Abstract

I will give in this talk a brief account of some elegant and mathematically sound techniques
and algorithms that provide practical solutions to fundamental problems in data streaming. These include cardinality estimation (counting the number of distinct elements in a stream), drawing random samples, finding heavy hitters (elements of high relative frequency) and top-$k$ most frequent elements, estimating the similarity of two streams, etc.

In the talk I will focus in one or two of the problems above, give a few well-known examples (the list from which we can select is huge, for example, Probabilistic Counting, LogLog and HyperLogLog, KMV, SpaceSaving, Frequent, StickySampling, Adaptive Sampling, \ldots) and I will also present some of the results of my joint research with A. Helmi, J. Lumbroso, J. Montes, G. Solera, A. Viola and J. Wang, in recent years.
add to calender (including abstract)