Networks and Distributed Computing Series
Some Probabilistic and Combinatorial Techniques in Data Stream Analysis
15th July 2021, 12:00
Conrado Martinez
Universitat Politecnica de Catalunya
Abstract
I will give in this talk a brief account of some elegant and mathematically sound techniques
and algorithms that provide practical solutions to fundamental problems in data streaming. These include cardinality estimation (counting the number of distinct elements in a stream), drawing random samples, finding heavy hitters (elements of high relative frequency) and top-$k$ most frequent elements, estimating the similarity of two streams, etc.
In the talk I will focus in one or two of the problems above, give a few well-known examples (the list from which we can select is huge, for example, Probabilistic Counting, LogLog and HyperLogLog, KMV, SpaceSaving, Frequent, StickySampling, Adaptive Sampling, \ldots) and I will also present some of the results of my joint research with A. Helmi, J. Lumbroso, J. Montes, G. Solera, A. Viola and J. Wang, in recent years.
Ashton Street, Liverpool, L69 3BX
United Kingdom
Call the department
+44 (0)151 795 4275