The course presents mathematical and algorithmic techniques for dealing with very large data sets. The need to process such large datasets is now ubiquitous, arising in Internet search, machine learning, network traffic monitoring, scientific computing, signal processing, and other areas.
For such datasets, even linear complexity is often infeasible. This invalidates many classical approaches to algorithm and data structure design, and creates the need to represent the data in a fashion that preserves selected structural properties but is otherwise highly compressed, as well as the need for new algorithmic approaches.
__Example 1:__
High-dimensional bit-vectors (think of: for each user of a social network, we store which items they liked or disliked). Surprisingly, instead of storing each bit-vector explicitly (which could take gigabytes _per user_), we can design suitable sketches. Each sketch takes logarithmic space, and we can query bit-vectors for similarity in logarithmic time! A small illustrative sketch follows.
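As an illustration, here is a minimal Python sketch of one such technique, MinHash for estimating Jaccard similarity between item sets. This is only an assumed example (the class name, parameters, and hash construction are ours), not necessarily the exact construction covered in the course.

```python
import random

class MinHash:
    """Minimal MinHash sketch: estimates the Jaccard similarity of two
    sets from k independent hash functions of the form (a*x + b) mod p."""

    def __init__(self, k=128, seed=0):
        rng = random.Random(seed)
        self.p = (1 << 61) - 1  # a large Mersenne prime
        # One (a, b) coefficient pair per hash function
        self.coeffs = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(k)]

    def sketch(self, items):
        """Return the k minimum hash values over the item set."""
        return [min((a * hash(x) + b) % self.p for x in items)
                for a, b in self.coeffs]

    @staticmethod
    def similarity(s1, s2):
        """The fraction of matching minima estimates the Jaccard similarity."""
        return sum(x == y for x, y in zip(s1, s2)) / len(s1)


if __name__ == "__main__":
    mh = MinHash(k=256)
    likes_a = {"item%d" % i for i in range(1000)}
    likes_b = {"item%d" % i for i in range(500, 1500)}
    sa, sb = mh.sketch(likes_a), mh.sketch(likes_b)
    # True Jaccard similarity is 500 / 1500 ≈ 0.33
    print("estimated similarity:", MinHash.similarity(sa, sb))
```

Each sketch has a fixed small size independent of the number of items, and two users' sketches can be compared without ever touching the original bit-vectors.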
__Example 2:__
Approximate matrix multiplication: we can multiply matrices much faster if we accept an output that is only a close approximation (in a suitable sense) of the actual product. A rough sketch of one such approach is shown below.
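For concreteness, here is a minimal NumPy sketch of one classical approach, randomized column/row sampling in the style of Drineas, Kannan, and Mahoney. The function name and parameters are ours, and the course may treat different constructions; this is a sketch under those assumptions, not the definitive method.

```python
import numpy as np

def approx_matmul(A, B, s, rng=None):
    """Approximate A @ B by sampling s column/row pairs.

    Column i of A and row i of B are sampled with probability
    proportional to ||A[:, i]|| * ||B[i, :]|| and rescaled, which makes
    the result an unbiased estimator of the exact product; the
    Frobenius-norm error shrinks as s grows.
    """
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    # Sampling probabilities proportional to the "importance" of each pair
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    idx = rng.choice(n, size=s, p=probs)
    # Rescale each sampled outer product to keep the estimator unbiased
    scale = 1.0 / (s * probs[idx])
    return (A[:, idx] * scale) @ B[idx, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 1000))
    B = rng.standard_normal((1000, 200))
    exact = A @ B
    approx = approx_matmul(A, B, s=300, rng=rng)
    rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
    print("relative Frobenius error:", rel_err)
```

Only s of the n inner-dimension indices are touched, so the cost scales with s rather than n, at the price of a controlled approximation error.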
__Note:__
This will be a rather *math-heavy* course. However, the mathematical tools used will be fairly basic (elementary probability and linear algebra).
The course should also be of interest to PhD students.
Materials from 2019 are available here (keep in mind that they will change substantially): https://github.com/izulin/uwrbigdata