Algorithms and Models for Big Data (Fall 2014)

Instructor: Periklis A. Papakonstantinou []
Teaching assistant: Yuanxi Dai
Lectures: Monday 1.30-4.05pm (3x45min) at TX112
Office hours: Monday 11am-12pm (or by appointment), FIT 4-608-7

Dec 26
The third problem-set is out

Start summarizing your term-project research in a single deliverable document. The deadline is in two weeks!


testing properties of very big objects, computation over data-streams, finite metric embeddings and high-dimensional geometry, inference and statistics.

Emphasis in the interface between mathematical analysis and engineering realization.
For more details see course outline.


In your online course registration system you have received a login and a password to read through the class project descriptions.

The topics of the projects are formulated by Alexandr Andoni, Yuval Filmus, Guang Yang, David Woodruff, and the instructor of this course. Some of these questions were well known before in the literature whereas many others are new research directions, some are small & new research projects (appropriate for a class project), and others are based on new ideas building upon existing ones. All of them are adjusted and made appropriate for a term project for an advanced undergraduate class. The questions that have not appeared before themselves constitute topics that can only be discussed confidentially with the students taking this class (and only them).

By clicking on the link below you automatically agree that you will not discuss questions and their variants without the written permission of the person who supervises the research project.

I agree take me to the term project description page

* click on the lecture date for reading material

Date Topic
Finish up the proof of Bourgain's theorem
Streaming, sketching, and finite metric embeddings
Basic technqiues: Dimensionality reduction (Johnson-Lindenstrauss), Bourgain's theorem. Full proof details and algorithmic implications.
Introduction to finite metric embeddings and applications
Sketching a stream, the AMS sketch for frequency moment estimation
Hashing and its likes for efficient randomized computation over data-streams
More details about testing graph properties
Introduction to streaming models and algorithms
Finish up the proof details of KKL
Back to Property Testing: Szemeredi's regularity lemma and applications

Discussion & Case-Studies: Examples from real-world applications
how does theory bind to practice & shortcomings of the current theory
Introduction to hypercontractivity.
The Kahn-Kalai-Linial inequality: influences, noise operators, and their relations, start with the proof of KKL
Full details on Masrour-Kushilevitz (Goldreich-Levin)
Finish up the BLR test analysis and discussion
Estimating fourier coefficients; learning boolean functions; concentrated specta; a discussion on the Masrour-Kushilevitz algorithm: its application domain and limitations
Elements of Fourier Analysis on the Boolean Cube
The BLR test and its analysis
Measure concentration: working examples on Chernoff-Hoeffding, concentration for pairwise independence and beyond
Introduction to Property Testing
Sep 22
What do we mean by "Big Data"?
What is this class about: high-level overview of property testing, streaming, fourier analysis, high-dimensional geometry.
First topic: Basic inequalities in discrete probability
Readings for the 13th lecture:
  • Read chapters 15 from Jiri Matousek's textbook

next lecture: wrap-up, what was this course about and what is Big Data Science
Readings for the 12th lecture:
  • Read chapters 15 from Jiri Matousek's textbook

next lecture: finite metric embeddings and streaming algorihms
Readings for the 11th lecture:
  • Read the excellent survey by Piotr Indyk [link] and the even more excellent overview by Nati Linial [link]

next lecture: embedding every metric space into ell_p and dimensionality reduction
Readings for the 10th lecture:
  • Read chapter 5-6 from Amit Chakrabarti's lecture notes on streaming algorithms

next lecture: introduction to finite metric embeddings
Readings for the 9th lecture:
  • Read chapters 3-5 from Amit Chakrabarti's lecture notes on streaming algorithms

next lecture: sketches and algorithms over data-streams
Readings for the 8th lecture:
  • Reading material is based on in-class lecture (ask for lecture notes)
  • An elementrary proof of Szemeredi's regularity lemma by Adam Marcus can be found here
    A series of very nice blog-posts (both on Szemeredi's lemma and Szemeredi's theorem) can be found in the Terrence Tao's blog-posts, and a more high-level exposition for property testing in Luca Trevisan's blog-post
  • Read the first 2 chapters from Amit Chakrabarti's lecture notes on streaming algorithms

next lecture: continue with basic Streaming Algorithms
Readings for the 7th lecture:
  • Same as for the previous lecture. I'll give references for Szemeredi's Regularity Lemma

next lecture: begin with Streaming Algorithms
Readings for the 6th lecture:
  • From O'Donnel's text: the entire Chapter 9 (check the previous chapters for prerequisites as necessary)

next lecture: finish up the proof of KKL -- Szemeredi's regularity lemma and applications to property testing
Readings for the 5th lecture:
  • By now you should know in detail (most important parts covered in class -- but you should study all): Chapter 1-3 from Ryan O'Donnel's text, and Chapter 1 from the measure concentration text
Readings for the 4th lecture:
  • The 3rd chapter from Ryan O'Donnel's textbook (see handouts)

next lecture: the Goldreich-Levin algorithm
Readings for the 3rd lecture:
  • The 1st chapter from Ryan O'Donnel's textbook (see handouts)

next lecture: learning boolean functions
Readings for the 2nd lecture:

next lecture: intro to Fourier Analysis on the Boolean Cube