Geometry and Algorithms Program
Oct 29-31, Friend Center 006, Princeton University, Princeton NJ.
Wednesday, Oct 29
8:15-9:00 breakfast and registration (outside Friend 006)
9:00-10:10 Piotr Indyk, Survey on Compressed Sensing
10:10-10:50 break
10:50-11:25 Stephen Smale, Understanding Patterns in Data hi-res video low-res video
11:25-12:00 Santosh Vempala, Isotropic Principal Components hi-res video low-res video
12:00-2:00 lunch (Friend Center convocation room)
2:00-2:35 James Lee, On the geometry of graphs and L_1 embeddings hi-res video low-res video
2:35-3:10 Joel Tropp, Column subset selection, matrix factorization, and eigenvalue optimization hi-res video low-res video
3:10-3:45 Assaf Naor, The Myth of Metric Dimension Reduction hi-res video low-res video
3:45-4:25 break
4:25-5:00 Yury Makarychev, Lower bounds for Sherali Adams via local-global metrics hi-res video low-res video
5:00-5:35 Robi Krauthgamer, Metric Embeddings As Computational Primitives hi-res video low-res video
Thursday, Oct 30
8:30-9:00 breakfast (outside Friend 006)
9:00-9:35 Guy Kindler, Can cubic tiles be sphere-like? hi-res video low-res video
9:35-10:10 Leonard Schulman, Contraction and Expansion of Convex Sets hi-res video low-res video
10:10-10:50 break
10:50-11:25 Venkat Guruswami, Explicit Euclidean sections from expander codes hi-res video low-res video
11:25-12:00 Ben Recht, Exact Low-rank Matrix Completion via Convex Optimization hi-res video low-res video
12:00-2:00 lunch (Friend Center convocation room)
2:00-2:35 Partha Niyogi, Manifold Learning: A geometric perspective on learning theory and algorithms hi-res video low-res video
2:35-3:10 Sanjoy Dasgupta, Open problems in learning low dimensional representations hi-res video low-res video
3:10-3:45 Anna Gilbert, Applications of Compressed Sensing to Biology hi-res video low-res video
3:45-4:25 break
4:25-5:35 Rump session: open problems, brief announcements, etc
Hi-res: Akavia Alon Arora Khot Magen Naor
Low-res: Akavia Alon Arora Khot Magen Naor
6:15 Banquet at Palmer House (map)
Friday, Oct 31
8:30-9:00 breakfast (outside Friend 006)
9:00-9:35 Yuval Rabani, Explicit construction of a small epsilon-net for linear threshold functions hi-res video low-res video
9:35-10:10 Hamed Hatami, Graph norms and Sidorenko’s conjecture hi-res video low-res video
10:10-10:40 break
10:40-11:00 Navin Goyal, Learning Convex Bodies is Hard (short talk) hi-res video low-res video
11:00-11:35 Gideon Schechtman, Local Versions of Dimension Reduction hi-res video low-res video
11:35-12:10 Ilan Newman, Online embedding of metrics hi-res video low-res video
12:10-1:10 lunch (Friend Center convocation room)
1:10-1:45 Prasad Raghavendra, A Simple SDP Gap Instance for Unique Games hi-res video low-res video
1:45-2:20 Luis Rademacher, Expanders via Random Spanning Trees hi-res video low-res video
2:20-2:55 Moritz Hardt, Rounding Parallel Repetitions of Unique Games hi-res video low-res video
2:55-3:30 Alexandr Andoni, Hardness of Nearest Neighbor under L_infinity hi-res video low-res video
Abstracts
Isotropic Principal Components
Santosh Vempala
We present an extension of standard Principal Component Analysis (PCA) that appears to uncover interesting directions when PCA does not. We apply this to the problem of unraveling a mixture of arbitrary Gaussians, and give an algorithm that succeeds well beyond the known threshold for inter-Gaussian separation. We mention other potential applications.
On the geometry of graphs with forbidden minors
James Lee
I will give a structure theorem for the geometries induced on families of graphs that forbid a fixed minor, and discuss implications for the L_1 embedding/multi-flow conjecture.
Column subset selection, matrix factorization, and eigenvalue optimization
Joel Tropp
Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing {\sf QR}, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic.
This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in an approximation algorithm for the $(\infty, 1)$ norm of a matrix, which is generally {\sf NP}-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous {\sc maxcut} semidefinite program.
Metric Embeddings As Computational Primitives
Robi Krauthgamer
Metric embedding has become an indispensable algorithmic tool, and in fact may be viewed as a computational primitive. This talk is intended to explore the boundaries of this approach, by considering richer host spaces or extending the computational model. I will survey recent progress in these directions, geared towards describing open questions.
[Based on joint work with Alex Andoni.]
Can cubic tiles be sphere-like?
Guy Kindler
The unit cube tiles R^d by Z^d, in the sense that its translations by
vectors from Z^d cover R^d. It is natural to ask what is the minimal
surface area of a body that tiles R^d by Z^d. The volume of any such
body should clearly be at least 1, and therefore its surface area must
be at least that of a unit volume ball, which of order sqrt(d). The
surface area of the cube, however, is of order d, and no better tiling
was known. In this work we use a random construction to show that the
optimal surface area is indeed of order sqrt(d), namely there exist
bodies that tile R^d as a cube would, but have sphere-like surface
areas.
Tiling problems were considered for well over a century, but this
particular tiling problem was also recently considered in computer
science because of its relations with the Unique Games conjecture and
with the Parallel Repetition problem. Indeed, our current result
follows from the same idea that was used recently by Raz in his
counter example for the strong Parallel Repetition conjecture.
Joint work with Ryan O’Donnell, Anup Rao, and Avi Wigderson.
Expander codes over reals and explicit Euclidean sections
Venkat Guruswami
Classical results from the 1970’s state that a random subspace of R^N of dimension N/2 has “distortion” O(1) w.h.p where the distortion is measured by sqrt{N} times the largest ratio of the L_2 to L_1 norms of a vector in the subspace. Similar randomized geometric constructions arise in areas like dimensionality reduction and
compressed sensing. Explicit constructions of such objects, however, seem very hard to come by.
Low distortion subspaces are of interest as they lead to good compressed sensing matrices and error-correcting codes over reals. The talk will describe constructions inspired by expander codes that can be used to produce subspaces with low distortion either explicitly or with sublinear randomness. The talk will also touch upon the connection to compressed sensing, and a near-linear time iterative signal recovery algorithm for our expander based construction.
Based on joint works with James Lee, Alexander Razborov, and Avi Wigderson
Exact Low-rank Matrix Completion via Convex Optimization
Ben Recht
How can we infer answers from a survey that is only partially filled out? Suppose we ask a large collection of individuals a series of questions. We collect some data but, unfortunately, many questions are left unanswered. Is it possible for us to make an educated guess about what the missing answers should be? How can we make such a guess? In general, with no additional assumptions, this is impossible. However, if we assume that we can arrange all of the answers into a low rank matrix, there is often a unique assignment for the missing entries.
In this talk, I will discuss very general settings in which one can perfectly recover all of the missing entries of a low-rank matrix from a sufficiently large random subset of entries by solving a convex programming problem. Out of all of the matrices whose entries agree with the given subset, this program finds the matrix with the minimum nuclear norm (also known as the Ky-Fan norm or the Schatten 1-norm). I will show that if the number of sampled entries is greater than C n^{1.25} r log n for some positive numerical constant C, then with very high probability, most n x n matrices of rank r are perfectly recovered by the nuclear norm heuristic. The techniques for proving the main results build upon geometric ideas from the
the recent literature on compressed sensing together with probabilistic tools such as the powerful techniques of Bourgain and of Rudelson for bounding norms of operators between Banach spaces. These results illustrate a general program for perfectly reconstructing geometric objects from very limited information.
Manifold Learning: A geometric perspective on learning theory and algorithms
Partha Niyogi
We consider the following learning-theoretic question. Suppose you are
given a collection of data points (a point cloud) sampled from a
Riemannian submanifold of Euclidean space. What geometric or
topological properties of this manifold can you provably learn
(estimate) from such randomly sampled points? We show how to learn the
laplace operator, the heat kernel, the harmonic forms, and the
homology from random samples. The analysis of the algorithmic
framework requires us to relate the geometry of a suitable
data-derived random graph (or complex) to the geometry of
the underlying manifold. We will consider the implications of this
point of view for machine learning, statistics, and numerical analysis.
Open problems in learning low dimensional representations
Sanjoy Dasgupta
In the past decade, one of the most exciting and fruitful research directions in machine learning has been finding ways to exploit “intrinsic low dimensionality” of data that are superficially high dimensional. Numerous models and algorithms have been proposed and empirically shown to hold promise, but the underlying theory remains largely underdeveloped.
There are several broad types of “intrinsic low dimensionality” that repeatedly arise in learning scenarios. I will give examples of these and explain why they are not adequately captured by existing popular dimensionality notions. I’ll then describe core algorithmic/statistical tasks on such data, along with some of failings of the heuristics by which they are currently handled.
Explicit construction of a small epsilon-net for linear threshold functions
Explicit construction of a small epsilon-net for linear threshold functions
Yuval Rabani
We give explicit constructions of epsilon nets for linear threshold
functions on the binary cube and on the unit sphere. The size of
the constructed nets is polynomial in the dimension and 1/epsilon.
To the best of our knowledge no such constructions were previously
known. Our results match, up to the exponent of the polynomial,
the bounds that are achieved by probabilistic arguments.
As a corollary we also construct subsets of the binary cube that
have size polynomial in the dimension and covering radius of
$n/2 – c\sqrt{n\log n}$, for any constant c. This improves upon the
well known construction of dual BCH codes that only guarantee
covering radius of $\n/2 – c\sqrt{n}$.
Our work is motivated by questions of constructing interesting
geometric objects, known to exist via probabilistic arguments.
We will discuss some of this motivation.
This is joint work with Amir Shpilka.
Graph norms and Sidorenko’s conjecture.
Hamed Hatami
I will prove some results in the direction of answering a question of
Lovasz about the norms defined by certain combinatorial structures:
Inspired by the similarity of the definitions of $L_p$, Schatten, and
Gowers norms, we introduce and study a wide class of norms containing
these, as well as many other norms. It will be proven that every norm
in this class must satisfy a Holder type inequality. Then using this
condition we will verify the Hanner inequality with certain
parameters for this class of norms. This will give a unified proof for
the result of Hanner and a result of Tomczak-Jaegermann, and more
importantly generalizes them to a wider class of norms which includes
Gowers norms. In a different direction we will show an application of
the above mentioned Holder type inequality to a conjecture of
Sidorenko about subgraph densities. I will discuss many related open
problems.
Learning Convex Bodies is Hard.
Navin Goyal
The problem we are really interested in is whether the volume of a convex body in dim n be determined from polynomially many independent uniformly random points from the body. [Note that we do not have the power of membership queries, which is used crucially in the well-known volume estimation algorithms.] We are only concerned with the number of samples, and not the complexity issue of computing the volume from these samples. We show that it is not possible to learn convex bodies in this model using polynomially many random samples. This shows that one cannot compute the volume by first learning the body.
Our construction of hard distribution on convex bodies is quite simple, and may be of independent interest.
Joint work with Luis Rademacher.
Online embedding of metrics
Ilan Newman
In the online setting, one has to produce an embedding of an input
metric that is exposed bit by bit in the fixed host space. No changes
to previous decisions are allowed. The objective is to minimize the
multiplicative distortion.
To the best of our knowledge, no systematic study of online metric
embeddings were studied so far. We initiate such a study, and show
several positive and negative results in this direction.
This is a joint work with Yuri Rabinovich.
Expanders via Random Spanning Trees
Luis Rademacher
Motivated by the problem of routing reliably and scalably in a
graph, we introduce the notion of a {\em splicer}, the union of
spanning trees of a graph. We prove that for any bounded-degree
$n$-vertex graph, the union of two {\em random} spanning trees
approximates the expansion of every cut of the graph to within a
factor of $O(\log n)$. For the random graph $G_{n,p}$, for $p> c\log{n}/n$,
two spanning trees give an expander.
This is suggested by the case of the complete graph, where we prove that two
random spanning trees give an expander. The construction of the
splicer is elementary — each spanning tree can be produced
independently using an algorithm by Aldous and Broder: a random walk in the graph
with edges leading to previously unvisited vertices included in the tree.
A second important application of splicers is to graph
sparsification where the goal is to approximate every cut (and more
generally the quadratic form of the Laplacian) using only a small
subgraph of the original graph. Benczur-Karger \cite{BK} as well as
Spielman-Srivastava \cite{SS} have shown sparsifiers with $O(n \log
n/\eps^2)$ edges that achieve approximation within factors $1+\eps$
and $1-\eps$. Their
methods, based on independent sampling of edges, need $\Omega(n\log
n)$ edges to get any approximation (else the subgraph could be
disconnected) and leave open the question of linear-size
sparsifiers. Splicers address this question for random graphs by
providing sparsifiers of size $O(n)$ that approximate every cut to
within a factor of $O(\log n)$.
This is joint work with Navin Goyal and Santosh Vempala.
Hardness of Nearest Neighbor under L_infinity
Alex Andoni
Recent years have seen a significant increase in our understanding of
high-dimensional nearest neighbor search (NNS) for distances like the
L_1 and L_2 norms. By contrast, our understanding of the L_infinity
norm is now where it was 10 years ago.
In FOCS’98, Indyk proved the following unorthodox result: there is a
data structure (in fact, a decision tree) of polynomial size, which
achieves O(log log d) approximation for NNS in the d-dimensional
L_infinity metric. More generally, the data structure has space
O(n^r), for any r>1, for approximation O(log_r log d).
Our lower bound indicates that Indyk’s unconventional bound might in
fact be optimal. Specifically, we show a lower bound for the
asymmetric communication complexity of NNS under L_infinity, which
proves that this space/approximation trade-off is optimal for decision
trees and for data structures with constant cell-probe complexity.
In the talk, I will start out with an intuitive “information” view of
Indyk’s result, which will also help us understand where the lower
bound comes from.
This is joint work with Dorian Croitoru (MIT) and Mihai Patrascu
(MIT).




