Now that we’ve had a glimpse of what it means to analyze data sets in different dimensions, we should take a little detour to consider really high dimensional data. In the discussion of regression, I suggested using your intuition about planes in three-dimensional space to understand hyperplanes in higher dimensions. This is a great way to get around the fact that we can’t visualize dimensions higher than three, but we still have to be careful to understand how geometry in higher dimensions is different from geometry in two or three dimensions. As it turns out, a number of aspects of higher dimensional geometry are quite counter-intuitive and this is a major component of what’s called *the curse of dimensionality*.

From the perspective of regression, the most important thing about high dimensional data is that the more dimensions there are, the easier it is to find a hyperplane that is close to all the points and the less meaningful such a hyperplane is. We know that for any two points in a plane, there is a line that goes through both of them. Similarly for higher dimensions, if the number of data points is equal to the number of dimensions then there will always be a hyperplane that contains all the data points. Moreover, if there are fewer points than dimensions, there will be infinitely many such hyperplanes. Keep in mind that when all the points lie in the hyperplane, any error in one of those points will be amplified as hyperplane gets farther from it. (Think of it like a lever.) This means that any answer given by regression on such a data set should be viewed with suspicion.

As it turns out, data sets where the number of data points is less than the number of dimensions are quite common. For example, one of the current (as I write this) Kaggle competitions has 283 data points in the training set and around 28,000 dimensions (7,000 features extracted from each of four handwriting samples from each person.)

The second problem with high dimensional data is that because of the way distance is calculated, there are fewer points near any given data point, and a lot of points of roughly the same distance. For example, the ratio between the volume of a (high dimensional) ball of radius one and a high dimensional ball of radius two very quickly gets larger and larger as the dimension increases. In other words, the volume in the ball gets more and more concentrated near its outer boundary. You can check this by working out the volume formula for higher dimensional balls, but I won’t make you – Instead, lets consider a more concrete example.

We’ll start with a fairly simple picture: Take a square and in each corner draw a circle so that the four circles are as large as possible, just barely touching each other. In the middle of these four circles, draw a fifth circle (shown in blue in the Figure below) so that it just touches the four corner circles. It’s not too hard to calculate the radius of this middle ball if you remember the two-dimensional distance formula, but lets not worry about that right now. (Feel free to work it out on your own.)

We can do something very similar in three dimensions: Take a cube and put a sphere in each corner so that they just barely touch each other, as in the middle picture below. There are eight corners, so there are eight spheres. If we look at the cube from the side, like on the right of the picture, it looks a lot like the original square, except that each circle has become two spheres, one behind the other. We can also put a sphere between the eight corner spheres and make it as large as possible so that it just barely touches them. You can’t see the blue sphere in the middle picture because it’s behind one of the corner spheres, but you can see it on the right.

Notice that in the picture on the right, the blue sphere appears to overlap the spheres in the corners. Even though it just barely touches them, the points where they touch are just behind the corner spheres. So when we project the three dimensional configuration into two dimensions, they appear to overlap. If the square and the cube both have sides of length one then the eight corner spheres will have the same radius as the four corner circles, namely . However, the blue sphere will have slightly higher radius than the blue circle. (Again, you can calculate the exact radius using the three-dimensional distance equation.) So the corner spheres in three dimensions are farther from the center than the corner circles in two dimensions.

What would happen if we did this in dimension four? In other words, we will take a four-dimensional cube and put sixteen (higher-dimensional) spheres in its corners so that they just barely touch, then place a blue (higher dimensional) sphere between them that is just big enough to touch the sixteen corner spheres. I don’t expect you to be able to picture this, but hopefully you can believe that if we lived in four-dimensional space this would be possible. In fact, we can use what we know about two and three dimensions to say a little more.

We saw that when we looked at the three-dimensional cube from the side, the blue center sphere was slightly bigger because of the extra dimension. If we lived in four-dimensional space, we could similarly look at the four-dimensional cube from the side so that it looked like a three-dimensional cube. From this angle, we would find that the corner spheres appear to overlap the center sphere. So the blue center sphere in four dimensions is even larger than the cube in three dimensions, which in turn is larger than the circle in two dimensions. However, the side lengths of the cubes are the same in each dimension!

Again, you could calculate the radius of the center sphere in the four-dimensional cube, and we would see that it is increasing. In fact, we can calculate the radius for dimension five, six and so on. An interesting thing happens when you get to dimension sixteen: The radius of the center sphere is exactly one half, which mean that it touches the sides of the (sixteen-dimensional) cube. In dimension seventeen, its radius is even bigger and it crashes through the sides of the cube. This is, of course, very counter-intuitive from the two- and three-dimensional pictures and it serves to illustrate the *curse of dimensionality*: While we can use our intuition from two and three dimensions to understand some aspects of higher dimensional geometry, there are also a lot of ways that our intuition can steer us wrong.

There is nothing we can do to eliminate this curse, but there are things we can do to minimize it. In particular, we should always look for ways to check that the methods that work well in low dimensions continue to be effective in higher dimensions. Having a solid understanding of the geometry behind different techniques can help us to both predict where these kinds of issues will arise, and find ways to double check that our results are still valid.

Pingback: Visualization and Projection | The Shape of Data

Pingback: Nearest Neighbors Classification | The Shape of Data

Pingback: The shape of data | spider's space

Pingback: Experimentelle Datenästhetik | Well explained intro to data analysis (not visulization) by Jesse Johnson

Nice intuitive explanation, tnx! A different, yet related and intuitive approach to explain the curse of dimensionality is by focussing on an important effect caused by this curse, namely overfitting. This is explained nicely in http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

Thanks, Vincent. That’s a nice explanation, and a good point about the relationship between the number of data point and the number of dimensions. Also, nice figures.

Pingback: Nearest Neighbors Classification | 数据化学

Pingback: Neural networks, linear transformations and word embeddings | The Shape of Data

IIRC, it’s actually R^9 when it touches and R^10 when it protrudes.

Agreeing with EMF