Monthly Archives: March 2013

General regression and over fitting

In the last post, I discussed the statistical tool called linear regression for different dimensions/numbers of variables and described how it boils down to looking for a distribution concentrated near a hyperplane of dimension one less than the total number …

Posted in Modeling, Regression | 14 Comments

The geometry of linear regression

In this post, we’ll warm up our geometry muscles by looking at one of the most basic data analysis techniques: linear regression. You’ve probably encountered it elsewhere, but I want to think about it from the point of view of …

Posted in Modeling, Regression | 28 Comments

Filling in the gaps – Probability Distributions

In the last post, I discussed how one can analyze a data set from the point of view of geometry, by thinking of each data point as coordinates in a high dimensional space. The thing is, when we’re analyzing these …

Posted in Introduction | 9 Comments

What is data?

In this post, we’ll start working on understanding how computers and their programmers actually go about analyzing large, high-dimensional data sets. I’ll start by describing three different ways that one can think about data, each of which suggests a different …

Posted in Introduction | 10 Comments

Different names for data analysis

Welcome to the Shape of Data blog. Over the next few months, I plan to write a number of posts illustrating how understanding the geometry behind data analysis can lead to deeper insights and a more intuitive understanding of the …

Posted in Introduction | 12 Comments