Category Archives: Modeling

Continuous Bayes’ Theorem

Bayes’ Rule is one of the fundamental theorems of statistics, but until recently, I have to admit, I was never very impressed with it. Bayes’ Rule gives you a way of determining the probability that a given event will occur, or … Continue reading

Posted in Modeling | 2 Comments

Genetic algorithms and symbolic regression

A few months ago, I wrote a post about optimization using gradient descent, which involves searching for a model that best meets certain criteria by repeatedly making adjustments that improve things a little bit at a time. In many situations, this works … Continue reading

Posted in Modeling, Regression | 1 Comment

Configuration Spaces and the Meaning of Probability

I recently finished reading Nate Silver’s book The Signal and the Noise, which has gotten me thinking about how exactly one should interpret models/probability distributions, and the predictions they make. (If you’ve read this book or plan to read it, … Continue reading

Posted in Modeling | 8 Comments


The subject of this week’s post is probably one of the most polarizing algorithms in the data world: it seems that most experts either swear by K-means or absolutely hate it. The difference of opinion boils down to one of … Continue reading

Posted in Modeling, Unsupervised learning | 8 Comments

Mixture models

In the last few posts, we’ve been looking at algorithms that combine a number of simple models/distributions to form a single more complex and sophisticated model. With both neural networks and decision trees/random forests, we were interested in the classification … Continue reading

Posted in Modeling | 7 Comments

Principal Component Analysis

Now that we’ve gotten a taste of the curse of dimensionality, let’s look at another potential problem with the basic form of regression we discussed a few posts back. Notice that linear/least squares regression always gives you an answer, whether or … Continue reading

Posted in Modeling | 22 Comments

The curse of dimensionality

Now that we’ve had a glimpse of what it means to analyze data sets in different dimensions, we should take a little detour to consider really high dimensional data. In the discussion of regression, I suggested using your intuition about … Continue reading

Posted in Modeling | 10 Comments

General regression and over fitting

In the last post, I discussed the statistical tool called linear regression for different dimensions/numbers of variables and described how it boils down to looking for a distribution concentrated near a hyperplane of dimension one less than the total number … Continue reading

Posted in Modeling, Regression | 14 Comments

The geometry of linear regression

In this post, we’ll warm up our geometry muscles by looking at one of the most basic data analysis techniques: linear regression. You’ve probably encountered it elsewhere, but I want to think about it from the point of view of … Continue reading

Posted in Modeling, Regression | 28 Comments