Monthly Archives: April 2013

Data Normalization

In the last post, on nearest neighbors classification, we used the “distance” between different pairs of points to decide which class each new data point should be placed into. The problem is that there are different ways to calculate distance …

Posted in Normalization/Kernels
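The Data Normalization excerpt notes that the class a nearest neighbors classifier picks depends on how distance is calculated. Here is a minimal sketch of that effect. It assumes Euclidean distance, a 1-nearest-neighbor rule, and min-max rescaling to [0, 1]. These choices and the data are illustrative and are not taken from the post. The same query point gets a different label once the two features are put on a common scale.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_label(train, query):
    """1-nearest-neighbor: return the label of the closest training point."""
    _, label = min(train, key=lambda item: euclidean(item[0], query))
    return label

def min_max_scaler(points):
    """Build a function that rescales each feature to [0, 1] using the
    per-feature minimum and maximum seen in `points` (the training data)."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(p):
        return tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                     for x, lo, hi in zip(p, lows, highs))
    return scale

# Hypothetical data: feature 0 uses large units (e.g. dollars) and feature 1
# uses small units, so feature 0 dominates the raw distance.
train = [
    ((50000.0, 1.0), "red"),
    ((51000.0, 9.0), "blue"),
]
query = (50900.0, 1.2)

raw = nearest_label(train, query)

scale = min_max_scaler([p for p, _ in train])
scaled_train = [(scale(p), label) for p, label in train]
normalized = nearest_label(scaled_train, scale(query))

print("raw:", raw)                # blue
print("normalized:", normalized)  # red
```

Without rescaling, feature 0 spans 1000 units and feature 1 spans only 8, so the first feature decides almost everything. After min-max scaling, both features contribute on equal terms. Min-max is only one option. Standardizing each feature to zero mean and unit variance is another common choice.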

Nearest Neighbors Classification

Before we dive into nearest neighbor classification, I want to point out a subtle difference between the regression algorithm that I discussed a few posts back and what I will write about today. The goal of regression was to find …

Posted in Classification

Visualization and Projection

One of the common themes that I’ve emphasized so far on this blog is that we should try to analyze high-dimensional data sets without being able to actually “see” them. However, it is often useful to visualize the data …

Posted in Visualization

Principal Component Analysis

Now that we’ve gotten a taste of the curse of dimensionality, let’s look at another potential problem with the basic form of regression we discussed a few posts back. Notice that linear/least squares regression always gives you an answer, whether or …

Posted in Modeling
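The Principal Component Analysis excerpt points out that least squares regression always gives you an answer. Here is a minimal sketch of that point. It assumes a single input variable and a made-up data set, and neither assumption comes from the post. A line fitted to pure random noise still comes back with an intercept and a slope. A diagnostic such as a low R² is one hint that the answer means little.

```python
import math
import random

def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # undefined only if every x is identical
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: y is random noise with no relationship to x.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
noise = [rng.gauss(0.0, 1.0) for _ in xs]

b0, b1 = least_squares_line(xs, noise)
r2 = r_squared(xs, noise, b0, b1)
print(f"fit to pure noise: intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.3f}")
```

The fitting routine raises no warning and does not refuse to answer. It produces numbers for any x values that are not all identical. Deciding whether those numbers mean anything is left to the analyst.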

The curse of dimensionality

Now that we’ve had a glimpse of what it means to analyze data sets in different dimensions, we should take a little detour to consider really high-dimensional data. In the discussion of regression, I suggested using your intuition about …

Posted in Modeling