Category Archives: Feature extraction

Convolutional neural networks

Neural networks have been around for a number of decades now and have seen their ups and downs. Recently they’ve proved to be extremely powerful for image recognition problems. Or, rather, a particular type of neural network called a convolutional …

Posted in Classification, Feature extraction | 3 Comments


I recently read an interesting Wired story about Chris McKinlay (a fellow alum of Middlebury College), who used a clustering algorithm to understand the pool of users on the dating site OkCupid (and successfully used this information to improve his …

Posted in Clustering, Feature extraction | 16 Comments

Case Study 6: Digital images

In the last two posts, I described how to generalize the notion of “tokens”, which we first saw when analyzing non-numeric census data, to time series. In this context, a token is a short snippet from a time series …

Posted in Feature extraction | 3 Comments

Case Study 5: Wavelets

In the last post, I explained how to measure the similarity of two time series of the same length using the idea of resonance from physics. The answer boiled down to calculating the dot product of the two time series, …

Posted in Feature extraction | 6 Comments

Case study 4: Resonance and Robots

In the last few posts, I described how to turn data sets with varying amounts of structure into vector data that could be analyzed using standard data mining/machine learning algorithms. All the types of data that I looked at were …

Posted in Feature extraction | 1 Comment

Case study 3: Free form text

In the past two posts, I’ve been looking at ways to turn “unstructured” data into vector data that can be analyzed with techniques like the ones that I’ve described elsewhere on this blog. One of the most common types of …

Posted in Feature extraction | 5 Comments

Case study 2: Tokens in census data

In last week’s post, I presented a brief tutorial on loading and analyzing the classic Iris data set, which consists of four length measurements from each of 150 iris flowers. That data set was relatively easy to deal with because …

Posted in Feature extraction | 9 Comments