Category Archives: Unsupervised learning

Modularity – Measuring cluster separation

We’ve now seen a number of different clustering algorithms, each of which will divide a data set into a number of subsets. This week, I want to ask the question: How do we know if the answer that a clustering algorithm … Continue reading

Posted in Clustering, Unsupervised learning | Leave a comment

Spectral clustering

In the last few posts, we’ve been studying clustering, i.e. algorithms that try to cut a given data set into a number of smaller, more tightly packed subsets, each of which might represent a different phenomenon or a different type … Continue reading

Posted in Clustering, Unsupervised learning | 9 Comments

Mapper and the choice of scale

In last week’s post, I described the DBSCAN clustering algorithm, which uses the notion of density to determine which data points in a data set form tightly packed groups called clusters. This algorithm relies on two parameters – a distance … Continue reading

Posted in Clustering, Unsupervised learning | 4 Comments

Clusters and DBSCAN

A few weeks ago, I mentioned the idea of a clustering algorithm, but here’s a recap of the idea: Often, a single data set will be made up of different groups of data points, each of which corresponds to a … Continue reading

Posted in Clustering, Unsupervised learning | 7 Comments

Intrinsic vs. Extrinsic Structure

At this point, I think it will be useful to introduce an idea from geometry that is very helpful in pure mathematics, and that I find helpful for understanding the geometry of data sets. This idea is the difference between the … Continue reading

Posted in Unsupervised learning | 6 Comments

K-means

The subject of this week’s post is probably one of the most polarizing algorithms in the data world: It seems that most experts either swear by K-means or absolutely hate it. The difference of opinion boils down to one of … Continue reading

Posted in Modeling, Unsupervised learning | 8 Comments