Table of Contents

The first few posts to this blog will be most coherent if they are read in chronological order. A new entry will be posted each Wednesday morning. To see the latest posts in blog format, click here. To get updates when new posts appear, you can subscribe by RSS, or follow me on Google+ or Twitter @jejomath.

I – Introduction

  1. Different names for data analysis
  2. What is data?
  3. Filling in the gaps – Probability Distributions
  4. Configuration Spaces and the Meaning of Probability
  5. Continuous Bayes’ Theorem

II – Basic Modeling and Supervised Learning

  1. The geometry of linear regression
  2. General regression and overfitting
  3. Optimization
  4. The Curse of Dimensionality
  5. Principal Component Analysis
  6. Visualization and Projection
  7. Nearest Neighbors Classification
  8. Data Normalization
  9. K Nearest Neighbors Classification
  10. Support Vector Machines
  11. Logistic Regression
  12. Kernels
  13. Multi-class classification
  14. Genetic Algorithms and Symbolic Regression

III – Case Studies

  1. Case Study 1: The Iris Data Set
  2. Case Study 2: Tokens in Census Data
  3. Case Study 3: Free Form Text
  4. Case Study 4: Resonance and Robots
  5. Case Study 5: Wavelets
  6. Case Study 6: Digital Images

IV – Multi-level Modeling and Supervised Learning

  1. Neural Networks 1: The Neuron
  2. Neural Networks 2: Evaluation
  3. Neural Networks 3: Training
  4. Decision Trees
  5. Random Forests
  6. Mixture Models
  7. Gaussian Kernels
  8. Convolutional Neural Networks
  9. Recurrent Neural Networks
  10. Neural Networks, linear transformations and Word Embeddings
  11. The TensorFlow perspective on Neural Networks
  12. Rolling and Unrolling RNNs
  13. LSTMs

V – Unsupervised Learning

  1. K-means
  2. Intrinsic vs. Extrinsic structure
  3. Graphs and Networks
  4. Clusters and DBSCAN
  5. Mapper and the choice of scale
  6. Spectral Clustering
  7. Modularity – Measuring Cluster Separation
  8. TILO/PRC Clustering
  9. Duality and Co-clustering
  10. K-modes

VI – Big data

  1. Distributed learning
  2. PageRank
  3. MapReduce
  4. GPUs and Neural Networks

VII – Misc.

  1. Statistics Vs. Heuristics
  2. Precision, Recall and AUCs
  3. P-Values

14 Responses to Table of Contents

  1. Reader says:

    The link to the ‘Support Vector Machines’ post seems to be broken in some places.
    It should be “” but appears as “”

    Like the content by the way!

  2. Hello Jesse, Thanks for this interesting blog! Is there a way we can follow your activities or subscribe to the activity on this blog? (i.e. do you have a twitter account or some other news feed)

  3. shibaji2013 says:

    Great articles; they give a very good geometric insight into the concepts. Do you have any plans of making this into a book? That would surely help.
    Can you post some references for each blog post so we can follow up with a more mathematical exposition if we like?

    • Good question. I have been thinking about writing a book that follows the rough outline of the blog, with changes based on what I’ve learned while writing it. Part of the reason I started writing this blog was to gauge how much interest there would be in this approach to data analysis. Based on the statistics that wordpress reports, there seems to be enough to justify writing a book.

      That’s also a good suggestion about including references at the end of posts. I will try to go back through and add references where I can find good ones. Thanks for the comments!

  4. Satoshi says:

    Excellent articles. Very clear, concise, and intuitive explanations. Thank you very much for your effort; it is very enlightening!

  5. ArthurZ says:

    This should be expanded into a book

    • Thanks! I’m hoping to turn it into a book at some point, though I’m not sure exactly when. As I write more posts, my idea of what the book should cover and how to organize it keeps changing.

  6. Hey I was wondering if you plan to write about topological data analysis / computational topology? It seems like quite a natural topic for your blog. I have been investigating this myself, but it would be great to see your thoughts on how these tools could be put to good use.

    • That’s a good question and one that I’ve been thinking about for a while. So far on this blog, I’ve been trying to focus on topics that are relatively simple and widely used. Persistent homology (which is what most people seem to mean by topological data analysis) hasn’t yet filtered out of the academic realm, and relies on fairly abstract concepts (like manifolds and homology). Still, I’m tempted to use it as an excuse to introduce readers to these advanced ideas, and it would tie in well with other directions like manifold learning, so I will probably get to it eventually.

      By the way, I have written about the mapper/Iris clustering algorithm, which is closely related to persistent homology. I also wrote a post about persistent homology a while back on my topology blog.

  7. Pingback: How to Ace a Data Science Interview | Alya's Blog
