The first few posts on this blog will be most coherent if they are read in chronological order. A new entry will be posted each Wednesday morning. To see the latest posts in blog format, click here. To get updates when new posts appear, you can subscribe by RSS, or follow me on Google+ or Twitter @jejomath.
I – Introduction
- Different names for data analysis
- What is data?
- Filling in the gaps – Probability Distributions
- Configuration Spaces and the Meaning of Probability
- Continuous Bayes’ Theorem
II – Basic Modeling and Supervised Learning
- The geometry of linear regression
- General regression and overfitting
- Optimization
- The Curse of Dimensionality
- Principal Component Analysis
- Visualization and Projection
- Nearest Neighbors Classification
- Data Normalization
- K Nearest Neighbors Classification
- Support Vector Machines
- Logistic Regression
- Kernels
- Multi-class classification
- Genetic Algorithms and Symbolic Regression
III – Case Studies
- Case Study 1: The Iris Data Set
- Case Study 2: Tokens in Census Data
- Case Study 3: Free Form Text
- Case Study 4: Resonance and Robots
- Case Study 5: Wavelets
- Case Study 6: Digital Images
IV – Multi-level Modeling and Supervised Learning
- Neural Networks 1: The Neuron
- Neural Networks 2: Evaluation
- Neural Networks 3: Training
- Decision Trees
- Random Forests
- Mixture Models
- Gaussian Kernels
- Convolutional Neural Networks
- Recurrent Neural Networks
- Neural Networks, linear transformations and Word Embeddings
- The TensorFlow perspective on Neural Networks
- Rolling and Unrolling RNNs
- LSTMs
V – Unsupervised Learning
- K-means
- Intrinsic vs. Extrinsic structure
- Graphs and Networks
- Clusters and DBSCAN
- Mapper and the choice of scale
- Spectral Clustering
- Modularity – Measuring Cluster Separation
- TILO/PRC Clustering
- Duality and Co-clustering
- K-modes
VI – Big data
VII – Misc.
The link to the ‘Support Vector Machines’ post seems to be broken in some places.
It should be “https://shapeofdata.wordpress.com/2013/05/14/linear-separation-and-support-vector-machines/” but appears as “https://shapeofdata.wordpress.com/2013/03/24/linear-separation-and-support-vector-machines/”
Like the content by the way!
Thanks! I’m not sure how that happened, but I think I’ve now fixed them all.
Hello Jesse, Thanks for this interesting blog! Is there a way we can follow your activities or subscribe to the activity on this blog? (i.e. do you have a twitter account or some other news feed)
If you use RSS, you can subscribe by clicking on the “Entries RSS” link under “Meta” in the right hand column. (Though it seems RSS is old fashioned these days.) Alternatively, WordPress will send you update e-mails if you create an account and click the “Follow” button, though I can understand if you don’t want to sign up for yet another account.
I also put announcements of new posts on Google+ (https://plus.google.com/100068343619301713304/posts) and will start putting them on Twitter too (https://twitter.com/jejomath).
Thanks, Jesse! I subscribed to your Twitter 😉
Great articles; they give very good geometric insight into the concepts. Two questions: do you have any plans to make this into a book? That would surely help.
Also, could you post some references for each post, so that we can follow up with a more mathematical exposition if we like?
Thanks!
Good question. I have been thinking about writing a book that follows the rough outline of the blog, with changes based on what I’ve learned while writing it. Part of the reason I started writing this blog was to gauge how much interest there would be in this approach to data analysis. Based on the statistics that WordPress reports, there seems to be enough to justify writing a book.
That’s also a good suggestion about including references at the end of posts. I will try to go back through and add references where I can find good ones. Thanks for the comments!
Excellent articles. Very clear, concise, and intuitive explanations. Thank you very much for your effort; it is very enlightening!
This should be expanded into a book.
Thanks! I’m hoping to turn it into a book at some point, though I’m not sure exactly when. As I write more posts, my idea of what the book should cover and how to organize it keeps changing.
Hey I was wondering if you plan to write about topological data analysis / computational topology? It seems like quite a natural topic for your blog. I have been investigating this myself, but it would be great to see your thoughts on how these tools could be put to good use.
That’s a good question and one that I’ve been thinking about for a while. So far on this blog, I’ve been trying to focus on topics that are relatively simple and widely used. Persistent homology (which is what most people seem to mean by topological data analysis) hasn’t yet filtered out of the academic realm, and relies on fairly abstract concepts (like manifolds and homology). Still, I’m tempted to use it as an excuse to introduce readers to these advanced ideas, and it would tie in well with other directions like manifold learning, so I will probably get to it eventually.
By the way, I have written about the mapper/Iris clustering algorithm, which is closely related to persistent homology. I also wrote a post about persistent homology a while back on my topology blog (http://ldtopology.wordpress.com/2012/06/12/topological-exploration-of-data-sets-persistent-homology/).
Thanks for the link, I hadn’t seen your other blog.
I look forward to any other posts you make on the topic.