In past posts, I’ve described how Recurrent Neural Networks (RNNs) can be used to learn patterns in sequences of inputs, and how the idea of unrolling can be used to train them. It turns out that there are some significant limitations to the types of patterns that a typical RNN can learn, due to the way their weight matrices are used. As a result, there has been a lot of interest in a variant of RNNs called Long Short-Term Memory networks (LSTMs). As I’ll describe below, LSTMs have more control than typical RNNs over what they remember, which allows them to learn much more complex patterns.

Continue reading

Posted in Neural Networks | 3 Comments

Rolling and Unrolling RNNs

A while back, I discussed Recurrent Neural Networks (RNNs), a type of artificial neural network in which some of the connections between neurons point “backwards”. When a sequence of inputs is fed into such a network, the backward arrows feed information about earlier input values back into the system at later steps. One thing that I didn’t describe in that post was how to train such a network. So in this post, I want to present one way of thinking about training an RNN, called unrolling.

Continue reading

Posted in Uncategorized | 2 Comments

Continuous Bayes’ Theorem

Bayes’ Rule is one of the fundamental Theorems of statistics, but up until recently, I have to admit, I was never very impressed with it. Bayes’ gives you a way of determining the probability that a given event will occur, or that a given condition is true, given your knowledge of another related event or condition. All the examples that I’ve read or heard about seemed somewhat contrived and unrelated to the sorts of data analysis I was interested in. But it turns out there’s also an interpretation of Bayes’ Theorem that’s not only much more geometric than the standard formulation, but also fits quite naturally into the types of things that I’ve been discussing on this blog. So in today’s post, I want to explain how I came to truly appreciate Bayes’ Theorem.

Continue reading

Posted in Modeling | 2 Comments

The TensorFlow perspective on neural networks

A few weeks ago, Google announced that it was open sourcing an internal system called TensorFlow that allows one to build neural networks, as well as other types of machine learning models. (Disclaimer: I work for Google.) Because TensorFlow is designed to be more general than just a neural network framework, it takes a fairly abstract perspective compared to the way we usually talk about neural networks. But (not coincidentally) this perspective is very close to what I described in my last post, with rows of neurons defining output vectors and the connections between these rows defining matrices of weights. In today’s post, I want to describe the TensorFlow perspective, explain how it matches up with the traditional way of thinking about neural networks, and explain how TensorFlow generalizes the vector and matrix approach to include more general structures called tensors.

Continue reading

Posted in Neural Networks | 1 Comment

Neural networks, linear transformations and word embeddings

In past posts, I’ve described the geometry of artificial neural networks by thinking of the output from each neuron in the network as defining a probability density function on the space of input vectors. This is useful for understanding how a single neuron combines the outputs of other neurons to form a more complex shape. However, it’s often useful to think about how multiple neurons behave at the same time, particularly for networks that are defined by successive layers of neurons. For such networks – which turn out to be the vast majority of networks in practice – it’s useful to think about how the set of outputs from each layer determine the set of outputs of the next layer. In this post, I want to discuss how we can think about this in terms of linear transformations (via matrices) and how this idea leads to a tool called word embeddings, the most popular of which is probably word2vec.

Continue reading

Posted in Neural Networks | 10 Comments

Recurrent Neural Networks

So far on this blog, we’ve mostly looked at data in two forms – vectors in which each data point is defined by a fixed set of features, and graphs in which each data point is defined by its connections to other data points. For other forms of data, notably sequences such as text and sound, I described a few ways of transforming these into vectors, such as bag-of-words and n-grams. However, it turns out there are also ways to build machine learning models that use sequential data directly. In this post, I want to describe one such approach, called a recurrent neural network.

Continue reading

Posted in Neural Networks | 5 Comments

GPUs and Neural Networks

Artificial neural networks have been around for a long time – since either the 1940s or the 1950s, depending on how you count. But they’ve only started to be used for practical applications such as image recognition in the last few years. Some of the recent progress is based on theoretical breakthroughs such as convolutional neural networks, but a much bigger factor seems to be hardware: It turns out that small neural networks aren’t that much better than many simpler machine learning algorithms. Neural networks only excel when you have much more complex data and a large/complex network. But up until recently, the available hardware simply couldn’t handle such complexity. Moore’s law helped with this, but an even bigger part has been played by a type of chip called a GPU, or Graphical Processing Unit. These were originally designed to speed up computer animations, but they can also be used for other types of processing. In some cases, GPUs can be as much as 100 times as fast as standard CPUs at certain tasks. However, it turns out you only get this speedup with a fairly narrow category of tasks, many of which happen to be necessary for processing neural networks. In this post, I want to discuss what types of task these are and why GPUs are so much faster at them.

Continue reading

Posted in Neural Networks | 4 Comments