What deep learning really means

GPUs in the cloud put the predictive power of deep neural networks within reach of every developer


Convolutional neural networks typically use convolutional, pooling, ReLU, fully connected, and loss layers to simulate a visual cortex. The convolutional layer computes a weighted sum (a discrete convolution) over many small overlapping regions of its input. The pooling layer performs a form of nonlinear down-sampling. ReLU layers, which we mentioned earlier, apply the nonsaturating activation function f(x) = max(0,x). In a fully connected layer, the neurons have full connections to all activations in the previous layer. A loss layer computes how the network training penalizes the deviation between the predicted and true labels, using a softmax or cross-entropy loss for classification or a Euclidean loss for regression.
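To make those layer operations concrete, here is a minimal NumPy sketch of each one; the shapes and the simple loop-based convolution are illustrative assumptions, not how a real framework implements them:

```python
import numpy as np

def relu(x):
    # Nonsaturating activation: f(x) = max(0, x)
    return np.maximum(0, x)

def conv2d(image, kernel):
    # Slide the kernel over every small overlapping region of the image
    # and take a weighted sum at each position (valid padding, no stride).
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    # Nonlinear down-sampling: keep only the max of each 2x2 block.
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax_cross_entropy(logits, true_label):
    # Loss layer for classification: softmax, then cross-entropy
    # against the true class index.
    shifted = logits - logits.max()          # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[true_label])
```

A fully connected layer is just a matrix multiply plus bias (`x @ W + b`), so it is omitted; real networks stack many of these layers and learn the kernel and weight values by gradient descent.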

Besides image recognition, CNNs have been applied to natural language processing, drug discovery, and playing Go.

Natural language processing (NLP) is another major application area for deep learning. In addition to the machine translation problem addressed by Google Translate, major NLP tasks include automatic summarization, co-reference resolution, discourse analysis, morphological segmentation, named entity recognition, natural language generation, natural language understanding, part-of-speech tagging, sentiment analysis, and speech recognition.

In addition to CNNs, NLP tasks are often addressed with recurrent neural networks (RNNs), which include the Long Short-Term Memory (LSTM) model. As I mentioned earlier, in recurrent neural networks, neurons can influence themselves, either directly or indirectly through the next layer. In other words, RNNs can have loops, which gives them the ability to persist some information history when processing sequences -- and language is nothing without sequences. LSTMs are a particularly attractive form of RNN, with a more powerful update equation and a more complicated repeating module structure.
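The recurrence that lets an RNN carry information across a sequence can be sketched in a few lines of NumPy; the hidden size, random weights, and all-ones input below are illustrative assumptions (an LSTM would replace the single tanh update with a gated cell):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The hidden state feeds back into itself: h_t depends on h_{t-1},
    # which is the "loop" that persists history across time steps.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def run_rnn(inputs, hidden_size):
    # inputs: array of shape (sequence_length, input_size)
    input_size = inputs.shape[1]
    rng = np.random.default_rng(0)
    W_xh = rng.standard_normal((input_size, hidden_size)) * 0.1
    W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
    b_h = np.zeros(hidden_size)
    h = np.zeros(hidden_size)          # initial hidden state
    for x_t in inputs:                 # process the sequence step by step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h                           # final state summarizes the sequence
```

The same `rnn_step` weights are reused at every time step, which is why the repeating-module picture is the standard way to draw RNNs and LSTMs.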

Running deep learning

Needless to say, deep CNNs and LSTMs often require serious computing power for training. Remember how the Google Brain team needed a couple thousand GPUs to train the new A.I. version of Google Translate? That's no joke. A training session that takes three hours on one GPU is likely to take 30 hours on a CPU. Also, the kind of GPU matters: For most deep learning packages, you need one or more CUDA-compatible Nvidia GPUs with enough internal memory to run your models.

That may mean you'll want to run your training in the cloud: AWS, Azure, and Bluemix all offer instances with GPUs as of this writing, as will Google early in 2017.

While the biggest cloud GPU instances can cost $14 per hour to run, there are less expensive alternatives. An AWS instance with a single GPU can cost less than $1 per hour to run, and Azure Batch Shipyard, with its deep learning recipes for the NC series of GPU-enabled instances, runs your training in a compute pool, with the small NC6 instances going for 90 cents an hour.

Yes, you can and should install your deep learning package of choice on your own computer for learning purposes, whether or not it has a suitable GPU. But when it comes time to train models at scale, you probably won't want to limit yourself to the hardware you happen to have on site.

For deeper learning

You can learn a lot about deep learning simply by installing one of the deep learning packages, trying out its samples, and reading its tutorials. For more depth, consider one or more of the following resources:


This story, "What deep learning really means" was originally published by InfoWorld.
