Introduction to Deep Learning: What do I need to know…?

Stacey Ronaghan
7 min read · Aug 14, 2018


What is Deep Learning?

To answer this question, we should first introduce several related terms.

Artificial Intelligence (AI)

AI is where computers are able to perform tasks that normally require human intelligence. True AI, or pure AI, refers to machines that are equal in intelligence to humans. However, most AI being developed today is not purely autonomous but rather a tool to expand the capabilities of its users.

Machine Learning

Machine learning is a set of techniques to get a computer to learn without being explicitly programmed. A machine learning model is trained and with more experience (i.e. more examples) it is likely to perform better.

Deep Learning

Deep learning is a subset of machine learning: programs still learn to solve tasks without being explicitly programmed, but they do so specifically by training artificial neural networks.

Data Science

Data science combines various aspects of statistics, computer science, mathematics and visualization to turn data into insights and new knowledge. Data science can include machine learning but covers more aspects such as data manipulation, business intelligence and model deployment.

What is a Neural Network?

The above diagram shows a neural network. The nodes are referred to as “neurons” because neural networks were loosely based on neurons in the brain.

Neurons receive input, whether it is the initial input (the features you are using to make a prediction) or the output from other neurons. They then decide what to pass to the next layer of neurons. The layers between the input and output are referred to as “hidden layers”.

Neurons mathematically transform the data they receive before passing it forward. Having so many transformations allows the network to learn more complex relationships between the features and the prediction, which other algorithms cannot easily discover.
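To make the idea of neurons transforming data and passing it forward more concrete, here is a minimal sketch in plain Python. The weights, biases and the choice of tanh as the transformation are assumptions picked by hand for illustration; a real network learns its weights during training.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, then a non-linear function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(weighted_sum)  # tanh is just one possible choice

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical, hand-picked numbers; a trained network would learn these.
features = [0.5, -1.0, 2.0]                        # initial input: 3 features
hidden = layer(features,
               [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]],
               [0.0, 0.1])                         # hidden layer with 2 neurons
output = layer(hidden, [[0.7, -0.6]], [0.05])      # output layer with 1 neuron

print(hidden, output)
```

The output of the hidden layer becomes the input of the output layer, which is all that "passing forward" means here.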

The term “deep learning” is used for machine learning models built with many hidden layers: deep neural networks.

Why Deep Learning?

This section discusses the advantages of deep learning over traditional machine learning techniques.

1. Deep learning techniques can find non-linear relationships in data

A relationship is linear if a change in the first variable corresponds to a constant change in the second variable. A non-linear relationship means that a change in the first variable doesn’t necessarily correspond to a constant change in the second. The variables may still influence each other, but the relationship cannot be described by a straight line.

A quick visual example: by introducing non-linearity we can better capture the patterns in this data.
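The same point can be sketched numerically. The data below is made up (y equals x squared), and the squaring step is chosen by hand; a neural network would instead learn a suitable non-linear transformation. Treat it as an illustration of why non-linearity helps, not as how a network is trained.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the fitted line and the true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data with a curved (non-linear) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# A straight line fitted to the raw values cannot follow the curve.
linear_error = mean_squared_error(xs, ys, *fit_line(xs, ys))

# Applying a non-linear transformation first makes the pattern easy to capture.
squared = [x ** 2 for x in xs]
nonlinear_error = mean_squared_error(squared, ys, *fit_line(squared, ys))

print(linear_error, nonlinear_error)
```

The straight line is left with a large error, while the error after the non-linear step is essentially zero.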

2. Layers in neural networks incrementally learn more complex patterns

Multiple layers in neural networks allow for the model to learn incrementally more complex features. The following is a very simplified example to demonstrate how more complex features are discovered as the data moves along the layers:

  • The input, a handwritten letter, is passed to the model as pixels
  • The first layer in the model learns to recognize lines and edges
  • The second layer would take the edges and line information from the first layer and discover more complex shapes
  • The final layer identifies which shapes represent letters
  • The appropriate letter “B” is returned as the output, the model’s prediction
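As a rough illustration of the "lines and edges" step above, the sketch below slides a small filter over a toy image. The image and the filter values are assumptions chosen by hand; in a trained network the filter values are learned from data.

```python
def convolve(image, kernel):
    """Slide the kernel over the image and return the grid of responses (no padding)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]

# Toy 4x6 "image": dark pixels (0) on the left, bright pixels (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked vertical-edge filter; a real first layer would learn such values.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

response = convolve(image, vertical_edge)
for row in response:
    print(row)  # prints [0, 3, 3, 0] for each of the two rows
```

The response is zero over the flat regions and high where dark meets bright, which is the kind of edge information a later layer could combine into shapes.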

3. Feature Engineering can be less cumbersome with Deep Learning

With traditional machine learning algorithms — logistic regression, SVM, random forests — a lot of time is spent taking the raw features and transforming them into something more useful to the model.

For example, with natural language processing, a lot of words are removed, shortened or replaced (e.g. “payments”, “paying”, “paid” all get replaced with their base word “pay”) to reduce the number of inputs to the model.
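A minimal sketch of this kind of text clean-up follows. The lookup table and the list of words to remove are hypothetical and deliberately tiny; real projects typically rely on a dedicated stemming or lemmatization tool rather than a hand-written dictionary.

```python
# Hypothetical lookup table and stop-word list, made up for illustration.
BASE_FORMS = {"payments": "pay", "paying": "pay", "paid": "pay"}
STOP_WORDS = {"the", "is", "for", "a", "was"}

def normalize(text):
    """Lowercase, drop stop words, and replace known words with their base form."""
    tokens = text.lower().split()
    return [BASE_FORMS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]

print(normalize("The customer is paying for the payments"))
# prints ['customer', 'pay', 'pay']
```

Seven input words shrink to three, two of which are now the same token, so the model has fewer distinct inputs to deal with.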

These changes are called “feature engineering” and can take a lot of time. In addition, a data scientist might not be able to do this alone if they are not already familiar with the data, so a subject matter expert (SME) might be required to help.

Deep learning models are able to capture some of the relationships or nuances that you have to point out explicitly with traditional algorithms, so this step is not always required.

4. As data sizes increase, larger neural networks outperform simpler models

The diagram below replicates one that Andrew Ng has presented in Coursera courses and at conferences:

For small data sets, traditional machine learning algorithms can be appropriate and can even outperform deep learning. They are less computationally expensive, so they can be the preferred choice. However, the diagram highlights how many of these algorithms plateau and don’t continue to improve with increased data.

Neural networks can continue to improve as they are provided with more examples. Furthermore, larger neural networks (those with many layers) can significantly outperform simpler models. However, in order to achieve these levels of performance, a lot of additional data is required.

Why Now?

Neural networks have been around for decades but have only recently started to gain traction; here we discuss three reasons why.

1. Computational Power

Computational power used to be a bottleneck; if you were trying to process complicated mathematics on thousands of examples, the job often just wouldn’t complete.

Computational power is continuing to increase and prices for these resources are decreasing. Consequently, it is easier and more affordable to access hardware suitable for training deep learning models, e.g. GPUs.

2. Algorithms

Because it is now feasible to build and train these algorithms, more research time has been spent on developing them. There have been many advances and breakthroughs, typically on unstructured data in tasks such as image detection, speech-to-text and natural language processing.

3. Libraries

Deep learning algorithms have become easier to implement with high-level open source libraries. Previously, if you wanted to create a neural network you would have had to do so in a very low-level language, limiting the number of people who could do this work.

Companies are not only developing libraries that lower the barrier to entry for deep learning, they are also making them open source so that a wider population can use them. Examples include TensorFlow, Keras, PyTorch, Caffe and fast.ai.

Disadvantages

Deep learning certainly has many advantages, but it also has disadvantages that should be discussed.

Black Box

We refer to the layers in neural networks as “hidden” because the model learns what values they should hold, and we don’t get to understand those values. A neural network can be described as a “black box”: you cannot see inside it, and it cannot explain why it is making a particular decision.

For many industries this can be problematic. For example, if you’re refused a loan, you would want to know why. Also, if you’re unsure why a decision has been made, you can’t be certain it is not for an unfair reason. If there is bias in your data (e.g. sexism) or you haven’t trained on enough examples of a wide population (e.g. limited number of races included in your image collection) then the model itself could learn these biases. This would result in decisions that are incorrect, unfair, or even illegal.

Data Requirements

Andrew Ng’s diagram showing the performance advantages of neural networks also shows that these differences only appear as the number of examples increases significantly. Neural networks are complicated and require lots of examples to learn complex patterns.

Sometimes enough data isn’t available and then traditional machine learning algorithms are likely to outperform neural networks.

Computation

We discussed earlier how better hardware has become available, allowing deep learning to become more popular. However, computation is still a bottleneck. Not only does appropriate hardware need to be purchased and made available, but state-of-the-art deep learning algorithms can still take days or even weeks to run!

Architecture Design

Although we may save time by removing feature engineering, deep learning can require a large proportion of time to be spent defining the architecture. Even the simplest models require decisions on how many layers to have in the model and what types of layers these should be (recurrent, convolutional, dense). In addition, each neuron has an “activation function” that also needs to be defined.
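For readers who have not met activation functions before, here is a small sketch of three commonly used ones in plain Python. Which one to use is exactly the kind of decision described above; the input values are arbitrary and only there to show how each function reshapes a neuron’s weighted sum.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values through, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any value into the range (-1, 1)."""
    return math.tanh(z)

# Arbitrary example inputs standing in for a neuron's weighted sum.
for z in [-2.0, 0.0, 2.0]:
    print(z, relu(z), round(sigmoid(z), 3), round(tanh(z), 3))
```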

We can also dig even deeper; if you’re building a CNN, each convolutional layer requires you to specify how many “filters” to use and the size of these filters. You also need to choose the method and size for pooling.
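To show how these choices interact, the sketch below works out the output size and number of learnable parameters for one convolutional layer followed by pooling. The helper names and the example numbers (a 28x28 single-channel image, 32 filters of size 3x3, 2x2 pooling) are assumptions for illustration, and the formulas assume square inputs and filters, no padding, and non-overlapping pooling.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Width (or height) of the output after sliding a filter over the input."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def conv_parameter_count(num_filters, filter_size, input_channels):
    """Each filter has filter_size * filter_size * input_channels weights plus one bias."""
    return num_filters * (filter_size * filter_size * input_channels + 1)

def pool_output_size(input_size, pool_size):
    """Non-overlapping pooling: the stride equals the pool size."""
    return input_size // pool_size

# Hypothetical layer: 28x28 grayscale image, 32 filters of size 3x3, then 2x2 pooling.
after_conv = conv_output_size(28, 3)          # 26
parameters = conv_parameter_count(32, 3, 1)   # 320
after_pool = pool_output_size(after_conv, 2)  # 13

print(after_conv, parameters, after_pool)
```

Changing the number or size of the filters changes both the parameter count and the shape handed to the next layer, which is part of why exploring architectures takes time.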

A data scientist is likely to explore a wide variety of architectures, each taking a long time to train and evaluate.

Summary

  • Deep learning is a subset of machine learning that relates to neural networks with many layers
  • Deep learning can find complex patterns that other techniques are unable to find
  • There are different architectures that can be used depending on the data type; some of which can capture spatial or sequential relationships
  • Users should be wary of neural networks’ inability to provide a human-explainable reason for their predictions
  • Deep learning can provide state-of-the-art solutions but requires a lot of data to do so!

Further Reading

Other posts on deep learning that might be of interest:

Deep Learning: Common Architectures

Deep Learning: Overview of Neurons and Activation Functions

Deep Learning: Which Loss and Activation Functions should I use?

Stacey Ronaghan

Data Scientist keen to share experiences & learnings from work & studies