A brief introduction to deep neural networks.
Some people may argue that computers are better than humans: they have access to vast amounts of data, they can perform computations incredibly fast, and we humans are becoming increasingly reliant on them. However, one of the most powerful computer algorithms, artificial intelligence, is based on how the human mind works! Can you imagine how amazing it is to be able to implement the way a human brain works in a computer? Let's discover how artificial neural networks (ANNs) work.
Although there are many types of machine learning algorithms, such as SVMs, linear regression, and decision trees, we'll be taking a look at one of the most popular: deep learning. Deep learning is a subset of machine learning that has been used for many important things, from object detection to the self-driving cars being developed by companies such as Waymo and Tesla.
First of all, let's take a look at where deep learning sits in the world of AI. You can see that deep learning is a subset of machine learning. It learns from its mistakes using artificial neural networks, and in particular it uses neural networks with multiple layers to do so.
For those who have never seen how a deep learning algorithm works, you might think it’s really complicated. However, it’s just a series of nodes, weights, and biases, and it looks something like this.
Pretty much all deep neural networks are more complex than this, but they are all built on the same concept and can be expanded (as long as the computer can support the computations). Inside a neural network there are points called nodes; each node has a bias, and each connection between nodes has a weight. These weights and biases help the neural network make predictions and classifications. Let's break this down a bit more.
The main framework for neural networks
In very simple terms, let's say we plot one red point and one blue point on a 2D graph. The ANN will try to find the line that separates them best. It does this by measuring how far the line is from each point and moving the line closer to or farther from it, depending on whether the point is classified properly. Behind the scenes, the weights and biases of all the nodes are changing as this happens.
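If you're curious what this looks like in code, here's a minimal sketch using the classic perceptron update rule, which captures the "move the line toward misclassified points" idea (the points, labels, and learning rate are all made up for illustration):

```python
import numpy as np

# Two 2D points with labels: red = -1, blue = +1 (hypothetical data)
points = np.array([[1.0, 2.0], [3.0, 1.0]])
labels = np.array([-1, 1])

w = np.zeros(2)   # weights of the separating line
b = 0.0           # bias (the line's intercept)
lr = 0.1          # learning rate

# Perceptron-style updates: nudge the line toward misclassified points
for _ in range(100):
    for x, y in zip(points, labels):
        if y * (w @ x + b) <= 0:      # point is on the wrong side
            w += lr * y * x           # move the line toward correcting it
            b += lr * y

# After training, the sign of w.x + b says which side each point is on
print(np.sign(points @ w + b))
```

A full neural network replaces this single line with many weights and biases spread across layers, but the core idea of adjusting them based on mistakes is the same.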
Here is the math going on behind one of these nodes.
Output = Wx + b
W is the weight, x is the value in the node, and b is the bias. Essentially, the node's value is multiplied by the weight, the bias is added, and the result is the output.
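Here's that computation as a quick sketch for a node with three incoming connections (all the numbers are made up for illustration):

```python
import numpy as np

# A single node with 3 incoming values (hypothetical numbers)
x = np.array([0.5, -1.2, 2.0])   # values coming into the node
W = np.array([0.4, 0.1, -0.3])   # one weight per incoming connection
b = 0.25                          # the node's bias

# Output = Wx + b: weighted sum of the inputs, plus the bias
output = W @ x + b
# 0.5*0.4 + (-1.2)*0.1 + 2.0*(-0.3) + 0.25 ~= -0.27
print(output)
```

When a node has several incoming connections, Wx becomes a dot product: each incoming value times its own weight, all summed up before the bias is added.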
Oh yeah, and one more piece of terminology. Every vertical line of nodes is called a layer. In the picture of the neural network, the leftmost line of nodes is called the input layer, the centerline is the hidden layer, and the rightmost line is the output layer.
Now here's the really cool part. In the image of the neural net above, do you see all those lines connecting the input layer nodes to the hidden layer nodes? Each line represents a different weight, which means some nodes have a bigger say than others in the outcome of the nodes in the next layer. You might be asking why we have hidden layers. Well, let's go back to the diagram of two points. For a classification that simple, we don't really need a lot of nodes. However, let's take a look at a more complex classification.
As you can see here, the classification is non-linear. We could go with a linear classifier, but it would give us a lot of error, so we have to try a different approach. This is where the hidden layers come in. In essence, the hidden layers (together with their activation functions) add non-linearity by combining the classifications from the previous layer. Here's a good visual representation of it.
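To make this concrete, here's a small sketch of a two-layer forward pass: the hidden layer combines the inputs, a non-linear activation (ReLU, which we'll meet below) bends the result, and the output layer combines the hidden activations. The layer sizes and random weights are just assumptions for illustration:

```python
import numpy as np

def relu(z):
    # Non-linear activation; without it the layers would
    # collapse into a single linear map
    return np.maximum(0.0, z)

# Hypothetical sizes: 2 inputs -> 3 hidden nodes -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer weights/biases
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer weights/biases

x = np.array([0.7, -0.4])        # one input point
h = relu(W1 @ x + b1)            # hidden layer combines the inputs non-linearly
y = W2 @ h + b2                  # output layer combines the hidden activations
print(y.shape)                   # a single output value
```

Stacking more hidden layers repeats this combine-then-bend step, which is what lets deep networks draw the curvy decision boundaries that a single line can't.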
Now, let's take a deeper look at activation functions. An activation function squashes each node's output into a certain range, and it converts the output of one layer into the input for the next. There are many activation functions, so let's take a closer look at a few notable ones.
The Sigmoid Function
The sigmoid function is extremely commonly used, and it maps any input value to a number between 0 and 1. It's pretty easy to understand and apply to a neural network.
The Hyperbolic Tangent Function
This activation function has a range of -1 to 1, which makes it a bit easier to optimize, but that's getting a little too complex. All we need to know is that it's used a lot and solves a few of the sigmoid function's problems.
Rectified Linear Unit Activation Function (ReLU)
This activation function has become exceedingly popular in the past few years. It is quite simple to apply, and it has been reported to converge several times faster than the sigmoid function. Its drawback is that it's normally used only in hidden layers, so for the output layer of a classification problem, softmax is used instead.
The softmax function turns its inputs into probabilities based on their values. This is really good for the output layer because it returns the probability of each classification being right, giving the user a much better look at the overall output and how likely each class is to be the correct one.
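Here's a short sketch of all four activation functions side by side, applied to some made-up scores:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))    # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])   # hypothetical scores from an output layer
print(sigmoid(z))                # each value lands in (0, 1)
print(np.tanh(z))                # each value lands in (-1, 1)
print(np.maximum(0.0, z))        # ReLU: negatives are clipped to 0
print(softmax(z))                # probabilities that sum to 1
```

Notice how softmax is the only one that looks at all the scores together: the biggest score gets the biggest probability, which is exactly what we want from a classifier's output layer.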
One thing I find really cool about neural networks is how expandable they are. As long as the computer can support it, you can have thousands of nodes and many layers inside your neural network. To end off, here's a cool video I found of an AI learning how to play Snake.
Well, there’s a short introduction to how deep learning and ANNs work. Here are your main takeaways:
- Weights, biases and nodes
- Multi-layer neural networks
- Activation functions
So, now that you know what the essentials are, you shouldn’t get scared when you see something like this.
I’ll be posting soon about a more in-depth article on a project that uses deep learning. Thanks!