
Neural Networks | Introduction

A feedforward neural network, also known as a multi-layer perceptron (MLP), is a neural network with at least three layers: an input layer, one or more hidden layers, and an output layer. A neural network consists of nodes and the connections between those nodes.


Originally, each node was a perceptron with a binary output; in modern networks, each node instead applies an activation function, and the type of output varies. First, let's discuss perceptrons.


Perceptrons


A perceptron is the building block of neural networks and is sometimes referred to as a single-layer perceptron to distinguish it from an MLP. A perceptron is a threshold function, which is a linear classifier: it maps an input x to an output value f(x). It is the simplest feedforward neural network. A simplified notation for the threshold function of a perceptron is as follows:


f(x) = 0 if w · x + b ≤ 0, and f(x) = 1 if w · x + b > 0


where w is the weight vector, x is the input vector, and b is the bias. The bias determines how easy it is for the neuron to output a value of 1.
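
As a rough sketch, the threshold function above can be written in a few lines of Python. The weights and bias below are hand-picked so the perceptron computes a logical AND; they are purely illustrative, not values the network has learned.

```python
import numpy as np

def perceptron(x, w, b):
    """Threshold unit: returns 1 if w . x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hand-picked weights and bias so the unit behaves like logical AND
w = np.array([1.0, 1.0])
b = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(np.array(x, dtype=float), w, b))
# prints 0, 0, 0, 1
```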


Multi-Layer Perceptron (MLP)




Image from: Murphy, K. P. (2021). Figure 16.11. In *Machine Learning: A Probabilistic Perspective*. MIT Press.


In an MLP, perceptrons are combined in a minimum of three layers. However, when the weights and biases change, the output of a perceptron changes drastically (flipping from 0 to 1 or vice versa). A less drastic change in the output, and an output that represents a continuous-valued function (for example, a value between 0 and 1), provides more accuracy and functionality. Because of this, contemporary neural networks use non-linear activation functions: each node is no longer a perceptron but a node with an activation function. Some common activation functions include sigmoid, TanH, Rectified Linear Unit (ReLU), Leaky ReLU, Exponential Linear Unit (ELU), and SoftPlus. When building a neural network, it is necessary to choose an activation function; in practice, the ReLU function is the most common choice.
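
As a sketch, the activation functions listed above can be written directly in NumPy. The `alpha` parameters below are common default choices, not values fixed by any standard.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # squashes output into (0, 1)

def tanh(z):
    return np.tanh(z)                          # squashes output into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)                  # 0 for negative inputs, identity otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)       # small slope instead of 0 for negative inputs

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def softplus(z):
    return np.log1p(np.exp(z))                 # smooth approximation of ReLU

z = np.linspace(-3, 3, 7)
print(relu(z))    # [0. 0. 0. 0. 1. 2. 3.]
```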



Image from: Murphy, K. P. (2021). Figure 16.16. In *Machine Learning: A Probabilistic Perspective*. MIT Press.


At each connection between nodes, the output value is multiplied by a weight and a bias is added; the result becomes the input for the next node. MLPs learn by changing the weights based on the error between the output and the expected result. This is carried out through backpropagation (discussed in more detail below).
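
A minimal sketch of this computation for one fully connected layer is shown below. The layer sizes, random weights, and ReLU activation are illustrative assumptions, not part of any particular network.

```python
import numpy as np

def dense_layer(x, W, b, activation):
    """One fully connected layer: weight the inputs, add the bias, apply the activation."""
    z = W @ x + b          # weighted sum of incoming values plus bias
    return activation(z)   # result becomes the input to the next layer

relu = lambda z: np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)           # 4 input values
W = rng.normal(size=(3, 4))      # 3 nodes, each connected to all 4 inputs
b = np.zeros(3)
print(dense_layer(x, W, b, relu))
```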


Each of the three types of layers in an MLP uses activation functions, but each type of layer uses a different function. The input layer uses a linear function. Typically, the nodes in the hidden layer(s) all use the same activation function. Hidden layers also require some design: the number of hidden layers and the number of nodes in each hidden layer need to be determined when building a neural network. There are some rules of thumb for selecting these values, but there is also some trial and error involved in seeing how well the network performs and adding more layers and nodes as needed.
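
To make these choices concrete, here is a minimal sketch of a forward pass through an MLP. The hidden-layer sizes, random initialization, ReLU hidden activations, and sigmoid output activation are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: ReLU in every hidden layer, sigmoid at the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)              # hidden layers share the same activation
    W_out, b_out = weights[-1], biases[-1]
    return sigmoid(W_out @ a + b_out)    # output layer uses a different activation

# Hypothetical architecture: 4 inputs, two hidden layers of 8 and 5 nodes, 1 output
layer_sizes = [4, 8, 5, 1]
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

print(forward(rng.normal(size=4), weights, biases))
```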


The nodes in the output layer typically use a different activation function than the hidden layers, chosen to match the task: for example, sigmoid or softmax for classification and a linear function for regression.


Backpropagation


As mentioned, backpropagation is the algorithm used to train feedforward neural networks. Backpropagation is based on the chain rule, so it requires differentiable activation functions, and the multi-layer structure only pays off when those activations are nonlinear. The algorithm starts at the last layer and iterates backwards one layer at a time, using the chain rule to calculate the gradient of the loss function with respect to each weight and bias. These gradients are then used to update the weights, typically with gradient descent.
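
Below is a minimal sketch of backpropagation for a tiny two-layer network with sigmoid activations and squared-error loss. The architecture, learning rate, number of epochs, and XOR toy data are all arbitrary choices for illustration, not a recommended recipe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative network: 2 inputs -> 3 hidden (sigmoid) -> 1 output (sigmoid)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)

# XOR-style toy data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

lr = 1.0
for epoch in range(5000):
    for x, t in zip(X, y):
        # Forward pass, keeping intermediate values for the backward pass
        z1 = W1 @ x + b1;  a1 = sigmoid(z1)
        z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

        # Backward pass: chain rule, starting from the loss at the last layer
        # Loss L = 0.5 * (a2 - t)^2, and sigmoid'(z) = a * (1 - a)
        delta2 = (a2 - t) * a2 * (1 - a2)          # dL/dz2
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # dL/dz1, propagated back one layer

        # Gradient-descent update of every weight and bias
        W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
        W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1

# Print the trained predictions; with these settings the network typically
# learns XOR, though convergence is not guaranteed for every initialization.
for x in X:
    print(x, sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2))
```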

