Shallow Neural Networks
Neural networks are powerful tools that can model complex patterns in data. A neural network with one hidden layer is referred to as a shallow neural network. Understanding the forward and backward propagation in this network is key to building neural network models.
Overview of Neural Networks
In logistic regression, we take an input feature vector $x$ and, through a series of mathematical operations involving weights $w$ and a bias $b$, predict an output:

$$\hat{y} = \sigma(w^T x + b)$$
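As a quick illustration, here is a minimal NumPy sketch of this computation; the feature values and parameters below are made up purely for the example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up feature values and parameters, purely for illustration
x = np.array([[0.5], [-1.2], [0.3]])  # input features, shape (3, 1)
w = np.array([[0.4], [0.1], [-0.7]])  # weights, shape (3, 1)
b = 0.2                               # bias (scalar)

y_hat = sigmoid(w.T @ x + b)          # prediction in (0, 1), shape (1, 1)
print(y_hat.item())                   # approximately 0.52
```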
In a neural network with one hidden layer, we extend this idea:
Imagine we have an input vector $x$ consisting of features $x_1, x_2, x_3$, and we're trying to predict an output $\hat{y}$. In a neural network, we do this by passing the input through layers of logistic regression units.
Neural Network Representation
A neural network consists of an input layer, one or more hidden layers, and an output layer. Unlike the input layer, which is exposed to our training data, the hidden layers aren't directly exposed to our dataset. Here’s what happens in a two-layer neural network (where the input layer isn't counted):
- The input layer $a^{[0]} = x$ is our initial data.
- The activations $a^{[1]}$ represent the outputs of the hidden layer's neurons after applying the layer's weights and biases to the input data.
- The final layer $a^{[2]}$, or the output layer, is responsible for producing the predicted value $\hat{y}$.
The operations performed in the first hidden layer can be mathematically represented as:

$$z^{[1]} = W^{[1]} x + b^{[1]}$$

Or, to demonstrate fully, neuron by neuron:

$$z^{[1]} = \begin{bmatrix} z_1^{[1]} \\ z_2^{[1]} \\ z_3^{[1]} \\ z_4^{[1]} \end{bmatrix} = \begin{bmatrix} w_1^{[1]T} \\ w_2^{[1]T} \\ w_3^{[1]T} \\ w_4^{[1]T} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ b_3^{[1]} \\ b_4^{[1]} \end{bmatrix}$$

where each row $w_i^{[1]T}$ holds the three weights of hidden neuron $i$.
Here, $z^{[1]}$ is the weighted sum of inputs plus the bias term for the first hidden layer, $W^{[1]}$ is the weights matrix connecting the input layer to the first hidden layer, and $b^{[1]}$ is the bias vector for the first hidden layer. For our case:
- $W^{[1]}$ has a shape of (4, 3), since we have 4 neurons in the hidden layer and 3 input features.
- $b^{[1]}$ has a shape of (4, 1), which corresponds to the 4 neurons in the hidden layer.
The output of each neuron in the first hidden layer is then passed through an activation function, such as the sigmoid function:

$$a^{[1]} = \sigma(z^{[1]}), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Where $a^{[1]}$ is the activation of the first hidden layer, and it retains the shape of (4, 1), the same as $z^{[1]}$.
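To make the shapes concrete, here is a minimal NumPy sketch of the hidden-layer computation, using randomly initialized placeholder parameters rather than trained values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)    # reproducible placeholder parameters

x  = rng.standard_normal((3, 1))  # input features, shape (3, 1)
W1 = rng.standard_normal((4, 3))  # hidden-layer weights, shape (4, 3)
b1 = np.zeros((4, 1))             # hidden-layer biases, shape (4, 1)

z1 = W1 @ x + b1                  # weighted sums, shape (4, 1)
a1 = sigmoid(z1)                  # hidden activations, shape (4, 1)
print(a1.shape)                   # (4, 1)
```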
The activations from the first hidden layer are then used as inputs to the second layer. If the network has only one hidden layer, this second layer is the output layer. The process is similar:

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$$

Here, $z^{[2]}$ is the weighted sum of the activations from the first hidden layer plus the bias term for the second layer, which in a single-hidden-layer network is the output layer.
- $W^{[2]}$ has a shape of (1, 4), which reflects a single output neuron connected to the 4 neurons of the hidden layer.
- $b^{[2]}$ has a shape of (1, 1), corresponding to the single output neuron.
Finally, the output neuron also applies the sigmoid function to produce the final output $\hat{y}$:

$$\hat{y} = a^{[2]} = \sigma(z^{[2]})$$

Where $a^{[2]}$ is the predicted output, with a shape of (1, 1).
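Putting the two layers together, a complete forward pass for a single example might be sketched as follows (again with hypothetical, randomly initialized parameters):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass through a network with one hidden layer."""
    a1 = sigmoid(W1 @ x + b1)   # hidden activations, shape (4, 1)
    a2 = sigmoid(W2 @ a1 + b2)  # predicted output, shape (1, 1)
    return a2

rng = np.random.default_rng(0)
x      = rng.standard_normal((3, 1))
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

y_hat = forward(x, W1, b1, W2, b2)  # value in (0, 1), shape (1, 1)
```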
In a neural network, each layer's output serves as the next layer's input. The activation functions introduce non-linearity, allowing the network to learn complex patterns. By adjusting the weights and biases through training, using algorithms like gradient descent, the network learns to make predictions.
Vectorizing Across Multiple Examples
When handling multiple training examples, we want to avoid a slow for-loop over each example. Instead, we use vectorization to process all examples simultaneously. If our input data $X$ has dimensions $(n_x, m)$, where $n_x$ is the number of features and $m$ is the number of examples (each example is a column of $X$), we can perform all our computations in a vectorized form:

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \sigma(Z^{[1]})$$

$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma(Z^{[2]})$$

(with $b^{[1]}$ and $b^{[2]}$ broadcast across the $m$ columns). The dimensions work out as follows:
- $Z^{[1]}$ has dimensions $(4, m)$.
- $A^{[1]}$ has the same dimensions as $Z^{[1]}$.
- $Z^{[2]}$ has dimensions $(1, m)$.
- $A^{[2]}$ also has dimensions $(1, m)$.
This approach dramatically increases efficiency, particularly with large datasets.
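As a rough sketch of what this looks like in NumPy, assuming the same 3-4-1 architecture as above and a made-up batch of $m = 5$ examples:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, m = 3, 5                           # 3 features, 5 examples (illustrative)

X      = rng.standard_normal((n_x, m))  # one example per column, shape (3, m)
W1, b1 = rng.standard_normal((4, n_x)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

Z1 = W1 @ X + b1   # shape (4, m); b1 broadcasts across the m columns
A1 = sigmoid(Z1)   # shape (4, m)
Z2 = W2 @ A1 + b2  # shape (1, m)
A2 = sigmoid(Z2)   # shape (1, m): one prediction per example
```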