Inception Network (GoogLeNet)
Designing a Convolutional Neural Network (CNN) traditionally involves making decisions about layer types and sizes - whether to use 3 x 3 convolutions, 5 x 5 convolutions, or max pooling layers, etc. The Inception network, introduced by Google, challenges this approach with the premise: Why not use all options simultaneously? This idea leads to a more complex network structure that also works remarkably well.
The inception module allows for the parallel application of different filter sizes and max pooling, with the results concatenated together. This approach lets the network learn and decide on the optimal combination of filters and transformations.
- Note: "Same" max pooling (stride 1 with padding) is used in the pooling branch so that its output keeps the same height and width as the other branches. It is an unusual use of pooling, chosen purely to make the sizes match up for concatenation.
- Input/Output: In the example used here, the input is 28 x 28 x 192 and the concatenated output is 28 x 28 x 256 (64 + 128 + 32 + 32 channels from the 1 x 1, 3 x 3, 5 x 5, and pooling branches), having been processed through the various convolutions and pooling operations.
Szegedy et al., 2014. "Going Deeper with Convolutions"
Bottleneck Layers
Addressing Computational Costs
Inception modules can be computationally intensive, particularly due to operations like 5 x 5 convolutions. For example, applying 32 "same" filters of size 5 x 5 on a 28 x 28 x 192 input results in a staggering 120 million multiplications (28 x 28 x 32 output values, each requiring 5 x 5 x 192 multiplications: 28 x 28 x 32 x 5 x 5 x 192 ≈ 120 million).
Efficiency with Bottlenecks
Using 1 x 1 convolutions, known as bottleneck layers, can dramatically reduce the computational cost - by a factor of roughly 10 in this context - from 120 million to approximately 12.4 million multiplications:
- Process:
- Apply 16 1 x 1 convolutions to the 28 x 28 x 192 input, producing a 28 x 28 x 16 intermediate volume.
- Apply 32 5 x 5 convolutions to that reduced volume, producing the 28 x 28 x 32 output (the arithmetic is checked in the sketch after this list).
- Outcome: The bottleneck layers reduce dimensionality without significantly impacting performance.
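As a sanity check on these figures, here is a minimal sketch (counting only multiplications, ignoring additions and bias terms) that reproduces both numbers:

```python
# Count the multiplications in a "same" convolution: each output value
# needs kernel_h * kernel_w * in_channels multiplications.
def conv_mults(out_h, out_w, out_c, k_h, k_w, in_c):
    return out_h * out_w * out_c * (k_h * k_w * in_c)

# Direct 5x5 convolution: 32 "same" filters on a 28x28x192 input
direct = conv_mults(28, 28, 32, 5, 5, 192)

# With a bottleneck: 16 1x1 filters first, then 32 5x5 filters on the 28x28x16 result
bottleneck = conv_mults(28, 28, 16, 1, 1, 192) + conv_mults(28, 28, 32, 5, 5, 16)

print(f"direct:     {direct:,}")      # 120,422,400  (~120 million)
print(f"bottleneck: {bottleneck:,}")  # 12,443,648   (~12.4 million)
```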
Inception Module with Dimension Reduction
Structure of an Inception Module
This illustration shows an inception module with dimension reduction in place, reducing computational load while maintaining the model's depth and capacity for feature learning.
Example in Keras
An example of how an inception module can be implemented in the Keras deep learning framework.
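Below is a minimal sketch of such a module using the tensorflow.keras functional API. The branch filter counts (64, 96, 128, 16, 32, 32) are borrowed from the inception (3a) block of the paper, and inception_module is just an illustrative helper name:

```python
from tensorflow.keras import layers, Model, Input

def inception_module(x, f1x1, f3x3_reduce, f3x3, f5x5_reduce, f5x5, f_pool):
    # Branch 1: 1x1 convolution
    b1 = layers.Conv2D(f1x1, (1, 1), padding='same', activation='relu')(x)

    # Branch 2: 1x1 bottleneck followed by 3x3 convolution
    b2 = layers.Conv2D(f3x3_reduce, (1, 1), padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3x3, (3, 3), padding='same', activation='relu')(b2)

    # Branch 3: 1x1 bottleneck followed by 5x5 convolution
    b3 = layers.Conv2D(f5x5_reduce, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5x5, (5, 5), padding='same', activation='relu')(b3)

    # Branch 4: 3x3 "same" max pooling (stride 1) followed by 1x1 convolution
    b4 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    b4 = layers.Conv2D(f_pool, (1, 1), padding='same', activation='relu')(b4)

    # Concatenate all branches along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

# Example: a 28x28x192 input produces a 28x28x256 output (64 + 128 + 32 + 32 channels)
inputs = Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
model = Model(inputs, outputs)
model.summary()
```

Because every branch uses "same" padding (and the pooling branch uses stride 1), all outputs stay at 28 x 28, which is what makes the channel-wise concatenation possible.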
GoogLeNet: The Inception Network
Composition of the Network
GoogLeNet consists of a series of inception modules linked together. Its name was inspired by the "we need to go deeper" meme from the movie "Inception", a nod to the nested, complex structure of the network.
Final Layers and Regularization
In addition to the main output, the network has side branches in which a hidden layer is fed through fully connected layers and then a softmax layer to make a prediction of its own. This helps ensure that the features computed in the hidden units/intermediate layers are not too bad for predicting the output class of an image. It appears to have a regularizing effect on the inception network and helps prevent it from overfitting.
Sometimes a max-pooling block is used between inception modules to reduce the spatial dimensions of the inputs. There are three softmax branches at different positions, which push the network toward its goal and help ensure that the intermediate features are good enough for the network to learn from. It also turns out that the auxiliary branches, softmax0 and softmax1, give a regularization effect.
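As an illustration, one of these auxiliary softmax branches could be sketched in Keras roughly as follows. The layer sizes (5 x 5 average pooling with stride 3, a 128-filter 1 x 1 convolution, a 1024-unit fully connected layer, and 70% dropout) follow the paper's description; auxiliary_classifier is a hypothetical helper name:

```python
from tensorflow.keras import layers

def auxiliary_classifier(x, num_classes=1000):
    # Pool the intermediate feature map down to a small spatial size
    x = layers.AveragePooling2D((5, 5), strides=(3, 3))(x)
    # 1x1 convolution to reduce the channel dimension
    x = layers.Conv2D(128, (1, 1), padding='same', activation='relu')(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.7)(x)
    # Softmax prediction made directly from intermediate features
    return layers.Dense(num_classes, activation='softmax')(x)
```

A branch like this would be attached to the output of an intermediate inception module and trained with its own (down-weighted) classification loss, which is what produces the regularizing effect described above.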
Evolution and Variants
Since its introduction, various versions of the Inception network have been developed, including Inception v2, v3, and v4, and even combinations with ResNet, demonstrating the flexibility and adaptability of the inception concept.
Reference: Szegedy et al., 2014, "Going Deeper with Convolutions"