
Inception Network (GoogleNet)

Designing a Convolutional Neural Network (CNN) traditionally involves making decisions about layer types and sizes - whether to use 3 x 3 convolutions, 5 x 5 convolutions, or max pooling layers, etc. The Inception network, introduced by Google, challenges this approach with the premise: Why not use all options simultaneously? This idea leads to a more complex network structure that also works remarkably well.

Inception Idea

The inception module allows for the parallel application of different filter sizes and max pooling, with the results concatenated together. This approach lets the network learn and decide on the optimal combination of filters and transformations.

  • Note: "Same" max pooling (pooling with padding so the output spatial size matches the input) is used in the pooling branch so that its output can be concatenated with the convolution branches. It is an unusual form of pooling.
  • Input/Output: The input is 28 × 28 × 192 and the output is 28 × 28 × 256, obtained by concatenating the outputs of the various convolution and pooling branches along the depth axis (the channel breakdown is shown below).
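
Assuming the filter counts of the inception (3a) module from the paper, the 256 output channels break down as:

$$256 = \underbrace{64}_{1 \times 1} + \underbrace{128}_{3 \times 3} + \underbrace{32}_{5 \times 5} + \underbrace{32}_{\text{pool projection}}$$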

Szegedy et al., 2014. "Going Deeper with Convolutions"

Bottleneck Layers

Addressing Computational Costs

Inception models can be computationally intensive, particularly due to operations like 5 x 5 convolutions. For example, applying 32 "same" filters of size 5 x 5 to a 28 × 28 × 192 input results in a staggering 120 million multiplications (28 × 28 × 32 × 5 × 5 × 192 ≈ 120M).
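
Each of the 28 × 28 × 32 output values is a dot product over a 5 × 5 × 192 block of the input, so in general the multiplication count for a convolutional layer is:

$$\text{multiplications} = \underbrace{H_{\text{out}} \times W_{\text{out}} \times n_{\text{filters}}}_{\text{number of output values}} \times \underbrace{f \times f \times n_{\text{channels}}}_{\text{cost per output value}}$$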

Efficiency with Bottlenecks

Using 1 x 1 convolutions, known as bottleneck layers, can dramatically reduce this computational cost - by a factor of almost 10 in this context - from 120 million to approximately 12.4 million multiplications:

  • Process:
    • Apply 16 filters of size 1 × 1 to the 28 × 28 × 192 input, producing a 28 × 28 × 16 volume.
    • Apply 32 filters of size 5 × 5 to the resulting 28 × 28 × 16 volume.
  • Outcome: The bottleneck layers reduce dimensionality without significantly impacting performance (a worked cost comparison is sketched below).
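
As a quick sanity check, here is a minimal Python sketch of the multiply counts for both options (bias terms are ignored):

```python
# Cost of producing 32 feature maps of size 28 x 28 from a 28 x 28 x 192 input.

# Option 1: direct 5 x 5 'same' convolution with 32 filters
direct = 28 * 28 * 32 * (5 * 5 * 192)          # 120,422,400  (~120M)

# Option 2: 1 x 1 bottleneck to 16 channels, then 5 x 5 convolution to 32
bottleneck = (28 * 28 * 16 * (1 * 1 * 192)     # ~2.4M for the 1 x 1 stage
              + 28 * 28 * 32 * (5 * 5 * 16))   # ~10.0M for the 5 x 5 stage

print(f"direct:     {direct:,}")                  # 120,422,400
print(f"bottleneck: {bottleneck:,}")              # 12,443,648
print(f"reduction:  {direct / bottleneck:.1f}x")  # ~9.7x
```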

Inception Module with Dimension Reduction

Structure of an Inception Module

Single Inception Module

This illustration shows an inception module with dimension reduction in place, reducing computational load while maintaining the model's depth and capacity for feature learning.

Example in Keras

Keras Inception Module

An example of how an inception module can be implemented in the Keras deep learning framework.
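
As a stand-in for the illustration above, here is a minimal sketch of such a module using the Keras functional API (tensorflow.keras). The filter counts follow the inception (3a) module of the paper, and the helper name inception_module is purely illustrative:

```python
from tensorflow.keras import Input, Model, layers

def inception_module(x, f_1x1=64, f_3x3_reduce=96, f_3x3=128,
                     f_5x5_reduce=16, f_5x5=32, f_pool_proj=32):
    # Branch 1: plain 1 x 1 convolution
    b1 = layers.Conv2D(f_1x1, (1, 1), padding='same', activation='relu')(x)

    # Branch 2: 1 x 1 bottleneck followed by a 3 x 3 convolution
    b2 = layers.Conv2D(f_3x3_reduce, (1, 1), padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f_3x3, (3, 3), padding='same', activation='relu')(b2)

    # Branch 3: 1 x 1 bottleneck followed by a 5 x 5 convolution
    b3 = layers.Conv2D(f_5x5_reduce, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f_5x5, (5, 5), padding='same', activation='relu')(b3)

    # Branch 4: 'same' max pooling (stride 1) followed by a 1 x 1 projection
    b4 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    b4 = layers.Conv2D(f_pool_proj, (1, 1), padding='same', activation='relu')(b4)

    # Concatenate the four branches along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

# A 28 x 28 x 192 input yields a 28 x 28 x 256 output (64 + 128 + 32 + 32)
inputs = Input(shape=(28, 28, 192))
outputs = inception_module(inputs)
Model(inputs, outputs).summary()
```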

GoogleNet: The Inception Network

Composition of the Network

GoogleNet consists of a series of inception modules linked together. Its name is a nod to the "We need to go deeper" meme from the movie "Inception", reflecting the nested, repeated structure of the network.

GoogleNet

Final Layers and Regularization

In the side branches highlighted above, the last few layers are a fully connected layer followed by a softmax layer, which tries to make a prediction off of a hidden layer. This helps ensure that the features computed even in the hidden units/intermediate layers are not too bad for predicting the output class of an image, and it appears to have a regularizing effect on the inception network, helping prevent it from overfitting.

Sometimes a max-pool block is used before an inception module to reduce the spatial dimensions of its input. There are 3 softmax branches at different positions in the network; they push the network toward its goal and help ensure that the intermediate features are good enough for the network to learn from. It also turns out that softmax0 and softmax1 (the two auxiliary branches) give a regularization effect.
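
As a rough sketch of one of those auxiliary branches (assuming the Keras functional API, with x_intermediate standing in for an intermediate inception-module output), the paper describes each auxiliary classifier as 5 × 5 average pooling with stride 3, a 1 × 1 convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, and a softmax output, with the auxiliary losses weighted by 0.3 during training:

```python
from tensorflow.keras import layers

def auxiliary_classifier(x_intermediate, num_classes=1000, name='aux0'):
    # Shrink the spatial dimensions, then project and classify
    a = layers.AveragePooling2D((5, 5), strides=(3, 3))(x_intermediate)
    a = layers.Conv2D(128, (1, 1), padding='same', activation='relu')(a)
    a = layers.Flatten()(a)
    a = layers.Dense(1024, activation='relu')(a)
    a = layers.Dropout(0.7)(a)
    return layers.Dense(num_classes, activation='softmax', name=name)(a)

# During training the auxiliary outputs share the main label and get a small
# loss weight, e.g.:
# model.compile(optimizer='sgd', loss='categorical_crossentropy',
#               loss_weights={'main': 1.0, 'aux0': 0.3, 'aux1': 0.3})
```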

Evolution and Variants

Since its introduction, various versions of the Inception network have been developed, including Inception v2, v3, and v4, and even combinations with ResNet, demonstrating the flexibility and adaptability of the inception concept.

Reference: Szegedy et al., 2014, "Going Deeper with Convolutions"