LeNet-5

LeNet-5 is a seminal Convolutional Neural Network (CNN) designed for recognizing handwritten digits in grayscale images. Developed in 1998, this model laid the groundwork for many modern CNN architectures.

Key Characteristics

  • Input: Grayscale images of shape 32 × 32 × 1 (height × width × channels).
  • Parameters: Approximately 60,000.
  • Activation functions (original): Sigmoid and tanh.
  • Activation functions (modern implementations): Typically ReLU.
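The "approximately 60,000 parameters" figure can be checked by hand. The sketch below tallies the parameters of the common modern variant of LeNet-5 (full connectivity in the second convolutional layer; the 1998 original used partial connectivity and trainable pooling coefficients, so its exact count differs slightly).

```python
# Parameter count for a LeNet-5-style network (modern variant, an assumption;
# the 1998 original's count differs slightly due to partial connectivity).

def conv_params(filters, kernel, in_channels):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(in_units, out_units):
    """Fully connected layer: weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

total = (
    conv_params(6, 5, 1)        # C1: 6 filters of 5x5x1   -> 156
    + conv_params(16, 5, 6)     # C3: 16 filters of 5x5x6  -> 2,416
    + dense_params(400, 120)    # C5/FC1: 5*5*16 = 400 in  -> 48,120
    + dense_params(120, 84)     # F6/FC2                   -> 10,164
    + dense_params(84, 10)      # Output: 10 digit classes -> 850
)
print(total)  # 61,706 -- "approximately 60,000"
```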

Architectural Layout

The architecture follows a pattern that has become a standard in many contemporary CNN designs:

  1. Convolutional Layer: Extracts features from the input image.
  2. Pooling Layer: Reduces the spatial dimensions (height and width) while retaining important features.
  3. Second Convolutional Layer: Further feature extraction with increased channel depth.
  4. Second Pooling Layer: Additional reduction in spatial dimensions.
  5. Fully Connected Layer: Processes the high-level features extracted by previous layers.
  6. Second Fully Connected Layer: Continues high-level processing.
  7. Output Layer: The original paper used Euclidean radial basis function (RBF) units; modern adaptations typically apply softmax for multi-class classification.
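The layer sequence above can be traced numerically. This sketch propagates activation shapes through the network, assuming 5 × 5 "valid" convolutions with stride 1 and 2 × 2 pooling with stride 2, as in common LeNet-5 descriptions.

```python
# Tracing activation shapes through a LeNet-5-style layer stack
# (assumed: 5x5 valid convolutions, stride 1; 2x2 pooling, stride 2).

def conv2d_shape(h, w, c, filters, kernel=5, stride=1):
    """'Valid' convolution: spatial size shrinks by kernel-1; depth becomes `filters`."""
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1, filters)

def pool_shape(h, w, c, size=2, stride=2):
    """Pooling halves height and width; the channel count is unchanged."""
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

shape = (32, 32, 1)                       # input image
shape = conv2d_shape(*shape, filters=6)   # -> (28, 28, 6)
shape = pool_shape(*shape)                # -> (14, 14, 6)
shape = conv2d_shape(*shape, filters=16)  # -> (10, 10, 16)
shape = pool_shape(*shape)                # -> (5, 5, 16)
flat = shape[0] * shape[1] * shape[2]     # -> 400, fed into FC(120) -> FC(84) -> 10
print(shape, flat)  # (5, 5, 16) 400
```

Note how height and width shrink at every stage while depth grows, which is exactly the dimensional pattern discussed below.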

LeNet-5 Architecture

In general, we see:

Conv ➡️ Pool ➡️ Conv ➡️ Pool ➡️ Fully Connected ➡️ Fully Connected ➡️ Output

Dimensional Changes

A notable aspect of LeNet-5 is how the spatial dimensions (height n_H and width n_W) of the image decrease, while the number of channels (depth n_c) increases as the data passes through the network. This pattern reflects a common strategy in CNN design, where initial layers capture basic features (like edges) and subsequent layers capture increasingly complex features.
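The shrinkage of the spatial dimensions follows the standard output-size formula for a convolution with filter size f, padding p, and stride s:

```latex
n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p - f}{s} \right\rfloor + 1
```

For LeNet-5's first layer, with n_H = 32, f = 5, p = 0, and s = 1, this gives ⌊(32 − 5)/1⌋ + 1 = 28, matching the 32 ➡️ 28 reduction in the architecture above; the same formula with f = 2, s = 2 accounts for each pooling layer halving the spatial size.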

Historical Context

  • Publication: "Gradient-based learning applied to document recognition" by LeCun et al., 1998.
  • Impact: Pioneered the use of CNNs for practical applications and influenced numerous subsequent developments in deep learning.

LeNet-5's design principles continue to inspire and inform the structure of many neural networks in the field of image recognition and computer vision.