LeNet-5
LeNet-5 is a seminal Convolutional Neural Network (CNN) designed for recognizing handwritten digits in grayscale images. Developed in 1998, this model laid the groundwork for many modern CNN architectures.
Key Characteristics
- Input: Grayscale images of size 32×32×1 (height × width × channels).
- Parameters: Approximately 60,000.
- Activation Functions (Original): Sigmoid and Tanh.
- Activation Functions (Modern Implementations): Mostly ReLU.
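The ~60,000 figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a modern variant with full connectivity between the second convolutional layer and all six preceding feature maps (the 1998 original used a sparse connection table, giving a slightly lower total), 5×5 kernels, and the layer sizes described later in this article:

```python
# Hedged sketch: parameter count for a modern, fully connected LeNet-5 variant.

def conv_params(kernel, in_ch, out_ch):
    """Conv layer: kernel*kernel*in_ch weights per filter, plus one bias each."""
    return (kernel * kernel * in_ch + 1) * out_ch

def fc_params(n_in, n_out):
    """Fully connected layer: weight matrix plus one bias per output unit."""
    return (n_in + 1) * n_out

layers = {
    "C1 conv 5x5, 1->6":  conv_params(5, 1, 6),   # 156
    "C3 conv 5x5, 6->16": conv_params(5, 6, 16),  # 2,416
    "FC1 400->120":       fc_params(400, 120),    # 48,120
    "FC2 120->84":        fc_params(120, 84),     # 10,164
    "Output 84->10":      fc_params(84, 10),      # 850
}

total = sum(layers.values())
print(total)  # 61706 -- roughly the "60,000 parameters" cited above
```

The two fully connected layers dominate the count, which is typical of early CNNs; later architectures shifted most parameters into the convolutional stages.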
Architectural Layout
The architecture follows a pattern that has become a standard in many contemporary CNN designs:
- Convolutional Layer: Extracts features from the input image.
- Pooling Layer: Reduces the spatial dimensions (height and width) while retaining important features.
- Second Convolutional Layer: Further feature extraction with increased channel depth.
- Second Pooling Layer: Additional reduction in spatial dimensions.
- Fully Connected Layer: Processes the high-level features extracted by previous layers.
- Second Fully Connected Layer: Continues high-level processing.
- Output Layer: The original network used Euclidean radial basis function (RBF) units rather than softmax; modern adaptations typically apply softmax for multi-class classification.
In general, we see:
Conv ➡️ Pool ➡️ Conv ➡️ Pool ➡️ Fully Connected ➡️ Fully Connected ➡️ Output
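The chain above can be traced numerically. This minimal sketch assumes 5×5 "valid" convolutions with stride 1 and 2×2 pooling with stride 2 (as in the original network), and prints how the shape evolves:

```python
# Hedged sketch: shape tracing through the Conv -> Pool -> ... chain,
# assuming 5x5 valid convolutions (stride 1) and 2x2 pooling (stride 2).

def conv(h, w, c, kernel=5, filters=6):
    # Valid convolution shrinks each spatial side by (kernel - 1)
    # and sets the channel count to the number of filters.
    return h - kernel + 1, w - kernel + 1, filters

def pool(h, w, c, size=2):
    # Non-overlapping pooling halves height and width; channels unchanged.
    return h // size, w // size, c

shape = (32, 32, 1)                    # input image
shape = conv(*shape, filters=6)        # C1 -> (28, 28, 6)
shape = pool(*shape)                   # S2 -> (14, 14, 6)
shape = conv(*shape, filters=16)       # C3 -> (10, 10, 16)
shape = pool(*shape)                   # S4 -> (5, 5, 16)
flat = shape[0] * shape[1] * shape[2]  # flattened to 400 units for the FC layers
print(shape, flat)  # (5, 5, 16) 400
```

Note how height and width shrink (32 → 28 → 14 → 10 → 5) while the channel count grows (1 → 6 → 16), which is the dimensional pattern discussed next.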
Dimensional Changes
A notable aspect of LeNet-5 is how the spatial dimensions (height and width) of the image decrease, while the number of channels (depth) increases as the data passes through the network. This pattern reflects a common strategy in CNN design: initial layers capture basic, localized features (like edges), and subsequent layers combine them into increasingly complex features spread across more channels.
Historical Context
- Publication: "Gradient-based learning applied to document recognition" by LeCun et al., 1998.
- Impact: Pioneered the use of CNNs for practical applications and influenced numerous subsequent developments in deep learning.
LeNet-5's design principles continue to inspire and inform the structure of many neural networks in the field of image recognition and computer vision.