AlexNet

AlexNet, named after its first author Alex Krizhevsky and co-authored with Ilya Sutskever and Geoffrey Hinton, was a significant advance in deep learning for large-scale image classification. It was designed for the ImageNet challenge, which requires classifying images into 1000 classes.

Key Attributes

Input Size: 227 x 227 x 3 images.
Number of Parameters: Approximately 60 million.
Activation Function: ReLU (Rectified Linear Unit).
Output: Classification into 1000 classes.
Key Innovations: Use of multiple GPUs and Local Response Normalization (LRN).
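
The two per-activation operations named above can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the LRN constants (k=2, n=5, alpha=1e-4, beta=0.75) are the values reported in the 2012 paper rather than anything stated in these notes, and the normalization is shown for a single spatial position instead of a full feature map.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, clamp negatives to 0."""
    return max(0.0, x)


def local_response_norm(channels, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize one spatial position across neighboring channels.

    `channels` holds the activation of every channel at a single (x, y)
    location. Each value is divided by a term that grows with the squared
    activations of up to `n` adjacent channels. Default constants are
    assumed from the 2012 paper.
    """
    last = len(channels) - 1
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)
        hi = min(last, i + half)
        sq_sum = sum(channels[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


# Made-up pre-activation values for six channels at one pixel.
activations = [relu(v) for v in [-3.0, 0.5, 12.0, 40.0, -1.0, 2.0]]
normalized = local_response_norm(activations)
```

Because k is at least 2, the denominator is always greater than 1, so every normalized value is smaller in magnitude than its input; channels next to a very large activation are damped the most.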

Architectural Design

AlexNet stacks five convolutional layers followed by three fully connected layers, a substantial scale-up from previous models like LeNet-5.

First Layer: 96 filters of size 11x11 with a stride of 4, reducing the spatial dimensions from 227x227 to 55x55 (output volume 55x55x96).
Max Pooling: 3x3 filter with a stride of 2, reducing the volume to 27x27x96.
Second Convolution: 5x5 filter with SAME padding, maintaining the size at 27x27, but increasing the depth to 256.
Max Pooling: Again reduces the dimensions, this time to 13x13x256.
Additional Convolution Layers: Three 3x3 convolutions with SAME padding, maintaining the size at 13x13 while changing the depth (384, 384, then 256).
Final Max Pooling: Leads to a dimension of 6x6x256.
Fully Connected Layers: The 6x6x256 volume is flattened to 9216 inputs and connected to 4096 nodes, followed by a second fully connected layer of 4096 nodes.
Output Layer: A softmax layer for classification into 1000 classes.
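
The spatial sizes listed above all follow from the standard output-size formula, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below walks through the layers to check the arithmetic. The helper names are hypothetical, and the depths of the three 3x3 convolutions (384, 384, 256) are taken from the 2012 paper as an assumption.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves size for an odd kernel at stride 1."""
    return (kernel - 1) // 2


def alexnet_shapes(input_size=227):
    """Return (layer name, spatial size, depth) for each stage."""
    shapes = []
    s = conv_out(input_size, 11, stride=4)        # 227 -> 55
    shapes.append(("conv1", s, 96))
    s = conv_out(s, 3, stride=2)                  # 55 -> 27
    shapes.append(("pool1", s, 96))
    s = conv_out(s, 5, padding=same_padding(5))   # stays 27
    shapes.append(("conv2", s, 256))
    s = conv_out(s, 3, stride=2)                  # 27 -> 13
    shapes.append(("pool2", s, 256))
    # Depths below are assumed from the original paper.
    for name, depth in [("conv3", 384), ("conv4", 384), ("conv5", 256)]:
        s = conv_out(s, 3, padding=same_padding(3))  # stays 13
        shapes.append((name, s, depth))
    s = conv_out(s, 3, stride=2)                  # 13 -> 6
    shapes.append(("pool3", s, 256))
    return shapes


shapes = alexnet_shapes()
_, final_size, final_depth = shapes[-1]
flattened = final_size * final_size * final_depth  # inputs to the first FC layer

for name, size, depth in shapes:
    print(f"{name}: {size}x{size}x{depth}")
print("flattened:", flattened)
```

The final 6x6x256 volume flattens to 9216 values, which is where the input width of the first fully connected layer comes from.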

Figure: AlexNet architecture.

Significance

Comparative Size: AlexNet's parameter count was massively higher than LeNet-5's (60 million vs. 60 thousand), signifying a leap in network complexity and capacity.
Hardware Utilization: Training was split across two GPUs, which was a necessity given the limited memory and speed of GPUs at the time.
Local Response Normalization: While initially thought to be beneficial, later research suggested that LRN does not significantly contribute to model performance.
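
The "approximately 60 million" figure can be checked by counting weights and biases layer by layer. The sketch below is a rough tally under stated assumptions: the filter counts (96, 256, 384, 384, 256) and the two-way channel grouping on the second, fourth, and fifth convolutions (a consequence of the two-GPU split) are taken from the 2012 paper, and the helper names are hypothetical. The LeNet-5 figure is simply the 60 thousand quoted above, not recomputed.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Weights plus biases for a conv layer; each filter sees in_ch / groups channels."""
    return kernel * kernel * (in_ch // groups) * out_ch + out_ch


def fc_params(in_features, out_features):
    """Weights plus biases for a fully connected layer."""
    return in_features * out_features + out_features


# (input channels, output channels, kernel size, groups); assumed from the paper.
CONV_LAYERS = [
    (3, 96, 11, 1),
    (96, 256, 5, 2),
    (256, 384, 3, 1),
    (384, 384, 3, 2),
    (384, 256, 3, 2),
]
FC_LAYERS = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]


def alexnet_param_count(use_groups=True):
    """Total parameters, with or without the two-GPU channel grouping."""
    total = 0
    for in_ch, out_ch, kernel, groups in CONV_LAYERS:
        total += conv_params(in_ch, out_ch, kernel, groups if use_groups else 1)
    for in_features, out_features in FC_LAYERS:
        total += fc_params(in_features, out_features)
    return total


grouped = alexnet_param_count(use_groups=True)
ungrouped = alexnet_param_count(use_groups=False)
fc_total = sum(fc_params(i, o) for i, o in FC_LAYERS)

LENET5_PARAMS = 60_000  # rounded figure quoted in the text
print(f"grouped: {grouped:,}  ungrouped: {ungrouped:,}")
print(f"fully connected share: {fc_total / grouped:.1%}")
print(f"ratio to LeNet-5: about {grouped / LENET5_PARAMS:.0f}x")
```

Under these assumptions the grouped count is 60,965,224 and the ungrouped count is 62,378,344, both consistent with "approximately 60 million". Well over 90% of the parameters sit in the three fully connected layers, so the convolutional stack accounts for only a small part of the thousandfold gap to LeNet-5.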

Impact on Computer Vision

AlexNet's success in the ImageNet challenge was an important moment in deep learning. It demonstrated the potential of deep neural networks in computer vision, particularly for large-scale image classification tasks, and played a crucial role in convincing the research community about the importance of deep learning.

Krizhevsky et al., 2012. "ImageNet classification with deep convolutional neural networks"

The advent of AlexNet marked a turning point in neural network design, shifting the focus towards deeper and more complex architectures, and paved the way for subsequent advancements in the field.
