EfficientNet

EfficientNet is a principled approach to scaling convolutional neural networks (CNNs) efficiently to fit a variety of computational constraints. It is particularly useful for adapting computer vision models to devices with varying computational capabilities.

Scaling Neural Networks with EfficientNet

EfficientNet provides a methodical way to scale CNNs up or down based on the available resources. The authors of the EfficientNet paper, Mingxing Tan and Quoc Le, observed that there are three dimensions along which a network can be scaled (illustrated in the sketch after this list):

  • Resolution Scaling (r): Adjusting the input image resolution.
  • Depth Scaling (d): Altering the depth, or the number of layers, in the network.
  • Width Scaling (w): Modifying the width, or the number of channels in each layer.
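
As a rough illustration, the following sketch shows how the three knobs act on a network configuration. The baseline numbers are hypothetical and chosen only for the example, not taken from the paper:

    # Hypothetical baseline configuration, for illustration only.
    baseline = {"resolution": 224, "depth": 18, "width": 64}

    def scale(config, r=1.0, d=1.0, w=1.0):
        """Scale input resolution, layer count, and channel count by r, d, w."""
        return {
            "resolution": round(config["resolution"] * r),
            "depth": round(config["depth"] * d),
            "width": round(config["width"] * w),
        }

    print(scale(baseline, r=1.15, d=1.2, w=1.1))
    # -> {'resolution': 258, 'depth': 22, 'width': 70}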

[Figure: EfficientNet Scaling]

The central question EfficientNet addresses is determining the optimal balance among these three dimensions of scaling (r, d, and w) given a particular computational budget.

Compound Scaling Strategy

EfficientNet introduces compound scaling, a technique for simultaneously scaling the network's depth, width, and resolution in a balanced way (formalized in the sketch after this list):

  • Balanced Scaling: Rather than scaling one dimension in isolation, compound scaling increases each dimension to a degree that optimally utilizes the computational budget.
  • Optimizing Performance: The model achieves higher accuracy when scaled appropriately, without unnecessarily increasing computational load.
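
Concretely, the paper ties all three dimensions to a single compound coefficient phi: depth is multiplied by alpha^phi, width by beta^phi, and resolution by gamma^phi, where the constants alpha, beta, and gamma are found once by a small grid search on the baseline network under the constraint alpha * beta^2 * gamma^2 ≈ 2, so that total FLOPs grow by roughly 2^phi. A minimal sketch, using the constants reported in the paper:

    # Compound scaling: one coefficient phi drives all three dimensions.
    # The constants below are the values reported in the EfficientNet paper.
    ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

    def compound_scale(phi):
        d = ALPHA ** phi    # depth multiplier
        w = BETA ** phi     # width (channel) multiplier
        r = GAMMA ** phi    # resolution multiplier
        return d, w, r

    # Since alpha * beta**2 * gamma**2 is roughly 2, FLOPs grow by about 2**phi.
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")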

Practical Application of EfficientNet

EfficientNet is ideal for deploying CNNs across various devices (a loading example follows this list):

  • Mobile Phones: Tailoring the network to the specific processing abilities of different smartphone models.
  • Edge Devices: Customizing the network size for devices with limited computational power.
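
In practice, open-source implementations ship the pre-scaled variants B0 through B7, so choosing a model for a device often reduces to picking the largest variant that fits its budget. Here is a minimal sketch using torchvision's implementation (one such open-source option; it assumes torchvision 0.13 or later with pretrained weights available):

    import torch
    from torchvision.models import efficientnet_b0  # b1..b7 scale this up

    # EfficientNet-B0: the smallest variant (~5.3M parameters, 224px input),
    # a reasonable starting point for phones and edge devices.
    model = efficientnet_b0(weights="IMAGENET1K_V1")
    model.eval()

    with torch.no_grad():
        logits = model(torch.randn(1, 3, 224, 224))  # dummy image batch
    print(logits.shape)  # torch.Size([1, 1000])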

Determining the Scaling Coefficients

The question is: given a particular computational budget, what is a good choice of r, d, and w? Should you double the resolution and keep the depth the same? Double the depth and leave the others unchanged? Or increase resolution by 10 percent, depth by 50 percent, and width by 20 percent? In other words, what trade-off between r, d, and w gives the best possible performance within your computational budget? If you ever need to adapt a neural network architecture to a particular device, look at one of the open-source implementations of EfficientNet, which will help you choose a good trade-off between r, d, and w. A rough sketch of the search behind those coefficients follows.
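
To give a flavor of how such coefficients are found, here is a hypothetical sketch of the small grid search the paper describes: enumerate candidate (alpha, beta, gamma) triples whose FLOPs cost stays near the 2x target, then score each by briefly training the scaled baseline (the scoring step is only a placeholder here):

    import itertools

    def flops_cost(a, b, g):
        # Depth scales FLOPs linearly; width and resolution scale them
        # quadratically (channels and pixels both enter each conv twice).
        return a * b**2 * g**2

    candidates = [round(1.0 + 0.05 * i, 2) for i in range(7)]  # 1.00 .. 1.30
    feasible = [
        (a, b, g)
        for a, b, g in itertools.product(candidates, repeat=3)
        if abs(flops_cost(a, b, g) - 2.0) < 0.1  # stay near the 2x FLOPs target
    ]
    # In practice, each feasible triple is scored by training the scaled
    # baseline briefly and keeping the most accurate one.
    print(f"{len(feasible)} candidate (alpha, beta, gamma) triples to evaluate")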

Conclusion

EfficientNet provides a framework for systematic and balanced scaling of CNNs. It enables the fine-tuning of network dimensions to match a device's computational budget, striking an optimal balance between model complexity and performance.

Reference: Tan, M. and Le, Q. (2019). "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." ICML 2019.