
Practical Advice for Using ConvNets

Using Open-Source Implementations

Replicating neural network (NN) and ConvNet architectures from academic papers is intricate, and it's often hard to capture all the nuanced details, such as learning rate decay and hyperparameter tuning. Hence, using open-source code, commonly shared by researchers on GitHub, is a pragmatic approach. If you find a research paper you want to build on, the first thing you should do is look for an open-source implementation of that paper.

Advantages of Open-Source Implementations:

  • Access to pre-trained models along with their weights, which could have been the product of extensive training on powerful GPUs over several weeks.
  • Ability to quickly leverage the computational effort and insights of the original authors.

Transfer Learning

Transfer learning is a powerful tool where you utilize pre-trained weights instead of starting from scratch. This method is particularly beneficial when dealing with limited data.

The pretrained models might have been trained on large datasets like ImageNet, MS COCO, or PASCAL VOC, and a lot of time may have gone into learning those parameters/weights and optimizing the hyperparameters.

Strategies for Transfer Learning:

  • Initial Fine-tuning: Download an established NN architecture with pre-trained weights. Replace the final softmax layer with a new layer suited to your specific task. Initially, keep the pre-existing weights frozen and only train the new layer (see the sketch after this list).

  • Intermediate Representations: Pre-run your dataset through the network up to a certain layer and save these intermediate representations. Then, train a simpler model on this transformed dataset, which can expedite the training process.

  • Advanced Fine-tuning: If you possess a substantial dataset, consider selectively unfreezing layers from the original network and retraining them on your new data, or even replacing them entirely with custom layers.

  • Full Network Tuning: With ample data, you can fine-tune all layers of the pre-trained network. Start with the pre-trained weights as your initial condition and continue training to adapt to your specific problem.
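As a concrete illustration of the initial fine-tuning strategy, here is a minimal sketch assuming PyTorch/torchvision (the framework choice and the exact `weights` argument depend on your setup and torchvision version): load a pretrained ResNet-50, freeze its weights, and swap the final layer for one sized to your task.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical: the number of classes in your own task

# Download an established architecture with pretrained ImageNet weights.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the pre-existing weights so they are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the final (softmax) layer with a new one for our own classes.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new layer's parameters are given to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # applies softmax internally
```

With more data, the advanced and full-tuning strategies amount to unfreezing some or all of the earlier layers and continuing training, typically with a smaller learning rate.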

Data Augmentation

Augmenting your dataset can significantly enhance the performance of your ConvNet, especially in computer vision tasks.

Common Data Augmentation Techniques:

  • Mirroring
  • Random cropping
    • The issue with this technique is that a random crop might miss the object of interest.
    • The solution is to make your crops big enough.
  • Rotation
  • Shearing
  • Local warping
  • Translations
  • Adjusting brightness/contrast
  • Color shifting
    • For example, we add small distortions to the R, G, and B channels, so the image still looks the same to a human but is numerically different to the computer.
    • In practice, the added values are drawn from some probability distribution and the shifts are small. This makes your algorithm more robust to color changes in images.
    • There is an algorithm called PCA color augmentation that decides the needed shifts automatically (see the augmentation sketch after this list).
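Below is a minimal sketch of how several of the techniques above (random cropping, mirroring, rotation, brightness/contrast and color shifting) are commonly combined, assuming torchvision transforms; the magnitudes are illustrative assumptions, not tuned hyperparameters.

```python
from torchvision import transforms

# Illustrative training-time augmentation pipeline; the magnitudes below
# are assumptions, not tuned hyperparameters.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),  # random crops, kept big enough
    transforms.RandomHorizontalFlip(p=0.5),               # mirroring
    transforms.RandomRotation(degrees=10),                # small rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.02),     # brightness/contrast and color shifting
    transforms.ToTensor(),
])
```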

Implementation Tips:

  • Perform data augmentation in separate CPU threads or processes so that batches are prepared while the model is training (see the data-loading sketch after these tips).
  • Start with established data augmentation libraries and adjust hyperparameters as necessary.
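A minimal sketch of the first tip, assuming PyTorch's DataLoader: worker processes run the augmentation pipeline so batches are ready while the GPU trains. The dataset path is hypothetical, and `train_transform` is the pipeline from the previous sketch.

```python
from torch.utils.data import DataLoader
from torchvision import datasets

# Hypothetical dataset location; ImageFolder expects one subdirectory per class.
train_set = datasets.ImageFolder("data/train", transform=train_transform)

# num_workers > 0 prepares and augments batches in separate CPU processes
# while the model is training on the GPU.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True,
                          num_workers=4, pin_memory=True)
```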

The State of Computer Vision

The amount of data available for a problem significantly influences algorithm design and the extent of hand engineering.

Guidelines for Different Data Volumes:

  • Large Data: Simpler algorithms and minimal hand engineering.
  • Moderate Data: Moderate complexity and some hand engineering.
  • Small Data: More complex architectures and significant hand engineering.

When we say "hand engineering" we essentially mean "hacks", like choosing a more complex NN architecture. Because many computer vision problems don't have much data, the field still relies heavily on hand engineering. We will see this in object detection: since there is less data, more complex NN architectures will be presented.

You will also see in papers that people do things that help them do well on a benchmark but that you wouldn't really use in a production system.

Benchmarks and Competitions:

Some tips for doing well on benchmarks/winning competitions:

  • Ensembling: Combine outputs from multiple models. It can improve accuracy but is computationally expensive and impractical for real-time applications.
    • Train several networks independently and average their outputs (see the ensembling sketch at the end of this section).
    • After you decide on the best architecture for your problem, initialize several copies of it randomly and train them independently.
    • This can give you a boost of around 2%.
    • But it will slow down inference in production by a factor equal to the ensemble size, and it takes more memory, since all the models must be kept in memory.
    • People use this in competitions, but few use it in real production systems.
  • Multi-crop Testing: Apply your model to multiple crops of the input data and average the results for a slight performance boost.
    • Run the classifier on multiple versions of the test images and average the results; the 10-crop technique is a common example (see the sketch at the end of this section).
    • This can give you a slightly better result in production, at the cost of extra computation at test time.
  • Use open-source code
    • Use architectures of networks published in the literature.
    • Use open-source implementations if possible.
    • Use pretrained models and fine-tune on your dataset.
    • Pretrained models might have been trained for weeks, so reusing them can save a lot of money and computation.
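A minimal sketch of the ensembling tip, assuming the independently trained models are PyTorch modules; the model names are placeholders.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several independently trained models."""
    probs = [torch.softmax(m(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)

# Usage sketch (model_a, model_b, model_c are placeholders for trained networks):
# predictions = ensemble_predict([model_a, model_b, model_c], batch)
```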
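And a minimal sketch of multi-crop (10-crop) testing, assuming torchvision's TenCrop transform: four corner crops plus a center crop, each with its horizontal mirror, and the classifier's outputs averaged over the ten crops. The sizes are illustrative.

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

# 10 crops: four corners + center, plus their horizontal mirrors.
ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
])

@torch.no_grad()
def predict_ten_crop(model, pil_image):
    crops = ten_crop(pil_image)                 # shape: (10, C, H, W)
    probs = torch.softmax(model(crops), dim=1)  # run the classifier on every crop
    return probs.mean(dim=0)                    # average the ten predictions
```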