Region Proposals (R-CNN)

Region-based Convolutional Neural Networks (R-CNNs) are a class of algorithms pivotal in the field of object detection. They are known for their accuracy, although often at the cost of speed, especially when compared to models like YOLO (You Only Look Once).

YOLO vs. R-CNN

YOLO is known for its speed: it analyzes the whole image in a single network evaluation. The YOLO authors describe the difference as follows:

"Our model has several advantages over classifier-based systems. It looks at the whole image at test time so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation unlike systems like R-CNN which require thousands for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. See our paper for more details on the full system."

R-CNNs instead process a selected set of candidate regions, which tends to give higher accuracy but slower inference. YOLO's advantage is that it takes in the global context of the whole image in one fast pass.

R-CNN uses a segmentation algorithm to pick candidate windows. The output looks something like this:

[Figure: R-CNN segmentation output]

It first runs a segmentation algorithm on the image to find blobs that could be objects, then runs the classifier only on those blobs. For example, if the segmentation algorithm produces 2000 blobs, the classifier/CNN runs on those 2000 regions, which is far fewer than the number of positions an exhaustive sliding-window search would have to evaluate.
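The idea of turning blobs into candidate windows can be sketched with a toy example. This is only an illustration under simplifying assumptions: it uses plain connected-component labeling on a hand-made binary mask as a stand-in for the real segmentation step, and the names `propose_regions` and `toy_mask` are hypothetical.

```python
from collections import deque


def propose_regions(mask):
    """Return one bounding box (top, left, bottom, right) per connected blob.

    `mask` is a list of rows holding 0/1 values. Box coordinates are inclusive.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x5 "segmentation" with three separate blobs.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
proposals = propose_regions(toy_mask)
print(proposals)  # [(0, 0, 1, 1), (1, 4, 2, 4), (3, 1, 3, 2)]
```

Only these three boxes would be passed to the classifier, instead of every window position in the image.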

The Evolution of R-CNNs

R-CNNs have evolved over time, with each iteration aiming to improve speed and efficiency:

R-CNN: Proposes regions and classifies each region individually. While accurate, its main drawback is its slow speed.
- Outputs label and bounding box
- Reference: [Girshick et al., 2013. "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation"]
Fast R-CNN: Improves on R-CNN by using a convolutional implementation of sliding windows to classify all proposed regions simultaneously.
- Reference: [Girshick, 2015. "Fast R-CNN"]
Faster R-CNN: Integrates a region proposal network (RPN) that uses a convolutional network to propose regions, enhancing speed and efficiency.
- Reference: [Ren et al., 2016. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"]
Mask R-CNN: An extension of Faster R-CNN that adds support for pixel-level segmentation, making it highly effective for tasks like instance segmentation.
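One way to see why Fast R-CNN is faster than R-CNN is that the expensive convolutional features are computed once for the whole image, and each proposed region then reads its features from that shared map. The Fast R-CNN paper does this with an RoI pooling layer. The following is a minimal pure-Python sketch of that idea, not the paper's implementation: the function name `roi_max_pool`, the bin-splitting rule, and the tiny feature map are all assumptions made for illustration.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the part of `feature_map` inside `roi` to out_size x out_size.

    `feature_map` is a list of rows. `roi` is (top, left, bottom, right) in
    feature-map cells, with bottom and right exclusive. The region is assumed
    to be non-empty and to lie inside the map.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(out_size):
        # Split the region into roughly equal bins; every bin keeps >= 1 cell.
        y0 = top + (i * height) // out_size
        y1 = max(top + ((i + 1) * height) // out_size, y0 + 1)
        row = []
        for j in range(out_size):
            x0 = left + (j * width) // out_size
            x1 = max(left + ((j + 1) * width) // out_size, x0 + 1)
            row.append(max(feature_map[y][x]
                           for y in range(y0, y1)
                           for x in range(x0, x1)))
        pooled.append(row)
    return pooled


# Made-up 4x4 feature map, standing in for one channel of a CNN's output.
shared_features = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Two regions of different sizes both reuse the same feature map and both
# come out as fixed-size 2x2 grids that a classifier head could consume.
small = roi_max_pool(shared_features, (0, 0, 2, 2))
large = roi_max_pool(shared_features, (0, 1, 3, 4))
print(small)  # [[1, 2], [5, 6]]
print(large)  # [[2, 4], [10, 12]]
```

Because every region is reduced to the same output size, all regions can be batched through the classifier together rather than one at a time.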

Despite these advancements, most Faster R-CNN implementations are still slower than YOLO.

Andrew Ng also thinks the idea behind YOLO is better than R-CNN, because it does everything in a single step instead of two separate steps (first proposing regions, then classifying them).
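The structural difference between the two approaches can be sketched with stub functions. Everything here is hypothetical: `fake_propose`, `fake_classify`, and `fake_network` are placeholders that return fixed values and only count how often they are called. The two-stage version mirrors the original R-CNN pattern of classifying each region separately.

```python
def detect_two_stage(image, propose, classify):
    """Stage 1 proposes regions; stage 2 classifies each region separately."""
    return [(region, classify(image, region)) for region in propose(image)]


def detect_single_pass(image, network):
    """One network evaluation returns every (box, label) pair at once."""
    return network(image)


calls = {"classify": 0, "network": 0}


def fake_propose(image):
    # Placeholder proposals as (top, left, bottom, right) boxes.
    return [(0, 0, 10, 10), (5, 5, 20, 20), (8, 2, 16, 12)]


def fake_classify(image, region):
    calls["classify"] += 1
    return "object"


def fake_network(image):
    calls["network"] += 1
    return [(region, "object") for region in fake_propose(image)]


two_stage = detect_two_stage(None, fake_propose, fake_classify)
single_pass = detect_single_pass(None, fake_network)
print(calls)  # {'classify': 3, 'network': 1}
```

Both functions return the same detections here, but the two-stage version makes one classifier call per proposal while the single-pass version makes one call in total.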

One-Shot Detection Alternatives

Other algorithms have been developed to balance speed and accuracy, either by detecting in a single shot like YOLO or by making region-based detection cheaper:

SSD (Single Shot MultiBox Detector): Proposes a method for detecting objects in images using a single deep neural network.
- Reference: [Wei Liu et al., 2015. "SSD: Single Shot MultiBox Detector"]
R-FCN (Region-based Fully Convolutional Networks): A region-based, two-stage detector like Faster R-CNN rather than a single-shot one, but more efficient because almost all of its computation is fully convolutional and shared across the whole image.
- Reference: [Jifeng Dai et al., 2016. "R-FCN: Object Detection via Region-based Fully Convolutional Networks"]
