Object Localization
Object localization is a critical step in computer vision that bridges the gap between image classification and more complex tasks like object detection and segmentation.
Image Classification
In a basic image classification task, the goal is to categorize an entire image as belonging to one of the predefined classes. Typically, there's a single, central object of interest in the image.
Classification with Localization
Classification with localization not only categorizes the image but also identifies the location of the object within the image using a bounding box. This is generally applied when a single object's position within the image is also of interest.
Object Detection
Object detection extends localization to multiple objects. The task involves detecting all objects of certain classes and their locations within an image. This is crucial for complex scenes with multiple objects, such as in autonomous driving systems.
In an autonomous driving application, for example, you might need to detect not just other cars but also pedestrians, motorcycles, and other objects.
Semantic Segmentation
Semantic segmentation performs classification at the pixel level, labeling each pixel of the image with a category. Unlike object detection, it does not differentiate between distinct objects of the same class.
Instance Segmentation
Instance segmentation combines the fine-grained pixel-level classification of semantic segmentation with object differentiation. It not only labels each pixel but also distinguishes between different instances of the same class.
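The difference between the two segmentation tasks can be illustrated with a toy mask. This is only a sketch: the tiny 2x6 "image", the label values, and the helper `count_instances` are hypothetical and not part of the original notes.

```python
# Toy 2x6 image containing two adjacent cars on a background.
# Semantic mask: every car pixel gets the same class label (1 = car, 0 = background).
semantic_mask = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]

# Instance mask: each car gets its own id (1 = first car, 2 = second car).
instance_mask = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
]


def count_instances(mask):
    """Count distinct non-background ids in a mask."""
    return len({value for row in mask for value in row if value != 0})


# The semantic mask only reveals one class of object; the instance mask
# additionally reveals that there are two separate cars.
print(count_instances(semantic_mask), count_instances(instance_mask))
```

Note that counting ids in the semantic mask gives the number of classes present, not the number of objects, which is exactly the information instance segmentation adds.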
Mechanism of Localization in ConvNets
For classification with localization, a ConvNet is used with a Softmax layer for class prediction and four additional outputs that specify the bounding box (bx, by, bh, bw).
The training set must therefore include these four numbers alongside the class label for each image. By convention, the upper-left corner of the image is the coordinate (0,0) and the lower-right corner is (1,1).
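As a rough illustration of this coordinate convention, the sketch below converts a box given by its pixel corners into normalized values. The function name `to_normalized_box`, its parameters, and the sample image size are assumptions made for this example; the notes do not prescribe how the annotations are stored.

```python
def to_normalized_box(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to (bx, by, bh, bw).

    Assumes the pixel origin is the upper-left corner, so that after
    dividing by the image size the upper left is (0, 0) and the lower
    right is (1, 1).
    """
    bx = (x_min + x_max) / 2 / img_w  # box center, fraction of image width
    by = (y_min + y_max) / 2 / img_h  # box center, fraction of image height
    bh = (y_max - y_min) / img_h      # box height, fraction of image height
    bw = (x_max - x_min) / img_w      # box width, fraction of image width
    return bx, by, bh, bw


# Hypothetical 400x500 pixel image with a box from (100, 200) to (300, 400).
bx, by, bh, bw = to_normalized_box(100, 200, 300, 400, img_w=400, img_h=500)
print(bx, by, bh, bw)
```

Normalizing by the image size keeps the targets in the range 0 to 1 regardless of resolution, which is convenient when images are resized before training.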
Target Label in Localization
The target label vector in a classification with localization problem typically contains:
- Pc: Probability that an object is present.
- bx, by: Bounding box center coordinates.
- bh, bw: Bounding box height and width.
- Class probabilities: c1, c2, ...
Written out as a vector, the target label is:
Y = [
Pc # Probability that an object is present, i.e. is there an object?
bx # Bounding box center, x coordinate
by # Bounding box center, y coordinate
bh # Bounding box height
bw # Bounding box width
c1 # Class probabilities
c2
...
]
Example (When an object is present):
Y = [
1 # Pc: object is present
0.5 # bx
0.7 # by
0.3 # bh
0.4 # bw
0 # c1
1 # c2: the object belongs to class 2
0 # c3
]
Example (When no object is present):
Y = [
0 # Pc: no object is present
? # ? means we don't care about other values
?
?
?
?
?
?
]
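The two examples above could be assembled programmatically along the following lines. This is a minimal sketch: the helper `make_target`, the choice of three classes, and the use of `None` for the "?" don't-care entries are assumptions for illustration only.

```python
def make_target(box=None, class_index=None, num_classes=3):
    """Build Y = [Pc, bx, by, bh, bw, c1, ..., cn] as a flat list.

    box         -- (bx, by, bh, bw) in normalized coordinates, or None
                   when the image contains no object
    class_index -- zero-based index of the object's class
    Don't-care entries (the "?" values) are stored as None.
    """
    if box is None:
        # Pc = 0; every other entry is a don't-care.
        return [0] + [None] * (4 + num_classes)
    bx, by, bh, bw = box
    one_hot = [1 if i == class_index else 0 for i in range(num_classes)]
    return [1, bx, by, bh, bw] + one_hot


# Reproduces the "object is present" example (class 2 has index 1).
y_present = make_target(box=(0.5, 0.7, 0.3, 0.4), class_index=1)

# Reproduces the "no object" example.
y_absent = make_target()

print(y_present)
print(y_absent)
```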
Loss Function for Localization
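The loss function for the target vector Y defined above can be written, for example, with squared error: when an object is present (Pc = 1), the loss is the sum of squared differences over every component of Y; when no object is present (Pc = 0), only the Pc component contributes, because the remaining values are don't-cares.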
In practice, different loss components are usually combined: a logistic regression loss for Pc, a log-likelihood loss for the class probabilities, and squared error for the bounding box values.
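Both variants can be sketched in plain Python for a single example. The function names, the `eps` constant, the unweighted sum of the components, and the sample prediction are assumptions for this sketch; the layout `[Pc, bx, by, bh, bw, c1, ...]` follows the target vector defined earlier.

```python
import math


def squared_error_loss(y_hat, y):
    """Squared error over all components if an object is present, else Pc only."""
    if y[0] == 1:
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    # No object: the remaining targets are don't-cares and are ignored.
    return (y_hat[0] - y[0]) ** 2


def mixed_loss(y_hat, y, eps=1e-12):
    """Logistic loss for Pc, squared error for the box, log-likelihood for classes.

    Assumes y_hat[0] and y_hat[5:] are already probabilities in (0, 1).
    The components are simply added; a real system may weight them.
    """
    pc_hat, pc = y_hat[0], y[0]
    # Binary cross-entropy (logistic regression loss) on Pc.
    loss = -(pc * math.log(pc_hat + eps) + (1 - pc) * math.log(1 - pc_hat + eps))
    if pc == 1:
        # Squared error on bx, by, bh, bw.
        loss += sum((p - t) ** 2 for p, t in zip(y_hat[1:5], y[1:5]))
        # Negative log-likelihood of the true class.
        loss -= sum(t * math.log(p + eps) for p, t in zip(y_hat[5:], y[5:]))
    return loss


# Hypothetical network output compared against the "object is present" target.
y_true = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
y_pred = [0.9, 0.5, 0.6, 0.3, 0.4, 0.1, 0.8, 0.1]
print(squared_error_loss(y_pred, y_true))
print(mixed_loss(y_pred, y_true))

# For a "no object" target, only the Pc prediction is penalized.
y_none = [0] + [None] * 7
print(squared_error_loss(y_pred, y_none))
```

Masking the box and class terms when Pc = 0 is what implements the "don't care" entries: the network is not penalized for whatever it outputs there.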