Landmark Detection

In some computer vision problems, the network needs to output a set of points. This is called landmark detection. For example, in a face recognition problem you might want points on the face such as the corners of the eyes, the corners of the mouth, the corners of the nose, and so on. These points help in many applications, such as detecting the pose of the face.

Applications of Landmark Detection

Facial Feature Localization: Identifying specific points on a face, such as the corners of the eyes, mouth, and nose. This is essential for facial recognition systems and for applications like augmented reality (AR) filters, which rely on precise facial feature detection to apply effects correctly.

Woman's Face with Landmarks

The shape of $Y$ for a face recognition problem that outputs 64 landmarks, giving $1 + 2 \times 64 = 129$ entries in total:

Y = [
    ThereIsAFace,  # probability that a face is present (0 or 1 in the labels)
    l1x,
    l1y,
    ...,
    l64x,
    l64y
]

You can then have the neural network tell you where all the key positions, or landmarks, are on a face. Augmented reality (AR) filters such as Snapchat filters use this as a key building block. Another application is extracting the skeleton of a person from landmarks placed on the body. In all cases the labels must be consistent across examples: if $l_{1x}$, $l_{1y}$ is the left corner of the left eye in one labeled example, then $l_{1x}$, $l_{1y}$ must be the left corner of the left eye in every other example.
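The label layout described above can be sketched in Python. This is a minimal illustration, not code from the course: the function name `encode_face_label` and the choice to zero-fill the coordinates when no face is present are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

NUM_LANDMARKS = 64  # number of face landmarks used in the text


def encode_face_label(
    landmarks: Optional[Sequence[Tuple[float, float]]]
) -> List[float]:
    """Flatten an annotation into [face_present, l1x, l1y, ..., l64x, l64y].

    `landmarks` must list the points in the same fixed order for every
    example (for instance, index 0 is always the left corner of the left eye).
    Pass None when the image contains no face.
    """
    if landmarks is None:
        # Assumption: with no face, the coordinates are placeholders (zeros).
        return [0.0] * (1 + 2 * NUM_LANDMARKS)
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(
            f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}"
        )
    label = [1.0]  # a face is present
    for x, y in landmarks:
        label.extend([float(x), float(y)])
    return label
```

Because the position in the vector is the only thing that identifies a landmark, the annotation order has to be fixed once and reused for the whole dataset.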

Pose Estimation: Detecting the position and orientation of body parts, like the skeleton of a person, which can be used in sports analytics, gaming, and AR.

You can define a few key positions, such as the midpoint of the chest, the left shoulder, the left elbow, and the wrist, and have a neural network output all of those points to describe the person's pose.
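As a rough sketch of how such an output might be read back, the snippet below maps a flat vector of coordinates to named key positions. The keypoint names, their order, and the function `decode_pose` are hypothetical choices for illustration; a real model would define its own list.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical keypoint order, based on the positions named in the text.
POSE_KEYPOINTS: List[str] = [
    "chest_midpoint",
    "left_shoulder",
    "left_elbow",
    "left_wrist",
]


def decode_pose(output: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Turn [x1, y1, x2, y2, ...] into a {keypoint_name: (x, y)} mapping."""
    expected = 2 * len(POSE_KEYPOINTS)
    if len(output) != expected:
        raise ValueError(f"expected {expected} values, got {len(output)}")
    return {
        name: (output[2 * i], output[2 * i + 1])
        for i, name in enumerate(POSE_KEYPOINTS)
    }


# Example with made-up coordinates, normalized to the image size.
pose = decode_pose([0.5, 0.4, 0.3, 0.35, 0.25, 0.5, 0.2, 0.65])
print(pose["left_elbow"])
```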

Pose Detection

Object Localization Sliding Windows