Logistic Regression
Logistic regression is a statistical method for predicting binary classes. The output is a binary outcome: a decision between two alternatives, like yes or no, win or lose, cat or no cat. Unlike linear regression, which predicts continuous values, logistic regression predicts discrete outcomes.
Processing Images into Feature Vectors
In image processing, we convert image data from matrices to a feature vector, which is a flattened, one-dimensional array. This feature vector, with its dimension denoted as $n_x$, is what the model uses for training and making predictions.
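As a minimal sketch of this flattening step (assuming a hypothetical 64×64 RGB image, so $n_x = 64 \times 64 \times 3 = 12288$):

```python
import numpy as np

# A hypothetical 64x64 RGB image: pixel intensities in three colour channels.
image = np.random.randint(0, 256, size=(64, 64, 3))

# Flatten the matrices into a single column feature vector of shape (n_x, 1),
# where n_x = 64 * 64 * 3 = 12288.
x = image.reshape(-1, 1)
print(x.shape)  # (12288, 1)
```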
Notation
- $m$: Number of training examples.
- $n_x$: Size of the input feature vector.
- $n_y$: Size of the output vector ($n_y = 1$ for binary classification, where each output is either 0 or 1).
- $x^{(1)}$ and $y^{(1)}$: The first input and output vectors in the dataset.
The input data $X$ and output data $Y$ are represented as matrices whose columns are the individual training examples, which can be visualised as:

$$X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} \in \mathbb{R}^{n_x \times m}, \qquad Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix} \in \mathbb{R}^{1 \times m}$$
Using NumPy for Efficient Operations
Python's NumPy library is commonly used for handling these data structures efficiently. It allows for quick vectorized operations on large matrices, a common requirement in deep learning.
Each training example is a pair $(x, y)$, where $x$ is an $n_x$-dimensional vector (input) and $y$ is the output. To check the dimensions in Python, you can use `X.shape`, which would return the shape as $(n_x, m)$. The output matrix $Y$ is a $1 \times m$ vector, indicating one binary output per training example.
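A small sketch of these shape checks, using hypothetical sizes ($n_x = 12288$, $m = 100$) and random placeholder data:

```python
import numpy as np

n_x, m = 12288, 100  # hypothetical sizes: 12288 features, 100 examples

# Each column X[:, i] is one flattened training example x^(i).
X = np.random.rand(n_x, m)
# Y holds one binary label per training example.
Y = np.random.randint(0, 2, size=(1, m))

print(X.shape)  # (12288, 100)
print(Y.shape)  # (1, 100)
```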
The Basics of Logistic Regression
In linear regression, we might use a simple equation like $\hat{y} = w^T x + b$, but this doesn't work well for binary classification. Instead, when we have a feature vector $x \in \mathbb{R}^{n_x}$, logistic regression predicts a binary outcome using the equation:

$$\hat{y} = \sigma(w^T x + b)$$
Here, $w \in \mathbb{R}^{n_x}$ is an $n_x$-dimensional vector of weights, $b$ is a real-number bias term, and $x$ is our input vector. The goal is to learn the best values for $w$ and $b$ so that $\hat{y}$ is as close as possible to the actual outcome $y$. Essentially, given $x$ we want $\hat{y} = P(y = 1 \mid x)$.
Sigmoid Function for Probabilities
To ensure the output is between 0 and 1, we apply the sigmoid function to $z = w^T x + b$:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid function squashes the output of the linear equation into a range between 0 and 1, making it possible to interpret the result as a probability.
Note that if $z$ is a large positive number, then $e^{-z} \approx 0$, so:

$$\sigma(z) \approx \frac{1}{1 + 0} = 1$$

and if $z$ is a large negative number, then $e^{-z}$ becomes very large, so:

$$\sigma(z) \approx \frac{1}{1 + e^{-z}} \approx 0$$
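A minimal NumPy sketch of the sigmoid and of the vectorized prediction $\hat{y} = \sigma(w^T X + b)$, using hypothetical sizes and zero-initialised parameters:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1 / (1 + np.exp(-z))

print(sigmoid(100))   # ~1.0: e^(-z) vanishes for large positive z
print(sigmoid(-100))  # ~0.0: e^(-z) dominates for large negative z

# Vectorized prediction for all m examples at once.
n_x, m = 12288, 100                  # hypothetical sizes
w = np.zeros((n_x, 1))               # weight vector, shape (n_x, 1)
b = 0.0                              # bias, a scalar
X = np.random.rand(n_x, m)           # input matrix, shape (n_x, m)
y_hat = sigmoid(np.dot(w.T, X) + b)  # predictions, shape (1, m)
```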
Cost Function in Logistic Regression
To train the parameters $w$ and $b$, it is critical to choose the right cost function. The first loss function you might reach for is the mean squared error:

$$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$$

Unlike in linear regression, however, we don't use the mean squared error, because combined with the sigmoid it produces a non-convex surface with multiple local minima. Instead, we use the binary cross-entropy loss function:

$$\mathcal{L}(\hat{y}, y) = -\left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right)$$
This loss function is convex, so it has just one minimum, making it easier to optimise. To see why it works, consider the two possible labels (a numeric sketch follows the list):
For an actual label of $y = 1$:
- The loss simplifies to $\mathcal{L} = -\log \hat{y}$
- To minimise the loss, we want our predicted probability $\hat{y}$ to be as large as possible
- Since $\hat{y}$ is a sigmoid output, its biggest possible value is 1

For an actual label of $y = 0$:
- The loss simplifies to $\mathcal{L} = -\log(1 - \hat{y})$
- To minimise the loss, we want $1 - \hat{y}$ to be as large as possible
- So we want $\hat{y}$ to be as low as possible, approaching 0
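A quick numeric sketch of this behaviour (the `loss` helper below is just for illustration):

```python
import numpy as np

def loss(y_hat, y):
    """Binary cross-entropy loss for a single example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# True label 1: loss is tiny when y_hat is near 1, huge when near 0.
print(loss(0.99, 1))  # ~0.01 (confident and correct)
print(loss(0.01, 1))  # ~4.61 (confident and wrong)

# True label 0: the pattern flips.
print(loss(0.01, 0))  # ~0.01
print(loss(0.99, 0))  # ~4.61
```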
To measure the performance across the entire training set, we use the cost function:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$

The cost function is simply the average of the binary cross-entropy loss over all $m$ training examples.
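A vectorized sketch of this cost, assuming $\hat{y}$ and $Y$ are both stored as $(1, m)$ NumPy arrays:

```python
import numpy as np

def cost(y_hat, Y):
    """Average binary cross-entropy over all m examples.

    y_hat and Y both have shape (1, m)."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) / m

print(cost(np.array([[0.9, 0.2]]), np.array([[1, 0]])))  # ~0.164
```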
Gradient Descent
Gradient descent is a method used in machine learning to find the best parameters for our model, in this case $w$ and $b$, which will minimize our cost function. We start by setting $w$ and $b$ to zero, giving a neutral decision boundary. Zero initialisation is fine here (unlike in neural networks) because the cost function is convex, so gradient descent reaches the same global minimum from any starting point.
With the learning rate $\alpha$, which controls how big a step we take in updating our parameters, we iteratively adjust $w$ and $b$. We do this using the following update rules.

For $w$:

$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$$

And for $b$:

$$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$

We can normally denote this simply as:

$$w := w - \alpha\, dw, \qquad b := b - \alpha\, db$$
Here, the derivatives $dw = \frac{\partial J}{\partial w}$ and $db = \frac{\partial J}{\partial b}$ give us the direction to change $w$ and $b$ to reduce the cost, meaning they tell us how to adjust our parameters to make our model's predictions more accurate.
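Putting it all together, here is a minimal sketch of the full training loop. It assumes the standard closed-form gradients for the binary cross-entropy cost, $dw = \frac{1}{m} X (\hat{y} - y)^T$ and $db = \frac{1}{m} \sum_i (\hat{y}^{(i)} - y^{(i)})$, which aren't derived above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, alpha=0.01, iterations=1000):
    """Gradient descent for logistic regression.

    X: inputs, shape (n_x, m); Y: binary labels, shape (1, m)."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))  # initialise parameters to zero
    b = 0.0
    for _ in range(iterations):
        y_hat = sigmoid(np.dot(w.T, X) + b)  # forward pass, shape (1, m)
        dz = y_hat - Y                       # error term, shape (1, m)
        dw = np.dot(X, dz.T) / m             # dJ/dw, shape (n_x, 1)
        db = np.sum(dz) / m                  # dJ/db, a scalar
        w -= alpha * dw                      # update: w := w - alpha * dw
        b -= alpha * db                      # update: b := b - alpha * db
    return w, b
```

The compact error term $dz = \hat{y} - y$ is what both partial derivatives share once the chain rule is applied through the sigmoid and the cross-entropy together.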