Logistic Regression in Python & NumPy

An implementation of a logistic regression model in Python with NumPy.

Packages

```python
import numpy as np
import copy
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
from public_tests import *
```

Problem Set

```python
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
```
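
Before training, the images typically need to be flattened into column vectors and standardized. The sketch below assumes each example in train_set_x_orig is an RGB image of shape (num_px, num_px, 3); the exact shapes depend on what load_dataset() returns.

```python
# Minimal preprocessing sketch (shapes are an assumption based on the dataset description)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]

# Flatten each image into a column of shape (num_px * num_px * 3, 1)
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T

# Standardize pixel values to the [0, 1] range
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
```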

General Architecture of the learning algorithm

Mathematical expression of the algorithm:

For one example $x^{(i)}$:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1 - y^{(i)})\log(1 - a^{(i)})$$

The cost is then computed by summing over all training examples:

$$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$$
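
The activation $\sigma$ above is the sigmoid function. A minimal NumPy version could look like the following sketch (the helper name sigmoid is an assumption; the assignment may already provide one):

```python
def sigmoid(z):
    """Compute sigma(z) = 1 / (1 + exp(-z)) element-wise for a scalar or NumPy array."""
    return 1 / (1 + np.exp(-z))
```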

Building the algorithm

The main steps for building a Neural Network are:

  1. Define the model structure (such as number of input features)

  2. Initialize the model's parameters

  3. Loop:

    • Calculate current loss (forward propagation)

    • Calculate current gradient (backward propagation)

    • Update parameters (gradient descent)

You often build steps 1-3 separately and then integrate them into a single function called model().

Initializing parameters
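
For logistic regression, initializing the weights and bias to zero is sufficient. A minimal sketch follows; the function name initialize_with_zeros and the (dim, 1) shape for w are assumptions.

```python
def initialize_with_zeros(dim):
    """Create a zero weight vector w of shape (dim, 1) and a scalar bias b = 0."""
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b
```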

Forward and Backward propagation

Implement a function propagate() that computes the cost function and its gradient.

Forward Propagation:

  • get X

  • compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$

  • calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right)$

Here are the two gradient formulas you will be using:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$$
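
One way propagate() could be written is sketched below, assuming X has shape (num_features, m), Y has shape (1, m), and the sigmoid helper defined earlier; the exact interface may differ from the graded version.

```python
def propagate(w, b, X, Y):
    """Forward and backward propagation for logistic regression.

    w: weights, shape (num_features, 1)
    b: bias, a scalar
    X: data, shape (num_features, m)
    Y: labels (0/1), shape (1, m)
    """
    m = X.shape[1]

    # Forward propagation: activations and cost
    A = sigmoid(np.dot(w.T, X) + b)                              # shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: gradients of the cost w.r.t. w and b
    dw = np.dot(X, (A - Y).T) / m                                # shape (num_features, 1)
    db = np.sum(A - Y) / m                                       # scalar

    grads = {"dw": dw, "db": db}
    return grads, np.squeeze(cost)
```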

Optimization

  • You have initialized your parameters.

  • You are also able to compute a cost function and its gradient.

  • Now, you want to update the parameters using gradient descent.

The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.
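
A gradient-descent loop built on propagate() might look like the following sketch; the signature, default hyperparameters, and the every-100-iterations cost recording are assumptions.

```python
def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    """Run gradient descent to learn w and b by minimizing the cost."""
    w = copy.deepcopy(w)
    b = copy.deepcopy(b)
    costs = []

    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        # Gradient descent update: theta = theta - alpha * d(theta)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]

        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost}")

    params = {"w": w, "b": b}
    return params, grads, costs
```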

Predict

The previous function outputs the learned w and b, which we can use to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

  1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

  2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this, as sketched below).
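
A possible vectorized implementation of predict(), using the 0.5 threshold described above (the exact signature is an assumption):

```python
def predict(w, b, X):
    """Predict 0/1 labels for the examples in X using learned w and b."""
    A = sigmoid(np.dot(w.T, X) + b)          # probabilities, shape (1, m)
    # Threshold the activations at 0.5 (vectorized instead of a for loop)
    Y_prediction = (A > 0.5).astype(float)
    return Y_prediction
```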

So far, you have learned how to:

  • Initialize (w,b)

  • Optimize the loss iteratively to learn parameters (w,b):

    • Computing the cost and its gradient

    • Updating the parameters using gradient descent

  • Use the learned (w,b) to predict the labels for a given set of examples

Merge all functions into a model
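
Bringing the pieces together, model() might look like the sketch below; the returned dictionary keys and the accuracy printout are assumptions modeled on the steps described above.

```python
def model(X_train, Y_train, X_test, Y_test,
          num_iterations=2000, learning_rate=0.005, print_cost=False):
    """Train logistic regression on (X_train, Y_train) and evaluate on (X_test, Y_test)."""
    # 1. Initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])

    # 2. Learn w and b with gradient descent
    params, grads, costs = optimize(w, b, X_train, Y_train,
                                    num_iterations, learning_rate, print_cost)
    w, b = params["w"], params["b"]

    # 3. Predict on the train and test sets
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    if print_cost:
        print(f"train accuracy: {100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100} %")
        print(f"test accuracy:  {100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100} %")

    return {"costs": costs, "w": w, "b": b,
            "Y_prediction_test": Y_prediction_test,
            "Y_prediction_train": Y_prediction_train,
            "learning_rate": learning_rate,
            "num_iterations": num_iterations}
```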

Further Analysis

Choice of learning rate

Reminder: In order for gradient descent to work you must choose the learning rate wisely. The learning rate $\alpha$ determines how rapidly we update the parameters. If the learning rate is too large we may "overshoot" the optimal value. Similarly, if it is too small we will need too many iterations to converge to the best values. That's why it is crucial to use a well-tuned learning rate.
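
One way to see this effect is to train the same model with several learning rates and plot the cost curves; the sketch below assumes the model() function above and the preprocessed train_set_x / test_set_x arrays, and the rate values are only illustrative.

```python
# Compare cost curves for several learning rates
learning_rates = [0.01, 0.001, 0.0001]

for lr in learning_rates:
    result = model(train_set_x, train_set_y, test_set_x, test_set_y,
                   num_iterations=1500, learning_rate=lr, print_cost=False)
    plt.plot(np.squeeze(result["costs"]), label=str(lr))

plt.ylabel("cost")
plt.xlabel("iterations (hundreds)")
plt.legend(loc="upper right")
plt.show()
```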

Interpretation

  • Different learning rates give different costs and thus different prediction results.

  • If the learning rate is too large (0.01), the cost may oscillate up and down. It may even diverge (though in this example, using 0.01 still eventually ends up at a good value for the cost).

  • A lower cost doesn't mean a better model. You have to check for possible overfitting, which happens when the training accuracy is much higher than the test accuracy.

  • In deep learning, we usually recommend that you:

    • Choose the learning rate that best minimizes the cost function.

    • If your model overfits, use other techniques to reduce overfitting.
