Shallow Neural Network model in Python & NumPy

Neural Network model with 1 hidden layer.

Neural Network model

Defining the neural network structure

Function layer_sizes()

Define three variables:

  • n_x: the size of the input layer

  • n_h: the size of the hidden layer (hard-coded to 4, i.e. n_h = 4, but only for this Exercise 2)

  • n_y: the size of the output layer

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]
    n_h = 4
    n_y = Y.shape[0]
    
    return (n_x, n_h, n_y)

Initialize the model's parameters

Function initialize_parameters()

Instructions:

  • Make sure your parameters' sizes are right. Refer to the neural network figure above if needed.

  • You will initialize the weight matrices with random values (see the sketch after this list).

    • Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).

  • You will initialize the bias vectors as zeros.

    • Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
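A minimal sketch of initialize_parameters() following these instructions is shown below. The signature and the dictionary keys W1, b1, W2, b2 are assumptions chosen to match the shapes implied by the layer sizes above:

import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    parameters -- dictionary containing W1, b1, W2, b2
    """
    # Weight matrices: small random values; bias vectors: zeros
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters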

Forward Propagation

Implement forward_propagation() using the following equations:

$Z^{[1]} = W^{[1]} X + b^{[1]}$

$A^{[1]} = \tanh(Z^{[1]})$

$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$

$\hat{Y} = A^{[2]} = \sigma(Z^{[2]})$

Instructions:

  • Check the mathematical representation of your classifier in the figure above.

  • Use the function sigmoid(). It has already been imported into this notebook.

  • Use the function np.tanh(). It's part of the numpy library.

  • Implement using these steps:

    1. Retrieve each parameter from the dictionary "parameters" (which is the output of initialize_parameters()) by using parameters["..."].

    2. Implement forward propagation. Compute $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$ and $A^{[2]}$ (the vector of all your predictions on all the examples in the training set); see the sketch after this list.

  • Values needed in the backpropagation are stored in "cache". The cache will be given as an input to the backpropagation function.
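A sketch of forward_propagation() along these lines is shown below. It defines a local sigmoid() as a stand-in for the notebook's imported helper and assumes the parameter keys from the initialize_parameters() sketch above:

import numpy as np

def sigmoid(z):
    # Stand-in for the sigmoid() helper imported in the notebook
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of shape (n_x, number of examples)
    parameters -- dictionary containing W1, b1, W2, b2
    
    Returns:
    A2 -- sigmoid output of the second activation
    cache -- dictionary containing Z1, A1, Z2, A2
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # Forward propagation: Z1, A1, Z2, A2
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    
    # Values needed in backpropagation are stored in "cache"
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    
    return A2, cache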

Compute the Cost

Now that you've computed $A^{[2]}$ (in the Python variable "A2"), which contains $a^{[2](i)}$ for every example, you can compute the cost function as follows:

$J = -\frac{1}{m} \sum\limits_{i=1}^{m} \left( y^{(i)} \log\left(a^{[2](i)}\right) + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right)$

Function compute_cost() to compute the value of the cost $J$.
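A sketch of compute_cost() implementing the cross-entropy cost above; the exact signature is an assumption:

import numpy as np

def compute_cost(A2, Y):
    """
    Arguments:
    A2 -- sigmoid output of the second activation, shape (1, number of examples)
    Y -- true labels, shape (1, number of examples)
    
    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]  # number of examples
    
    # Cross-entropy cost
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    cost = -np.sum(logprobs) / m
    
    cost = float(np.squeeze(cost))  # make sure the cost is a scalar
    
    return cost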

Backward Propagation

Function backward_propagation().

Figure 1: Backpropagation. Use the six equations on the right of the figure, which for this model are:

$dZ^{[2]} = A^{[2]} - Y$

$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$

$db^{[2]} = \frac{1}{m} \sum dZ^{[2]}$

$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g^{[1]\prime}(Z^{[1]})$

$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^{T}$

$db^{[1]} = \frac{1}{m} \sum dZ^{[1]}$
  • Tips:

    • To compute dZ1 you'll need to compute $g^{[1]\prime}(Z^{[1]})$.

    • Since $g^{[1]}(\cdot)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]\prime}(z) = 1 - a^2$. So you can compute $g^{[1]\prime}(Z^{[1]})$ using (1 - np.power(A1, 2)). A sketch of backward_propagation() follows this list.
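A sketch of backward_propagation() based on the six equations above; the gradient dictionary keys are assumptions consistent with the parameter names used earlier:

import numpy as np

def backward_propagation(parameters, cache, X, Y):
    """
    Arguments:
    parameters -- dictionary containing W1, b1, W2, b2
    cache -- dictionary containing Z1, A1, Z2, A2
    X -- input data of shape (n_x, number of examples)
    Y -- true labels of shape (1, number of examples)
    
    Returns:
    grads -- dictionary containing dW1, db1, dW2, db2
    """
    m = X.shape[1]
    
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    
    # The six backpropagation equations
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # tanh'(Z1) = 1 - A1**2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    
    return grads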

Update Parameters

General gradient descent rule: $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.

Figures (courtesy of Adam Harley): the gradient descent algorithm with a good learning rate (converging) versus a bad learning rate (diverging).

Hint

  • Use copy.deepcopy(...) when copying lists or dictionaries that are passed as parameters to functions. It avoids input parameters being modified within the function. In some scenarios, this could be inefficient, but it is required for grading purposes.
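A sketch of update_parameters() applying the gradient descent rule above; the default learning rate of 1.2 is an assumption, and copy.deepcopy is used as the hint suggests:

import copy

def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Arguments:
    parameters -- dictionary containing W1, b1, W2, b2
    grads -- dictionary containing dW1, db1, dW2, db2
    learning_rate -- step size alpha (assumed default)
    
    Returns:
    parameters -- dictionary with updated parameters
    """
    # Copy so the caller's dictionary is not modified in place
    parameters = copy.deepcopy(parameters)
    
    # theta = theta - alpha * dJ/dtheta for each parameter
    parameters["W1"] = parameters["W1"] - learning_rate * grads["dW1"]
    parameters["b1"] = parameters["b1"] - learning_rate * grads["db1"]
    parameters["W2"] = parameters["W2"] - learning_rate * grads["dW2"]
    parameters["b2"] = parameters["b2"] - learning_rate * grads["db2"]
    
    return parameters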

Integration

Build the neural network model with one hidden layer in nn_model() by combining the previous functions in the right order.
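A sketch of nn_model() that wires the previous sketches together (initialize, then loop over forward propagation, cost, backward propagation, and the parameter update); the defaults for num_iterations and print_cost are assumptions:

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    """
    Arguments:
    X -- input data of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- number of gradient descent iterations
    print_cost -- if True, print the cost every 1000 iterations
    
    Returns:
    parameters -- learnt parameters, usable for prediction
    """
    n_x, _, n_y = layer_sizes(X, Y)
    parameters = initialize_parameters(n_x, n_h, n_y)
    
    # Gradient descent loop
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads)
        
        if print_cost and i % 1000 == 0:
            print(f"Cost after iteration {i}: {cost:f}")
    
    return parameters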

Test the Model

Predict with your model by building predict(). Use forward propagation to predict results.

Reminder: $y_{prediction} = \begin{cases} 1 & \text{if } activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$
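A sketch of predict() using the decision rule above together with the forward_propagation() sketch:

def predict(parameters, X):
    """
    Arguments:
    parameters -- dictionary containing W1, b1, W2, b2
    X -- input data of shape (n_x, number of examples)
    
    Returns:
    predictions -- vector of 0/1 predictions
    """
    # Forward propagation, then threshold the activations at 0.5
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)
    
    return predictions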

Tuning hidden layer size

Interpretation:

  • The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.

  • The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fit the data well without incurring noticeable overfitting.

  • Later, you'll become familiar with regularization, which lets you use very large models (such as n_h = 50) without much overfitting.
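To compare hidden layer sizes in this way, a sweep like the following can be used; the list of sizes and the training-accuracy metric are illustrative, and X, Y are assumed to be an already loaded training set:

import numpy as np

# Train and evaluate the model for several hidden layer sizes
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50]
for n_h in hidden_layer_sizes:
    parameters = nn_model(X, Y, n_h, num_iterations=5000)
    predictions = predict(parameters, X)
    accuracy = float(np.mean(predictions == Y)) * 100
    print(f"{n_h} hidden units: {accuracy:.1f}% training accuracy")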
