Reposted from the assignment notebook of Andrew Ng's Deep Learning course.

Character level language model - Dinosaurus land

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs’ wrath!

By completing this assignment you will learn:

  • How to store text data for processing using an RNN
  • How to synthesize data, by sampling predictions at each time step and passing it to the next RNN-cell unit
  • How to build a character-level text generation recurrent neural network
  • Why clipping the gradients is important

We will begin by loading in some functions that we have provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward which are equivalent to those you’ve implemented in the previous assignment.

import numpy as np
from utils import *
import random

1 - Problem Statement

1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.

data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
print (sorted(chars))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
There are 19909 total characters and 27 unique characters in your data.
s = 'Hello, World \n'
print (s.lower())
print (set(s.lower()))
print (list(set(s.lower())))
print (sorted(list(set(s.lower()))))
e = enumerate(sorted(list(set(s.lower()))))
print(e)
'''
enumerate(sequence, start=0)
Wraps an iterable (such as a list, tuple, or string) into an indexed sequence,
yielding (index, element) pairs; it is most often used inside a for loop.
'''
print(type(e))
for i, ch in e:
    print(i, ch)
hello, world 

{'l', 'w', 'd', 'r', 'h', ',', '\n', ' ', 'e', 'o'}
['l', 'w', 'd', 'r', 'h', ',', '\n', ' ', 'e', 'o']
['\n', ' ', ',', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
<enumerate object at 0x000001F45188C480>
<class 'enumerate'>
0 

1  
2 ,
3 d
4 e
5 h
6 l
7 o
8 r
9 w

The characters are a-z (26 characters) plus the “\n” (or newline character), which in this assignment plays a role similar to the <EOS> (or “End of sentence”) token we had discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character. This will help you figure out which index corresponds to which character in the probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are the python dictionaries.

char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}

1.2 - Overview of the model

Your model will have the following structure:

  • Initialize parameters
  • Run the optimization loop
    • Forward propagation to compute the loss function
    • Backward propagation to compute the gradients with respect to the loss function
    • Clip the gradients to avoid exploding gradients
    • Using the gradients, update your parameter with the gradient descent update rule.
  • Return the learned parameters

Figure 1: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a RNN - Step by Step".

At each time-step, the RNN tries to predict the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.

2 - Building blocks of the model

In this part, you will build two important blocks of the overall model:

  • Gradient clipping: to avoid exploding gradients
  • Sampling: a technique used to generate characters

You will then apply these two functions to build the model.

2.1 - Clipping the gradients in the optimization loop

In this section you will implement the clip function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not “exploding,” meaning taking on overly large values.

In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of the gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie within some range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the gradient vector is greater than 10, it is set to 10; and if any component of the gradient vector is less than -10, it is set to -10. If it is between -10 and 10, it is left alone.

Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight "exploding gradient" problems.

Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this hint for examples of how to clip in numpy. You will need to use the argument out = ....
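As a quick illustration of that hint (this is not part of the graded code), np.clip with out= modifies an array in place:

import numpy as np

# In-place clipping: values outside [-10, 10] are replaced by the nearest bound.
g = np.array([-12.0, 3.0, 11.0])
np.clip(g, -10, 10, out=g)
print(g)   # [-10.   3.  10.]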

### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]

Expected output:

gradients[“dWaa”][1][2]

10.0

gradients[“dWax”][3][1]

-10.0

gradients[“dWya”][1][2]

0.29713815361

gradients[“db”][4]

[ 10.]

gradients[“dby”][1]

[ 8.45833407]

2.2 - Sampling

Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:

Figure 3: In this picture, we assume the model is already trained. We pass in $x^{\langle 1\rangle} = \vec{0}$ at the first time step, and have the network then sample one character at a time.

Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:

  • Step 1: Pass the network the first “dummy” input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we’ve generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.

  • Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t \rangle} + b) \tag{1}$$

$$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y \tag{2}$$

$$\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle}) \tag{3}$$

Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by “i” is the next character. We have provided a softmax() function that you can use.
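For reference, here is a minimal sketch of what a typical softmax() helper looks like; the version actually provided in utils may differ in details:

def softmax(x):
    # Subtract the max for numerical stability, then normalize so the entries sum to 1.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)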

  • Step 3: Carry out sampling: Pick the next character’s index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if $\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index “i” with 16% probability. To implement it, you can use np.random.choice.

Here is an example of how to use np.random.choice():

np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())

This means that you will pick the index according to the distribution:
$P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2$.

  • Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you’ve chosen as your prediction. You will then forward propagate $x^{\langle t+1 \rangle}$ in Step 1 and keep repeating the process until you get a “\n” character, indicating you’ve reached the end of the dinosaur name. (A short sketch of this overwrite is shown below.)
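A short sketch of the Step 4 overwrite, assuming vocab_size and the sampled index idx are already defined as in the sample() function below:

# Build the one-hot vector representing x<t+1> from the sampled index.
x = np.zeros((vocab_size, 1))
x[idx] = 1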
'''
np.random.choice(a, size=None, replace=True, p=None)
Draws `size` samples from `a` with probabilities `p`; if `p` is not given, every element is equally likely.
a : 1-D array-like or int. If it is an ndarray, samples are drawn from its elements; if it is an int, samples are drawn as if from np.arange(a).
size : int or tuple of ints, optional. Number (or shape) of samples to draw.
replace : bool, optional. Whether sampling is done with replacement (True, the default, so repeated values can appear) or without (False, so all drawn values are distinct).
p : 1-D array-like, optional. One probability per element of a; if omitted, every element of a is assumed equally likely. Elements with larger probabilities are drawn more often.
'''
a1 = np.random.choice(a=5, size=3, replace=False, p=None)
print(a1)
# A non-uniform distribution: elements are drawn with the given probabilities
a2 = np.random.choice(a=5, size=3, replace=True, p=[0.2, 0.1, 0.3, 0.4, 0.0])
print(a2)
[0 4 3]
[3 3 3]
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.

    Generates a name at random from the parameters: the first letter depends only on the parameters,
    each later letter also depends on the letters sampled before it, and the name ends once '\n' is drawn.
    np.random.choice(range(vocab_size), p = y.ravel()) picks one index at random according to the
    probability of each position; the letter at that position is selected.
    """
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    # print(vocab_size)
    n_a = Waa.shape[1]

    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))

    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []

    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']

    while (idx != newline_character and counter != 50):
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # for grading purposes
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        # ravel() flattens a multi-dimensional array to 1-D (row-major order by default); for the difference
        # between ravel, flatten and reshape, see https://blog.csdn.net/liuweiyuxiang/article/details/78220080
        idx = np.random.choice(range(vocab_size), p = y.ravel())

        # Append the index to "indices"
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a"
        a_prev = a

        # for grading purposes
        seed += 1
        counter += 1
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices
np.random.seed(2)
_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])
Sampling:
list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 17, 24, 12, 13, 0, 0]
list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'q', 'x', 'l', 'm', '\n', '\n']

Expected output:

list of sampled indices:

[12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24,
7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]

list of sampled characters:

['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o',
'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o',
'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']

3 - Building the language model

It is time to build the character-level language model for text generation.

3.1 - Gradient descent

In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:

  • Forward propagate through the RNN to compute the loss
  • Backward propagate through time to compute the gradients of the loss with respect to the parameters
  • Clip the gradients if necessary
  • Update your parameters using gradient descent

Exercise: Implement this optimization process (one step of stochastic gradient descent).

We provide you with the following functions:

def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
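As an illustration only, a minimal sketch of the plain gradient descent update that update_parameters() performs might look like the following; the implementation actually provided in utils may differ in details:

def update_parameters_sketch(parameters, gradients, learning_rate):
    # Plain gradient descent: theta := theta - learning_rate * d_theta for every parameter.
    parameters['Wax'] -= learning_rate * gradients['dWax']
    parameters['Waa'] -= learning_rate * gradients['dWaa']
    parameters['Wya'] -= learning_rate * gradients['dWya']
    parameters['b']  -= learning_rate * gradients['db']
    parameters['by'] -= learning_rate * gradients['dby']
    return parameters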
# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)

    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)

    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5)

    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)

    ### END CODE HERE ###

    return loss, gradients, a[len(X)-1]
np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4, 14, 11, 22, 25, 26]
loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])
Loss = 126.50397572165346
gradients["dWaa"][1][2] = 0.1947093153472697
np.argmax(gradients["dWax"]) = 93
gradients["dWya"][1][2] = -0.007773876032004693
gradients["db"][4] = [-0.06809825]
gradients["dby"][1] = [0.01538192]
a_last[4] = [-1.]

Expected output:

Loss

126.503975722

gradients[“dWaa”][1][2]

0.194709315347

np.argmax(gradients[“dWax”])

93

gradients[“dWya”][1][2]

-0.007773876032

gradients[“db”][4]

[-0.06809825]

gradients[“dby”][1]

[ 0.01538192]

a_last[4]

[-1.]

3.2 - Training the model

Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 100 steps of stochastic gradient descent, you will sample 10 randomly chosen names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.

Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:

        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]

Note that we use index = j % len(examples), where j = 1....num_iterations, to make sure that examples[index] is always valid (index is smaller than len(examples)).
The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that Y is equal to X but shifted one step to the left, and with an additional “\n” appended to signify the end of the dinosaur name.
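For instance, here is a minimal illustration with the hypothetical (non-dinosaur) string "abc" and the char_to_ix mapping defined earlier:

example = "abc"
X = [None] + [char_to_ix[ch] for ch in example]   # [None, 1, 2, 3]
Y = X[1:] + [char_to_ix["\n"]]                    # [1, 2, 3, 0]
# rnn_forward() interprets the leading None as x<0> = zero vector, and the trailing
# index 0 ('\n') marks the end of the name, so y<t> = x<t+1> at every time step.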

# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names.

    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration.
    vocab_size -- number of unique characters found in the text, size of the vocabulary

    Returns:
    parameters -- learned parameters
    """

    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size

    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)

    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)

    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    # strip() can be called with or without an argument: with no argument it removes leading and
    # trailing whitespace (including the newline); with an argument it removes those characters instead.

    # Shuffle list of all dinosaur names
    np.random.seed(0)
    np.random.shuffle(examples)
    # shuffle() randomly reorders the sequence in place.

    # Initialize the hidden state of your LSTM
    a_prev = np.zeros((n_a, 1))

    # Optimization loop
    for j in range(num_iterations):
        # np.random.shuffle(examples)   # (would randomly reorder the sequence in place each iteration)

        ### START CODE HERE ###

        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]
        # We train on one name at a time; repeatedly cycling over all the samples eventually gives a
        # set of parameters that overfits the training names, and what such a model creates tends to
        # lack novelty (for something like poetry the result would feel too rigid).

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)

        ### END CODE HERE ###

        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 5000 iterations, generate "n" characters thanks to sample() to check if the model is learning properly
        if j % 5000 == 0:

            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')

            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):

                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)

                seed += 1  # To get the same result for grading purposes, increment the seed by one.

            print('\n')

    return parameters

Run the following cell. You should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.

parameters = model(data, ix_to_char, char_to_ix)
Iteration: 0, Loss: 23.087336

Nkzxwtdmfqoeyhsqwasjkjvu
Kneb
Kzxwtdmfqoeyhsqwasjkjvu
Neb
Zxwtdmfqoeyhsqwasjkjvu
Eb
Xwtdmfqoeyhsqwasjkjvu


Iteration: 5000, Loss: 25.290275

Ngyusedonis
Klecagropechus
Lytosaurus
Necagropechusangotmeeycerum
Xuskangosaurus
Da
Tosaurus


Iteration: 10000, Loss: 23.844446

Onyusaurus
Klecalosaurus
Lustodon
Ola
Xusodonia
Eeaeosaurus
Troceosaurus


Iteration: 15000, Loss: 23.048476

Phyus
Licaacosaurus
Lustrapops
Padaerona
Yuspcheosaurus
Eeagosaurus
Trochipodsaurueong


Iteration: 20000, Loss: 23.008798

Onyusperchohychus
Lola
Lytrranfosaurus
Olaa
Ytrrcharomulus
Ehagosaurus
Trrcharonyhus


Iteration: 25000, Loss: 22.659178

Onyusceratops
Loja
Lystriongoluisaurus
Olaadrria
Yusiangnilumus
Elajropeeryx
Trraraptor


Iteration: 30000, Loss: 22.587893

Piusosaurus
Locaadrus
Lutosaurus
Pacalosaurus
Yusochesaurus
Eg
Trraodon

Conclusion

You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.

If your model generates some non-cool names, don’t blame the model entirely–not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!

This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name model for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!

4 - Writing like Shakespeare

The rest of this notebook is optional and is not graded, but we hope you’ll do it anyway since it’s quite fun and informative.

A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text; for example, a character appearing somewhere in a sequence can influence what a different character should be much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.

Let's become poets!

We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.

from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io
Using TensorFlow backend.

Loading text data...
Creating training set...
number of training examples: 31412
Vectorizing training set...
Loading model...

C:\conda\envs\tensorflow\lib\site-packages\keras\engine\saving.py:327: UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
  warnings.warn('Error in loading the saved optimizer '

To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called “The Sonnets”.

Let’s train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run generate_output, which will prompt you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don’t enter the quotation marks). Depending on whether you include the space at the end, your results might also differ; try it both ways, and try other inputs as well.

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
Epoch 1/1
31412/31412 [==============================] - 62s 2ms/step - loss: 2.7406

<keras.callbacks.History at 0x1f457566358>
# Run this cell to try with different inputs without having to re-train the model
generate_output()
Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: good life


Here is your poem: 

good life, seound as sack but dos bear,
seland a sad enconce hit in this aud and than dell.
conle to a coorny laking love? my gacted night,
but you be mo cenence maduloded seens stain,
hy all my wist reitors to hadd once,
that goos betther for her faith of thas by,
more mades dingness do, i from the alfel in thine
bud deliou gand, in heall the ence me on to dignt,
which whe so srow rus that in whate eyes

The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:

  • LSTMs instead of the basic RNN to capture longer-range dependencies
  • The model is a deeper, stacked LSTM model (2 layers); a minimal sketch of such a stack is shown after this list
  • Using Keras instead of raw Python/NumPy to simplify the code
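For a sense of what such a stacked character-level LSTM looks like in Keras, here is a minimal sketch with hypothetical sizes; it is not the actual model loaded from shakespeare_utils:

from keras.models import Sequential
from keras.layers import LSTM, Dense

Tx, vocab_size = 40, 38   # hypothetical sequence length and character-set size
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(Tx, vocab_size)))  # first LSTM layer returns the full sequence
model.add(LSTM(128))                                                       # second, stacked LSTM layer
model.add(Dense(vocab_size, activation='softmax'))                         # one probability per character
model.compile(loss='categorical_crossentropy', optimizer='adam')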

If you want to learn more, you can also check out the Keras Team’s text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.

Congratulations on finishing this notebook!

References:

  • This exercise took inspiration from Andrej Karpathy’s implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy’s blog post.
  • For the Shakespearian poem generator, our implementation was based on the implementation of an LSTM text generator by the Keras team: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
