A while ago I finished watching the CS231n lectures, so I am sharing my assignment3 code here. My skills are limited, so please forgive any oversights. Assignment3 covers Image Captioning and visualization of deep networks. For Image Captioning the image features have already been extracted for you, so you don't do that part yourself; you only need to implement the RNN and LSTM. The code for each part of the assignment follows (it's a bit rough, so please bear with it).

Image Captioning with Vanilla RNNs and Image Captioning with LSTMs

Implement the vanilla RNN and the LSTM.

rnn_layers.py

import numpy as np

"""
This file defines layer types that are commonly used for recurrent neural
networks.
"""


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a
    tanh activation function.

    The input data has dimension D, the hidden state has dimension H, and we
    use a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h = np.tanh(np.dot(x, Wx) + np.dot(prev_h, Wh) + b)
    cache = (x, prev_h, Wx, Wh, next_h)
    return next_h, cache


def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    x, prev_h, Wx, Wh, next_h = cache
    # tanh'(z) = 1 - tanh(z)^2, expressed in terms of the forward output.
    dtanh = (1 - next_h * next_h) * dnext_h
    db = np.sum(dtanh, axis=0)
    dWx = np.dot(x.T, dtanh)
    dWh = np.dot(prev_h.T, dtanh)
    dprev_h = np.dot(dtanh, Wh.T)
    dx = np.dot(dtanh, Wx.T)
    return dx, dprev_h, dWx, dWh, db


def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an
    input sequence composed of T vectors, each of dimension D. The RNN uses a
    hidden size of H, and we work over a minibatch containing N sequences.
    After running the RNN forward, we return the hidden states for all
    timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    N, T, D = x.shape
    _, H = Wh.shape
    cache = []
    h = np.zeros((N, T, H))
    for t in np.arange(T):
        h0, step_cache = rnn_step_forward(x[:, t, :], h0, Wx, Wh, b)
        cache.append(step_cache)
        h[:, t, :] = h0
    return h, cache


def rnn_backward(dh, cache):
    """
    Compute the backward pass for a vanilla RNN over an entire sequence.

    Inputs:
    - dh: Upstream gradients of all hidden states, of shape (N, T, H)

    Returns a tuple of:
    - dx: Gradient of inputs, of shape (N, T, D)
    - dh0: Gradient of initial hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of biases, of shape (H,)
    """
    N, T, H = dh.shape
    _, _, Wx, _, _ = cache[0]
    D = Wx.shape[0]
    dx = np.zeros((N, T, D))
    dWx = np.zeros((D, H))
    dWh = np.zeros((H, H))
    db = np.zeros(H)
    dh_prev = np.zeros((N, H))
    for t in reversed(np.arange(T)):
        # The gradient flowing into step t is the upstream gradient at t plus
        # the gradient carried back from step t + 1.
        dh_cur = dh_prev + dh[:, t, :]
        dx[:, t, :], dh_prev, tdWx, tdWh, tdb = rnn_step_backward(dh_cur, cache[t])
        dWx += tdWx
        dWh += tdWh
        db += tdb
    dh0 = dh_prev
    return dx, dh0, dWx, dWh, db


def word_embedding_forward(x, W):
    """
    Forward pass for word embeddings. We operate on minibatches of size N
    where each sequence has length T. We assume a vocabulary of V words,
    assigning each to a vector of dimension D.

    Inputs:
    - x: Integer array of shape (N, T) giving indices of words. Each element
      idx of x must be in the range 0 <= idx < V.
    - W: Weight matrix of shape (V, D) giving word vectors for all words.

    Returns a tuple of:
    - out: Array of shape (N, T, D) giving word vectors for all input words.
    - cache: Values needed for the backward pass
    """
    N, T = x.shape
    V, D = W.shape
    out = np.zeros((N, T, D))
    for i in np.arange(N):
        for j in np.arange(T):
            out[i, j, :] = W[int(x[i, j]), :]
    cache = (x, W)
    return out, cache


def word_embedding_backward(dout, cache):
    """
    Backward pass for word embeddings. We cannot back-propagate into the
    words since they are integers, so we only return gradient for the word
    embedding matrix.

    HINT: Look up the function np.add.at

    Inputs:
    - dout: Upstream gradients of shape (N, T, D)
    - cache: Values from the forward pass

    Returns:
    - dW: Gradient of word embedding matrix, of shape (V, D).
    """
    N, T, D = dout.shape
    x, W = cache
    V, _ = W.shape
    dW = np.zeros((V, D))
    for i in np.arange(N):
        for j in np.arange(T):
            dW[x[i, j], :] += dout[i, j, :]
    return dW


def sigmoid(x):
    """A numerically stable version of the logistic sigmoid function."""
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)


def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """
    Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we
    use a minibatch size of N.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: Previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    _, H = prev_h.shape
    # The activation vector a packs all four gates along the second axis.
    a = np.dot(x, Wx) + np.dot(prev_h, Wh) + b
    i = sigmoid(a[:, :H])            # input gate
    f = sigmoid(a[:, H:2 * H])       # forget gate
    o = sigmoid(a[:, 2 * H:3 * H])   # output gate
    g = np.tanh(a[:, 3 * H:])        # candidate cell value
    next_c = f * prev_c + i * g
    tanh_c = np.tanh(next_c)
    next_h = o * tanh_c
    cache = (a, i, f, o, g, tanh_c, Wx, Wh, b, x, prev_c, prev_h)
    return next_h, next_c, cache


def lstm_step_backward(dnext_h, dnext_c, cache):
    """
    Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    a, i, f, o, g, tanh_c, Wx, Wh, b, x, prev_c, prev_h = cache
    _, H = prev_h.shape
    do = dnext_h * tanh_c
    dtanh_c = dnext_h * o
    # The cell state receives gradient both through next_h and directly
    # through dnext_c.
    dc = dtanh_c * (1 - tanh_c * tanh_c) + dnext_c
    df = dc * prev_c
    dprev_c = dc * f
    di = dc * g
    dg = dc * i
    # Backprop through the gate nonlinearities into the pre-activations a.
    da = np.zeros_like(a)
    da[:, :H] = di * (1 - i) * i
    da[:, H:2 * H] = df * (1 - f) * f
    da[:, 2 * H:3 * H] = do * (1 - o) * o
    da[:, 3 * H:] = dg * (1 - g * g)
    db = np.sum(da, axis=0)
    dWx = np.dot(x.T, da)
    dWh = np.dot(prev_h.T, da)
    dx = np.dot(da, Wx.T)
    dprev_h = np.dot(da, Wh.T)
    return dx, dprev_h, dprev_c, dWx, dWh, db


def lstm_forward(x, h0, Wx, Wh, b):
    """
    Forward pass for an LSTM over an entire sequence of data. We assume an
    input sequence composed of T vectors, each of dimension D. The LSTM uses
    a hidden size of H, and we work over a minibatch containing N sequences.
    After running the LSTM forward, we return the hidden states for all
    timesteps.

    Note that the initial cell state is set to zero, and that the cell state
    is not returned; it is an internal variable of the LSTM and is not
    accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    N, T, D = x.shape
    _, H = h0.shape
    cache = []
    h = np.zeros((N, T, H))
    c0 = np.zeros((N, H))
    for t in np.arange(T):
        h0, c0, step_cache = lstm_step_forward(x[:, t, :], h0, c0, Wx, Wh, b)
        h[:, t, :] = h0
        cache.append(step_cache)
    return h, cache


def lstm_backward(dh, cache):
    """
    Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    N, T, H = dh.shape
    D = cache[0][6].shape[0]  # Wx is stored at index 6 of each step cache
    dc = np.zeros((N, H))
    dx = np.zeros((N, T, D))
    dWx = np.zeros((D, 4 * H))
    dWh = np.zeros((H, 4 * H))
    db = np.zeros(4 * H)
    dh0 = np.zeros((N, H))
    for t in reversed(np.arange(T)):
        dh0 += dh[:, t, :]
        dx[:, t, :], dh0, dc, tdWx, tdWh, tdb = lstm_step_backward(dh0, dc, cache[t])
        dWx += tdWx
        dWh += tdWh
        db += tdb
    return dx, dh0, dWx, dWh, db


def temporal_affine_forward(x, w, b):
    """
    Forward pass for a temporal affine layer. The input is a set of
    D-dimensional vectors arranged into a minibatch of N timeseries, each of
    length T. We use an affine function to transform each of those vectors
    into a new vector of dimension M.

    Inputs:
    - x: Input data of shape (N, T, D)
    - w: Weights of shape (D, M)
    - b: Biases of shape (M,)

    Returns a tuple of:
    - out: Output data of shape (N, T, M)
    - cache: Values needed for the backward pass
    """
    N, T, D = x.shape
    M = b.shape[0]
    out = x.reshape(N * T, D).dot(w).reshape(N, T, M) + b
    cache = x, w, b, out
    return out, cache


def temporal_affine_backward(dout, cache):
    """
    Backward pass for temporal affine layer.

    Input:
    - dout: Upstream gradients of shape (N, T, M)
    - cache: Values from forward pass

    Returns a tuple of:
    - dx: Gradient of input, of shape (N, T, D)
    - dw: Gradient of weights, of shape (D, M)
    - db: Gradient of biases, of shape (M,)
    """
    x, w, b, out = cache
    N, T, D = x.shape
    M = b.shape[0]
    dx = dout.reshape(N * T, M).dot(w.T).reshape(N, T, D)
    dw = dout.reshape(N * T, M).T.dot(x.reshape(N * T, D)).T
    db = dout.sum(axis=(0, 1))
    return dx, dw, db


def temporal_softmax_loss(x, y, mask, verbose=False):
    """
    A temporal version of softmax loss for use in RNNs. We assume that we are
    making predictions over a vocabulary of size V for each timestep of a
    timeseries of length T, over a minibatch of size N. The input x gives
    scores for all vocabulary elements at all timesteps, and y gives the
    indices of the ground-truth element at each timestep. We use a
    cross-entropy loss at each timestep, summing the loss over all timesteps
    and averaging across the minibatch.

    As an additional complication, we may want to ignore the model output at
    some timesteps, since sequences of different length may have been
    combined into a minibatch and padded with NULL tokens. The optional mask
    argument tells us which elements should contribute to the loss.

    Inputs:
    - x: Input scores, of shape (N, T, V)
    - y: Ground-truth indices, of shape (N, T) where each element is in the
      range 0 <= y[i, t] < V
    - mask: Boolean array of shape (N, T) where mask[i, t] tells whether or
      not the scores at x[i, t] should contribute to the loss.

    Returns a tuple of:
    - loss: Scalar giving loss
    - dx: Gradient of loss with respect to scores x.
    """
    N, T, V = x.shape
    x_flat = x.reshape(N * T, V)
    y_flat = y.reshape(N * T)
    mask_flat = mask.reshape(N * T)
    probs = np.exp(x_flat - np.max(x_flat, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N
    dx_flat = probs.copy()
    dx_flat[np.arange(N * T), y_flat] -= 1
    dx_flat /= N
    dx_flat *= mask_flat[:, None]
    if verbose: print 'dx_flat: ', dx_flat.shape
    dx = dx_flat.reshape(N, T, V)
    return loss, dx
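Before plugging these layers into the captioning model, it is worth numerically gradient-checking each forward/backward pair, as the assignment notebooks do. Below is a minimal sketch for rnn_step_forward / rnn_step_backward; it assumes the functions above are in scope, and the num_grad helper is a hypothetical stand-in for the assignment's eval_numerical_gradient_array utility.

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # Centered-difference numeric gradient of f at x, contracted with the
    # upstream gradient df.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

N, D, H = 3, 4, 5
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx = np.random.randn(D, H)
Wh = np.random.randn(H, H)
b = np.random.randn(H)

next_h, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
dnext_h = np.random.randn(*next_h.shape)
dx, dprev_h, dWx, dWh, db = rnn_step_backward(dnext_h, cache)

# Analytic and numeric gradients should agree closely (relative error on
# the order of 1e-8).
dx_num = num_grad(lambda z: rnn_step_forward(z, prev_h, Wx, Wh, b)[0], x, dnext_h)
print 'dx relative error: %e' % np.max(
    np.abs(dx - dx_num) / np.maximum(np.abs(dx) + np.abs(dx_num), 1e-10))

The same pattern applies to the other layers: check dprev_h, dWx, dWh, db for the RNN step, then repeat for the LSTM step and the full-sequence functions.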

rnn.py

import numpy as np

from cs231n.layers import *
from cs231n.rnn_layers import *


class CaptioningRNN(object):
    """
    A CaptioningRNN produces captions from image features using a recurrent
    neural network.

    The RNN receives input vectors of size D, has a vocab size of V, works on
    sequences of length T, has an RNN hidden dimension of H, uses word
    vectors of dimension W, and operates on minibatches of size N.

    Note that we don't use any regularization for the CaptioningRNN.
    """

    def __init__(self, word_to_idx, input_dim=512, wordvec_dim=128,
                 hidden_dim=128, cell_type='rnn', dtype=np.float32):
        """
        Construct a new CaptioningRNN instance.

        Inputs:
        - word_to_idx: A dictionary giving the vocabulary. It contains V
          entries, and maps each string to a unique integer in the range
          [0, V).
        - input_dim: Dimension D of input image feature vectors.
        - wordvec_dim: Dimension W of word vectors.
        - hidden_dim: Dimension H for the hidden state of the RNN.
        - cell_type: What type of RNN to use; either 'rnn' or 'lstm'.
        - dtype: numpy datatype to use; use float32 for training and float64
          for numeric gradient checking.
        """
        if cell_type not in {'rnn', 'lstm'}:
            raise ValueError('Invalid cell_type "%s"' % cell_type)

        self.cell_type = cell_type
        self.dtype = dtype
        self.word_to_idx = word_to_idx
        self.idx_to_word = {i: w for w, i in word_to_idx.iteritems()}
        self.params = {}

        vocab_size = len(word_to_idx)

        self._null = word_to_idx['<NULL>']
        self._start = word_to_idx.get('<START>', None)
        self._end = word_to_idx.get('<END>', None)

        # Initialize word vectors
        self.params['W_embed'] = np.random.randn(vocab_size, wordvec_dim)
        self.params['W_embed'] /= 100

        # Initialize CNN -> hidden state projection parameters
        self.params['W_proj'] = np.random.randn(input_dim, hidden_dim)
        self.params['W_proj'] /= np.sqrt(input_dim)
        self.params['b_proj'] = np.zeros(hidden_dim)

        # Initialize parameters for the RNN
        dim_mul = {'lstm': 4, 'rnn': 1}[cell_type]
        self.params['Wx'] = np.random.randn(wordvec_dim, dim_mul * hidden_dim)
        self.params['Wx'] /= np.sqrt(wordvec_dim)
        self.params['Wh'] = np.random.randn(hidden_dim, dim_mul * hidden_dim)
        self.params['Wh'] /= np.sqrt(hidden_dim)
        self.params['b'] = np.zeros(dim_mul * hidden_dim)

        # Initialize output to vocab weights
        self.params['W_vocab'] = np.random.randn(hidden_dim, vocab_size)
        self.params['W_vocab'] /= np.sqrt(hidden_dim)
        self.params['b_vocab'] = np.zeros(vocab_size)

        # Cast parameters to correct dtype
        for k, v in self.params.iteritems():
            self.params[k] = v.astype(self.dtype)

    def loss(self, features, captions):
        """
        Compute training-time loss for the RNN. We input image features and
        ground-truth captions for those images, and use an RNN (or LSTM) to
        compute loss and gradients on all parameters.

        Inputs:
        - features: Input image features, of shape (N, D)
        - captions: Ground-truth captions; an integer array of shape (N, T)
          where each element is in the range 0 <= y[i, t] < V

        Returns a tuple of:
        - loss: Scalar loss
        - grads: Dictionary of gradients parallel to self.params
        """
        # Cut captions into two pieces: captions_in has everything but the
        # last word and will be input to the RNN; captions_out has everything
        # but the first word and is what we expect the RNN to generate. They
        # are offset by one relative to each other because the RNN should
        # produce word (t+1) after receiving word t. The first element of
        # captions_in will be the START token, and the first element of
        # captions_out will be the first word.
        captions_in = captions[:, :-1]
        captions_out = captions[:, 1:]

        # You'll need this
        mask = (captions_out != self._null)

        # Weight and bias for the affine transform from image features to
        # initial hidden state
        W_proj, b_proj = self.params['W_proj'], self.params['b_proj']

        # Word embedding matrix
        W_embed = self.params['W_embed']

        # Input-to-hidden, hidden-to-hidden, and biases for the RNN
        Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']

        # Weight and bias for the hidden-to-vocab transformation.
        W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']

        loss, grads = 0.0, {}

        # Forward pass:
        # (1) Affine transform from image features to the initial hidden state
        h0, proj_cache = affine_forward(features, W_proj, b_proj)
        # (2) Word embedding layer turns caption indices into word vectors
        wordvecs, embed_cache = word_embedding_forward(captions_in, W_embed)
        # (3) Run the recurrent network over the whole sequence
        if self.cell_type == 'rnn':
            h, rnn_cache = rnn_forward(wordvecs, h0, Wx, Wh, b)
        else:  # 'lstm'
            h, rnn_cache = lstm_forward(wordvecs, h0, Wx, Wh, b)
        # (4) Temporal affine transform computes vocabulary scores
        scores, affine_cache = temporal_affine_forward(h, W_vocab, b_vocab)
        # (5) Temporal softmax loss, ignoring <NULL> targets via the mask
        loss, dscores = temporal_softmax_loss(scores, captions_out, mask)

        # Backward pass, in reverse order through the same layers
        dh, grads['W_vocab'], grads['b_vocab'] = temporal_affine_backward(dscores, affine_cache)
        if self.cell_type == 'rnn':
            dwordvecs, dh0, grads['Wx'], grads['Wh'], grads['b'] = rnn_backward(dh, rnn_cache)
        else:
            dwordvecs, dh0, grads['Wx'], grads['Wh'], grads['b'] = lstm_backward(dh, rnn_cache)
        grads['W_embed'] = word_embedding_backward(dwordvecs, embed_cache)
        _, grads['W_proj'], grads['b_proj'] = affine_backward(dh0, proj_cache)

        return loss, grads

    def sample(self, features, max_length=30):
        """
        Run a test-time forward pass for the model, sampling captions for
        input feature vectors.

        At each timestep, we embed the current word, pass it and the previous
        hidden state to the RNN to get the next hidden state, use the hidden
        state to get scores for all vocab words, and choose the word with the
        highest score as the next word. The initial hidden state is computed
        by applying an affine transform to the input image features, and the
        initial word is the <START> token.

        For LSTMs you will also have to keep track of the cell state; in that
        case the initial cell state should be zero.

        Inputs:
        - features: Array of input image features of shape (N, D).
        - max_length: Maximum length T of generated captions.

        Returns:
        - captions: Array of sampled captions, where each element is an
          integer in the range [0, V). Note that this implementation keeps
          the <START> token in column 0, so the returned array has shape
          (N, max_length + 1).
        """
        N = features.shape[0]
        captions = self._null * np.ones((N, max_length + 1), dtype=np.int32)

        # Unpack parameters
        W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
        W_embed = self.params['W_embed']
        Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
        W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']

        # The initial hidden state comes from the image features
        h0, _ = affine_forward(features, W_proj, b_proj)
        # Initial cell state (only used for LSTMs)
        c0 = np.zeros_like(h0)
        # The first input word is the <START> token
        word = np.ones((N, 1), dtype=np.int32) * self._start
        captions[:, 0] = self._start
        for t in np.arange(max_length):
            # (1) Embed the previous word
            wordvec, _ = word_embedding_forward(word, W_embed)
            # (2) One recurrent step
            if self.cell_type == 'rnn':
                h0, _ = rnn_step_forward(np.squeeze(wordvec), h0, Wx, Wh, b)
            else:
                h0, c0, _ = lstm_step_forward(np.squeeze(wordvec), h0, c0, Wx, Wh, b)
            # (3) Scores over the vocabulary for this timestep
            scores, _ = temporal_affine_forward(h0[:, np.newaxis, :], W_vocab, b_vocab)
            # (4) Greedily pick the highest-scoring word
            word = np.argmax(scores, axis=2)
            captions[:, t + 1] = np.squeeze(word)

        return captions
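To sanity-check the full model, it helps to run it once on tiny fake data before training on real features. The sketch below assumes the CaptioningRNN class above is importable; the vocabulary and data here are made up purely for illustration.

import numpy as np

# Toy vocabulary; <NULL>, <START>, <END> are the special tokens CaptioningRNN expects.
word_to_idx = {'<NULL>': 0, '<START>': 1, '<END>': 2, 'cat': 3, 'dog': 4}
N, D, T = 2, 20, 4

model = CaptioningRNN(word_to_idx, input_dim=D, wordvec_dim=6,
                      hidden_dim=8, cell_type='lstm', dtype=np.float64)

features = np.random.randn(N, D)
captions = np.random.randint(3, 5, size=(N, T))
captions[:, 0] = word_to_idx['<START>']

loss, grads = model.loss(features, captions)
print 'loss: %f' % loss      # scalar
print sorted(grads.keys())   # one gradient per entry in model.params

sampled = model.sample(features, max_length=5)
print sampled.shape          # (N, max_length + 1), <START> in column 0

With float64 and a model this small, the loss and every gradient can also be checked numerically, which is exactly what the notebook's gradient-check cells do.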

Image Gradients: Saliency maps and Fooling Images

Saliency Maps

def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.

    Input:
    - X: Input images, of shape (N, 3, H, W)
    - y: Labels for X, of shape (N,)
    - model: A PretrainedCNN that will be used to compute the saliency map.

    Returns:
    - saliency: An array of shape (N, H, W) giving the saliency maps for the
      input images.
    """
    saliency = None
    # Forward pass in test mode to get the unnormalized class scores.
    out, cache = model.forward(X, None, None, 'test')
    # Backprop a one-hot gradient on the ground-truth class score of each
    # image; this computes d(score_y)/dX directly, which is cleaner than
    # going through svm_loss, whose gradient also touches the other classes.
    dout = np.zeros_like(out)
    dout[np.arange(X.shape[0]), y] = 1
    dx, _ = model.backward(dout, cache)
    # The saliency map is the max absolute gradient across color channels.
    saliency = np.amax(np.absolute(dx), axis=1)
    return saliency
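A typical way to exercise this in the assignment notebook, assuming the TinyImageNet data dict `data`, a PretrainedCNN called `model`, and the notebook's deprocess_image helper are already loaded:

X = data['X_val'][:4]
y = data['y_val'][:4]
saliency = compute_saliency_maps(X, y, model)

# Show each image above its saliency map.
for i in xrange(X.shape[0]):
    plt.subplot(2, X.shape[0], i + 1)
    plt.imshow(deprocess_image(X[i], data['mean_image']))
    plt.axis('off')
    plt.subplot(2, X.shape[0], X.shape[0] + i + 1)
    plt.imshow(saliency[i], cmap=plt.cm.gray)
    plt.axis('off')
plt.show()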

Fooling Images

from cs231n.layers import svm_loss

def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model
    classifies as target_y.

    Inputs:
    - X: Input image, of shape (1, 3, 64, 64)
    - target_y: An integer in the range [0, 100)
    - model: A PretrainedCNN

    Returns:
    - X_fooling: An image that is close to X, but that is classified as
      target_y by the model.
    """
    X_fooling = X.copy()
    lr = 100
    i = 1
    while True:
        out, cache = model.forward(X_fooling, None, None, 'test')
        if np.argmax(out) == target_y:
            print i  # report how many iterations it took
            break
        # Gradient of the loss with target_y as the "correct" label;
        # descending this loss pushes the scores toward target_y, hence the
        # minus sign below (gradient descent on the loss).
        _, dout = svm_loss(out, target_y)
        dx, _ = model.backward(dout, cache)
        X_fooling -= lr * dx
        if i > 100:
            print np.argmax(out)
            print "Error!"
            break
        i += 1
    return X_fooling
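A hypothetical usage check, again assuming `data` and `model` from the notebook. After the loop terminates, the model should rank target_y on top while the image stays visually close to the original:

X = data['X_val'][0:1]   # one image, shape (1, 3, 64, 64)
target_y = 67            # any class in [0, 100) other than the true label
X_fooling = make_fooling_image(X, target_y, model)

scores, _ = model.forward(X_fooling, None, None, 'test')
assert np.argmax(scores) == target_y, 'model was not fooled'
# The perturbation is typically tiny relative to the pixel range:
print 'max pixel change: %f' % np.max(np.abs(X_fooling - X))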

Image Generation: Classes, Inversion, DeepDream

Class visualization

def create_class_visualization(target_y, model, **kwargs):
    """
    Perform optimization over the image to generate class visualizations.

    Inputs:
    - target_y: Integer in the range [0, 100) giving the target class
    - model: A PretrainedCNN that will be used for generation

    Keyword arguments:
    - learning_rate: Floating point number giving the learning rate
    - blur_every: An integer; how often to blur the image as a regularizer
    - l2_reg: Floating point number giving L2 regularization strength on the
      image; this is lambda in the equation above.
    - max_jitter: How much random jitter to add to the image as regularization
    - num_iterations: How many iterations to run for
    - show_every: How often to show the image
    """
    learning_rate = kwargs.pop('learning_rate', 10000)
    blur_every = kwargs.pop('blur_every', 1)
    l2_reg = kwargs.pop('l2_reg', 1e-6)
    max_jitter = kwargs.pop('max_jitter', 4)
    num_iterations = kwargs.pop('num_iterations', 100)
    show_every = kwargs.pop('show_every', 25)

    X = np.random.randn(1, 3, 64, 64)
    for t in xrange(num_iterations):
        # As a regularizer, add random jitter to the image
        ox, oy = np.random.randint(-max_jitter, max_jitter + 1, 2)
        X = np.roll(np.roll(X, ox, -1), oy, -2)

        # Gradient ascent on the target class score: backprop a one-hot
        # gradient on the target class, as in the fooling-image code.
        out, cache = model.forward(X, mode='test')
        dout = np.zeros_like(out)
        dout[np.arange(X.shape[0]), target_y] = 1.0
        dX, _ = model.backward(dout, cache)
        # We maximize score - l2_reg * ||X||^2, so the regularizer is
        # subtracted from the ascent direction (it penalizes large pixels).
        dX -= 2 * l2_reg * X
        X += learning_rate * dX

        # Undo the jitter
        X = np.roll(np.roll(X, -ox, -1), -oy, -2)

        # As a regularizer, clip the image
        X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

        # As a regularizer, periodically blur the image
        if t % blur_every == 0:
            X = blur_image(X)

        # Periodically show the image
        if t % show_every == 0:
            plt.imshow(deprocess_image(X, data['mean_image']))
            plt.gcf().set_size_inches(3, 3)
            plt.axis('off')
            plt.show()
    return X
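A hypothetical call, once more assuming `data` and `model` from the notebook (TinyImageNet-100-A class indices live in [0, 100)):

target_y = 43  # pick any class index
print data['class_names'][target_y]   # which class we are visualizing
X = create_class_visualization(target_y, model, show_every=25,
                               num_iterations=100)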

Feature Inversion

def invert_features(target_feats, layer, model, **kwargs):
    """
    Perform feature inversion in the style of Mahendran and Vedaldi 2015,
    using L2 regularization and periodic blurring.

    Inputs:
    - target_feats: Image features of the target image, of shape (1, C, H, W);
      we will try to generate an image that matches these features
    - layer: The index of the layer from which the features were extracted
    - model: A PretrainedCNN that was used to extract features

    Keyword arguments:
    - learning_rate: The learning rate to use for gradient descent
    - num_iterations: The number of iterations to use for gradient descent
    - l2_reg: The strength of L2 regularization to use; this is lambda in
      the equation above.
    - blur_every: How often to blur the image as implicit regularization;
      set to 0 to disable blurring.
    - show_every: How often to show the generated image; set to 0 to disable
      showing intermediate results.

    Returns:
    - X: Generated image of shape (1, 3, 64, 64) that matches the target
      features.
    """
    learning_rate = kwargs.pop('learning_rate', 10000)
    num_iterations = kwargs.pop('num_iterations', 500)
    l2_reg = kwargs.pop('l2_reg', 1e-7)
    blur_every = kwargs.pop('blur_every', 1)
    show_every = kwargs.pop('show_every', 50)

    X = np.random.randn(1, 3, 64, 64)
    for t in xrange(num_iterations):
        # Forward up to the chosen layer, then do gradient descent on the
        # reconstruction loss ||phi(X) - target||^2 + l2_reg * ||X||^2.
        out, cache = model.forward(X, end=layer, mode='test')
        loss = np.sum((out - target_feats) ** 2) + l2_reg * np.sum(X ** 2)
        dout = 2 * (out - target_feats)
        dX, _ = model.backward(dout, cache)
        dX += 2 * l2_reg * X
        X -= learning_rate * dX

        # As a regularizer, clip the image
        X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

        # As a regularizer, periodically blur the image
        if (blur_every > 0) and t % blur_every == 0:
            X = blur_image(X)

        if (show_every > 0) and (t % show_every == 0 or t + 1 == num_iterations):
            print 'loss: %f' % loss
            plt.imshow(deprocess_image(X, data['mean_image']))
            plt.gcf().set_size_inches(3, 3)
            plt.axis('off')
            plt.title('t = %d' % t)
            plt.show()
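To invert the features of a real image, first extract them at the chosen layer with a partial forward pass; shallow layers reconstruct almost pixel-perfectly, while deeper layers keep only rough shape and color. A sketch assuming the notebook's `data` and `model`:

img = data['X_train'][0:1]
layer = 3   # try small values first; deeper layers are harder to invert
target_feats, _ = model.forward(img, end=layer, mode='test')
invert_features(target_feats, layer, model, learning_rate=10000,
                num_iterations=500, show_every=100)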

DeepDream

def deepdream(X, layer, model, **kwargs):
    """
    Generate a DeepDream image.

    Inputs:
    - X: Starting image, of shape (1, 3, H, W)
    - layer: Index of layer at which to dream
    - model: A PretrainedCNN object

    Keyword arguments:
    - learning_rate: How much to update the image at each iteration
    - max_jitter: Maximum number of pixels for jitter regularization
    - num_iterations: How many iterations to run for
    - show_every: How often to show the generated image
    """
    X = X.copy()

    learning_rate = kwargs.pop('learning_rate', 5.0)
    max_jitter = kwargs.pop('max_jitter', 16)
    num_iterations = kwargs.pop('num_iterations', 100)
    show_every = kwargs.pop('show_every', 25)

    for t in xrange(num_iterations):
        # As a regularizer, add random jitter to the image
        ox, oy = np.random.randint(-max_jitter, max_jitter + 1, 2)
        X = np.roll(np.roll(X, ox, -1), oy, -2)

        # DeepDream amplifies whatever the chosen layer already responds to:
        # set the gradient at that layer equal to its own activations and
        # backpropagate to the image.
        out, cache = model.forward(X, end=layer, mode='test')
        dX, _ = model.backward(out, cache)
        X += learning_rate * dX

        # Undo the jitter
        X = np.roll(np.roll(X, -ox, -1), -oy, -2)

        # As a regularizer, clip the image
        mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)
        X = np.clip(X, -mean_pixel, 255.0 - mean_pixel)

        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0:
            img = deprocess_image(X, data['mean_image'], mean='pixel')
            plt.imshow(img)
            plt.title('t = %d' % (t + 1))
            plt.gcf().set_size_inches(8, 8)
            plt.axis('off')
            plt.show()
    return X
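A hypothetical run, assuming a preprocessed (mean-subtracted) starting image and the notebook's `model`; the deeper the layer, the more high-level the structures that get amplified:

X = data['X_train'][0:1].copy().astype(np.float64)
dream = deepdream(X, layer=7, model=model, learning_rate=5.0,
                  num_iterations=100, show_every=25)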
