Translation with a seq2seq attention model in PyTorch
Below is an annotated walk-through of French-to-English translation with a seq2seq + attention model in PyTorch 1.0 (the code also runs fine on PyTorch 0.4):
# -*- coding: utf-8 -*-
"""
Translation with a Sequence to Sequence Network and Attention
*************************************************************
**Author**: `Sean Robertson <https://github.com/spro/practical-pytorch>`_

In this project we will be teaching a neural network to translate from
French to English.

::

    [KEY: > input, = target, < output]

    > il est en train de peindre un tableau .
    = he is painting a picture .
    < he is painting a picture .

    > pourquoi ne pas essayer ce vin delicieux ?
    = why not try that delicious wine ?
    < why not try that delicious wine ?

    > elle n est pas poete mais romanciere .
    = she is not a poet but a novelist .
    < she not not a poet but a novelist .

    > vous etes trop maigre .
    = you re too skinny .
    < you re all alone .

... to varying degrees of success.

This is made possible by the simple but powerful idea of the `sequence
to sequence network <http://arxiv.org/abs/1409.3215>`__, in which two
recurrent neural networks work together to transform one sequence to
another. An encoder network condenses an input sequence into a vector,
and a decoder network unfolds that vector into a new sequence.

.. figure:: /_static/img/seq-seq-images/seq2seq.png
   :alt:

To improve upon this model we'll use an `attention
mechanism <https://arxiv.org/abs/1409.0473>`__, which lets the decoder
learn to focus over a specific range of the input sequence.

**Recommended Reading:**

I assume you have at least installed PyTorch, know Python, and
understand Tensors:

-  https://pytorch.org/ For installation instructions
-  :doc:`/beginner/deep_learning_60min_blitz` to get started with PyTorch in general
-  :doc:`/beginner/pytorch_with_examples` for a wide and deep overview
-  :doc:`/beginner/former_torchies_tutorial` if you are a former Lua Torch user

It would also be useful to know about Sequence to Sequence networks and
how they work — these are the papers that introduced these topics:

-  `Learning Phrase Representations using RNN Encoder-Decoder for
   Statistical Machine Translation <http://arxiv.org/abs/1406.1078>`__
-  `Sequence to Sequence Learning with Neural
   Networks <http://arxiv.org/abs/1409.3215>`__
-  `Neural Machine Translation by Jointly Learning to Align and
   Translate <https://arxiv.org/abs/1409.0473>`__
-  `A Neural Conversational Model <http://arxiv.org/abs/1506.05869>`__

You will also find the previous tutorials on
:doc:`/intermediate/char_rnn_classification_tutorial`
and :doc:`/intermediate/char_rnn_generation_tutorial`
helpful as those concepts are very similar to the Encoder and Decoder
models, respectively.

**Requirements**
"""
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

######################################################################
# Loading data files
# ==================
#
# The data for this project is a set of many thousands of English to
# French translation pairs.
#
# `This question on Open Data Stack
# Exchange <http://opendata.stackexchange.com/questions/3888/dataset-of-sentences-translated-into-many-languages>`__
# pointed me to the open translation site http://tatoeba.org/ which has
# downloads available at http://tatoeba.org/eng/downloads - and better
# yet, someone did the extra work of splitting language pairs into
# individual text files here: http://www.manythings.org/anki/
#
# The English to French pairs are too big to include in the repo, so
# download to ``data/eng-fra.txt`` before continuing. The file is a tab
# separated list of translation pairs:
#
# ::
#
#     I am cold.    J'ai froid.
#
# .. Note::
#    Download the data from
#    `here <https://download.pytorch.org/tutorial/data.zip>`_
#    and extract it to the current directory.

######################################################################
# Similar to the character encoding used in the character-level RNN
# tutorials, we will be representing each word in a language as a one-hot
# vector, or giant vector of zeros except for a single one (at the index
# of the word). Compared to the dozens of characters that might exist in a
# language, there are many many more words, so the encoding vector is much
# larger. We will however cheat a bit and trim the data to only use a few
# thousand words per language.
#
# .. figure:: /_static/img/seq-seq-images/word-encoding.png
#    :alt:


######################################################################
# We'll need a unique index per word to use as the inputs and targets of
# the networks later. To keep track of all this we will use a helper class
# called ``Lang`` which has word → index (``word2index``) and index → word
# (``index2word``) dictionaries, as well as a count of each word
# ``word2count`` to use to later replace rare words.

SOS_token = 0
EOS_token = 1


# A Lang object represents the source/target language. It holds three
# mappings: word2index (word → id), index2word (id → word) and
# word2count (word frequency); word2count can later be used to filter
# out rare words (e.g. replacing them with an unknown token).
class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)  # add each word of the sentence

    def addWord(self, word):
        if word not in self.word2index:  # a new word?
            # register it in the dictionaries
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1  # next free index
        else:
            self.word2count[word] += 1  # count repeated occurrences


######################################################################
# The files are all in Unicode, to simplify we will turn Unicode
# characters to ASCII, make everything lowercase, and trim most
# punctuation.
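Before moving on to the normalization helpers, the ``Lang`` bookkeeping above can be sanity-checked on a couple of toy sentences (the class is restated in condensed form so the snippet runs on its own):

```python
class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

lang = Lang("eng")
lang.addSentence("he is painting a picture")
lang.addSentence("she is painting")
print(lang.n_words)           # 8 — 2 reserved tokens + 6 distinct words
print(lang.word2count["is"])  # 2 — "is" appeared in both sentences
print(lang.index2word[2])     # "he" — the first word registered
```

Indices 0 and 1 are reserved for SOS/EOS, so real words start at index 2.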
# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
# (the raw data file is Unicode-encoded)
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


######################################################################
# To read the data file we will split the file into lines, and then split
# lines into pairs. The files are all English → Other Language, so if we
# want to translate from Other Language → English I added the ``reverse``
# flag to reverse the pairs.

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    lines = open('data/%s-%s.txt' % (lang1, lang2), encoding='utf-8').\
        read().strip().split('\n')

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs


######################################################################
# Since there are a *lot* of example sentences and we want to train
# something quickly, we'll trim the data set to only relatively short and
# simple sentences.
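A quick check of the normalization helpers just defined (restated so the snippet runs standalone): accents are stripped via NFD decomposition, text is lower-cased, and sentence-final punctuation is padded with a space so it becomes its own token:

```python
import re
import unicodedata

def unicodeToAscii(s):
    # drop combining marks (category 'Mn') after NFD decomposition
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)       # pad punctuation with a space
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)   # collapse everything else
    return s

print(normalizeString("Elle est très heureuse!"))  # "elle est tres heureuse !"
```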
# Here the maximum length is 10 words (that includes
# ending punctuation) and we're filtering to sentences that translate to
# the form "I am" or "He is" etc. (accounting for apostrophes replaced
# earlier).

MAX_LENGTH = 10

eng_prefixes = (
    "i am", "i m",
    "he is", "he s",
    "she is", "she s",
    "you are", "you re",
    "we are", "we re",
    "they are", "they re"
)


def filterPair(p):
    # keep only pairs where both sides are short enough and the English
    # side starts with one of the prefixes above
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)


def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]


######################################################################
# The full process for preparing the data is:
#
# -  Read text file and split into lines, split lines into pairs
# -  Normalize text, filter by length and content
# -  Make word lists from sentences in pairs

def prepareData(lang1, lang2, reverse=False):
    # read the data for lang1/lang2, optionally reversed
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)  # how many pairs survive the filter
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


input_lang, output_lang, pairs = prepareData('eng', 'fra', True)
print(random.choice(pairs))  # show a random pair


######################################################################
# The Seq2Seq Model
# =================
#
# A Recurrent Neural Network, or RNN, is a network that operates on a
# sequence and uses its own output as input for subsequent steps.
#
# A `Sequence to Sequence network <http://arxiv.org/abs/1409.3215>`__, or
# seq2seq network, or `Encoder Decoder
# network <https://arxiv.org/pdf/1406.1078v3.pdf>`__, is a model
# consisting of two RNNs called the encoder and decoder. The encoder reads
# an input sequence and outputs a single vector, and the decoder reads
# that vector to produce an output sequence.
#
# .. figure:: /_static/img/seq-seq-images/seq2seq.png
#    :alt:
#
# Unlike sequence prediction with a single RNN, where every input
# corresponds to an output, the seq2seq model frees us from sequence
# length and order, which makes it ideal for translation between two
# languages.
#
# Consider the sentence "Je ne suis pas le chat noir" → "I am not the
# black cat". Most of the words in the input sentence have a direct
# translation in the output sentence, but are in slightly different
# orders, e.g. "chat noir" and "black cat". Because of the "ne/pas"
# construction there is also one more word in the input sentence. It would
# be difficult to produce a correct translation directly from the sequence
# of input words.
#
# With a seq2seq model the encoder creates a single vector which, in the
# ideal case, encodes the "meaning" of the input sequence into a single
# vector — a single point in some N dimensional space of sentences.


######################################################################
# The Encoder
# -----------
#
# The encoder of a seq2seq network is a RNN that outputs some value for
# every word from the input sentence. For every input word the encoder
# outputs a vector and a hidden state, and uses the hidden state for the
# next input word.
#
# .. figure:: /_static/img/seq-seq-images/encoder-network.png
#    :alt:


class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        # nn.Embedding(num_embeddings, embedding_dim) is essentially a
        # num_embeddings x embedding_dim lookup matrix: nn.Embedding(2, 4)
        # holds 2 words of 4 dimensions each; for 100 words of 10 dims
        # you would write nn.Embedding(100, 10). The vectors start out as
        # random initial values — they are optimized along with the rest
        # of the network so that each vector learns to represent its word.
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)  # the GRU mentioned above

    def forward(self, input, hidden):
        # view is like reshape; -1 means "infer this dimension"
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):  # initial hidden state
        return torch.zeros(1, 1, self.hidden_size, device=device)


######################################################################
# The Decoder
# -----------
#
# The decoder is another RNN that takes the encoder output vector(s) and
# outputs a sequence of words to create the translation.


######################################################################
# Simple Decoder
# ^^^^^^^^^^^^^^
#
# In the simplest seq2seq decoder we use only last output of the encoder.
# This last output is sometimes called the *context vector* as it encodes
# context from the entire sequence. This context vector is used as the
# initial hidden state of the decoder.
#
# At every step of decoding, the decoder is given an input token and
# hidden state. The initial input token is the start-of-string ``<SOS>``
# token, and the first hidden state is the context vector (the encoder's
# last hidden state).
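Before moving to the decoder code, the tensor shapes of the ``EncoderRNN`` above can be checked on a toy input (the class is restated without the ``device`` argument for brevity; vocabulary and hidden sizes are made up):

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)  # (1, 1, hidden)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size)

enc = EncoderRNN(input_size=20, hidden_size=8)  # toy vocabulary of 20 words
hidden = enc.initHidden()
out, hidden = enc(torch.tensor([3]), hidden)    # feed a single word index
print(out.shape)     # torch.Size([1, 1, 8])
print(hidden.shape)  # torch.Size([1, 1, 8])
```

One word index in, one (seq_len=1, batch=1, hidden_size) output and hidden state out — the training loop feeds the sentence through one word at a time.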
#
# .. figure:: /_static/img/seq-seq-images/decoder-network.png
#    :alt:


# DecoderRNN mirrors the EncoderRNN structure; the figure above shows the logic
class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        # view is like reshape; -1 means "infer this dimension"
        output = self.embedding(input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        # project to the vocabulary and take the log-softmax
        # (second-to-last box on the left of the figure)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)


######################################################################
# I encourage you to train and observe the results of this model, but to
# save space we'll be going straight for the gold and introducing the
# Attention Mechanism.


######################################################################
# Attention Decoder
# ^^^^^^^^^^^^^^^^^
#
# If only the context vector is passed between the encoder and decoder,
# that single vector carries the burden of encoding the entire sentence.
#
# Attention allows the decoder network to "focus" on a different part of
# the encoder's outputs for every step of the decoder's own outputs. First
# we calculate a set of *attention weights*. These will be multiplied by
# the encoder output vectors to create a weighted combination. The result
# (called ``attn_applied`` in the code) should contain information about
# that specific part of the input sequence, and thus help the decoder
# choose the right output words.
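The weighted combination just described can be checked in isolation: softmax scores of shape (1, max_length) are batch-multiplied against the encoder outputs, yielding one context vector (the scores below are random placeholders for what the ``attn`` layer would produce):

```python
import torch
import torch.nn.functional as F

max_length, hidden_size = 10, 8
encoder_outputs = torch.randn(max_length, hidden_size)
scores = torch.randn(1, max_length)      # stand-in for the attn layer's output
attn_weights = F.softmax(scores, dim=1)  # (1, max_length), rows sum to 1

# (1, 1, max_length) @ (1, max_length, hidden_size) -> (1, 1, hidden_size)
attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                         encoder_outputs.unsqueeze(0))
print(attn_applied.shape)  # torch.Size([1, 1, 8])
```

The result is a single hidden_size vector: a convex combination of the encoder outputs, weighted by how much attention each input position gets.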
#
# .. figure:: https://i.imgur.com/1152PYf.png
#    :alt:
#
# Calculating the attention weights is done with another feed-forward
# layer ``attn``, using the decoder's input and hidden state as inputs.
# Because there are sentences of all sizes in the training data, to
# actually create and train this layer we have to choose a maximum
# sentence length (input length, for encoder outputs) that it can apply
# to. Sentences of the maximum length will use all the attention weights,
# while shorter sentences will only use the first few.
#
# .. figure:: /_static/img/seq-seq-images/attention-decoder-network.png
#    :alt:


class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        # embed the input and apply dropout (randomly zeroing activations)
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        # learn the attention weights from the input and hidden state.
        # Note: torch.cat concatenates along an existing dimension, while
        # torch.stack first creates a new dimension and concatenates there.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)

        # apply the attention weights to encoder_outputs.
        # torch.bmm batch-multiplies two 3-D tensors holding the same
        # number of matrices: for batch1 of shape b×n×m and batch2 of
        # shape b×m×p, the result has shape b×n×p.
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        # concatenate embedded with attn_applied
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        # unsqueeze returns a new tensor with a dimension of size 1
        # inserted at the given position
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)


######################################################################
# .. note:: There are other forms of attention that work around the length
#    limitation by using a relative position approach. Read about "local
#    attention" in `Effective Approaches to Attention-based Neural Machine
#    Translation <https://arxiv.org/abs/1508.04025>`__.
#
# Training
# ========
#
# Preparing Training Data
# -----------------------
#
# To train, for each pair we will need an input tensor (indexes of the
# words in the input sentence) and target tensor (indexes of the words in
# the target sentence). While creating these vectors we will append the
# EOS token to both sequences.
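For example, with a toy vocabulary the resulting tensor looks like this (the ``word2index`` mapping below is made up for illustration; in the tutorial it comes from a ``Lang`` object):

```python
import torch

EOS_token = 1
word2index = {"i": 2, "am": 3, "cold": 4, ".": 5}  # toy vocabulary

def tensorFromSentence(word2index, sentence):
    indexes = [word2index[w] for w in sentence.split(' ')]
    indexes.append(EOS_token)  # terminate the sequence with EOS
    return torch.tensor(indexes, dtype=torch.long).view(-1, 1)

t = tensorFromSentence(word2index, "i am cold .")
print(t.shape)       # torch.Size([5, 1]) — 4 words + EOS, one index per row
print(t[-1].item())  # 1 — the EOS_token
```

The ``view(-1, 1)`` turns the flat index list into a column, matching the (seq_len, 1) shape the training loop indexes one step at a time.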
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    # look up the index of each word
    indexes = indexesFromSentence(lang, sentence)
    # append the EOS marker to the sequence
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    # each pair becomes an input tensor (indexes of the words in the input
    # sentence) and a target tensor (indexes in the target sentence)
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)


######################################################################
# Training the Model
# ------------------
#
# To train we run the input sentence through the encoder, and keep track
# of every output and the latest hidden state. Then the decoder is given
# the ``<SOS>`` token as its first input, and the last hidden state of the
# encoder as its first hidden state.
#
# "Teacher forcing" is the concept of using the real target outputs as
# each next input, instead of using the decoder's guess as the next input.
# Using teacher forcing causes it to converge faster but `when the trained
# network is exploited, it may exhibit
# instability <http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf>`__.
#
# You can observe outputs of teacher-forced networks that read with
# coherent grammar but wander far from the correct translation -
# intuitively it has learned to represent the output grammar and can "pick
# up" the meaning once the teacher tells it the first few words, but it
# has not properly learned how to create the sentence from the translation
# in the first place.
#
# Because of the freedom PyTorch's autograd gives us, we can randomly
# choose to use teacher forcing or not with a simple if statement. Turn
# ``teacher_forcing_ratio`` up to use more of it.
teacher_forcing_ratio = 0.5


# teacher_forcing_ratio is the probability of using teacher forcing on a
# given iteration: teacher forcing makes training converge faster, but
# the trained network can be unstable when exploited.
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion,
          max_length=MAX_LENGTH):
    # encoder is an EncoderRNN(input_lang.n_words, hidden_size), decoder
    # an AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1),
    # with hidden_size=256
    encoder_hidden = encoder.initHidden()

    # encoder_optimizer / decoder_optimizer are
    # optim.SGD(<model>.parameters(), lr=learning_rate).
    # Aside on nn.Parameter: it is a Tensor subclass with one special
    # property — when assigned as an attribute of a Module it is
    # automatically registered in the module's parameter list (it shows
    # up in the parameters() iterator), whereas a plain tensor assigned
    # to a Module is not. This lets you cache temporary state (e.g. an
    # RNN's last hidden state) without it being registered as a model
    # parameter.
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # sequence lengths
    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    # buffer for the encoder outputs
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    # run the input sequence through the encoder
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        # encoder_output has shape (1, 1, hidden_size); [0, 0] selects the
        # hidden_size vector for this single step and batch element
        encoder_outputs[ei] = encoder_output[0, 0]

    # the decoder starts from the SOS token
    decoder_input = torch.tensor([[SOS_token]], device=device)

    decoder_hidden = encoder_hidden

    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    if use_teacher_forcing:
        # Teacher forcing: feed the target as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            # accumulate the loss
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)

            # topk(k) returns the k largest values along the last
            # dimension together with their indices, sorted from largest
            # to smallest
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    # backpropagate
    loss.backward()

    # update the parameters
    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length


######################################################################
# This is a helper function to print time elapsed and estimated time
# remaining given the current time and progress %.

import time
import math
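Condensed, the two timing helpers that follow behave like this (restated so the snippet runs standalone):

```python
import math
import time

def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / percent  # estimated total time, extrapolated from progress
    rs = es - s       # estimated time remaining
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

print(asMinutes(125))                     # "2m 5s"
print(timeSince(time.time() - 60, 0.25))  # roughly "1m 0s (- 3m 0s)":
                                          # 25% done after 1 minute
```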
def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))


######################################################################
# The whole training process looks like this:
#
# -  Start a timer
# -  Initialize optimizers and criterion
# -  Create set of training pairs
# -  Start empty losses array for plotting
#
# Then we call ``train`` many times and occasionally print the progress (%
# of examples, time so far, estimated time) and average loss.

def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    # sample the training pairs
    training_pairs = [tensorsFromPair(random.choice(pairs))
                      for i in range(n_iters)]
    # negative log likelihood loss
    criterion = nn.NLLLoss()

    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        # one training step; accumulate the returned loss
        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            # print progress (% of examples, time so far, estimated time)
            # and the average loss
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, iter / n_iters * 100, print_loss_avg))

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)


######################################################################
# Plotting results
# ----------------
#
# Plotting is done with matplotlib, using the array of loss values
# ``plot_losses`` saved while training.

import matplotlib.pyplot as plt

plt.switch_backend('agg')
import matplotlib.ticker as ticker
import numpy as np


def showPlot(points):
    plt.figure()
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    plt.plot(points)


######################################################################
# Evaluation
# ==========
#
# Evaluation is mostly the same as training, but there are no targets so
# we simply feed the decoder's predictions back to itself for each step.
# Every time it predicts a word we add it to the output string, and if it
# predicts the EOS token we stop there. We also store the decoder's
# attention outputs for display later.
def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
    with torch.no_grad():
        # turn the sentence into a tensor of word indexes
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]

        encoder_hidden = encoder.initHidden()

        # buffer for the encoder outputs
        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        # run the input through the encoder
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        # the decoder starts from the SOS token
        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)

            # store the attention weights for display later
            decoder_attentions[di] = decoder_attention.data
            # pick the most likely word
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words, decoder_attentions[:di + 1]


######################################################################
# We can evaluate random sentences from the training set and print out the
# input, target, and output to make some subjective quality judgements:

def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions = evaluate(encoder, decoder, pair[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')


######################################################################
# Training and Evaluating
# =======================
#
# With all these helper functions in place (it looks like extra work, but
# it makes it easier to run multiple experiments) we can actually
# initialize a network and start training.
#
# Remember that the input sentences were heavily filtered. For this small
# dataset we can use relatively small networks of 256 hidden nodes and a
# single GRU layer. After about 40 minutes on a MacBook CPU we'll get some
# reasonable results.
#
# .. Note::
#    If you run this notebook you can train, interrupt the kernel,
#    evaluate, and continue training later. Comment out the lines where the
#    encoder and decoder are initialized and run ``trainIters`` again.

hidden_size = 256
# the encoder
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
# the decoder, with the attention mechanism
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)
# train
trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

######################################################################
# evaluate a random set of sentences
evaluateRandomly(encoder1, attn_decoder1)

######################################################################
# Visualizing Attention
# ---------------------
#
# A useful property of the attention mechanism is its highly interpretable
# outputs. Because it is used to weight specific encoder outputs of the
# input sequence, we can imagine looking where the network is focused most
# at each time step.
#
# You could simply run ``plt.matshow(attentions)`` to see attention output
# displayed as a matrix, with the columns being input steps and rows being
# output steps:
#

output_words, attentions = evaluate(encoder1, attn_decoder1, "je suis trop froid .")
plt.matshow(attentions.numpy())


######################################################################
# For a better viewing experience we will do the extra work of adding axes
# and labels:

def showAttention(input_sentence, output_words, attentions):
    # Set up figure with colorbar
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(attentions.numpy(), cmap='bone')
    fig.colorbar(cax)

    # Set up axes
    ax.set_xticklabels([''] + input_sentence.split(' ') +
                       ['<EOS>'], rotation=90)
    ax.set_yticklabels([''] + output_words)

    # Show label at every tick
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

    plt.show()


def evaluateAndShowAttention(input_sentence):
    output_words, attentions = evaluate(
        encoder1, attn_decoder1, input_sentence)
    print('input =', input_sentence)
    print('output =', ' '.join(output_words))
    showAttention(input_sentence, output_words, attentions)


evaluateAndShowAttention("elle a cinq ans de moins que moi .")
evaluateAndShowAttention("elle est trop petit .")
evaluateAndShowAttention("je ne crains pas de mourir .")
evaluateAndShowAttention("c est un jeune directeur plein de talent .")

######################################################################
# Exercises
# =========
#
# -  Try with a different dataset
#
#    -  Another language pair
#    -  Human → Machine (e.g. IOT commands)
#    -  Chat → Response
#    -  Question → Answer
#
# -  Replace the embeddings with pre-trained word embeddings such as word2vec or
#    GloVe
# -  Try with more layers, more hidden units, and more sentences. Compare
#    the training time and results.
# -  If you use a translation file where pairs have two of the same phrase
#    (``I am test \t I am test``), you can use this as an autoencoder. Try
#    this:
#
#    -  Train as an autoencoder
#    -  Save only the Encoder network
#    -  Train a new Decoder for translation from there
#
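For the pre-trained-embedding exercise, one common approach is to build a weight matrix aligned with the language's word index and load it with ``nn.Embedding.from_pretrained``. This is a sketch under stated assumptions: the tiny ``word2index`` vocabulary and the random ``pretrained`` vectors below are placeholders for the tutorial's ``Lang`` index and a real GloVe/word2vec file.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary and 50-dimensional "pre-trained" vectors; in
# practice these come from the Lang object and a GloVe/word2vec file.
word2index = {'SOS': 0, 'EOS': 1, 'he': 2, 'is': 3, 'happy': 4}
pretrained = {'he': torch.randn(50), 'is': torch.randn(50), 'happy': torch.randn(50)}

# Fill the weight matrix row by row in vocabulary order; words without a
# pre-trained vector (like SOS/EOS) keep a random initialization.
weights = torch.randn(len(word2index), 50)
for word, idx in word2index.items():
    if word in pretrained:
        weights[idx] = pretrained[word]

# freeze=False lets the embeddings keep fine-tuning during training
embedding = nn.Embedding.from_pretrained(weights, freeze=False)

# Drop-in replacement for self.embedding inside EncoderRNN / AttnDecoderRNN
vec = embedding(torch.tensor([word2index['happy']]))
print(vec.shape)  # torch.Size([1, 50])
```

Note that ``hidden_size`` in the encoder/decoder would then have to match the pre-trained vector dimension (50 here, 100/300 for common GloVe files) or the embedding output would need a projection layer.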
Reposted from: https://www.cnblogs.com/www-caiyin-com/p/10123346.html