Show, Attend and Tell: calculate BLEU scores (BLEU-1 to BLEU-4)

Reference: How to do calculate all bleu scores during evaluvaion #37

The key is the `weights` parameter: by default `corpus_bleu` computes the BLEU-4 score, as its source code below shows.

from nltk.translate.bleu_score import corpus_bleu

# Calculate BLEU-4 score (default weights = (0.25, 0.25, 0.25, 0.25))
bleu4 = corpus_bleu(references, hypotheses)
# BLEU-1: all weight on unigram precision
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0,))
# BLEU-2: uniform weights over unigram and bigram precision
bleu2 = corpus_bleu(references, hypotheses, weights=(1.0 / 2.0, 1.0 / 2.0))
# BLEU-3: uniform weights over 1- to 3-gram precision
bleu3 = corpus_bleu(references, hypotheses, weights=(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0))

bleu1_4 = 'bleu1:' + str(bleu1) + '\n' + 'bleu2:' + str(bleu2) + '\n' \
          + 'bleu3:' + str(bleu3) + '\n' + 'bleu4:' + str(bleu4)
print(bleu1_4)

To get any BLEU-n score, simply change the default BLEU-4 `weights=(0.25, 0.25, 0.25, 0.25)` to n uniform weights that sum to 1.
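As a concrete illustration, here is a minimal runnable sketch of the weight trick; the tokenized references and hypotheses below are made-up toy data, not from the original evaluation loop:

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy corpus: one hypothesis with two tokenized reference captions.
references = [
    [['a', 'dog', 'runs', 'on', 'the', 'grass'],
     ['a', 'dog', 'is', 'running', 'on', 'grass']],
]
hypotheses = [['a', 'dog', 'runs', 'on', 'grass']]

# Only the weights differ between BLEU-n variants:
# n uniform weights that sum to 1.
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    print(f'BLEU-{n}: {corpus_bleu(references, hypotheses, weights):.4f}')
```

Since higher-order n-gram precisions can only match less often, BLEU-1 is an upper bound on BLEU-4 for the same pair.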

`corpus_bleu` source (read the whole thing if you are interested; otherwise the signature and docstring are enough):

def corpus_bleu(
    list_of_references,
    hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=None,
    auto_reweigh=False,
):
    """
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all
    the hypotheses and their respective references.

    Instead of averaging the sentence level BLEU scores (i.e. macro-average
    precision), the original BLEU metric (Papineni et al. 2002) accounts for
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).

    >>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
    ...         'ensures', 'that', 'the', 'military', 'always',
    ...         'obeys', 'the', 'commands', 'of', 'the', 'party']
    >>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
    ...          'ensures', 'that', 'the', 'military', 'will', 'forever',
    ...          'heed', 'Party', 'commands']
    >>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
    ...          'guarantees', 'the', 'military', 'forces', 'always',
    ...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
    >>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
    ...          'army', 'always', 'to', 'heed', 'the', 'directions',
    ...          'of', 'the', 'party']

    >>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
    ...         'interested', 'in', 'world', 'history']
    >>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
    ...          'because', 'he', 'read', 'the', 'book']

    >>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
    >>> hypotheses = [hyp1, hyp2]
    >>> corpus_bleu(list_of_references, hypotheses) # doctest: +ELLIPSIS
    0.5920...

    The example below show that corpus_bleu() is different from averaging
    sentence_bleu() for hypotheses

    >>> score1 = sentence_bleu([ref1a, ref1b, ref1c], hyp1)
    >>> score2 = sentence_bleu([ref2a], hyp2)
    >>> (score1 + score2) / 2 # doctest: +ELLIPSIS
    0.6223...

    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type list_of_references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :param smoothing_function:
    :type smoothing_function: SmoothingFunction
    :param auto_reweigh: Option to re-normalize the weights uniformly.
    :type auto_reweigh: bool
    :return: The corpus-level BLEU score.
    :rtype: float
    """
    # Before proceeding to compute BLEU, perform sanity checks.
    p_numerators = Counter()  # Key = ngram order, and value = no. of ngram matches.
    p_denominators = Counter()  # Key = ngram order, and value = no. of ngram in ref.
    hyp_lengths, ref_lengths = 0, 0

    assert len(list_of_references) == len(hypotheses), (
        "The number of hypotheses and their reference(s) should be the same "
    )

    # Iterate through each hypothesis and their corresponding references.
    for references, hypothesis in zip(list_of_references, hypotheses):
        # For each order of ngram, calculate the numerator and
        # denominator for the corpus-level modified precision.
        for i, _ in enumerate(weights, start=1):
            p_i = modified_precision(references, hypothesis, i)
            p_numerators[i] += p_i.numerator
            p_denominators[i] += p_i.denominator

        # Calculate the hypothesis length and the closest reference length.
        # Adds them to the corpus-level hypothesis and reference counts.
        hyp_len = len(hypothesis)
        hyp_lengths += hyp_len
        ref_lengths += closest_ref_length(references, hyp_len)

    # Calculate corpus-level brevity penalty.
    bp = brevity_penalty(ref_lengths, hyp_lengths)

    # Uniformly re-weighting based on maximum hypothesis lengths if largest
    # order of n-grams < 4 and weights is set at default.
    if auto_reweigh:
        if hyp_lengths < 4 and weights == (0.25, 0.25, 0.25, 0.25):
            weights = (1 / hyp_lengths,) * hyp_lengths

    # Collects the various precision values for the different ngram orders.
    p_n = [
        Fraction(p_numerators[i], p_denominators[i], _normalize=False)
        for i, _ in enumerate(weights, start=1)
    ]

    # Returns 0 if there's no matching n-grams
    # We only need to check for p_numerators[1] == 0, since if there's
    # no unigrams, there won't be any higher order ngrams.
    if p_numerators[1] == 0:
        return 0

    # If there's no smoothing, set use method0 from SmoothingFunction class.
    if not smoothing_function:
        smoothing_function = SmoothingFunction().method0
    # Smoothen the modified precision.
    # Note: smoothing_function() may convert values into floats;
    #       it tries to retain the Fraction object as much as the
    #       smoothing method allows.
    p_n = smoothing_function(
        p_n, references=references, hypothesis=hypothesis, hyp_len=hyp_lengths
    )
    s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
    s = bp * math.exp(math.fsum(s))
    return s
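One practical note on the `smoothing_function` parameter above: with the default `method0`, any n-gram order with zero matches drags the score to (effectively) zero, which happens often with short captions. Below is a hedged sketch of switching to `SmoothingFunction().method1`; the toy sentences are invented for illustration:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[['the', 'cat', 'sat', 'on', 'the', 'mat']]]
hypotheses = [['the', 'cat', 'the', 'mat']]  # no matching 3-grams or 4-grams

# Default method0: emits a warning and the BLEU-4 score collapses toward 0.
plain = corpus_bleu(references, hypotheses)
# method1 adds a small epsilon to zero precision counts, keeping the score usable.
smoothed = corpus_bleu(references, hypotheses,
                       smoothing_function=SmoothingFunction().method1)
print(plain, smoothed)
```

Other smoothing variants (`method1` through `method7`) are available on the same `SmoothingFunction` class; which one fits best depends on the evaluation setup.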
