Show, Attend and Tell: calculate BLEU scores (BLEU-1 to BLEU-4)

Reference: How to do calculate all bleu scores during evaluvaion #37

The key is the `weights` parameter: by default `corpus_bleu` computes the BLEU-4 score, as its source code below shows.

from nltk.translate.bleu_score import corpus_bleu

# Calculate BLEU-4 score (default weights = (0.25, 0.25, 0.25, 0.25))
bleu4 = corpus_bleu(references, hypotheses)
# BLEU-1: all weight on unigram precision
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0,))
# BLEU-2: uniform weights over unigram and bigram precision
bleu2 = corpus_bleu(references, hypotheses, weights=(1.0 / 2.0, 1.0 / 2.0))
# BLEU-3: uniform weights over 1- to 3-gram precision
bleu3 = corpus_bleu(references, hypotheses, weights=(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0))

bleu1_4 = 'bleu1:' + str(bleu1) + '\n' + 'bleu2:' + str(bleu2) + '\n' \
          + 'bleu3:' + str(bleu3) + '\n' + 'bleu4:' + str(bleu4)
print(bleu1_4)

To get any BLEU-n score, simply change the default BLEU-4 `weights=(0.25, 0.25, 0.25, 0.25)` to n uniform weights that sum to 1.
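As a concrete illustration, here is a minimal runnable sketch of the weight trick; the tokenized references and hypotheses below are made-up toy data, not from the original evaluation loop:

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy corpus: one hypothesis with two tokenized reference captions.
references = [
    [['a', 'dog', 'runs', 'on', 'the', 'grass'],
     ['a', 'dog', 'is', 'running', 'on', 'grass']],
]
hypotheses = [['a', 'dog', 'runs', 'on', 'grass']]

# Only the weights differ between BLEU-n variants:
# n uniform weights that sum to 1.
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    print(f'BLEU-{n}: {corpus_bleu(references, hypotheses, weights):.4f}')
```

Since higher-order n-gram precisions can only match less often, BLEU-1 is an upper bound on BLEU-4 for the same pair.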

`corpus_bleu` source (read the whole thing if you are interested; otherwise the signature and docstring are enough):

def corpus_bleu(
    list_of_references,
    hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=None,
    auto_reweigh=False,
):
    """
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all
    the hypotheses and their respective references.

    Instead of averaging the sentence level BLEU scores (i.e. macro-average
    precision), the original BLEU metric (Papineni et al. 2002) accounts for
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).

    >>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
    ...         'ensures', 'that', 'the', 'military', 'always',
    ...         'obeys', 'the', 'commands', 'of', 'the', 'party']
    >>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
    ...          'ensures', 'that', 'the', 'military', 'will', 'forever',
    ...          'heed', 'Party', 'commands']
    >>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
    ...          'guarantees', 'the', 'military', 'forces', 'always',
    ...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
    >>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
    ...          'army', 'always', 'to', 'heed', 'the', 'directions',
    ...          'of', 'the', 'party']

    >>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
    ...         'interested', 'in', 'world', 'history']
    >>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
    ...          'because', 'he', 'read', 'the', 'book']

    >>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
    >>> hypotheses = [hyp1, hyp2]
    >>> corpus_bleu(list_of_references, hypotheses) # doctest: +ELLIPSIS
    0.5920...

    The example below show that corpus_bleu() is different from averaging
    sentence_bleu() for hypotheses

    >>> score1 = sentence_bleu([ref1a, ref1b, ref1c], hyp1)
    >>> score2 = sentence_bleu([ref2a], hyp2)
    >>> (score1 + score2) / 2 # doctest: +ELLIPSIS
    0.6223...

    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type list_of_references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :param smoothing_function:
    :type smoothing_function: SmoothingFunction
    :param auto_reweigh: Option to re-normalize the weights uniformly.
    :type auto_reweigh: bool
    :return: The corpus-level BLEU score.
    :rtype: float
    """
    # Before proceeding to compute BLEU, perform sanity checks.
    p_numerators = Counter()  # Key = ngram order, and value = no. of ngram matches.
    p_denominators = Counter()  # Key = ngram order, and value = no. of ngram in ref.
    hyp_lengths, ref_lengths = 0, 0

    assert len(list_of_references) == len(hypotheses), (
        "The number of hypotheses and their reference(s) should be the same "
    )

    # Iterate through each hypothesis and their corresponding references.
    for references, hypothesis in zip(list_of_references, hypotheses):
        # For each order of ngram, calculate the numerator and
        # denominator for the corpus-level modified precision.
        for i, _ in enumerate(weights, start=1):
            p_i = modified_precision(references, hypothesis, i)
            p_numerators[i] += p_i.numerator
            p_denominators[i] += p_i.denominator

        # Calculate the hypothesis length and the closest reference length.
        # Adds them to the corpus-level hypothesis and reference counts.
        hyp_len = len(hypothesis)
        hyp_lengths += hyp_len
        ref_lengths += closest_ref_length(references, hyp_len)

    # Calculate corpus-level brevity penalty.
    bp = brevity_penalty(ref_lengths, hyp_lengths)

    # Uniformly re-weighting based on maximum hypothesis lengths if largest
    # order of n-grams < 4 and weights is set at default.
    if auto_reweigh:
        if hyp_lengths < 4 and weights == (0.25, 0.25, 0.25, 0.25):
            weights = (1 / hyp_lengths,) * hyp_lengths

    # Collects the various precision values for the different ngram orders.
    p_n = [
        Fraction(p_numerators[i], p_denominators[i], _normalize=False)
        for i, _ in enumerate(weights, start=1)
    ]

    # Returns 0 if there's no matching n-grams
    # We only need to check for p_numerators[1] == 0, since if there's
    # no unigrams, there won't be any higher order ngrams.
    if p_numerators[1] == 0:
        return 0

    # If there's no smoothing, set use method0 from SmoothingFunction class.
    if not smoothing_function:
        smoothing_function = SmoothingFunction().method0
    # Smoothen the modified precision.
    # Note: smoothing_function() may convert values into floats;
    #       it tries to retain the Fraction object as much as the
    #       smoothing method allows.
    p_n = smoothing_function(
        p_n, references=references, hypothesis=hypothesis, hyp_len=hyp_lengths
    )
    s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, p_n))
    s = bp * math.exp(math.fsum(s))
    return s
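One practical note on the `smoothing_function` parameter above: with the default `method0`, any n-gram order with zero matches drags the score to (effectively) zero, which happens often with short captions. Below is a hedged sketch of switching to `SmoothingFunction().method1`; the toy sentences are invented for illustration:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[['the', 'cat', 'sat', 'on', 'the', 'mat']]]
hypotheses = [['the', 'cat', 'the', 'mat']]  # no matching 3-grams or 4-grams

# Default method0: emits a warning and the BLEU-4 score collapses toward 0.
plain = corpus_bleu(references, hypotheses)
# method1 adds a small epsilon to zero precision counts, keeping the score usable.
smoothed = corpus_bleu(references, hypotheses,
                       smoothing_function=SmoothingFunction().method1)
print(plain, smoothed)
```

Other smoothing variants (`method1` through `method7`) are available on the same `SmoothingFunction` class; which one fits best depends on the evaluation setup.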
