Compiled and excerpted from https://datascience.stackexchange.com/questions/15989/micro-average-vs-macro-average-performance-in-a-multiclass-classification-settin/16001

Micro- and macro-averages (for whatever metric) compute slightly different things, so their interpretation differs. A macro-average computes the metric independently for each class and then takes the average (hence treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric. In a multi-class classification setup, micro-average is preferable if you suspect there might be class imbalance (i.e., you may have many more examples of one class than of other classes).

To illustrate why, take precision as an example: Pr = TP / (TP + FP). Imagine you have a One-vs-All (there is only one correct class output per example) multi-class classification system with four classes and the following numbers when tested:

  • Class A: 1 TP and 1 FP
  • Class B: 10 TP and 90 FP
  • Class C: 1 TP and 1 FP
  • Class D: 1 TP and 1 FP

You can easily see that Pr_A = Pr_C = Pr_D = 0.5, whereas Pr_B = 0.1.

  • A macro-average will then compute: Pr = (0.5 + 0.1 + 0.5 + 0.5) / 4 = 0.4
  • A micro-average will compute: Pr = (1 + 10 + 1 + 1) / (2 + 100 + 2 + 2) = 13 / 106 ≈ 0.123
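The two averages can be reproduced with a few lines of Python. This is only a minimal sketch built on the four (TP, FP) pairs listed above. The dictionary `counts` and the helper names `macro_precision` and `micro_precision` are illustrative and not part of the original answer.

```python
# Per-class (TP, FP) counts taken from the example above.
counts = {"A": (1, 1), "B": (10, 90), "C": (1, 1), "D": (1, 1)}


def macro_precision(counts):
    """Average the per-class precisions; every class weighs the same."""
    per_class = [tp / (tp + fp) for tp, fp in counts.values()]
    return sum(per_class) / len(per_class)


def micro_precision(counts):
    """Pool TP and FP over all classes, then compute a single precision."""
    total_tp = sum(tp for tp, _ in counts.values())
    total_fp = sum(fp for _, fp in counts.values())
    return total_tp / (total_tp + total_fp)


print(round(macro_precision(counts), 3))  # 0.4
print(round(micro_precision(counts), 3))  # 0.123
```

The only difference between the two helpers is the order of operations: the macro version divides first and averages afterwards, while the micro version sums first and divides once.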

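Macro precision: whether as many classes as possible each reach as high a precision as possible. It emphasizes whether each individual class is predicted accurately.

Micro precision: across all of these groups taken together, the fraction of predictions that are correct out of all predictions made. It emphasizes the overall proportion of correct predictions.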

These are quite different values for precision. Intuitively, in the macro-average the "good" precision (0.5) of classes A, C and D helps maintain a "decent" overall precision (0.4). While this is technically true (across classes, the average precision is 0.4), it is a bit misleading, since a large number of examples are not properly classified. These examples predominantly correspond to class B, so they only contribute 1/4 towards the average in spite of constituting 94.3% of your test data. The micro-average will adequately capture this class imbalance and bring the overall precision average down to 0.123, more in line with the 0.1 precision of the dominating class B.

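When class imbalance is known but the macro-average is still to be used, the following measures can be taken:

1. Report the macro-average together with its standard deviation (for classification tasks with 3 or more classes)

2. Use a weighted macro-average (to account for the number of examples per class)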

For computational reasons, it may sometimes be more convenient to compute class averages and then macro-average them. If class imbalance is known to be an issue, there are several ways around it. One is to report not only the macro-average, but also its standard deviation (for 3 or more classes). Another is to compute a weighted macro-average, in which each class's contribution to the average is weighted by the relative number of examples available for it. In the above scenario, we obtain:

1. Pr_macro-mean = 0.25·0.5 + 0.25·0.1 + 0.25·0.5 + 0.25·0.5 = 0.4

Pr_macro-stdev = 0.173

2. Pr_macro-weighted = (2/106)·0.5 + (100/106)·0.1 + (2/106)·0.5 + (2/106)·0.5

= 1/106 + 10/106 + 1/106 + 1/106 = 13/106 ≈ 0.123

The large standard deviation (0.173) already tells us that the 0.4 average does not stem from a uniform precision among classes, but it might be just easier to compute the weighted macro-average, which in essence is another way of computing the micro-average.
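The standard deviation and the weighted macro-average can be checked with a short Python sketch. It assumes two things that the text implies but does not state outright: the 0.173 figure is a population standard deviation (dividing by the number of classes), and the weights 2/106, 100/106, ... are each class's share of the predictions (TP + FP). The variable names are illustrative only.

```python
import math

# Per-class (TP, FP) counts from the example.
counts = {"A": (1, 1), "B": (10, 90), "C": (1, 1), "D": (1, 1)}

# Per-class precision and number of predictions made for each class.
precisions = {c: tp / (tp + fp) for c, (tp, fp) in counts.items()}
n_pred = {c: tp + fp for c, (tp, fp) in counts.items()}
total = sum(n_pred.values())  # 106

# Unweighted macro mean and its population standard deviation
# (assumption: divide by the number of classes, not by n - 1).
macro_mean = sum(precisions.values()) / len(precisions)
macro_std = math.sqrt(
    sum((p - macro_mean) ** 2 for p in precisions.values()) / len(precisions)
)

# Weighted macro-average: each class weighted by its share of predictions.
weighted = sum(n_pred[c] / total * precisions[c] for c in counts)

print(round(macro_mean, 3))  # 0.4
print(round(macro_std, 3))   # 0.173
print(round(weighted, 3))    # 0.123
```

With these particular weights, each term (TP + FP)/N · TP/(TP + FP) reduces to TP/N, which is why the weighted macro-average coincides with the micro-average. With other weights, for example the number of true examples per class, the two values need not agree.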

Reposted from: https://www.cnblogs.com/shiyublog/p/9798870.html
