Impact, Bias, and Sustainability in AI

Back in 2013, a man by the name of Eric Loomis was arrested in Wisconsin. Loomis was driving a car that had been used in a shooting and pled guilty to eluding an officer. It should have been a fairly unremarkable case, but at sentencing the judge gave Loomis six years in prison based, in part, on the recommendation of a machine learning algorithm called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). When his lawyers asked to examine the algorithm that effectively sentenced their client to jail, they were rebuffed. In other words, COMPAS marked Loomis as a “high risk” offender, but his lawyers had no way of understanding why or of challenging the model itself.

The Supreme Court had a chance to rule on this case in 2017 but demurred. Since then, researchers from Dartmouth have studied the algorithm and shown that COMPAS is “no better at predicting an individual’s risk of recidivism than random volunteers recruited from the internet.” Past that, COMPAS supposedly weighs 137 criteria when determining a defendant’s risk, yet those same researchers matched its accuracy using only two: age and prior convictions. And not only that, but COMPAS was found to display racial bias: it scored Black defendants as higher risk than warranted while being overly lenient with white ones.

This is the point where Supreme Court Chief Justice Earl Warren would’ve asked “yes, but is that fair?” Should we sentence criminals with a model they can’t interrogate? Is a commercially sold algorithm with provable bias the right way to determine how long a defendant should be behind bars? Is COMPAS fair?

The answer is no. That’s easy. But the answer to “how do we fix this?” is a lot more complex.

As an industry, how we create, train, source, and build our machine learning models is incredibly important. We’ll be covering opportunity in AI and access to AI in our next few pieces. But ethically built machine learning models aren’t worth much if their impact on society is decidedly negative.

Now, the reason we led this piece with the story of Eric Loomis is simple: courts and legislators don’t currently have the expertise needed to adjudicate these issues. Anyone who’s watched a CEO from a big tech company testify in front of Congress likely understands that. The responsibility is ours. We need to take ownership of the impact of the technologies we’re building. And like every other facet of responsible AI, that means thinking ethically and morally in addition to financially.

Fighting bias in AI

The story of COMPAS is not an outlier. We started our series talking about Google’s image recognition problem, wherein pictures of Black engineers were mislabeled as gorillas. But there are many more. Google’s advertising system also showed women far fewer ads for high-paying executive jobs. Online loan applications suffer from data shaped by redlining practices that date back to the 1930s, which results in fewer Black and Hispanic borrowers being approved for loans they should be eligible for.

These aren’t toy problems. It’s not mislabeled sentiment or some intern’s AI project gone awry. These biases affect our neighbors’ ability to get a mortgage, get into the college of their dreams, or land a job they’re qualified for. And all of these models were put into production by real companies who, frankly, should have known better.

There are of course ways to fix these unfairness problems. Training your model with new information that combats bias, like training your image recognition algorithm on people with darker complexions, can help. Removing biasing data, like historical information from an era of redlining, can help. And remembering that representativeness does not equal usefulness is also important: a model may need extra information about certain classes because they are difficult or nuanced examples. Lastly, and this one is vital, you should attack bias in the data collection process as well. It’s much easier to fix the cause of your bias problem (the data) than the consequence (the models).
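
To make the first of those fixes concrete, here is a minimal sketch of rebalancing training data by subgroup. Everything in it (the loans.csv file, the group and label columns, the choice of LogisticRegression) is a hypothetical stand-in, and per-sample reweighting is only one of several rebalancing techniques; oversampling or targeted data collection may fit better depending on the problem.

```python
# A minimal reweighting sketch: up-weight rows from under-represented
# subgroups so each subgroup contributes comparably to the training loss.
# File name, column names, and model choice are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("loans.csv")                  # hypothetical training data
X = df.drop(columns=["label", "group"])        # "group" is the sensitive attribute
y = df["label"]

group_freq = df["group"].value_counts(normalize=True)
sample_weight = df["group"].map(lambda g: 1.0 / group_freq[g])
sample_weight *= len(df) / sample_weight.sum()  # keep the mean weight at 1

model = LogisticRegression(max_iter=1000)
model.fit(X, y, sample_weight=sample_weight)
```

Note that reweighting only addresses representation, not label bias: if the historical labels themselves encode redlining-era decisions, they still need to be corrected or removed, as described above.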

Past that, all AI applications should have some explainability built in. We know some might balk at this and claim it’s an unnecessary burden, but if your model is producing biased output and you can’t say why, why would you release that model in the first place?
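
As one hedged interpretation of “some explainability built in,” the sketch below runs a baseline check on the hypothetical model from the previous snippet using permutation importance. Dedicated tooling such as SHAP or LIME can go much further, but even this much answers “which features is the model leaning on?”

```python
# A baseline explainability check, reusing the hypothetical model and data
# from the reweighting sketch: permutation importance measures how much the
# model's score drops when each feature is shuffled.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name:30s} {importance:+.4f}")
# If a proxy for a protected attribute (say, a zip-code feature) dominates
# this list, that's a reason to investigate before the model ships.
```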

Because again: this isn’t up to legislators. They’re behind on this and may not catch up in time. It’s about the industry taking ownership and responsibility for our technology. Do you want to release a model with innate biases into the world? Do you want what you create to make other people’s lives worse?

That’s hopefully another one of those easy answers.

The Need for Sustainable AI

You don’t need a hundred trend pieces to understand AI is more widespread now than it was just a decade ago. The reason? We emerged from the most recent AI Winter largely because of the massive increase in both compute power and available data. Put simply: there was more stuff to train models on and the machines that trained them got fast enough to make it worth the cost.

There’s a real cost to that explosion of data and compute power though. And we don’t just mean a monetary cost. We also mean an environmental one.

Last year, a study led by Emma Strubell at UMass Amherst found that training a single deep learning model can generate over 600,000 pounds of carbon dioxide (to put that in perspective, the average American generates about 36,000 pounds of carbon dioxide in an entire year, so that’s roughly seventeen years of one person’s emissions). Now consider that best-in-class models in 2018 required 300,000 times the compute resources of their 2012 counterparts. If that trend continues, our industry will be meaningfully contributing to the climate change crisis.

So how do we fix this? We can start by addressing some of our worst habits.

Right now, we’re using too much data to train our models. There’s this pervasive idea that more data is always better when, in fact, data has varying levels of utility. Some of it is helpful to your models, some is useless, and some is actively harmful. Understanding what data will truly make your model better and training on that instead of just training on everything? Not only does that reduce your carbon footprint but it can make your models better and more accurate.
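
As a sketch of what “training on the data that helps” can look like in practice, the snippet below scores a pool of candidate examples by model uncertainty and keeps only the most informative slice. The names X_seed, y_seed, X_pool, y_pool (assumed NumPy arrays), the SGDClassifier, and the 10% budget are all illustrative assumptions; entropy-based selection is just one of many curation heuristics.

```python
# A minimal data-curation sketch: rather than training on everything, score
# the remaining pool by predictive uncertainty and keep the top slice.
# X_seed/y_seed (a small labeled set) and X_pool/y_pool are assumed NumPy arrays.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_seed, y_seed, classes=np.unique(y_seed))

proba = model.predict_proba(X_pool)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)  # high = uncertain
budget = int(0.1 * len(X_pool))                           # e.g. keep 10% of the pool
selected = np.argsort(entropy)[-budget:]

model.partial_fit(X_pool[selected], y_pool[selected])     # train on the useful slice
```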

Past that, there’s an issue with retraining. Some practitioners choose to retrain their models from scratch, over and over again, both during development and once they’re in production. This has real sustainability costs and, additionally, squanders some of what the model learned in prior iterations. It’s also worth asking how often your model actually requires retraining. For certain use cases, like eCommerce recommendation algorithms or cybersecurity applications, frequent retraining might be necessary. For others, it’s a massive use of resources with vanishing utility.
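
A rough sketch of the alternative to repeated from-scratch retraining, continuing the hypothetical SGDClassifier above: fold each new batch of data in incrementally, and only pay for a full retrain when monitoring shows the model has actually degraded. The X_new/y_new batch, the held-out set, the 0.05 threshold, and the schedule_full_retrain helper are all illustrative assumptions.

```python
# Incremental update instead of a from-scratch retrain: take one pass over
# the new batch only, reusing everything the model already learned.
model.partial_fit(X_new, y_new)

# Only schedule a full retrain when held-out performance actually degrades;
# the 0.05 drop threshold is an arbitrary illustrative choice.
current = model.score(X_holdout, y_holdout)
if baseline_score - current > 0.05:
    schedule_full_retrain()  # hypothetical helper, not a library call
```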

It’s more than a little ironic that AI promises efficiency, yet we’re training and building AIs inefficiently. That inefficiency carries monetary costs for the organizations we work in, and environmental ones as well. Responsible AI requires us to look honestly at what our habits are and how we can invest in technologies and partnerships that reduce our compute costs, our training times, and our environmental impact.

Just Do More Good

The last point we want to make about impact is a simple one: we as machine learning practitioners need to take it upon ourselves to do more good.

There’s no shortage of AI projects that can help our fellow citizens. Piggybacking on the last section, we already have researchers doing great work to help fight deforestation, forecast cyclones, monitor endangered species, and so, so much more. But AI can help with more than climate change. We know it’s a great technology for all kinds of medical diagnoses, and partnerships like the one Facebook had with the Red Cross to help with mapping and natural disaster response are really promising. After all, nobody expects Facebook to understand the nuances of these efforts like the Red Cross does, but nobody expects the Red Cross to have hundreds of machine learning scientists on staff either.

And we understand: not everyone has the budget or the ability to donate their time to help NGOs and charitable organizations. That said: if you can, do so. At the very least, discounting your business’s prices for non-profits and others aiming to solve intractable, real-world problems is a simple, ethical step we can all pledge to take.

Lastly, one of the best parts of the machine learning community is how much code is open sourced, how much data is available to each of us, and how many practitioners do ML as both a job and a hobby. A few hours here and there can go a long way toward making the world a better place and making AI a force for good, not just a force for profit.

Read the whole series:

  • Part 1: How we got responsible AI all wrong

  • Part 3: Increasing accessibility to AI

  • Part 4: How we can create more opportunities in AI

Translated from: https://medium.com/alectio/impact-bias-and-sustainability-in-ai-b23d378b2561
