Collaborative and Transparent Machine Learning Fights Bias

No one wants bias in their organization. Underrepresentation has plagued the business world for years, and many fear bias is making its way into the artificial intelligence industry. While AI and machine learning are highly technical, scientific subjects, they can still succumb to human error. Collecting data, while immensely important, is an evolving practice, and room for bias remains very much a part of the data prediction process.

Areas in Machine Learning Vulnerable to Bias

Training Prediction Models

Input and output. Both are areas that need governance, but when building a model you should be able to clearly explain what the model does and what its expected outcome is. It’s important to remember that the developers of a prediction model bring their own backgrounds and blind spots to writing its algorithms.

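One lightweight way to make that explanation concrete is to keep a short, human-readable record of the model’s purpose, inputs, and expected output alongside the model itself. The sketch below is a minimal illustration in Python; the `ModelCard` structure and all field values are hypothetical, not something prescribed by this article.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelCard:
    """Minimal documentation for a prediction model (hypothetical fields)."""
    name: str
    purpose: str             # what the model does, in plain language
    expected_output: str     # what a prediction means and how it will be used
    inputs: list = field(default_factory=list)            # attributes the model consumes
    known_blind_spots: list = field(default_factory=list) # gaps the authors are aware of

card = ModelCard(
    name="fraud-persona-v1",
    purpose="Predict which customer attributes correlate with fraud.",
    expected_output="A fraud-risk score between 0 and 1 per customer.",
    inputs=["age", "region", "account_age_days"],
    known_blind_spots=["training data skews toward one demographic"],
)

# Persist the card next to the model so reviewers can audit intent vs. behavior.
print(json.dumps(asdict(card), indent=2))
```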

Lack of Representation in Datasets

Author Tom Taulli wrote for Forbes, “Additionally, studies have shown that algorithms trained on historically biased data have significant error rates for communities of color especially in over-predicting the likelihood of a convicted criminal to re-offend which can have serious implications for the justice system.” This can translate to biased algorithms making the wrong decisions for your business.

Not only is representation among your data subjects important; those analyzing and managing the datasets should also be diverse. According to a recent research report from NYU, “women comprise only 10% of AI research staff at Google and only 2.5% of Google’s workforce is black. This lack of representation is what leads to biased datasets and ultimately algorithms that are much more likely to perpetuate systemic biases.”

Define the Business Problem and the Dataset

In this video, “coding poet” Joy Buolamwini describes how she was excluded by facial recognition technology because the dataset behind it did not include enough diversity.

Designing an Unbiased Data Prediction

Stating the business problem and the desired outcome is the first step to designing an unbiased data prediction. This will guide your data collection process and determine what attributes you need when making a prediction.

Let’s say the prediction you want to make is “What is the persona of a person most likely to commit fraud?”

Building personas of customers most likely to do something requires collecting demographic data. If you were to collect data from all men, the persona most likely to commit fraud would be men. It’s important to take a prediction like this and ask your team, “What data do we have to collect to make this prediction 100% accurate?”

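Before trusting a persona like that, it is worth checking how each demographic group is actually represented in the data behind it. Here is a minimal sketch, assuming a pandas DataFrame with hypothetical `gender` and `committed_fraud` columns:

```python
import pandas as pd

# Hypothetical training data; in practice, load your own dataset.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "F", "F"],
    "committed_fraud": [1, 0, 1, 0, 0, 1],
})

# If one group dominates the sample, it will also dominate the "fraud persona".
representation = df["gender"].value_counts(normalize=True)
print(representation)  # e.g. M: 0.67, F: 0.33 -- a skewed sample

# Compare fraud rates within each group rather than raw counts across groups.
print(df.groupby("gender")["committed_fraud"].mean())
```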

If it isn’t as close to 100% accurate as possible, the prediction might lead you on the wrong course: your messaging might include biased language, your sales techniques might miss the mark, or the way you optimize your business might lead you down the wrong path and take away from revenue.

Relating to defining the business problem, you also need to be able to state the desired outcome of data collection. For example: we want to collect data from as many users as we can to create a relationship between their attributes and the chances of committing fraud. A statement like this can serve as your north star for collecting data.

Before deciding what data to collect, take steps to avoid sample bias and non-response bias.

Sample Bias — Only reaching out to a portion of your audience.

Non-Response Bias — Only a small part of your audience responds to your survey, forum, etc.

Audience segmenting can be advantageous in terms of messaging and offerings, but collecting data from a small segment of your audience to make general predictions about your entire user base can lead to skewed data predictions.

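One rough way to catch both kinds of bias before modeling is to compare who actually responded against the audience you meant to reach. The numbers below are made up purely for illustration:

```python
# Known make-up of the full audience you intended to survey (assumed figures).
population = {"18-29": 0.30, "30-44": 0.35, "45-64": 0.25, "65+": 0.10}

# Who actually responded (assumed figures).
respondents = {"18-29": 0.55, "30-44": 0.30, "45-64": 0.10, "65+": 0.05}

# Large gaps flag sample or non-response bias before any model is trained.
for group, expected in population.items():
    observed = respondents.get(group, 0.0)
    gap = observed - expected
    flag = "  <-- check" if abs(gap) > 0.10 else ""
    print(f"{group}: expected {expected:.0%}, observed {observed:.0%}{flag}")
```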

A Big Part of Biased Datasets Is Simply Not Having Enough Eyes on the Data

When you have a usable dataset, your team should be able to define what content the dataset carries and ask whether enough users with various attributes are represented.

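Putting “more eyes on the data” into practice can be as simple as a representation report that any reviewer can run. This sketch is illustrative only, assuming the dataset fits in a pandas DataFrame:

```python
import pandas as pd

def representation_report(df: pd.DataFrame, max_levels: int = 20) -> None:
    """Print the share of each value for every categorical column,
    so reviewers can spot under-represented groups at a glance."""
    for col in df.select_dtypes(include=["object", "category"]).columns:
        counts = df[col].value_counts(normalize=True, dropna=False)
        if len(counts) <= max_levels:  # skip free-text-like columns
            print(f"\n{col}:")
            print(counts.round(3).to_string())

# Hypothetical example data
df = pd.DataFrame({"region": ["north", "north", "south", "east"],
                   "device": ["mobile", "mobile", "desktop", "mobile"]})
representation_report(df)
```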

Your team should also ask themselves whether the way the data was collected was fair. That is: was the user pressured in any way to answer the question? Did the questions make sense? Was the user swayed to answer in certain ways?

After you answer those questions, look at the team overseeing the data. Is there enough diversity on your team, with people from different backgrounds who can collectively look at this dataset and say it is unbiased?

How Collaborative Machine Learning Combats Bias

While researching ways to fight biased data predictions, I found that the biggest problems are data governance and a lack of diversity among those overseeing the data prediction process. One of the articles I came across described how great bosses avoid bias by implementing equal access. Collaborative ML provides equal access.

Collaborative machine learning can give all members of the team access to ask questions collectively and to record queries and outcomes.

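The article doesn’t prescribe specific tooling, but recording queries and outcomes could be as simple as a shared, append-only log that everyone on the team can read. A hypothetical sketch:

```python
import csv
import datetime
from pathlib import Path

LOG = Path("prediction_queries.csv")  # shared, append-only audit log (hypothetical)

def log_query(user: str, question: str, outcome: str) -> None:
    """Append one query and its outcome so any teammate can review it later."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "user", "question", "outcome"])
        writer.writerow([datetime.datetime.now().isoformat(), user, question, outcome])

log_query("dana", "Which persona is most likely to commit fraud?",
          "Model flagged segment A; flagged for review: sample is 80% one gender.")
```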

With collaborative machine learning, the transparency of your data predictions increases, helping you avoid bias. This is a great way to increase governance in machine learning.

Originally published at https://www.obviously.ai.

Translated from: https://medium.com/downsample/collaborative-and-transparent-machine-learning-fights-bias-260487e9d732
