Collaborative and Transparent Machine Learning Fights Bias

No one wants bias in their organization. Underrepresentation has plagued the business world for years, and many fear bias is making its way into the artificial intelligence industry. While AI and machine learning are highly technical, scientific subjects, they can still succumb to human error. Collecting data, while immensely important, is an evolving practice, and room for bias remains very much a part of the data prediction process.

Areas in Machine Learning Vulnerable to Bias

Training Prediction Models

Input and output. Both are areas that need governance, but when building a model you should be able to clearly explain what the model does and what its expected outcome is. It’s important to remember that the developers of a prediction model bring their own backgrounds and blind spots to writing its algorithms.

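One lightweight way to make that explanation concrete is to keep a short, human-readable record of the model’s purpose, inputs, and expected output alongside the model itself. The sketch below is a minimal illustration in Python; the `ModelCard` structure and all field values are hypothetical, not something prescribed by this article.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelCard:
    """Minimal documentation for a prediction model (hypothetical fields)."""
    name: str
    purpose: str             # what the model does, in plain language
    expected_output: str     # what a prediction means and how it will be used
    inputs: list = field(default_factory=list)            # attributes the model consumes
    known_blind_spots: list = field(default_factory=list) # gaps the authors are aware of

card = ModelCard(
    name="fraud-persona-v1",
    purpose="Predict which customer attributes correlate with fraud.",
    expected_output="A fraud-risk score between 0 and 1 per customer.",
    inputs=["age", "region", "account_age_days"],
    known_blind_spots=["training data skews toward one demographic"],
)

# Persist the card next to the model so reviewers can audit intent vs. behavior.
print(json.dumps(asdict(card), indent=2))
```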

Lack of Representation in Datasets

Author Tom Taulli wrote for Forbes, “Additionally, studies have shown that algorithms trained on historically biased data have significant error rates for communities of color especially in over-predicting the likelihood of a convicted criminal to re-offend which can have serious implications for the justice system.” This can translate to biased algorithms making the wrong decisions for your business.

Not only is representation among your data subjects important; those analyzing and managing the datasets should also be diverse. According to a recent research report from NYU, “women comprise only 10% of AI research staff at Google and only 2.5% of Google’s workforce is black. This lack of representation is what leads to biased datasets and ultimately algorithms that are much more likely to perpetuate systemic biases.”

Define the Business Problem and the Dataset

In this video, “coding poet” Joy Buolamwini describes how she was excluded by facial recognition technology because the dataset behind it did not include enough diversity.

Designing an Unbiased Data Prediction

Stating the business problem and the desired outcome is the first step to designing an unbiased data prediction. This will guide your data collection process and determine what attributes you need when making a prediction.

Let’s say the prediction you want to make is “What is the persona of a person most likely to commit fraud?”

Building personas of customers most likely to do something requires collecting demographic data. If you were to collect data from all men, the persona most likely to commit fraud would be men. It’s important to take a prediction like this and ask your team, “What data do we have to collect to make this prediction 100% accurate?”

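Before trusting a persona like that, it is worth checking how each demographic group is actually represented in the data behind it. Here is a minimal sketch, assuming a pandas DataFrame with hypothetical `gender` and `committed_fraud` columns:

```python
import pandas as pd

# Hypothetical training data; in practice, load your own dataset.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "F", "F"],
    "committed_fraud": [1, 0, 1, 0, 0, 1],
})

# If one group dominates the sample, it will also dominate the "fraud persona".
representation = df["gender"].value_counts(normalize=True)
print(representation)  # e.g. M: 0.67, F: 0.33 -- a skewed sample

# Compare fraud rates within each group rather than raw counts across groups.
print(df.groupby("gender")["committed_fraud"].mean())
```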

If it isn’t as close to 100% accurate as possible, the prediction might lead you on the wrong course: your messaging might include biased language, your sales techniques might miss the mark, or the way you optimize your business might lead you down the wrong path and take away from revenue.

Relating to defining the business problem, you also need to be able to state the desired outcome of data collection. For example: we want to collect data from as many users as we can to create a relationship between their attributes and the chances of committing fraud. A statement like this can serve as your north star for collecting data.

Before deciding what data to collect, take steps to avoid sample bias and non-response bias.

Sample Bias — Only reaching out to a portion of your audience.

Non-Response Bias — Only a small part of your audience responds to your survey, forum, etc.

Audience segmenting can be advantageous in terms of messaging and offerings, but collecting data from a small segment of your audience to make general predictions about your entire user base can lead to skewed data predictions.

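One rough way to catch both kinds of bias before modeling is to compare who actually responded against the audience you meant to reach. The numbers below are made up purely for illustration:

```python
# Known make-up of the full audience you intended to survey (assumed figures).
population = {"18-29": 0.30, "30-44": 0.35, "45-64": 0.25, "65+": 0.10}

# Who actually responded (assumed figures).
respondents = {"18-29": 0.55, "30-44": 0.30, "45-64": 0.10, "65+": 0.05}

# Large gaps flag sample or non-response bias before any model is trained.
for group, expected in population.items():
    observed = respondents.get(group, 0.0)
    gap = observed - expected
    flag = "  <-- check" if abs(gap) > 0.10 else ""
    print(f"{group}: expected {expected:.0%}, observed {observed:.0%}{flag}")
```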

A Big Part of Biased Datasets Is Simply Not Having Enough Eyes on the Data

When you have a usable dataset, your team should be able to define what content the dataset carries and ask whether enough users with various attributes are represented.

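Putting “more eyes on the data” into practice can be as simple as a representation report that any reviewer can run. This sketch is illustrative only, assuming the dataset fits in a pandas DataFrame:

```python
import pandas as pd

def representation_report(df: pd.DataFrame, max_levels: int = 20) -> None:
    """Print the share of each value for every categorical column,
    so reviewers can spot under-represented groups at a glance."""
    for col in df.select_dtypes(include=["object", "category"]).columns:
        counts = df[col].value_counts(normalize=True, dropna=False)
        if len(counts) <= max_levels:  # skip free-text-like columns
            print(f"\n{col}:")
            print(counts.round(3).to_string())

# Hypothetical example data
df = pd.DataFrame({"region": ["north", "north", "south", "east"],
                   "device": ["mobile", "mobile", "desktop", "mobile"]})
representation_report(df)
```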

Your team should also ask themselves whether the way the data was collected was fair. That is: was the user pressured in any way to answer the question? Did the questions make sense? Was the user swayed to answer in certain ways?

After you answer those questions, look at the team overseeing the data. Is there enough diversity on your team, with people from different backgrounds who can collectively look at this dataset and say it is unbiased?

How Collaborative Machine Learning Combats Bias

While researching ways to fight biased data predictions, I found that the biggest problems are data governance and a lack of diversity among those overseeing the data prediction process. One of the articles I came across described how great bosses avoid bias by implementing equal access. Collaborative ML provides equal access.

Collaborative machine learning can give all members of the team access to ask questions collectively and to record queries and outcomes.

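The article doesn’t prescribe specific tooling, but recording queries and outcomes could be as simple as a shared, append-only log that everyone on the team can read. A hypothetical sketch:

```python
import csv
import datetime
from pathlib import Path

LOG = Path("prediction_queries.csv")  # shared, append-only audit log (hypothetical)

def log_query(user: str, question: str, outcome: str) -> None:
    """Append one query and its outcome so any teammate can review it later."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "user", "question", "outcome"])
        writer.writerow([datetime.datetime.now().isoformat(), user, question, outcome])

log_query("dana", "Which persona is most likely to commit fraud?",
          "Model flagged segment A; flagged for review: sample is 80% one gender.")
```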

With collaborative machine learning, the transparency of your data predictions increases, helping you avoid bias. This is a great way to increase governance in machine learning.

Originally published at https://www.obviously.ai.

Translated from: https://medium.com/downsample/collaborative-and-transparent-machine-learning-fights-bias-260487e9d732
