What Companies Need to Consider After Waking Up to the Algorithm Economy

We are living in the era of the algorithm driving the digital economy. To keep up, businesses are turning to machine learning tools on the backend to power revenue streams.

According to Gartner’s 2019 CIO Agenda survey:

Between 2018 and 2019, the percent of organizations adopting AI practices grew from 4% to 14%.

As data continues to be the focal point of decision-making, we will only see more and more businesses harnessing AI capabilities to supercharge certain business functions. With this trend, however, there are growing concerns over the gap between decision makers' adoption of algorithms and their ability to defend the inner workings of the algorithms their organizations put into practice.

Algorithms Simplified

By definition, an algorithm is a series of steps taken to complete a task or operation.

In the context of AI, an algorithm is a process that learns from given data, without having to be programmed explicitly, through statistical methodology known as machine learning (ML). An ML algorithm can range in complexity from a few steps that fit a single-parameter regression to a deep neural network with thousands or millions of parameters, interconnected through an even larger number of neurons to form relationships that are largely unexplainable with today's technology.

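At the simple end of that spectrum, a single-parameter regression really is just a few steps. Below is a minimal sketch, assuming scikit-learn and synthetic data; the variable names are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative only: predict monthly spend from income with one learned slope.
rng = np.random.default_rng(seed=0)
income = rng.uniform(20_000, 120_000, size=200).reshape(-1, 1)
spend = 0.3 * income.ravel() + rng.normal(0, 2_000, size=200)

model = LinearRegression()
model.fit(income, spend)

# The entire "inner workings": one coefficient and one intercept, easy to inspect.
print(model.coef_[0], model.intercept_)
```

A deep network replaces those two numbers with millions of interacting weights, which is exactly where explainability starts to break down.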

This latter type of model is known as a black-box model and is quite common in the practice of AI: the input yields a valuable output, but with little to no knowledge of what took place inside the model or how the data in question was processed.

The Lack of Algorithm Accountability

Traditionally, algorithm accountability has taken a backseat to profit. Researchers and business leaders haven’t needed to make public or defend the details of their proprietary models.

With no outside pressure, why would a company be motivated to pause and contemplate their data practices if their ML pipelines are healthy and customers are satisfied?

For starters, there is the interest in perfecting the product and strategy. Within the data pipeline, you should be able to vouch for quality and the absence of bias at every step. If you can't, you risk letting poor-quality data into the system or creating a highly biased model that turns away customers.

A famous example came last fall, when Apple released the Apple Card (issued by Goldman Sachs) and customers noticed that female cardholders were given a lower credit line than their male counterparts. When interrogated, Goldman Sachs was largely unable to explain the disparity and ended up facing no consequences.

This seems to be the pattern for companies and organizations called out for discriminatory systems: initial exposure in the media, then the issue is swiftly brushed aside as the product stays in production with few, if any, changes on the company's end.

But, as AI becomes democratized, companies should be wary that regulation will follow suit.

Setting the AI Ethics Standard at Your Company

“Garbage in = garbage out” is an overused adage in machine learning that will definitely be thrown at you in an intro class.

There is a common misconception that in order to remedy model accuracy, you simply need to pull in more data. This is not necessarily true: it is far more strategic to have good-quality data than to be able to mine a copious amount of it.

Without being thorough, you risk putting your company in a spot where the data team is struggling with an algorithm that isn't performing optimally, or is backed up scrubbing messy data into a usable format. It is a running joke in the industry that a data scientist spends more time cleaning data than on any of the more glamorized tasks. While data scientists tend to be trained on clean toy datasets, in practice data rarely arrives clean.

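To make that concrete, here is a rough, hedged sketch of what the routine cleaning work looks like; the file and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical raw export: duplicates, mixed types, and missing values.
raw = pd.read_csv("transactions_export.csv")

cleaned = (
    raw.drop_duplicates()
       .assign(
           amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
           signup_date=lambda d: pd.to_datetime(d["signup_date"], errors="coerce"),
       )
)

# Report missingness instead of silently dropping rows; the gaps are a signal, not just noise.
print(cleaned.isna().mean().sort_values(ascending=False))
```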

If you are building a company or running the department that works with data, it is your responsibility to find resources to create thoughtful data processes.

If you're an early-stage company, it is crucial to set this data-quality precedent as the company grows and centers itself around AI technology. If you come from a non-technical background and are starting up a company, it might help to consult experts in this area to ensure that your business is running on a healthy pipeline without being fully hands-off. There is a lot of contention over how much AI should be present in our lives, but it looks like it will become more of a reality sooner rather than later. We are past the struggle of getting usable information out of these systems, but we are now stuck on the complicated side effects that come with them.

How Does AI Democratization Play in Ethical AI?

The previous section ties into the goals of democratizing AI — challenging the incorrect perception that AI is highly complicated and unapproachable by those who are not specifically trained in it. There are several no-code/low-code platforms that could help you build initial business intelligence and machine learning models. These platforms are ideal for early stage to early growth-stage companies for specific needs like predicting churn rate or customer lifetime value.

However, no-code platforms with pre-built models have limitations when it comes to precise parameter tuning or scaling well as the company and its data grow. That is likely the point at which your company can grow a data department to take care of custom pipelines, but using a platform like Obviously AI to do the initial data work is quite effective.

The main challenge for startups has become how to build these machines thoughtfully and empathetically, and less whether it is possible to build them at all.

As a small to medium sized business, you might not be able to allocate resources to your data team, but you do have the ability to set up a data-powered company ethically and carefully very early on.

Still, there is a fair amount of apprehension about shifting from the analog economy toward the AI economy, and what this could mean for privacy and avoiding discrimination. To protect your company and customers, it is important to think about the steps it takes to deploy a model.

Bias Is Possible at Every Step of Building Your AI System

Suppose you are implementing the pipeline for determining credit limit:

  • You frame the problem as “What is the credit limit granted to the customer based on personal and financial factors?” You are trying to optimize for maximum repayment.

  • The model is trained on previously collected data, because you have no user data for this specific credit card yet since it is unreleased. You have access to tons of previous credit-issuing and repayment data on thousands of customers, so you decide to use this to train your model. This data was collected in a way that ended up with a lot of missing data points.

Bias can already be introduced into your system if you don't stop to question a few things:

Has your previous data—which will be your training data—been checked for bias? Did you question why there’s missing data?

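Assuming the historical data carries demographic fields you can audit against, a minimal first pass at those questions is to check whether the missing data clusters in particular groups. The dataset and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical historical credit-issuing and repayment data.
history = pd.read_csv("credit_history.csv")

# Share of records with at least one missing field, broken out by group.
# If missingness clusters in one group, dropping those rows later will
# quietly under-represent that group in the training data.
missing_by_group = (
    history.assign(any_missing=history.isna().any(axis=1))
           .groupby(["gender", "age_band"])["any_missing"]
           .mean()
           .sort_values(ascending=False)
)
print(missing_by_group)
```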

  • Now, we move on to processing the data. Here, you try to figure out what to do with the missing data and determine what variables to use in your model. For the instances with missing data points, you choose to drop them entirely from your dataset. You choose to remove “gender” and “race” from the input because you definitely don't want a biased system. However, you did not consider the other variables that will implicitly group genders and races anyway, which happens quite often in the problematic AI systems that have been made public (a sketch of a proxy check follows this list).

  • You choose a proprietary model after some testing to output credit limits that will maximize repayment for the company. The model is trained on the data you processed and cleaned in the previous step. You test for error rates, are satisfied, and the model is put into production as the credit card is released.

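Before trusting that dropping “gender” and “race” removed the bias, it is worth checking how well the remaining inputs can reconstruct them. Here is a minimal sketch of such a proxy check, reusing the hypothetical dataset and made-up column names from above, and assuming the protected attributes are still available for auditing even though they are excluded from the model inputs.

```python
import pandas as pd

history = pd.read_csv("credit_history.csv")  # same hypothetical data as above

protected = "gender"                      # excluded from the model inputs
proxies = ["zip_code", "occupation"]      # remaining features that may encode it anyway

for col in proxies:
    # Within each category of the candidate proxy, how skewed is the protected attribute?
    rates = pd.crosstab(history[col], history[protected], normalize="index")
    skew = (rates.max(axis=1) - rates.min(axis=1)).mean()
    print(f"{col}: average within-category skew = {skew:.2f} (0 = balanced, 1 = fully separable)")
```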

The model may now be able to discriminate on the basis of race and/or gender. The missing data points could have come from customers who are younger and lack information on their financial background, or from customers who had to cancel a previous credit card, or who, because of hardship, have gaps in repaying on time.

Because those data points were dropped and excluded from the model, the system never internalizes these instances and forms an unseen bias against new customers who may have similar backgrounds.

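This is also why the single aggregate error rate checked before release is not enough: the same evaluation broken out by group is what would surface the problem. A hedged sketch, again with hypothetical column names:

```python
import pandas as pd

# Hypothetical hold-out set with actuals, predictions, and group labels.
eval_df = pd.read_csv("holdout_predictions.csv")

# One overall error number can look fine while groups diverge badly underneath it.
eval_df["abs_error"] = (eval_df["predicted_limit"] - eval_df["actual_limit"]).abs()
by_group = eval_df.groupby("gender")[["predicted_limit", "abs_error"]].mean()

print(by_group)  # average granted limit and average error, per group
print("limit disparity ratio:",
      by_group["predicted_limit"].max() / by_group["predicted_limit"].min())
```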

Machine intelligence emulates the societal prejudice that plagues us, but it does reduce bias to a certain degree. There are plenty of examples of why AI is better than letting humans be the sole decision maker, in cases such as policing or determining which high-risk patients should receive care, because human bias is exactly what AI-based recommendations set out to abolish. With this shift, however, we have introduced a different, more complex type of bias that is inherent to how AI functions.

The Call for Open Source and Explainable Models

Algorithms often become questionable when they enter spaces where data is sensitive: finance, healthcare, or the justice system.

Without our direct knowledge, these algorithms have entered systems that should typically require more scrutiny, and we are finding that they introduce bias in a more complex manner than human decision making. This stands to affect parts of society quite adversely, including women and people of color.

The usual proposal for this type of recurring discrimination through AI is to make the use of proprietary modeling obsolete. The most infamous example of a company conducting secretive data operations is Palantir, which filed for an IPO in 2020 and faced a lot of public scrutiny because of its government ties. It has never needed to come forward and disclose publicly how it mines or uses data, likely because it works with organizations such as the CIA and the Pentagon.

Publishing your work in public gives it a better chance of being checked for flaws or for areas where it could collect bias. Popular AI frameworks like TensorFlow and Keras are open source, and the people using them frequently point out deprecations or bugs. The more likely scenario is that your company gets off the ground with no-code/low-code tools, and this becomes something to worry about later down the line when scaling. It is also likely that you might not even need black-box AI tools to meet your business needs.

In this paper, Cynthia Rudin, professor of computer science at Duke University, argues for picking interpretable models instead of black-box models, especially when it comes to sensitive spaces. Of course, there can be a compromise in model accuracy when we opt for a simpler model with more explainability. However, a simpler ML model allows the researcher to understand how bias arises and to tune parameters as needed, whereas doing the same for a highly complex model might be impossible. In most cases, Rudin argues, an interpretable model works well enough for the problem being solved. AI is mostly marketed as an unattainable feat, but you do not need deep neural networks to automate internal processes.

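To make the contrast concrete, here is a minimal sketch of what an interpretable model buys you in a setting like credit decisions. It uses a plain logistic regression with hypothetical feature names; nothing about it is specific to Rudin's paper.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("credit_history.csv")  # hypothetical
features = ["income", "debt_to_income", "on_time_ratio"]

model = LogisticRegression(max_iter=1000)
model.fit(data[features], data["defaulted"])

# Every learned weight maps to one named input, so the decision can be read,
# explained, and challenged feature by feature -- unlike a black-box network.
for name, weight in zip(features, model.coef_[0]):
    print(f"{name}: {weight:+.3f}")
```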

Algorithms are prevalent in our digital day-to-day lives, and for the most part they improve them. It is inevitable that loosely defined problems, uncleaned data, and unintended bias will enter the system somehow, but these should be dealt with before going into production. Otherwise you end up with facial recognition technology that discriminates against Black people, or a healthcare bot that prioritizes men over women for medical care, and that is plainly irresponsible.

Algorithmic Businesses Are Long Overdue for Regulation

A lot of society is deeply skeptical of AI technology, and with rightful reason, because much of the media repeatedly plays up the controlling and invasive nature of AI. While not fully understanding the true capabilities and limitations of the technology we have so far, lawmakers have slowly started to call for regulation, but have fallen short:

  • In 2017, New York City put together an act to combat discriminatory AI systems. This received a lot of backlash because it would essentially have forced public agencies and tech companies to make their code public. There was huge concern about diminished competitive advantage as well as heightened security risk. In this case, it was obvious that the preparers of the act didn't consider all the nuance that comes with AI regulation.

  • In 2019, Congress proposed the Algorithmic Accountability Act, which would allow the Federal Trade Commission (FTC) to enforce impact assessments on companies suspected of deploying biased systems. This was considerably better fleshed out than the 2017 act, but there is still the question of how third-party affiliation could affect such a sensitive investigation inside a company.

From these two initiatives, it is clear that the runway to more refined regulation is in place, as companies are adopting and deploying AI exponentially more each year. Although a lot still flies under the radar while AI regulation remains quite nebulous, we will see more companies facing more speculation.

AI Algorithms Affect Consumers in Many Ways

This article should encourage thoughtfulness about data processes: they not only serve as a revenue funnel, but can also affect people in many ways and accidentally jeopardize the company. In the situation of Apple and Goldman Sachs, there was little accountability on their end aside from a short statement. For a business leader, this means you should be ready to defend any processes or models that leave customers or users feeling any sort of discrimination. There is no better way to prepare for this than by being involved in laying the groundwork for thoughtful AI.
