The Scourge of Analytical Variability in AI Systems

Towards AI's Plateau of Productivity

There is a famous paradox within the classical interpretation of probability theory called Bertrand's paradox. Bertrand formulated a very simple problem as follows.


Inscribe an equilateral triangle in a circle and then determine the probability of randomly choosing a chord such that its length is greater than the side of the triangle.


(Image: Wikipedia, CC BY-SA 3.0)

He gave three valid arguments to solve this problem, but each yields a different result. Check the Wikipedia article for details on the three arguments, if interested. The resolution to this paradox arises from the fact that each of the valid arguments seems to be addressing the same problem, but essentially they are solving different mathematical questions.

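The three arguments can be made concrete with a Monte Carlo sketch. The simulation below (an illustration I am adding, not part of the original article) implements three equally "reasonable" ways of drawing a random chord of a unit circle; the estimated probabilities converge to three different answers, which is the paradox in code.

```python
import math
import random

def chord_longer_than_side(method: str, trials: int = 200_000) -> float:
    """Estimate P(random chord > side of inscribed equilateral triangle).

    For a unit circle the triangle's side has length sqrt(3).
    """
    side = math.sqrt(3)
    hits = 0
    for _ in range(trials):
        if method == "endpoints":
            # Argument 1: pick two uniform random endpoints on the circumference.
            a = random.uniform(0, 2 * math.pi)
            b = random.uniform(0, 2 * math.pi)
            length = 2 * math.sin(abs(a - b) / 2)
        elif method == "radial":
            # Argument 2: pick a random point on a radius as the chord midpoint.
            d = random.uniform(0, 1)  # distance of midpoint from the centre
            length = 2 * math.sqrt(1 - d * d)
        else:  # "midpoint"
            # Argument 3: pick a uniform random midpoint inside the disc
            # (rejection sampling from the bounding square).
            while True:
                x, y = random.uniform(-1, 1), random.uniform(-1, 1)
                if x * x + y * y <= 1:
                    break
            length = 2 * math.sqrt(1 - (x * x + y * y))
        hits += length > side
    return hits / trials

for m in ("endpoints", "radial", "midpoint"):
    print(m, chord_longer_than_side(m))
# The three estimates converge to roughly 1/3, 1/2 and 1/4 respectively:
# three valid readings of "random chord", three different probabilities.
```

Each method is internally consistent; only the (unstated) meaning of "randomly choosing a chord" differs.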

I will revisit this paradox in the context of this article, but before that, I need to highlight some issues with the way we currently build AI systems.


In the ICT industry, engineers are increasingly moving towards building AI systems to add value to customers by solving existing problems and making processes more efficient. With the seemingly successful application of deep learning, experts are opining, with conviction, that the AI winter has finally come to an end.


But there are at least three major issues (as also reported by various experts) that we must deal with while building AI systems.


Building over Black-Box Models

ML engineers work at various abstraction layers. More often than not, the underlying machine learning algorithm is a black box to the engineer who intends to integrate it with the overall system. The use of a particular model generally comes with a number of assumptions, which are seldom verified.


ML engineers tend to use a lot of open-source packages. Generally, in the case of the Python environment, a particular package comes with a slew of dependencies. If the intended outcome is achieved, fine; otherwise, another package with another slew of dependencies is tried out. There is no investment in understanding why something worked or why it did not. The packages do not have standards, and more often than not there is no rigorous testing and evaluation of these models. This kind of black-box approach is detrimental to ensuring the reliability of AI systems. Moreover, when correctness is itself under question, efficiency is forced to take a backseat.


But the problem is not just about the best practices to be followed by an engineer. Deep learning models are currently not transparent in their workings, and their underlying characteristics are still a mystery to researchers. In fact, the empirical results contradict the existing theories from statistics and optimization. It has even been alleged that deep learning researchers behave like medieval alchemists, trying to conjure magic much as the alchemists tried to make gold.


This lack of understanding is also a contributor, in part, to another problem which is the lack of reproducibility.


Lack of Reproducibility

It is expected that an algorithm published by a reputed researcher would produce the same results when correctly reimplemented independently by someone else (either a human or a machine).


Due to poor academic practices, partly arising from the hype around AI, many researchers have been taking shortcuts in developing algorithms. For instance, it was recently shown that many deep learning models which were expected to outperform the existing state of the art either failed to do so convincingly, or could be matched on the prescribed data set by a traditional ML algorithm with a simple heuristic applied. Another example of the malpractice is reporting only the best results from multiple runs of an algorithm while not disclosing the details of the poor results.

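One of the most basic reproducibility failures is unpinned randomness: the "best run" of a stochastic training procedure cannot be recreated because its random initialisation was never recorded. The toy sketch below (my illustration, with a deliberately trivial stand-in for training) shows why seeding every source of randomness is the minimum bar.

```python
import random

def train_run(seed=None):
    """Toy stand-in for a stochastic training run: the 'result' depends
    entirely on random initialisation, so unseeded runs are not repeatable."""
    rng = random.Random(seed)
    weights = [rng.gauss(0, 1) for _ in range(10)]
    return sum(w * w for w in weights)  # deterministic function of the init

# Unseeded: two runs almost surely disagree, and neither can be recreated.
print(train_run() == train_run())                # almost always False

# Seeded: pinning the randomness makes the run repeatable by anyone.
print(train_run(seed=42) == train_run(seed=42))  # True
```

In a real pipeline the same discipline extends to every library's random state, data shuffling order, and any nondeterministic hardware kernels; fixing only one source is not enough.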

Added to the lack of understanding of deep learning models explained above, researchers often fail to identify which part of their algorithm the improvement in results can be attributed to. This makes it difficult for another researcher to analyze why the results vary on reimplementation.


The Hidden Technical Debt

When building AI systems, the machine learning component is minuscule, while the 'plumbing' around it consumes most of the effort. Sculley et al., from Google, presented their work in 2015 highlighting the risk factors in building ML systems that could lead to high maintenance costs in the future, due to factors like undeclared data dependencies, entanglement, software anti-patterns, etc.


As an example, consider a scenario where data is extracted in the form of logs for a particular purpose. Another group builds an ML system (or multiple systems interdependent on each other) on top of it, assuming that the data will maintain its consistency. At some point in time, if the data capture methodology or the nature of the data itself is altered to suit the original purpose, it will lead to cascading failures in the hierarchy of dependent systems.

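One cheap defence against this kind of undeclared data dependency is to validate the upstream data against an explicit schema at the boundary, so a silent format change fails loudly instead of cascading. A minimal sketch (the field names and types here are hypothetical, chosen only for illustration):

```python
def validate_log_record(record: dict) -> None:
    """Fail fast if an upstream log record no longer matches the schema
    the downstream ML system was built against."""
    # The schema the downstream system implicitly depends on, made explicit.
    schema = {"user_id": str, "timestamp": float, "clicks": int}
    for field, expected_type in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise TypeError(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )

validate_log_record({"user_id": "u1", "timestamp": 1.0, "clicks": 3})  # passes
# If the logging team silently changes 'clicks' to a float, this raises
# a TypeError at the boundary instead of corrupting the model downstream.
```

The point is not this particular check but that the dependency is declared: the contract lives in code, next to the consumer, and breaks visibly when violated.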

What is Analytical Variability in AI Systems?

With this background, let us discuss the key manifestation of these issues in the form of analytical variability in ML systems. As motivation, first consider recent work in neuroimaging where seventy independent teams were tasked with testing the same set of hypotheses using the same dataset. The teams were given flexibility in using their own analytical workflows and pipelines. In the end, there were large differences in the conclusions the teams arrived at, owing to the differences in their approaches. This is reminiscent of the famous Bertrand's paradox explained earlier.


This paradox is analogous to the scenarios that arise for a typical machine learning engineer trying to build a system to solve a particular real-world problem. Just as Bertrand gave three completely valid arguments to a problem stated in plain language, for which the solutions turned out to correspond to three completely different mathematical problems, the ML engineer is liable to develop completely different versions of a system that try to solve the same problem but essentially yield different results.


This frequently happens when the three issues described above manifest in various ways as a data scientist or an engineer analyzes, designs, and implements ML systems.


As AI moves towards the plateau of productivity, we would expect it to move beyond the internet companies like Facebook, Google, Twitter, etc. and play a more active role in healthcare, medicine, education, transportation, etc.


Consider an arbitrary scenario in the health sector where a new vaccine has been successfully developed and a recommendation system is required to identify those segments of the population where vaccination must be commenced in the first phase. This activity is of the utmost importance, and the stakes are much higher than for a recommender system that suggests movies for a pastime. If three data scientists are given this task, they would come up with three different systems yielding vastly different outputs. Are such AI systems reliable? Why should a policymaker trust this?

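Even a step as mundane as outlier handling is enough to produce this divergence. The sketch below (an illustration I am adding; the numbers are made up) runs two defensible pipelines over the same raw measurements and gets two different answers from the same data.

```python
import statistics

# The same raw measurements, analysed by two equally "reasonable" pipelines.
data = [2.1, 2.4, 2.2, 2.3, 2.5, 9.8]  # one extreme value

# Pipeline A: keep everything, report the mean.
result_a = statistics.mean(data)

# Pipeline B: drop points more than 2 sample standard deviations from the
# mean, then report the mean of what remains.
mu = statistics.mean(data)
sigma = statistics.stdev(data)
cleaned = [x for x in data if abs(x - mu) <= 2 * sigma]
result_b = statistics.mean(cleaned)

print(round(result_a, 2), round(result_b, 2))  # 3.55 vs 2.3
```

Neither analyst is wrong; each made a defensible choice at an underspecified step, exactly as in the seventy-team neuroimaging study. Multiply this by every choice in a real pipeline (features, preprocessing, model family, hyperparameters) and the divergence compounds.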

Furthermore, if the same data scientist applies the same process all over again, is there a guarantee that they will get the same solution? Unsurprisingly, the answer is no! As described above, ML systems face massive issues with reproducibility.


This analytical variability in AI systems is extremely dangerous when such systems directly affect people's lives.


Photo by NeONBRAND on Unsplash

Potential Steps

There is no doubt that AI will eventually reach its plateau of productivity. But in order to get there in the shortest time while building the confidence of policymakers, it is vitally important that AI practitioners deal with the scourge of analytical variability in AI systems.


The following are three potential approaches to achieve this goal.


  1. Develop standards for the pipelines and workflows used to build AI systems. Ensure that the machine learning APIs which act as building blocks come with strict guidelines for their usage.
  2. Before using sophisticated algorithms, it should be made imperative for a data scientist or an ML engineer to develop baselines using traditional ML algorithms or even simple heuristics. With time, these baselines must be standardized across the industry.
  3. Software testers and their existing paradigms are not particularly useful for 'testing' AI systems. More sophisticated adversarial AI systems should be developed to do this. In essence, it amounts to AI systems being tested by a set of other AI systems. This approach deserves an article of its own.
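The baseline discipline in point 2 can be as simple as a majority-class predictor. The sketch below (my illustration; the labels are hypothetical) shows why such a floor matters: on an imbalanced task, a model that "learns" nothing at all already scores 90% accuracy, so any sophisticated model must be required to beat that number before its complexity is justified.

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Trivial heuristic baseline: always predict the most common
    training label, and report accuracy on the test labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

# Hypothetical imbalanced binary task: 90% of examples are class 0.
train = [0] * 90 + [1] * 10
test = [0] * 45 + [1] * 5
print(majority_baseline(train, test))  # 0.9 accuracy with zero learning
```

A deep model reporting 91% accuracy on such data is a one-point improvement over doing nothing, not a breakthrough; standardized baselines make that visible.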

As we move to more responsible applications of AI in domains like Police, Medicine, Agriculture, Security, etc. we must ensure that the AI systems we build are reliable and efficient.


Translated from: https://towardsdatascience.com/the-scourge-of-analytical-variability-in-ai-systems-fc6e1ec8daae
