Machine Learning Testing

Testing software is one of the most complex tasks in software engineering. Traditional software engineering has principles that define unambiguously how software should be tested, but the same does not hold for machine learning, where testing strategies are not always defined. In this post, I describe a testing approach that is heavily influenced by one of the most recognized testing strategies in software engineering, test-driven development. The approach also appears to be agnostic to the family of machine learning models under test, and it adapts well to the typical production environments behind today's large-scale AI/ML services.

After reading this post, you will know how to set up a testing strategy for machine learning models with production in mind. Production in mind means that the team you operate in is heterogeneous: the project under test is developed together with other data scientists, data engineers, business customers, developers, and testers. The goals of a good testing strategy are to achieve production readiness and to improve code maintainability.

An appropriate name for the approach is Test-First machine learning, TFML for short, because everything starts from writing tests rather than models.

Steps of TFML

A characteristic of TFML is that it starts from writing tests instead of machine learning models. The approach is based on mocking whatever is not yet available, so that the different actors involved in the project can proceed with their tasks anyway. Data scientists and data engineers are known to work at different paces. Mocking an aspect of the world that is not yet available mitigates that difference and also reduces blockers within larger teams, which in turn increases efficiency. Below are the five essential steps of a TFML approach.

1. Write a test

As the name suggests, Test-First in TFML means that everything starts with writing a test, even for a feature that does not yet exist. Such a test is usually very short and should stay so. Larger and more complex tests should be broken down into their essential and testable components. A test can be written once the feature’s specs and requirements are understood; these are usually discussed earlier, during requirement analysis (e.g. use cases and user stories).

A working test fails or passes for the right reasons. This is the step in which those reasons are defined. Defining the happy path is essential to defining what should be observed and considered a success.
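
One possible way to make the reasons explicit is to write the happy path down as named conditions, so that a failing test says why it failed. The helper `happy_path_violations` and the expected value 42 are hypothetical and only sketch the idea.

```python
EXPECTED = 42  # assumed success value for this example


def happy_path_violations(prediction):
    """Return the reasons a prediction leaves the happy path (empty list means success)."""
    reasons = []
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(prediction, bool) or not isinstance(prediction, (int, float)):
        reasons.append("prediction is not a number")
    elif prediction != EXPECTED:
        reasons.append(f"expected {EXPECTED}, got {prediction}")
    return reasons


print(happy_path_violations(42))    # no violations
print(happy_path_violations(84))    # wrong value
print(happy_path_violations("42"))  # wrong type
```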

3. Write the code

In this step, the code that leads to the happy path is actually written. This code causes the test to pass. No code beyond the test’s happy path should be provided. For example, if a machine learning model is expected to return 42, one can simply return 42 and force the test to succeed here. If timing constraints need to be simulated, adding sleep(milliseconds) is also acceptable. Such mocked values give engineers visible constraints, so that they can proceed with their tasks as if the model were complete and working.
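
A minimal sketch of such a mock, assuming a hypothetical `MockModel` class with a `latency_ms` parameter (neither name comes from a real library): it returns the agreed value and can simulate how long a real prediction might take.

```python
import time


class MockModel:
    """Stand-in for a model that is not trained yet (hypothetical class)."""

    def __init__(self, latency_ms=0):
        self.latency_ms = latency_ms

    def predict(self, features):
        # Simulate the time a real prediction is expected to take.
        time.sleep(self.latency_ms / 1000)
        # Hard-coded happy-path value: just enough code to make the test pass.
        return 42


def test_mock_model_returns_42():
    model = MockModel(latency_ms=5)
    assert model.predict([0.1, 0.2]) == 42


test_mock_model_returns_42()
```

Engineers who consume the model can code against `MockModel` right away and swap in the real model later without changing the test.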

4. Run tests

Run the whole suite, not only the newest test. Adding new tests should never break the previous ones. Having tests that depend on each other is considered an anti-pattern in software engineering.
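
The sketch below uses Python's standard `unittest` module to run every test in a small suite after a second test has been added. The `predict` function, the test names, and the helper `run_suite` are assumptions for illustration.

```python
import io
import unittest


def predict(features):
    """Mocked prediction that always returns 42 (hypothetical name)."""
    return 42


class PredictTests(unittest.TestCase):
    def setUp(self):
        # A fresh input for every test, so no test relies on another one.
        self.features = [1.0, 2.0, 3.0]

    def test_returns_expected_value(self):
        self.assertEqual(predict(self.features), 42)

    def test_leaves_input_untouched(self):
        # Imagine this test was added later; the earlier one must still pass.
        predict(self.features)
        self.assertEqual(self.features, [1.0, 2.0, 3.0])


def run_suite():
    """Run every test in PredictTests and return (tests_run, all_passed)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictTests)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.testsRun, result.wasSuccessful()


print(run_suite())
```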

5. Add functionality (+ cleanup + refactor)

Once values are mocked, success conditions are defined, and tests are running, it is time to show that the ML model under test actually trains and performs predictions. Continuing the example above, some questions that should be answered in this step are:

Is the test breaking the constraints we set previously?
Is our ML model returning 84 rather than 42?
How about time constraints?
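
The questions above can be sketched as explicit checks against a real, if tiny, model. The example assumes a one-feature least-squares fit, made-up training data, a tolerance of 0.5 around 42, and a one-second time budget; all of these are illustrative choices, not part of the original approach.

```python
import time


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(params, x):
    slope, intercept = params
    return slope * x + intercept


def check_constraints(prediction, elapsed_s, expected=42.0, tolerance=0.5, budget_s=1.0):
    """Answer the step's questions: is the value still right, and is it fast enough?"""
    return {
        "value_ok": abs(prediction - expected) <= tolerance,
        "time_ok": elapsed_s <= budget_s,
    }


xs = [1.0, 2.0, 3.0, 4.0]
ys = [4.1, 7.9, 12.1, 15.9]  # made-up data, roughly y = 4x

start = time.perf_counter()
params = fit_line(xs, ys)
prediction = predict(params, 10.5)
elapsed = time.perf_counter() - start

report = check_constraints(prediction, elapsed)
print(report)
```

A model that returned 84 here would set `value_ok` to `False`, which is exactly the kind of regression this step is meant to surface.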

Traditionally, developers use this step to perform code cleanup, deduplication, and refactoring (wherever it applies) to improve both readability and maintainability. The same strategy should be applied by ML developers too.

Falling into the trap of alternative approaches is easier in machine learning, due to its nature and to the enthusiasm of data scientists who connect, train, and analyze data in no time.

The most common approach in the data science community is probably the Test-Last approach, a.k.a. code now, test later. This approach can be extremely risky in ML model development, since even a trivial linear regression might involve many more moving parts than traditional software (e.g. UI, API calls, data streams, databases, preprocessing steps). As a matter of fact, the Test-First approach encourages and forces developers to put the minimum amount of code into the modules that depend on such moving parts (e.g. UIs and databases) and to implement the logic in the testable section of the codebase.
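
One way to keep logic away from the moving parts is to pass the data source in as a parameter, so that a test can substitute a fake one. In this sketch, `evaluate`, `fake_loader`, and the sample rows are hypothetical names and values chosen for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Pure logic: easy to test, with no database or UI involved."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(load_rows, model):
    """Thin wrapper: the data source is injected, so a test can pass a fake loader."""
    rows = load_rows()
    y_true = [label for _, label in rows]
    y_pred = [model(features) for features, _ in rows]
    return mean_absolute_error(y_true, y_pred)


def fake_loader():
    # Stands in for a database query or a data stream.
    return [([1.0], 40.0), ([2.0], 44.0)]


error = evaluate(fake_loader, lambda features: 42)
print(error)
```

In production, `load_rows` would be the function that talks to the real database; the tested logic stays the same.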

One important pitfall to avoid is developer bias. Tests created in a Test-First environment are usually written by the same developer who writes the code being tested. This can be a problem if, for example, the developer does not think of checking certain input parameters. In that case, neither the test nor the code will verify those parameters. There is a reason why, in traditional software development, testing engineers and developers are usually not the same individuals.
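
A possible mitigation, sketched under the assumption that someone other than the model's author contributes the list, is to keep a shared table of edge-case inputs and run each of them against the model entry point. The function `predict`, the list `EDGE_CASES`, and the chosen cases are hypothetical.

```python
import math


def predict(features):
    """Hypothetical model entry point that validates its input before predicting."""
    if not features:
        raise ValueError("features must not be empty")
    if any(isinstance(x, float) and math.isnan(x) for x in features):
        raise ValueError("features must not contain NaN")
    return 42


# Inputs the original developer might not have thought of checking.
EDGE_CASES = [[], [float("nan")], [1.0, float("nan")]]


def rejected(case):
    """Return True when predict refuses the input with a ValueError."""
    try:
        predict(case)
    except ValueError:
        return True
    return False


print([rejected(case) for case in EDGE_CASES])
```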

TFML anti-patterns

Below are some anti-patterns in TFML.

Test dependence

Tests should be standalone. Tests that depend on others can lead to cascading failures, or cascading successes, that are outside the developer’s control.
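
A small sketch of standalone tests: each test builds its own data through a hypothetical factory `make_training_data`, so the tests give the same result in any order. The test names and values are assumptions for illustration.

```python
def make_training_data():
    """Build a fresh fixture for each test so that no state leaks between tests."""
    return [1.0, 2.0, 3.0]


def test_scaling_gives_unit_maximum():
    data = make_training_data()
    peak = max(data)
    scaled = [x / peak for x in data]
    assert max(scaled) == 1.0


def test_appending_outlier_changes_length():
    data = make_training_data()
    data.append(100.0)  # mutates only this test's private copy
    assert len(data) == 4


def run_in_order(tests):
    """Run the tests in the given order; returns True if none of them raised."""
    for test in tests:
        test()
    return True


# Both orders succeed because neither test relies on the other's side effects.
print(run_in_order([test_scaling_gives_unit_maximum, test_appending_outlier_changes_length]))
print(run_in_order([test_appending_outlier_changes_length, test_scaling_gives_unit_maximum]))
```

Had both tests shared one module-level list, the outlier appended by one test would have changed the maximum seen by the other, and the result would depend on execution order.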

Test model precisely

As in traditional software engineering, testing precise execution behavior, timing, or performance can lead to test failures. In machine learning, it is even more important to consider soft constraints, because models can be probabilistic. Moreover, the ranges of output variables and input data can change. Such dynamic and sometimes loosely defined behavior is the norm rather than the exception in ML.
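
A soft constraint can be sketched as a tolerance band instead of an exact value. The example assumes a hypothetical probabilistic model, `noisy_predict`, that returns 42 plus Gaussian noise, and a band of 0.5 around the target; both are illustrative choices.

```python
import random
import statistics


def noisy_predict(rng):
    """Hypothetical probabilistic model: 42 plus Gaussian noise."""
    return 42 + rng.gauss(0, 1)


def within_band(samples, target=42.0, tolerance=0.5):
    """Soft constraint: the average prediction must fall inside a tolerance band."""
    return abs(statistics.mean(samples) - target) <= tolerance


rng = random.Random(0)  # seeded so the run is repeatable
samples = [noisy_predict(rng) for _ in range(1000)]

print(within_band(samples))
```

An exact check such as `samples[0] == 42` would fail on almost every run, even though the model behaves as intended.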

Test model’s mathematical details

Testing model implementation details, such as statistical and mathematical soundness, is not part of the TFML strategy. Such details should be tested separately and are specific to the family of the model under consideration.

Large testing unit

The testing surface should always be minimal for the functionality under test. Keeping the testing unit small gives more control to the developer. Larger testing units should be broken down into smaller tests, each specialized in one particular aspect of the models under test.
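
As a sketch of keeping units small, the two-stage pipeline below gets one focused test per stage instead of a single test that covers everything. The functions `scale` and `predict` and their behavior are made up for this example.

```python
def scale(features, factor=0.5):
    """Preprocessing stage (hypothetical): multiply every feature by a factor."""
    return [x * factor for x in features]


def predict(scaled):
    """Model stage (hypothetical): sum the scaled features."""
    return sum(scaled)


def test_scale_halves_each_feature():
    # Covers preprocessing only.
    assert scale([2.0, 4.0]) == [1.0, 2.0]


def test_predict_sums_features():
    # Covers the model only, with already-scaled input.
    assert predict([1.0, 2.0]) == 3.0


test_scale_halves_each_feature()
test_predict_sums_features()
```

When one of these tests fails, the failing stage is obvious, which would not be the case for a single end-to-end assertion.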

Conclusion

The TFML approach forces developers to spend time up front defining the testing strategy for their models. This in turn facilitates the integration of such models into the bigger picture of complex engineering systems where larger teams are involved. It has been observed that programmers who write more tests tend to be more productive. Testing code is as important as developing the core functionality of the software, and it should be produced and maintained with the same rigor as production code. In ML all of this becomes even more critical, due to the heterogeneity of the systems and of the people involved in ML projects.

Originally published at https://codingossip.github.io on August 4, 2020.

Translated from: https://medium.com/swlh/test-first-machine-learning-8d2cadc3ffe
