Paper:《Hidden Technical Debt in Machine Learning Systems—机器学习系统中隐藏的技术债》翻译与解读

导读:机器学习系统中,隐藏着多少技术债呢?这篇文章以ML系统的完整流程为案例,深刻剖析了其中潜藏的技术债,并从长期角度探讨如何避免维护成本的上升。文章还强调了一点:模型本身在整个产品链中只占很小的一块(虽然是核心模块)。

目录

《Hidden Technical Debt in Machine Learning Systems》翻译与解读

Abstract

1 Introduction

2 Complex Models Erode Boundaries

3 Data Dependencies Cost More than Code Dependencies

4 Feedback Loops

5 ML-System Anti-Patterns

6 Configuration Debt

7 Dealing with Changes in the External World

8 Other Areas of ML-related Debt

9 Conclusions: Measuring Debt and Paying it Off

Acknowledgments


《Hidden Technical Debt in Machine Learning Systems》翻译与解读

链接 https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
作者 D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips {dsculley,gholt,dgg,edavydov,toddphillips}@google.com Google, Inc. Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison {ebner,vchaudhary,mwyoung,jfcrespo,dennison}@google.com

发布时间

NIPS, 2015年

Abstract

Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.

机器学习为快速构建有用的复杂预测系统提供了一个非常强大的工具包。本篇论文认为,将这些快速的胜利视为免费的,是很危险的。利用技术债的软件工程框架,我们发现在真实世界的ML系统中,经常会产生大量的持续维护成本。我们探讨了在系统设计中需要考虑的几个ML特定的风险因素。这些问题包括边界侵蚀、纠缠、隐藏的反馈循环、未声明的访问者、数据依赖、配置问题、外部世界的更改以及各种系统层面的反模式。

1 Introduction

As the machine learning (ML) community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.

This dichotomy can be understood through the lens of technical debt, a metaphor introduced by Ward Cunningham in 1992 to help reason about the long term costs incurred by moving quickly in software engineering. As with fiscal debt, there are often sound strategic reasons to take on technical debt. Not all debt is bad, but all debt needs to be serviced. Technical debt may be paid down by refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation [8]. The goal is not to add new functionality, but to enable future improvements, reduce errors, and improve maintainability. Deferring such payments results in compounding costs. Hidden debt is dangerous because it compounds silently.

随着机器学习(ML)社区在使用实时系统方面持续积累了多年的经验,出现了一种普遍且令人不安的趋势:开发和部署ML系统相对快速和廉价,但随着时间的推移,维护它们既困难又昂贵

这种二分法可以从技术债的角度来理解,技术债是Ward Cunningham在1992年提出的一个比喻,用来解释软件工程快速发展所产生的长期成本。与财政债务一样,承担技术债务通常有合理的战略原因。并非所有的债务都是坏的,但所有的债务都需要偿还。技术债务可以通过重构代码、改进单元测试、删除无用代码、减少依赖、收紧API和改进文档[8]来偿还。目标不是添加新功能,而是支持未来的改进、减少错误和提高可维护性。推迟支付会导致复利成本。隐性债务之所以危险,是因为它悄无声息地复利

In this paper, we argue that ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level rather than the code level. Traditional abstractions and boundaries may be subtly corrupted or invalidated by the fact that data influences ML system behavior. Typical methods for paying down code level technical debt are not sufficient to address ML-specific technical debt at the system level.

This paper does not offer novel ML algorithms, but instead seeks to increase the community’s awareness of the difficult tradeoffs that must be considered in practice over the long term. We focus on system-level interactions and interfaces as an area where ML technical debt may rapidly accumulate. At a system-level, an ML model may silently erode abstraction boundaries. The tempting re-use or chaining of input signals may unintentionally couple otherwise disjoint systems. ML packages may be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may influence system behavior in unintended ways. Even monitoring ML system behavior may prove difficult without careful design.

在本文中,我们认为ML系统具有招致技术债务的特殊能力,因为它们具有传统代码的所有维护问题以及一组额外的ML特定问题。这种债务可能很难检测,因为它存在于系统级别而不是代码级别。由于数据影响ML系统行为,传统的抽象和边界可能会被微妙地破坏或失效。降低代码级技术债务的典型方法不足以解决系统级特定于ML的技术债务。

本文没有提供新的ML算法,而是试图提高社区对长期实践中必须考虑的困难权衡的认识。我们关注系统级的交互和接口,这是ML技术债可能迅速积累的领域。在系统级别,ML模型可能会悄悄地侵蚀抽象边界。对输入信号诱人的复用或串联,可能在无意中把原本互不相干的系统耦合在一起。ML包可能被视为黑盒,导致大量的“胶水代码”或校准层,并将各种假设固化其中。外部世界的变化可能会以意想不到的方式影响系统行为。如果没有精心的设计,甚至连监控ML系统的行为都可能变得困难。

2 Complex Models Erode Boundaries

Traditional software engineering practice has shown that strong abstraction boundaries using encapsulation and modular design help create maintainable code in which it is easy to make isolated changes and improvements. Strict abstraction boundaries help express the invariants and logical consistency of the information inputs and outputs from a given component [8].

Unfortunately, it is difficult to enforce strict abstraction boundaries for machine learning systems by prescribing specific intended behavior. Indeed, ML is required in exactly those cases when the desired behavior cannot be effectively expressed in software logic without dependency on external data. The real world does not fit into tidy encapsulation. Here we examine several ways that the resulting erosion of boundaries may significantly increase technical debt in ML systems.

传统的软件工程实践表明,使用封装和模块化设计的强抽象边界有助于创建可维护的代码,在这些代码中可以很容易地进行独立的更改和改进。严格的抽象边界有助于表达来自给定组件[8]的信息输入和输出的不变性和逻辑一致性。

不幸的是,很难通过规定特定的预期行为来对机器学习系统实施严格的抽象边界。实际上,当不依赖于外部数据而无法用软件逻辑有效地表达所需的行为时,ML正是需要的。现实世界并不适合整洁的封装。在这里,我们研究了几种导致边界侵蚀的方法,这些方法可能会显著增加ML系统中的技术债务。

Entanglement. Machine learning systems mix signals together, entangling them and making isolation of improvements impossible. For instance, consider a system that uses features x1, ...xn in a model. If we change the input distribution of values in x1, the importance, weights, or use of the remaining n − 1 features may all change. This is true whether the model is retrained fully in a batch style or allowed to adapt in an online fashion. Adding a new feature xn+1 can cause similar changes, as can removing any feature xj . No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything. CACE applies not only to input signals, but also to hyper-parameters, learning settings, sampling methods, convergence thresholds, data selection, and essentially every other possible tweak.

One possible mitigation strategy is to isolate models and serve ensembles. This approach is useful in situations in which sub-problems decompose naturally such as in disjoint multi-class settings like [14]. However, in many cases ensembles work well because the errors in the component models are uncorrelated. Relying on the combination creates a strong entanglement: improving an individual component model may actually make the system accuracy worse if the remaining errors are more strongly correlated with the other components.

A second possible strategy is to focus on detecting changes in prediction behavior as they occur. One such method was proposed in [12], in which a high-dimensional visualization tool was used to allow researchers to quickly see effects across many dimensions and slicings. Metrics that operate on a slice-by-slice basis may also be extremely useful.

纠缠。机器学习系统将信号混合在一起,将它们纠缠在一起,使得孤立的改进变得不可能。例如,考虑一个在模型中使用特征 x1, ...xn 的系统。如果我们改变 x1 中值的输入分布,其余 n-1 个特征的重要性、权重或使用都可能发生变化。无论模型以批处理方式完全重新训练,还是允许以在线方式进行调整,都是如此。添加新特征 xn+1 会导致类似的变化,删除任何特征 xj 也一样。没有任何输入是真正独立的。我们在此将其称为CACE原则:改变任何事物都会改变一切。CACE不仅适用于输入信号,还适用于超参数、学习设置、采样方法、收敛阈值、数据选择,以及基本上所有其他可能的调整。

一种可能的缓解策略是隔离各个模型,并以集成(ensemble)的方式提供服务。这种方法在子问题可以自然分解的情况下非常有用,例如[14]中那样不相交的多类设置。然而,在许多情况下,集成之所以有效,是因为各组成模型的误差互不相关。依赖这种组合本身就会产生一种强纠缠:如果改进后残余的误差与其他组件的误差相关性更强,那么改进单个组件模型实际上可能会使系统整体精度变差。

第二种可能的策略是,专注于检测预测行为发生的变化。其中一种方法是在[12]中提出的,在该方法中,使用了高维可视化工具,以便研究人员快速地看到跨多维和切片的效果。在逐个切片的基础上运行的指标也可能非常有用。
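前述CACE原则可以用一个很小的数值实验直观演示。以下只是一个示意性草图(假设环境中可用numpy;数据与正则化强度均为虚构设定):在带L2正则的线性模型中,仅仅改变x1的尺度,与之相关的x2学到的权重也会随之改变。

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)   # x2与x1相关
y = x1 + x2 + 0.1 * rng.normal(size=n)

def ridge(X, y, lam=100.0):
    """岭回归闭式解:w = (X^T X + lam*I)^{-1} X^T y"""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_before = ridge(np.column_stack([x1, x2]), y)
w_after = ridge(np.column_stack([10.0 * x1, x2]), y)   # 只改变x1的输入分布(尺度)

# x2本身没有任何变化,但它学到的权重已经明显不同
print(w_before[1], w_after[1])
```

值得一提的是,在无正则的普通最小二乘中,单纯缩放某一列只会等比缩放该列自身的权重;而一旦引入正则化、在线更新或非线性结构,这种牵一发而动全身的效应便普遍存在。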

Correction Cascades. There are often situations in which a model ma for problem A exists, but a solution for a slightly different problem A′ is required. In this case, it can be tempting to learn a model m′a that takes ma as input and learns a small correction as a fast way to solve the problem.

However, this correction model has created a new system dependency on ma, making it significantly more expensive to analyze improvements to that model in the future. The cost increases when correction models are cascaded, with a model for problem A′′ learned on top of m′a, and so on, for several slightly different test distributions. Once in place, a correction cascade can create an improvement deadlock, as improving the accuracy of any individual component actually leads to system-level detriments. Mitigation strategies are to augment ma to learn the corrections directly within the same model by adding features to distinguish among the cases, or to accept the cost of creating a separate model for A′.

校正级联。经常存在这样的情况:针对问题A的模型ma已经存在,但需要的是一个略有不同的问题A′的解决方案。在这种情况下,一个很有诱惑力的做法是学习一个以ma的输出为输入、只学习一个小修正的模型m′a,作为快速解决问题的途径。

然而,这种修正模型对ma产生了新的系统依赖,使得将来分析对ma的改进变得昂贵得多。当校正模型层层级联时,成本还会增加:在m′a之上再为问题A′′学习一个模型,依此类推,以适配若干略有不同的测试分布。校正级联一旦就位,就会造成改进僵局,因为提高任何单个组件的准确性反而会导致系统级的损害。缓解策略是:通过添加用于区分不同情况的特征来增强ma,使其直接在同一模型内学习这些修正;或者接受为A′单独创建一个模型的成本。
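校正级联的形成过程可以用如下极简草图示意(ma、目标偏移等均为虚构的玩具设定):m′a以ma的输出为输入,只学习一个小的修正,从而对ma形成新的系统依赖。

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=500)

def m_a(x):
    """问题A的既有模型(假设早已训练并部署)"""
    return 2.0 * x

# 问题A′:同样的任务,但目标整体偏移了0.5
y_prime = 2.0 * x + 0.5

# 诱人的捷径:只在ma的输出之上学习残差修正(这里退化为一个常数偏移)
correction = float(np.mean(y_prime - m_a(x)))

def m_prime_a(x):
    # 新的系统依赖:一旦ma被改动,m′a会在无人察觉的情况下失效
    return m_a(x) + correction

print(round(correction, 6))  # 0.5
```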

Undeclared Consumers. Oftentimes, a prediction from a machine learning model ma is made widely accessible, either at runtime or by writing to files or logs that may later be consumed by other systems. Without access controls, some of these consumers may be undeclared, silently using the output of a given model as an input to another system. In more classical software engineering, these issues are referred to as visibility debt [13].

Undeclared consumers are expensive at best and dangerous at worst, because they create a hidden tight coupling of model ma to other parts of the stack. Changes to ma will very likely impact these other parts, potentially in ways that are unintended, poorly understood, and detrimental. In practice, this tight coupling can radically increase the cost and difficulty of making any changes to ma at all, even if they are improvements. Furthermore, undeclared consumers may create hidden feedback loops, which are described more in detail in section 4.

Undeclared consumers may be difficult to detect unless the system is specifically designed to guard against this case, for example with access restrictions or strict service-level agreements (SLAs). In the absence of barriers, engineers will naturally use the most convenient signal at hand, especially when working against deadline pressures.

未声明的访问者。通常情况下,机器学习模型ma的预测可以被广泛访问,无论是在运行时,还是通过写入文件或日志(这些文件或日志稍后可能会被其他系统使用)。在没有访问控制的情况下,这些访问者中的一些可能是未声明的,默默地把给定模型的输出用作另一个系统的输入。在更经典的软件工程中,这些问题被称为可见性债务[13]。

未声明的访问者在最好的情况下是昂贵的,在最坏的情况下是危险的,因为它们创建了模型ma与堆栈的其他部分的一个隐藏的紧密耦合。ma 的变化很可能会影响这些其他部分,可能会以意想不到的、知之甚少和有害的方式影响。在实践中,这种紧密耦合会从根本上增加对ma进行任何更改的成本和难度,即使这些更改是改进。此外,未声明的访问者可能会创建隐藏的反馈循环,这将在第4节中详细描述。

未声明的访问者可能很难检测到,除非系统经过专门设计来防范这种情况,例如通过访问限制或严格的服务水平协议(SLA)。在没有这些屏障的情况下,工程师自然会使用手头最方便的信号,尤其是在面临截止日期压力时。
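防范未声明访问者的基本思路,可以用下面的小草图示意(消费方名单与函数名均为虚构):只有事先登记过的消费方才能取得模型输出,从而保证模型的每个下游都是已知的。

```python
DECLARED_CONSUMERS = {"ads_ranking", "spam_filter"}  # 虚构的已声明消费方名单

def serve_prediction(consumer: str, score: float) -> float:
    """只向已声明的消费方提供模型输出"""
    if consumer not in DECLARED_CONSUMERS:
        raise PermissionError(f"undeclared consumer: {consumer}")
    return score

print(serve_prediction("spam_filter", 0.9))  # 0.9
try:
    serve_prediction("some_new_dashboard", 0.9)   # 未声明的访问者被直接拦截
except PermissionError as e:
    print(e)
```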

3 Data Dependencies Cost More than Code Dependencies

In [13], dependency debt is noted as a key contributor to code complexity and technical debt in classical software engineering settings. We have found that data dependencies in ML systems carry a similar capacity for building debt, but may be more difficult to detect. Code dependencies can be identified via static analysis by compilers and linkers. Without similar tooling for data dependencies, it can be inappropriately easy to build large data dependency chains that can be difficult to untangle.

在[13]中,依赖债被认为是经典软件工程环境中导致代码复杂性和技术债的关键因素。我们发现ML系统中的数据依赖具有类似的累积债务的能力,但可能更难检测。代码依赖可以通过编译器和链接器的静态分析来识别。如果没有类似的数据依赖分析工具,就会不恰当地容易构建出难以理清的大型数据依赖链。

Unstable Data Dependencies. To move quickly, it is often convenient to consume signals as input features that are produced by other systems. However, some input signals are unstable, meaning that they qualitatively or quantitatively change behavior over time. This can happen implicitly, when the input signal comes from another machine learning model itself that updates over time, or a data-dependent lookup table, such as for computing TF/IDF scores or semantic mappings. It can also happen explicitly, when the engineering ownership of the input signal is separate from the engineering ownership of the model that consumes it. In such cases, updates to the input signal may be made at any time. This is dangerous because even “improvements” to input signals may have arbitrary detrimental effects in the consuming system that are costly to diagnose and address. For example, consider the case in which an input signal was previously mis-calibrated. The model consuming it likely fit to these mis-calibrations, and a silent update that corrects the signal will have sudden ramifications for the model.

One common mitigation strategy for unstable data dependencies is to create a versioned copy of a given signal. For example, rather than allowing a semantic mapping of words to topic clusters to change over time, it might be reasonable to create a frozen version of this mapping and use it until such a time as an updated version has been fully vetted. Versioning carries its own costs, however, such as potential staleness and the cost to maintain multiple versions of the same signal over time.

不稳定数据依赖。为了快速推进,把其他系统产生的信号直接当作输入特征来消费往往很方便。然而,一些输入信号是不稳定的,这意味着它们的行为会随着时间的推移发生定性或定量的变化。这种情况可能隐式发生:输入信号来自另一个会随时间更新的机器学习模型,或来自依赖数据的查找表(如用于计算TF/IDF分数或语义映射的表)。它也可能显式发生:输入信号的工程归属与消费该信号的模型的工程归属相互分离,此时输入信号可能在任何时候被更新。这是危险的,因为即使是对输入信号的“改进”,也可能对消费系统产生任意的不利影响,而诊断和解决这些影响的成本很高。例如,考虑输入信号之前被错误校准的情况:消费该信号的模型很可能已经拟合了这些错误校准,因此一次悄无声息的信号修正更新,会给模型带来突如其来的影响。

对于不稳定的数据依赖关系,一种常见的缓解策略是创建给定信号的版本控制副本。例如,与其允许词汇到主题集群的语义映射随着时间的推移而改变,不如创建此映射的冻结版本,并在完全审查更新版本之前使用它。然而,版本控制也有它自己的成本,比如潜在的过时性,以及随着时间的推移维护同一个信号的多个版本的成本。
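“冻结版本”的缓解策略可以用如下草图示意(映射内容与版本名均为虚构):消费方显式固定在某个版本上,上游发布新版本不会悄悄改变模型的输入。

```python
# 示意:为不稳定的输入信号建立带版本的副本,由消费方显式固定版本
SIGNAL_VERSIONS = {
    "topic_map_v1": {"goalkeeper": "sports", "senate": "politics"},
    "topic_map_v2": {"goalkeeper": "sports", "senate": "politics",
                     "transformer": "machine_learning"},
}

class VersionedSignal:
    """通过显式固定的版本来解析信号"""
    def __init__(self, name: str, version: str):
        self.mapping = SIGNAL_VERSIONS[f"{name}_{version}"]

    def lookup(self, word: str, default: str = "unknown") -> str:
        return self.mapping.get(word, default)

# 模型固定使用v1:发布v2不会在无人察觉的情况下改变模型输入
signal = VersionedSignal("topic_map", "v1")
print(signal.lookup("transformer"))  # unknown,直到v2通过审查并被显式切换
```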

Underutilized Data Dependencies. In code, underutilized dependencies are packages that are mostly unneeded [13]. Similarly, underutilized data dependencies are input signals that provide little incremental modeling benefit. These can make an ML system unnecessarily vulnerable to change, sometimes catastrophically so, even though they could be removed with no detriment.

As an example, suppose that to ease the transition from an old product numbering scheme to new product numbers, both schemes are left in the system as features. New products get only a new number, but old products may have both and the model continues to rely on the old numbers for some products. A year later, the code that stops populating the database with the old numbers is deleted. This will not be a good day for the maintainers of the ML system.

Underutilized data dependencies can creep into a model in several ways.

(1)、Legacy Features. The most common case is that a feature F is included in a model early in its development. Over time, F is made redundant by new features but this goes undetected.

(2)、Bundled Features. Sometimes, a group of features is evaluated and found to be beneficial. Because of deadline pressures or similar effects, all the features in the bundle are added to the model together, possibly including features that add little or no value.

(3)、ε-Features. As machine learning researchers, it is tempting to improve model accuracy even when the accuracy gain is very small or when the complexity overhead might be high.

(4)、Correlated Features. Often two features are strongly correlated, but one is more directly causal. Many ML methods have difficulty detecting this and credit the two features equally, or may even pick the non-causal one. This results in brittleness if world behavior later changes the correlations.

Underutilized dependencies can be detected via exhaustive leave-one-feature-out evaluations. These should be run regularly to identify and remove unnecessary features.

未充分利用的数据依赖。在代码中,未被充分利用的依赖项是那些基本不需要的包[13]。类似地,未充分利用的数据依赖是那些只能带来很少增量建模收益的输入信号。即使移除它们不会带来任何损害,它们也会使ML系统不必要地容易受到变化的影响,有时甚至是灾难性的影响。

例如,假设为了简化从旧产品编号方案到新产品编号的转换,两个方案都作为特性保留在系统中。新产品只有一个新编号,但旧产品可能两者都有,并且对于某些产品,模型继续依赖于旧编号。一年后,停止用旧号码填充数据库的代码被删除。对于ML系统的维护者来说,这将不是一个好日子。

未充分利用的数据依赖关系可以通过多种方式渗透到模型中。

(1)、遗留特征。最常见的情况是,特征F在模型开发的早期就被包含进来。随着时间的推移,新特征使F变得多余,但这一点一直没有被发现。

(2)、捆绑特征。有时,一组特征经过评估后被发现是有益的。由于截止日期的压力或类似的原因,捆绑包中的所有特征会被一起添加到模型中,其中可能包括价值很小甚至没有价值的特征。

(3)、ε-特征。作为机器学习研究人员,即使精度增益非常小、或复杂性开销可能很高,提高模型精度也是诱人的。

(4)、关联特征。通常两个特征是强相关的,但其中一个具有更直接的因果关系。许多ML方法很难检测到这一点,会对这两个特征给予同等的权重,甚至可能选中非因果的那一个。如果外部世界的行为后来改变了二者的相关性,这会导致系统的脆弱性。

可以通过详尽的“留一特征”(leave-one-feature-out)评估来检测未充分利用的依赖。这类评估应该定期运行,以识别并删除不必要的特征。
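上述“留一特征”评估大致可以这样实现(示意性草图,假设numpy可用;数据为虚构的合成数据,其中特征2与目标无关):逐个去掉特征并重新训练,若留出集上的指标几乎不变,该特征即为疑似未被充分利用的依赖。

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))            # 特征2对y没有贡献,是纯噪声
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=n)

def cv_mse(X, y):
    """前一半数据训练,后一半数据上评估MSE"""
    h = len(y) // 2
    w, *_ = np.linalg.lstsq(X[:h], y[:h], rcond=None)
    return float(np.mean((X[h:] @ w - y[h:]) ** 2))

baseline = cv_mse(X, y)
for j in range(X.shape[1]):
    delta = cv_mse(np.delete(X, j, axis=1), y) - baseline
    if delta < 1e-3:                   # 容忍度的取舍需要结合业务判断
        print(f"特征{j}疑似未被充分利用 (ΔMSE={delta:.5f})")
```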

Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.

图1:只有一小部分真实世界的ML系统由ML代码组成,如中间的小黑盒子所示。所需的周边基础设施庞大而复杂。

Static Analysis of Data Dependencies. In traditional code, compilers and build systems perform static analysis of dependency graphs. Tools for static analysis of data dependencies are far less common, but are essential for error checking, tracking down consumers, and enforcing migration and updates. One such tool is the automated feature management system described in [12], which enables data sources and features to be annotated. Automated checks can then be run to ensure that all dependencies have the appropriate annotations, and dependency trees can be fully resolved. This kind of tooling can make migration and deletion much safer in practice.

数据依赖的静态分析。在传统的代码中,编译器和构建系统执行依赖图的静态分析。用于数据依赖性静态分析的工具并不常见,但对于错误检查、追踪访问者以及强制迁移和更新至关重要。其中一个工具是[12]中描述的自动化特性管理系统,它支持对数据源和特性进行注释。然后可以运行自动检查,以确保所有依赖项都有适当的注释,并且可以完全解析依赖项树。这种工具可以使迁移和删除在实践中更加安全
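这类特征管理工具的核心思想,可以用一个极简的注解式注册表来示意(数据结构与字段名均为虚构,并非[12]中系统的真实API):为每个数据源和特征标注所有者与依赖,然后像解析构建图一样解析依赖树、追踪消费方。

```python
# 虚构的特征注册表:每个特征标注所有者(owner)与直接依赖(deps)
FEATURES = {
    "raw_clicks":  {"owner": "logs-team", "deps": []},
    "click_rate":  {"owner": "ranking",   "deps": ["raw_clicks"]},
    "final_score": {"owner": "ranking",   "deps": ["click_rate"]},
}

def transitive_deps(name, seen=None):
    """完整解析某个特征的依赖树(传递闭包)"""
    seen = set() if seen is None else seen
    for dep in FEATURES[name]["deps"]:
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, seen)
    return seen

def consumers_of(name):
    """追踪所有(直接或间接)消费该信号的特征,便于安全地迁移和删除"""
    return {f for f in FEATURES if name in transitive_deps(f)}

print(sorted(consumers_of("raw_clicks")))  # ['click_rate', 'final_score']
```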

4 Feedback Loops

One of the key features of live ML systems is that they often end up influencing their own behavior if they update over time. This leads to a form of analysis debt, in which it is difficult to predict the behavior of a given model before it is released. These feedback loops can take different forms, but they are all more difficult to detect and address if they occur gradually over time, as may be the case when models are updated infrequently.

实时ML系统的关键特征之一是,如果它们随着时间的推移进行更新,它们通常最终会影响自己的行为。这导致了一种形式的分析债务,其中很难在发布之前预测给定模型的行为。这些反馈循环可以采取不同的形式,但如果它们随着时间的推移逐渐发生,则它们都更难以检测和解决,例如模型更新不频繁时的情况。

Direct Feedback Loops. A model may directly influence the selection of its own future training data. It is common practice to use standard supervised algorithms, although the theoretically correct solution would be to use bandit algorithms. The problem here is that bandit algorithms (such as contextual bandits [9]) do not necessarily scale well to the size of action spaces typically required for real-world problems. It is possible to mitigate these effects by using some amount of randomization [3], or by isolating certain parts of data from being influenced by a given model.

直接的反馈循环。一个模型可能直接影响它自己未来训练数据的选择。通常的做法是使用标准的监督算法,尽管理论上正确的解决方案是使用老虎机算法。这里的问题是,老虎机算法(如上下文老虎机[9])不一定能很好地适应现实世界问题通常需要的行动空间大小。可以通过使用一定数量的随机化[3],或通过隔离特定部分的数据不受给定模型的影响来减轻这些影响。
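文中提到的“一定数量的随机化”[3],常见的一种形式是ε-greedy式的探索(以下为示意草图,动作与打分均为虚构):以小概率提供随机动作,并记录该次决策是否来自探索,使未来的训练数据不完全由当前模型自己挑选。

```python
import random

def epsilon_greedy(scores: dict, epsilon: float = 0.1, rng=random):
    """scores: 动作 -> 模型打分;以概率epsilon随机探索"""
    if rng.random() < epsilon:
        return rng.choice(sorted(scores)), True    # 探索:随机动作
    return max(scores, key=scores.get), False      # 利用:模型首选动作

random.seed(0)
picks = [epsilon_greedy({"a": 0.9, "b": 0.2, "c": 0.5}) for _ in range(1000)]
explored = sum(1 for _, e in picks if e)
print(explored)  # 约有10%的流量被用于探索
```

把“是否来自探索”记入日志,训练时就可以对这部分数据单独加权或隔离,这正是“隔离特定部分的数据不受给定模型影响”的一种具体做法。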

Hidden Feedback Loops. Direct feedback loops are costly to analyze, but at least they pose a statistical challenge that ML researchers may find natural to investigate [3]. A more difficult case is hidden feedback loops, in which two systems influence each other indirectly through the world.

One example of this may be if two systems independently determine facets of a web page, such as one selecting products to show and another selecting related reviews. Improving one system may lead to changes in behavior in the other, as users begin clicking more or less on the other components in reaction to the changes. Note that these hidden loops may exist between completely disjoint systems. Consider the case of two stock-market prediction models from two different investment companies. Improvements (or, more scarily, bugs) in one may influence the bidding and buying behavior of the other.

隐藏的反馈循环。直接反馈循环的分析成本很高,但至少它们提出了一个统计挑战,ML 研究人员可能会发现研究 [3] 是很自然的。一个更困难的情况是隐藏的反馈回路,其中两个系统通过世界间接相互影响。

一个例子是:两个系统各自独立地决定一个网页的不同板块,比如一个选择要展示的产品,另一个选择相关的评论。改进其中一个系统可能会导致另一个系统的行为发生变化,因为用户会以或多或少的点击来回应这些变化。注意,这些隐藏循环可能存在于完全不相连的系统之间。考虑来自两家不同投资公司的两个股市预测模型:其中一个的改进(或者更可怕的,bug)可能会影响另一个的出价和购买行为。

5 ML-System Anti-Patterns

It may be surprising to the academic community to know that only a tiny fraction of the code in many ML systems is actually devoted to learning or prediction – see Figure 1. In the language of Lin and Ryaboy, much of the remainder may be described as “plumbing” [11].

It is unfortunately common for systems that incorporate machine learning methods to end up with high-debt design patterns. In this section, we examine several system-design anti-patterns [4] that can surface in machine learning systems and which should be avoided or refactored where possible.

学术界可能会惊讶地发现,在许多ML系统中,只有一小部分代码实际上用于学习或预测——参见图1。在 Lin 和 Ryaboy 的语言中,其余大部分可以描述为“管道”[11]。

不幸的是,采用机器学习方法的系统最终落入高债务设计模式的情况十分常见。在本节中,我们将考察机器学习系统中可能出现的几种系统设计反模式[4],它们应该尽可能地被避免或重构。

Glue Code. ML researchers tend to develop general purpose solutions as self-contained packages. A wide variety of these are available as open-source packages at places like mloss.org, or from in-house code, proprietary packages, and cloud-based platforms.

Using generic packages often results in a glue code system design pattern, in which a massive amount of supporting code is written to get data into and out of general-purpose packages. Glue code is costly in the long term because it tends to freeze a system to the peculiarities of a specific package; testing alternatives may become prohibitively expensive. In this way, using a generic package can inhibit improvements, because it makes it harder to take advantage of domain-specific properties or to tweak the objective function to achieve a domain-specific goal. Because a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, it may be less costly to create a clean native solution rather than re-use a generic package.

An important strategy for combating glue-code is to wrap black-box packages into common API’s. This allows supporting infrastructure to be more reusable and reduces the cost of changing packages.

胶水代码。ML研究人员倾向于将通用解决方案开发为自包含的包。在mloss.org这样的网站上可以找到各种各样的开源包,也可以从内部代码、专有包和基于云的平台上获得。

使用通用包通常会导致胶水代码的系统设计模式,即为了把数据传入和传出通用包而编写大量支持代码。从长远来看,胶水代码的成本很高,因为它倾向于把系统冻结在特定包的特性上,测试替代方案可能变得极其昂贵。这样一来,使用通用包会抑制改进,因为它使得利用特定领域的属性、或为实现特定领域目标而调整目标函数都变得更加困难。由于一个成熟的系统最终可能是(最多)5%的机器学习代码和(至少)95%的胶水代码,创建一个干净的原生解决方案可能比重用通用包的成本更低。

对付胶水代码的一个重要策略是将黑盒包封装到通用API之后。这使得支持性基础设施更具可重用性,并降低了更换包的成本。
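“将黑盒包封装到通用API之后”大致如下面的草图所示(Model接口与PackageAModel均为虚构,用来代表任意第三方包):胶水代码被集中到适配器一处,更换底层包时周边基础设施无需改动。

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """系统其余部分只面向这个通用接口编程"""
    @abstractmethod
    def fit(self, X, y): ...
    @abstractmethod
    def predict(self, X): ...

class PackageAModel:
    """模拟一个有自己独特API的第三方黑盒包"""
    def train_on(self, rows, labels):
        self.mean = sum(labels) / len(labels)
    def score_rows(self, rows):
        return [self.mean for _ in rows]

class PackageAWrapper(Model):
    """适配器:胶水代码集中在这一处,而不是散落各地"""
    def __init__(self):
        self._impl = PackageAModel()
    def fit(self, X, y):
        self._impl.train_on(X, y)
    def predict(self, X):
        return self._impl.score_rows(X)

model: Model = PackageAWrapper()
model.fit([[1], [2]], [0.0, 1.0])
print(model.predict([[3]]))  # [0.5]
```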

Pipeline Jungles. As a special case of glue code, pipeline jungles often appear in data preparation. These can evolve organically, as new signals are identified and new information sources added incrementally. Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output. Managing these pipelines, detecting errors and recovering from failures are all difficult and costly [1]. Testing such pipelines often requires expensive end-to-end integration tests. All of this adds to technical debt of a system and makes further innovation more costly.

Pipeline jungles can only be avoided by thinking holistically about data collection and feature ex-traction. The clean-slate approach of scrapping a pipeline jungle and redesigning from the ground up is indeed a major investment of engineering effort, but one that can dramatically reduce ongoing costs and speed further innovation.

管道丛林。作为胶水代码的一种特例,管道丛林常常出现在数据准备中。随着新信号被识别、新信息源被逐步加入,这些管道会“有机地”生长。一不小心,用于把数据整理成ML友好格式的系统就可能变成一个由数据抓取、连接和采样步骤组成的丛林,其间常常还夹杂着中间文件的输出。管理这些管道、检测错误以及从故障中恢复,都非常困难且代价高昂[1]。测试这类管道通常需要昂贵的端到端集成测试。所有这些都增加了系统的技术债,并使进一步创新的成本更高。

只有从整体上考虑数据收集和特征提取,才能避免管道丛林。废弃管道丛林、从零开始重新设计的“推倒重来”方式,确实是一项重大的工程投入,但它可以显著降低持续成本,并加速进一步的创新。

Glue code and pipeline jungles are symptomatic of integration issues that may have a root cause in overly separated “research” and “engineering” roles. When ML packages are developed in an ivory-tower setting, the result may appear like black boxes to the teams that employ them in practice. A hybrid research approach where engineers and researchers are embedded together on the same teams (and indeed, are often the same people) can help reduce this source of friction significantly [16].

胶水代码和管道丛林是集成问题的症状,其根本原因可能在于“研究”和“工程”角色的过度分离。当ML包在象牙塔环境中开发时,对实际使用它们的团队来说,其结果可能就像黑盒。一种混合的研究方式,即把工程师和研究人员嵌入同一个团队(实际上,常常就是同一批人),可以显著减少这一摩擦来源[16]。

Dead Experimental Codepaths. A common consequence of glue code or pipeline jungles is that it becomes increasingly attractive in the short term to perform experiments with alternative methods by implementing experimental codepaths as conditional branches within the main production code. For any individual change, the cost of experimenting in this manner is relatively low—none of the surrounding infrastructure needs to be reworked. However, over time, these accumulated codepaths can create a growing debt due to the increasing difficulties of maintaining backward compatibility and an exponential increase in cyclomatic complexity. Testing all possible interactions between codepaths becomes difficult or impossible. A famous example of the dangers here was Knight Capital’s system losing $465 million in 45 minutes, apparently because of unexpected behavior from obsolete experimental codepaths [15].

As with the case of dead flags in traditional software [13], it is often beneficial to periodically re-examine each experimental branch to see what can be ripped out. Often only a small subset of the possible branches is actually used; many others may have been tested once and abandoned.

死实验代码路径。粘合代码或管道丛林的一个常见后果是,通过在主要生产代码中实现实验代码路径作为条件分支,在短期内使用替代方法进行实验变得越来越有吸引力。对于任何单独的更改,以这种方式进行试验的成本相对较低——周围的基础设施都不需要重新设计。然而,随着时间的推移,由于保持向后兼容性的难度越来越大,圈复杂度呈指数级增长,这些累积的代码路径可能会导致债务不断增加。测试代码路径之间所有可能的交互变得困难或不可能。关于这种危险的一个著名例子是,骑士资本(Knight Capital)的系统在45分钟内损失了4.65亿美元,显然是因为过时的实验代码路径[15]的意外行为。

与传统软件[13]中的死标志的情况一样,定期重新检查每个实验分支以查看可以删除什么内容通常是有益的。通常只有可能分支的一小部分被实际使用;许多其他的可能已经被测试过一次并被抛弃了。

Abstraction Debt. The above issues highlight the fact that there is a distinct lack of strong abstractions to support ML systems. Zheng recently made a compelling comparison of the state of ML abstractions to the state of database technology [17], making the point that nothing in the machine learning literature comes close to the success of the relational database as a basic abstraction. What is the right interface to describe a stream of data, or a model, or a prediction?

For distributed learning in particular, there remains a lack of widely accepted abstractions. It could be argued that the widespread use of Map-Reduce in machine learning was driven by the void of strong distributed learning abstractions. Indeed, one of the few areas of broad agreement in recent years appears to be that Map-Reduce is a poor abstraction for iterative ML algorithms.

The parameter-server abstraction seems much more robust, but there are multiple competing specifications of this basic idea [5, 10]. The lack of standard abstractions makes it all too easy to blur the lines between components.

抽象债。上述问题凸显了一个事实:明显缺乏支持ML系统的强抽象。Zheng最近将ML抽象的现状与数据库技术的现状做了一个令人信服的比较[17],指出机器学习文献中没有任何东西能与关系数据库作为基本抽象的成功相提并论。描述数据流、模型或预测的正确接口究竟是什么?

特别是对于分布式学习,仍然缺乏被广泛接受的抽象。可以这样说,Map-Reduce在机器学习中的广泛使用是由于缺乏强大的分布式学习抽象。事实上,近年来得到广泛认同的少数领域之一似乎是Map-Reduce是迭代ML算法的一个糟糕抽象

参数服务器抽象似乎更健壮,但这个基本思想有多个相互竞争的规范[5,10]。缺乏标准的抽象使得组件之间的界限变得很容易模糊。

Common Smells. In software engineering, a design smell may indicate an underlying problem in a component or system [7]. We identify a few ML system smells; these are not hard-and-fast rules, but subjective indicators.

(1)、Plain-Old-Data Type Smell. The rich information used and produced by ML systems is all too often encoded with plain data types like raw floats and integers. In a robust system, a model parameter should know if it is a log-odds multiplier or a decision threshold, and a prediction should know various pieces of information about the model that produced it and how it should be consumed.

(2)、Multiple-Language Smell. It is often tempting to write a particular piece of a system in a given language, especially when that language has a convenient library or syntax for the task at hand. However, using multiple languages often increases the cost of effective testing and can increase the difficulty of transferring ownership to other individuals.

(3)、Prototype Smell. It is convenient to test new ideas in small scale via prototypes. However, regularly relying on a prototyping environment may be an indicator that the full-scale system is brittle, difficult to change, or could benefit from improved abstractions and interfaces. Maintaining a prototyping environment carries its own cost, and there is a significant danger that time pressures may encourage a prototyping system to be used as a production solution. Additionally, results found at small scale rarely reflect the reality at full scale.

常见的气味。在软件工程中,设计气味可能表明组件或系统[7]中的潜在问题。我们识别一些ML系统气味,不是硬性规则,而是作为主观指标。

(1)Plain-Old-Data Type Smell。ML系统使用和产生的丰富信息通常都是用普通数据类型(如原始浮点数和整数)编码的。在稳健的系统中,模型参数应该知道它是log-odds乘数还是决策阈值,而预测应该知道关于生成它的模型的各种信息,以及应该如何使用这些信息。

(2)多语言气味。用给定的语言编写系统的特定部分通常是很诱人的,特别是当该语言具有方便的库或语法用于手头的任务时。然而,使用多种语言通常会增加有效测试的成本,并增加将所有权转移给其他人的难度。

(3)原型气味。通过原型在小范围内测试新想法是很方便的。然而,经常依赖原型环境可能是一个指标,表明完整的系统是脆弱的,难以更改的,或者可以从改进的抽象和接口中受益。维护原型环境有其自身的成本,并且存在一个重要的危险,即时间压力可能会促使原型系统被用作生产解决方案。此外,在小范围内发现的结果很少反映全面范围内的现实情况。
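针对上面的Plain-Old-Data气味,一种改进方向是让预测值携带自身的来源与语义,而不是一个裸浮点数(以下草图中的字段与取值均为示意性假设):

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    value: float
    model_id: str     # 由哪个模型产生
    kind: str         # 例如 "probability" 或 "log_odds"

    def as_probability(self) -> float:
        """预测值知道自己应如何被消费"""
        if self.kind == "probability":
            return self.value
        if self.kind == "log_odds":
            return 1.0 / (1.0 + math.exp(-self.value))
        raise ValueError(f"无法解释的预测类型: {self.kind!r}")

p = Prediction(value=0.0, model_id="spam_v3", kind="log_odds")
print(p.as_probability())  # 0.5
```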

6 Configuration Debt

Another potentially surprising area where debt can accumulate is in the configuration of machine learning systems. Any large system has a wide range of configurable options, including which features are used, how data is selected, a wide variety of algorithm-specific learning settings, potential pre- or post-processing, verification methods, etc. We have observed that both researchers and engineers may treat configuration (and extension of configuration) as an afterthought. Indeed, verification or testing of configurations may not even be seen as important. In a mature system which is being actively developed, the number of lines of configuration can far exceed the number of lines of the traditional code. Each configuration line has a potential for mistakes.

Consider the following examples. Feature A was incorrectly logged from 9/14 to 9/17. Feature B is not available on data before 10/7. The code used to compute feature C has to change for data before and after 11/1 because of changes to the logging format. Feature D is not available in production, so substitute features D′ and D′′ must be used when querying the model in a live setting. If feature Z is used, then jobs for training must be given extra memory due to lookup tables or they will train inefficiently. Feature Q precludes the use of feature R because of latency constraints.

另一个可能会累积债务的潜在令人惊讶的领域是机器学习系统的配置。任何大型系统都有广泛的可配置选项,包括使用哪些特征、如何选择数据、各种特定算法的学习设置、潜在的预处理或后处理、验证方法等。我们已经注意到,研究人员和工程师都可能将配置(和配置的扩展)视为事后的想法。事实上,配置的验证或测试甚至可能不被视为重要。在一个正在积极开发的成熟系统中,配置的行数可以远远超过传统代码的行数。每个配置行都有出错的可能。

考虑下面的例子。特征A在9/14至9/17期间被错误地记录。特征B在10/7之前的数据上不可用。由于日志格式发生了变化,用于计算特征C的代码必须针对11/1前后的数据分别处理。特征D在生产环境中不可用,因此在线上查询模型时必须使用替代特征D′和D′′。如果使用了特征Z,那么训练作业必须因查找表而被分配额外的内存,否则训练效率会很低。由于延迟限制,特征Q排除了特征R的使用。

All this messiness makes configuration hard to modify correctly, and hard to reason about. However, mistakes in configuration can be costly, leading to serious loss of time, waste of computing resources, or production issues. This leads us to articulate the following principles of good configuration systems:

  1. It should be easy to specify a configuration as a small change from a previous configuration.
  2. It should be hard to make manual errors, omissions, or oversights.
  3. It should be easy to see, visually, the difference in configuration between two models.
  4. It should be easy to automatically assert and verify basic facts about the configuration: number of features used, transitive closure of data dependencies, etc.
  5. It should be possible to detect unused or redundant settings.
  6. Configurations should undergo a full code review and be checked into a repository.

所有这些混乱使配置难以正确修改,也难以进行推理。然而,配置中的错误可能代价高昂,导致严重的时间损失、计算资源的浪费或生产问题。这使我们阐明了良好配置系统的以下原则:

  1. 将配置指定为对先前配置的微小更改应该很容易。
  2. 应不易出现人工错误、遗漏或疏忽。
  3. 应该很容易从视觉上看出两个模型之间的配置差异。
  4. 应该很容易自动断言和验证关于配置的基本事实:使用的特征数量、数据依赖的传递闭包等。
  5. 应该能够检测到未使用或冗余的设置。
  6. 配置应该经过完整的代码审查,并被签入存储库。
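A minimal sketch of what principles 1–5 might look like in practice, assuming a simple dictionary-based configuration; all keys and values here are hypothetical:

```python
import copy

# Hypothetical base configuration for a model; keys are illustrative.
BASE_CONFIG = {
    "features": ["clicks", "impressions", "query_length"],
    "learning_rate": 0.1,
    "preprocessing": "log_transform",
}

def derive_config(base, overrides):
    """Principle 1: a new config is a small delta on a previous one.
    Rejecting unknown keys guards against typos (principles 2 and 5)."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    cfg = copy.deepcopy(base)
    cfg.update(overrides)
    return cfg

def config_diff(a, b):
    """Principle 3: make the difference between two configs easy to see."""
    return {k: (a.get(k), b.get(k))
            for k in set(a) | set(b) if a.get(k) != b.get(k)}

new_cfg = derive_config(BASE_CONFIG, {"learning_rate": 0.05})
# Principle 4: automatically assert basic facts about the configuration.
assert len(new_cfg["features"]) == 3
```

Principle 6 is organizational rather than mechanical: the override dictionaries themselves would be reviewed and checked into a repository like any other code.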

7 Dealing with Changes in the External World

One of the things that makes ML systems so fascinating is that they often interact directly with the external world. Experience has shown that the external world is rarely stable. This background rate of change creates ongoing maintenance cost.

Fixed Thresholds in Dynamic Systems. It is often necessary to pick a decision threshold for a given model to perform some action: to predict true or false, to mark an email as spam or not spam, to show or not show a given ad. One classic approach in machine learning is to choose a threshold from a set of possible thresholds, in order to get good tradeoffs on certain metrics, such as precision and recall. However, such thresholds are often manually set. Thus if a model updates on new data, the old manually set threshold may be invalid. Manually updating many thresholds across many models is time-consuming and brittle. One mitigation strategy for this kind of problem appears in [14], in which thresholds are learned via simple evaluation on heldout validation data.

Monitoring and Testing. Unit testing of individual components and end-to-end tests of running systems are valuable, but in the face of a changing world such tests are not sufficient to provide evidence that a system is working as intended. Comprehensive live monitoring of system behavior in real time combined with automated response is critical for long-term system reliability.

ML系统之所以如此吸引人的原因之一是它们经常与外部世界直接交互。经验表明,外部世界很少是稳定的。这种背景变更率会产生持续的维护成本。

动态系统中的固定阈值。通常需要为给定的模型选择一个决策阈值来执行某些操作:预测真或假,将电子邮件标记为垃圾邮件或非垃圾邮件,展示或不展示给定的广告。机器学习中一个经典的方法是从一组可能的阈值中选择一个,以便在某些指标(比如精确率和召回率)上获得良好的权衡。然而,这种阈值通常是手工设置的。因此,如果模型用新数据更新,手工设置的旧阈值可能会失效。跨多个模型手动更新许多阈值既耗时又脆弱。针对这类问题的一种缓解策略出现在[14]中,其中阈值是通过在留出验证数据上进行简单评估来学习的。
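The mitigation from [14] can be sketched as follows, assuming held-out model scores and binary labels; the choice of F1 as the selection metric is our assumption here, not specified by the paper:

```python
def learn_threshold(scores, labels, candidates=None):
    """Pick the decision threshold that maximizes F1 on heldout
    validation data, so it can be re-learned on each model update
    instead of being set by hand."""
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

Running this step as part of every retraining job removes the brittle manual-update loop described above.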

监控和测试。单个组件的单元测试和运行系统的端到端测试是有价值的,但是面对一个不断变化的世界,这样的测试不足以证明系统在按预期工作。对系统行为进行全面的实时监控并结合自动响应,对于系统的长期可靠性至关重要。

The key question is: what to monitor? Testable invariants are not always obvious given that many ML systems are intended to adapt over time. We offer the following starting points.

Prediction Bias. In a system that is working as intended, it should usually be the case that the distribution of predicted labels is equal to the distribution of observed labels. This is by no means a comprehensive test, as it can be met by a null model that simply predicts average values of label occurrences without regard to the input features. However, it is a surprisingly useful diagnostic, and changes in metrics such as this are often indicative of an issue that requires attention. For example, this method can help to detect cases in which the world behavior suddenly changes, making training distributions drawn from historical data no longer reflective of current reality. Slicing prediction bias by various dimensions isolates issues quickly, and can also be used for automated alerting.

Action Limits. In systems that are used to take actions in the real world, such as bidding on items or marking messages as spam, it can be useful to set and enforce action limits as a sanity check. These limits should be broad enough not to trigger spuriously. If the system hits a limit for a given action, automated alerts should fire and trigger manual intervention or investigation.

Up-Stream Producers. Data is often fed through to a learning system from various up-stream producers. These up-stream processes should be thoroughly monitored, tested, and routinely meet a service level objective that takes the downstream ML system's needs into account. Further, any up-stream alerts must be propagated to the control plane of an ML system to ensure its accuracy. Similarly, any failure of the ML system to meet established service level objectives should also be propagated down-stream to all consumers, and directly to their control planes if at all possible.

关键问题是:监控什么?可测试不变量并不总是明显的,因为许多ML系统都打算随着时间的推移而适应。我们提供以下出发点。

预测偏差。在按预期工作的系统中,通常情况下,预测标签的分布应该等于观察标签的分布。这绝不是一个全面的测试,因为一个不考虑输入特征、只是简单预测标签出现平均值的空模型也能满足它。然而,这是一种出奇有用的诊断方法,此类度量的变化通常表明存在需要注意的问题。例如,这种方法可以帮助检测世界行为突然改变、使得从历史数据中得出的训练分布不再反映当前现实的情况。按各种维度对预测偏差进行切片可以快速隔离问题,也可用于自动警报。
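The diagnostic above can be sketched as a comparison of mean predicted and observed label rates, sliced by a dimension; the tolerance value below is an illustrative assumption:

```python
def prediction_bias(predicted, observed):
    """Gap between the mean predicted label and the mean observed label.
    Near zero in a healthy system; a sustained shift suggests the world
    has drifted away from the training distribution."""
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)

def biased_slices(slices, tolerance=0.1):
    """slices: dict mapping slice name -> (predicted, observed) lists.
    Returns slice names whose bias exceeds the tolerance, suitable as
    input to an automated alerting system."""
    return sorted(
        name for name, (pred, obs) in slices.items()
        if abs(prediction_bias(pred, obs)) > tolerance
    )
```

Slicing (here by the dictionary key, e.g. country or device type) is what lets the alert point at *where* the problem is, not just that one exists.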

行动限制。在用于在现实世界中采取行动的系统中,例如对物品进行竞价或将消息标记为垃圾邮件,设置并强制执行行动限制作为一种健全性检查可能很有用。这些限制应该足够宽,以免误触发。如果系统达到某个给定行动的限制,应触发自动警报,并引发人工干预或调查。
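A minimal sketch of such an action limit, assuming a simple in-process counter; the limit values and the alerting hook are illustrative:

```python
class ActionLimiter:
    """Enforce a broad per-action limit as a sanity check, and record
    an alert when a limit is hit so that paging or investigation can
    be triggered."""

    def __init__(self, limits):
        self.limits = limits  # e.g. {"bid": 100000, "mark_spam": 50000}
        self.counts = {}
        self.alerts = []

    def allow(self, action):
        """Return True if the action is under its limit; otherwise
        record an alert and refuse the action."""
        self.counts[action] = self.counts.get(action, 0) + 1
        if self.counts[action] > self.limits.get(action, float("inf")):
            self.alerts.append(action)  # hook for automated alerting
            return False
        return True
```

Because the limits are deliberately broad, hitting one signals something genuinely anomalous rather than normal load variation.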

上游生产者。数据通常由各种上游生产者馈送给学习系统。这些上游流程应该被彻底监控、测试,并定期满足将下游ML系统需求考虑在内的服务水平目标。此外,任何上游警报都必须传播到ML系统的控制平面,以确保其准确性。类似地,ML系统未能满足既定服务水平目标的任何情况,也应该向下游传播给所有消费者,并在可能的情况下直接传播到它们的控制平面。

Because external changes occur in real-time, response must also occur in real-time. Relying on human intervention in response to alert pages is one strategy, but can be brittle for time-sensitive issues. Creating systems that allow automated response without direct human intervention is often well worth the investment.

由于外部变化是实时发生的,因此响应也必须实时发生。依靠人工干预来响应警告页面是一种策略,但对于时间敏感的问题来说可能很脆弱。创建无需直接人工干预即可自动响应的系统,通常非常值得投资。

8 Other Areas of ML-related Debt

We now briefly highlight some additional areas where ML-related technical debt may accrue.

我们现在简要地强调一些可能产生ML相关技术债务的其他领域。

Data Testing Debt. If data replaces code in ML systems, and code should be tested, then it seems clear that some amount of testing of input data is critical to a well-functioning system. Basic sanity checks are useful, as are more sophisticated tests that monitor changes in input distributions.

Reproducibility Debt. As scientists, it is important that we can re-run experiments and get similar results, but designing real-world systems to allow for strict reproducibility is a task made difficult by randomized algorithms, non-determinism inherent in parallel learning, reliance on initial conditions, and interactions with the external world.

Process Management Debt. Most of the use cases described in this paper have talked about the cost of maintaining a single model, but mature systems may have dozens or hundreds of models running simultaneously [14, 6]. This raises a wide range of important problems, including the problem of updating many configurations for many similar models safely and automatically, how to manage and assign resources among models with different business priorities, and how to visualize and detect blockages in the flow of data in a production pipeline. Developing tooling to aid recovery from production incidents is also critical. An important system-level smell to avoid is common processes with many manual steps.

Cultural Debt. There is sometimes a hard line between ML research and engineering, but this can be counter-productive for long-term system health. It is important to create team cultures that reward deletion of features, reduction of complexity, improvements in reproducibility, stability, and monitoring to the same degree that improvements in accuracy are valued. In our experience, this is most likely to occur within heterogeneous teams with strengths in both ML research and engineering.

数据测试债务。如果在ML系统中数据取代了代码,而代码应该被测试,那么很明显,对输入数据进行一定量的测试对于运行良好的系统至关重要。基本的健全性检查很有用,监控输入分布变化的更复杂的测试也同样有用。
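A sketch of both kinds of test mentioned above: a basic range sanity check, and a crude drift check comparing a feature's mean against a reference window. The 3-sigma rule is our assumption, not from the paper:

```python
import math

def sanity_check(values, lo, hi):
    """Basic sanity check: every input value falls in an expected range."""
    return all(lo <= v <= hi for v in values)

def drifted(reference, current, n_sigma=3.0):
    """Crude distribution test: flag drift if the current batch's mean
    is more than n_sigma reference standard deviations away from the
    reference mean."""
    n = len(reference)
    ref_mean = sum(reference) / n
    ref_std = math.sqrt(sum((v - ref_mean) ** 2 for v in reference) / n)
    cur_mean = sum(current) / len(current)
    return abs(cur_mean - ref_mean) > n_sigma * (ref_std or 1e-9)
```

Real data-testing systems would track full distributions per feature, but even checks this simple catch the "feature silently went to zero" class of failures.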

再现性债务。作为科学家,我们能够重新运行实验并得到相似的结果是很重要的,但是设计允许严格再现性的真实世界系统是一项困难的任务,其原因包括随机算法、并行学习中固有的非确定性、对初始条件的依赖,以及与外部世界的交互。

流程管理债务。本文中描述的大多数用例都讨论了维护单个模型的成本,但是成熟的系统可能同时运行数十或数百个模型[14,6]。这提出了一系列重要问题,包括如何安全且自动地为许多相似模型更新许多配置,如何在具有不同业务优先级的模型之间管理和分配资源,以及如何可视化和检测生产流水线中数据流的阻塞。开发帮助从生产事故中恢复的工具也至关重要。需要避免的一个重要的系统级异味是包含许多手动步骤的常见流程。

文化债务。ML研究和工程之间有时存在一条泾渭分明的界线,但这可能不利于系统的长期健康。重要的是要建立这样的团队文化:对删除特征、降低复杂性、改进可再现性、稳定性和监控的奖励,与对提高准确性的奖励同等重要。根据我们的经验,这种情况最有可能发生在兼具ML研究和工程优势的异构团队中。

9 Conclusions: Measuring Debt and Paying it Off

Technical debt is a useful metaphor, but it unfortunately does not provide a strict metric that can be tracked over time. How are we to measure technical debt in a system, or to assess the full cost of this debt? Simply noting that a team is still able to move quickly is not in itself evidence of low debt or good practices, since the full cost of debt becomes apparent only over time. Indeed, moving quickly often introduces technical debt. A few useful questions to consider are:

How easily can an entirely new algorithmic approach be tested at full scale?

What is the transitive closure of all data dependencies?

How precisely can the impact of a new change to the system be measured?

Does improving one model or signal degrade others?

How quickly can new members of the team be brought up to speed?

技术债务是一个有用的隐喻,但不幸的是,它没有提供可以随时间跟踪的严格指标。我们如何衡量一个体系中的技术债务,或评估这种债务的全部成本?仅仅注意到一个团队仍然能够快速行动本身并不是低债务或良好实践的证据,因为债务的全部成本只有随着时间的推移才会变得明显。事实上,快速行动常常会带来技术债务。需要考虑的几个有用的问题是:

全面测试全新的算法方法有多容易?

所有数据依赖的传递闭包是什么?

衡量新变化对系统的影响的精确度如何?

改进一种模型或信号是否会使其他模型或信号变差?

团队的新成员能多快上手?

We hope that this paper may serve to encourage additional development in the areas of maintainable ML, including better abstractions, testing methodologies, and design patterns. Perhaps the most important insight to be gained is that technical debt is an issue that engineers and researchers both need to be aware of. Research solutions that provide a tiny accuracy benefit at the cost of massive increases in system complexity are rarely wise practice. Even the addition of one or two seemingly innocuous data dependencies can slow further progress.

Paying down ML-related technical debt requires a specific commitment, which can often only be achieved by a shift in team culture. Recognizing, prioritizing, and rewarding this effort is important for the long term health of successful ML teams.

我们希望本文能够鼓励在可维护性ML领域进行更多的开发,包括更好的抽象、测试方法和设计模式。也许要获得的最重要的见解是技术债务是工程师和研究人员都需要注意的问题。以大量增加系统复杂性为代价提供微小精度优势的研究解决方案很少是明智的做法。即使添加一两个看似无害的数据依赖关系也会减缓进一步的进展。

偿还与机器学习相关的技术债务需要做出具体的承诺,这通常只能通过团队文化的转变来实现。认可、优先考虑和奖励这项工作对于成功的ML团队的长期健康很重要。

Acknowledgments

This paper owes much to the important lessons learned day to day in a culture that values both innovative ML research and strong engineering practice. Many colleagues have helped shape our thoughts here, and the benefit of accumulated folk wisdom cannot be overstated. We would like to specifically recognize the following: Roberto Bayardo, Luis Cobo, Sharat Chikkerur, Jeff Dean, Philip Henderson, Arnar Mar Hrafnkelsson, Ankur Jain, Joe Kovac, Jeremy Kubica, H. Brendan McMahan, Satyaki Mahalanabis, Lan Nie, Michael Pohl, Abdul Salem, Sajid Siddiqi, Ricky Shan, Alan Skelly, Cory Williams, and Andrew Young.

A short version of this paper was presented at the SE4ML workshop in 2014 in Montreal, Canada.

这篇论文很大程度上归功于在一个重视创新的ML研究和强大的工程实践的文化中,每天学习到的重要的经验教训。许多同事在这里为我们的思想形成做出了贡献,积累的民间智慧的好处是不容小觑的。我们特别要表扬的是:Roberto Bayardo, Luis Cobo, Sharat Chikkerur, Jeff Dean, Philip Henderson, Arnar Mar Hrafnkelsson, Ankur Jain, Joe Kovac, Jeremy Kubica, H. Brendan McMahan, Satyaki Mahalanabis, Lan Nie, Michael Pohl, Abdul Salem, Sajid Siddiqi, Ricky Shan, Alan Skelly, Cory Williams和Andrew Young。

本文的简短版本于2014年在加拿大蒙特利尔举行的SE4ML研讨会上发表。
