I came across an article from NVIDIA talking about their TPCx-BB benchmark results on A100. As a data scientist, I was immediately intrigued because I’m a big fan of the Transaction Processing Performance Council (TPC) benchmarks, which provide reasonable and objective performance metrics. Also, the TPC has clear rules about how their benchmarks are used and how results are reported to ensure that results from different vendors can be directly compared. I’ll say more about this later, but first let’s talk about the end-to-end data analytics workflow.

I’ve drawn a rough sketch of the end-to-end data analytics workflow based on my experience as a data scientist (Figure 1). Not all of my data science projects pass through every stage of this workflow, but it represents the sum total of my projects. Consequently, my computing environment must be able to handle all stages, especially the early stages: OLTP (online transactional processing) and OLAP (online analytical processing). As every data scientist knows, by the time you get to modeling, the hard work is already done. OLTP deals with managing data stores, while OLAP deals mainly with information retrieval. TPCx-BB is mainly an OLAP benchmark.

Figure 1. Rough breakdown of stages in the end-to-end data analytics workflow

It’s always best to assess a computing environment using your specific workflows, but data science is highly variable. Analytics workflows change from one project to the next. A system architecture that performs well in one stage of the end-to-end workflow may perform poorly in another. Therefore, data analytics requires generality. This is why standard, off-the-shelf benchmarks like TPCx-BB are valuable.

The benchmarks shown in Table 1 were created by experts to objectively assess different stages of the end-to-end data analytics workflow. They’re easy to evaluate (i.e., most have built-in correctness evaluators), their performance metrics are clearly defined, and most offer auditing. To quote TPC, this helps “…protect users from misleading or false performance claims…” With that in mind, let’s return to NVIDIA’s TPCx-BB results.

Table 1. Standard benchmarks for the end-to-end data analytics workflow

TPCx-BB is a big data benchmark that contains elements of OLAP and data modeling. It is designed to measure the performance of Apache Hadoop systems using a mix of 30 SQL queries, user-defined functions, and machine learning functions. NVIDIA posted their code on GitHub, so I took a look at their query implementations to see if they actually ran TPCx-BB. They didn’t.

First, they replaced Spark with Dask, which defeats the purpose of a Hadoop-based benchmark. Dask is a nice technology, but Spark is far more common in data analytics workflows. Second, some of their query implementations ignored the user-defined and/or machine learning functions. Finally, they did not report the required TPCx-BB performance metrics: BBQpm (queries per minute throughput) and Price/BBQpm. The former is critical for a true assessment of overall performance because TPCx-BB models a system under load rather than the performance of isolated queries. The NVIDIA measurements ignore load and throughput, which isn’t realistic.
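
To make the difference between isolated query timings and a throughput metric concrete, here is a rough sketch of how a BBQpm-style score could be computed. The formula reflects my understanding of the TPCx-BB specification (a load-time term plus the geometric mean of a single-stream power test and a multi-stream throughput test), so treat it as an assumption and check it against the official spec. The function names and the timings in the usage note are hypothetical, not measured results.

```python
import math

NUM_QUERIES = 30  # TPCx-BB uses a mix of 30 queries


def bbqpm(scale_factor, load_seconds, power_query_seconds,
          throughput_seconds, num_streams):
    """Sketch of a BBQpm-style score (assumed formula, verify against the spec).

    load_seconds:        elapsed time of the load test
    power_query_seconds: 30 per-query times from the single-stream power test
    throughput_seconds:  elapsed time of the multi-stream throughput test
    num_streams:         number of concurrent query streams in that test
    """
    if len(power_query_seconds) != NUM_QUERIES:
        raise ValueError("expected one timing per query")
    # Assumption: only a fraction (10%) of the load time counts toward the score.
    t_load = 0.1 * load_seconds
    # Power term: number of queries times the geometric mean of the query times.
    geo_mean = math.exp(
        sum(math.log(t) for t in power_query_seconds) / NUM_QUERIES
    )
    t_power = NUM_QUERIES * geo_mean
    # Throughput term: elapsed time normalized by the number of streams.
    t_throughput = throughput_seconds / num_streams
    # The factor 60 converts seconds to minutes (queries per minute).
    return scale_factor * 60 * NUM_QUERIES / (
        t_load + math.sqrt(t_power * t_throughput)
    )


def price_per_bbqpm(total_system_price, score):
    """Price/performance: total system price divided by the score."""
    return total_system_price / score
```

With made-up inputs such as `bbqpm(1000, 1000.0, [10.0] * 30, 1200.0, 4)`, the score depends on the load time and the multi-stream run as well as the per-query times. A list of isolated query timings covers only the power-test term, so it cannot be turned into BBQpm or Price/BBQpm on its own.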

The current, audited TPCx-BB results (as of September 25, 2020) from several major hardware vendors are shown in Figure 2. All of their benchmarking systems used Intel Xeon processors at various scale factors and price points. There is no current or historical data for NVIDIA processors.

Figure 2. Current audited TPCx-BB results (source: http://www.tpc.org/tpcx-bb/results/tpcxbb_perf_results5.asp, used with permission from TPC)

While I applaud NVIDIA’s attempt to use a standard, off-the-shelf benchmark like TPCx-BB, please run the actual benchmark suite and report the primary metrics — if you can. As I said above, the TPC has strict rules about how their benchmarks are used:

“…it should be noted that the TPC benchmark specifications and policies require the submittal of complete documentation on these tests, which are then reviewed by the TPC Council. If a vendor’s TPC benchmark test is determined to be executed improperly or unfairly, a vendor will have to withdraw the result and can no longer use that result publicly. These rules protect users from misleading or false performance claims and preserves the credibility of TPC benchmark results.” (Source: Running a TPC Benchmark)

I’ve taken NVIDIA to task once before for using contrived tests to represent an entire stage of the end-to-end workflow:


Don’t be fooled. Generality is critical in data science. Xeon-based systems scale better and provide the best performance and total cost of ownership (TCO) for the end-to-end data analytics workflow.

