Benchmark tool libraries for C++ code

What is a benchmark?

Wikipedia's definition

Wikipedia groups the meanings of "benchmark" into the following categories:

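Business and economics
- Benchmarking, evaluating performance within organizations (the practice of comparing business processes and performance metrics against industry bests and best practices from other companies; the dimensions typically measured are quality, time, and cost)
- Benchmark price (a benchmark price (BP) is the price per unit of quantity in a specific segment of the international market, set by the country or producer organization that consistently exports the largest quantities on markets such as the London Metal Exchange; it is set periodically, usually monthly, and serves as a guideline for international trade)
- Benchmark (crude oil), oil-specific practices (a benchmark crude or marker crude is a crude oil that serves as a reference price for buyers and sellers of crude oil; the three primary benchmarks are West Texas Intermediate (WTI), Brent Blend, and Dubai Crude)
- Benchmark, an investment performance attribution (investment performance attribution is a set of techniques that performance analysts use to explain why a portfolio's performance differed from the benchmark)
Science and technology
- Benchmark (surveying), a point of known elevation marked for the purpose of surveying (the term benchmark, bench mark, or survey benchmark originates from the horizontal marks that surveyors chiseled into stone structures)
- Benchmarking (geolocating), an activity involving finding benchmarks (a hobby in which participants look for benchmarks, also known as survey markers or geodetic control points)
- Benchmark (computing), the result of running a computer program to assess performance (in computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it; the term is also commonly used for the elaborately designed benchmarking programs themselves)
- Benchmark, a best-performing, or gold standard test in medicine and statistics
The sense relevant to us is Benchmark (computing) under Science and technology: running a computer program, a set of programs, or other operations in order to assess the performance of an object.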

Benchmark (computing)

Benchmark principles
1. Relevance: Benchmarks should measure relatively vital features.
2. Representativeness: Benchmark performance metrics should be broadly accepted by industry and academia.
3. Equity: All systems should be fairly compared.
4. Repeatability: Benchmark results can be verified.
5. Cost-effectiveness: Benchmark tests are economical.
6. Scalability: Benchmark tests should work across systems possessing a range of resources from low to high.
7. Transparency: Benchmark metrics should be easy to understand.
Benchmark types
1. Real program
  - word processing software
  - tool software of CAD
  - user’s application software (e.g. MIS)
2. Component Benchmark / Microbenchmark
  - core routine consists of a relatively small and specific piece of code
  - measures performance of a computer’s basic components
  - may be used for automatic detection of a computer’s hardware parameters such as number of registers, cache size, memory latency, etc.
3. Kernel
  - contains key code
  - normally abstracted from an actual program
  - popular kernel: Livermore loops
  - Linpack benchmark (contains basic linear algebra subroutines written in FORTRAN)
  - results are reported in Mflop/s
4. Synthetic Benchmark
  - Procedure for programming synthetic benchmark:
    - take statistics of all types of operations from many application programs
    - get proportion of each operation
    - write program based on the proportion above
  - Types of Synthetic Benchmark are:
    - Whetstone
    - Dhrystone
  - These were the first general purpose industry standard computer benchmarks. They do not necessarily obtain high scores on modern pipelined computers.
5. I/O benchmarks
6. Database benchmarks
  - measure the throughput and response times of database management systems (DBMS)
7. Parallel benchmarks
  - used on machines with multiple cores and/or processors, or systems consisting of multiple machines
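
The synthetic-benchmark procedure listed under item 4 (tally operations, compute their proportions, write a program that follows them) can be sketched in a few lines. This is only an illustration under stated assumptions: the operation names, the counts, and the helpers BuildSchedule and ExampleCounts are hypothetical and are not taken from Whetstone or Dhrystone.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical tallies of operation types collected from application programs.
using OpCounts = std::map<std::string, std::size_t>;

// Builds a schedule of roughly `length` operations in which each operation
// appears in proportion to its share of the tallied counts.
std::vector<std::string> BuildSchedule(const OpCounts& counts, std::size_t length) {
  std::size_t total = 0;
  for (const auto& entry : counts) total += entry.second;
  assert(total > 0 && "need at least one tallied operation");

  std::vector<std::string> schedule;
  for (const auto& entry : counts) {
    // Rounded share of the schedule: count / total * length, in integer math.
    const std::size_t slots = (entry.second * length + total / 2) / total;
    schedule.insert(schedule.end(), slots, entry.first);
  }
  return schedule;
}

// Example tallies (made up for illustration): 50% integer adds,
// 30% floating-point multiplies, 20% branches.
OpCounts ExampleCounts() {
  return {{"int_add", 50}, {"float_mul", 30}, {"branch", 20}};
}
```

A real synthetic benchmark would then execute an actual instruction mix for each scheduled entry; here the schedule only records which operation would run.
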
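Some commonly used benchmark tools

Memory and filesystem benchmark tools
- Iometer – I/O subsystem measurement and characterization tool for single and clustered systems.
- IOzone – Filesystem benchmark
- For more, see the Wikipedia link
Some personal thoughts

The sections above covered the principles of benchmarking (the rules a benchmark should follow), the types of benchmark (before benchmarking anything, we first need to know what the subject under test is, and it should fall into one of the types above), and some commonly used benchmark tools (tools that others have already developed for common test subjects). Based on that, I think a benchmark should proceed as follows: identify the subject of the benchmark, check whether a tool for it already exists, and write your own if none does.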

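Benchmarking C++ code

Back to our topic: how do we benchmark C++ code? We take a logger library written in C++ as the example and follow the benchmark workflow described above:

What is the benchmark subject?

A logger library written in C++ should fall under Component Benchmark / Microbenchmark: a core routine consisting of a relatively small and specific piece of code.

PS: most of our user-space C++ code falls into one of two categories, Real program or Component Benchmark / Microbenchmark.
Is there an existing benchmark tool?

Not at the moment.
We need to write our own tool

Following the workflow we laid out, we need to write the benchmark tool ourselves.

For benchmarking our C++ code, we can write some simple code ourselves. In the code below, for example, Timer measures the execution time of initializing a shared_ptr with make_shared versus new: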

#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>

using TimePoint = std::chrono::high_resolution_clock::time_point;

// Scope-based timer: starts on construction, prints the elapsed time on destruction.
class Timer {
 public:
  Timer() { start_time_point_ = std::chrono::high_resolution_clock::now(); }
  ~Timer() { Stop(); }

  void Stop() {
    TimePoint end_time_point = std::chrono::high_resolution_clock::now();
    auto start = std::chrono::time_point_cast<std::chrono::nanoseconds>(start_time_point_).time_since_epoch().count();
    auto end = std::chrono::time_point_cast<std::chrono::nanoseconds>(end_time_point).time_since_epoch().count();
    auto duration = end - start;
    std::cout << duration << "ns(" << duration * 0.001 << "us)" << std::endl;
  }

 private:
  TimePoint start_time_point_;
};

struct Point {
  float x{0};
  float y{0};
};

int main() {
  std::cout << "shared_ptr make_shared:";
  {
    std::array<std::shared_ptr<Point>, 1000> ptr_array;
    Timer timer;  // declared after ptr_array, so it stops before the array is destroyed
    for (std::size_t i = 0; i < ptr_array.size(); ++i) {
      ptr_array[i] = std::make_shared<Point>();
    }
  }
  std::cout << "shared_ptr new:";
  {
    std::array<std::shared_ptr<Point>, 1000> ptr_array;
    Timer timer;
    for (std::size_t i = 0; i < ptr_array.size(); ++i) {
      ptr_array[i] = std::shared_ptr<Point>(new Point());
    }
  }
  return 0;
}
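
A single timed run like the one above is sensitive to noise such as cache warm-up and scheduling. As a rough sketch of one common refinement, the hypothetical helper MeasureMinNs below repeats the measurement and keeps the fastest run. It uses std::chrono::steady_clock on the assumption that a monotonic clock is preferable for measuring intervals; MakeSharedBatch is likewise only an example body.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <memory>

// Runs `body` `repeats` times and returns the fastest run in nanoseconds.
// MeasureMinNs is an illustrative helper, not part of any library.
template <typename Body>
std::int64_t MeasureMinNs(Body&& body, int repeats) {
  assert(repeats > 0);
  std::int64_t best = std::numeric_limits<std::int64_t>::max();
  for (int r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    best = std::min<std::int64_t>(best, ns);
  }
  return best;
}

// Example body: create 1000 shared_ptr objects via make_shared.
inline void MakeSharedBatch() {
  for (int i = 0; i < 1000; ++i) {
    auto p = std::make_shared<int>(i);
    (void)p;
  }
}
```

Calling MeasureMinNs(MakeSharedBatch, 20) returns the best of 20 runs. Taking the minimum is one choice among several; a median or a full distribution says more about variance.
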

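To avoid reinventing the wheel, and to use well-made wheels instead, are there libraries that can help us benchmark?

C++ benchmark tool libraries

Which C++ benchmark libraries exist? Searching Google and GitHub turned up the following:

Name	Description	Source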
google/benchmark	A microbenchmark support library	https://github.com/google/benchmark
Celero	C++ Benchmark Authoring Library/Framework	https://github.com/DigitalInBlue/Celero
hayai	C++ benchmarking framework	https://github.com/nickbruun/hayai
nonius	A C++ micro-benchmarking framework	https://github.com/libnonius/nonius
sltbench	C++ benchmark tool. Practical, stable and fast performance testing framework.	https://github.com/ivafanas/sltbench
CppBenchmark	Performance benchmark framework for C++ with nanoseconds measure precision	https://github.com/chronoxor/CppBenchmark

Using the benchmark tool libraries

CppBenchmark

CMakeLists.txt

set(CppBenchmarkPath "/path/CppBenchmark")

include_directories(${CppBenchmarkPath}/include)
link_directories(${CppBenchmarkPath}/bin ${CppBenchmarkPath}/temp/modules)

add_executable(function_call_benchmark function_call_benchmark.cpp)
target_link_libraries(function_call_benchmark PUBLIC cppbenchmark cpp-optparse HdrHistogram)

example code

#include "benchmark/cppbenchmark.h"

#include <math.h>

#include <iostream>

// Benchmark sin() call for 1 second.
// Make 3 attempts and choose the one with the best time result.
BENCHMARK("sin", Settings().Attempts(3).Duration(1))
{
    // Debug print; note that it runs inside the measured body.
    std::cout << "xxx\n";
    sin(123.456);
}

BENCHMARK_MAIN()

Output

[ 33%] Launching sin. Attempt 1...Done!
[ 66%] Launching sin. Attempt 2...Done!
[100%] Launching sin. Attempt 3...Done!
===============================================================================
CppBenchmark report. Version 1.0.1.0
===============================================================================
CPU architecture: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
CPU logical cores: 16
CPU physical cores: 16
CPU clock speed: 4.867 GHz
CPU Hyper-Threading: disabled
RAM total: 31.209 GiB
RAM free: 807.604 MiB
===============================================================================
OS version: Ubuntu 18.04.5 LTS
OS bits: 64-bit
Process bits: 64-bit
Process configuration: release
Local timestamp: Tue Nov 23 17:25:42 2021
UTC timestamp: Tue Nov 23 09:25:42 2021
===============================================================================
Benchmark: sin
Attempts: 3
Duration: 1 seconds
-------------------------------------------------------------------------------
Phase: sin
Average time: 0 ns/op
Minimal time: 1 ns/op
Maximal time: 1 ns/op
Total time: 96.781 ms
Total operations: 133027371
Operations throughput: 1374517987 ops/s
===============================================================================
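
The report's totals can be cross-checked by hand. The sketch below recomputes the time per operation and the throughput from the "Total time" and "Total operations" lines. The helper names are hypothetical, and the idea that "Average time: 0 ns/op" results from truncating a sub-nanosecond average to a whole number is an assumption, not something confirmed from the CppBenchmark source.

```cpp
#include <cassert>
#include <cstdint>

// Average cost of one operation in nanoseconds.
double NsPerOp(double total_ns, std::uint64_t operations) {
  assert(operations > 0);
  return total_ns / static_cast<double>(operations);
}

// Operations completed per second.
double OpsPerSecond(double total_ns, std::uint64_t operations) {
  assert(total_ns > 0.0);
  return static_cast<double>(operations) / (total_ns / 1e9);
}

// Figures copied from the report above.
constexpr double kTotalNs = 96.781e6;             // Total time: 96.781 ms
constexpr std::uint64_t kOperations = 133027371;  // Total operations
```

With these figures, NsPerOp(kTotalNs, kOperations) is about 0.73 ns, which becomes 0 when truncated to a whole number, and OpsPerSecond(kTotalNs, kOperations) is about 1.37e9, in line with the reported throughput.
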

This library needs two extra dependencies: cpp-optparse and HdrHistogram.

sltbench

CMakeLists.txt

set(SltBenchmarkPath "/Path/sltbench/install")

include_directories(${SltBenchmarkPath}/include)
link_directories(${SltBenchmarkPath}/lib)

add_executable(slt_benchmark slt_benchmark.cpp)
target_link_libraries(slt_benchmark PUBLIC sltbench)

example code

#include "sltbench/Bench.h"

#include <thread>
#include <chrono>

void my_function()
{
    std::this_thread::sleep_for(std::chrono::microseconds(1000));
}
SLTBENCH_FUNCTION(my_function);

SLTBENCH_MAIN();

Output

The output is fairly sparse.

nonius

CMakeLists.txt

set(NoniusBenchmarkPath "/Path/nonius")

include_directories(${NoniusBenchmarkPath}/include)

add_executable(nonius_benchmark nonius_benchmark.cpp)
target_link_libraries(nonius_benchmark PUBLIC pthread)

example code

#define NONIUS_RUNNER
#include <nonius/nonius.h++>

#include <thread>
#include <chrono>

NONIUS_BENCHMARK("DemoSleep", []{
    std::this_thread::sleep_for(std::chrono::microseconds(1000));
})

int main()
{
    nonius::configuration cfg;
    cfg.samples = 1;
    cfg.resamples = 1;
    nonius::go(cfg);
    return 0;
}

Output

Header-only and lightweight; the configuration options seem somewhat limited.

hayai

CMakeLists.txt

set(HayaiBenchmarkPath "/Path/hayai/install")

include_directories(${HayaiBenchmarkPath}/include)
link_directories(${HayaiBenchmarkPath}/lib)

add_executable(hayai_benchmark hayai_benchmark.cpp)
target_link_libraries(hayai_benchmark PUBLIC hayai_main)

example code

#include <hayai/hayai.hpp>

#include <thread>
#include <chrono>

BENCHMARK(DemoSleep, DemoSleep, 1, 100)
{
    std::this_thread::sleep_for(std::chrono::microseconds(1000));
}

Output

The output looks a lot like gtest's (as the project itself says).

Celero

CMakeLists.txt

set(CeleroBenchmarkPath "/sec/yms/benchmark/Celero/install")

include_directories(${CeleroBenchmarkPath}/include)
link_directories(${CeleroBenchmarkPath}/lib)

add_executable(celero_benchmark celero_benchmark.cpp)
target_link_libraries(celero_benchmark PUBLIC celero-d)

example code

#include <celero/Celero.h>

#include <chrono>
#include <thread>

CELERO_MAIN

// The BASELINE is mandatory; without it the program core-dumps at run time.
BASELINE(DemoSleep, DemoSleep, 1, 100)
{
    std::this_thread::sleep_for(std::chrono::microseconds(1000));
}

BENCHMARK(DemoSleep, HalfBaseline, 1, 100)
{
    std::this_thread::sleep_for(std::chrono::microseconds(500));
}

Output

A BASELINE must be added; with only a BENCHMARK the program crashes.

google/benchmark

CMakeLists.txt

# The library was built from source and installed into the system directories, so the include_directories and link_directories calls used above are not needed
add_executable(google_benchmark google_benchmark.cpp)
target_link_libraries(google_benchmark PRIVATE benchmark pthread)

example code

#include <benchmark/benchmark.h>

#include <chrono>
#include <thread>

void BM_DemoSleep(benchmark::State& state) {
  for (auto _ : state) {
    std::this_thread::sleep_for(std::chrono::microseconds(1000));
  }
}
BENCHMARK(BM_DemoSleep);

BENCHMARK_MAIN();

Output

Building from source depends on gtest (by default gtest is used to test the library's own code; this can be turned off with a CMake variable).

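Some metrics

Name	Build dependencies	Time precision (Linux)
google/benchmark	Own library; additionally needs pthread	Reports two times. time: the clock defaults to std::chrono::high_resolution_clock and results are reported in ns. cpu_time: the clock depends on configuration, either CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID (clock_gettime)
Celero	Own library; no extra dependencies	Clock is std::chrono::high_resolution_clock; results are reported in std::chrono::microseconds
hayai	Own library; no extra dependencies	Clock falls back in the order CLOCK_MONOTONIC_RAW -> CLOCK_MONOTONIC -> CLOCK_REALTIME (selected by macro definitions); results are reported in ns
nonius	Header only; additionally needs pthread	Clock is std::chrono::high_resolution_clock; results are reported in ns
sltbench	Own library; no extra dependencies	Clock is std::chrono::high_resolution_clock; results are reported in ns
CppBenchmark	Own library; additionally needs cpp-optparse and HdrHistogram	Clock is CLOCK_MONOTONIC; results are reported in ns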
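
Since several of the libraries above rely on std::chrono clocks, it can help to check what those clocks promise on a given platform. The following sketch (TickPeriodNs and DescribeClock are hypothetical helpers) reports the tick period encoded in a clock's type and whether the clock is steady. Note that the tick period is the unit of representation, which is not necessarily the resolution the hardware actually delivers.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>

// Length of one clock tick in nanoseconds, derived from Clock::period.
template <typename Clock>
constexpr double TickPeriodNs() {
  return 1e9 * static_cast<double>(Clock::period::num) /
         static_cast<double>(Clock::period::den);
}

// Prints the tick period and whether the clock is monotonic (steady).
template <typename Clock>
void DescribeClock(const char* name) {
  assert(name != nullptr);
  std::cout << name << ": tick = " << TickPeriodNs<Clock>()
            << " ns, steady = " << std::boolalpha << Clock::is_steady << '\n';
}
```

Calling DescribeClock<std::chrono::high_resolution_clock>("high_resolution_clock") and the same for std::chrono::steady_clock shows whether high_resolution_clock is steady with the standard library in use.
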

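Reading the source code, the frameworks above all follow a very similar model: the user registers the function objects to be tested in a container, the library then iterates over the container and runs each one, and finally it aggregates the results.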
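
The register-then-iterate model described here can be sketched in a few dozen lines. Everything below (BenchCase, Registry, Registrar, SKETCH_BENCHMARK, RunAll, and the two example cases) is hypothetical and written for illustration; it is not how any of the listed libraries is actually implemented.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// One registered benchmark: a name plus the function object to run.
struct BenchCase {
  std::string name;
  std::function<void()> body;
};

// The container holding every registered case (function-local static,
// so it is constructed before the first registrar uses it).
std::vector<BenchCase>& Registry() {
  static std::vector<BenchCase> cases;
  return cases;
}

// Constructing a Registrar at namespace scope registers a case before main().
struct Registrar {
  Registrar(std::string name, std::function<void()> body) {
    Registry().push_back(BenchCase{std::move(name), std::move(body)});
  }
};

#define SKETCH_BENCHMARK(fn) static Registrar registrar_##fn(#fn, fn)

// Iterates over the container, times each case, and prints ns per call.
// Returns the number of cases that were run.
std::size_t RunAll(int iterations) {
  assert(iterations > 0);
  for (const BenchCase& c : Registry()) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) c.body();
    const auto stop = std::chrono::steady_clock::now();
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::cout << c.name << ": " << ns / iterations << " ns/op\n";
  }
  return Registry().size();
}

// Two example cases registered with the sketch macro.
void EmptyBody() {}
void BuildString() { std::string s(64, 'x'); (void)s; }
SKETCH_BENCHMARK(EmptyBody);
SKETCH_BENCHMARK(BuildString);
```

Calling RunAll(1000) from main would execute both registered cases. Real frameworks layer warm-up, repeated sampling, statistics, and reporting on top of this skeleton.
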