


Performance comparison of split+atof and sscanf


The following sequential results were measured on a PC (Core i7 920 @ 2.67 GHz), where u32toa() was compiled with Visual C++ 2013 and run on 64-bit Windows. Speedup is relative to sprintf().

Function     Time (ns)   Speedup
sprintf        194.225     1.00x
vc              61.522     3.16x
naive           26.743     7.26x
count           20.552     9.45x
lut             17.810    10.91x
countlut         9.926    19.57x
branchlut        8.430    23.04x
sse2             7.614    25.51x
null             2.230    87.09x


Function       Description
ostringstream  std::ostringstream in the C++ standard library.
ostrstream     std::ostrstream in the C++ standard library.
to_string      std::to_string() in the C++11 standard library.
sprintf        sprintf() in the C standard library.
vc             Visual C++'s _itoa(), _i64toa(), _ui64toa().
naive          Computes division/modulo by 10 for each digit, stores the digits in a temporary array, and copies them to the buffer in reverse order.
unnamed        Computes division/modulo by 10 for each digit and stores the digits directly in the buffer.
count          Counts the number of decimal digits first, using the technique from [1].
lut            Uses a lookup table (LUT) of digit pairs with division/modulo by 100. Mentioned in [2].
countlut       Combines count and lut.
branchlut      Uses branching to divide and conquer the value range, which makes the computation more parallel.
sse2           Based on the branchlut scheme; uses SSE2 SIMD instructions to convert 8 digits in parallel. The algorithm was designed by Wojciech Muła [3]. (Experiments show it is useful for values with 9 or more digits.)
null           Does nothing.


Function        Time (ns)   Speedup
ostringstream   2,778.748     0.45x
ostrstream      2,628.365     0.48x
gay             1,646.310     0.76x
sprintf         1,256.376     1.00x
fpconv            273.822     4.59x
grisu2            220.251     5.70x
doubleconv        201.645     6.23x
milo              138.021     9.10x
null                2.146   585.58x


Function       Description
ostringstream  std::ostringstream in the C++ standard library with setprecision(17).
ostrstream     std::ostrstream in the C++ standard library with setprecision(17).
sprintf        sprintf() in the C standard library with the "%.17g" format.
stb_sprintf    Fast sprintf replacement with the "%.17g" format.
gay            David M. Gay's dtoa() C implementation.
grisu2         Florian Loitsch's Grisu2 C implementation [1].
doubleconv     C++ implementation extracted from Google's V8 JavaScript engine, using EcmaScriptConverter().ToShortest() (based on Grisu3; falls back to a slower bignum algorithm when Grisu3 fails to produce the shortest representation).
fpconv         night-shift's Grisu2 C implementation.
milo           miloyip's Grisu2 C++ header-only implementation.
null           Does nothing.
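The "%.17g" format is used because 17 significant digits are enough for any finite double to survive a text round trip. A minimal sketch of that check follows; the helper names dtoa_sprintf and roundtrips are hypothetical and not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats a double with "%.17g", as the sprintf row does.
inline std::string dtoa_sprintf(double value) {
    char buffer[32];  // "-1.7976931348623157e+308" is 24 characters
    const int n = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    assert(n > 0 && n < static_cast<int>(sizeof buffer));
    return std::string(buffer, static_cast<std::size_t>(n));
}

// True when parsing the text gives back exactly the original double.
inline bool roundtrips(const std::string& text, double value) {
    return std::strtod(text.c_str(), nullptr) == value;
}
```

For example, dtoa_sprintf(0.1) yields "0.10000000000000001", which parses back to the same double. By contrast, std::to_string(double) formats as if by "%f" in C++11 through C++23, so a value such as 1e-7 becomes "0.000000" and does not survive the round trip.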


  1. std::to_string() is not tested because it does not fulfill the roundtrip requirement.

  2. Grisu2 is chosen because it generates more human-readable numbers, and more than 99.9% of its results are the shortest representation. Grisu3 needs another dtoa() implementation as a fallback for the cases where it cannot produce the shortest result.
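"Shortest" here means the fewest significant digits that still parse back to the same double. The sketch below illustrates that definition by brute force, trying increasing precisions until the round trip succeeds. It is not Grisu2 or Grisu3 and would be far slower than any entry in the table; the name shortest_roundtrip is hypothetical, and finite inputs are assumed.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Returns the "%g"-style text with the fewest significant digits (1..17)
// that strtod() parses back to exactly `value`. Assumes a finite input.
inline std::string shortest_roundtrip(double value) {
    char buffer[32];
    for (int precision = 1; precision <= 17; ++precision) {
        std::snprintf(buffer, sizeof buffer, "%.*g", precision, value);
        if (std::strtod(buffer, nullptr) == value) break;  // round trip holds
    }
    return std::string(buffer);
}
```

For 0.1 this returns "0.1" rather than the 17-digit "0.10000000000000001", while the sum 0.1 + 0.2 needs all 17 digits.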



