This walkthrough uses the OpenCV v4.5.5 source tree as its example, built on Windows x64 with VS2019 and CMake.

https://github.com/opencv/opencv/tree/4.5.5

https://sourceforge.net/projects/opencvlibrary/files/4.5.5/

Download and run opencv-4.5.5-vc14_vc15.exe to get the officially built dynamic library, opencv_world455.dll.

1. Multithreaded parallel libraries

PPL

In \opencv\sources\modules\core\src\parallel.cpp, Microsoft's PPL (the Concurrency runtime) is the default parallel backend when compiling with MSVC:

#if defined _MSC_VER && _MSC_VER >= 1600
#define HAVE_CONCURRENCY
#endif

In the CMake build options, the alternatives (TBB, HPX, OpenMP) are turned off by default.

The corresponding macros can be found in \opencv\sources\x64\cvconfig.h:

/* Halide support */
/* #undef HAVE_HALIDE */

/* Intel Integrated Performance Primitives */
/* #undef HAVE_IPP */
/* #undef HAVE_IPP_ICV */
/* #undef HAVE_IPP_IW */
/* #undef HAVE_IPP_IW_LL */

/* Intel Threading Building Blocks */
/* #undef HAVE_TBB */

/* Ste||ar Group High Performance ParallelX */
/* #undef HAVE_HPX */

/* OpenVX */
/* #undef HAVE_OPENVX */
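The backend choice above is made entirely at compile time. The following is a minimal sketch of that pattern, not code from OpenCV: `selectedParallelBackend` and `SKETCH_HAVE_CONCURRENCY` are hypothetical names, and the precedence among the macros is an assumption made for illustration (parallel.cpp has its own ordering and supports more backends).

```cpp
#include <string>

// Mirror the guard quoted above: MSVC 2010 (_MSC_VER 1600) or newer ships PPL.
#if defined _MSC_VER && _MSC_VER >= 1600
#define SKETCH_HAVE_CONCURRENCY
#endif

// Hypothetical helper: report which backend macro this translation unit sees.
// HAVE_TBB, HAVE_HPX and HAVE_OPENMP would normally come from cvconfig.h.
inline std::string selectedParallelBackend()
{
#if defined HAVE_TBB
    return "TBB";
#elif defined HAVE_HPX
    return "HPX";
#elif defined HAVE_OPENMP
    return "OpenMP";
#elif defined SKETCH_HAVE_CONCURRENCY
    return "Concurrency";
#else
    return "none";
#endif
}
```

For an already built binary, the framework that was actually selected appears on the "Parallel framework" line printed by cv::getBuildInformation().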

2. GPU: OpenCL

In the CMake build options, OpenCL is enabled by default.

\opencv\sources\x64\cvconfig.h

/* OpenCL Support */
#define HAVE_OPENCL
/* #undef HAVE_OPENCL_STATIC */
/* #undef HAVE_OPENCL_SVM */

/* NVIDIA OpenCL D3D Extensions support */
#define HAVE_OPENCL_D3D11_NV

GPU: CUDA

In the CMake build options, CUDA is disabled by default. You have to tick the option yourself and download the required libraries before building.

3. SIMD instruction-set optimization options

In \opencv\sources\x64\cvconfig.h (generated by CMake), the macro CV_ENABLE_INTRINSICS is defined by default:

#ifndef OPENCV_CVCONFIG_H_INCLUDED
#define OPENCV_CVCONFIG_H_INCLUDED

/* OpenCV compiled as static or dynamic libs */
#define BUILD_SHARED_LIBS

/* OpenCV intrinsics optimized code */
#define CV_ENABLE_INTRINSICS

/* OpenCV additional optimized code */
/* #undef CV_DISABLE_OPTIMIZATION */

The CPU feature selection made by CMake is recorded in \opencv\sources\x64\cv_cpu_config.h:

// OpenCV CPU baseline features

#define CV_CPU_COMPILE_SSE 1
#define CV_CPU_BASELINE_COMPILE_SSE 1

#define CV_CPU_COMPILE_SSE2 1
#define CV_CPU_BASELINE_COMPILE_SSE2 1

#define CV_CPU_COMPILE_SSE3 1
#define CV_CPU_BASELINE_COMPILE_SSE3 1

#define CV_CPU_BASELINE_FEATURES 0 \
    , CV_CPU_SSE \
    , CV_CPU_SSE2 \
    , CV_CPU_SSE3 \


// OpenCV supported CPU dispatched features

#define CV_CPU_DISPATCH_COMPILE_SSE4_1 1
#define CV_CPU_DISPATCH_COMPILE_SSE4_2 1
#define CV_CPU_DISPATCH_COMPILE_FP16 1
#define CV_CPU_DISPATCH_COMPILE_AVX 1
#define CV_CPU_DISPATCH_COMPILE_AVX2 1
#define CV_CPU_DISPATCH_COMPILE_AVX512_SKX 1

#define CV_CPU_DISPATCH_FEATURES 0 \
    , CV_CPU_SSE4_1 \
    , CV_CPU_SSE4_2 \
    , CV_CPU_FP16 \
    , CV_CPU_AVX \
    , CV_CPU_AVX2 \
    , CV_CPU_AVX512_SKX \

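For OpenCV's hardware acceleration, i.e. the OpenCV Hardware Acceleration Layer (HAL), see this blog post:

"OpenCV中的HAL方法调用流程分析" (an analysis of the HAL call flow in OpenCV), willhua, cnblogs: https://www.cnblogs.com/willhua/p/12521581.html. As that post notes, OpenCV contains a number of so-called HAL implementations; the name suggests they are tied to hardware, but that is not entirely the case.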

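4. IPP acceleration

Intel® Integrated Performance Primitives (Intel® IPP) is a software library that provides a large set of functions covering signal processing, image processing, computer vision, data compression and string operations. The functions are tuned for speed, for example by adapting them to the available instruction sets.

The full version of IPP can be downloaded from: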

https://www.intel.com/content/www/us/en/developer/tools/oneapi/ipp.html

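In the book Learning OpenCV, the authors mention that OpenCV can use Intel's IPP performance library to speed up programs, and that IPP has to be purchased separately. In fact, Intel provides part of IPP free of charge for current OpenCV; it is referred to here as ippicv. CMake downloads ippicv from GitHub automatically at configure time, but the download can fail on a poor network connection, in which case the only option is to install it manually. The download address for ippicv is in fact recorded in the ippicv.cmake file.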

https://github.com/opencv/opencv/blob/4.5.5/3rdparty/ippicv/ippicv.cmake

Combining the variables in that file gives the ippicv download URL:

https://raw.githubusercontent.com/opencv/opencv_3rdparty/a56b6ac6f030c312b2dce17430eef13aed9af274/ippicv/ippicv_2020_win_intel64_20191018_general.zip

Alternatively, the package can be found by browsing directly to:

https://github.com/opencv/opencv_3rdparty/tree/ippicv/master_20191018/ippicv
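How the raw download URL above is put together can be sketched as simple string concatenation. `ippicvUrl` and its parameter names are hypothetical and are not the variable names used in ippicv.cmake; only the commit hash and archive name quoted above are taken from the text.

```cpp
#include <string>

// Hypothetical helper: join the pieces of the ippicv download URL.
// `commit` is the opencv_3rdparty commit hash, `archive` the zip file name.
inline std::string ippicvUrl(const std::string& commit, const std::string& archive)
{
    return "https://raw.githubusercontent.com/opencv/opencv_3rdparty/"
           + commit + "/ippicv/" + archive;
}
```

Calling it with the commit a56b6ac6f030c312b2dce17430eef13aed9af274 and the archive ippicv_2020_win_intel64_20191018_general.zip reproduces the URL listed above, which is a quick way to check a manually assembled address before downloading.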

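On Intel processors, OpenCV automatically uses a free subset of Intel's Integrated Performance Primitives library (IPPICV; described as IPP 8.x in older material, while the 4.5.5 package above ships ippicv 2020). IPPICV can be linked into OpenCV at build time, where it replaces the corresponding low-level optimized C code (set WITH_IPP=ON/OFF in CMake to turn this on or off; it is on by default). The speedup obtained with IPP can be considerable.

[Figure: speedup obtained with IPP enabled]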

5. How do you know which acceleration modes the current OpenCV has enabled?

The source file \opencv\sources\modules\core\src\system.cpp has dedicated functions for setting and querying the optimization state:

volatile bool useOptimizedFlag = true;

void setUseOptimized( bool flag )
{
    useOptimizedFlag = flag;
    currentFeatures = flag ? &featuresEnabled : &featuresDisabled;

    ipp::setUseIPP(flag);
#ifdef HAVE_OPENCL
    ocl::setUseOpenCL(flag);
#endif
}

bool useOptimized(void)
{
    return useOptimizedFlag;
}

void setUseIPP(bool flag)
{
    CoreTLSData& data = getCoreTlsData();
#ifdef HAVE_IPP
    data.useIPP = (getIPPSingleton().useIPP)?flag:false;
#else
    CV_UNUSED(flag);
    data.useIPP = false;
#endif
}

\opencv\sources\modules\core\src\ocl.cpp

void setUseOpenCL(bool flag)
{
    if (!flag)
        useOpenCL_ = 0;
    else
        useOpenCL_ = -1;
}

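The macro HAVE_OPENCL is defined, and HAVE_IPP is defined as well when ippicv is available at build time (the official package reports Intel IPP 2020.0.0 in its build information). OpenCL and IPP are therefore enabled by default.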
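The way one call to setUseOptimized fans out to the IPP and OpenCL switches can be modeled in a few lines. This is a toy model written for illustration, not OpenCV code: `OptimizationState` and its members are hypothetical, and reading -1 as "decide lazily on the next query" is an assumption about how useOpenCL_ is interpreted.

```cpp
// Toy model of the switches shown above; none of these names exist in OpenCV.
struct OptimizationState
{
    bool ippLibraryUsable;    // stands in for getIPPSingleton().useIPP
    bool useOptimized = true; // stands in for useOptimizedFlag
    bool useIPP = false;
    int  useOpenCL = -1;      // assumed meaning: -1 = decide lazily, 0 = disabled

    explicit OptimizationState(bool ippUsable)
        : ippLibraryUsable(ippUsable), useIPP(ippUsable) {}

    // Same cascade as setUseOptimized -> setUseIPP / setUseOpenCL.
    void setUseOptimized(bool flag)
    {
        useOptimized = flag;
        useIPP = ippLibraryUsable ? flag : false;
        useOpenCL = flag ? -1 : 0;
    }
};
```

The model makes one consequence easy to see: turning optimizations back on cannot enable IPP if the library was never usable, which matches the `#else` branch of setUseIPP.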

The following test case queries whether the current OpenCV has OpenCL and IPP enabled:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

void checkIPP()
{
    bool use = cv::ipp::useIPP();
    bool useNE = cv::ipp::useIPP_NotExact();
    int status = cv::ipp::getIppStatus();
    cv::String ver = cv::ipp::getIppVersion();
    cv::String loc = cv::ipp::getIppErrorLocation();
    std::cout << "IPP use:" << use << std::endl;
    std::cout << "IPP useNE:" << useNE << std::endl;
    std::cout << "IPP status:" << status << std::endl;
    std::cout << "IPP ver:" << ver << std::endl;
    std::cout << "IPP loc:" << loc << std::endl;
}

void checkOCL()
{
    // Query whether OpenCL is available and currently enabled in OpenCV
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "default haveOpenCL:" << ret1 << std::endl;
    std::cout << "default useOpenCL:" << ret2 << std::endl;
}

int main()
{
    checkIPP();
    checkOCL();
    return 0;
}

6. Finally, how is the Canny operator optimized?

The source file \opencv\sources\modules\imgproc\src\canny.cpp contains code guarded by #if CV_SIMD.

When the macro CV_SIMD is enabled, the instruction-set-optimized code paths are compiled in. CV_SIMD is on by default; it is defined in the header

\opencv\sources\modules\core\include\opencv2\core\hal\intrin.hpp:

#if (CV_SSE2 || CV_NEON || CV_VSX || CV_MSA || CV_WASM_SIMD || CV_RVV071 || CV_RVV) && !defined(CV_FORCE_SIMD128_CPP)
#define CV__SIMD_FORWARD 128
#include "opencv2/core/hal/intrin_forward.hpp"
#endif

#if CV_SSE2 && !defined(CV_FORCE_SIMD128_CPP)

#include "opencv2/core/hal/intrin_sse_em.hpp"
#include "opencv2/core/hal/intrin_sse.hpp"

namespace CV__SIMD_NAMESPACE {
#define CV_SIMD CV_SIMD128

In addition, Canny runs with PPL parallelism by default and also has OpenCL and IPP support enabled by default. Here is the relevant source in \opencv\sources\modules\imgproc\src\canny.cpp:

void Canny( InputArray _src, OutputArray _dst,
            double low_thresh, double high_thresh,
            int aperture_size, bool L2gradient )
{
    ......
    CV_OCL_RUN(_dst.isUMat() && (_src.channels() == 1 || _src.channels() == 3),
               ocl_Canny<false>(_src, UMat(), UMat(), _dst, (float)low_thresh, (float)high_thresh, aperture_size, L2gradient, _src.channels(), size))
    ......
    CALL_HAL(canny, cv_hal_canny, src.data, src.step, dst.data, dst.step, src.cols, src.rows, src.channels(),
             low_thresh, high_thresh, aperture_size, L2gradient);
    ......
    CV_IPP_RUN_FAST(ipp_Canny(src, Mat(), Mat(), dst, (float)low_thresh, (float)high_thresh, L2gradient, aperture_size))
    ......
    parallel_for_(Range(0, src.rows), parallelCanny(src, map, stack, low, high, aperture_size, L2gradient), numOfThreads);
    ......
}

void Canny( InputArray _dx, InputArray _dy, OutputArray _dst,
            double low_thresh, double high_thresh,
            bool L2gradient )
{
    ......
    CV_OCL_RUN(_dst.isUMat(),
               ocl_Canny<true>(UMat(), _dx.getUMat(), _dy.getUMat(), _dst, (float)low_thresh, (float)high_thresh, 0, L2gradient, _dx.channels(), size))
    ......
    CV_IPP_RUN_FAST(ipp_Canny(Mat(), dx, dy, dst, (float)low_thresh, (float)high_thresh, L2gradient, 0))
    ......
    parallel_for_(Range(0, dx.rows), parallelCanny(dx, dy, map, stack, low, high, L2gradient), numOfThreads);
    ......
}

\opencv\sources\modules\imgproc\src\hal_replacement.hpp

inline int hal_ni_canny(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
                        int width, int height, int cn, double lowThreshold, double highThreshold, int ksize, bool L2gradient)
{ return CV_HAL_ERROR_NOT_IMPLEMENTED; }

//! @cond IGNORED
#define cv_hal_canny hal_ni_canny
//! @endcond

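Notes:

(1) CV_OCL_RUN is guarded by _dst.isUMat(), which means the OpenCL path is only taken when the image is passed as a UMat; with a plain Mat it is skipped.

(2) CALL_HAL is the hardware-acceleration hook, but a look at the source shows that the default hal_ni_canny returns CV_HAL_ERROR_NOT_IMPLEMENTED, i.e. the stock build provides no HAL implementation for Canny (the build information likewise reports Custom HAL: NO).

(3) If neither of the two above runs, CV_IPP_RUN_FAST(ipp_Canny... is tried next; if that does not apply either, the work falls through to the parallel_for_ call.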
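The order of attempts described in the notes above can be sketched as a simple fallback chain. This is an illustration only: `CallContext` and `chooseCannyPath` are hypothetical names, and the real macros can additionally decline at run time (for example when the fast implementation reports failure), which this sketch does not model.

```cpp
#include <string>

// Hypothetical description of one Canny call, for illustration only.
struct CallContext
{
    bool dstIsUMat;      // the output was passed as cv::UMat
    bool openclUsable;   // cv::ocl::useOpenCL() would return true
    bool halImplemented; // a custom HAL provides cv_hal_canny
    bool ippUsable;      // cv::ipp::useIPP() would return true
};

// Returns the name of the first path that would handle the call.
inline std::string chooseCannyPath(const CallContext& c)
{
    if (c.openclUsable && c.dstIsUMat) return "OpenCL";  // CV_OCL_RUN
    if (c.halImplemented)              return "HAL";     // CALL_HAL
    if (c.ippUsable)                   return "IPP";     // CV_IPP_RUN_FAST
    return "parallel_for_";                              // generic CPU path
}
```

With the official package (OpenCL and IPP available, no custom HAL), a call with Mat arguments would resolve to "IPP" in this model, and the same call with a UMat output to "OpenCL".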

\opencv\sources\modules\core\include\opencv2\core\private.hpp

#define CV_IPP_RUN_(condition, func, ...)                                   \
    {                                                                       \
        if (cv::ipp::useIPP() && (condition))                               \
        {                                                                   \
            CV__TRACE_REGION_("IPP:" #func, CV_TRACE_NS::details::REGION_FLAG_IMPL_IPP) \
            if(func)                                                        \
            {                                                               \
                CV_IMPL_ADD(CV_IMPL_IPP);                                   \
            }                                                               \
            else                                                            \
            {                                                               \
                setIppErrorStatus();                                        \
                CV_Error(cv::Error::StsAssert, #func);                      \
            }                                                               \
            return __VA_ARGS__;                                             \
        }                                                                   \
    }
#else
#define CV_IPP_RUN_(condition, func, ...)                                   \
    if (cv::ipp::useIPP() && (condition))                                   \
    {                                                                       \
        CV__TRACE_REGION_("IPP:" #func, CV_TRACE_NS::details::REGION_FLAG_IMPL_IPP) \
        if(func)                                                            \
        {                                                                   \
            CV_IMPL_ADD(CV_IMPL_IPP);                                       \
            return __VA_ARGS__;                                             \
        }                                                                   \
    }
#endif
#else
#define CV_IPP_RUN_(condition, func, ...)
#endif

#define CV_IPP_RUN_FAST(func, ...) CV_IPP_RUN_(true, func, __VA_ARGS__)
#define CV_IPP_RUN(condition, func, ...) CV_IPP_RUN_((condition), (func), __VA_ARGS__)
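The second variant of the macro above is an "early return if the fast path succeeds" idiom. A stripped-down sketch of that idiom follows; `SKETCH_FAST_RUN`, `fastImpl` and `process` are hypothetical names invented for this example, and the tracing and bookkeeping calls of the real macro are left out.

```cpp
#include <string>

// Sketch of the non-asserting variant: if the fast path is enabled and
// reports success, return immediately; otherwise fall through.
#define SKETCH_FAST_RUN(enabled, func, result) \
    if ((enabled) && (func))                   \
    {                                          \
        return result;                         \
    }

// Hypothetical fast implementation that only handles even sizes.
inline bool fastImpl(int size) { return size % 2 == 0; }

inline std::string process(int size, bool fastEnabled)
{
    SKETCH_FAST_RUN(fastEnabled, fastImpl(size), "fast")
    return "generic"; // reached when the fast path is off or declined
}
```

Because the macro expands to a plain `if` with a `return`, the code after it doubles as the fallback, which is why the Canny source can simply list the OpenCL, HAL and IPP attempts one after another before the generic implementation.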

x1. Write a test case; pay particular attention to the function cv::getBuildInformation()

// OpenCV algorithm acceleration (4): what are the default parallel-acceleration build options of the official v4.5.5 source?
// https://libaineu2004.blog.csdn.net/article/details/122677388
// Accelerating image processing with OpenCV
// https://www.cnblogs.com/ybqjymy/p/13691132.html
// Why does OpenCV run slower on GPU/CUDA than on CPU?
// https://blog.csdn.net/libaineu2004/article/details/129801112

// For these simple operators the CPU is faster than the GPU:
//   cvtColor, GaussianBlur, Canny
// For these expensive operators the GPU is faster than the CPU:
//   HoughCircles, HoughLines, matchTemplate

#include <iostream>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
// The cuda* modules below come from opencv_contrib and need a CUDA-enabled build;
// the official prebuilt package does not include them.
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudafilters.hpp>
#include <opencv2/cudawarping.hpp>

#define IMAGE_TEST_PATHNAME "D:\\test_src.jpg"
#define IMAGE_SOURCE        "D:\\test_src.jpg"
#define IMAGE_TEMPLATE      "D:\\test_templ.jpg"

// Prints e.g. "CV_CPU_SSE2: 1"
#define PRINT_HW_SUPPORT(feature) \
    std::cout << #feature << ": " << cv::checkHardwareSupport(feature) << std::endl

void checkBuild()
{
    // Query the configuration OpenCV was built with
    std::cout << cv::getBuildInformation() << std::endl;
}

void checkIPP()
{
    // Intel® Integrated Performance Primitives
    // https://www.intel.com/content/www/us/en/developer/tools/oneapi/ipp.html
    bool use = cv::ipp::useIPP();
    bool useNE = cv::ipp::useIPP_NotExact();
    int status = cv::ipp::getIppStatus();
    cv::String ver = cv::ipp::getIppVersion();
    cv::String loc = cv::ipp::getIppErrorLocation();
    std::cout << "IPP use:" << use << std::endl;
    std::cout << "IPP useNE:" << useNE << std::endl;
    std::cout << "IPP status:" << status << std::endl;
    std::cout << "IPP ver:" << ver << std::endl;
    std::cout << "IPP loc:" << loc << std::endl;
}

void checkSimd()
{
    // Query OpenCV's thread settings
    int numTh = cv::getNumThreads(); // defaults to the number of logical CPU threads
    int numCore = cv::getNumberOfCPUs();
    std::cout << "getNumThreads=" << numTh << std::endl;
    std::cout << "getNumberOfCPUs=" << numCore << std::endl;

    // Query whether optimized code paths are currently enabled
    bool opt = cv::useOptimized(); // defaults to true
    std::cout << "useOptimized=" << opt << std::endl;

    // Query whether OpenCL is available and currently enabled
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "default haveOpenCL:" << ret1 << std::endl;
    std::cout << "default useOpenCL:" << ret2 << std::endl;

    // Query support for specific CPU instruction sets
    bool check1 = cv::checkHardwareSupport(CV_CPU_SSE4_1);
    bool check2 = cv::checkHardwareSupport(CV_CPU_SSE4_2);
    bool check3 = cv::checkHardwareSupport(CV_CPU_AVX2);
    std::cout << "CV_CPU_SSE4_1=" << check1 << std::endl;
    std::cout << "CV_CPU_SSE4_2=" << check2 << std::endl;
    std::cout << "CV_CPU_AVX2=" << check3 << std::endl;

    // Print the complete hardware support list
    std::cout << "HardwareSupport:" << std::endl;
    PRINT_HW_SUPPORT(CV_CPU_MMX);
    PRINT_HW_SUPPORT(CV_CPU_SSE);
    PRINT_HW_SUPPORT(CV_CPU_SSE2);
    PRINT_HW_SUPPORT(CV_CPU_SSE3);
    PRINT_HW_SUPPORT(CV_CPU_SSSE3);
    PRINT_HW_SUPPORT(CV_CPU_SSE4_1);
    PRINT_HW_SUPPORT(CV_CPU_SSE4_2);
    PRINT_HW_SUPPORT(CV_CPU_POPCNT);
    PRINT_HW_SUPPORT(CV_CPU_FP16);
    PRINT_HW_SUPPORT(CV_CPU_AVX);
    PRINT_HW_SUPPORT(CV_CPU_AVX2);
    PRINT_HW_SUPPORT(CV_CPU_FMA3);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512F);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512BW);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512CD);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512DQ);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512ER);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512IFMA512);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512IFMA);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512PF);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512VBMI);
    PRINT_HW_SUPPORT(CV_CPU_AVX_512VL);
    PRINT_HW_SUPPORT(CV_CPU_NEON);
    PRINT_HW_SUPPORT(CV_CPU_VSX);
    PRINT_HW_SUPPORT(CV_CPU_AVX512_SKX);
    PRINT_HW_SUPPORT(CV_HARDWARE_MAX_FEATURE);
    std::cout << std::endl;

    //cv::setUseOptimized(false);
    //cv::setNumThreads(1);
}

void checkCuda() // older versions used cv::gpu with #include <opencv2/gpu/gpu.hpp>, now deprecated
{
    int64 begintime, endtime;
    int num_devices = cv::cuda::getCudaEnabledDeviceCount();
    if (num_devices <= 0)
    {
        std::cerr << "There is no cuda device" << std::endl;
        return;
    }

    int enable_device_id = -1;
    for (int i = 0; i < num_devices; i++)
    {
        cv::cuda::DeviceInfo dev_info(i);
        if (dev_info.isCompatible())
        {
            enable_device_id = i;
        }
    }

    if (enable_device_id < 0)
    {
        std::cerr << "GPU module isn't built for GPU" << std::endl;
        return;
    }

    cv::cuda::setDevice(enable_device_id); // select the GPU

    // The first GPU call is usually very slow (even a trivial statement can take around ten seconds),
    // because the CUDA Runtime API is initialized implicitly on the first GPU function call.
    // A workaround that treats the symptom only is to allocate a small GpuMat at program start,
    // so the expensive initialization happens up front and does not distort later timings.
    cv::cuda::GpuMat warmup(10, 10, CV_8U);

    // Test case
    cv::Mat src_image = cv::imread(IMAGE_TEST_PATHNAME);
    cv::Mat dst_image;
    cv::cuda::GpuMat d_src_img(src_image); // upload src image to gpu
    // or: d_src_img.upload(src_image);
    cv::cuda::GpuMat d_dst_img;

    begintime = cv::getTickCount();
    cv::cuda::cvtColor(d_src_img, d_dst_img, cv::COLOR_BGR2GRAY); // BGR to gray
    d_dst_img.download(dst_image);                                // download dst image to cpu
    endtime = cv::getTickCount();
    std::cerr << 1000 * (endtime - begintime) / cv::getTickFrequency() << std::endl; // milliseconds

    cv::namedWindow("checkCuda", cv::WINDOW_NORMAL);
    cv::imshow("checkCuda", dst_image);
}

void checkOpenCL() // Open Computing Language: kernels can run on the host CPU or on a GPU
{
    std::vector<cv::ocl::PlatformInfo> info;
    cv::ocl::getPlatfomsInfo(info); // the misspelling is part of the OpenCV API name
    if (info.empty())
    {
        std::cout << "No OpenCL platform found" << std::endl;
        return;
    }

    cv::ocl::PlatformInfo sdk = info.at(0);
    int number = sdk.deviceNumber();
    if (number < 1)
    {
        std::cout << "Number of devices:" << number << std::endl;
        return;
    }

    std::cout << "***********SDK************" << std::endl;
    std::cout << "Name:" << sdk.name() << std::endl;
    std::cout << "Vendor:" << sdk.vendor() << std::endl;
    std::cout << "Version:" << sdk.version() << std::endl;
    std::cout << "Number of devices:" << number << std::endl;

    for (int i = 0; i < number; i++)
    {
        std::cout << std::endl;
        cv::ocl::Device device;
        sdk.getDevice(device, i);
        std::cout << "***********Device " << i + 1 << "***********" << std::endl;
        std::cout << "Vendor Id:" << device.vendorID() << std::endl;
        std::cout << "Vendor name:" << device.vendorName() << std::endl;
        std::cout << "Name:" << device.name() << std::endl;
        std::cout << "Driver version:" << device.driverVersion() << std::endl; // was vendorID()
        if (device.isAMD())
            std::cout << "Is AMD device" << std::endl;
        if (device.isIntel())
            std::cout << "Is Intel device" << std::endl;
        if (device.isNVidia())
            std::cout << "Is NVidia device" << std::endl;
        std::cout << "Global Memory size:" << device.globalMemSize() << std::endl;
        std::cout << "Memory cache size:" << device.globalMemCacheSize() << std::endl;
        std::cout << "Memory cache type:" << device.globalMemCacheType() << std::endl;
        std::cout << "Local Memory size:" << device.localMemSize() << std::endl;
        std::cout << "Local Memory type:" << device.localMemType() << std::endl;
        std::cout << "Max Clock frequency:" << device.maxClockFrequency() << std::endl;
    }
}

void calcEdgesCPU()
{
    cv::ocl::setUseOpenCL(false);
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "haveOpenCL:" << ret1 << std::endl;
    std::cout << "useOpenCL:" << ret2 << std::endl;

    double start = cv::getTickCount();
    cv::Mat cpuGray, cpuBlur, cpuEdges;
    cv::Mat cpuFrame = cv::imread(IMAGE_TEST_PATHNAME);
    cv::cvtColor(cpuFrame, cpuGray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(cpuGray, cpuBlur, cv::Size(3, 3), 15, 15);
    cv::Canny(cpuBlur, cpuEdges, 50, 100, 3);
    std::vector<cv::Vec3f> vtCir;
    cv::HoughCircles(cpuBlur, vtCir, cv::HOUGH_GRADIENT_ALT, 1.5, 15, 300, 0.8, 1, 100);
    cv::HoughCircles(cpuBlur, vtCir, cv::HOUGH_GRADIENT, 1, 15, 100, 30, 1, 100);
    std::cout << "CPU cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::namedWindow("Canny Edges CPU", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges CPU", cpuEdges);
}

void calcEdgesOpenCL()
{
    cv::ocl::setUseOpenCL(true);
    bool ret1 = cv::ocl::haveOpenCL();
    bool ret2 = cv::ocl::useOpenCL();
    std::cout << "haveOpenCL:" << ret1 << std::endl;
    std::cout << "useOpenCL:" << ret2 << std::endl;

    // With UMat objects, OpenCV automatically runs on the GPU on devices that support OpenCL
    // and falls back to the CPU on devices that do not. The program therefore does not fail
    // on machines without OpenCL, and the interface stays the same.
    double start = cv::getTickCount();
    cv::UMat gpuFrame, gpuGray, gpuBlur, gpuEdges;
    cv::Mat cpuFrame = cv::imread(IMAGE_TEST_PATHNAME);
    cpuFrame.copyTo(gpuFrame); // Mat -> UMat
    cv::cvtColor(gpuFrame, gpuGray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gpuGray, gpuBlur, cv::Size(3, 3), 15, 15);
    cv::Canny(gpuBlur, gpuEdges, 50, 100, 3);
    std::vector<cv::Vec3f> vtCir;
    cv::HoughCircles(gpuBlur, vtCir, cv::HOUGH_GRADIENT_ALT, 1.5, 15, 300, 0.8, 1, 100);
    cv::HoughCircles(gpuBlur, vtCir, cv::HOUGH_GRADIENT, 1, 15, 100, 30, 1, 100);
    std::cout << "OpenCL cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::Mat matResult = gpuEdges.getMat(cv::ACCESS_READ); // UMat -> Mat
    cv::namedWindow("Canny Edges OpenCL1", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges OpenCL1", matResult);
    cv::namedWindow("Canny Edges OpenCL2", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges OpenCL2", gpuEdges);
}

void calcEdgesCuda()
{
    cv::ocl::setUseOpenCL(false);

    double start = cv::getTickCount();
    cv::cuda::GpuMat gpuGray, gpuBlur, gpuEdges;
    cv::Mat cpuEdges;
    cv::Mat cpuFrame = cv::imread(IMAGE_TEST_PATHNAME);
    cv::cuda::registerPageLocked(cpuFrame); // page-locked (pinned) memory
    cv::cuda::GpuMat gpuFrame;
    gpuFrame.upload(cpuFrame);
    cv::cuda::cvtColor(gpuFrame, gpuGray, cv::COLOR_BGR2GRAY);
    cv::Ptr<cv::cuda::Filter> gaussFilter = cv::cuda::createGaussianFilter(CV_8UC1, CV_8UC1, cv::Size(3, 3), 15, 15);
    gaussFilter->apply(gpuGray, gpuBlur);
    cv::Ptr<cv::cuda::CannyEdgeDetector> cannyEdge = cv::cuda::createCannyEdgeDetector(50, 100, 3);
    cannyEdge->detect(gpuBlur, gpuEdges);

    cv::cuda::GpuMat gpuLines; // This should be GpuMat...
#if 0                          // find lines
    std::vector<cv::Vec2f> vtLines;
    cv::Ptr<cv::cuda::HoughLinesDetector> hough = cv::cuda::createHoughLinesDetector(1, CV_PI / 180, 120);
    hough->detect(gpuEdges, gpuLines);
    hough->downloadResults(gpuLines, vtLines);
#else
    // The circle detectors take the blurred grayscale image, matching the CPU and OpenCL
    // versions above (the original passed gpuEdges here).
    cv::Ptr<cv::cuda::HoughCirclesDetector> hough1 = cv::cuda::createHoughCirclesDetector(1.5, 15, 300, 1, 1, 100);
    hough1->detect(gpuBlur, gpuLines);
    cv::Ptr<cv::cuda::HoughCirclesDetector> hough2 = cv::cuda::createHoughCirclesDetector(1, 15, 100, 30, 1, 100);
    hough2->detect(gpuBlur, gpuLines);
#endif

    gpuEdges.download(cpuEdges);
    cv::cuda::unregisterPageLocked(cpuFrame); // release the page lock
    std::cout << "Cuda cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::namedWindow("Canny Edges Cuda", cv::WINDOW_NORMAL);
    cv::imshow("Canny Edges Cuda", cpuEdges);
}

// https://stackoverflow.com/questions/75571990/opencv-cuda-different-outcomes-on-cpu-vs-gpu
cv::cuda::GpuMat extractTargetsGPU(cv::cuda::GpuMat frame)
{
    std::vector<cv::cuda::GpuMat> channels;
    cv::cuda::split(frame, channels);

    cv::cuda::Stream bThresholdStream, gThresholdStream, rThresholdStream;
    cv::cuda::threshold(channels[0], channels[0], 127, 255, cv::THRESH_BINARY, bThresholdStream);
    cv::cuda::threshold(channels[1], channels[1], 127, 255, cv::THRESH_BINARY, gThresholdStream);
    cv::cuda::threshold(channels[2], channels[2], 127, 255, cv::THRESH_BINARY_INV, rThresholdStream);
    bThresholdStream.waitForCompletion();
    gThresholdStream.waitForCompletion();
    rThresholdStream.waitForCompletion();

    cv::cuda::bitwise_and(channels[1], channels[2], channels[1]);
    cv::cuda::bitwise_and(channels[0], channels[1], channels[0]);

    cv::Ptr<cv::cuda::Filter> gaussianFilter = cv::cuda::createGaussianFilter(CV_8UC1, CV_8UC1, cv::Size(9, 9), 1.0f);
    gaussianFilter->apply(channels[0], channels[0]);

    cv::cuda::GpuMat targets;
    cv::Ptr<cv::cuda::HoughCirclesDetector> houghCircles = cv::cuda::createHoughCirclesDetector(1, 20, 50, 30, 1, 75);
    houghCircles->detect(channels[0], targets);
    return targets;
}

void matchTemplateCPU()
{
    cv::Mat src = cv::imread(IMAGE_SOURCE, cv::IMREAD_GRAYSCALE);
    cv::Mat templ = cv::imread(IMAGE_TEMPLATE, cv::IMREAD_GRAYSCALE);
    cv::Mat dst;
    double minVal = 0;
    double maxVal = 0;
    cv::Point minLoc;
    cv::Point maxLoc;

    double start = cv::getTickCount();
    cv::matchTemplate(src, templ, dst, cv::TM_CCOEFF_NORMED); // one of the six matching methods
    cv::normalize(dst, dst, 1, 0, cv::NORM_MINMAX);
    cv::minMaxLoc(dst, &minVal, &maxVal, &minLoc, &maxLoc); // find the best match
    std::cout << "matchTemplateCPU cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    // The original call passed "1, 8, 0" with the color argument missing; the color is now explicit.
    cv::rectangle(src, cv::Rect(maxLoc.x, maxLoc.y, templ.cols, templ.rows), cv::Scalar(255), 1, cv::LINE_8, 0);
    cv::namedWindow("matchTemplateCPU", cv::WINDOW_NORMAL);
    cv::imshow("matchTemplateCPU", src);
}

void matchTemplateCuda()
{
    cv::Mat src = cv::imread(IMAGE_SOURCE, cv::IMREAD_GRAYSCALE);
    cv::Mat templ = cv::imread(IMAGE_TEMPLATE, cv::IMREAD_GRAYSCALE);
    cv::Mat dst;
    double minVal = 0;
    double maxVal = 0;
    cv::Point minLoc;
    cv::Point maxLoc;
    cv::cuda::GpuMat gsrc, gtempl, gdst;

    double start = cv::getTickCount();
    gsrc.upload(src);
    gtempl.upload(templ);
    cv::Ptr<cv::cuda::TemplateMatching> matcher;
    matcher = cv::cuda::createTemplateMatching(CV_8U, cv::TM_CCOEFF_NORMED);
    matcher->match(gsrc, gtempl, gdst);
    cv::cuda::minMaxLoc(gdst, &minVal, &maxVal, &minLoc, &maxLoc);
    std::cout << "matchTemplateCuda cost time:(s)" << ((cv::getTickCount() - start) / cv::getTickFrequency()) << std::endl;

    cv::rectangle(src, cv::Rect(maxLoc.x, maxLoc.y, templ.cols, templ.rows), cv::Scalar(255), 1, cv::LINE_8, 0);
    cv::namedWindow("matchTemplateCuda", cv::WINDOW_NORMAL);
    cv::imshow("matchTemplateCuda", src);
}

int main(int argc, char *argv[])
{
    checkBuild();
    checkIPP();
    checkSimd();

    // The remaining tests need the two test images; stop early if they cannot be read.
    if (cv::imread(IMAGE_SOURCE).empty() || cv::imread(IMAGE_TEMPLATE).empty())
    {
        std::cerr << "Cannot read " << IMAGE_SOURCE << " or " << IMAGE_TEMPLATE << std::endl;
        return 1;
    }

    checkCuda();
    checkOpenCL();
    calcEdgesCPU();
    calcEdgesOpenCL();
    calcEdgesCuda();
    matchTemplateCPU();
    matchTemplateCuda();
    cv::waitKey(0);
    return 0;
}

cv::getBuildInformation() returns the following:

General configuration for OpenCV 4.5.5 =====================================
  Version control:               4.5.5

  Platform:
    Timestamp:                   2021-12-25T04:33:15Z
    Host:                        Windows 10.0.19041 AMD64
    CMake:                       3.16.4
    CMake generator:             Visual Studio 15 2017
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2017/Professional/MSBuild/15.0/Bin/MSBuild.exe
    MSVC:                        1916
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (31 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2017/Professional/VC/Tools/MSVC/14.16.27023/bin/Hostx86/x64/cl.exe  (ver 19.16.27042.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP2  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP2  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2017/Professional/VC/Tools/MSVC/14.16.27023/bin/Hostx86/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP2   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP2 /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo stitching video videoio world
    Disabled:                    python2 python3
    Disabled by dependency:      -
    Unavailable:                 java ts
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O:
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   C:/build/master_winpack-build-win64-vc15/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                C:/build/master_winpack-build-win64-vc15/build/3rdparty/ippicv/ippicv_win/iw
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                C:/build/master_winpack-build-win64-vc15/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            C:/utils/soft/python27-x64/python.exe

  Java:
    ant:                         C:/utils/soft/apache-ant-1.9.7/bin/ant.bat (ver 1.9.7)
    JNI:                         C:/Program Files/Java/jdk1.8.0_112/include C:/Program Files/Java/jdk1.8.0_112/include/win32 C:/Program Files/Java/jdk1.8.0_112/include
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    C:/build/master_winpack-build-win64-vc15/install
-----------------------------------------------------------------
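When only one or two entries of this report matter (for example "Parallel framework" or "OpenCL"), they can be picked out of the text programmatically. The helper below is a minimal sketch under the assumption that each entry sits on its own "key: value" line; `buildInfoValue` is a hypothetical name, not an OpenCV function.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return the value of the first "key: value" line whose
// key (ignoring leading indentation) equals `key`, or "" if there is none.
inline std::string buildInfoValue(const std::string& info, const std::string& key)
{
    std::istringstream lines(info);
    std::string line;
    while (std::getline(lines, line))
    {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;                // blank line
        if (line.compare(first, key.size(), key) != 0) continue; // different key
        const std::size_t colon = first + key.size();
        if (colon >= line.size() || line[colon] != ':') continue; // key is only a prefix
        const std::size_t value = line.find_first_not_of(" \t\r", colon + 1);
        if (value == std::string::npos) return std::string();    // key without a value
        const std::size_t last = line.find_last_not_of(" \t\r");
        return line.substr(value, last - value + 1);
    }
    return std::string();
}
```

In a program linked against OpenCV, the call would look like `buildInfoValue(cv::getBuildInformation(), "Parallel framework")`, which for the report above would yield the text after the colon on that line.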

x2. Companion article

"一文彻底搞懂为什么OpenCV用GPU/cuda跑得比用CPU慢?" (Why does OpenCV run slower on GPU/CUDA than on CPU?), 利白's blog, CSDN
