Windows + libtorch + VS2019 + YOLOv5: notes from a deployment project

  • Preface
    • Environment setup
    • Environment setup references
    • My libtorch configuration
    • Export code for the GPU model
      • Results
      • Closing

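Preface

This is my first blog post, a summary of what I have been learning and working on recently. The main topic is using libtorch to deploy a model trained with PyTorch. I had previously packaged the whole Python project into an exe with pyinstaller, but that did not meet the other party's requirements, so I switched to libtorch.

Environment setup

VS2019 + OpenCV 4.5 + libtorch 1.7.1:
1. VS2019 download: link
2. OpenCV official site: link
3. libtorch download: link. The release build is recommended. The libtorch version must match the PyTorch version used to train the model, and the CUDA versions must match as well.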

Environment setup references

https://blog.csdn.net/weixin_44936889/article/details/111186818
https://blog.csdn.net/zzz_zzz12138/article/details/109138805
https://blog.csdn.net/wenghd22/article/details/112512231
Reference for configuring VS2019 and OpenCV:
https://blog.csdn.net/sophies671207/article/details/89854368

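My libtorch configuration

Here is how I configured libtorch: create an empty project, then add a new main.cpp file.
1. Project -> Properties -> VC++ Directories -> Include Directories

2. Project -> Properties -> VC++ Directories -> Library Directories

3. Project -> Properties -> C/C++ -> General -> Additional Include Directories

4. Project -> Properties -> C/C++ -> General -> SDL checks: No

5. Project -> Properties -> Linker -> Input -> Additional Dependencies: enter the following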
E:\opencv\build\x64\vc15\lib\opencv_world450.lib
c10.lib
asmjit.lib
c10_cuda.lib
caffe2_detectron_ops_gpu.lib
caffe2_module_test_dynamic.lib
caffe2_nvrtc.lib
clog.lib
cpuinfo.lib
dnnl.lib
fbgemm.lib
libprotobuf.lib
libprotobuf-lite.lib
libprotoc.lib
mkldnn.lib
torch.lib
torch_cuda.lib
torch_cpu.lib
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib

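6. Project -> Properties -> Linker -> Command Line: enter /INCLUDE:?warp_size@cuda@at@@YAHXZ

7. Project -> Properties -> C/C++ -> Language -> Conformance mode: No

With the environment configured as above, the packaged folder looks like the figure below:

Weights: exporting a fixed-size model from the official weights is enough. After that, main.cpp can be run directly. I tested with the images in the samples folder.
main.cpp code. Note line 16 when using the GPU (see the comment on `pred = pred.to(at::kCPU);` in the listing).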


#include <opencv2/opencv.hpp>
#include <torch/script.h>
#include <torch/torch.h>
#include <algorithm>
#include <ctime>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <tuple>
#include <vector>

std::vector<torch::Tensor> non_max_suppression(torch::Tensor preds, float score_thresh = 0.01, float iou_thresh = 0.35)
{
    std::vector<torch::Tensor> output;
    for (size_t i = 0; i < preds.sizes()[0]; ++i)
    {
        torch::Tensor pred = preds.select(0, i);

        // GPU inference returns CUDA tensors; convert them to CPU before NMS, otherwise an error is raised.
        pred = pred.to(at::kCPU); // added to this function: mind the device of preds and move it to the CPU for post-processing

        // Filter by scores
        torch::Tensor scores = pred.select(1, 4) * std::get<0>(torch::max(pred.slice(1, 5, pred.sizes()[1]), 1));
        pred = torch::index_select(pred, 0, torch::nonzero(scores > score_thresh).select(1, 0));
        if (pred.sizes()[0] == 0) continue;

        // (center_x, center_y, w, h) to (left, top, right, bottom)
        pred.select(1, 0) = pred.select(1, 0) - pred.select(1, 2) / 2;
        pred.select(1, 1) = pred.select(1, 1) - pred.select(1, 3) / 2;
        pred.select(1, 2) = pred.select(1, 0) + pred.select(1, 2);
        pred.select(1, 3) = pred.select(1, 1) + pred.select(1, 3);

        // Computing scores and classes
        std::tuple<torch::Tensor, torch::Tensor> max_tuple = torch::max(pred.slice(1, 5, pred.sizes()[1]), 1);
        pred.select(1, 4) = pred.select(1, 4) * std::get<0>(max_tuple);
        pred.select(1, 5) = std::get<1>(max_tuple);

        torch::Tensor dets = pred.slice(1, 0, 6);

        torch::Tensor keep = torch::empty({ dets.sizes()[0] });
        torch::Tensor areas = (dets.select(1, 3) - dets.select(1, 1)) * (dets.select(1, 2) - dets.select(1, 0));
        // Sort by score, descending
        std::tuple<torch::Tensor, torch::Tensor> indexes_tuple = torch::sort(dets.select(1, 4), 0, 1);
        torch::Tensor v = std::get<0>(indexes_tuple);
        torch::Tensor indexes = std::get<1>(indexes_tuple);
        int count = 0;
        while (indexes.sizes()[0] > 0)
        {
            keep[count] = (indexes[0].item().toInt());
            count += 1;

            // Computing overlaps
            torch::Tensor lefts = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor tops = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor rights = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor bottoms = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor widths = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor heights = torch::empty(indexes.sizes()[0] - 1);
            for (size_t i = 0; i < indexes.sizes()[0] - 1; ++i)
            {
                lefts[i] = std::max(dets[indexes[0]][0].item().toFloat(), dets[indexes[i + 1]][0].item().toFloat());
                tops[i] = std::max(dets[indexes[0]][1].item().toFloat(), dets[indexes[i + 1]][1].item().toFloat());
                rights[i] = std::min(dets[indexes[0]][2].item().toFloat(), dets[indexes[i + 1]][2].item().toFloat());
                bottoms[i] = std::min(dets[indexes[0]][3].item().toFloat(), dets[indexes[i + 1]][3].item().toFloat());
                widths[i] = std::max(float(0), rights[i].item().toFloat() - lefts[i].item().toFloat());
                heights[i] = std::max(float(0), bottoms[i].item().toFloat() - tops[i].item().toFloat());
            }
            torch::Tensor overlaps = widths * heights;

            // Filter by IoUs
            torch::Tensor ious = overlaps / (areas.select(0, indexes[0].item().toInt()) + torch::index_select(areas, 0, indexes.slice(0, 1, indexes.sizes()[0])) - overlaps);
            indexes = torch::index_select(indexes, 0, torch::nonzero(ious <= iou_thresh).select(1, 0) + 1);
        }
        keep = keep.toType(torch::kInt64);
        output.push_back(torch::index_select(dets, 0, keep.slice(0, 0, count)));
    }
    return output;
}

int main(int argc, char* argv[])
{
    std::cout << "cuda::is_available():" << torch::cuda::is_available() << std::endl;
    torch::DeviceType device_type = at::kCPU; // device type, CPU by default
    if (torch::cuda::is_available())
        device_type = at::kCUDA;

    // Loading Module
    torch::jit::script::Module module = torch::jit::load("yolov5x.torchscript.pt"); // best.torchscript3.pt // yolov5x.torchscript.pt
    module.to(device_type); // move the model to the GPU when one is available

    std::vector<std::string> classnames;
    std::ifstream f("coco.names");
    std::string name = "";
    while (std::getline(f, name))
    {
        classnames.push_back(name);
    }

    if (argc < 2)
    {
        std::cout << "Please run with test video." << std::endl;
        return -1;
    }

    std::string video = argv[1];
    cv::VideoCapture cap = cv::VideoCapture(video);
    // cap.set(cv::CAP_PROP_FRAME_WIDTH, 1920);
    // cap.set(cv::CAP_PROP_FRAME_HEIGHT, 1080);
    cv::Mat frame, img;
    cap.read(frame);
    int width = frame.size().width;
    int height = frame.size().height;
    int count = 0;
    while (cap.isOpened())
    {
        count++;
        clock_t start = clock();
        cap.read(frame);
        if (frame.empty())
        {
            std::cout << "Read frame failed!" << std::endl;
            break;
        }

        // Preparing input tensor
        cv::resize(frame, img, cv::Size(640, 640));
        // cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
        // torch::Tensor imgTensor = torch::from_blob(img.data, {img.rows, img.cols,3},torch::kByte);
        // imgTensor = imgTensor.permute({2,0,1});
        // imgTensor = imgTensor.toType(torch::kFloat);
        // imgTensor = imgTensor.div(255);
        // imgTensor = imgTensor.unsqueeze(0);
        // imgTensor = imgTensor.to(device_type);
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);  // BGR -> RGB
        img.convertTo(img, CV_32FC3, 1.0f / 255.0f);  // normalization 1/255
        auto imgTensor = torch::from_blob(img.data, { 1, img.rows, img.cols, img.channels() }).to(device_type);
        imgTensor = imgTensor.permute({ 0, 3, 1, 2 }).contiguous();  // BHWC -> BCHW (Batch, Channel, Height, Width)
        std::vector<torch::jit::IValue> inputs;
        inputs.emplace_back(imgTensor);

        // preds: [?, 15120, 9]
        torch::jit::IValue output = module.forward(inputs);
        auto preds = output.toTuple()->elements()[0].toTensor();
        // torch::Tensor preds = module.forward({ imgTensor }).toTensor();
        std::vector<torch::Tensor> dets = non_max_suppression(preds, 0.35, 0.5);
        if (dets.size() > 0)
        {
            // Visualize result
            for (size_t i = 0; i < dets[0].sizes()[0]; ++i)
            {
                // Scale boxes from the 640x640 network input back to the original frame
                float left = dets[0][i][0].item().toFloat() * frame.cols / 640;
                float top = dets[0][i][1].item().toFloat() * frame.rows / 640;
                float right = dets[0][i][2].item().toFloat() * frame.cols / 640;
                float bottom = dets[0][i][3].item().toFloat() * frame.rows / 640;
                float score = dets[0][i][4].item().toFloat();
                int classID = dets[0][i][5].item().toInt();

                cv::rectangle(frame, cv::Rect(left, top, (right - left), (bottom - top)), cv::Scalar(0, 255, 0), 2);

                cv::putText(frame,
                    classnames[classID] + ": " + cv::format("%.2f", score),
                    cv::Point(left, top),
                    cv::FONT_HERSHEY_SIMPLEX, (right - left) / 200, cv::Scalar(0, 255, 0), 2);
            }
        }

        // std::cout << "-[INFO] Frame:" <<  std::to_string(count) << " FPS: " + std::to_string(float(1e7 / (clock() - start))) << std::endl;
        std::cout << "-[INFO] Frame:" << std::to_string(count) << std::endl;
        // cv::putText(frame, "FPS: " + std::to_string(int(1e7 / (clock() - start))),
        //     cv::Point(50, 50),
        //     cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(0, 255, 0), 2);
        cv::imshow("", frame);
        // cv::imwrite("../images/"+cv::format("%06d", count)+".jpg", frame);
        cv::resize(frame, frame, cv::Size(width, height));
        if (cv::waitKey(1) == 27) break;
    }
    cap.release();
    return 0;
}
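
To make the post-processing in non_max_suppression easier to follow, here is a minimal sketch of the same greedy idea on plain structs, with no libtorch dependency. The names Box, from_center, iou and greedy_nms are my own, chosen for illustration; they are not part of libtorch or of the listing above. Like the listing, the sketch filters by score, converts center format to corner format, and suppresses across all classes at once.

```cpp
#include <algorithm>
#include <vector>

// One detection in corner format. `score` stands for objectness * best class probability.
struct Box {
    float left, top, right, bottom;
    float score;
};

// Convert (center_x, center_y, width, height) to (left, top, right, bottom).
Box from_center(float cx, float cy, float w, float h, float score) {
    return Box{cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score};
}

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.right, b.right) - std::max(a.left, b.left));
    const float ih = std::max(0.0f, std::min(a.bottom, b.bottom) - std::max(a.top, b.top));
    const float inter = iw * ih;
    const float area_a = (a.right - a.left) * (a.bottom - a.top);
    const float area_b = (b.right - b.left) * (b.bottom - b.top);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only if
// its IoU with every box kept so far is at most iou_thresh.
std::vector<Box> greedy_nms(std::vector<Box> boxes, float score_thresh, float iou_thresh) {
    // Drop low-confidence boxes first (the listing keeps scores > score_thresh).
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [score_thresh](const Box& b) { return b.score <= score_thresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });

    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > iou_thresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

Calling `greedy_nms(boxes, 0.35f, 0.5f)` uses the same thresholds that main passes to non_max_suppression. Working on plain floats avoids the many `.item()` calls in the tensor version, at the cost of copying the predictions out of the tensor first.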

Export code for the GPU model

Remember to change the image size used for the export.

"""Exports a YOLOv5 *.pt model to ONNX and TorchScript formats

Usage:
    $ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
"""

import argparse
import sys
import time

sys.path.append('./')  # to run '$ python *.py' files in subdirectories

import torch
import torch.nn as nn

import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_size

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='E:\\yolov5-master\\runs\\train\\exp\\weights\\best.pt', help='weights path')  # from yolov5/models/
    parser.add_argument('--img-size', nargs='+', type=int, default=[352, 640], help='image size')  # height, width
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand
    print(opt)
    set_logging()
    t = time.time()

    # Load PyTorch model
    model = attempt_load(opt.weights, map_location=torch.device('cuda'))  # load FP32 model
    labels = model.names

    # Checks
    gs = int(max(model.stride))  # grid size (max stride)
    opt.img_size = [check_img_size(x, gs) for x in opt.img_size]  # verify img_size are gs-multiples

    # Input
    img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device='cuda')  # image size(1,3,320,192) iDetection

    # Update model
    for k, m in model.named_modules():
        m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
        if isinstance(m, models.common.Conv):  # assign export-friendly activations
            if isinstance(m.act, nn.Hardswish):
                m.act = Hardswish()
            elif isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        # elif isinstance(m, models.yolo.Detect):
        #     m.forward = m.forward_export  # assign forward (optional)
    # model.model[-1].export = True  # set Detect() layer export=True
    model.model[-1].export = False
    y = model(img)  # dry run

    # TorchScript export
    try:
        print('\nStarting TorchScript export with torch %s...' % torch.__version__)
        f = opt.weights.replace('.pt', '.torchscript.pt')  # filename
        ts = torch.jit.trace(model, img)
        ts.save(f)
        print('TorchScript export success, saved as %s' % f)
    except Exception as e:
        print('TorchScript export failure: %s' % e)

    # ONNX export
    try:
        import onnx

        print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
        f = opt.weights.replace('.pt', '.onnx')  # filename
        torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
                          output_names=['classes', 'boxes'] if y is None else ['output'])

        # Checks
        onnx_model = onnx.load(f)  # load onnx model
        onnx.checker.check_model(onnx_model)  # check onnx model
        # print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable model
        print('ONNX export success, saved as %s' % f)
    except Exception as e:
        print('ONNX export failure: %s' % e)

    # CoreML export
    try:
        import coremltools as ct

        print('\nStarting CoreML export with coremltools %s...' % ct.__version__)
        # convert model from torchscript and apply pixel scaling as per detect.py
        model = ct.convert(ts, inputs=[ct.ImageType(name='image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
        f = opt.weights.replace('.pt', '.mlmodel')  # filename
        model.save(f)
        print('CoreML export success, saved as %s' % f)
    except Exception as e:
        print('CoreML export failure: %s' % e)

    # Finish
    print('\nExport complete (%.2fs). Visualize with https://github.com/lutzroeder/netron.' % (time.time() - t))
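
The reason the export size matters: main.cpp resizes every frame to 640 x 640, while this script defaults to [352, 640], so the two have to be brought in line before exporting. The script also requires each dimension to be a multiple of the model's largest stride. A small sketch of both checks follows. The helper names are hypothetical, the stride of 32 used in the usage note is an assumption about the standard YOLOv5 models, and rounding up is my understanding of what check_img_size does rather than something the script above states.

```cpp
#include <cassert>

// Round `size` up to the nearest multiple of `stride` (hypothetical helper).
int round_up_to_stride(int size, int stride) {
    assert(size > 0 && stride > 0);
    return ((size + stride - 1) / stride) * stride;
}

// True when the exported (height, width) equals the size the C++ side resizes
// to, so the traced model sees the shape it was exported with.
bool export_matches_inference(int export_h, int export_w, int infer_h, int infer_w) {
    return export_h == infer_h && export_w == infer_w;
}
```

With a stride of 32, `round_up_to_stride(350, 32)` gives 352 and 640 is left unchanged. `export_matches_inference(352, 640, 640, 640)` is false, which is the mismatch to fix by passing `--img-size 640` (or by changing the `cv::resize` target in main.cpp).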

Results

Test results:

Video test:

2021-01-20 14-31-49

libtorch+yolov5

Another video link: link

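Closing

I won't show the results of my own trained model here. Thanks for reading; please follow along so we can learn and improve together!

Problem: forward takes a long time
I actually noticed this early on, when batch-processing multiple images. As in the figure below, the first two images are very slow and the rest run at normal speed. I looked into it myself and also asked on GitHub; the likely cause is a warm-up issue in libtorch 1.7.1.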
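
One common way to keep the slow first calls out of the measurements is to run a few untimed warm-up inferences before timing anything. Below is a minimal sketch using only the standard library. The function name time_after_warmup and its parameters are hypothetical, and the callable stands in for whatever wraps `module.forward(inputs)` in your program.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `infer` `warmup` times without timing, then `runs` more times with
// timing, and returns the duration of each timed call in milliseconds.
std::vector<double> time_after_warmup(const std::function<void()>& infer,
                                      std::size_t warmup, std::size_t runs) {
    for (std::size_t i = 0; i < warmup; ++i) {
        infer();  // results discarded; these calls absorb the one-off startup cost
    }

    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        infer();
        const auto end = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    return ms;
}
```

Since the first two images were the slow ones in my case, a warm-up count of 2 or 3 with a dummy input of the export size would be a reasonable starting point. On the GPU, calls can return before the work has finished, so the callable should also do something that waits for the result, such as copying the output to the CPU, or the timings will look too good.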

GitHub:
Gitee:
