libtorch+yolov5
Windows + libtorch + VS2019 + YOLOv5: deployment notes
- Preface
- Environment
- Setup references
- My libtorch configuration
- Exporting the GPU model (export code)
- Results
- Closing
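Preface
This is my first blog post, just a summary of what I have been learning and working on recently. The main topic is deploying a PyTorch-trained model with libtorch. Earlier I had successfully packaged the whole Python project into an exe with pyinstaller, but that did not meet the client's requirements, so I switched to libtorch.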
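Environment
VS2019 + OpenCV 4.5 + libtorch 1.7.1:
1. VS2019 download: link
2. OpenCV official site: link
3. libtorch download: link. The release build is recommended. The libtorch version must match the PyTorch version used to train the model, and the CUDA version must match as well.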
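The version-matching requirement above can be checked with a few lines of Python in the training environment. This is only a minimal sketch: `split_build_tag` and `same_build` are hypothetical helper names, and the libtorch version string is assumed to be typed in by hand from the package you downloaded. `torch.__version__` and `torch.version.cuda` are the real PyTorch attributes.

```python
def split_build_tag(version):
    """Split a version string such as '1.7.1+cu110' into ('1.7.1', 'cu110')."""
    base, _, tag = version.partition("+")
    return base, tag


def same_build(trained_with, libtorch_build):
    """True when both the release number and the CUDA tag are identical."""
    return split_build_tag(trained_with) == split_build_tag(libtorch_build)


# Hypothetical values: replace the second one with the libtorch build you downloaded.
print(same_build("1.7.1+cu110", "1.7.1+cu110"))
print(same_build("1.7.1+cu110", "1.7.1+cu102"))

try:
    import torch  # only present in the training environment

    print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda)
except ImportError:
    print("PyTorch is not installed in this environment")
```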
Setup references:
https://blog.csdn.net/weixin_44936889/article/details/111186818
https://blog.csdn.net/zzz_zzz12138/article/details/109138805
https://blog.csdn.net/wenghd22/article/details/112512231
Reference for configuring VS2019 and OpenCV:
https://blog.csdn.net/sophies671207/article/details/89854368
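My libtorch configuration
My libtorch configuration steps: create a new empty project and add a main.cpp file.
1. Project -> Properties -> VC++ Directories -> Include Directories
2. Project -> Properties -> VC++ Directories -> Library Directories
3. Project -> Properties -> C/C++ -> General -> Additional Include Directories
4. Project -> Properties -> C/C++ -> General -> SDL checks: No
5. Project -> Properties -> Linker -> Input -> Additional Dependencies: add the following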
E:\opencv\build\x64\vc15\lib\opencv_world450.lib
c10.lib
asmjit.lib
c10_cuda.lib
caffe2_detectron_ops_gpu.lib
caffe2_module_test_dynamic.lib
caffe2_nvrtc.lib
clog.lib
cpuinfo.lib
dnnl.lib
fbgemm.lib
libprotobuf.lib
libprotobuf-lite.lib
libprotoc.lib
mkldnn.lib
torch.lib
torch_cuda.lib
torch_cpu.lib
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib
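6. Project -> Properties -> Linker -> Command Line: add /INCLUDE:?warp_size@cuda@at@@YAHXZ
7. Project -> Properties -> C/C++ -> Language -> Conformance mode: No
With the environment configured as above, the packaged folder looks like the figure below:
Weights file: export the official weights as a fixed-size model, then simply run main.cpp. The images in the samples folder were used for testing.
main.cpp code (when using the GPU, note the pred = pred.to(at::kCPU); call in non_max_suppression, around line 16):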
#include <opencv2/opencv.hpp>
#include <torch/script.h>
#include <torch/torch.h>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <time.h>

std::vector<torch::Tensor> non_max_suppression(torch::Tensor preds, float score_thresh = 0.01, float iou_thresh = 0.35)
{
    std::vector<torch::Tensor> output;
    for (size_t i = 0; i < preds.sizes()[0]; ++i)
    {
        torch::Tensor pred = preds.select(0, i);
        // GPU inference returns CUDA tensors; they must be moved to the CPU before NMS, otherwise an error is raised.
        pred = pred.to(at::kCPU); // added to this function: mind the type of preds and convert to CPU for post-processing

        // Filter by scores
        torch::Tensor scores = pred.select(1, 4) * std::get<0>(torch::max(pred.slice(1, 5, pred.sizes()[1]), 1));
        pred = torch::index_select(pred, 0, torch::nonzero(scores > score_thresh).select(1, 0));
        if (pred.sizes()[0] == 0) continue;

        // (center_x, center_y, w, h) to (left, top, right, bottom)
        pred.select(1, 0) = pred.select(1, 0) - pred.select(1, 2) / 2;
        pred.select(1, 1) = pred.select(1, 1) - pred.select(1, 3) / 2;
        pred.select(1, 2) = pred.select(1, 0) + pred.select(1, 2);
        pred.select(1, 3) = pred.select(1, 1) + pred.select(1, 3);

        // Computing scores and classes
        std::tuple<torch::Tensor, torch::Tensor> max_tuple = torch::max(pred.slice(1, 5, pred.sizes()[1]), 1);
        pred.select(1, 4) = pred.select(1, 4) * std::get<0>(max_tuple);
        pred.select(1, 5) = std::get<1>(max_tuple);

        torch::Tensor dets = pred.slice(1, 0, 6);
        torch::Tensor keep = torch::empty({ dets.sizes()[0] });
        torch::Tensor areas = (dets.select(1, 3) - dets.select(1, 1)) * (dets.select(1, 2) - dets.select(1, 0));
        std::tuple<torch::Tensor, torch::Tensor> indexes_tuple = torch::sort(dets.select(1, 4), 0, 1);
        torch::Tensor v = std::get<0>(indexes_tuple);
        torch::Tensor indexes = std::get<1>(indexes_tuple);
        int count = 0;
        while (indexes.sizes()[0] > 0)
        {
            keep[count] = (indexes[0].item().toInt());
            count += 1;

            // Computing overlaps
            torch::Tensor lefts = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor tops = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor rights = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor bottoms = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor widths = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor heights = torch::empty(indexes.sizes()[0] - 1);
            for (size_t i = 0; i < indexes.sizes()[0] - 1; ++i)
            {
                lefts[i] = std::max(dets[indexes[0]][0].item().toFloat(), dets[indexes[i + 1]][0].item().toFloat());
                tops[i] = std::max(dets[indexes[0]][1].item().toFloat(), dets[indexes[i + 1]][1].item().toFloat());
                rights[i] = std::min(dets[indexes[0]][2].item().toFloat(), dets[indexes[i + 1]][2].item().toFloat());
                bottoms[i] = std::min(dets[indexes[0]][3].item().toFloat(), dets[indexes[i + 1]][3].item().toFloat());
                widths[i] = std::max(float(0), rights[i].item().toFloat() - lefts[i].item().toFloat());
                heights[i] = std::max(float(0), bottoms[i].item().toFloat() - tops[i].item().toFloat());
            }
            torch::Tensor overlaps = widths * heights;

            // Filter by IoUs
            torch::Tensor ious = overlaps / (areas.select(0, indexes[0].item().toInt()) + torch::index_select(areas, 0, indexes.slice(0, 1, indexes.sizes()[0])) - overlaps);
            indexes = torch::index_select(indexes, 0, torch::nonzero(ious <= iou_thresh).select(1, 0) + 1);
        }
        keep = keep.toType(torch::kInt64);
        output.push_back(torch::index_select(dets, 0, keep.slice(0, 0, count)));
    }
    return output;
}

//int main(int argc, const char* argv[]) {
//    std::cout << "cuda::is_available():" << torch::cuda::is_available() << std::endl;
//    torch::DeviceType device_type = at::kCPU; // device type
//    if (torch::cuda::is_available())
//        device_type = at::kCUDA;
//}

int main(int argc, char* argv[])
{
    std::cout << "cuda::is_available():" << torch::cuda::is_available() << std::endl;
    torch::DeviceType device_type = at::kCPU; // device type
    if (torch::cuda::is_available())
        device_type = at::kCUDA;

    // Loading Module
    torch::jit::script::Module module = torch::jit::load("yolov5x.torchscript.pt"); // best.torchscript3.pt // yolov5x.torchscript.pt
    module.to(device_type); // move the model to the GPU when one is available

    std::vector<std::string> classnames;
    std::ifstream f("coco.names");
    std::string name = "";
    while (std::getline(f, name))
    {
        classnames.push_back(name);
    }

    if (argc < 2)
    {
        std::cout << "Please run with test video." << std::endl;
        return -1;
    }
    std::string video = argv[1];
    cv::VideoCapture cap = cv::VideoCapture(video);
    // cap.set(cv::CAP_PROP_FRAME_WIDTH, 1920);
    // cap.set(cv::CAP_PROP_FRAME_HEIGHT, 1080);
    cv::Mat frame, img;
    cap.read(frame);
    int width = frame.size().width;
    int height = frame.size().height;
    int count = 0;
    while (cap.isOpened())
    {
        count++;
        clock_t start = clock();
        cap.read(frame);
        if (frame.empty())
        {
            std::cout << "Read frame failed!" << std::endl;
            break;
        }

        // Preparing input tensor
        cv::resize(frame, img, cv::Size(640, 640));
        // cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
        // torch::Tensor imgTensor = torch::from_blob(img.data, {img.rows, img.cols,3},torch::kByte);
        // imgTensor = imgTensor.permute({2,0,1});
        // imgTensor = imgTensor.toType(torch::kFloat);
        // imgTensor = imgTensor.div(255);
        // imgTensor = imgTensor.unsqueeze(0);
        // imgTensor = imgTensor.to(device_type);
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB); // BGR -> RGB
        img.convertTo(img, CV_32FC3, 1.0f / 255.0f); // normalization 1/255
        auto imgTensor = torch::from_blob(img.data, { 1, img.rows, img.cols, img.channels() }).to(device_type);
        imgTensor = imgTensor.permute({ 0, 3, 1, 2 }).contiguous(); // BHWC -> BCHW (Batch, Channel, Height, Width)
        std::vector<torch::jit::IValue> inputs;
        inputs.emplace_back(imgTensor);
        // preds: [?, 15120, 9]
        torch::jit::IValue output = module.forward(inputs);
        auto preds = output.toTuple()->elements()[0].toTensor();
        // torch::Tensor preds = module.forward({ imgTensor }).toTensor();
        std::vector<torch::Tensor> dets = non_max_suppression(preds, 0.35, 0.5);
        if (dets.size() > 0)
        {
            // Visualize result
            for (size_t i = 0; i < dets[0].sizes()[0]; ++i)
            {
                float left = dets[0][i][0].item().toFloat() * frame.cols / 640;
                float top = dets[0][i][1].item().toFloat() * frame.rows / 640;
                float right = dets[0][i][2].item().toFloat() * frame.cols / 640;
                float bottom = dets[0][i][3].item().toFloat() * frame.rows / 640;
                float score = dets[0][i][4].item().toFloat();
                int classID = dets[0][i][5].item().toInt();
                cv::rectangle(frame, cv::Rect(left, top, (right - left), (bottom - top)), cv::Scalar(0, 255, 0), 2);
                cv::putText(frame,
                    classnames[classID] + ": " + cv::format("%.2f", score),
                    cv::Point(left, top),
                    cv::FONT_HERSHEY_SIMPLEX, (right - left) / 200, cv::Scalar(0, 255, 0), 2);
            }
        }
        // std::cout << "-[INFO] Frame:" << std::to_string(count) << " FPS: " + std::to_string(float(1e7 / (clock() - start))) << std::endl;
        std::cout << "-[INFO] Frame:" << std::to_string(count) << std::endl;
        // cv::putText(frame, "FPS: " + std::to_string(int(1e7 / (clock() - start))),
        //     cv::Point(50, 50),
        //     cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(0, 255, 0), 2);
        cv::imshow("", frame);
        // cv::imwrite("../images/"+cv::format("%06d", count)+".jpg", frame);
        cv::resize(frame, frame, cv::Size(width, height));
        if (cv::waitKey(1) == 27) break;
    }
    cap.release();
    return 0;
}
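To make the post-processing in non_max_suppression easier to follow, here is a minimal pure-Python sketch of the same three steps: converting (center_x, center_y, w, h) to corner coordinates, computing IoU, and greedily keeping the highest-scoring box while dropping boxes whose IoU with it exceeds the threshold. The function names and the sample boxes are made up for illustration; like the C++ code above, the sketch treats all classes together.

```python
def xywh_to_xyxy(box):
    """(center_x, center_y, w, h) -> (left, top, right, bottom)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Return the indices of the boxes that survive greedy NMS."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # Keep only boxes that do not overlap the chosen box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


# Illustrative input: two heavily overlapping boxes and one far away.
raw = [(50, 50, 20, 20), (52, 52, 20, 20), (150, 150, 20, 20)]
scores = [0.9, 0.8, 0.7]
boxes = [xywh_to_xyxy(b) for b in raw]
print(nms(boxes, scores, iou_thresh=0.5))  # [0, 2]
```

The first two boxes have an IoU of 324 / 476 (about 0.68), so the lower-scoring one is suppressed at a threshold of 0.5.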
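Exporting the GPU model (export code)
Remember to adjust the export image size: the --img-size default in the script below is [352, 640], while main.cpp above resizes every frame to 640×640.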
"""Exports a YOLOv5 *.pt model to ONNX and TorchScript formats

Usage:
    $ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
"""

import argparse
import sys
import time

sys.path.append('./')  # to run '$ python *.py' files in subdirectories

import torch
import torch.nn as nn

import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_size

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='E:\\yolov5-master\\runs\\train\\exp\\weights\\best.pt', help='weights path')  # from yolov5/models/
    parser.add_argument('--img-size', nargs='+', type=int, default=[352, 640], help='image size')  # height, width
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand
    print(opt)
    set_logging()
    t = time.time()

    # Load PyTorch model
    model = attempt_load(opt.weights, map_location=torch.device('cuda'))  # load FP32 model
    labels = model.names

    # Checks
    gs = int(max(model.stride))  # grid size (max stride)
    opt.img_size = [check_img_size(x, gs) for x in opt.img_size]  # verify img_size are gs-multiples

    # Input
    img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device='cuda')  # image size(1,3,320,192) iDetection

    # Update model
    for k, m in model.named_modules():
        m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
        if isinstance(m, models.common.Conv):  # assign export-friendly activations
            if isinstance(m.act, nn.Hardswish):
                m.act = Hardswish()
            elif isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        # elif isinstance(m, models.yolo.Detect):
        #     m.forward = m.forward_export  # assign forward (optional)
    # model.model[-1].export = True  # set Detect() layer export=True
    model.model[-1].export = False
    y = model(img)  # dry run

    # TorchScript export
    try:
        print('\nStarting TorchScript export with torch %s...' % torch.__version__)
        f = opt.weights.replace('.pt', '.torchscript.pt')  # filename
        ts = torch.jit.trace(model, img)
        ts.save(f)
        print('TorchScript export success, saved as %s' % f)
    except Exception as e:
        print('TorchScript export failure: %s' % e)

    # ONNX export
    try:
        import onnx

        print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
        f = opt.weights.replace('.pt', '.onnx')  # filename
        torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
                          output_names=['classes', 'boxes'] if y is None else ['output'])

        # Checks
        onnx_model = onnx.load(f)  # load onnx model
        onnx.checker.check_model(onnx_model)  # check onnx model
        # print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable model
        print('ONNX export success, saved as %s' % f)
    except Exception as e:
        print('ONNX export failure: %s' % e)

    # CoreML export
    try:
        import coremltools as ct

        print('\nStarting CoreML export with coremltools %s...' % ct.__version__)
        # convert model from torchscript and apply pixel scaling as per detect.py
        model = ct.convert(ts, inputs=[ct.ImageType(name='image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
        f = opt.weights.replace('.pt', '.mlmodel')  # filename
        model.save(f)
        print('CoreML export success, saved as %s' % f)
    except Exception as e:
        print('CoreML export failure: %s' % e)

    # Finish
    print('\nExport complete (%.2fs). Visualize with https://github.com/lutzroeder/netron.' % (time.time() - t))
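Because the traced model has a fixed input size, it helps to work out beforehand what size the export will actually use and how many prediction rows to expect on the C++ side. The sketch below rests on two assumptions that are not stated in the script itself: that the size check rounds each dimension up to the next multiple of the largest stride, and that the model uses the common YOLOv5 layout of three detection scales (strides 8, 16, 32) with three anchors per grid cell. Both helper names are hypothetical.

```python
import math


def round_up_to_stride(size, stride=32):
    """Round an image dimension up to the next multiple of the stride (assumed behaviour)."""
    return math.ceil(size / stride) * stride


def num_predictions(height, width, strides=(8, 16, 32), anchors_per_cell=3):
    """Rows in the prediction tensor for one image, under the assumed layout."""
    return sum((height // s) * (width // s) for s in strides) * anchors_per_cell


print(round_up_to_stride(350))    # 352
print(num_predictions(384, 640))  # 15120
print(num_predictions(640, 640))  # 25200
```

Under these assumptions, the `preds: [?, 15120, 9]` comment in main.cpp would correspond to a model exported at 384×640, and a 640×640 export would produce 25200 rows instead.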
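Results
Test results:
Video test:
[Video: 2021-01-20 14-31-49, libtorch+yolov5]
Link to another video: link
Closing
I will not show the results of my own trained model here. Thanks for reading; please follow along so we can learn and improve together!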
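Problem: forward takes a long time
I actually noticed this early on, when batch-processing multiple images, as shown in the figure below: the first two images are slow and the rest run at normal speed. I looked into it and also asked on GitHub; the cause may be a warm-up issue in libtorch 1.7.1.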
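One common way to keep slow first calls out of the measurements is to run a few untimed warm-up passes before timing anything. The sketch below shows the pattern in Python with a stand-in callable; `FakeModel` is a made-up placeholder for `module.forward`, and the choice of two warm-up passes simply mirrors the "first two images are slow" observation above.

```python
import time


def time_after_warmup(run_once, warmup=2, repeats=5):
    """Call run_once() `warmup` times untimed, then return `repeats` timings in seconds."""
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return timings


class FakeModel:
    """Stand-in for a forward pass: the first two calls are artificially slow."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= 2:
            time.sleep(0.05)


model = FakeModel()
timings = time_after_warmup(model, warmup=2, repeats=5)
print(["%.6f" % t for t in timings])
```

In main.cpp the same idea would mean calling forward on a dummy input a couple of times before entering the frame loop. On the GPU, kernel launches are asynchronous, so timings taken without synchronizing first can also be misleading.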
GitHub:
Gitee: