Task: study the monocular depth estimation model MonoDepthv2 and, based on its Python source code, integrate it into the existing ONNX model series.
MonoDepthv2 paper: Digging Into Self-Supervised Monocular Depth Estimation
MonoDepthv2 source code: Monodepth2 GitHub

Analysis:
1) Understand the basic principles of MonoDepthv2 and read through its code
2) Convert the model into the more convenient and efficient ONNX format and run inference in OpenCV (with verification; a minimal verification sketch follows this list)
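For the verification in step 2, the simplest check is to push the same input through both the PyTorch model and the exported ONNX file and compare the outputs numerically. Below is a minimal sketch, assuming the merged model from the next section and an export named mono.onnx; check_export is a hypothetical helper, not part of the monodepth2 repo:

import numpy as np
import onnxruntime as ort
import torch

def check_export(model, onnx_path, h=192, w=640, tol=1e-4):
    # run the same random input through PyTorch and ONNX Runtime
    x = torch.rand(1, 3, h, w)
    with torch.no_grad():
        ref = model(x).cpu().numpy()
    sess = ort.InferenceSession(onnx_path)
    out = sess.run(None, {'input': x.numpy()})[0]
    # max absolute deviation between the two backends
    print('max abs diff:', np.abs(ref - out).max())
    return np.allclose(ref, out, atol=tol)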

  • Results:
  • Converting the PyTorch model to ONNX
  1. Merge the Encoder and Decoder into a single model
import torch
import torch.nn as nn


class Encoder_Decoder(nn.Module):
    """Wrap the ResNet encoder and the depth decoder as one module so the
    whole network can be traced and exported as a single ONNX graph."""

    def __init__(self, encoder, decoder):
        super(Encoder_Decoder, self).__init__()
        self.encoder = encoder
        self.depth_decoder = decoder

    def forward(self, x):
        features = self.encoder(x)
        outputs = self.depth_decoder(features)
        return outputs
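Note that the stock monodepth2 DepthDecoder returns a dict keyed by ("disp", scale), which does not export cleanly as a single tensor; the wrapper above assumes the decoder has been modified to return one tensor. An alternative sketch that leaves the decoder untouched and selects the scale-0 disparity inside forward (the class name EncoderDecoderDisp0 is mine; the ("disp", 0) key comes from the original repo):

class EncoderDecoderDisp0(nn.Module):
    """Variant wrapper that picks the full-resolution disparity out of the
    decoder's output dict, so the stock DepthDecoder needs no changes."""

    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.depth_decoder = decoder

    def forward(self, x):
        features = self.encoder(x)
        outputs = self.depth_decoder(features)
        # keep only the scale-0 disparity so the exported graph has one output
        return outputs[("disp", 0)]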
  2. Convert the PyTorch weights to ONNX weights
from __future__ import absolute_import, division, print_function

import os
import glob
import argparse

import numpy as np
import PIL.Image as pil
import matplotlib as mpl
import matplotlib.cm as cm
import torch
from torchvision import transforms
import onnx
import onnxruntime as ort
import cv2

import networks
from utils import download_model_if_doesnt_exist
from combine_model import Encoder_Decoder


def parse_args():
    parser = argparse.ArgumentParser(
        description='Simple testing function for Monodepthv2 models.')
    parser.add_argument('--image_path', type=str, default='assets/test_image.jpg',
                        help='path to a test image or folder of images')
    parser.add_argument('--model_name', type=str, default='mono_640x192',
                        help='name of a pretrained model to use',
                        choices=[
                            "mono_640x192", "stereo_640x192", "mono+stereo_640x192",
                            "mono_no_pt_640x192", "stereo_no_pt_640x192",
                            "mono+stereo_no_pt_640x192", "mono_1024x320",
                            "stereo_1024x320", "mono+stereo_1024x320"])
    parser.add_argument('--ext', type=str, default="jpg",
                        help='image extension to search for in folder')
    parser.add_argument("--no_cuda", action='store_true',
                        help='if set, disables CUDA')
    parser.add_argument("--pred_metric_depth", action='store_true',
                        help='if set, predicts metric depth instead of disparity '
                             '(this only makes sense for stereo-trained KITTI models)')
    return parser.parse_args()


def test_simple(args):
    """Predict for a single image or a folder of images, then export the model to ONNX."""
    assert args.model_name is not None, \
        "You must specify the --model_name parameter; see README.md for an example"

    # export on CPU so the traced graph stays device independent
    device = torch.device("cpu")

    if args.pred_metric_depth and "stereo" not in args.model_name:
        print("Warning: The --pred_metric_depth flag only makes sense for "
              "stereo-trained KITTI models. For mono-trained models, output "
              "depths will not be in metric space.")

    download_model_if_doesnt_exist(args.model_name)
    model_path = os.path.join("models", args.model_name)
    print("-> Loading model from ", model_path)
    encoder_path = os.path.join(model_path, "encoder.pth")
    depth_decoder_path = os.path.join(model_path, "depth.pth")

    # LOADING PRETRAINED MODEL
    print("   Loading pretrained encoder")
    encoder = networks.ResnetEncoder(18, False)
    loaded_dict_enc = torch.load(encoder_path, map_location=device)

    # extract the height and width of image that this model was trained with
    feed_height = loaded_dict_enc['height']
    feed_width = loaded_dict_enc['width']
    filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items()
                         if k in encoder.state_dict()}
    encoder.load_state_dict(filtered_dict_enc)
    encoder.to(device)
    encoder.eval()

    print("   Loading pretrained decoder")
    depth_decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc, scales=range(4))
    loaded_dict = torch.load(depth_decoder_path, map_location=device)
    depth_decoder.load_state_dict(loaded_dict)
    depth_decoder.to(device)
    depth_decoder.eval()

    # merge encoder and decoder so a single graph can be traced and exported
    model = Encoder_Decoder(encoder=encoder, decoder=depth_decoder)
    model.eval()

    # FINDING INPUT IMAGES
    if os.path.isfile(args.image_path):
        # only testing on a single image
        paths = [args.image_path]
        output_directory = os.path.dirname(args.image_path)
    elif os.path.isdir(args.image_path):
        # searching folder for images
        paths = glob.glob(os.path.join(args.image_path, '*.{}'.format(args.ext)))
        output_directory = args.image_path
    else:
        raise Exception("Can not find args.image_path: {}".format(args.image_path))

    print("-> Predicting on {:d} test images".format(len(paths)))

    # PREDICTING ON EACH IMAGE IN TURN
    with torch.no_grad():
        for idx, image_path in enumerate(paths):
            if image_path.endswith("_disp.jpg"):
                # don't try to predict disparity for a disparity image!
                continue

            # load image and preprocess
            input_image = pil.open(image_path).convert('RGB')
            original_width, original_height = input_image.size
            input_image = input_image.resize((feed_width, feed_height), pil.LANCZOS)
            input_image = transforms.ToTensor()(input_image).unsqueeze(0)

            # PREDICTION: the merged model returns a single disparity tensor
            input_image = input_image.to(device)
            disp = model(input_image)
            print('disp: ', disp.shape)
            disp_ = disp.squeeze().cpu().numpy()
            cv2.imwrite('disp_ori.png', disp_ * 255)

            disp_resized = torch.nn.functional.interpolate(
                disp, (original_height, original_width),
                mode="bilinear", align_corners=False)

            # saving colormapped depth image
            disp_resized_np = disp_resized.squeeze().cpu().numpy()
            vmax = np.percentile(disp_resized_np, 95)
            normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
            mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
            colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
            im = pil.fromarray(colormapped_im)

            # derive the output name from the input file name
            output_name = os.path.splitext(os.path.basename(image_path))[0]
            name_dest_im = os.path.join(output_directory, "{}_disp.jpeg".format(output_name))
            im.save(name_dest_im)

            print("   Processed {:d} of {:d} images - saved prediction to:".format(
                idx + 1, len(paths)))
            print("   - {}".format(name_dest_im))

    print('-> Done!')

    # EXPORT: trace the merged model with a dummy input at the training resolution
    x = torch.rand(1, 3, 192, 640)
    torch.onnx.export(model, x, 'mono.onnx',
                      input_names=['input'], output_names=['output'],
                      opset_version=11, verbose=True)


def onnx_inference():
    img = cv2.imread("assets/test_image.jpg")
    print(img.shape)
    h, w, _ = img.shape

    # OpenCV DNN test
    blobImage = cv2.dnn.blobFromImage(img, 1.0 / 255.0, (640, 192), None, True, False)
    net = cv2.dnn.readNet('mono.onnx')
    outNames = net.getUnconnectedOutLayersNames()
    net.setInput(blobImage)
    outs = net.forward(outNames)
    print('cv outs: ', outs[0].shape)
    outs = np.squeeze(outs, axis=(0, 1))
    outs = outs * 255.0
    outs = outs.transpose((1, 2, 0)).astype(np.uint8)
    disp_resized_np = cv2.resize(outs, (640, 192))
    cv2.imwrite('disp_cv.png', disp_resized_np)

    # onnxruntime test
    model = onnx.load('mono.onnx')
    onnx.checker.check_model(model)
    session = ort.InferenceSession('mono.onnx')
    img = cv2.resize(img, (640, 192))
    img = np.asarray(img) / 255.0
    img = img[np.newaxis, :].astype(np.float32)
    input_image = img.transpose((0, 3, 1, 2))
    outs = session.run(None, input_feed={'input': input_image})
    outs = np.squeeze(outs, axis=(0, 1))
    outs = outs * 255.0
    outs = outs.transpose((1, 2, 0)).astype(np.uint8)
    disp_resized_np = cv2.resize(outs, (640, 192))
    cv2.imwrite('disp.png', disp_resized_np)
    outs = cv2.applyColorMap(outs, colormap=cv2.COLORMAP_SUMMER)
    cv2.imwrite('disp_color.png', outs)


if __name__ == '__main__':
    args = parse_args()
    test_simple(args)
    onnx_inference()
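The export above bakes the 640x192 training resolution into the graph. If other input sizes should ever be fed to the ONNX model, torch.onnx.export accepts a dynamic_axes mapping; here is a sketch under the assumption that the traced decoder actually tolerates other resolutions (this is not guaranteed and must be verified):

x = torch.rand(1, 3, 192, 640)
torch.onnx.export(model, x, 'mono_dynamic.onnx',
                  input_names=['input'], output_names=['output'],
                  opset_version=11,
                  # mark batch, height and width as dynamic; whether the traced
                  # decoder really accepts other sizes has to be checked
                  dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'},
                                'output': {0: 'batch', 2: 'height', 3: 'width'}})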
  • OpenCV C++ model
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <fstream>

using namespace cv;
using namespace dnn;
using namespace std;

class baseDepth
{
public:
    baseDepth(int h, int w, const string& model_path = "model/mono.onnx") {
        this->inHeight = h;
        this->inWidth = w;
        cout << "start" << endl;
        this->net = readNetFromONNX(model_path);
        cout << "end" << endl;
    }
    Mat depth(Mat& frame);
    Mat viewer(vector<Mat> imgs, double alpha = 0.80);

private:
    Net net;
    int inWidth;
    int inHeight;
};

Mat baseDepth::depth(Mat& frame) {
    int ori_h = frame.size[0];
    int ori_w = frame.size[1];
    cout << "ori: " << ori_h << " , " << ori_w << endl;

    // same preprocessing as the Python side: scale to [0,1], resize, swap R/B
    Mat blobImage = blobFromImage(frame, 1.0 / 255.0, Size(this->inWidth, this->inHeight),
                                  Scalar(0, 0, 0), true, false);
    this->net.setInput(blobImage);
    cout << "read model" << endl;

    vector<Mat> scores;
    this->net.forward(scores, this->net.getUnconnectedOutLayersNames());
    int channel = scores[0].size[1];
    int h = scores[0].size[2];
    int w = scores[0].size[3];
    cout << "c: " << channel << " , h: " << h << " , w: " << w << endl;

    // wrap the raw 1x1xHxW output buffer as a single-channel float map
    Mat depthMap(scores[0].size[2], scores[0].size[3], CV_32F, scores[0].ptr<float>(0, 0));
    cout << depthMap.size() << endl;
    depthMap *= 255.0;
    depthMap.convertTo(depthMap, CV_8UC1);
    resize(depthMap, depthMap, Size(ori_w, ori_h));
    applyColorMap(depthMap, depthMap, COLORMAP_MAGMA);
    imwrite("inference/depth_color.png", depthMap);
    return depthMap;
}

Mat baseDepth::viewer(vector<Mat> imgs, double alpha) {
    // show the source image and the depth map side by side with a small border
    Size imgOriSize = imgs[0].size();
    Size imgStdSize(imgOriSize.width * alpha, imgOriSize.height * alpha);
    Mat imgStd;
    int delta_h = 2, delta_w = 2;
    Mat imgWindow(imgStdSize.height + 2 * delta_h, imgStdSize.width * 2 + 3 * delta_w, imgs[0].type());
    resize(imgs[0], imgStd, imgStdSize, alpha, alpha, INTER_LINEAR);
    imgStd.copyTo(imgWindow(Rect(Point2i(delta_w, delta_h), imgStdSize)));
    resize(imgs[1], imgStd, imgStdSize, alpha, alpha, INTER_LINEAR);
    imgStd.copyTo(imgWindow(Rect(Point2i(imgStdSize.width + 2 * delta_w, delta_h), imgStdSize)));
    return imgWindow;
}

// model test
// int main(int argc, char** argv) {
//     Mat frame = imread("inference/car.jpg", 1);
//     if (frame.empty()) {
//         printf("could not load image...\n");
//         return -1;
//     }
//     int h = 192, w = 640;
//     baseDepth net(h, w);
//     net.depth(frame);
//     return 0;
// }
  • Test code (the overall framework code is covered in the earlier post "OpenCV detection/segmentation compatible framework")
...
if (config.model_name == "monodepth") {
    int h = 192, w = 640;
    baseDepth model(h, w);
    Mat depthMap = model.depth(srcimg);
    static const string kWinName = "Deep learning Mono depth estimation in OpenCV";
    namedWindow(kWinName, WINDOW_NORMAL);
    Mat res = model.viewer({srcimg, depthMap}, 0.90);
    imshow(kWinName, res);
    waitKey(0);
    destroyAllWindows();
}
  • Summary
    Main issues encountered during the conversion:
    1) The MonoDepth family is fairly rich, covering both monocular and stereo estimation, and each model is split into an Encoder and a Decoder; for the ONNX export the two parts have to be merged into a single model and tested as one;
    2) The MonoDepth Decoder takes multiple feature maps as input, while the exported ONNX forward is easiest to handle with a single tensor; the merged model therefore forwards only the first feature output, which is in fact the only one used at inference time;
    3) PIL, matplotlib, and cv2 arrange image data differently (RGB vs. BGR channel order, HWC vs. CHW layout), so the ONNX conversion can succeed while the results look strange; in that case, track down how each library reads and stores images (see the sketch after this list);
    4) A depth map on its own is hard to read in detail; it only becomes clear next to the original image, so keep the two displayed together and add color rendering to improve legibility.
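One way to pin down the layout issue from point 3 is to compare cv2.dnn.blobFromImage against the same preprocessing written out by hand; a minimal sketch using the test image from above:

import cv2
import numpy as np

img = cv2.imread('assets/test_image.jpg')  # OpenCV loads BGR, HWC

# blobFromImage: resize to 640x192, scale to [0,1], BGR->RGB (swapRB=True), HWC->NCHW
blob = cv2.dnn.blobFromImage(img, 1.0 / 255.0, (640, 192), None, True, False)

# the same steps done manually
manual = cv2.resize(img, (640, 192))[:, :, ::-1]    # BGR -> RGB
manual = manual.astype(np.float32) / 255.0          # [0, 255] -> [0, 1]
manual = manual.transpose(2, 0, 1)[np.newaxis]      # HWC -> NCHW

print('max abs diff:', np.abs(blob - manual).max())  # expected to be ~0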
