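Basic idea: use the Jetson Xavier NX board for face recognition and verification (face comparison). First set up the Jetson Xavier NX environment, then choose a face detection model, and finally port everything to the Jetson Xavier NX board.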

1. Configure the Jetson Xavier NX board environment. The system image is already installed, so that step is not described in detail here.

GitHub - linghu8812/tensorrt_inference; the version used is https://github.com/linghu8812/tensorrt_inference/tree/demo/0.1.0

Install mxnet==1.5.0 and onnx==1.5.0 with Python 3.6; an Anaconda environment is recommended for model conversion.

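On top of the system image, install the necessary software. The two .whl files are also provided on Baidu Netdisk.

(1) Install PyTorch 1.10.0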



# install the dependencies (if not already onboard)
nvidia@nvidia-desktop:~$ sudo apt-get install python3-pip libjpeg-dev libopenblas-dev libopenmpi-dev libomp-dev
nvidia@nvidia-desktop:~$ sudo -H pip3 install future
nvidia@nvidia-desktop:~$ sudo pip3 install -U --user wheel mock pillow
nvidia@nvidia-desktop:~$ sudo -H pip3 install testresources
# upgrade setuptools 47.1.1 -> 58.3.0
nvidia@nvidia-desktop:~$ sudo -H pip3 install --upgrade setuptools
nvidia@nvidia-desktop:~$ sudo -H pip3 install Cython
# install gdown to download from Google drive
nvidia@nvidia-desktop:~$ sudo -H pip3 install gdown
# download the wheel
nvidia@nvidia-desktop:~$ gdown https://drive.google.com/uc?id=1TqC6_2cwqiYacjoLhLgrZoap6-sVL2sd
# install PyTorch 1.10.0
nvidia@nvidia-desktop:~$ sudo -H pip3 install torch-1.10.0a0+git36449ea-cp36-cp36m-linux_aarch64.whl
# clean up
nvidia@nvidia-desktop:~$ rm torch-1.10.0a0+git36449ea-cp36-cp36m-linux_aarch64.whl

(2) Install Torchvision 0.11.0

Used with PyTorch 1.10.0
# the dependencies
nvidia@nvidia-desktop:~$ sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev
nvidia@nvidia-desktop:~$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
nvidia@nvidia-desktop:~$ sudo pip3 install -U pillow
# install gdown to download from Google drive, if not done yet
nvidia@nvidia-desktop:~$ sudo -H pip3 install gdown
# download TorchVision 0.11.0
nvidia@nvidia-desktop:~$ gdown https://drive.google.com/uc?id=1C7y6VSIBkmL2RQnVy8xF9cAnrrpJiJ-K
# install TorchVision 0.11.0
nvidia@nvidia-desktop:~$ sudo -H pip3 install torchvision-0.11.0a0+fa347eb-cp36-cp36m-linux_aarch64.whl
nvidia@nvidia-desktop:~$ pip3 install pycuda
# clean up
nvidia@nvidia-desktop:~$ rm torchvision-0.11.0a0+fa347eb-cp36-cp36m-linux_aarch64.whl


nvidia@nvidia-desktop:~$ sudo apt-get install git cmake
nvidia@nvidia-desktop:~$ sudo apt-get install python3-dev
nvidia@nvidia-desktop:~$ sudo apt-get install libhdf5-serial-dev hdf5-tools
nvidia@nvidia-desktop:~$ sudo apt-get install libatlas-base-dev gfortran
nvidia@nvidia-desktop:~$ sudo -H pip3 install -U jetson-stats
nvidia@nvidia-desktop:~$ sudo jtop

jtop shows the utilization of the Jetson Xavier NX board.


nvidia@nvidia-desktop:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
>>> import torchvision
>>> torchvision.__version__
>>> import tensorrt
>>> tensorrt.__version__
>>> import pycuda
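The interactive session above only prints version strings. As a minimal sketch (not part of the original setup), the same check can be scripted. The only compatibility rule encoded below is the one stated in this guide (torchvision 0.11 with PyTorch 1.10); unknown pairings simply return False, and the helper names are hypothetical.

```python
import importlib

# Pairing stated in this guide: torchvision 0.11.x is used with PyTorch 1.10.x.
COMPATIBLE = {"1.10": "0.11"}


def major_minor(version):
    """Return 'X.Y' from strings such as '1.10.0a0+git36449ea'."""
    parts = version.split("+")[0].split(".")
    return parts[0] + "." + parts[1]


def is_compatible(torch_version, torchvision_version):
    """True only for pairings listed in COMPATIBLE."""
    expected = COMPATIBLE.get(major_minor(torch_version))
    return expected is not None and major_minor(torchvision_version) == expected


def installed_versions(modules=("torch", "torchvision", "tensorrt", "pycuda")):
    """Import each module if possible; map its name to __version__ ('unknown' if absent, None if not installed)."""
    found = {}
    for name in modules:
        try:
            found[name] = getattr(importlib.import_module(name), "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found
```

On the board, `installed_versions()` returns a dictionary of versions, and `is_compatible(v["torch"], v["torchvision"])` confirms the wheel pairing.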


(3) Configure VNC access to the Jetson Xavier NX board

sudo vim /usr/share/glib-2.0/schemas/org.gnome.Vino.gschema.xml

Add the following key (make sure it does not overlap with any other key):



<key name='enabled' type='b'>
  <summary>Enable remote access to the desktop</summary>
  <description>If true, allows remote access to the desktop via the RFB protocol. Users on remote machines may then connect to the desktop using a VNC viewer.</description>
  <default>true</default>
</key>


nvidia@nvidia-desktop:~$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
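Because glib-compile-schemas rejects a malformed schema file, and the new key must not duplicate an existing one, it can help to sanity-check the edited file first. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper names are hypothetical, and it only checks well-formedness, that the key appears exactly once, and its default value.

```python
import xml.etree.ElementTree as ET


def find_key(schema_xml, key_name):
    """Return the single <key> element called key_name in a GSettings schema document.

    schema_xml may be str or bytes; ET.ParseError is raised if the XML is malformed.
    """
    root = ET.fromstring(schema_xml)
    matches = [k for k in root.iter("key") if k.get("name") == key_name]
    if len(matches) != 1:
        raise ValueError("expected exactly one key %r, found %d" % (key_name, len(matches)))
    return matches[0]


def key_default(schema_xml, key_name):
    """Text of the key's <default> child, e.g. 'true'."""
    default = find_key(schema_xml, key_name).find("default")
    if default is None or default.text is None:
        raise ValueError("key %r has no <default>" % key_name)
    return default.text.strip()
```

For example, `key_default(open("/usr/share/glib-2.0/schemas/org.gnome.Vino.gschema.xml", "rb").read(), "enabled")` should return "true" once the key has been added; reading the file as bytes lets the parser honour the XML encoding declaration.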



After opening it, click ADD and create a new entry: set Name to Vino, Command to /usr/lib/vino/vino-server, and Comment to VNC Server, then save.

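Run the following commands in a terminal to disable encryption (and the confirmation prompt) for VNC connections (when in use, set all the settings to true):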

gsettings set org.gnome.Vino require-encryption false
gsettings set org.gnome.Vino prompt-enabled false

Test result: connecting to the Jetson Xavier NX board with VNC Viewer.



2. First test the required face detection algorithms on a PC laptop, then port them to the Jetson Xavier NX.


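(1) Download the source code. Start by testing MTCNN, which I have used before (see "15. Object tracking and face-matching tracking with JetBot", sxj731533730, CSDN blog).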

git clone https://github.com/jkjung-avt/tensorrt_demos.git











ubuntu@ubuntu:~/tensorrt_demos/mtcnn$ make
../common/NvInfer.h:3250:22: note: declared here
 3250 | class TRT_DEPRECATED IRNNv2Layer : public ILayer
      |                      ^~~~~~~~~~~
../common/NvInfer.h:5662:85: warning: ‘IPluginLayer’ is deprecated [-Wdeprecated-declarations]
 5662 |  inputs, int32_t nbInputs, IPluginExt& plugin) TRTNOEXCEPT = 0;
      |                                                              ^
../common/NvInfer.h:3454:22: note: declared here
 3454 | class TRT_DEPRECATED IPluginLayer : public ILayer
      |                      ^~~~~~~~~~~~
In file included from create_engines.cpp:30:
../common/NvCaffeParser.h:108:62: warning: ‘DimsNCHW’ is deprecated [-Wdeprecated-declarations]
  108 |     virtual nvinfer1::DimsNCHW getDimensions() TRTNOEXCEPT = 0;
      |                                                              ^
In file included from create_engines.cpp:29:
../common/NvInfer.h:346:22: note: declared here
  346 | class TRT_DEPRECATED DimsNCHW : public Dims4
      |                      ^~~~~~~~
Linking: create_engines
ubuntu@ubuntu:~/tensorrt_demos/mtcnn$ ./create_engines
Building det1.engine (PNet), maxBatchSize = 1
Building TensorRT engine in FP32 mode...
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Building det2.engine (RNet), maxBatchSize = 256
Building TensorRT engine in FP32 mode...
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Building det3.engine (ONet), maxBatchSize = 64
Building TensorRT engine in FP32 mode...
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Verifying engines...
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Bindings for det1 after deserializing:
Input  0: data, 3x710x384
Output 1: conv4-2, 4x350x187
Output 2: prob1, 2x350x187
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Bindings for det2 after deserializing:
Input  0: data, 3x24x24
Output 1: conv5-2, 4x1x1
Output 2: prob1, 2x1x1
WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Bindings for det3 after deserializing:
Input  0: data, 3x48x48
Output 1: conv6-2, 4x1x1
Output 2: conv6-3, 10x1x1
Output 3: prob1, 2x1x1
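The det1 (PNet) bindings follow from the network's layer arithmetic. The small sketch below assumes the standard MTCNN PNet layout (3x3 valid convolution, 2x2 stride-2 max pooling with Caffe's round-up, then two more 3x3 convolutions). It also assumes the box-mapping convention used by common Python ports of MTCNN (stride 2, 12-pixel cell). With those assumptions it reproduces 710 → 350 and 384 → 187 from the log; the function names are hypothetical.

```python
import math


def pnet_output_size(n):
    """Spatial size of PNet's output map for an input side of n pixels."""
    n = n - 2                # conv1: 3x3, no padding
    n = math.ceil(n / 2)     # pool1: 2x2, stride 2, Caffe rounds up
    n = n - 2                # conv2: 3x3
    n = n - 2                # conv3: 3x3
    return n


def cell_to_box(row, col, scale, stride=2, cell=12):
    """Map a PNet output cell back to an (x1, y1, x2, y2) box in the original image.

    scale is the image-pyramid scale at which the cell was produced.
    """
    x1 = (stride * col + 1) / scale
    y1 = (stride * row + 1) / scale
    x2 = (stride * col + cell) / scale
    y2 = (stride * row + cell) / scale
    return x1, y1, x2, y2
```

A 12x12 input produces exactly one output cell, which is why 12 pixels is PNet's minimum detectable face size at scale 1.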



6. Modify /home/ubuntu/tensorrt_demos/setup.py

'/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-',  # for my x86_64 PC
'/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-',  # for my x86_64 PC


ubuntu@ubuntu:~/tensorrt_demos$ make
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/pytrt.o -L/usr/local/cuda/lib64 -L/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT- -L/usr/local/lib -lnvinfer -lcudnn -lcublas -lcudart_static -lnvToolsExt -lcudart -lrt -o /home/ubuntu/tensorrt_demos/pytrt.cpython-38-x86_64-linux-gnu.so
rm -rf build

Test MTCNN [the detection size needs to be changed to 480x640 for comparison with CenterFace]

ubuntu@ubuntu:~/tensorrt_demos$ python3 trt_mtcnn.py --image ../CenterFace/prj-python/000388.jpg

Measured speed: 20 fps.


git clone https://github.com/Star-Clouds/CenterFace
pip3 install pycuda

Then switch to a root shell for the CenterFace TensorRT demo:

sudo su
export PATH=/usr/local/cuda-11.1/bin:/usr/local/cuda/bin:$PATH   
pip install pycuda

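Problem 1: the model cannot be loaded, most likely because the engine was serialized with a TensorRT version different from the one installed on my system.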

root@ubuntu:/home/ubuntu/CenterFace/prj-tensorrt# python3 demo.py
[TensorRT] ERROR: coreReadArchive.cpp (41) - Serialization Error in verifyHeader: 0 (Version tag does not match. Note: Current Version: 96, Serialized Engine Version: 87)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "demo.py", line 30, in <module>
    test_image_tensorrt()
  File "demo.py", line 13, in test_image_tensorrt
    dets, lms = centerface(frame, h, w, threshold=0.35)
  File "/home/ubuntu/CenterFace/prj-tensorrt/centerface.py", line 21, in __call__
    return self.inference_tensorrt(img, threshold)
  File "/home/ubuntu/CenterFace/prj-tensorrt/centerface.py", line 73, in inference_tensorrt
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
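A serialized TensorRT engine is tied to the TensorRT version that built it, which is what "Version tag does not match" reports. Runtime.deserialize_cuda_engine then returns None, hence the NoneType error. One way to handle this is to rebuild the engine from the ONNX file whenever loading fails. The sketch below assumes the TensorRT 7.x-era Python API (build_engine and max_workspace_size, deprecated in later releases); load_or_rebuild and the other helper names are hypothetical, and the demo itself does not do this.

```python
import os


def load_or_rebuild(engine_path, onnx_path, deserialize, build):
    """Return an engine, rebuilding it from ONNX when the serialized file cannot be loaded.

    deserialize(path) must return None on failure (as Runtime.deserialize_cuda_engine does);
    build(onnx_path, engine_path) must build, save and return a new engine.
    """
    engine = deserialize(engine_path) if os.path.exists(engine_path) else None
    if engine is None:
        print("cannot load %s, rebuilding from %s" % (engine_path, onnx_path))
        engine = build(onnx_path, engine_path)
    return engine


def trt_deserialize(engine_path):
    import tensorrt as trt  # imported lazily so load_or_rebuild stays testable without TensorRT
    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def trt_build(onnx_path, engine_path, workspace=1 << 28):
    import tensorrt as trt  # TensorRT 7.x / early 8.x API assumed
    logger = trt.Logger(trt.Logger.WARNING)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(logger) as builder, builder.create_network(flag) as network, \
            trt.OnnxParser(network, logger) as parser:
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
                raise RuntimeError("ONNX parse failed: " + "; ".join(errors))
        config = builder.create_builder_config()
        config.max_workspace_size = workspace
        engine = builder.build_engine(network, config)
        if engine is None:
            raise RuntimeError("engine build failed")
        with open(engine_path, "wb") as f:
            f.write(engine.serialize())  # cache the engine for the next run
        return engine
```

Usage: `engine = load_or_rebuild("centerface.trt", "centerface.onnx", trt_deserialize, trt_build)`. Passing the two TensorRT functions in as arguments keeps the fallback logic independent of the installed TensorRT version.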



Use CLion to generate the engine (centerface.onnx comes from the official code).

(1) Use onnx to export a model with a fixed 640x480 (width x height) input:

import onnx
import math
import argparse

parser = argparse.ArgumentParser(description='Export CenterFace ONNX')
parser.add_argument('--pretrained', help='pretrained centerface model', default='./centerface.onnx', type=str)
parser.add_argument('--input_shape', nargs='+', default=[1, 3, 480, 640], type=int, help='input shape.')
parser.add_argument('--onnx', help='onnx model', default='./centerface_480_640.onnx', type=str)
args = parser.parse_args()

model = onnx.load_model(args.pretrained)
input_shape = args.input_shape
# Rewrite the graph's input dims (N, C, H, W) to the requested shape
d = model.graph.input[0].type.tensor_type.shape.dim
rate = (input_shape[2] / d[2].dim_value, input_shape[3] / d[3].dim_value)
print("rate: ", rate)
d[0].dim_value = input_shape[0]
d[2].dim_value = int(d[2].dim_value * rate[0])
d[3].dim_value = int(d[3].dim_value * rate[1])
# Scale every output's H and W by the same ratio (channels stay unchanged)
for output in model.graph.output:
    d = output.type.tensor_type.shape.dim
    d[0].dim_value = input_shape[0]
    d[2].dim_value = int(d[2].dim_value * rate[0])
    d[3].dim_value = int(d[3].dim_value * rate[1])
onnx.save_model(model, args.onnx)
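The script only rewrites the dimensions recorded in the ONNX graph. A plain-Python sketch of the same arithmetic (no onnx dependency) makes the numbers easy to check. The starting shapes used here are hypothetical: a 32x32 input with outputs at 1/4 resolution (8x8). With them, the 480x640 target gives the (1, 1, 120, 160) heatmap shape that appears in the CenterFace demo output in this post.

```python
def rescale_shapes(input_dims, output_dims, target):
    """Apply the export script's arithmetic to plain [N, C, H, W] lists.

    Batch becomes target[0]; H and W are multiplied by target / original input size;
    channels are left unchanged, as in the script.
    """
    rate = (target[2] / input_dims[2], target[3] / input_dims[3])

    def scaled(d):
        return [target[0], d[1], int(d[2] * rate[0]), int(d[3] * rate[1])]

    return scaled(input_dims), [scaled(d) for d in output_dims]


# Hypothetical original shapes: heatmap, scale, offset, landmarks at 1/4 of a 32x32 input
new_in, new_out = rescale_shapes(
    [1, 3, 32, 32],
    [[1, 1, 8, 8], [1, 2, 8, 8], [1, 2, 8, 8], [1, 10, 8, 8]],
    [1, 3, 480, 640],
)
print(new_in, new_out)
```

Because int() truncates, a target that is not an exact multiple of the original size would silently round the output maps down.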


CenterFace:
    onnx_file:     "/home/ubuntu/tensorrt_inference/CenterFace/centerface_480_640.onnx"
    engine_file:   "/home/ubuntu/tensorrt_inference/CenterFace/centerface.trt"
    BATCH_SIZE:    1
    INPUT_CHANNEL: 3
    IMAGE_WIDTH:   640
    IMAGE_HEIGHT:  480
    obj_threshold: 0.5
    nms_threshold: 0.45
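config.yaml sets obj_threshold and nms_threshold. This minimal sketch shows what such thresholds usually control: detections below obj_threshold are dropped, then greedy non-maximum suppression removes boxes whose IoU with a kept box exceeds nms_threshold. It is a generic illustration, not a copy of the post-processing in tensorrt_inference's CenterFace.cpp, which may differ in details such as strict versus non-strict comparisons.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(dets, obj_threshold=0.5, nms_threshold=0.45):
    """dets: list of (x1, y1, x2, y2, score). Returns the kept detections, best first."""
    candidates = sorted((d for d in dets if d[4] >= obj_threshold),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        # keep a box only if it does not overlap too much with any box kept so far
        if all(iou(d, k) <= nms_threshold for k in kept):
            kept.append(d)
    return kept
```

Raising nms_threshold keeps more overlapping faces (useful for crowded images), while raising obj_threshold trades recall for fewer false positives.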

Also modify CMakeLists.txt. The original line:

set(TENSORRT_ROOT /usr/src/tensorrt/)


Changed to:

set(TENSORRT_ROOT /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-

Create a build folder, run cmake and make inside it, then run:

ubuntu@ubuntu:~/tensorrt_inference/CenterFace/build$ ./CenterFace_trt ../config.yaml ../samples/
loading filename from:/home/ubuntu/tensorrt_inference/CenterFace/centerface.trt
[11/12/2021-10:50:02] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
deserialize done
[11/12/2021-10:50:02] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
binding0: 3686400
binding1: 76800
binding2: 153600
binding3: 153600
binding4: 768000
Processing: ../samples//worlds-largest-selfie.jpg
prepare image take: 31.9722 ms.
Inference take: 1.60191 ms.
execute success
post process
Post process take: 2583.01 ms.
Processing: ../samples//test4.jpg
prepare image take: 6.97756 ms.
Inference take: 1.42854 ms.
execute success
post process
Post process take: 313.282 ms.
Processing: ../samples//test5.jpg
prepare image take: 3.43697 ms.
Inference take: 1.39913 ms.
execute success
post process
Post process take: 72.2356 ms.
Processing: ../samples//test3.jpg
prepare image take: 3.55852 ms.
Inference take: 1.4428 ms.
execute success
post process
Post process take: 265.396 ms.
Processing: ../samples//test1.jpg
prepare image take: 3.9262 ms.
Inference take: 1.40752 ms.
execute success
post process
Post process take: 475.307 ms.
Processing: ../samples//test2.jpg
prepare image take: 4.02272 ms.
Inference take: 1.40933 ms.
execute success
post process
Post process take: 832.401 ms.
Average processing time is 767.369ms
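The binding sizes in this log are byte counts. Assuming float32 bindings (4 bytes per element) and CenterFace outputs at 1/4 of the input resolution, they can be checked by hand. The output names in the dictionary are assumed labels for bindings 1 to 4, ordered like the (1, 1), (1, 2), (1, 2), (1, 10) shapes printed by the Python demo.

```python
import operator
from functools import reduce

FLOAT32_BYTES = 4


def binding_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Size in bytes of a dense tensor with the given shape."""
    return reduce(operator.mul, shape, 1) * dtype_bytes


# Assumed binding layout of the 480x640 CenterFace engine (outputs at 120x160)
CENTERFACE_480x640 = {
    "input": (1, 3, 480, 640),
    "heatmap": (1, 1, 120, 160),
    "scale": (1, 2, 120, 160),
    "offset": (1, 2, 120, 160),
    "landmarks": (1, 10, 120, 160),
}

for name, shape in CENTERFACE_480x640.items():
    print(name, shape, binding_bytes(shape))
```

The same arithmetic explains the other logs in this post: binding0 = 4915200 bytes in the CLion run matches a 1x3x640x640 input, and the ArcFace bindings 150528 and 2048 match a 1x3x112x112 input and a 512-dimensional embedding.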



/home/ubuntu/tensorrt_inference/CenterFace/cmake-build-debug/CenterFace_trt /home/ubuntu/tensorrt_inference/CenterFace/config.yaml /home/ubuntu/tensorrt_inference/CenterFace/sample
Input filename:   /home/ubuntu/tensorrt_inference/CenterFace/centerface.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.2
Model version:    0
Doc string:
[11/06/2021-16:53:16] start building engine
[11/06/2021-16:53:16] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[11/06/2021-16:53:17] [11/06/2021-16:53:26] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[11/06/2021-16:53:57] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/06/2021-16:53:57] build engine done
[W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
writing engine file...
save engine file done
[11/06/2021-16:53:57] binding0: 4915200
binding1: 102400
binding2: 204800
binding3: 204800
binding4: 1024000
Average processing time is -nanms
[W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1

Process finished with exit code 0


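(3) The engine is generated for a 480x640 input mainly because MTCNN also detects at this size; using the same size allows a fair comparison when choosing which model to use for landmark detection and face comparison.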

Test results. Note: the test must be run with root privileges; it seems that invoking NVCC requires system privileges.


Recognition is quite fast on the RTX 2060:

/usr/bin/python3.8 /home/ubuntu/CenterFace/prj-tensorrt/demo.py
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
gpu times =  2.311 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.124 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.316 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.833 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.516 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  4.811 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.261 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
gpu times =  2.168 ms
(1, 1, 120, 160) (1, 2, 120, 160) (1, 2, 120, 160) (1, 10, 120, 160)
count =  1
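Each frame produces four maps at 1/4 resolution: heatmap, box scale, center offset, and landmarks. The pure-Python sketch below turns heatmap peaks into boxes. It assumes the decoding scheme of the official CenterFace Python demo: stride 4, box size exp(scale) x 4, and center (index + offset + 0.5) x 4, with channel 0 vertical and channel 1 horizontal. Landmarks and NMS are omitted, decode_peaks is a hypothetical name, and the formulas should be verified against centerface.py in the CenterFace repository before relying on them.

```python
import math

STRIDE = 4  # 480x640 input -> 120x160 output maps


def decode_peaks(heatmap, scale, offset, threshold=0.35, img_h=480, img_w=640):
    """Turn CenterFace-style maps into boxes (x1, y1, x2, y2, score).

    heatmap: H x W list of lists; scale and offset: [vertical map, horizontal map].
    """
    boxes = []
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score <= threshold:
                continue
            h = math.exp(scale[0][r][c]) * STRIDE            # box height in pixels
            w = math.exp(scale[1][r][c]) * STRIDE            # box width in pixels
            cy = (r + offset[0][r][c] + 0.5) * STRIDE        # box center, y
            cx = (c + offset[1][r][c] + 0.5) * STRIDE        # box center, x
            x1 = min(max(0.0, cx - w / 2), img_w)
            y1 = min(max(0.0, cy - h / 2), img_h)
            boxes.append((x1, y1, min(x1 + w, img_w), min(y1 + h, img_h), score))
    return boxes
```

The threshold of 0.35 matches the value passed to centerface() in demo.py; each printed count = 1 above corresponds to one box surviving decoding and NMS.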


8. Generate the ArcFace TensorRT engine: install the GPU version of MXNet, download the pretrained model, and convert it to ONNX (my driver version is CUDA 11.2).

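(1) After installing the MXNet library, /home/ubuntu/tensorrt_inference/arcface/inference.py can be used, once the model has been converted, to check whether the downloaded pretrained model works (optional).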

pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple mxnet-cu112

I put the downloaded mobilefacenet-res2-6-10-2-dim512 folder in /home/ubuntu/tensorrt_inference/arcface. I then generated the ONNX model from the MXNet model using the following three functions from the arcface_retinaface_mxnet2onnx script (arcface_retinaface_mxnet2onnx/mxnet2onnx_demo.py at master · zheshipinyinMc/arcface_retinaface_mxnet2onnx · GitHub):

mxnet2onnx_test()     # === convert MXNet to ONNX ===
onnx_modify_demo()    # === modify the ONNX model ===
onnx_inferred_demo()  # === ONNX forward inference ===


ubuntu@ubuntu:~/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512$ tree
├── mobilefacenet-res2-6-10-2-dim512.caffemodel
├── mobilefacenet-res2-6-10-2-dim512-emore.nchwbin
├── mobilefacenet-res2-6-10-2-dim512-minicaffe.prototxt
├── mobilefacenet-res2-6-10-2-dim512-opencv.prototxt
├── mobilefacenet-res2-6-10-2-dim512.zqparams
├── model-0000.params
├── model-symbol.json
└── onnx
    ├── modelnew2_onnx.onnx
    └── modelnew_onnx.onnx

1 directory, 10 files

(3) Then modify CMakeLists.txt, again changing the TensorRT path. The original line:

set(TENSORRT_ROOT /usr/src/tensorrt/)


Changed to:

set(TENSORRT_ROOT /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-


ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ cat ../config.yaml
arcface:
    onnx_file:     "/home/ubuntu/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512/onnx/modelnew2_onnx.onnx"
    engine_file:   "/home/ubuntu/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512/onnx/arcface_r100.trt"
    BATCH_SIZE:    1
    INPUT_CHANNEL: 3
    IMAGE_WIDTH:   112
    IMAGE_HEIGHT:  112


ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ ./arcface_trt ../config.yaml ../samples/
loading filename from:/home/ubuntu/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512/onnx/arcface_r100.trt
[11/12/2021-16:31:51] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
deserialize done
[11/12/2021-16:31:51] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
binding0: 150528
binding1: 2048
Processing: ../samples//test9.jpg
prepare image take: 0.246129 ms.
Inference take: 349.969 ms.
execute success
post process
Post process take: 0.071033 ms.
Processing: ../samples//test10.jpg
prepare image take: 0.110323 ms.
Inference take: 0.938029 ms.
execute success
post process
Post process take: 0.016646 ms.
Processing: ../samples//test7.jpg
prepare image take: 0.085588 ms.
Inference take: 0.985768 ms.
execute success
post process
Post process take: 0.036193 ms.
Processing: ../samples//test4.jpg
prepare image take: 0.111465 ms.
Inference take: 0.919175 ms.
execute success
post process
Post process take: 0.023314 ms.
Processing: ../samples//test5.jpg
prepare image take: 0.103645 ms.
Inference take: 0.9132 ms.
execute success
post process
Post process take: 0.020485 ms.
Processing: ../samples//test3.jpg
prepare image take: 0.104911 ms.
Inference take: 0.909812 ms.
execute success
post process
Post process take: 0.022067 ms.
Processing: ../samples//test1.jpg
prepare image take: 0.103754 ms.
Inference take: 0.91965 ms.
execute success
post process
Post process take: 0.021067 ms.
Processing: ../samples//test8.jpg
prepare image take: 0.10762 ms.
Inference take: 0.911278 ms.
execute success
post process
Post process take: 0.021202 ms.
Processing: ../samples//test6.jpg
prepare image take: 0.085042 ms.
Inference take: 0.906856 ms.
execute success
post process
Post process take: 0.014921 ms.
Processing: ../samples//test2.jpg
prepare image take: 0.101656 ms.
Inference take: 0.918637 ms.
execute success
post process
Post process take: 0.012409 ms.
Average processing time is 35.9711ms
The similarity matrix of the image folder is:
[1, 0.54835492, 0.53202093, 0.50336587, 0.46709558, 0.52499074, 0.5145005, 0.55176359, 0.53271091, 0.50393772;
 0.54835492, 1, 0.48129362, 0.4762949, 0.60014594, 0.47247368, 0.52350342, 0.49729002, 0.48303184, 0.53011435;
 0.53202093, 0.48129362, 1, 0.47364214, 0.49696276, 0.52271658, 0.51096576, 0.48425445, 0.49267167, 0.52210295;
 0.50336587, 0.4762949, 0.47364214, 1, 0.51976317, 0.80811292, 0.48066908, 0.48449197, 0.45749086, 0.50374758;
 0.46709558, 0.60014594, 0.49696276, 0.51976317, 1, 0.47129455, 0.53576809, 0.46944529, 0.53187394, 0.5326156;
 0.52499074, 0.47247368, 0.52271658, 0.80811292, 0.47129455, 1, 0.47609299, 0.48077181, 0.4403888, 0.45222867;
 0.5145005, 0.52350342, 0.51096576, 0.48066908, 0.53576809, 0.47609299, 1, 0.53024608, 0.54449081, 0.83704132;
 0.55176359, 0.49729002, 0.48425445, 0.48449197, 0.46944529, 0.48077181, 0.53024608, 1, 0.51537269, 0.48809451;
 0.53271091, 0.48303184, 0.49267167, 0.45749086, 0.53187394, 0.4403888, 0.54449081, 0.51537269, 1, 0.54323018;
 0.50393772, 0.53011435, 0.52210295, 0.50374758, 0.5326156, 0.45222867, 0.83704132, 0.48809451, 0.54323018, 1]
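The similarity matrix holds the pairwise cosine similarities of the embeddings, so it is symmetric with 1 on the diagonal. This minimal pure-Python sketch builds such a matrix and lists the pairs above a threshold. The threshold is a hypothetical choice that would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_matrix(embeddings):
    """Symmetric matrix of pairwise cosine similarities (diagonal is 1 up to rounding)."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]


def matching_pairs(matrix, threshold):
    """Upper-triangle index pairs (i, j), i < j, whose similarity is above threshold."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j] > threshold]
```

Applied to the matrix above with a threshold of 0.7, matching_pairs returns [(3, 5), (6, 9)] (0-based indices). These are the two clearly higher entries, 0.80811292 and 0.83704132; every other off-diagonal value is below 0.61.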


import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import
import numpy as np
import cv2
from numpy.linalg import norm

TRT_LOGGER = trt.Logger()


def get_engine(engine_path):
    # If a serialized engine exists, use it instead of building an engine.
    print("Reading engine from file {}".format(engine_path))
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def compute_sim(emb1, emb2):
    # Cosine similarity between two embeddings.
    emb1 = emb1.flatten()
    emb2 = emb2.flatten()
    return np.dot(emb1, emb2) / (norm(emb1) * norm(emb2))


engine = get_engine("/home/ubuntu/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512/onnx/arcface_r100.trt")
# Debug: inspect the engine bindings
# for binding in engine:
#     print(binding, engine.get_binding_shape(binding), engine.binding_is_input(binding),
#           trt.nptype(engine.get_binding_dtype(binding)))
context = engine.create_execution_context()


def get_embedding(img):
    # Preprocess: resize to 112x112, BGR -> RGB, HWC -> NCHW float32 (pixel values are not scaled)
    resized = cv2.resize(img, (112, 112), interpolation=cv2.INTER_LINEAR)
    img_in = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    img_in = np.transpose(img_in, (2, 0, 1)).astype(np.float32)
    img_in = np.expand_dims(img_in, axis=0)
    # img_in /= 255.0
    img_in = np.ascontiguousarray(img_in)
    print("Shape of the network input: ", img_in.shape)
    # Page-locked host buffers sized from binding 0 (image) and binding 1 (embedding).
    h_input = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    # Create a stream in which to copy inputs/outputs and run inference.
    stream = cuda.Stream()
    # Copy the image into the page-locked input buffer instead of replacing the buffer.
    np.copyto(h_input, img_in.ravel())
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Synchronize the stream and return the host output.
    stream.synchronize()
    return h_output


img1 = cv2.imread("/home/ubuntu/tensorrt_inference/arcface/samples/test1.jpg")
emb1 = get_embedding(img1)
img2 = cv2.imread("/home/ubuntu/tensorrt_inference/arcface/samples/test2.jpg")
emb2 = get_embedding(img2)
print(compute_sim(emb1, emb2))


/usr/bin/python3.8 /home/ubuntu/CenterFace/prj-tensorrt/arcface.py
Reading engine from file /home/ubuntu/tensorrt_inference/arcface/mobilefacenet-res2-6-10-2-dim512/onnx/arcface_r100.trt
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Shape of the network input:  (1, 3, 112, 112)
Shape of the network input:  (1, 3, 112, 112)
0.6905318

Process finished with exit code 0


/usr/bin/python3.8 /home/ubuntu/CenterFace/TensorRT_centerface_arcface/demo.py
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Reading engine from file arcface_r100.trt
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
count =  1
face similarity=  0.11867439
gpu times =  26.207 ms
count =  1
face similarity=  0.14468496
gpu times =  21.175 ms
count =  1
face similarity=  0.14031047
gpu times =  20.836 ms
count =  1
face similarity=  0.13255903
gpu times =  20.659 ms
count =  1
face similarity=  0.15876028
gpu times =  20.742 ms
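For face verification against enrolled people (the goal stated at the start), each frame's embedding is compared with stored reference embeddings. This minimal sketch performs a 1:N lookup with a rejection threshold; the gallery, the names, and the threshold are hypothetical. With a threshold such as 0.5, the similarities of about 0.12 to 0.16 printed above would be treated as "no match".

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def identify(query, gallery, threshold):
    """Return (name, similarity) of the best gallery match, or (None, best_similarity).

    gallery maps a person's name to a reference embedding.
    """
    q = l2_normalize(query)
    best_name, best_sim = None, -1.0
    for name, ref in gallery.items():
        sim = sum(a * b for a, b in zip(q, l2_normalize(ref)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name, best_sim) if best_sim >= threshold else (None, best_sim)
```

Normalizing the stored references once, at enrollment time, would avoid repeating that work for every frame.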

Next, port to the board.

9. Port CenterFace and ArcFace to the Jetson Xavier NX board

(1) Port CenterFace to the Jetson Xavier NX

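In the CenterFace code under /home/ubuntu/tensorrt_inference, comment out every yaml-cpp dependency, because that static library is not available on the board. Installing it failed for me, so I skipped it and simply hard-coded the parameters; after that the TensorRT engine can be generated.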


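cmake_minimum_required(VERSION 3.5)
project(CenterFace_trt)
set(CMAKE_CXX_STANDARD 14)
# CUDA
find_package(CUDA REQUIRED)
message(STATUS "Find CUDA include at ${CUDA_INCLUDE_DIRS}")
message(STATUS "Find CUDA libraries: ${CUDA_LIBRARIES}")
# TensorRT
set(TENSORRT_ROOT /usr/src/tensorrt/)
# assumed (not visible in this listing): calls that set TENSORRT_INCLUDE_DIR and TENSORRT_LIBRARY
find_path(TENSORRT_INCLUDE_DIR NvInfer.h HINTS ${TENSORRT_ROOT} PATH_SUFFIXES include/)
message(STATUS "Found TensorRT headers at ${TENSORRT_INCLUDE_DIR}")
find_library(TENSORRT_LIBRARY_INFER nvinfer HINTS ${TENSORRT_ROOT} PATH_SUFFIXES lib lib64)
find_library(TENSORRT_LIBRARY_ONNXPARSER nvonnxparser HINTS ${TENSORRT_ROOT} PATH_SUFFIXES lib lib64)
set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_ONNXPARSER})
message(STATUS "Find TensorRT libs: ${TENSORRT_LIBRARY}")
# OpenCV
find_package(OpenCV REQUIRED)
message(STATUS "Find OpenCV include at ${OpenCV_INCLUDE_DIRS}")
message(STATUS "Find OpenCV libraries: ${OpenCV_LIBRARIES}")

set(COMMON_INCLUDE ../includes/common)
#set(YAML_INCLUDE ../includes/yaml-cpp/include) #sxj731533730
#set(YAML_LIB_DIR ../includes/yaml-cpp/libs) #sxj731533730
#include_directories(${CUDA_INCLUDE_DIRS} ${TENSORRT_INCLUDE_DIR} ${OpenCV_INCLUDE_DIRS} ${COMMON_INCLUDE} ${YAML_INCLUDE}) #sxj731533730
# assumed: the same include directories without yaml-cpp (common.hpp lives in COMMON_INCLUDE)
include_directories(${CUDA_INCLUDE_DIRS} ${TENSORRT_INCLUDE_DIR} ${OpenCV_INCLUDE_DIRS} ${COMMON_INCLUDE})
#link_directories(${YAML_LIB_DIR}) #sxj731533730
add_executable(CenterFace_trt main.cpp CenterFace.cpp)
#target_link_libraries(CenterFace_trt ${OpenCV_LIBRARIES} ${CUDA_LIBRARIES} ${TENSORRT_LIBRARY} yaml-cpp) #sxj731533730
target_link_libraries(CenterFace_trt ${OpenCV_LIBRARIES} ${CUDA_LIBRARIES} ${TENSORRT_LIBRARY})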


nvidia@nvidia-desktop:~/tensorrt_inference/CenterFace/build$ cat ../CenterFace.cpp
#include "CenterFace.h"
//#include "yaml-cpp/yaml.h"
#include "common.hpp"

CenterFace::CenterFace(const std::string &config_file) {
    // yaml-cpp is not available on the board, so config_file is no longer parsed:
    //YAML::Node root = YAML::LoadFile(config_file);
    //YAML::Node config = root["CenterFace"];
    //engine_file = config["engine_file"].as<std::string>();
    // BATCH_SIZE = config["BATCH_SIZE"].as<int>();
    // INPUT_CHANNEL = config["INPUT_CHANNEL"].as<int>();
    // IMAGE_WIDTH = config["IMAGE_WIDTH"].as<int>();
    // IMAGE_HEIGHT = config["IMAGE_HEIGHT"].as<int>();
    // Hard-coded values matching config.yaml
    onnx_file = "../centerface_480_640.onnx";
    engine_file = "../centerface.trt";
    BATCH_SIZE = 1;
    INPUT_CHANNEL = 3;
    IMAGE_WIDTH = 640;
    IMAGE_HEIGHT = 480;
    obj_threshold = 0.5;
    nms_threshold = 0.45;

Then build and run it on the Jetson Xavier NX:

nvidia@nvidia-desktop:~/tensorrt_inference/CenterFace/build$ make
Scanning dependencies of target CenterFace_trt
[ 33%] Building CXX object CMakeFiles/CenterFace_trt.dir/main.cpp.o
[ 66%] Building CXX object CMakeFiles/CenterFace_trt.dir/CenterFace.cpp.o
[100%] Linking CXX executable CenterFace_trt
[100%] Built target CenterFace_trt
nvidia@nvidia-desktop:~/tensorrt_inference/CenterFace/build$ ./CenterFace_trt ../config.yaml  ../samples/
loading filename from:../centerface.trt
deserialize done
binding0: 3686400
binding1: 76800
binding2: 153600
binding3: 153600
binding4: 768000
Processing: ../samples//test2.jpg


nvidia@nvidia-desktop:~/tensorrt_inference/arcface/build$ make
Scanning dependencies of target arcface_trt
[ 33%] Building CXX object CMakeFiles/arcface_trt.dir/main.cpp.o
[ 66%] Building CXX object CMakeFiles/arcface_trt.dir/arcface.cpp.o
[100%] Linking CXX executable arcface_trt
[100%] Built target arcface_trt
nvidia@nvidia-desktop:~/tensorrt_inference/arcface/build$ ./arcface_trt ../config.yaml ../samples/
Input filename:   ../modelnew2_onnx.onnx
ONNX IR version:  0.0.8
Opset version:    15
Producer name:
Producer version:
Model version:    0
Doc string:
start building engine

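Finally, I tested face detection and comparison on the board, and the results are quite good: the PC (GTX 1050) reaches up to 30 fps,

and the Jetson Xavier NX board reaches up to 15 fps, which is smooth.

(Jetson Xavier NX board)

GPU memory usage and power consumption on the board are both modest. NVIDIA yyds (i.e. "the GOAT").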
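The fps figures depend on how they are measured. This minimal sketch of a sliding-window FPS meter averages over the last N frames instead of reporting single-frame jitter. The class name and window size are hypothetical, and the clock parameter exists only so the meter can be tested deterministically.

```python
import time
from collections import deque


class FpsMeter:
    """Frames per second over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self.clock = clock
        self.stamps = deque(maxlen=window + 1)  # window intervals need window + 1 timestamps

    def tick(self):
        """Call once per processed frame; returns the current estimate (0.0 until two ticks)."""
        self.stamps.append(self.clock())
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `tick()` after detection and comparison for each frame measures the end-to-end rate, including pre- and post-processing. That rate is what the 15 fps and 30 fps figures reflect, rather than the GPU inference time alone.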

The full code is on GitHub: https://github.com/sxj731533730/TensorRT_centerface_arcface

It contains two model folders: one for the PC with CUDA 11.2 + TensorRT 7.2.2, and one for the Jetson Xavier NX with CUDA 10.2 + TensorRT 7.

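10. It then occurred to me to try MediaPipe for face landmark detection and comparison, and it does work quite well: even with tensorflow-cpu it reaches 14 to 20 FPS. On the board, however, it uses too much CPU, and I did not manage to set up the GPU version of TensorFlow; I will come back to that when I have time.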


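First I bought an adapter module from a JD.com store (see the "six-in-one serial module" page on the 丢石头百科 wiki), downloaded and installed its driver, and then tested it.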



sudo chmod 777 /dev/ttyTHS1


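import binascii
import time

import serial

# Open the Jetson UART (/dev/ttyTHS1) at 115200 baud
ser = serial.Serial("/dev/ttyTHS1", 115200)


def recv():
    print("receive test.......")
    num = [""] * 5  # one 5-byte frame, each byte as a hex string
    while True:
        for i in range(0, 5):
            # read one byte and turn it into a two-character hex string, e.g. "5a"
            num[i] = str(binascii.b2a_hex(ser.read(1)))[2:-1]
        # print frames that start with the 0x5A 0xA5 header
        if num[0] == '5a' and num[1] == 'a5':
            print(num)


def write():
    print("write test.......")
    while True:
        ser.write("666".encode("UTF-8"))
        print("36 36 36")  # hex codes of the ASCII characters "666"
        time.sleep(1)


op = input("enter the operation:")
if op == "0":
    recv()
elif op == "1":
    write()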
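The recv() test reads exactly five bytes at a time, so a single lost or extra byte shifts every following frame. The sketch below shows a parser that resynchronizes on the 0x5A 0xA5 header instead. It assumes fixed 5-byte frames as in recv(); FrameParser is a hypothetical name, and the frame contents after the header are not specified in this post.

```python
HEADER = b"\x5a\xa5"
FRAME_LEN = 5  # same frame length as recv() above


class FrameParser:
    """Reassemble fixed-length frames starting with 0x5A 0xA5 from a byte stream."""

    def __init__(self, header=HEADER, frame_len=FRAME_LEN):
        self.header = header
        self.frame_len = frame_len
        self.buf = bytearray()

    def feed(self, data):
        """Append received bytes and return the complete frames found."""
        self.buf += data
        frames = []
        while True:
            start = self.buf.find(self.header)
            if start < 0:
                # no header: keep only a possible partial header at the end
                del self.buf[:max(0, len(self.buf) - (len(self.header) - 1))]
                return frames
            del self.buf[:start]  # discard garbage before the header
            if len(self.buf) < self.frame_len:
                return frames      # wait for the rest of the frame
            frames.append(bytes(self.buf[:self.frame_len]))
            del self.buf[:self.frame_len]
```

With pyserial it could be fed as `parser.feed(ser.read(ser.in_waiting or 1))` in a loop, printing each returned frame.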


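UART pins: in the third row, the inner pin is RXD and the outer pin is TXD; the outer pin at the end of the third row is GND ("outer" means the side nearest the board edge).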


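Orange wire: connects TXD of the USB-to-TTL adapter to the board's RXD pin.

Red wire: connects RXD of the USB-to-TTL adapter to the board's TXD pin.

Green wire: connects GND of the USB-to-TTL adapter to the board's GND pin.

Strictly speaking, the adapter's TXD does not need to be connected, since no data is sent from the PC to the board.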



References:

Setting up an Ubuntu 18.04/20.04 graphical desktop via VNC - Simple Application Server - Alibaba Cloud

CenterFace + TensorRT: deploying face and landmark detection at 400 fps - Zhihu

Notes (4): Jetson Nano system login (Jetson Nano password) - SWORLD, CSDN blog

Install PyTorch on Jetson Nano - Q-engineering

GitHub - zheshipinyinMc/arcface_retinaface_mxnet2onnx: arcface and retinaface model convert mxnet to onnx.

