python图像人类检测_OpenCV人类行为识别（3D卷积神经网络）

1. 3D卷积神经网络

相比于2D 卷积神经网络，3D卷积神经网络更能很好的利用视频中的时序信息。因此，其主要应用视频、行为识别等领域居多。3D卷积神经网络是将时间维度看成了第三维。

人类行为识别的实际应用：

安防监控。(检测识别异常行为：如打架，偷东西等)

监视和培训新人工作来确保任务执行正确。(例如，鸡蛋灌饼制作程序：和面，擀面团，打鸡蛋，摊饼等动作)

判断检测食品服务人员是否按规定洗手。

自动对视频数据分类。

人类的行为识别，在实际生活环境中，在不同的场景会存在着背景杂乱、遮挡和视角变化等等情况，对于人来说，是很容易就可以辨识出来，但对于计算机，就不是一件简单的事了，比如目标尺度变化和视觉改变等。

2. 人类行为识别模型

abseiling

air drumming

answering questions

applauding

applying cream

archery

arm wrestling

arranging flowers

assembling computer

auctioning

baby waking up

baking cookies

balloon blowing

bandaging

barbequing

bartending

beatboxing

bee keeping

belly dancing

bench pressing

bending back

bending metal

biking through snow

blasting sand

blowing glass

blowing leaves

blowing nose

blowing out candles

bobsledding

bookbinding

bouncing on trampoline

bowling

braiding hair

breading or breadcrumbing

breakdancing

brush painting

brushing hair

brushing teeth

building cabinet

building shed

bungee jumping

busking

canoeing or kayaking

capoeira

carrying baby

...

import os

import numpy as np

import cv2 as cv

import argparse

from common import findFile

parser = argparse.ArgumentParser(description='Use this script to run action recognition using 3D ResNet34',

formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('--input', '-i', help='Path to input video file. Skip this argument to capture frames from a camera.')

parser.add_argument('--model', required=True, help='Path to model.')

parser.add_argument('--classes', default=findFile('action_recongnition_kinetics.txt'), help='Path to classes list.')

# To get net download original repository https://github.com/kenshohara/video-classification-3d-cnn-pytorch

# For correct ONNX export modify file: video-classification-3d-cnn-pytorch/models/resnet.py

# change

# - def downsample_basic_block(x, planes, stride):

# - out = F.avg_pool3d(x, kernel_size=1, stride=stride)

# - zero_pads = torch.Tensor(out.size(0), planes - out.size(1),

# - out.size(2), out.size(3),

# - out.size(4)).zero_()

# - if isinstance(out.data, torch.cuda.FloatTensor):

# - zero_pads = zero_pads.cuda()

# -

# - out = Variable(torch.cat([out.data, zero_pads], dim=1))

# - return out

# To

# + def downsample_basic_block(x, planes, stride):

# + out = F.avg_pool3d(x, kernel_size=1, stride=stride)

# + out = F.pad(out, (0, 0, 0, 0, 0, 0, 0, int(planes - out.size(1)), 0, 0), "constant", 0)

# + return out

# To ONNX export use torch.onnx.export(model, inputs, model_name)

def get_class_names(path):

class_names = []

with open(path) as f:

for row in f:

class_names.append(row[:-1])

return class_names

def classify_video(video_path, net_path):

SAMPLE_DURATION = 16

SAMPLE_SIZE = 112

mean = (114.7748, 107.7354, 99.4750)

class_names = get_class_names(args.classes)

net = cv.dnn.readNet(net_path)

net.setPreferableBackend(cv.dnn.DNN_BACKEND_INFERENCE_ENGINE)

net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

winName = 'Deep learning image classification in OpenCV'

cv.namedWindow(winName, cv.WINDOW_AUTOSIZE)

cap = cv.VideoCapture(video_path)

while cv.waitKey(1) < 0:

frames = []

for _ in range(SAMPLE_DURATION):

hasFrame, frame = cap.read()

if not hasFrame:

exit(0)

frames.append(frame)

inputs = cv.dnn.blobFromImages(frames, 1, (SAMPLE_SIZE, SAMPLE_SIZE), mean, True, crop=True)

inputs = np.transpose(inputs, (1, 0, 2, 3))

inputs = np.expand_dims(inputs, axis=0)

net.setInput(inputs)

outputs = net.forward()

class_pred = np.argmax(outputs)

label = class_names[class_pred]

for frame in frames:

labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)

cv.rectangle(frame, (0, 10 - labelSize[1]),

(labelSize[0], 10 + baseLine), (255, 255, 255), cv.FILLED)

cv.putText(frame, label, (0, 10), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

cv.imshow(winName, frame)

if cv.waitKey(1) & 0xFF == ord('q'):

break

if __name__ == "__main__":

args, _ = parser.parse_known_args()

classify_video(args.input if args.input else 0, args.model)

3.代码

环境：

win10

pycharm

anaconda3

python3.7

文件结构：

代码：

from collections import deque

import numpy as np

import argparse

import imutils

import cv2

# 构造参数

ap = argparse.ArgumentParser()

ap.add_argument("-m", "--model", required=True, help="path to trained human activity recognition model")

ap.add_argument("-c", "--classes", required=True, help="path to class labels file")

ap.add_argument("-i", "--input", type=str, default="", help="optional path to video file")

args = vars(ap.parse_args())

# 类别，样本持续时间(帧数)，样本大小(空间尺寸)

CLASSES = open(args["classes"]).read().strip().split("\n")

SAMPLE_DURATION = 16

SAMPLE_SIZE = 112

print("处理中...")

# 创建帧队列

frames = deque(maxlen=SAMPLE_DURATION)

# 读取模型

net = cv2.dnn.readNet(args["model"])

# 待检测视频

vs = cv2.VideoCapture(args["input"] if args["input"] else 0)

writer = None

# 循环处理视频流

while True:

# 读取每帧

(grabbed, frame) = vs.read()

# 判断视频是否结束

if not grabbed:

print("无视频读取...")

break

# 调整大小，放入队列中

frame = imutils.resize(frame, width=640)

frames.append(frame)

# 判断是否填充到最大帧数

if len(frames) < SAMPLE_DURATION:

continue

# 队列填充满后继续处理

blob = cv2.dnn.blobFromImages(frames, 1.0, (SAMPLE_SIZE, SAMPLE_SIZE), (114.7748, 107.7354, 99.4750),

swapRB=True, crop=True)

blob = np.transpose(blob, (1, 0, 2, 3))

blob = np.expand_dims(blob, axis=0)

# 识别预测

net.setInput(blob)

outputs = net.forward()

label = CLASSES[np.argmax(outputs)]

# 绘制框

cv2.rectangle(frame, (0, 0), (300, 40), (255, 0, 0), -1)

cv2.putText(frame, label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)

# cv2.imshow("Activity Recognition", frame)

# 检测是否保存

if writer is None:

# 初始化视频写入器

# fourcc = cv2.VideoWriter_fourcc(*"MJPG")

fourcc = cv2.VideoWriter_fourcc(*"mp4v")

writer = cv2.VideoWriter("E:\\work\\activity-recognition-demo\\videos\\output\\xishou1.mp4", fourcc, 30, (frame.shape[1], frame.shape[0]), True)

writer.write(frame)

# 按 q 键退出

# key = cv2.waitKey(1) & 0xFF

# if key == ord("q"):

# break

print("结束...")

writer.release()

vs.release()

4. 测试

测试1：洗手

OpenCV人类行为检测-洗手

https://www.bilibili.com/video/av96440536/

视频左上角打上了“washing hands”(洗手)标签。

测试2：瑜伽

上图视频测试地址：https://www.bilibili.com/video/BV13E411c7QK/

检测到视频中是“yoga”(瑜伽)，同时又识别到执行的动作是“stretching leg”(伸腿)。

python图像人类检测_OpenCV人类行为识别（3D卷积神经网络）相关推荐

python图像人类检测_Python 超简单实现人类面部情绪的识别
还记得我们之前写过一篇文章<手把手教你人脸识别自动开机>吗?里面用OpenCV对人脸进行简单的识别,让计算机训练认识到某个特定人物后识别对象.今天来做点高级的,识别出人脸的情绪. 本文分为 ...
OpenCV人类行为识别（3D卷积神经网络）
上图视频测试链接:https://www.bilibili.com/video/BV13E411c7Mv/ 1. 3D卷积神经网络相比于2D 卷积神经网络,3D卷积神经网络更能很好的利用视频中的时序 ...
《Python与硬件项目案例》— 基于Python的口罩检测与指纹识别签到系统设计
<Python与硬件项目案例>- 基于Python的口罩检测与指纹识别签到系统设计目录 <Python与硬件项目案例>- 基于Python的口罩检测与指纹识别签到系统设计 1 ...
多时间尺度 3D 卷积神经网络的步态识别
多时间尺度 3D 卷积神经网络的步态识别论文题目:Gait Recognition with Multiple-Temporal-Scale 3D Convolutional Neural Netw ...
基于CNN目标检测方法（RCNN，Fast-RCNN，Faster-RCNN，Mask-RCNN，YOLO，SSD）行人检测，目标追踪，卷积神经网络
一.研究意义卷积神经网络(CNN)由于其强大的特征提取能力,近年来被广泛用于计算机视觉领域.1998年Yann LeCun等提出的LeNet-5网络结构,该结构使得卷积神经网络可以端到端的训练,并应 ...
TensorFlow 2.0 mnist手写数字识别(CNN卷积神经网络)
TensorFlow 2.0 (五) - mnist手写数字识别(CNN卷积神经网络) 源代码/数据集已上传到 Github - tensorflow-tutorial-samples 大白话讲解卷积 ...
卷积神经网络（2D卷积神经网络和3D卷积神经网络理解）
前言卷积神经⽹络(convolutional neural network,CNN)是⼀类强⼤的神经⽹络,正是为处理图像数据而设计的.基于卷积神经⽹络结构的模型在计算机视觉领域中已经占主导地位,当 ...
深度学习之3D卷积神经网络
一.概述 3D CNN主要运用在视频分类.动作识别等领域,它是在2D CNN的基础上改变而来.由于2D CNN不能很好的捕获时序上的信息,因此我们采用3D CNN,这样就能将视频中时序信息进行很好的利 ...
特征匹配实现印刷体数字识别，卷积神经网络实现印刷体数字识别
特征匹配实现印刷体数字识别,卷积神经网络实现印刷体数字识别(很可靠) 1.印刷体数字识别(特征匹配) 1.首先需要了解为什么印刷体数字识别我使用的是特征匹配的方法,我起初也走了很多的坑,当初固执的识别 ...

python图像人类检测_OpenCV人类行为识别（3D卷积神经网络）

python图像人类检测_OpenCV人类行为识别（3D卷积神经网络）相关推荐

最新文章

热门文章