PyTorch 1.7 Video: A First Look (Video Datasets, Video I/O, Video Classification Models, Video Transforms)
Contents
Environment
References
Video Datasets and Loading
Loading the UCF101 Dataset
Loading the HMDB51 Dataset
Loading the Kinetics-400 Dataset
Video I/O
torchvision.io.read_video()
torchvision.io.read_video_timestamps()
torchvision.io.write_video()
class torchvision.io.VideoReader(path, stream='video')
Video Transforms
ToTensorVideo()
NormalizeVideo()
RandomHorizontalFlipVideo()
CenterCropVideo()
RandomCropVideo()
RandomResizedCropVideo()
Example
Video Classification Models
Example
Environment
- Windows 10
- Anaconda Navigator
- PyCharm
- CUDA 10.1
- torch 1.7.1
- torchvision 0.8.2
- Python 3.8
References
- Upgrading Anaconda Navigator: https://www.cnblogs.com/developerchen/p/8879516.html
Open Anaconda Prompt and run the following commands:
conda install -c continuumcrew anaconda-navigator=1.5.1
conda update --all
- Installing torch 1.7.1: https://pytorch.org/get-started/locally/
Open Anaconda Prompt, switch to the target environment, and run the following command:
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
- PyTorch 1.7.1 official documentation: https://pytorch.org/docs/stable/index.html
Video Datasets and Loading
- UCF101: https://pytorch.org/docs/stable/torchvision/datasets.html#ucf101
- HMDB51: https://pytorch.org/docs/stable/torchvision/datasets.html#hmdb51
- Kinetics400: https://pytorch.org/docs/stable/torchvision/datasets.html#kinetics-400
- ......
Loading the UCF101 Dataset
import torchvision.datasets as datasets

data = datasets.UCF101(
    root='path/UCF-101',
    annotation_path='path/UCF101TrainTestSplits-RecognitionTask/ucfTrainTestlist',
    frames_per_clip=16,
    num_workers=0,  # needed on Windows 10
)
print(data)
Returns:
- video (Tensor[T, H, W, C]): the `T` video frames
- audio (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the number of points
- label (int): class of the video clip
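Notes:
- On Windows 10 this code must be run with num_workers=0; otherwise it raises an error.
- The PyAV library is also required. Install it with: pip install av
- When loading UCF101 on Windows, the dataset fails to load (the `list index out of range` error discussed in the link below) because Windows paths use "\".
Cause and solutions: https://stackoverflow.com/questions/61522539/i-cant-import-the-ucf-101-dataset-torchvision-list-index-out-of-range-error
Cause: the video paths in trainlist01/02/03.txt and testlist01/02/03.txt look like ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi, which uses a forward slash instead of the backslash ( \ ) that Windows paths use.
I used the first solution from that answer: replace every / with \ in trainlist01/02/03.txt and testlist01/02/03.txt.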
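The slash replacement can be scripted instead of done by hand. The following is only a minimal sketch, not code from the linked answer: the function name `convert_split_files` and the glob pattern are hypothetical, and it assumes the split files are UTF-8 text files sitting in one folder. It rewrites the files in place, so keep a backup copy.

```python
from pathlib import Path


def convert_split_files(split_dir, pattern="*list0*.txt"):
    """Replace '/' with '\\' in every matching split file.

    Returns the names of the files that were actually changed.
    Both the function name and the default pattern are illustrative.
    """
    changed = []
    for path in sorted(Path(split_dir).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        fixed = text.replace("/", "\\")
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path.name)
    return changed


# Hypothetical usage, with the annotation folder from the example above:
# convert_split_files('path/UCF101TrainTestSplits-RecognitionTask/ucfTrainTestlist')
```

Running it a second time changes nothing, because no forward slashes are left to replace.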
Loading the HMDB51 Dataset
Parameters:
root (string) – Root directory of the HMDB51 Dataset.
annotation_path (str) – Path to the folder containing the split files.
frames_per_clip (int) – Number of frames in a clip.
step_between_clips (int) – Number of frames between each clip.
fold (int, optional) – Which fold to use. Should be between 1 and 3.
train (bool, optional) – If True, creates a dataset from the train split, otherwise from the test split.
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
Returns:
- video (Tensor[T, H, W, C]): the `T` video frames
- audio (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the number of points
- label (int): class of the video clip
Loading the Kinetics-400 Dataset
Parameters:
root (string) – Root directory of the Kinetics-400 Dataset.
frames_per_clip (int) – number of frames in a clip
step_between_clips (int) – number of frames between each clip
transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.
Returns:
- video (Tensor[T, H, W, C]): the `T` video frames
- audio (Tensor[K, L]): the audio frames, where `K` is the number of channels and `L` is the number of points
- label (int): class of the video clip
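To get a feel for how frames_per_clip and step_between_clips interact, the number of clips cut from one video can be estimated with plain arithmetic. This is a rough sketch under an assumption: clips are taken as a simple sliding window over the frames, with no frame-rate resampling. The helper name `count_clips` is made up for illustration, and torchvision's own indexing may differ in details.

```python
def count_clips(num_frames, frames_per_clip, step_between_clips=1):
    """Count sliding-window clips of length frames_per_clip.

    Assumes a plain sliding window; this is an illustration,
    not torchvision's implementation.
    """
    if frames_per_clip <= 0 or step_between_clips <= 0:
        raise ValueError("frames_per_clip and step_between_clips must be positive")
    if num_frames < frames_per_clip:
        return 0  # the video is too short to yield a single clip
    return (num_frames - frames_per_clip) // step_between_clips + 1


# A 164-frame video (the length of the sample video used later in this post):
print(count_clips(164, 16))       # 149 overlapping clips with step 1
print(count_clips(164, 16, 16))   # 10 non-overlapping clips
```

A larger step_between_clips therefore shrinks the dataset length roughly in proportion.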
Video I/O
Official documentation:
- https://pytorch.org/docs/stable/torchvision/io.html?highlight=video
- https://pytorch.org/docs/stable/torchvision/io.html#fine-grained-video-api
torchvision.io.read_video()
Source: https://pytorch.org/docs/stable/_modules/torchvision/io/video.html#read_video
Parameters
filename (str) – path to the video file
start_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional) – the start presentation time of the video
end_pts (int if pts_unit = 'pts', float / Fraction if pts_unit = 'sec', optional) – the end presentation time
pts_unit (str, optional) – unit in which start_pts and end_pts values will be interpreted, either ‘pts’ or ‘sec’. Defaults to ‘pts’.
Returns
vframes (Tensor[T, H, W, C]) – the T video frames
aframes (Tensor[K, L]) – the audio frames, where K is the number of channels and L is the number of points
info (Dict) – metadata for the video and audio. Can contain the fields video_fps (float) and audio_fps (int)
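Background: what is a timestamp, and what is pts?
https://blog.csdn.net/tanningzhong/article/details/105564589
Timestamp units
Sampling rates are large numbers: standard AAC audio is typically sampled at 44.1 kHz, and video timestamps commonly use a 90000 Hz clock. Seconds and milliseconds are therefore not a convenient unit. Instead, time is measured in sample periods, so one sample period is the unit of audio and video time, and that count is the raw timestamp value. For playback and control, the timestamp is converted back to real time using the sampling rate.
In short, a timestamp is a count of samples, not a real time. A timestamp of 160 does not mean 160 seconds or 160 milliseconds; it means 160 samples. To convert it to real time you need the sampling rate. At 8000 Hz, one second is divided into 8000 parts, so 160 samples take 160*(1/8000) = 0.02 seconds = 20 milliseconds.
Timestamp increment
The timestamp increment is the difference in timestamp between one video frame and the next, or between one audio frame and the next. Like the timestamp itself, it is a number of samples rather than a real time difference, and it needs the sampling rate to be converted to real time.
So to compute video and audio timestamps you must know both the frame rate and the sampling rate.
For video at 25 fps with a 90000 Hz clock, one frame spans 90000/25 = 3600 samples, so the timestamp increment per frame is 3600. In real time that is 3600*(1/90000) = 0.04 seconds = 40 milliseconds, which agrees with 1/25 = 0.04 seconds = 40 milliseconds.
For AAC audio, one frame holds 1024 samples. At a sampling rate of 44.1 kHz, one frame plays for 1024*(1/44100) ≈ 0.02322 seconds = 23.22 milliseconds.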
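The arithmetic above is easy to check in a few lines. This is just an illustrative sketch using Python's standard `fractions` module; the helper names `ticks_to_seconds` and `ticks_per_frame` are made up here and are not part of torchvision.

```python
from fractions import Fraction


def ticks_to_seconds(ticks, clock_rate):
    """Convert a timestamp (a count of samples) to seconds, exactly."""
    return Fraction(ticks, clock_rate)


def ticks_per_frame(clock_rate, fps):
    """Timestamp increment between two consecutive frames."""
    return Fraction(clock_rate, fps)


print(float(ticks_to_seconds(160, 8000)))     # 0.02  (20 ms)
print(ticks_per_frame(90000, 25))             # 3600
print(float(ticks_to_seconds(3600, 90000)))   # 0.04  (40 ms)
print(round(float(ticks_to_seconds(1024, 44100)) * 1000, 2))  # 23.22 (ms)
```

Using Fraction keeps the values exact until the final conversion, which mirrors the Fraction values torchvision returns when pts_unit is 'sec'.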
An example makes these two concepts more concrete:
import torchvision.io as io

vframes, aframes, info = io.read_video(
    filename='path/v_ApplyEyeMakeup_g01_c01.avi',
    pts_unit='pts',
    end_pts=3,
)
print(vframes.shape)
print(info)

# output:
# torch.Size([3, 240, 320, 3])
# {'video_fps': 25.0, 'audio_fps': 44100}

# --------------------------------------------------------------------
import torchvision.io as io

vframes, aframes, info = io.read_video(
    filename='path/v_ApplyEyeMakeup_g01_c01.avi',
    pts_unit='sec',
    end_pts=3,
)
print(vframes.shape)
print(info)

# output:
# torch.Size([75, 240, 320, 3])
# {'video_fps': 25.0, 'audio_fps': 44100}
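The two frame counts (3 versus 75) can be reproduced without decoding any video. The sketch below rests on assumptions rather than on torchvision internals: it supposes that this file's frames carry pts values 1 to 164 with a time base of 1/25 second (which is what the read_video_timestamps output later in this post suggests), and that a frame is kept when its timestamp is at most end_pts. The function `select_frames` is hypothetical.

```python
from fractions import Fraction


def select_frames(frame_pts, end_pts, pts_unit="pts", time_base=Fraction(1, 25)):
    """Return the pts of the frames whose timestamp is <= end_pts.

    frame_pts holds raw integer timestamps; time_base is the assumed
    duration of one timestamp tick in seconds.
    """
    if pts_unit == "pts":
        return [p for p in frame_pts if p <= end_pts]
    if pts_unit == "sec":
        return [p for p in frame_pts if p * time_base <= end_pts]
    raise ValueError("pts_unit must be 'pts' or 'sec'")


frame_pts = list(range(1, 165))  # assumed timestamps of the 164 frames
print(len(select_frames(frame_pts, 3, "pts")))  # 3
print(len(select_frames(frame_pts, 3, "sec")))  # 75
```

With pts_unit='pts' the value 3 means "the third tick", while with pts_unit='sec' it means "three seconds", which at 25 fps covers 75 frames.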
torchvision.io.read_video_timestamps()
Source: https://pytorch.org/docs/stable/_modules/torchvision/io/video.html#read_video_timestamps
Parameters
filename (str) – path to the video file
pts_unit (str, optional) – unit in which timestamp values will be returned either ‘pts’ or ‘sec’. Defaults to ‘pts’.
Returns
pts (List[int] if pts_unit = 'pts', List[Fraction] if pts_unit = 'sec') – presentation timestamps for each one of the frames in the video.
video_fps (float, optional) – the frame rate for the video
Example:
import torchvision.io as io

v_pts, v_fps = io.read_video_timestamps(
    filename='path/v_ApplyEyeMakeup_g01_c01.avi',
    pts_unit='pts',
)
print(v_pts)
print(v_fps)

# output
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164]
# 25.0

# ---------------------------------------------------------------------------
import torchvision.io as io

v_pts, v_fps = io.read_video_timestamps(
    filename='path/v_ApplyEyeMakeup_g01_c01.avi',
    pts_unit='sec',
)
print(v_pts)
print(v_fps)

# output
# [Fraction(1, 25), Fraction(2, 25), Fraction(3, 25), Fraction(4, 25), Fraction(1, 5), Fraction(6, 25), Fraction(7, 25), Fraction(8, 25), Fraction(9, 25), Fraction(2, 5), Fraction(11, 25), Fraction(12, 25), Fraction(13, 25), Fraction(14, 25), Fraction(3, 5), Fraction(16, 25), Fraction(17, 25), Fraction(18, 25), Fraction(19, 25), Fraction(4, 5), Fraction(21, 25), Fraction(22, 25), Fraction(23, 25), Fraction(24, 25), Fraction(1, 1), Fraction(26, 25), Fraction(27, 25), Fraction(28, 25), Fraction(29, 25), Fraction(6, 5), Fraction(31, 25), Fraction(32, 25), Fraction(33, 25), Fraction(34, 25), Fraction(7, 5), Fraction(36, 25), Fraction(37, 25), Fraction(38, 25), Fraction(39, 25), Fraction(8, 5), Fraction(41, 25), Fraction(42, 25), Fraction(43, 25), Fraction(44, 25), Fraction(9, 5), Fraction(46, 25), Fraction(47, 25), Fraction(48, 25), Fraction(49, 25), Fraction(2, 1), Fraction(51, 25), Fraction(52, 25), Fraction(53, 25), Fraction(54, 25), Fraction(11, 5), Fraction(56, 25), Fraction(57, 25), Fraction(58, 25), Fraction(59, 25), Fraction(12, 5), Fraction(61, 25), Fraction(62, 25), Fraction(63, 25), Fraction(64, 25), Fraction(13, 5), Fraction(66, 25), Fraction(67, 25), Fraction(68, 25), Fraction(69, 25), Fraction(14, 5), Fraction(71, 25), Fraction(72, 25), Fraction(73, 25), Fraction(74, 25), Fraction(3, 1), Fraction(76, 25), Fraction(77, 25), Fraction(78, 25), Fraction(79, 25), Fraction(16, 5), Fraction(81, 25), Fraction(82, 25), Fraction(83, 25), Fraction(84, 25), Fraction(17, 5), Fraction(86, 25), Fraction(87, 25), Fraction(88, 25), Fraction(89, 25), Fraction(18, 5), Fraction(91, 25), Fraction(92, 25), Fraction(93, 25), Fraction(94, 25), Fraction(19, 5), Fraction(96, 25), Fraction(97, 25), Fraction(98, 25), Fraction(99, 25), Fraction(4, 1), Fraction(101, 25), Fraction(102, 25), Fraction(103, 25), Fraction(104, 25), Fraction(21, 5), Fraction(106, 25), Fraction(107, 25), Fraction(108, 25), Fraction(109, 25), Fraction(22, 5), Fraction(111, 25), Fraction(112, 25), 
Fraction(113, 25), Fraction(114, 25), Fraction(23, 5), Fraction(116, 25), Fraction(117, 25), Fraction(118, 25), Fraction(119, 25), Fraction(24, 5), Fraction(121, 25), Fraction(122, 25), Fraction(123, 25), Fraction(124, 25), Fraction(5, 1), Fraction(126, 25), Fraction(127, 25), Fraction(128, 25), Fraction(129, 25), Fraction(26, 5), Fraction(131, 25), Fraction(132, 25), Fraction(133, 25), Fraction(134, 25), Fraction(27, 5), Fraction(136, 25), Fraction(137, 25), Fraction(138, 25), Fraction(139, 25), Fraction(28, 5), Fraction(141, 25), Fraction(142, 25), Fraction(143, 25), Fraction(144, 25), Fraction(29, 5), Fraction(146, 25), Fraction(147, 25), Fraction(148, 25), Fraction(149, 25), Fraction(6, 1), Fraction(151, 25), Fraction(152, 25), Fraction(153, 25), Fraction(154, 25), Fraction(31, 5), Fraction(156, 25), Fraction(157, 25), Fraction(158, 25), Fraction(159, 25), Fraction(32, 5), Fraction(161, 25), Fraction(162, 25), Fraction(163, 25), Fraction(164, 25)]
# 25.0
torchvision.io.write_video()
Source: https://pytorch.org/docs/stable/_modules/torchvision/io/video.html#write_video
Parameters
filename (str) – path where the video will be saved
video_array (Tensor[T, H, W, C]) – tensor containing the individual frames, as a uint8 tensor in [T, H, W, C] format
fps (Number) – frames per second
class torchvision.io.VideoReader(path, stream='video')
Official documentation: https://pytorch.org/docs/stable/torchvision/io.html#fine-grained-video-api
Fine-grained video-reading API. Supports frame-by-frame reading of various streams from a single video container.
Parameters
path (string) – Path to the video file in supported format
stream (string, optional) – descriptor of the required stream, followed by the stream id, in the format {stream_type}:{stream_id}. Defaults to "video:0". Currently available options include ['video', 'audio']
Note: when I used it, I got the error quoted in the first reference below. The cause is that VideoReader is still in Beta. Some people online say that installing ffmpeg fixes it, but neither a system-wide install nor a conda install worked for me, so I will wait for the stable release.
References:
- "RuntimeError: Not compiled with video_reader support" raises when I use the new fine-grained VideoReader API. https://github.com/pytorch/vision/issues/2934#issuecomment-718834813
- Official release notes explaining the error (the API is still in Beta): https://github.com/pytorch/vision/releases/tag/v0.8.0
- Installing ffmpeg with conda: conda install ffmpeg
- Downloading and installing ffmpeg on Windows: https://www.zhihu.com/question/288655694/answer/1605692761
Common methods:
- __next__(): Decodes and returns the next frame of the current stream
Returns:
a dictionary with fields data and pts containing decoded frame and corresponding timestamp
- get_metadata(): Returns video metadata
Returns:
dictionary containing duration and frame rate for every stream
- seek(time_s: float): Seek within current stream.
Parameters
time_s (float) – seek time in seconds
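Video Transforms
Official source code:
- https://github.com/pytorch/vision/blob/master/torchvision/transforms/_functional_video.py (lower level than the next link)
- https://github.com/pytorch/vision/blob/master/torchvision/transforms/_transforms_video.py (wraps the functions from the previous link)
I have not found official documentation for these yet, but the comments in the source code are clear enough.
The second link provides the following video transforms: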
- RandomCropVideo
- RandomResizedCropVideo
- CenterCropVideo
- NormalizeVideo
- ToTensorVideo
- RandomHorizontalFlipVideo
ToTensorVideo()
Convert tensor data type from uint8 to float, divide value by 255.0 and permute the dimensions of clip tensor.
Similar to ToTensor() for images, but pay attention to the dimension order!
Args:
clip (torch.tensor, dtype=torch.uint8): Size is (T, H, W, C)
Return:
clip (torch.tensor, dtype=torch.float): Size is (C, T, H, W)
NormalizeVideo()
Normalize the video clip by mean subtraction and division by standard deviation.
This works the same way as Normalize() for images. Images usually use the ImageNet mean and std, whereas video uses the Kinetics-400 values mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989] (source: https://pytorch.org/docs/stable/torchvision/models.html#video-classification).
Args:
mean (3-tuple): pixel RGB mean
std (3-tuple): pixel RGB standard deviation
inplace (boolean): whether do in-place normalization
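The normalization itself is just (value - mean) / std applied per RGB channel. The sketch below illustrates that formula on a single pixel in plain Python, using the Kinetics-400 statistics quoted above; `normalize_pixel` is a hypothetical helper for illustration, not the torchvision transform, which operates on whole clip tensors.

```python
# Kinetics-400 channel statistics quoted in this section.
KINETICS_MEAN = (0.43216, 0.394666, 0.37645)
KINETICS_STD = (0.22803, 0.22145, 0.216989)


def normalize_pixel(rgb, mean=KINETICS_MEAN, std=KINETICS_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel.

    rgb is expected to be already scaled to [0, 1], as after ToTensorVideo().
    """
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(KINETICS_MEAN))  # (0.0, 0.0, 0.0)
```

Because the input must already be in [0, 1], NormalizeVideo() belongs after ToTensorVideo() in a transform pipeline.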
RandomHorizontalFlipVideo()
Flip the video clip along the horizontal direction with a given probability.
There is no vertical flip for video, which is understandable.
Args:
p (float): probability of the clip being flipped. Default value is 0.5
CenterCropVideo()
Args:
clip (torch.tensor): Video clip to be cropped. Size is (C, T, H, W)
crop_size: int / tuple
Returns:
torch.tensor: central cropping of video clip. Size is (C, T, crop_size, crop_size)
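To make the geometry concrete, the crop window for a center crop can be computed by hand. This is a sketch under the assumption that the window is centered by rounding half of the leftover margin on each axis; the helper name `center_crop_box` is invented for illustration and does not exist in torchvision.

```python
def center_crop_box(height, width, crop_size):
    """Return (top, left, crop_height, crop_width) of a centered crop window.

    crop_size may be an int (square crop) or a (height, width) tuple.
    """
    if isinstance(crop_size, int):
        crop_size = (crop_size, crop_size)
    th, tw = crop_size
    if th > height or tw > width:
        raise ValueError("crop size must not exceed the frame size")
    top = int(round((height - th) / 2.0))
    left = int(round((width - tw) / 2.0))
    return top, left, th, tw


# For the 240 x 320 frames of the sample video and a 112 x 112 crop:
print(center_crop_box(240, 320, 112))  # (64, 104, 112, 112)
```

The same window is applied to every frame, so only the last two dimensions (H, W) of the (C, T, H, W) clip change.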
RandomCropVideo()
Args:
clip (torch.tensor): Video clip to be cropped. Size is (C, T, H, W)
size: int / tuple
Returns:
torch.tensor: randomly cropped/resized video clip.
RandomResizedCropVideo()
Args:
clip (torch.tensor): Video clip to be cropped. Size is (C, T, H, W)
scale: default (0.08, 1.0)
ratio: default (3.0 / 4.0, 4.0 / 3.0)
interpolation_mode: default "bilinear"
Returns:
torch.tensor: randomly cropped/resized video clip.
Example
import torchvision.transforms as transform
import torchvision.transforms._transforms_video as v_transform
import torchvision.io as io

vframes, aframes, info = io.read_video(
    filename='path/v_ApplyEyeMakeup_g01_c01.avi',
    pts_unit='pts',
)
trans = transform.Compose([
    v_transform.ToTensorVideo(),
    v_transform.RandomHorizontalFlipVideo(),
    v_transform.RandomResizedCropVideo(112),
])
print(vframes.shape)
print(trans(vframes))
print(trans(vframes).shape)

# output:
# shape of the original video clip tensor: torch.Size([164, 240, 320, 3])
# shape of the video clip tensor after the transforms: torch.Size([3, 164, 112, 112])
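The shape change from [164, 240, 320, 3] to [3, 164, 112, 112] follows directly from the dimension conventions described above. The snippet below only does shape bookkeeping on tuples, as a sketch; the helper names are hypothetical and no tensors or torchvision calls are involved.

```python
def to_tensor_video_shape(shape):
    """(T, H, W, C) -> (C, T, H, W), the layout change made by ToTensorVideo()."""
    t, h, w, c = shape
    return (c, t, h, w)


def crop_shape(shape, size):
    """Shape after any crop to `size` applied to a (C, T, H, W) clip."""
    c, t, _, _ = shape
    if isinstance(size, int):
        size = (size, size)
    return (c, t, size[0], size[1])


shape = (164, 240, 320, 3)            # clip as returned by read_video
shape = to_tensor_video_shape(shape)  # (3, 164, 240, 320)
# A horizontal flip leaves the shape unchanged, so there is no step for it here.
shape = crop_shape(shape, 112)
print(shape)  # (3, 164, 112, 112)
```

Only the spatial dimensions shrink; the channel and time dimensions pass through the crop untouched.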
Video Classification Models
Official documentation: https://pytorch.org/docs/stable/torchvision/models.html#video-classification
Source: https://pytorch.org/docs/stable/_modules/torchvision/models/video/resnet.html
Models:
ResNet 3D 18
ResNet MC 18
ResNet (2+1)D
I have not worked with these models in detail; the documentation helpfully links the corresponding paper: https://arxiv.org/abs/1711.11248.
Parameters
pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
progress (bool) – If True, displays a progress bar of the download to stderr
Returns
Network
Example
import torchvision.models.video as v_model

model = v_model.r3d_18(pretrained=True)
print(model)