PaddlePaddle Paper Reproduction Camp — Study Notes on 3D Residual Networks for Action Recognition

1 Background

1.1 C3D

C3D extracts video features with 3D convolutions, capturing spatio-temporal features along the horizontal (X), vertical (Y), and temporal (T) dimensions simultaneously; the features it extracts are more natural for video than those of conventional 2D convolutions [1]. Its drawback is that the extra temporal (T) dimension greatly increases the number of parameters, which places heavy demands on GPU compute.
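To see where the extra parameters come from, here is a toy comparison of a single 2D and 3D convolutional layer in PyTorch (an illustration of my own, not from the paper):

import torch.nn as nn

# 3 x 3 kernel over (X, Y) vs. 3 x 3 x 3 kernel over (T, X, Y), 64 -> 64 channels
conv2d = nn.Conv2d(64, 64, kernel_size=3)
conv3d = nn.Conv3d(64, 64, kernel_size=3)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv2d))   # 36928  = 64*64*3*3 + 64 (weights plus bias)
print(count(conv3d))   # 110656 = 64*64*3*3*3 + 64, three times the weights of the 2D layer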

P.S.: The C3D source code and pre-trained models are available at: http://vlg.cs.dartmouth.edu/c3d/


From the figure above, it is easy to see that, compared with VGG16 or VGG19, C3D has relatively few convolutional layers, only 8. Moreover, because 3D kernels carry far more parameters than 2D kernels, the network tends to overfit when the dataset is not large enough.

1.2 The Kinetics Dataset

On May 22, 2017, the DeepMind team released one of the most influential video classification datasets. It currently contains roughly 650,000 high-quality video links covering 700 human action classes, including human-object interactions (such as playing a musical instrument) and human-human interactions (such as shaking hands and hugging). Each action class has at least 600 video clips. Each clip is manually annotated with a single action class and lasts about 10 seconds.
P.S.: Kinetics homepage: https://deepmind.com/research/open-source/kinetics

1.3 ResNet

Deeper neural networks are harder to train. In "Deep Residual Learning for Image Recognition", Kaiming He and colleagues at Microsoft Research therefore proposed a residual learning framework (ResNet) that eases the training of networks substantially deeper than those used before. The layers are explicitly reformulated to learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. They provided comprehensive empirical evidence that these residual networks are easier to optimize and gain accuracy from considerably increased depth. On ImageNet they evaluated residual networks up to 152 layers deep, 8 times deeper than VGG19, yet with lower complexity. An ensemble of these residual networks achieved a 3.57% error rate on the ImageNet test set, winning first place in the ILSVRC 2015 classification task [2]. The framework is shown in the figure below:

The figure below shows ResNet's classification error on the CIFAR-10 test set.

2 Methods

2.1 Overview of 3D ResNets

At the time of this work, the latest large-scale video datasets (such as Kinetics) could greatly reduce overfitting, but C3D remained shallow compared with strong 2D networks such as ResNet. The authors therefore proposed the 3D ResNets architecture, built on ResNet.

2.2 3D ResNet Architecture

2.2.1 Residual Block


ResNets introduce shortcut connections that bypass one or more layers, passing a signal from the top of a block directly to its tail. These connections let gradients flow from later layers back to earlier layers and ease the training of very deep networks. Figure 1 shows the structure of the residual block, the basic element of ResNets: the shortcut bypasses the stacked layers from the top of the block to its tail. A ResNet is composed of many such residual blocks.
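In formula form (the standard residual mapping from the ResNet paper, which the block in Figure 1 implements), a residual block computes $y = \mathcal{F}(x, \{W_i\}) + x$: the stacked layers learn the residual $\mathcal{F}$, and the identity shortcut adds the block input $x$ back to their output.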

2.2.2 Network Architecture


From the 3D ResNets architecture in Table 1, it is easy to see that 3D ResNets differ from the original ResNets in the dimensionality of the convolutional kernels and pooling: 3D ResNets perform 3D convolution and 3D pooling. The kernel size is 3 × 3 × 3, and the temporal stride of conv1 is 1, as in C3D. The network takes 16-frame RGB clips as input, so the input size is 3 × 16 × 112 × 112. The residual blocks are shown in brackets in Table 1. Each convolutional layer is followed by Batch Normalization (BN, which normalizes each mini-batch) and a ReLU activation. Down-sampling of the inputs is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2. When the number of feature maps increases, the authors use identity shortcuts with zero padding to avoid increasing the number of parameters. The last layer is a fully connected layer configured for the Kinetics dataset (400 classes), followed by a softmax that maps the outputs to class probabilities between 0 and 1.
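As an illustration of the block just described, here is a minimal sketch of a 3D basic residual block (3 × 3 × 3 convolutions, BN + ReLU, and a zero-padded identity shortcut when the number of feature maps grows). The class name and the pooling-based down-sampling of the shortcut are my own simplifications; the authors' implementation lives in models/resnet.py of the repository referenced in Section 5.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock3D(nn.Module):
    """Minimal 3D basic residual block: two 3x3x3 convs, BN + ReLU, identity shortcut."""

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(planes)
        self.conv2 = nn.Conv3d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(planes)
        self.stride = stride

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        shortcut = x
        if self.stride != 1:                     # down-sample the identity to match out
            shortcut = F.avg_pool3d(shortcut, kernel_size=1, stride=self.stride)
        if shortcut.size(1) != out.size(1):      # zero-pad the extra channels (type-A shortcut)
            pad = out.size(1) - shortcut.size(1)
            shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, 0, 0, pad))
        return F.relu(out + shortcut)

# Example: one block applied to a 16-frame, 112x112 RGB clip (batch of 2)
x = torch.randn(2, 3, 16, 112, 112)
block = BasicBlock3D(3, 64, stride=1)
print(block(x).shape)   # torch.Size([2, 64, 16, 112, 112])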

2.2.3 Training

The authors train the 3D ResNet with stochastic gradient descent (SGD) with momentum, and randomly generate training samples from the videos in the training data to perform data augmentation. The main steps are as follows:

  1. Select the temporal position of each sample by uniform sampling.
  2. Generate a 16-frame clip around the selected temporal position. If the video is shorter than 16 frames, loop it as many times as necessary.
  3. Randomly select a spatial position from the 4 corners or the center.
  4. Perform multi-scale cropping of each sample, with the scale chosen from $\left\{1, \frac{1}{2^{1/4}}, \frac{1}{\sqrt{2}}, \frac{1}{2^{3/4}}, \frac{1}{2}\right\}$, where 1 is the maximum scale. The aspect ratio of the crop is 1, and each generated sample is flipped horizontally with 50% probability (see the sketch after this list).
  5. Subtract the mean from each sample. All generated samples keep the same class label as their original video.
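For reference, the five scales above form a geometric series with ratio $2^{-1/4}$, which is exactly how the MultiScaleCornerCrop scales are built in get_train_utils of main.py (Section 5.2):

scales = [1.0]
scale_step = 1 / (2 ** (1 / 4))         # ratio between consecutive scales
for _ in range(4):
    scales.append(scales[-1] * scale_step)
print([round(s, 4) for s in scales])    # [1.0, 0.8409, 0.7071, 0.5946, 0.5]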

During training, the learning rate (lr) starts at 0.1 and is divided by 10 each time the validation loss saturates, down to 0.0001. A large learning rate and batch size are particularly important for achieving good recognition performance.
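A minimal sketch of this schedule with SGD plus ReduceLROnPlateau, mirroring the 'plateau' scheduler branch in main.py (the momentum, weight decay, and patience values here are illustrative; the real ones come from the command-line options in opts.py):

import torch
from torch.optim import SGD, lr_scheduler

model = torch.nn.Linear(10, 2)          # stand-in for the 3D ResNet
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.1, patience=10)

for epoch in range(1, 31):
    val_loss = 1.0                      # placeholder for the real validation loss
    scheduler.step(val_loss)            # lr is divided by 10 once the loss stops improving
print(optimizer.param_groups[0]['lr'])  # reduced from the initial 0.1 after the plateau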

2.2.4 Recognition

The trained model is then used to recognize actions in videos. Each video is split into non-overlapping 16-frame clips, and each clip is cropped around its center position at the maximum scale. The trained model estimates class probabilities for each clip, and these probabilities are averaged over all clips of a video to recognize the action it contains.
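A minimal sketch of this clip-level averaging (model and clips are placeholders; in the repository this step is handled by inference.py):

import torch

def predict_video(model, clips):
    """Average softmax probabilities over all 16-frame clips of one video."""
    model.eval()
    with torch.no_grad():
        probs = [torch.softmax(model(clip.unsqueeze(0)), dim=1) for clip in clips]
    mean_probs = torch.cat(probs).mean(dim=0)   # average over the video's clips
    return mean_probs.argmax().item()           # index of the recognized action class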

2.2.5 Datasets

In the experiments, the authors used the ActivityNet (v1.3) and Kinetics datasets.

ActivityNet provides samples for 200 human action classes, with an average of 137 untrimmed videos per class and 1.41 activity instances per video. The total video length is 849 hours, and the total number of activity instances is 28,108. The dataset is randomly split into three subsets: 50% for training, 25% for validation, and 25% for testing.

When Kinetics was first released in 2017, it provided samples for 400 human action classes, each with 400 or more videos. The videos are temporally trimmed so that they contain no non-action frames and last about 10 seconds. There are 300,000 or more videos in total; the training, validation, and test sets contain roughly 240,000, 20,000, and 40,000 videos respectively. Kinetics has about 10 times as many activity instances as ActivityNet, while the total video length of the two datasets is similar.

For both datasets, the authors resized the videos to a height of 360 pixels without changing the aspect ratio, and stored them in that form.

3 Results

3.1 Preliminary Results on ActivityNet


The purpose of this experiment is to explore how well 3D ResNets train on a relatively small dataset. The authors trained the 18-layer 3D ResNet described in Table 1 and a Sports-1M pre-trained C3D. As Figure 2 shows, the 18-layer 3D ResNet overfits, so its validation accuracy is clearly lower than its training accuracy. In contrast, the Sports-1M pre-trained C3D does not overfit and achieves better recognition accuracy.

3.2 Results on Kinetics


In this experiment, the authors trained a 34-layer 3D ResNet instead of the 18-layer one, because Kinetics has far more activity instances than ActivityNet. Figure 3 shows that the 34-layer 3D ResNet does not overfit and achieves good performance. As Figure 3(b) shows, the Sports-1M pre-trained C3D also achieves good validation accuracy, although its training accuracy is clearly lower than its validation accuracy.


Table 2 compares the accuracy of the 34-layer 3D ResNet with contemporary state-of-the-art methods. The 34-layer 3D ResNet is more accurate than both the Sports-1M pre-trained C3D and C3D trained from scratch with Batch Normalization, which demonstrates the effectiveness of 3D ResNets. However, RGB-I3D, which is not deeper than the 34-layer 3D ResNet, achieved the best performance in this comparison. One likely reason is compute: RGB-I3D was trained with 32 GPUs, whereas the 34-layer 3D ResNet was trained with only 4 GPUs and a batch size of 256. Because of GPU memory limits, the 3D ResNet input size is 3 × 16 × 112 × 112, while RGB-I3D uses 3 × 64 × 224 × 224; higher spatial resolution and longer temporal duration improve recognition accuracy. Training with more GPUs and larger batch sizes, spatial resolutions, and temporal durations would therefore likely improve 3D ResNets further.

4 Conclusion

The authors applied 3D convolutional kernels and 3D pooling to the ResNet architecture and, through a series of experiments, demonstrated the effectiveness of ResNets for video classification, especially when trained on large datasets [3].

5 Source Code Walkthrough

Reference source code: https://github.com/kenshohara/3D-ResNets

5.1 training.py

import torch  # when reproducing with PaddlePaddle, replace this with the corresponding paddle package
import time
import os
import sys

import torch.distributed as dist  # when reproducing with PaddlePaddle, replace with the paddle equivalent

from utils import AverageMeter, calculate_accuracy


def train_epoch(epoch,          # current training epoch
                data_loader,
                model,
                criterion,
                optimizer,
                device,
                current_lr,
                epoch_logger,
                batch_logger,
                tb_writer=None,
                distributed=False):
    print('train at epoch {}'.format(epoch))

    model.train()

    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    accuracies = AverageMeter()

    end_time = time.time()
    for i, (inputs, targets) in enumerate(data_loader):
        data_time.update(time.time() - end_time)

        targets = targets.to(device, non_blocking=True)
        outputs = model(inputs)
        loss = criterion(outputs, targets)          # compute the loss
        acc = calculate_accuracy(outputs, targets)  # compute the accuracy

        losses.update(loss.item(), inputs.size(0))  # update the running loss
        accuracies.update(acc, inputs.size(0))      # update the running accuracy

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        batch_time.update(time.time() - end_time)
        end_time = time.time()

        if batch_logger is not None:
            batch_logger.log({
                'epoch': epoch,
                'batch': i + 1,
                'iter': (epoch - 1) * len(data_loader) + (i + 1),
                'loss': losses.val,
                'acc': accuracies.val,
                'lr': current_lr
            })

        print('Epoch: [{0}][{1}/{2}]\t'             # print the running log
              'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
              'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
              'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
              'Acc {acc.val:.3f} ({acc.avg:.3f})'.format(epoch,
                                                         i + 1,
                                                         len(data_loader),
                                                         batch_time=batch_time,
                                                         data_time=data_time,
                                                         loss=losses,
                                                         acc=accuracies))

    if distributed:
        loss_sum = torch.tensor([losses.sum], dtype=torch.float32, device=device)
        loss_count = torch.tensor([losses.count], dtype=torch.float32, device=device)
        acc_sum = torch.tensor([accuracies.sum], dtype=torch.float32, device=device)
        acc_count = torch.tensor([accuracies.count], dtype=torch.float32, device=device)

        dist.all_reduce(loss_sum, op=dist.ReduceOp.SUM)
        dist.all_reduce(loss_count, op=dist.ReduceOp.SUM)
        dist.all_reduce(acc_sum, op=dist.ReduceOp.SUM)
        dist.all_reduce(acc_count, op=dist.ReduceOp.SUM)

        losses.avg = loss_sum.item() / loss_count.item()
        accuracies.avg = acc_sum.item() / acc_count.item()

    if epoch_logger is not None:
        epoch_logger.log({
            'epoch': epoch,
            'loss': losses.avg,
            'acc': accuracies.avg,
            'lr': current_lr
        })

    if tb_writer is not None:
        tb_writer.add_scalar('train/loss', losses.avg, epoch)
        tb_writer.add_scalar('train/acc', accuracies.avg, epoch)
        tb_writer.add_scalar('train/lr', current_lr, epoch)
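The AverageMeter imported from utils is not listed in this note. Judging from how it is used above (update(val, n), .val, .avg, .sum, .count), it is presumably the standard running-average helper, roughly like this (an assumed sketch, not the repository file itself):

class AverageMeter(object):
    """Tracks the most recent value and a running average (assumed utils.AverageMeter)."""

    def __init__(self):
        self.val = 0
        self.sum = 0
        self.count = 0
        self.avg = 0

    def update(self, val, n=1):
        self.val = val              # most recent value
        self.sum += val * n         # weighted sum over all updates
        self.count += n             # number of samples seen
        self.avg = self.sum / self.count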

5.2 main.py

from pathlib import Path
import json
import random
import os

import numpy as np
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD, lr_scheduler
import torch.multiprocessing as mp
import torch.distributed as dist
from torch.backends import cudnn
import torchvision

from opts import parse_opts
from model import (generate_model, load_pretrained_model, make_data_parallel,
                   get_fine_tuning_parameters)
from mean import get_mean_std
from spatial_transforms import (Compose, Normalize, Resize, CenterCrop,
                                CornerCrop, MultiScaleCornerCrop,
                                RandomResizedCrop, RandomHorizontalFlip,
                                ToTensor, ScaleValue, ColorJitter,
                                PickFirstChannels)
from temporal_transforms import (LoopPadding, TemporalRandomCrop,
                                 TemporalCenterCrop, TemporalEvenCrop,
                                 SlidingWindow, TemporalSubsampling)
from temporal_transforms import Compose as TemporalCompose
from dataset import get_training_data, get_validation_data, get_inference_data
from utils import Logger, worker_init_fn, get_lr
from training import train_epoch
from validation import val_epoch
import inference


def json_serial(obj):
    if isinstance(obj, Path):
        return str(obj)


def get_opt():
    opt = parse_opts()

    if opt.root_path is not None:
        opt.video_path = opt.root_path / opt.video_path
        opt.annotation_path = opt.root_path / opt.annotation_path
        opt.result_path = opt.root_path / opt.result_path
        if opt.resume_path is not None:
            opt.resume_path = opt.root_path / opt.resume_path
        if opt.pretrain_path is not None:
            opt.pretrain_path = opt.root_path / opt.pretrain_path

    if opt.pretrain_path is not None:
        opt.n_finetune_classes = opt.n_classes
        opt.n_classes = opt.n_pretrain_classes

    if opt.output_topk <= 0:
        opt.output_topk = opt.n_classes

    if opt.inference_batch_size == 0:
        opt.inference_batch_size = opt.batch_size

    opt.arch = '{}-{}'.format(opt.model, opt.model_depth)
    opt.begin_epoch = 1
    opt.mean, opt.std = get_mean_std(opt.value_scale, dataset=opt.mean_dataset)
    opt.n_input_channels = 3
    if opt.input_type == 'flow':
        opt.n_input_channels = 2
        opt.mean = opt.mean[:2]
        opt.std = opt.std[:2]

    if opt.distributed:
        opt.dist_rank = int(os.environ["OMPI_COMM_WORLD_RANK"])

        if opt.dist_rank == 0:
            print(opt)
            with (opt.result_path / 'opts.json').open('w') as opt_file:
                json.dump(vars(opt), opt_file, default=json_serial)
    else:
        print(opt)
        with (opt.result_path / 'opts.json').open('w') as opt_file:
            json.dump(vars(opt), opt_file, default=json_serial)

    return opt


# Resume a previously saved model checkpoint
def resume_model(resume_path, arch, model):
    print('loading checkpoint {} model'.format(resume_path))
    checkpoint = torch.load(resume_path, map_location='cpu')
    assert arch == checkpoint['arch']

    if hasattr(model, 'module'):
        model.module.load_state_dict(checkpoint['state_dict'])
    else:
        model.load_state_dict(checkpoint['state_dict'])

    return model


def resume_train_utils(resume_path, begin_epoch, optimizer, scheduler):
    print('loading checkpoint {} train utils'.format(resume_path))
    checkpoint = torch.load(resume_path, map_location='cpu')

    begin_epoch = checkpoint['epoch'] + 1
    if optimizer is not None and 'optimizer' in checkpoint:
        optimizer.load_state_dict(checkpoint['optimizer'])
    if scheduler is not None and 'scheduler' in checkpoint:
        scheduler.load_state_dict(checkpoint['scheduler'])

    return begin_epoch, optimizer, scheduler


# Build the normalization (standardization) method for the input data
def get_normalize_method(mean, std, no_mean_norm, no_std_norm):
    if no_mean_norm:
        if no_std_norm:
            return Normalize([0, 0, 0], [1, 1, 1])
        else:
            return Normalize([0, 0, 0], std)
    else:
        if no_std_norm:
            return Normalize(mean, [1, 1, 1])
        else:
            return Normalize(mean, std)


def get_train_utils(opt, model_parameters):
    assert opt.train_crop in ['random', 'corner', 'center']
    spatial_transform = []
    if opt.train_crop == 'random':
        spatial_transform.append(
            RandomResizedCrop(
                opt.sample_size, (opt.train_crop_min_scale, 1.0),
                (opt.train_crop_min_ratio, 1.0 / opt.train_crop_min_ratio)))
    elif opt.train_crop == 'corner':
        scales = [1.0]
        scale_step = 1 / (2**(1 / 4))
        for _ in range(1, 5):
            scales.append(scales[-1] * scale_step)
        spatial_transform.append(MultiScaleCornerCrop(opt.sample_size, scales))
    elif opt.train_crop == 'center':
        spatial_transform.append(Resize(opt.sample_size))
        spatial_transform.append(CenterCrop(opt.sample_size))
    normalize = get_normalize_method(opt.mean, opt.std, opt.no_mean_norm,
                                     opt.no_std_norm)
    if not opt.no_hflip:
        spatial_transform.append(RandomHorizontalFlip())
    if opt.colorjitter:
        spatial_transform.append(ColorJitter())
    spatial_transform.append(ToTensor())
    if opt.input_type == 'flow':
        spatial_transform.append(PickFirstChannels(n=2))
    spatial_transform.append(ScaleValue(opt.value_scale))
    spatial_transform.append(normalize)
    spatial_transform = Compose(spatial_transform)

    assert opt.train_t_crop in ['random', 'center']
    temporal_transform = []
    if opt.sample_t_stride > 1:
        temporal_transform.append(TemporalSubsampling(opt.sample_t_stride))
    if opt.train_t_crop == 'random':
        temporal_transform.append(TemporalRandomCrop(opt.sample_duration))
    elif opt.train_t_crop == 'center':
        temporal_transform.append(TemporalCenterCrop(opt.sample_duration))
    temporal_transform = TemporalCompose(temporal_transform)

    train_data = get_training_data(opt.video_path, opt.annotation_path,
                                   opt.dataset, opt.input_type, opt.file_type,
                                   spatial_transform, temporal_transform)
    if opt.distributed:
        train_sampler = torch.utils.data.distributed.DistributedSampler(
            train_data)
    else:
        train_sampler = None
    train_loader = torch.utils.data.DataLoader(train_data,
                                               batch_size=opt.batch_size,
                                               shuffle=(train_sampler is None),
                                               num_workers=opt.n_threads,
                                               pin_memory=True,
                                               sampler=train_sampler,
                                               worker_init_fn=worker_init_fn)

    if opt.is_master_node:
        train_logger = Logger(opt.result_path / 'train.log',
                              ['epoch', 'loss', 'acc', 'lr'])
        train_batch_logger = Logger(
            opt.result_path / 'train_batch.log',
            ['epoch', 'batch', 'iter', 'loss', 'acc', 'lr'])
    else:
        train_logger = None
        train_batch_logger = None

    if opt.nesterov:
        dampening = 0
    else:
        dampening = opt.dampening
    optimizer = SGD(model_parameters,
                    lr=opt.learning_rate,
                    momentum=opt.momentum,
                    dampening=dampening,
                    weight_decay=opt.weight_decay,
                    nesterov=opt.nesterov)

    assert opt.lr_scheduler in ['plateau', 'multistep']
    assert not (opt.lr_scheduler == 'plateau' and opt.no_val)
    if opt.lr_scheduler == 'plateau':
        scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer, 'min', patience=opt.plateau_patience)
    else:
        scheduler = lr_scheduler.MultiStepLR(optimizer,
                                             opt.multistep_milestones)

    return (train_loader, train_sampler, train_logger, train_batch_logger,
            optimizer, scheduler)


def get_val_utils(opt):
    normalize = get_normalize_method(opt.mean, opt.std, opt.no_mean_norm,
                                     opt.no_std_norm)
    spatial_transform = [
        Resize(opt.sample_size),
        CenterCrop(opt.sample_size),
        ToTensor()
    ]
    if opt.input_type == 'flow':
        spatial_transform.append(PickFirstChannels(n=2))
    spatial_transform.extend([ScaleValue(opt.value_scale), normalize])
    spatial_transform = Compose(spatial_transform)

    temporal_transform = []
    if opt.sample_t_stride > 1:
        temporal_transform.append(TemporalSubsampling(opt.sample_t_stride))
    temporal_transform.append(
        TemporalEvenCrop(opt.sample_duration, opt.n_val_samples))
    temporal_transform = TemporalCompose(temporal_transform)

    val_data, collate_fn = get_validation_data(opt.video_path,
                                               opt.annotation_path, opt.dataset,
                                               opt.input_type, opt.file_type,
                                               spatial_transform,
                                               temporal_transform)
    if opt.distributed:
        val_sampler = torch.utils.data.distributed.DistributedSampler(
            val_data, shuffle=False)
    else:
        val_sampler = None
    val_loader = torch.utils.data.DataLoader(val_data,
                                             batch_size=(opt.batch_size //
                                                         opt.n_val_samples),
                                             shuffle=False,
                                             num_workers=opt.n_threads,
                                             pin_memory=True,
                                             sampler=val_sampler,
                                             worker_init_fn=worker_init_fn,
                                             collate_fn=collate_fn)

    if opt.is_master_node:
        val_logger = Logger(opt.result_path / 'val.log',
                            ['epoch', 'loss', 'acc'])
    else:
        val_logger = None

    return val_loader, val_logger


def get_inference_utils(opt):
    assert opt.inference_crop in ['center', 'nocrop']

    normalize = get_normalize_method(opt.mean, opt.std, opt.no_mean_norm,
                                     opt.no_std_norm)

    spatial_transform = [Resize(opt.sample_size)]
    if opt.inference_crop == 'center':
        spatial_transform.append(CenterCrop(opt.sample_size))
    spatial_transform.append(ToTensor())
    if opt.input_type == 'flow':
        spatial_transform.append(PickFirstChannels(n=2))
    spatial_transform.extend([ScaleValue(opt.value_scale), normalize])
    spatial_transform = Compose(spatial_transform)

    temporal_transform = []
    if opt.sample_t_stride > 1:
        temporal_transform.append(TemporalSubsampling(opt.sample_t_stride))
    temporal_transform.append(
        SlidingWindow(opt.sample_duration, opt.inference_stride))
    temporal_transform = TemporalCompose(temporal_transform)

    inference_data, collate_fn = get_inference_data(
        opt.video_path, opt.annotation_path, opt.dataset, opt.input_type,
        opt.file_type, opt.inference_subset, spatial_transform,
        temporal_transform)

    inference_loader = torch.utils.data.DataLoader(
        inference_data,
        batch_size=opt.inference_batch_size,
        shuffle=False,
        num_workers=opt.n_threads,
        pin_memory=True,
        worker_init_fn=worker_init_fn,
        collate_fn=collate_fn)

    return inference_loader, inference_data.class_names


def save_checkpoint(save_file_path, epoch, arch, model, optimizer, scheduler):
    if hasattr(model, 'module'):
        model_state_dict = model.module.state_dict()
    else:
        model_state_dict = model.state_dict()
    save_states = {
        'epoch': epoch,
        'arch': arch,
        'state_dict': model_state_dict,
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict()
    }
    torch.save(save_states, save_file_path)


def main_worker(index, opt):
    random.seed(opt.manual_seed)
    np.random.seed(opt.manual_seed)
    torch.manual_seed(opt.manual_seed)

    if index >= 0 and opt.device.type == 'cuda':
        opt.device = torch.device(f'cuda:{index}')

    if opt.distributed:
        opt.dist_rank = opt.dist_rank * opt.ngpus_per_node + index
        dist.init_process_group(backend='nccl',
                                init_method=opt.dist_url,
                                world_size=opt.world_size,
                                rank=opt.dist_rank)
        opt.batch_size = int(opt.batch_size / opt.ngpus_per_node)
        opt.n_threads = int(
            (opt.n_threads + opt.ngpus_per_node - 1) / opt.ngpus_per_node)
    opt.is_master_node = not opt.distributed or opt.dist_rank == 0

    model = generate_model(opt)  # build the model
    if opt.batchnorm_sync:
        assert opt.distributed, 'SyncBatchNorm only supports DistributedDataParallel.'
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    if opt.pretrain_path:
        model = load_pretrained_model(model, opt.pretrain_path, opt.model,
                                      opt.n_finetune_classes)
    if opt.resume_path is not None:
        model = resume_model(opt.resume_path, opt.arch, model)
    model = make_data_parallel(model, opt.distributed, opt.device)

    if opt.pretrain_path:
        parameters = get_fine_tuning_parameters(model, opt.ft_begin_module)
    else:
        parameters = model.parameters()

    if opt.is_master_node:
        print(model)

    criterion = CrossEntropyLoss().to(opt.device)

    if not opt.no_train:
        (train_loader, train_sampler, train_logger, train_batch_logger,
         optimizer, scheduler) = get_train_utils(opt, parameters)
        if opt.resume_path is not None:
            opt.begin_epoch, optimizer, scheduler = resume_train_utils(
                opt.resume_path, opt.begin_epoch, optimizer, scheduler)
            if opt.overwrite_milestones:
                scheduler.milestones = opt.multistep_milestones
    if not opt.no_val:
        val_loader, val_logger = get_val_utils(opt)

    if opt.tensorboard and opt.is_master_node:
        from torch.utils.tensorboard import SummaryWriter
        if opt.begin_epoch == 1:
            tb_writer = SummaryWriter(log_dir=opt.result_path)
        else:
            tb_writer = SummaryWriter(log_dir=opt.result_path,
                                      purge_step=opt.begin_epoch)
    else:
        tb_writer = None

    prev_val_loss = None
    for i in range(opt.begin_epoch, opt.n_epochs + 1):
        if not opt.no_train:
            if opt.distributed:
                train_sampler.set_epoch(i)
            current_lr = get_lr(optimizer)
            train_epoch(i, train_loader, model, criterion, optimizer,
                        opt.device, current_lr, train_logger,
                        train_batch_logger, tb_writer, opt.distributed)

            if i % opt.checkpoint == 0 and opt.is_master_node:
                save_file_path = opt.result_path / 'save_{}.pth'.format(i)
                save_checkpoint(save_file_path, i, opt.arch, model, optimizer,
                                scheduler)

        if not opt.no_val:
            prev_val_loss = val_epoch(i, val_loader, model, criterion,
                                      opt.device, val_logger, tb_writer,
                                      opt.distributed)

        if not opt.no_train and opt.lr_scheduler == 'multistep':
            scheduler.step()
        elif not opt.no_train and opt.lr_scheduler == 'plateau':
            scheduler.step(prev_val_loss)

    if opt.inference:
        inference_loader, inference_class_names = get_inference_utils(opt)
        inference_result_path = opt.result_path / '{}.json'.format(
            opt.inference_subset)

        inference.inference(inference_loader, model, inference_result_path,
                            inference_class_names, opt.inference_no_average,
                            opt.output_topk)


if __name__ == '__main__':
    opt = get_opt()

    opt.device = torch.device('cpu' if opt.no_cuda else 'cuda')
    if not opt.no_cuda:
        cudnn.benchmark = True
    if opt.accimage:
        torchvision.set_image_backend('accimage')

    opt.ngpus_per_node = torch.cuda.device_count()
    if opt.distributed:
        opt.world_size = opt.ngpus_per_node * opt.world_size
        mp.spawn(main_worker, nprocs=opt.ngpus_per_node, args=(opt,))
    else:
        main_worker(-1, opt)
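Since the goal of the camp is to reproduce this code with PaddlePaddle, the imports flagged in the comments above are the first things to swap. A rough starting point for the mapping, using PaddlePaddle APIs I believe correspond (treat this as an assumption to verify against the paddle docs, not an official conversion table):

import paddle

# torch.nn.CrossEntropyLoss       -> paddle.nn.CrossEntropyLoss
criterion = paddle.nn.CrossEntropyLoss()

# torch.optim.SGD with momentum   -> paddle.optimizer.Momentum
model = paddle.nn.Linear(10, 2)   # stand-in for the 3D ResNet
optimizer = paddle.optimizer.Momentum(learning_rate=0.1,
                                      momentum=0.9,
                                      weight_decay=1e-3,
                                      parameters=model.parameters())

# torch.utils.data.DataLoader     -> paddle.io.DataLoader
# torch.nn.Conv3d / BatchNorm3d   -> paddle.nn.Conv3D / paddle.nn.BatchNorm3D
# optimizer.zero_grad()           -> optimizer.clear_grad()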

5.3 model.py

import torch
from torch import nn

# import the resnet, resnet2p1d, pre_act_resnet, wide_resnet, resnext and densenet networks from the models directory
from models import resnet, resnet2p1d, pre_act_resnet, wide_resnet, resnext, densenet


def get_module_name(name):
    name = name.split('.')
    if name[0] == 'module':
        i = 1
    else:
        i = 0
    if name[i] == 'features':
        i += 1

    return name[i]


def get_fine_tuning_parameters(model, ft_begin_module):
    if not ft_begin_module:
        return model.parameters()

    parameters = []
    add_flag = False
    for k, v in model.named_parameters():
        if ft_begin_module == get_module_name(k):
            add_flag = True

        if add_flag:
            parameters.append({'params': v})

    return parameters


# Build the model from one of the available network families
def generate_model(opt):
    assert opt.model in [
        'resnet', 'resnet2p1d', 'preresnet', 'wideresnet', 'resnext',
        'densenet'
    ]

    if opt.model == 'resnet':
        model = resnet.generate_model(model_depth=opt.model_depth,
                                      n_classes=opt.n_classes,
                                      n_input_channels=opt.n_input_channels,
                                      shortcut_type=opt.resnet_shortcut,
                                      conv1_t_size=opt.conv1_t_size,
                                      conv1_t_stride=opt.conv1_t_stride,
                                      no_max_pool=opt.no_max_pool,
                                      widen_factor=opt.resnet_widen_factor)
    elif opt.model == 'resnet2p1d':
        model = resnet2p1d.generate_model(model_depth=opt.model_depth,
                                          n_classes=opt.n_classes,
                                          n_input_channels=opt.n_input_channels,
                                          shortcut_type=opt.resnet_shortcut,
                                          conv1_t_size=opt.conv1_t_size,
                                          conv1_t_stride=opt.conv1_t_stride,
                                          no_max_pool=opt.no_max_pool,
                                          widen_factor=opt.resnet_widen_factor)
    elif opt.model == 'wideresnet':
        model = wide_resnet.generate_model(
            model_depth=opt.model_depth,
            k=opt.wide_resnet_k,
            n_classes=opt.n_classes,
            n_input_channels=opt.n_input_channels,
            shortcut_type=opt.resnet_shortcut,
            conv1_t_size=opt.conv1_t_size,
            conv1_t_stride=opt.conv1_t_stride,
            no_max_pool=opt.no_max_pool)
    elif opt.model == 'resnext':
        model = resnext.generate_model(
            model_depth=opt.model_depth,
            cardinality=opt.resnext_cardinality,
            n_classes=opt.n_classes,
            n_input_channels=opt.n_input_channels,
            shortcut_type=opt.resnet_shortcut,
            conv1_t_size=opt.conv1_t_size,
            conv1_t_stride=opt.conv1_t_stride,
            no_max_pool=opt.no_max_pool)
    elif opt.model == 'preresnet':
        model = pre_act_resnet.generate_model(
            model_depth=opt.model_depth,
            n_classes=opt.n_classes,
            n_input_channels=opt.n_input_channels,
            shortcut_type=opt.resnet_shortcut,
            conv1_t_size=opt.conv1_t_size,
            conv1_t_stride=opt.conv1_t_stride,
            no_max_pool=opt.no_max_pool)
    elif opt.model == 'densenet':
        model = densenet.generate_model(model_depth=opt.model_depth,
                                        n_classes=opt.n_classes,
                                        n_input_channels=opt.n_input_channels,
                                        conv1_t_size=opt.conv1_t_size,
                                        conv1_t_stride=opt.conv1_t_stride,
                                        no_max_pool=opt.no_max_pool)

    return model


# Load a pre-trained model and replace its classifier for fine-tuning
def load_pretrained_model(model, pretrain_path, model_name,
                          n_finetune_classes):
    if pretrain_path:
        print('loading pretrained model {}'.format(pretrain_path))
        pretrain = torch.load(pretrain_path, map_location='cpu')

        model.load_state_dict(pretrain['state_dict'])
        tmp_model = model
        if model_name == 'densenet':
            tmp_model.classifier = nn.Linear(tmp_model.classifier.in_features,
                                             n_finetune_classes)
        else:
            tmp_model.fc = nn.Linear(tmp_model.fc.in_features,
                                     n_finetune_classes)

    return model


def make_data_parallel(model, is_distributed, device):
    if is_distributed:
        if device.type == 'cuda' and device.index is not None:
            torch.cuda.set_device(device)
            model.to(device)

            model = nn.parallel.DistributedDataParallel(model,
                                                        device_ids=[device])
        else:
            model.to(device)
            model = nn.parallel.DistributedDataParallel(model)
    elif device.type == 'cuda':
        model = nn.DataParallel(model, device_ids=None).cuda()

    return model
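For a quick local check of generate_model, something along these lines should work; the option values below are illustrative stand-ins for what parse_opts in opts.py would normally provide:

from argparse import Namespace

from model import generate_model

# Illustrative options for the 34-layer 3D ResNet used in the Kinetics experiments
opt = Namespace(model='resnet',
                model_depth=34,
                n_classes=400,            # Kinetics has 400 classes
                n_input_channels=3,       # RGB clips
                resnet_shortcut='A',      # zero-padded identity shortcuts, as in Section 2.2.2
                conv1_t_size=7,
                conv1_t_stride=1,         # temporal stride of conv1 is 1
                no_max_pool=False,
                resnet_widen_factor=1.0)

model = generate_model(opt)
print(model)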

References:

[1]. Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., & Paluri, M. (2014). C3D: Generic Features for Video Analysis. ArXiv, abs/1412.0767.
[2]. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.
[3]. Hara, K., Kataoka, H., & Satoh, Y. (2017). Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 3154-3160.

P.S: Last but not least

First of all, thanks to Baidu PaddlePaddle for this opportunity: for inviting excellent instructors to teach us, free of charge, how to reproduce top-conference papers, and for providing free GPU compute for code optimization, hyperparameter tuning, and other hands-on work. This is my first time studying a top-conference paper and much of it is still exploration; if you understand the code in Section 5, or have your own thoughts about it, feel free to leave a comment.

If you are also interested in deep learning and want to learn how to read cutting-edge papers, you are welcome to join the Baidu paper reproduction camp!

Course link: https://aistudio.baidu.com/aistudio/education/group/info/1340
