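Faster R-CNN: Trying to Add an Attention-Mechanism Model
I've been in a slump lately. When did it start? I've forgotten myself. I'll keep adjusting positively and always believe I can do it~ Enough rambling.
I. Walking through the resnet50_fpn_backbone provided in the source code
1. The body of the backbone, i.e. the outputs extracted by the ResNet layers
The basic building block of ResNet is the residual structure, which comes in two variants (left and right in the figure). ResNet-50 uses the latter one, the bottleneck structure. The only real difference between ResNet-50, 101 and 152 is the number of bottlenecks in each layer group.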
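To make the "only the number of bottlenecks differs" point concrete, here is a small sketch that recovers the nominal depth from the per-group bottleneck counts. The counts for ResNet-101 and ResNet-152 are the standard ones from the ResNet paper, and the helper name `count_weighted_layers` is made up for this illustration.

```python
# Bottleneck counts per layer group (layer1..layer4) for the three deeper ResNets.
BLOCKS = {
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
    "resnet152": [3, 8, 36, 3],
}


def count_weighted_layers(blocks_num, convs_per_block=3):
    """Nominal depth: conv1 + the convs of every bottleneck + the final fc layer.

    Each bottleneck has three convs (1x1, 3x3, 1x1). The 1x1 convs on the
    shortcut (downsample) branch are conventionally not counted.
    """
    return 1 + convs_per_block * sum(blocks_num) + 1


for name, blocks in BLOCKS.items():
    print(name, blocks, count_weighted_layers(blocks))
```

The same [3, 4, 6, 3] list is what gets passed as blocks_num when the 50-layer backbone is built.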
import torch
import torch.nn as nn


class ResNet(nn.Module):

    def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.include_top = include_top
        self.in_channel = 64  # depth of the feature map obtained after max pooling

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)  # 224*224*3 -> 112*112*64
        self.bn1 = norm_layer(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # 56*56*64
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        # a 3*224*224 image becomes 2048*7*7 after layer4
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    # With Bottleneck, block_num comes from [3, 4, 6, 3]. `channel` is the output channel
    # count of the first conv in the group, `channel * block.expansion` is the output
    # channel count of the group, and `stride` says whether the group as a whole downsamples.
    def _make_layer(self, block, channel, block_num, stride=1):
        norm_layer = self._norm_layer  # batch norm
        downsample = None
        # self.in_channel is the number of input channels of the group,
        # channel * block.expansion is its number of output channels
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                norm_layer(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample,
                            stride=stride, norm_layer=norm_layer))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel, norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            # flatten multiplies together the dimensions from start_dim to end_dim (inclusive)
            # and leaves the others unchanged. With the defaults start_dim=0, end_dim=-1,
            # torch.flatten(t) returns 1-D data; start_dim=1 here keeps the batch dimension.
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x
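resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3],  # the 50-layer ResNet
                         include_top=False)
The source code shows the structure: [3, 4, 6, 3] selects the 50-layer variant. Unlike the earlier single-output backbone, here we want the outputs of several layers, so the return_layers dict holds several entries.
return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}  # tells the getter which layers' outputs to extract
The source then defines another class. The class itself is not important; the first part of it is this:
body = IntermediateLayerGetter(backbone, return_layers=return_layers)  # similar to PyTorch's own create_feature_extractor, but it can only address first-level submodules
It does much the same job as the create_feature_extractor function from the previous lesson. This gives us the outputs of the four feature layers of resnet50.
2. The FPN part of the backbone: after the ResNet layers produce their outputs, the features are fused to form the backbone's final output
As the figure shows, this amounts to 8 convolution layers in the initializer, plus 3 upsampling steps and one max pool in forward().
The FPN output is a dict just like the body output, only with one extra "pool" entry.
Of the FPN's eight convolutions, the four on the left take their input channel counts from in_channels_list and all output out_channels; the other four have out_channels as both input and output channel count.
3. Pass the resulting backbone into FasterRCNN as one of its arguments to create the Faster R-CNN model.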
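To see what the four extracted outputs look like, here is a sketch that traces the tensor shapes through the ResNet-50 body with plain integer arithmetic, without running the network. It assumes a 224*224 input and that the stride-2 step in each layer group behaves like a 3*3 conv with padding 1; the helper names are made up for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_feature_shapes(input_size=224, expansion=4):
    """Return {return_layers key: (channels, height, width)} for layer1..layer4."""
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # conv1: 224 -> 112
    size = conv_out(size, kernel=3, stride=2, padding=1)        # maxpool: 112 -> 56
    shapes = {}
    # (key in return_layers, `channel` passed to _make_layer, stride of the group)
    groups = [("0", 64, 1), ("1", 128, 2), ("2", 256, 2), ("3", 512, 2)]
    for key, channel, stride in groups:
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        shapes[key] = (channel * expansion, size, size)
    return shapes


for key, shape in resnet50_feature_shapes().items():
    print(key, shape)
```

The channel counts that come out of this (256, 512, 1024, 2048) are what the FPN needs as its in_channels_list.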
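The layout of the eight convolutions and the extra "pool" output can be sketched without any framework. The names inner_blocks and layer_blocks follow torchvision's FeaturePyramidNetwork and are assumed to match the source code discussed here; the 1*1 / 3*3 kernel sizes and the kernel-1, stride-2 max pool for "pool" are likewise taken from torchvision's implementation.

```python
from collections import OrderedDict


def fpn_conv_specs(in_channels_list, out_channels):
    """List (name, in_channels, out_channels, kernel_size) of the FPN convs."""
    specs = []
    for i, in_ch in enumerate(in_channels_list):
        # lateral 1x1 convs: bring every body output to out_channels
        specs.append((f"inner_blocks.{i}", in_ch, out_channels, 1))
    for i in range(len(in_channels_list)):
        # 3x3 convs applied after the top-down fusion
        specs.append((f"layer_blocks.{i}", out_channels, out_channels, 3))
    return specs


def fpn_output_shapes(body_shapes, out_channels):
    """body_shapes: OrderedDict key -> (C, H, W), highest resolution first."""
    out = OrderedDict(
        (key, (out_channels, h, w)) for key, (_, h, w) in body_shapes.items()
    )
    # "pool": max pool with kernel 1, stride 2 on the coarsest FPN output
    _, h, w = list(out.values())[-1]
    out["pool"] = (out_channels, (h - 1) // 2 + 1, (w - 1) // 2 + 1)
    return out


body = OrderedDict([("0", (256, 56, 56)), ("1", (512, 28, 28)),
                    ("2", (1024, 14, 14)), ("3", (2048, 7, 7))])
specs = fpn_conv_specs([c for c, _, _ in body.values()], out_channels=256)
outputs = fpn_output_shapes(body, out_channels=256)
num_upsamples = len(body) - 1  # the coarsest level is not upsampled into anything above it

for spec in specs:
    print(spec)
print(dict(outputs), "upsamples:", num_upsamples)
```

With four body outputs this gives 4 + 4 = 8 convolutions, 3 upsampling steps and 5 output entries, matching the counts in the text.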
model = FasterRCNN(backbone=backbone, num_classes=21)
II. Switching to MobileNet V3 + FPN (MBConv)
1. The MBConv module
2. Code, part one: truncating the backbone network
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

monile_v3_backbone = torchvision.models.mobilenet_v3_large()
return_layers = {"features.6": "0", "features.12": "1", "features.16": "2"}
monile_v3_backbone = create_feature_extractor(monile_v3_backbone, return_nodes=return_layers)
The InvertedResidualConfig module:
class InvertedResidualConfig:
    # Stores information listed at Tables 1 and 2 of the MobileNetV3 paper
    def __init__(self, input_channels: int, kernel: int, expanded_channels: int, out_channels: int, use_se: bool,
                 activation: str, stride: int, dilation: int, width_mult: float):
        self.input_channels = self.adjust_channels(input_channels, width_mult)
        self.kernel = kernel
        self.expanded_channels = self.adjust_channels(expanded_channels, width_mult)
        self.out_channels = self.adjust_channels(out_channels, width_mult)
        self.use_se = use_se
        self.use_hs = activation == "HS"
        self.stride = stride
        self.dilation = dilation

    @staticmethod
    def adjust_channels(channels: int, width_mult: float):
        return _make_divisible(channels * width_mult, 8)
class InvertedResidual(nn.Module):
    # Implemented as described at section 5 of MobileNetV3 paper
    def __init__(self, cnf: InvertedResidualConfig, norm_layer: Callable[..., nn.Module],
                 se_layer: Callable[..., nn.Module] = partial(SElayer, scale_activation=nn.Hardsigmoid)):
        super().__init__()
        if not (1 <= cnf.stride <= 2):
            raise ValueError('illegal stride value')

        self.use_res_connect = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # expand
        if cnf.expanded_channels != cnf.input_channels:
            layers.append(ConvNormActivation(cnf.input_channels, cnf.expanded_channels, kernel_size=1,
                                             norm_layer=norm_layer, activation_layer=activation_layer))

        # depthwise
        stride = 1 if cnf.dilation > 1 else cnf.stride
        layers.append(ConvNormActivation(cnf.expanded_channels, cnf.expanded_channels, kernel_size=cnf.kernel,
                                         stride=stride, dilation=cnf.dilation, groups=cnf.expanded_channels,
                                         norm_layer=norm_layer, activation_layer=activation_layer))
        if cnf.use_se:
            squeeze_channels = _make_divisible(cnf.expanded_channels // 4, 8)
            layers.append(se_layer(cnf.expanded_channels, squeeze_channels))

        # project
        layers.append(ConvNormActivation(cnf.expanded_channels, cnf.out_channels, kernel_size=1, norm_layer=norm_layer,
                                         activation_layer=None))

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_channels
        self._is_cn = cnf.stride > 1

    def forward(self, input: Tensor) -> Tensor:
        result = self.block(input)
        if self.use_res_connect:
            result += input
        return result
# excerpt from the end of MobileNetV3.__init__
self.features = nn.Sequential(*layers)
self.avgpool = nn.AdaptiveAvgPool2d(1)
self.classifier = nn.Sequential(
    nn.Linear(lastconv_output_channels, last_channel),
    nn.Hardswish(inplace=True),
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(last_channel, num_classes),
)
2. Define the FPN layer
in_channels_list = [40, 112, 960]
out_channels = 256
backbone = fpn.BackboneWithFPN(monile_v3_backbone, return_layers, in_channels_list, out_channels)
3. Define the anchor_generator
anchor_sizes = ((64,), (128,), (256,), (512,))  # each inner tuple is the group of sizes for one feature level
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorsGenerator(sizes=anchor_sizes,  # 3 ratios * 4 levels = 12 kinds of anchors
                                    aspect_ratios=aspect_ratios)
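The "3*4=12" count can be checked by enumerating the anchor shapes by hand. This is a sketch of the usual convention (aspect ratio = height / width, area kept close to size squared), which is how torchvision's AnchorGenerator builds its base anchors as far as I know; it is assumed that AnchorsGenerator in the source behaves the same way, and base_anchors is a name made up here. Rounding to integer pixels is left out.

```python
import math


def base_anchors(sizes, aspect_ratios):
    """Return the (width, height) of every anchor placed at one feature-map cell."""
    anchors = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)   # ratio = height / width
        w_ratio = 1.0 / h_ratio
        for size in sizes:
            anchors.append((w_ratio * size, h_ratio * size))
    return anchors


anchor_sizes = ((64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)

# one list of anchors per feature level
per_level = [base_anchors(s, r) for s, r in zip(anchor_sizes, aspect_ratios)]
total = sum(len(level) for level in per_level)

for sizes, level in zip(anchor_sizes, per_level):
    print(sizes, [(round(w, 1), round(h, 1)) for w, h in level])
print("anchor kinds in total:", total)
```

Each feature level therefore places 3 anchors per location, and the four levels together cover 12 distinct shapes.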
4. Define the roi_pooler
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2'],  # feature levels on which RoI pooling is performed
                                                output_size=[7, 7],  # size of the feature map output by RoI pooling
                                                sampling_ratio=2)  # sampling ratio
5. Build the FasterRCNN model
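With several feature maps listed in featmap_names, each proposal has to be routed to one of them. The sketch below follows the level-assignment rule from the FPN paper, k = floor(k0 + log2(sqrt(w*h) / 224)), which torchvision's MultiScaleRoIAlign also uses as far as I know. The values k_min=3 and k_max=5 assume the three feature maps have strides 8, 16 and 32, and the function name is made up for this illustration.

```python
import math


def map_box_to_level(box, k_min=3, k_max=5,
                     canonical_scale=224, canonical_level=4, eps=1e-6):
    """Return the index into featmap_names that a box (x1, y1, x2, y2) is pooled from."""
    x1, y1, x2, y2 = box
    scale = math.sqrt((x2 - x1) * (y2 - y1))          # sqrt of the box area
    k = math.floor(canonical_level + math.log2(scale / canonical_scale) + eps)
    k = min(max(k, k_min), k_max)                     # clamp to the available levels
    return k - k_min


featmap_names = ['0', '1', '2']
for box in [(0, 0, 32, 32), (0, 0, 224, 224), (0, 0, 600, 600)]:
    print(box, "->", featmap_names[map_box_to_level(box)])
```

Small boxes are pooled from the high-resolution map '0' and large boxes from the coarse map '2', which is why the pooler needs to know all three names.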
model = FasterRCNN(backbone=backbone,
                   num_classes=num_classes,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
III.
Outputs of stage 4, stage 5, and the final 1*1 convolution
#efficientB0
def create_model(num_classes):
    monile_v3_backbone = torchvision.models.efficientnet_b0()
    return_layers = {"features.3": "0", "features.5": "1", "features.8": "2"}
    monile_v3_backbone = create_feature_extractor(monile_v3_backbone, return_nodes=return_layers)
    img = torch.randn(1, 3, 224, 224)
    # outputs = monile_v3_backbone(img)
    in_channels_list = [40, 112, 1280]
    out_channels = 256
    backbone = fpn.BackboneWithFPN(monile_v3_backbone, return_layers, in_channels_list, out_channels)

    anchor_sizes = ((64,), (128,), (256,), (512,))  # each inner tuple is the group of sizes for one feature level
    aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
    anchor_generator = AnchorsGenerator(sizes=anchor_sizes,  # 3 ratios * 4 levels = 12 kinds of anchors
                                        aspect_ratios=aspect_ratios)

    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2'],  # feature levels on which RoI pooling is performed
                                                    output_size=[7, 7],  # size of the feature map output by RoI pooling
                                                    sampling_ratio=2)  # sampling ratio

    model = FasterRCNN(backbone=backbone,
                       num_classes=num_classes,
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)
    return model
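To show where in_channels_list = [40, 112, 1280] comes from, here is a sketch that walks a hand-written table of EfficientNet-B0 stages. The (output channels, stride) pairs are my reading of the B0 configuration in the EfficientNet paper mapped onto torchvision's features.0 to features.8; treat the table as an assumption and confirm it by running the extractor on a dummy input.

```python
# (output channels, stride) of features.0 ... features.8 in EfficientNet-B0.
# Hand-written from the B0 configuration; an assumption to be verified.
B0_STAGES = [(32, 2), (16, 1), (24, 2), (40, 2), (80, 2),
             (112, 1), (192, 2), (320, 1), (1280, 1)]


def describe_nodes(return_layers, input_size=224):
    """Return {output key: (channels, total stride, spatial size)} for the chosen nodes."""
    info = {}
    total_stride = 1
    for idx, (channels, stride) in enumerate(B0_STAGES):
        total_stride *= stride
        name = f"features.{idx}"
        if name in return_layers:
            info[return_layers[name]] = (channels, total_stride, input_size // total_stride)
    return info


return_layers = {"features.3": "0", "features.5": "1", "features.8": "2"}
node_info = describe_nodes(return_layers)
in_channels_list = [node_info[key][0] for key in ("0", "1", "2")]
print(node_info)
print(in_channels_list)
```

The three chosen nodes sit at total strides 8, 16 and 32, so they form the same kind of pyramid as the three MobileNet V3 outputs in section II.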
class MBConv(nn.Module):
    def __init__(self, cnf: MBConvConfig, stochastic_depth_prob: float, norm_layer: Callable[..., nn.Module],
                 se_layer: Callable[..., nn.Module] = SqueezeExcitation) -> None:
        super().__init__()

        if not (1 <= cnf.stride <= 2):
            raise ValueError('illegal stride value')

        self.use_res_connect = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Module] = []
        activation_layer = nn.SiLU

        # expand
        expanded_channels = cnf.adjust_channels(cnf.input_channels, cnf.expand_ratio)
        if expanded_channels != cnf.input_channels:
            layers.append(ConvNormActivation(cnf.input_channels, expanded_channels, kernel_size=1,
                                             norm_layer=norm_layer, activation_layer=activation_layer))

        # depthwise
        layers.append(ConvNormActivation(expanded_channels, expanded_channels, kernel_size=cnf.kernel,
                                         stride=cnf.stride, groups=expanded_channels,
                                         norm_layer=norm_layer, activation_layer=activation_layer))

        # squeeze and excitation
        squeeze_channels = max(1, cnf.input_channels // 4)
        layers.append(se_layer(expanded_channels, squeeze_channels, activation=partial(nn.SiLU, inplace=True)))

        # project
        layers.append(ConvNormActivation(expanded_channels, cnf.out_channels, kernel_size=1, norm_layer=norm_layer,
                                         activation_layer=None))

        self.block = nn.Sequential(*layers)
        self.stochastic_depth = StochasticDepth(stochastic_depth_prob, "row")
        self.out_channels = cnf.out_channels

    def forward(self, input: Tensor) -> Tensor:
        result = self.block(input)
        if self.use_res_connect:
            result = self.stochastic_depth(result)
            result += input
        return result
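The one piece of MBConv that has no counterpart in the MobileNet V3 block is StochasticDepth(stochastic_depth_prob, "row"). Below is a framework-free sketch of what "row" mode does, under the assumption that it matches torchvision's behaviour: during training, the residual branch of each sample in the batch is dropped with probability p, and the surviving samples are scaled by 1 / (1 - p). Plain Python lists stand in for tensors, and the function name is made up here.

```python
import random


def stochastic_depth_row(batch, p, training=True, rng=random):
    """batch: list of samples, each a flat list of floats (residual-branch output)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("drop probability must be in [0, 1]")
    if not training or p == 0.0:
        return [list(sample) for sample in batch]    # identity at inference time
    survival = 1.0 - p
    out = []
    for sample in batch:
        keep = rng.random() < survival               # one draw per sample ("row" mode)
        scale = (1.0 / survival) if keep else 0.0    # rescale survivors, zero the rest
        out.append([value * scale for value in sample])
    return out


batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(stochastic_depth_row(batch, p=0.5, rng=random.Random(0)))
```

Because the shortcut `result += input` is applied afterwards, a dropped sample simply passes through the block unchanged, which is why this is only used when use_res_connect is true.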