Deep learning object detection: the architectures evolved in the order RCNN -> SPP -> Fast RCNN -> Faster RCNN -> YOLO -> SSD -> YOLO2 -> Mask RCNN -> YOLO3.


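1. RCNN (Region-based Convolutional Neural Network)

The RCNN pipeline:

1. Run Selective Search (SS), a proposal algorithm based on graph-based segmentation and hierarchical grouping, on the original image to extract candidate boxes that may contain objects;

2. Warp each candidate box directly to a fixed size with a resize operation;

3. Run a CNN on each warped candidate separately to obtain a fixed-length feature vector;

4. Classify the feature vector with an SVM (support vector machine);

5. Apply a regressor that refines the candidate box toward the true bounding box.

RCNN has three obvious problems:

1. The image patches for all candidate regions have to be extracted in advance, which takes a lot of disk space;

2. A conventional CNN needs a fixed-size input, and the crop/warp (normalization) step truncates or stretches objects, so information is lost before it reaches the CNN;

3. Every region proposal is passed through the CNN. The thousands of regions overlap heavily, so the repeated feature extraction wastes a huge amount of computation.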

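2. SPP network (Spatial Pyramid Pooling)

Since CNN feature extraction is so expensive (a large number of convolutions), why compute it independently for every candidate region, instead of extracting features for the whole image once and cropping the regions only just before classification?

The SPP network pipeline:

1. Extract candidate boxes from the original image with the SS algorithm;

2. Run the CNN once over the whole image to extract features;

3. Using the boxes from step 1, locate the corresponding ROI regions on the feature map (taking the convolution strides into account);

4. Apply SPP pooling to each ROI feature map: max-pool over the whole map, over a 2*2 grid and over a 4*4 grid, which gives 1+4+16=21 pooled values per channel regardless of the ROI size;

5. Classify with an SVM (support vector machine);

6. Apply a regressor that refines the candidate box toward the true bounding box.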
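Step 4 can be sketched in a few lines of plain Python. This is only an illustration for a single channel stored as a list of rows; the function name `spp_max_pool` and the bin-boundary rule (floor for the start, ceiling for the end of each bin) are assumptions made for this sketch, not details taken from the SPP-Net implementation.

```python
def spp_max_pool(feature, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid and the
    maximum of every bin is kept, giving 1 + 4 + 16 = 21 values by default.
    """
    h, w = len(feature), len(feature[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n                    # first row of bin i (floor)
            r1 = ((i + 1) * h + n - 1) // n      # one past the last row (ceiling)
            for j in range(n):
                c0 = (j * w) // n
                c1 = ((j + 1) * w + n - 1) // n
                pooled.append(max(feature[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two ROIs of different sizes produce vectors of the same length.
roi_a = [[r * 10 + c for c in range(6)] for r in range(5)]    # 5 x 6 ROI
roi_b = [[r + c for c in range(13)] for r in range(9)]        # 9 x 13 ROI
print(len(spp_max_pool(roi_a)), len(spp_max_pool(roi_b)))     # 21 21
```

The fixed output length is what lets ROIs of any size feed the same fully connected layers.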

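Although SPP-Net was a major contribution, many problems remain:

1. As with RCNN, training is still split into isolated stages: SS proposal extraction | CNN feature computation | SVM classification | bounding-box regression are trained independently, a large amount of intermediate results has to be stored, and the parameters cannot be trained jointly;

2. SPP-Net cannot jointly tune the convolutional layers and the fully connected layers on either side of the SPP layer, which greatly limits what a deep CNN can achieve;

3. Region proposal generation (the SS algorithm) remains slow throughout the pipeline.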

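3. Fast RCNN (Fast Region-based Convolutional Neural Network)

The Fast RCNN pipeline:

1. The SS algorithm produces candidate regions (regions of interest, ROIs);

2. A CNN extracts a feature map from the whole image;

3. The feature region corresponding to each ROI is fed into the Fast RCNN head;

4. An ROI pooling layer (a single-level SPP: split the ROI into W*H sub-windows and max-pool each) brings every ROI to a fixed size, and several fully connected layers turn it into an ROI feature vector;

5. The ROI feature vector goes into two separate fully connected layers: one uses softmax to produce class probabilities (the classification result), the other performs regression to produce the bounding box.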

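Improvements over the earlier methods:

1. Selling point 1: borrowing the SPP idea, it introduces a simplified ROI pooling layer (note: no pyramid; it replaces SPP) and adds the mapping of candidate boxes onto the feature map, so gradients can be back-propagated through the network. This solves SPP's joint-training problem;

2. Selling point 2: a multi-task loss layer;

A. A softmax loss replaces the SVM, and softmax is shown to work better than the SVM;

B. A Smooth L1 loss replaces the separately trained bounding-box regression.

Merging classification and box regression (another pioneering idea) through the multi-task loss layer further integrates the deep network and unifies training, which improves accuracy.

3. The fully connected layers are accelerated with SVD. Readers can look into this themselves; it helps somewhat but is not revolutionary;

4. With the improvements above, all layers can be updated during training. Besides the speedup (training is 3x faster than SPP and testing is 10x faster), detection quality also improves (mAP of 70 on the VOC07 dataset; mAP = mean Average Precision).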
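The multi-task loss of selling point 2 can be illustrated for a single ROI. This is a minimal sketch, not the Fast RCNN code: the function names are hypothetical, class index 0 is assumed to mean background (no box term), and the balancing weight `lam` is assumed to be 1.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Softmax log loss plus Smooth L1 box loss for one ROI.

    class_probs are softmax outputs; box_pred and box_target hold the four
    box regression values. Background ROIs (class 0, an assumption of this
    sketch) have no ground-truth box, so only the classification term counts.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls_loss + lam * loc_loss


print(smooth_l1(0.5), smooth_l1(-2.0))   # 0.125 1.5
```

Because both terms are summed into one scalar, a single backward pass updates the shared convolutional layers for classification and localization together.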

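4. Faster RCNN (Faster Region-based Convolutional Neural Network)

Selective Search, the most common proposal method, takes about 2 s per image. The improved EdgeBoxes algorithm brings this down to 0.2 s, but that is still not enough. Proposals do not have to be extracted on the original image; the feature map works as well, and a low-resolution feature map means less computation. Based on this assumption, Shaoqing Ren et al. at MSRA proposed the RPN (Region Proposal Network), which solves the problem neatly.

The Faster RCNN pipeline:

1. A CNN extracts a feature map from the whole image;

2. A Region Proposal Network (RPN) replaces the earlier SS algorithm and generates candidate ROIs on the feature map (the RPN shares features with the detection network);

3. The feature region corresponding to each ROI is fed into the Fast RCNN head;

4. An ROI pooling layer first pools every ROI feature region to a fixed size, and several fully connected layers turn it into an ROI feature vector;

5. The ROI feature vector goes into two separate fully connected layers: one uses softmax to produce class probabilities (the classification result), the other performs regression to produce the bounding box.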

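5. YOLO - v1 (a one-stage convolutional neural network)

Convolutions extract features; box regression plus softmax classification, followed by NMS (non-maximum suppression), yields the boxes and the predictions.

The convolutional layers extract image features, and the fully connected layers predict object positions and class probabilities. The YOLO network borrows from the GoogLeNet classification architecture. The difference is that YOLO does not use the inception module; it simply substitutes a 1x1 convolution (used here for cross-channel information integration) followed by a 3x3 convolution.

The YOLO algorithm:

    1) Divide the image into a fixed grid (e.g. 7*7); if the center of an object falls into a cell, that cell is responsible for regressing the object's position;
    2) Each cell predicts object positions and confidences, and this information is encoded as one vector;
    3) The network's output layer holds the result for every grid cell, which makes end-to-end training possible.

Problems of the YOLO algorithm:

    1) Regressing on a 7*7 grid loses a lot of feature detail, and there is no multi-scale basis for regression;
    2) The terms of the loss are hard to balance (whether by weighting or by averaging), so the loss converges poorly and the model is unstable.

Grid prediction layout:

7*7*30, where (4 + 1) × 2 + 20 = 30
Each cell outputs B bounding boxes (rectangles containing an object) and C probabilities that the object belongs to each class.
Each bounding box consists of 5 values: x, y, w, h and confidence.
x, y are the coordinates of the center of the bounding box predicted by the current cell.
w, h are the width and height of the bounding box.

Note:

In actual training, w and h are normalized to [0,1] by the image width and height;

x, y are the offsets of the box center relative to the position of the current cell, also normalized to [0,1]. The final fully connected layer of the YOLO network has S*S*(B*5 + C) outputs.
In the YOLO paper the author trains with an input resolution of 448x448, S=7, B=2;
the 20 annotated VOC classes are used as training data, so C=20. The output vector therefore has 7*7*(20 + 2*5)=1470 dimensions.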
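The arithmetic above, and the conversion from cell-relative predictions back to pixels, can be checked with a short sketch. The helper name `decode_box` and its argument order are hypothetical; it follows the normalization described in the text (x, y as offsets inside a cell, w, h as fractions of the image).

```python
S, B, C = 7, 2, 20                     # grid size, boxes per cell, classes (values from the text)
output_dim = S * S * (B * 5 + C)       # 7 * 7 * 30


def decode_box(row, col, x, y, w, h, img_w, img_h, grid=7):
    """Convert one cell-relative prediction to pixel units.

    x, y are offsets inside cell (row, col) in [0, 1]; w, h are fractions
    of the full image. Returns (center_x, center_y, width, height) in pixels.
    """
    cx = (col + x) / grid * img_w
    cy = (row + y) / grid * img_h
    return cx, cy, w * img_w, h * img_h


print(output_dim)                                             # 1470
print(decode_box(3, 3, 0.5, 0.5, 0.25, 0.5, 448, 448))        # (224.0, 224.0, 112.0, 224.0)
```

A box predicted at the middle of the central cell of a 448x448 image therefore lands at the image center.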

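Notes:
1. Because the output layer is fully connected, at detection time a trained YOLO model only supports the same input resolution as the training images.
2. Although each cell can predict B bounding boxes, in the end only the bounding box with the highest IOU (intersection over union) is selected and output after NMS, so each cell predicts at most one object.
3. When objects are small relative to the image, for example a herd of animals or a flock of birds, one cell can contain several objects but only one of them is detected. This is a shortcoming of the YOLO method.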

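yolo-v1 structure

1. 448*448*3 image input
2. 7*7 conv, stride 2, 64 outputs + 2*2 max pool, stride 2
3. 3*3 conv, stride 1, 192 outputs + 2*2 max pool, stride 2
4. 1*1 conv, 128 outputs + 3*3 conv, 256 outputs + 1*1 conv, 256 outputs + 3*3 conv, 512 outputs + 2*2 max pool, stride 2
5. (1*1 conv, 256 outputs + 3*3 conv, 512 outputs) * 4 + 1*1 conv, 512 outputs + 3*3 conv, 1024 outputs + 2*2 max pool, stride 2
6. (1*1 conv, 512 outputs + 3*3 conv, 1024 outputs) * 2 + 3*3 conv, 1024 outputs + 3*3 conv, stride 2, 1024 outputs
7. 3*3 conv, 1024 outputs, * 2
8. Fully connected layer, 512 outputs + fully connected layer, 4096 outputs + fully connected layer, 1470 outputs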
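To see how the 448*448 input shrinks to the 7*7 grid, the spatial size can be traced through the steps above. This sketch assumes every convolution is padded by kernel // 2 and every pool is 2*2 with stride 2; the helper names are made up for illustration.

```python
def conv_out(size, kernel, stride):
    """Output size of a convolution padded by kernel // 2 on each side."""
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, stride=2):
    """Output size of a 2x2 max pool with the given stride (ceiling division)."""
    return -(-size // stride)


size = 448
trace = [size]
size = pool_out(conv_out(size, 7, 2))   # step 2: stride-2 conv, then pool
trace.append(size)
size = pool_out(conv_out(size, 3, 1))   # step 3: stride-1 conv keeps the size, pool halves it
trace.append(size)
size = pool_out(size)                   # step 4: only the pool changes the size
trace.append(size)
size = pool_out(size)                   # step 5: only the pool changes the size
trace.append(size)
size = conv_out(size, 3, 2)             # step 6: ends with a stride-2 conv
trace.append(size)
print(trace)                            # [448, 112, 56, 28, 14, 7]
```

The final 7*7*1024 map is flattened (50176 values) before the fully connected layers.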

yolo-v1 TensorFlow implementation

GitHub code

# -*- coding: utf-8 -*-
"""
Yolo V1 by tensorflow (TF 1.x graph API)
24 convolutional layers followed by 3 fully connected layers
Input: 448*448*3, output: N*1470
Each image is split into 7*7 cells; each cell predicts 2 boxes with 5 parameters each
and one shared set of 20 class probabilities: 7*7*(2*5+20) = 1470
paper https://pjreddie.com/media/files/papers/yolo.pdf
"""
import numpy as np
import tensorflow as tf
import cv2


# Leaky ReLU activation
def leak_relu(x, alpha=0.1):
    return tf.maximum(alpha * x, x)


class Yolo(object):
    def __init__(self, weights_file, verbose=True):
        # flag for printing debug information
        self.verbose = verbose
        # detection params
        self.S = 7  # cell size: the image is split into 7*7 cells
        self.B = 2  # boxes_per_cell: boxes predicted by each cell
        self.classes = ["aeroplane", "bicycle", "bird", "boat", "bottle",
                        "bus", "car", "cat", "chair", "cow", "diningtable",
                        "dog", "horse", "motorbike", "person", "pottedplant",
                        "sheep", "sofa", "train", "tvmonitor"]  # object classes
        self.C = len(self.classes)  # number of classes
        # offset for box center (top left point of each cell): 0 1 2 3 4 5 6
        self.x_offset = np.transpose(np.reshape(np.array([np.arange(self.S)]*self.S*self.B),
                                                [self.B, self.S, self.S]), [1, 2, 0])
        self.y_offset = np.transpose(self.x_offset, [1, 0, 2])

        self.threshold = 0.2  # score threshold (confidence * class probability) for boxes that are shown
        self.iou_threshold = 0.4  # IOU threshold used by NMS
        # the maximum number of boxes to be selected by non max suppression
        self.max_output_size = 10

        self.sess = tf.Session()
        self._build_net()  # build the network
        self._build_detector()  # turn the network output into detections
        self._load_weights(weights_file)

    # input: 448*448*3, output: N*1470
    def _build_net(self):
        """build the network"""
        if self.verbose:
            print("Start to build the network ...")
        # 1. 448*448*3 image input (RGB)
        self.images = tf.placeholder(tf.float32, [None, 448, 448, 3])
        # 2. 7*7 conv stride 2, 64 outputs + 2*2 max pool stride 2
        net = self._conv_layer(self.images, 1, 64, 7, 2)  # N*448*448*3 -> N*224*224*64
        net = self._maxpool_layer(net, 1, 2, 2)           # -> 112*112*64
        # 3. 3*3 conv stride 1, 192 outputs + 2*2 max pool stride 2
        net = self._conv_layer(net, 2, 192, 3, 1)         # -> 112*112*192
        net = self._maxpool_layer(net, 2, 2, 2)           # -> 56*56*192
        # 4. 1*1 conv 128 + 3*3 conv 256 + 1*1 conv 256 + 3*3 conv 512 + 2*2 max pool stride 2
        net = self._conv_layer(net, 3, 128, 1, 1)         # -> 56*56*128
        net = self._conv_layer(net, 4, 256, 3, 1)         # -> 56*56*256
        net = self._conv_layer(net, 5, 256, 1, 1)         # -> 56*56*256
        net = self._conv_layer(net, 6, 512, 3, 1)         # -> 56*56*512
        net = self._maxpool_layer(net, 6, 2, 2)           # -> 28*28*512
        # 5. (1*1 conv 256 + 3*3 conv 512) * 4 + 1*1 conv 512 + 3*3 conv 1024 + 2*2 max pool stride 2
        net = self._conv_layer(net, 7, 256, 1, 1)         # -> 28*28*256
        net = self._conv_layer(net, 8, 512, 3, 1)         # -> 28*28*512
        net = self._conv_layer(net, 9, 256, 1, 1)         # -> 28*28*256
        net = self._conv_layer(net, 10, 512, 3, 1)        # -> 28*28*512
        net = self._conv_layer(net, 11, 256, 1, 1)        # -> 28*28*256
        net = self._conv_layer(net, 12, 512, 3, 1)        # -> 28*28*512
        net = self._conv_layer(net, 13, 256, 1, 1)        # -> 28*28*256
        net = self._conv_layer(net, 14, 512, 3, 1)        # -> 28*28*512
        net = self._conv_layer(net, 15, 512, 1, 1)        # -> 28*28*512
        net = self._conv_layer(net, 16, 1024, 3, 1)       # -> 28*28*1024
        net = self._maxpool_layer(net, 16, 2, 2)          # -> 14*14*1024
        # 6. (1*1 conv 512 + 3*3 conv 1024) * 2 + 3*3 conv 1024 + 3*3 conv stride 2, 1024
        net = self._conv_layer(net, 17, 512, 1, 1)        # -> 14*14*512
        net = self._conv_layer(net, 18, 1024, 3, 1)       # -> 14*14*1024
        net = self._conv_layer(net, 19, 512, 1, 1)        # -> 14*14*512
        net = self._conv_layer(net, 20, 1024, 3, 1)       # -> 14*14*1024
        net = self._conv_layer(net, 21, 1024, 3, 1)       # -> 14*14*1024
        net = self._conv_layer(net, 22, 1024, 3, 2)       # -> 7*7*1024
        # 7. 3*3 conv 1024, twice
        net = self._conv_layer(net, 23, 1024, 3, 1)       # -> 7*7*1024
        net = self._conv_layer(net, 24, 1024, 3, 1)       # -> 7*7*1024
        # 8. fully connected 512 + fully connected 4096 + fully connected 1470
        net = self._flatten(net)                          # 7*7*1024 -> 1*(7*7*1024)
        net = self._fc_layer(net, 25, 512, activation=leak_relu)   # 1*512
        net = self._fc_layer(net, 26, 4096, activation=leak_relu)  # 1*4096
        # 1*(S*S*((4+1)*B+C)) = 7*7*(5*2+20) = 1*1470
        net = self._fc_layer(net, 27, self.S*self.S*(self.C+5*self.B))
        # 1470 = 980 class probabilities + 98 box confidences + 392 box coordinates
        self.predicts = net

    def _build_detector(self):
        """Interpret the net output and get the predicted boxes"""
        # the width and height of original image
        self.width = tf.placeholder(tf.float32, name="img_w")
        self.height = tf.placeholder(tf.float32, name="img_h")
        # get class prob, confidence, boxes from net output
        idx1 = self.S * self.S * self.C          # number of class predictions: 7*7*20 = 980
        idx2 = idx1 + self.S * self.S * self.B   # plus the box confidences: 7*7*2 = 98
        # class prediction
        class_probs = tf.reshape(self.predicts[0, :idx1], [self.S, self.S, self.C])
        # confidence = Pr(object) * IOU
        confs = tf.reshape(self.predicts[0, idx1:idx2], [self.S, self.S, self.B])
        # boxes -> (x, y, w, h): 7*7*2*4 = 392 values
        boxes = tf.reshape(self.predicts[0, idx2:], [self.S, self.S, self.B, 4])

        # convert the x, y to the coordinates relative to the top left point of the image
        # the predictions of w, h are the square root
        # multiply the width and height of image to get the real box center and size
        boxes = tf.stack([(boxes[:, :, :, 0] + tf.constant(self.x_offset, dtype=tf.float32)) / self.S * self.width,
                          (boxes[:, :, :, 1] + tf.constant(self.y_offset, dtype=tf.float32)) / self.S * self.height,
                          tf.square(boxes[:, :, :, 2]) * self.width,
                          tf.square(boxes[:, :, :, 3]) * self.height], axis=3)

        # final score = confidence * class probability
        # class-specific confidence scores [S, S, B, C]
        scores = tf.expand_dims(confs, -1) * tf.expand_dims(class_probs, 2)

        scores = tf.reshape(scores, [-1, self.C])  # [S*S*B, C]: 98 boxes, 20 scores each
        boxes = tf.reshape(boxes, [-1, 4])         # [S*S*B, 4]: 98 boxes, center and size

        # find each box class, only select the max score
        box_classes = tf.argmax(scores, axis=1)           # best class for each of the 98 boxes
        box_class_scores = tf.reduce_max(scores, axis=1)  # score of that class

        # filter the boxes by the score threshold
        filter_mask = box_class_scores >= self.threshold
        scores = tf.boolean_mask(box_class_scores, filter_mask)
        boxes = tf.boolean_mask(boxes, filter_mask)
        box_classes = tf.boolean_mask(box_classes, filter_mask)

        # non max suppression (do not distinguish different classes)
        # ref: https://tensorflow.google.cn/api_docs/python/tf/image/non_max_suppression
        # box (x, y, w, h) -> box (x1, y1, x2, y2): top-left and bottom-right corners
        _boxes = tf.stack([boxes[:, 0] - 0.5 * boxes[:, 2], boxes[:, 1] - 0.5 * boxes[:, 3],
                           boxes[:, 0] + 0.5 * boxes[:, 2], boxes[:, 1] + 0.5 * boxes[:, 3]], axis=1)
        # NMS removes boxes that overlap a higher-scoring box too much
        nms_indices = tf.image.non_max_suppression(_boxes, scores,
                                                   self.max_output_size, self.iou_threshold)
        self.scores = tf.gather(scores, nms_indices)
        self.boxes = tf.gather(boxes, nms_indices)
        self.box_classes = tf.gather(box_classes, nms_indices)

    # conv layer: input, layer id, number of filters, filter size, stride
    def _conv_layer(self, x, id, num_filters, filter_size, stride):
        """Conv layer"""
        in_channels = x.get_shape().as_list()[-1]  # number of input channels
        # weights: size * size * input channels * number of filters
        weight = tf.Variable(tf.truncated_normal([filter_size, filter_size,
                                                  in_channels, num_filters], stddev=0.1))
        bias = tf.Variable(tf.zeros([num_filters, ]))  # one bias per filter
        # padding, note: not using padding="SAME"
        pad_size = filter_size // 2
        pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
        x_pad = tf.pad(x, pad_mat)  # pad the input explicitly
        conv = tf.nn.conv2d(x_pad, weight, strides=[1, stride, stride, 1], padding="VALID")
        output = leak_relu(tf.nn.bias_add(conv, bias))  # Leaky ReLU activation
        if self.verbose:
            print("    Layer %d: type=Conv, num_filter=%d, filter_size=%d, stride=%d, output_shape=%s" \
                  % (id, num_filters, filter_size, stride, str(output.get_shape())))
        return output

    # fully connected layer
    def _fc_layer(self, x, id, num_out, activation=None):
        """fully connected layer"""
        num_in = x.get_shape().as_list()[-1]  # number of input features
        weight = tf.Variable(tf.truncated_normal([num_in, num_out], stddev=0.1))
        bias = tf.Variable(tf.zeros([num_out, ]))
        output = tf.nn.xw_plus_b(x, weight, bias)
        if activation:
            output = activation(output)
        if self.verbose:
            print("    Layer %d: type=Fc, num_out=%d, output_shape=%s" \
                  % (id, num_out, str(output.get_shape())))
        return output

    # max pooling layer
    def _maxpool_layer(self, x, id, pool_size, stride):
        output = tf.nn.max_pool(x, [1, pool_size, pool_size, 1],
                                strides=[1, stride, stride, 1], padding="SAME")
        if self.verbose:
            print("    Layer %d: type=MaxPool, pool_size=%d, stride=%d, output_shape=%s" \
                  % (id, pool_size, stride, str(output.get_shape())))
        return output

    # flatten
    def _flatten(self, x):
        """flatten the x"""
        tran_x = tf.transpose(x, [0, 3, 1, 2])  # channel first mode
        nums = np.prod(x.get_shape().as_list()[1:])  # np.product was removed in NumPy 2.0
        return tf.reshape(tran_x, [-1, nums])

    # load the network parameters
    def _load_weights(self, weights_file):
        """Load weights from file"""
        if self.verbose:
            print("Start to load weights from file:%s" % (weights_file))
        saver = tf.train.Saver()
        saver.restore(self.sess, weights_file)

    def detect_from_file(self, image_file, imshow=True, deteted_boxes_file="boxes.txt",
                         detected_image_file="detected_image.jpg"):
        """Do detection given a image file"""
        # read image
        image = cv2.imread(image_file)
        img_h, img_w, _ = image.shape
        scores, boxes, box_classes = self._detect_from_image(image)
        predict_boxes = []
        for i in range(len(scores)):
            predict_boxes.append((self.classes[box_classes[i]], boxes[i, 0],
                                  boxes[i, 1], boxes[i, 2], boxes[i, 3], scores[i]))
        self.show_results(image, predict_boxes, imshow, deteted_boxes_file, detected_image_file)

    # detect objects in an image
    def _detect_from_image(self, image):
        """Do detection given a cv image"""
        img_h, img_w, _ = image.shape  # image height and width
        img_resized = cv2.resize(image, (448, 448))  # resize to the fixed 448*448 input
        img_RGB = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)  # BGR -> RGB
        img_resized_np = np.asarray(img_RGB)
        _images = np.zeros((1, 448, 448, 3), dtype=np.float32)
        _images[0] = (img_resized_np / 255.0) * 2.0 - 1.0  # scale pixel values to [-1, 1]
        scores, boxes, box_classes = self.sess.run([self.scores, self.boxes, self.box_classes],
                                                   feed_dict={self.images: _images, self.width: img_w, self.height: img_h})
        return scores, boxes, box_classes

    # print the results and draw the detected boxes
    def show_results(self, image, results, imshow=True, deteted_boxes_file=None,
                     detected_image_file=None):
        """Show the detection boxes"""
        img_cp = image.copy()  # copy the image because rectangles are drawn on it
        if deteted_boxes_file:
            f = open(deteted_boxes_file, "w")
        # draw boxes
        for i in range(len(results)):
            x = int(results[i][1])  # box center
            y = int(results[i][2])
            w = int(results[i][3]) // 2  # half of the box width
            h = int(results[i][4]) // 2  # half of the box height
            if self.verbose:
                print("   class: %s, [x, y, w, h]=[%d, %d, %d, %d], confidence=%f" % (results[i][0],
                      x, y, w, h, results[i][-1]))

            cv2.rectangle(img_cp, (x - w, y - h), (x + w, y + h), (0, 255, 0), 2)
            cv2.rectangle(img_cp, (x - w, y - h - 20), (x + w, y - h), (125, 125, 125), -1)
            cv2.putText(img_cp, results[i][0] + ' : %.2f' % results[i][5], (x - w + 5, y - h - 7),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
            if deteted_boxes_file:
                f.write(results[i][0] + ',' + str(x) + ',' + str(y) + ',' +
                        str(w) + ',' + str(h) + ',' + str(results[i][5]) + '\n')
        if imshow:
            cv2.imshow('YOLO_small detection', img_cp)
            cv2.waitKey(1)
        if detected_image_file:
            cv2.imwrite(detected_image_file, img_cp)
        if deteted_boxes_file:
            f.close()


if __name__ == "__main__":
    yolo_net = Yolo("./weights/YOLO_small.ckpt")
    yolo_net.detect_from_file("./test/car.jpg")

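6. YOLO - v2 (prior boxes from k-means clustering; a passthrough layer combines features from layers of different scales)

Improvements of YOLO - v2 over v1:

1. Multi-scale grid design and a new network, Darknet-19, with a total downsampling stride of 32; for a 416*416 input the final feature map is 13*13;

2. Clustering on the dataset determines the sizes and number of prior boxes; balancing speed and accuracy, 5 priors are used;

3. Each cell predicts 5 boxes, and each box has 4 coordinates (x, y, w, h), 1 confidence and one probability per class (for example 20 classes);

4. BN (batch normalization) is applied before the activation function.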
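The clustering in item 2 can be sketched as k-means over (width, height) pairs. This is an illustrative sketch only: it assumes the distance 1 - IoU between a box and a cluster center, uses the cluster mean as the new center, and picks a deterministic initialization so the result is reproducible; none of these choices is taken from the implementation shown in this post.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (w, h), both anchored at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs with distance d = 1 - IoU; returns k prior sizes."""
    # Deterministic init (an assumption for reproducibility): spread over area-sorted boxes.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            groups[best].append(box)
        new_centers = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


boxes = [(10, 12), (12, 10), (11, 11), (50, 100), (55, 95), (45, 105)]
print(kmeans_anchors(boxes, 2))   # [(11.0, 11.0), (50.0, 100.0)]
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering error.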

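yolo-v2 structure

1. 3*3*3*32 kernel: 3 input channels, 32 output channels, stride 1 + max pool
2. 3*3*32*64 kernel: 32 input channels, 64 output channels, stride 1 + max pool
3. 3*3, 1*1, 3*3 convs + max pool
4. 3*3, 1*1, 3*3 convs + max pool
5. 3*3, 1*1, 3*3, 1*1, 3*3 convs + max pool
6. 3*3, 1*1, 3*3, 1*1, 3*3 convs
7. 3*3, 3*3 convs
7.5 passthrough layer: spatial size halved, channel count multiplied by 4, then concatenated across layers along the channel axis
8. 3*3*(1024+64*4)*1024 kernel: 1280 input channels, 1024 output channels, stride 1
9. 1*1*1024*n_last_channels kernel: 1024 input channels, n_last_channels output channels, stride 1; detection output (the code below uses a 1*1 kernel here)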
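The shape bookkeeping of the passthrough layer (step 7.5) can be checked with nested lists. This is a sketch of the space-to-depth idea only; the ordering of values inside each output cell is an arbitrary choice here and is not claimed to match Darknet's reorg layer.

```python
def reorg(feature, stride=2):
    """Space-to-depth on an H x W x C nested list.

    Every stride x stride spatial patch is folded into the channel axis,
    so the result is (H/stride) x (W/stride) x (C * stride * stride).
    """
    h, w = len(feature), len(feature[0])
    out = []
    for r in range(0, h, stride):
        row = []
        for c in range(0, w, stride):
            cell = []
            for dr in range(stride):
                for dc in range(stride):
                    cell.extend(feature[r + dr][c + dc])
            row.append(cell)
        out.append(row)
    return out


def shape(t):
    """(height, width, channels) of a nested-list tensor."""
    return len(t), len(t[0]), len(t[0][0])


fine = [[[0.0] * 64 for _ in range(26)] for _ in range(26)]       # 26 x 26 x 64
coarse = [[[0.0] * 1024 for _ in range(13)] for _ in range(13)]   # 13 x 13 x 1024
folded = reorg(fine)
# Channel-wise concat of the folded fine map with the coarse map.
merged = [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(folded, coarse)]
print(shape(folded), shape(merged))   # (13, 13, 256) (13, 13, 1280)
```

The 1280 channels after the concat are the 1024+64*4 input channels of step 8.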

yolo-v2 TensorFlow implementation

GitHub code

utils.py

# -*- coding: utf-8 -*-
# Core helper functions: image preprocessing and post-processing,
# drawing detections on the image, sorting predicted boxes by score,
# computing the IOU of two boxes, and NMS to drop duplicate boxes
"""
Help functions for YOLOv2
"""
import random
import colorsys

import cv2
import numpy as np


############## preprocess image ##################
def preprocess_image(image, image_size=(416, 416)):
    """Preprocess a image to inference"""
    image_cp = np.copy(image).astype(np.float32)  # float32 copy
    # resize the image
    image_rgb = cv2.cvtColor(image_cp, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    image_resized = cv2.resize(image_rgb, image_size)  # warp to the fixed size
    # normalize to 0~1 by dividing by the maximum value 255
    image_normalized = image_resized.astype(np.float32) / 255.0
    # expand the batch_size dim: 416*416*3 -> 1*416*416*3
    image_expanded = np.expand_dims(image_normalized, axis=0)
    return image_expanded


############## post-process the network output #######################
def postprocess(bboxes, obj_probs, class_probs, image_shape=(416, 416),
                threshold=0.5):
    """post process the detection results"""
    bboxes = np.reshape(bboxes, [-1, 4])  # box values are in 0~1
    # scale to pixels; image_shape is (height, width)
    bboxes[:, 0::2] *= float(image_shape[1])  # columns 0 and 2 (x) scale with the width
    bboxes[:, 1::2] *= float(image_shape[0])  # columns 1 and 3 (y) scale with the height
    bboxes = bboxes.astype(np.int32)  # truncate to integers

    # clip the bboxes so they stay inside the image
    bbox_ref = [0, 0, image_shape[1] - 1, image_shape[0] - 1]
    bboxes = bboxes_clip(bbox_ref, bboxes)

    # predicted object / no-object probability
    obj_probs = np.reshape(obj_probs, [-1])
    class_probs = np.reshape(class_probs, [len(obj_probs), -1])
    class_inds = np.argmax(class_probs, axis=1)  # class index
    class_probs = class_probs[np.arange(len(obj_probs)), class_inds]  # class probability
    scores = obj_probs * class_probs  # score = objectness * class probability

    # filter bboxes with scores > threshold
    keep_inds = scores > threshold
    bboxes = bboxes[keep_inds]
    scores = scores[keep_inds]
    class_inds = class_inds[keep_inds]

    # sort by score and keep the top K
    class_inds, scores, bboxes = bboxes_sort(class_inds, scores, bboxes)
    # nms: drop predicted boxes that overlap too much
    class_inds, scores, bboxes = bboxes_nms(class_inds, scores, bboxes)

    return bboxes, scores, class_inds  # final boxes, scores, class indices


# draw the detection results on the image
def draw_detection(im, bboxes, scores, cls_inds, labels, thr=0.3):
    # Generate colors for drawing bounding boxes.
    hsv_tuples = [(x / float(len(labels)), 1., 1.)
                  for x in range(len(labels))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(
        map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
            colors))
    random.seed(10101)  # Fixed seed for consistent colors across runs.
    random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
    random.seed(None)  # Reset seed to default.
    # draw image
    imgcv = np.copy(im)
    h, w, _ = imgcv.shape
    for i, box in enumerate(bboxes):
        if scores[i] < thr:
            continue
        cls_indx = cls_inds[i]

        thick = int((h + w) / 300)  # line thickness
        # draw the rectangle
        cv2.rectangle(imgcv,
                      (box[0], box[1]), (box[2], box[3]),
                      colors[cls_indx], thick)
        mess = '%s: %.3f' % (labels[cls_indx], scores[i])
        if box[1] < 20:
            text_loc = (box[0] + 2, box[1] + 15)
        else:
            text_loc = (box[0], box[1] - 10)
        cv2.putText(imgcv, mess, text_loc,
                    cv2.FONT_HERSHEY_SIMPLEX, 1e-3 * h, colors[cls_indx], thick // 3)
    return imgcv


############## process bboxes: keep boxes inside the image ##################
def bboxes_clip(bbox_ref, bboxes):
    """Clip bounding boxes with respect to reference bbox."""
    bboxes = np.copy(bboxes)  # boxes as (xmin, ymin, xmax, ymax)
    bboxes = np.transpose(bboxes)
    bbox_ref = np.transpose(bbox_ref)
    bboxes[0] = np.maximum(bboxes[0], bbox_ref[0])
    bboxes[1] = np.maximum(bboxes[1], bbox_ref[1])
    bboxes[2] = np.minimum(bboxes[2], bbox_ref[2])
    bboxes[3] = np.minimum(bboxes[3], bbox_ref[3])
    bboxes = np.transpose(bboxes)
    return bboxes


############## sort predicted boxes by score, keep the top 400 #################
def bboxes_sort(classes, scores, bboxes, top_k=400):
    """Sort bounding boxes by decreasing order and keep only the top_k"""
    # if priority_inside:
    #     inside = (bboxes[:, 0] > margin) & (bboxes[:, 1] > margin) & \
    #         (bboxes[:, 2] < 1-margin) & (bboxes[:, 3] < 1-margin)
    #     idxes = np.argsort(-scores)
    #     inside = inside[idxes]
    #     idxes = np.concatenate([idxes[inside], idxes[~inside]])
    idxes = np.argsort(-scores)
    classes = classes[idxes][:top_k]  # class indices in score order
    scores = scores[idxes][:top_k]    # scores in decreasing order
    bboxes = bboxes[idxes][:top_k]    # boxes in score order
    return classes, scores, bboxes


################ IOU of two boxes: IOU = intersection / union ########################
def bboxes_iou(bboxes1, bboxes2):
    """Computing iou between bboxes1 and bboxes2.
    Note: bboxes1 and bboxes2 can be multi-dimensional, but should be broadcastable.
    """
    bboxes1 = np.transpose(bboxes1)
    bboxes2 = np.transpose(bboxes2)
    # Intersection bbox and volume.
    int_ymin = np.maximum(bboxes1[0], bboxes2[0])
    int_xmin = np.maximum(bboxes1[1], bboxes2[1])
    int_ymax = np.minimum(bboxes1[2], bboxes2[2])
    int_xmax = np.minimum(bboxes1[3], bboxes2[3])

    int_h = np.maximum(int_ymax - int_ymin, 0.)
    int_w = np.maximum(int_xmax - int_xmin, 0.)
    int_vol = int_h * int_w  # area of the intersection
    # area of the union = S1 + S2 - intersection
    vol1 = (bboxes1[2] - bboxes1[0]) * (bboxes1[3] - bboxes1[1])
    vol2 = (bboxes2[2] - bboxes2[0]) * (bboxes2[3] - bboxes2[1])
    iou = int_vol / (vol1 + vol2 - int_vol)
    return iou


################# NMS: drop duplicate boxes ##########################
########## The input must already be sorted by score
########## 1. Take the highest-scoring box and drop every remaining box of the same class whose IOU with it is large
########## 2. Take the next surviving box and repeat step 1 until no boxes are left to process
########## 3. The boxes picked along the way are the final boxes
def bboxes_nms(classes, scores, bboxes, nms_threshold=0.5):
    """Apply non-maximum selection to bounding boxes."""
    keep_bboxes = np.ones(scores.shape, dtype=bool)  # keep flag per box (np.bool was removed from NumPy)
    for i in range(scores.size-1):
        if keep_bboxes[i]:  # box i has survived so far
            # Compute overlap with bboxes which are following.
            overlap = bboxes_iou(bboxes[i], bboxes[(i+1):])
            # Overlap threshold for keeping + checking part of the same class
            keep_overlap = np.logical_or(overlap < nms_threshold, classes[(i+1):] != classes[i])
            keep_bboxes[(i+1):] = np.logical_and(keep_bboxes[(i+1):], keep_overlap)

    idxes = np.where(keep_bboxes)
    return classes[idxes], scores[idxes], bboxes[idxes]

model.py

# -*- coding: utf-8 -*-
# yolo-v2 model file
# Darknet-19 with a passthrough layer that merges features across layers
"""
YOLOv2 implemented by Tensorflow, only for predicting
"""
import os

import numpy as np
import tensorflow as tf


######## basic layers #######
# Leaky ReLU activation: max(0.1*x, x)
def leaky_relu(x):
    return tf.nn.leaky_relu(x, alpha=0.1, name="leaky_relu")

# def leak_relu(x, alpha=0.1):
#     return tf.maximum(alpha * x, x)


# Conv2d: explicit padding + 2d convolution + batch normalization + activation
def conv2d(x, filters, size, pad=0, stride=1, batch_normalize=1,
           activation=leaky_relu, use_bias=False, name="conv2d"):
    # pad the input
    if pad > 0:
        x = tf.pad(x, [[0, 0], [pad, pad], [pad, pad], [0, 0]])
    # tf.nn.conv2d would need an explicit 4-D kernel;
    # tf.layers.conv2d infers the number of input channels
    out = tf.layers.conv2d(x, filters, size, strides=stride, padding="VALID",
                           activation=None, use_bias=use_bias, name=name)
    # batch normalization
    if batch_normalize == 1:
        out = tf.layers.batch_normalization(out, axis=-1, momentum=0.9,
                                            training=False, name=name+"_bn")
    # activation
    if activation:
        out = activation(out)
    return out


# max pooling layer
def maxpool(x, size=2, stride=2, name="maxpool"):
    return tf.layers.max_pooling2d(x, size, stride)


# passthrough (reorg) layer
# sampling every other row and column yields 4 new feature maps
def reorg(x, stride):
    return tf.extract_image_patches(x, [1, stride, stride, 1],
                                    [1, stride, stride, 1], [1, 1, 1, 1], padding="VALID")


# network definition; the default output has 5*(5+80)=425 channels (80 classes)
def darknet(images, n_last_channels=425):
    """Darknet19 for YOLOv2"""
    # 1. 3*3 conv, 3 -> 32 channels, stride 1 + max pool
    net = conv2d(images, 32, 3, 1, name="conv1")
    net = maxpool(net, name="pool1")  # 2*2 pool, stride 2, size halved
    # 2. 3*3 conv, 32 -> 64 channels, stride 1 + max pool
    net = conv2d(net, 64, 3, 1, name="conv2")
    net = maxpool(net, name="pool2")
    # 3. 3*3, 1*1, 3*3 convs + max pool
    net = conv2d(net, 128, 3, 1, name="conv3_1")  # 3*3, 64 -> 128
    net = conv2d(net, 64, 1, name="conv3_2")      # 1*1, 128 -> 64
    net = conv2d(net, 128, 3, 1, name="conv3_3")  # 3*3, 64 -> 128
    net = maxpool(net, name="pool3")
    # 4. 3*3, 1*1, 3*3 convs + max pool
    net = conv2d(net, 256, 3, 1, name="conv4_1")  # 3*3, 128 -> 256
    net = conv2d(net, 128, 1, name="conv4_2")     # 1*1, 256 -> 128
    net = conv2d(net, 256, 3, 1, name="conv4_3")  # 3*3, 128 -> 256
    net = maxpool(net, name="pool4")
    # 5. 3*3, 1*1, 3*3, 1*1, 3*3 convs + max pool
    net = conv2d(net, 512, 3, 1, name="conv5_1")  # 3*3, 256 -> 512
    net = conv2d(net, 256, 1, name="conv5_2")     # 1*1, 512 -> 256
    net = conv2d(net, 512, 3, 1, name="conv5_3")  # 3*3, 256 -> 512
    net = conv2d(net, 256, 1, name="conv5_4")     # 1*1, 512 -> 256
    net = conv2d(net, 512, 3, 1, name="conv5_5")  # 3*3, 256 -> 512
    shortcut = net  # keep the higher-resolution feature map
    net = maxpool(net, name="pool5")
    # 6. 3*3, 1*1, 3*3, 1*1, 3*3 convs
    net = conv2d(net, 1024, 3, 1, name="conv6_1")  # 3*3, 512 -> 1024
    net = conv2d(net, 512, 1, name="conv6_2")      # 1*1, 1024 -> 512
    net = conv2d(net, 1024, 3, 1, name="conv6_3")  # 3*3, 512 -> 1024
    net = conv2d(net, 512, 1, name="conv6_4")      # 1*1, 1024 -> 512
    net = conv2d(net, 1024, 3, 1, name="conv6_5")  # 3*3, 512 -> 1024
    # 7. 3*3, 3*3 convs
    net = conv2d(net, 1024, 3, 1, name="conv7_1")  # 3*3, 1024 -> 1024
    net = conv2d(net, 1024, 3, 1, name="conv7_2")  # 3*3, 1024 -> 1024
    # shortcut: the 26*26*512 map is reduced to 64 channels by a 1*1 conv
    # (the original author marked this step as questionable)
    shortcut = conv2d(shortcut, 64, 1, name="conv_shortcut")
    shortcut = reorg(shortcut, 2)  # passthrough: size halved, channels x4
    net = tf.concat([shortcut, net], axis=-1)  # cross-layer channel concat
    # 8. 3*3 conv, (1024+64*4)=1280 -> 1024 channels, stride 1
    net = conv2d(net, 1024, 3, 1, name="conv8")
    # 9. detection output: 1*1 conv, 1024 -> n_last_channels, stride 1
    net = conv2d(net, n_last_channels, 1, batch_normalize=0,
                 activation=None, use_bias=True, name="conv_dec")
    return net


if __name__ == "__main__":
    x = tf.random_normal([1, 416, 416, 3])  # random input drawn from a standard normal distribution
    model = darknet(x)

    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, "./checkpoint_dir/yolo2_coco.ckpt")  # load the network parameters
        print(sess.run(model).shape)  # print the output shape

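7. YOLO - v3 (ResNet-style residual structure for a deeper network; FPN layers combine features from layers of different scales)

Improvements of YOLO - v3 over v2:

1. Borrowing the residual structure of ResNet allows a deeper network; the new network, Darknet-53, has 53 convolutional layers;

2. Borrowing the FPN (feature pyramid network) idea, the last three stages of the network use an FPN structure in which the smaller feature map is upsampled and combined with the larger one;

3. Clustering yields 9 prior boxes, 3 for each of the 3 feature-map scales; the scales have different numbers of cells, and each cell predicts 3 boxes.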
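The effect of predicting at three scales can be quantified with a little arithmetic. The sketch below assumes a 416*416 input and feature-map strides of 32, 16 and 8 (the strides are an assumption, not stated in the list above); the function names are made up for illustration.

```python
def yolo3_num_boxes(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Total number of predicted boxes over all scales for a square input."""
    return sum((input_size // s) ** 2 * boxes_per_cell for s in strides)


def yolo3_channels(num_classes, boxes_per_cell=3):
    """Channels of one detection head: per box 4 coordinates, 1 confidence, class scores."""
    return boxes_per_cell * (5 + num_classes)


print([416 // s for s in (32, 16, 8)])   # [13, 26, 52]
print(yolo3_num_boxes())                 # 10647
print(yolo3_channels(80))                # 255
```

For comparison, the single 13*13 grid of YOLO - v2 with 5 priors gives 13*13*5 = 845 boxes.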

Code to be added later.

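8. Residual network ResNet: f(x) + W*x

The ResNet idea:

1. Combine features from different convolutional layers
2. The building block is f(x) + W*x, where f(x) is two 3x3 convolutions
3. To reduce computation, the two 3x3 convolutions are replaced by 1x1 + 3x3 + 1x1 convolutions.

In the new structure the middle 3x3 convolution works on fewer channels because a first 1x1 convolution reduces the dimensionality,

and a second 1x1 convolution then restores it, which keeps the accuracy while cutting the amount of computation.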
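The saving can be made concrete by counting weights. The sketch below assumes a block with 256 input and output channels and a bottleneck width of 256/4 = 64, and ignores biases and batch-norm parameters.

```python
def conv_params(kernel, c_in, c_out):
    """Weight count of a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


channels = 256
# Two plain 3x3 convolutions at full width.
plain = 2 * conv_params(3, channels, channels)
# Bottleneck: 1x1 reduce, 3x3 at reduced width, 1x1 restore.
bottleneck = (conv_params(1, channels, channels // 4)
              + conv_params(3, channels // 4, channels // 4)
              + conv_params(1, channels // 4, channels))
print(plain, bottleneck)   # 1179648 69632
```

Under these assumptions the bottleneck needs roughly 17 times fewer weights than the two full-width 3x3 convolutions.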

ResNet core block

________________________________>
|                                 +  f(x) + x
x-----> 1x1 + 3x3 + 1x1 conv ----->

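ResNet network model

Residual network f(x) + W*x
50 layers
ResNet50
2018/04/22
1  7*7 conv, 64 outputs, stride 2; the 224*224 input is halved (+ BN + ReLU + MaxPool)
2  3 residual blocks, 3*3 conv with 64 outputs (256/4 = 64); the first conv of the first block has stride 1
3  4 residual blocks, 3*3 conv with 128 outputs (512/4 = 128); the first conv of the first block has stride 2
4  6 residual blocks, 3*3 conv with 256 outputs (1024/4 = 256); the first conv of the first block has stride 2
5  3 residual blocks, 3*3 conv with 512 outputs (2048/4 = 512); the first conv of the first block has stride 2
6  Average pooling
7  Fully connected layer, 1000 output classes
8  Softmax classification, predicted class output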
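The "50" in ResNet50 can be recovered from the stage list. This count is a sketch that assumes three convolutions per bottleneck block (1x1 + 3x3 + 1x1) and does not count the projection convolutions on the shortcut branches; the spatial sizes assume a 224*224 input.

```python
blocks_per_stage = [3, 4, 6, 3]    # residual blocks in stages 2-5
convs_per_block = 3                # 1x1 + 3x3 + 1x1
stem_conv, fc_layer = 1, 1         # first 7x7 conv and final fully connected layer
total_layers = stem_conv + convs_per_block * sum(blocks_per_stage) + fc_layer

# Spatial size after the stem conv, the max pool and each of the four stages.
size = 224
sizes = []
for stride in (2, 2, 1, 2, 2, 2):  # conv1, maxpool, then stages 2-5
    size //= stride
    sizes.append(size)

print(total_layers)   # 50
print(sizes)          # [112, 56, 56, 28, 14, 7]
```

The final 7*7 map is what the average pooling in step 6 reduces to a single 2048-dimensional vector.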

ResNet network code

GitHub code

#-*- coding: utf-8 -*-
# 残差网络 f(x) + W*x# 核心思想
"""
结合不同卷积层的特征
f(x) + W*x
f(x) 为 23x3的卷积
实际中，考虑计算的成本，对残差块做了计算C，即将2个3x3的卷积层替换为 1x1 + 3x3 + 1x1 。
新结构中的中间3x3的卷积层首先在一个降维1x1卷积层下减少了计算，然后在另一个1x1的卷积层下做了还原，既保持了精度又减少了计算量。________________________________>|                                 +  f(x) + xx-----> 1x1 + 3x3 + 1x1 卷积 ----->
"""# 网络模型
"""
残差网络 f(x) + W*x
50个卷积层
ResNet50
2018/04/22
1  64个输出  7*7卷积核 步长 2 224*224图像输入 大小减半（+BN + RELU + MaxPol）
2  3个 3*3卷积核  64输出的残差模块 256/4 = 64     且第一个残差块的第一个卷积步长为1
3  4个 3*3卷积核  128输出的残差模块 512/4 = 128   且第一个残差块的第一个卷积步长为2
4  6个 3*3卷积核  256输出的残差模块 1024/4 = 256  且第一个残差块的第一个卷积步长为2
5  3个  3*3卷积核 512输出的残差模块 2048/4 = 512  且第一个残差块的第一个卷积步长为2
6  均值池化
7  全连接层 输出 1000  类
8  softmax分类 预测类别输出
实际中，考虑计算的成本，对残差块做了计算优化，即将2个3x3的卷积层替换为 1x1 + 3x3 + 1x1 。
新结构中的中间3x3的卷积层首先在一个降维1x1卷积层下减少了计算，然后在另一个1x1的卷积层下做了还原，既保持了精度又减少了计算量。
运行
python3 ResNet50.py
# python3.4 对应的1.4版本 tensorflow 安装
sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.0-cp34-cp34m-linux_x86_64.whl
"""import tensorflow as tf
from tensorflow.python.training import moving_averages# 移动平均#### 使用GPU时指定 gpu设备
#import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"####### 全连接层 卷积层初始化############################
fc_initializer = tf.contrib.layers.xavier_initializer
conv2d_initializer = tf.contrib.layers.xavier_initializer_conv2d######## 创建变量 create weight variable########################
def create_var(name, shape, initializer, trainable=True):return tf.get_variable(name, shape=shape, dtype=tf.float32,initializer=initializer, trainable=trainable)
####### 2d卷积层 conv2d layer##################################
####### Convolution kernel: 3-D filter plus the number of outputs [filter_h, filter_w, input_channels, output_channels]
def conv2d(x, num_outputs, kernel_size, stride=1, scope="conv2d"):
    num_inputs = x.get_shape()[-1]  # number of input channels
    with tf.variable_scope(scope):
        kernel = create_var("kernel", [kernel_size, kernel_size,
                                       num_inputs, num_outputs],
                            conv2d_initializer())
        return tf.nn.conv2d(x, kernel, strides=[1, stride, stride, 1],
                            padding="SAME")

###### Fully connected layer ###########################
def fc(x, num_outputs, scope="fc"):
    num_inputs = x.get_shape()[-1]  # number of input features
    with tf.variable_scope(scope):
        weight = create_var("weight", [num_inputs, num_outputs],
                            fc_initializer())
        bias = create_var("bias", [num_outputs,],
                          tf.zeros_initializer())
        return tf.nn.xw_plus_b(x, weight, bias)

####### Batch normalization (subtract the mean, divide by the standard deviation: zero mean, unit variance) #########
def batch_norm(x, decay=0.999, epsilon=1e-03, is_training=True,
               scope="scope"):
    x_shape = x.get_shape()
    num_inputs = x_shape[-1]  # number of input channels
    reduce_dims = list(range(len(x_shape) - 1))
    with tf.variable_scope(scope):
        # version issue: beta = create_var("beta", [num_inputs,], initializer=tf.zeros_initializer())
        beta = create_var("beta", [num_inputs,], initializer=tf.zeros_initializer())
        gamma = create_var("gamma", [num_inputs,], initializer=tf.ones_initializer())
        # moving mean, used for inference
        moving_mean = create_var("moving_mean", [num_inputs,], initializer=tf.zeros_initializer(), trainable=False)
        # moving variance, used for inference
        moving_variance = create_var("moving_variance", [num_inputs], initializer=tf.ones_initializer(), trainable=False)
    if is_training:
        mean, variance = tf.nn.moments(x, axes=reduce_dims)
        update_move_mean = moving_averages.assign_moving_average(moving_mean,
                                                                 mean, decay=decay)
        update_move_variance = moving_averages.assign_moving_average(moving_variance,
                                                                     variance, decay=decay)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_move_mean)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_move_variance)
    else:
        mean, variance = moving_mean, moving_variance
    return tf.nn.batch_normalization(x, mean, variance, beta, gamma, epsilon)

############# Average pooling layer ###################
def avg_pool(x, pool_size, scope):
    with tf.variable_scope(scope):
        return tf.nn.avg_pool(x, [1, pool_size, pool_size, 1],
                              strides=[1, pool_size, pool_size, 1], padding="VALID")

############# Max pooling layer ######################
def max_pool(x, pool_size, stride, scope):
    with tf.variable_scope(scope):
        return tf.nn.max_pool(x, [1, pool_size, pool_size, 1],
                              [1, stride, stride, 1], padding="SAME")
################ ResNet-50, 1000 classes #######################
class ResNet50(object):
    def __init__(self, inputs, num_classes=1000, is_training=True,
                 scope="resnet50"):
        self.inputs = inputs  # input tensor
        self.is_training = is_training
        self.num_classes = num_classes  # number of classes
        with tf.variable_scope(scope):
            # construct the model
            ###### 1. 7*7 conv, 64 outputs, stride 2: a 224*224 input is halved #########
            net = conv2d(inputs, 64, 7, 2, scope="conv1")  # -> [batch, 112, 112, 64]
            ####### batch norm first, then ReLU
            net = tf.nn.relu(batch_norm(net, is_training=self.is_training, scope="bn1"))
            ##### max pooling, stride 2: size halved
            net = max_pool(net, 3, 2, scope="maxpool1")  # -> [batch, 56, 56, 64]
            ##### 2. 3 bottleneck blocks, 3*3 conv width 256/4 = 64; the first conv of the first block has stride 1
            net = self._block(net, 256, 3, init_stride=1, is_training=self.is_training,
                              scope="block2")  # -> [batch, 56, 56, 256]
            ##### 3. 4 bottleneck blocks, 3*3 conv width 512/4 = 128; the first conv of the first block has stride 2
            net = self._block(net, 512, 4, is_training=self.is_training, scope="block3")
            # -> [batch, 28, 28, 512]
            ##### 4. 6 bottleneck blocks, 3*3 conv width 1024/4 = 256; the first conv of the first block has stride 2
            net = self._block(net, 1024, 6, is_training=self.is_training, scope="block4")
            # -> [batch, 14, 14, 1024]
            ##### 5. 3 bottleneck blocks, 3*3 conv width 2048/4 = 512; the first conv of the first block has stride 2
            net = self._block(net, 2048, 3, is_training=self.is_training, scope="block5")
            # -> [batch, 7, 7, 2048]
            ##### 6. average pooling
            net = avg_pool(net, 7, scope="avgpool5")  # -> [batch, 1, 1, 2048]
            net = tf.squeeze(net, [1, 2], name="SpatialSqueeze")  # -> [batch, 2048]
            ##### 7. fully connected layer
            self.logits = fc(net, self.num_classes, "fc6")  # -> [batch, num_classes]
            ##### 8. softmax classification, prediction output
            self.predictions = tf.nn.softmax(self.logits)

    # A stage of residual blocks; the default stride of the first block is 2
    def _block(self, x, n_out, n, init_stride=2, is_training=True, scope="block"):
        with tf.variable_scope(scope):
            h_out = n_out // 4  # callers pass 4x the bottleneck width, so divide by 4 here
            # The first block of a stage changes the channel count (and possibly the stride),
            # so it is built separately from the blocks that follow: f(x) + x
            out = self._bottleneck(x, h_out, n_out, stride=init_stride,
                                   is_training=is_training, scope="bottlencek1")
            for i in range(1, n):  # the remaining n-1 blocks
                out = self._bottleneck(out, h_out, n_out, is_training=is_training,
                                       scope=("bottlencek%s" % (i + 1)))
            return out

    # In practice, to reduce the computational cost, the residual block replaces the two 3x3
    # convolution layers with 1x1 + 3x3 + 1x1. The middle 3x3 convolution runs on channels
    # reduced by the first 1x1 convolution, and the second 1x1 convolution restores them,
    # which keeps accuracy while cutting computation.

    # Residual unit: f(x) + x
    def _bottleneck(self, x, h_out, n_out, stride=None, is_training=True, scope="bottleneck"):
        """ A residual bottleneck unit"""
        n_in = x.get_shape()[-1]  # number of input channels
        if stride is None:
            stride = 1 if n_in == n_out else 2  # stride
        with tf.variable_scope(scope):
            # f(x) is built from 1*1 + 3*3 + 1*1 convolutions (in place of two 3x3 convolutions) and added to x
            # first conv (conv2d + batch norm + ReLU), 1*1 kernel
            h = conv2d(x, h_out, 1, stride=stride, scope="conv_1")
            h = batch_norm(h, is_training=is_training, scope="bn_1")
            h = tf.nn.relu(h)
            # second conv (conv2d + batch norm + ReLU), 3*3 kernel
            h = conv2d(h, h_out, 3, stride=1, scope="conv_2")
            h = batch_norm(h, is_training=is_training, scope="bn_2")
            h = tf.nn.relu(h)
            # third conv, 1*1 kernel
            h = conv2d(h, n_out, 1, stride=1, scope="conv_3")
            h = batch_norm(h, is_training=is_training, scope="bn_3")
            if n_in != n_out:
                # when x and f(x) have different channel counts, project x with a 1*1 conv to match f(x)
                shortcut = conv2d(x, n_out, 1, stride=stride, scope="conv_4")
                shortcut = batch_norm(shortcut, is_training=is_training, scope="bn_4")
            else:
                shortcut = x
            return tf.nn.relu(shortcut + h)  # f(x) + w*x

if __name__ == "__main__":
    # 32 images, 224*224, 3 channels
    x = tf.random_normal([32, 224, 224, 3])
    resnet50 = ResNet50(x)
    # one logit per class for each of the 32 images
    print(resnet50.logits)

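8. MobileNet v1: depthwise separable convolution

Idea behind the model simplification

  A standard 3 × 3 × 3 × 16 convolution (3*3 kernel, 3 input channels, 16 output channels)
  =====> a 3 × 3 × 1 × 3 depthwise convolution (three 3*3 kernels, each convolved with exactly one input channel; the per-channel results are stacked into a 3-channel output)
       + a 1 × 1 × 3 × 16 pointwise convolution (1*1 kernel, 3 input channels, 16 output channels)
  Parameter count ratio: 75/432 = 0.17
  3*3*in_channels*out_channels -> BN -> RELU
  =======>
  3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU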
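The 75/432 figure above can be checked with a few lines of arithmetic. This is a minimal sketch that counts weights only (no biases, no BN parameters); the function names are made up for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k x 1 x c_in) plus pointwise (1 x 1 x c_in x c_out) pair."""
    depthwise = k * k * c_in      # one k x k kernel per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


standard = standard_conv_params(3, 3, 16)      # 3*3*3*16
separable = separable_conv_params(3, 3, 16)    # 3*3*3 + 3*16
ratio = separable / standard
print(standard, separable, round(ratio, 2))
```

In general the ratio works out to 1/c_out + 1/k², so for 3*3 kernels and many output channels the separable form needs roughly one ninth of the weights.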

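MobileNet-v1 network structure

  1. Standard convolution layer, 3*3*3*round(32 * width_multiplier): 3*3 kernel, 3 input channels; the number of output channels is round(32 * width_multiplier), i.e. 32 scaled by the width multiplier
  2. 13 depthwise_separable_conv2d layers: 3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
  3. Average pooling layer with a 7*7 kernel, followed by squeeze to drop the size-1 dimensions
  4. Fully connected layer, output -> [N, 1000]
  5. softmax classification, outputs in the range 0 to 1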

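The structure listed above can be traced without TensorFlow to see how the spatial size and channel count evolve. The following is a rough sketch: the (filters, downsample) table is copied from the MobileNet v1 listing in this section, while `DS_LAYERS` and `trace_mobilenet_v1` are names invented here, and SAME padding is assumed so that a stride-2 layer yields ceil(size / 2).

```python
# (output channels before scaling, downsample?) for the 13 depthwise separable layers.
DS_LAYERS = [
    (64, False), (128, True), (128, False), (256, True), (256, False),
    (512, True), (512, False), (512, False), (512, False), (512, False), (512, False),
    (1024, True), (1024, False),
]


def trace_mobilenet_v1(input_size=224, width_multiplier=1.0):
    """Return a list of (layer name, spatial size, channels) for the convolutional part."""
    size = -(-input_size // 2)                      # conv_1 has stride 2 (ceil division)
    channels = round(32 * width_multiplier)
    shapes = [("conv_1", size, channels)]
    for index, (filters, downsample) in enumerate(DS_LAYERS, start=2):
        if downsample:                              # stride 2 in the depthwise conv
            size = -(-size // 2)
        channels = round(filters * width_multiplier)
        shapes.append(("ds_conv_%d" % index, size, channels))
    return shapes


for name, size, channels in trace_mobilenet_v1():
    print("%-11s -> [N, %d, %d, %d]" % (name, size, size, channels))
```

Calling `trace_mobilenet_v1(width_multiplier=0.5)` shows the effect of the width multiplier: the spatial sizes stay the same and every channel count is scaled.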
MobileNet v1 code

Code on GitHub

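#-*- coding: utf-8 -*-
# MobileNets model structure
# Depthwise separable convolution
# MobileNets has 28 layers in total (1 + 2 × 13 + 1 = 28)
# Core idea:
"""
A standard convolution is factored into a depthwise convolution and a pointwise convolution (1 × 1 kernel).
The depthwise convolution applies one kernel to each input channel,
and the 1 × 1 convolution combines the outputs of the per-channel convolutions.
As shown later, this factorization effectively reduces computation and model size.
A standard 3 × 3 × 3 × 16 convolution (3*3 kernel, 3 input channels, 16 output channels)
=====> a 3 × 3 × 1 × 3 depthwise convolution (three 3*3 kernels, each convolved with exactly one input channel; 3-channel output)
     + a 1 × 1 × 3 × 16 pointwise convolution (1*1 kernel, 3 input channels, 16 output channels)
Parameter count ratio: 75/432 = 0.17
3*3*in_channels*out_channels -> BN -> RELU
=======>
3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
"""
# Network structure:
"""
1. Standard convolution layer, 3*3*3*round(32 * width_multiplier): 3*3 kernel, 3 input channels, round(32 * width_multiplier) output channels
2. 13 depthwise_separable_conv2d layers: 3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
3. Average pooling layer with a 7*7 kernel, followed by squeeze to drop the size-1 dimensions
4. Fully connected layer, output -> [N, 1000]
5. softmax classification, outputs in the range 0 to 1
"""
import tensorflow as tf
from tensorflow.python.training import moving_averages

UPDATE_OPS_COLLECTION = "_update_ops_"
#### Select the GPU device when running on a GPU
# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

################################################################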
# create variable, trainable by default
def create_variable(name, shape, initializer,
                    dtype=tf.float32, trainable=True):
    return tf.get_variable(name, shape=shape, dtype=dtype,
                           initializer=initializer, trainable=trainable)

################################################################
# Batch normalization (BN) layer: subtract the mean, divide by the standard deviation
# s1 = W*x + b
# s2 = (s1 - mean(s1)) / sqrt(var(s1) + epsilon)
# s3 = gamma * s2 + beta
def bacthnorm(inputs, scope, epsilon=1e-05, momentum=0.99, is_training=True):
    inputs_shape = inputs.get_shape().as_list()  # input shape
    params_shape = inputs_shape[-1:]  # one parameter per channel
    axis = list(range(len(inputs_shape) - 1))
    with tf.variable_scope(scope):
        beta = create_variable("beta", params_shape,
                               initializer=tf.zeros_initializer())
        gamma = create_variable("gamma", params_shape,
                                initializer=tf.ones_initializer())
        # moving mean, not trainable, used for inference
        moving_mean = create_variable("moving_mean", params_shape,
                                      initializer=tf.zeros_initializer(), trainable=False)
        # moving variance, not trainable, used for inference
        moving_variance = create_variable("moving_variance", params_shape,
                                          initializer=tf.ones_initializer(), trainable=False)
    if is_training:
        mean, variance = tf.nn.moments(inputs, axes=axis)  # batch mean and variance
        # exponential moving average that keeps the previous value: x_t = a * x_(t-1) + (1 - a) * x_now
        update_move_mean = moving_averages.assign_moving_average(moving_mean,
                                                                 mean, decay=momentum)
        update_move_variance = moving_averages.assign_moving_average(moving_variance,
                                                                     variance, decay=momentum)
        tf.add_to_collection(UPDATE_OPS_COLLECTION, update_move_mean)
        tf.add_to_collection(UPDATE_OPS_COLLECTION, update_move_variance)
    else:
        mean, variance = moving_mean, moving_variance
    return tf.nn.batch_normalization(inputs, mean, variance, beta, gamma, epsilon)

################################################################
##### Implements the 3*3*1*in_channels depthwise convolution
# 3*3*in_channels*out_channels -> BN -> RELU
# =======>
# 3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
#########################
# depthwise conv2d layer
def depthwise_conv2d(inputs, scope, filter_size=3, channel_multiplier=1, strides=1):
    inputs_shape = inputs.get_shape().as_list()  # input shape, e.g. 64*64*512
    in_channels = inputs_shape[-1]  # number of input channels (last dimension), e.g. 512
    with tf.variable_scope(scope):
        filter = create_variable("filter", shape=[filter_size, filter_size,
                                                  in_channels, channel_multiplier],
                                 initializer=tf.truncated_normal_initializer(stddev=0.01))
    return tf.nn.depthwise_conv2d(inputs, filter, strides=[1, strides, strides, 1],
                                  padding="SAME", rate=[1, 1])

#################################################################
# standard conv2d layer; arguments: number of output channels, kernel size
def conv2d(inputs, scope, num_filters, filter_size=1, strides=1):
    inputs_shape = inputs.get_shape().as_list()  # input shape, e.g. 64*64*512
    in_channels = inputs_shape[-1]  # number of input channels (last dimension), e.g. 512
    with tf.variable_scope(scope):
        filter = create_variable("filter", shape=[filter_size, filter_size,
                                                  in_channels, num_filters],
                                 initializer=tf.truncated_normal_initializer(stddev=0.01))
    return tf.nn.conv2d(inputs, filter, strides=[1, strides, strides, 1],
                        padding="SAME")

################################################################
# average pooling layer
def avg_pool(inputs, pool_size, scope):
    with tf.variable_scope(scope):
        return tf.nn.avg_pool(inputs, [1, pool_size, pool_size, 1],
                              strides=[1, pool_size, pool_size, 1], padding="VALID")

################################################################
# fully connected layer
def fc(inputs, n_out, scope, use_bias=True):
    inputs_shape = inputs.get_shape().as_list()  # input shape; the input has already been flattened
    n_in = inputs_shape[-1]  # number of input features (last dimension)
    with tf.variable_scope(scope):
        weight = create_variable("weight", shape=[n_in, n_out],
                                 initializer=tf.random_normal_initializer(stddev=0.01))
        if use_bias:  # bias with one entry per output
            bias = create_variable("bias", shape=[n_out,],
                                   initializer=tf.zeros_initializer())
            return tf.nn.xw_plus_b(inputs, weight, bias)  # x*W + b
        return tf.matmul(inputs, weight)  # x*W without bias

################################################################
##### MobileNet model definition ####################################
class MobileNet(object):
    def __init__(self, inputs, num_classes=1000, is_training=True,
                 width_multiplier=1, scope="MobileNet"):
        """
        The implement of MobileNet(ref:https://arxiv.org/abs/1704.04861)
        :param inputs: 4-D Tensor of [batch_size, height, width, channels]
        :param num_classes: number of classes (1000 for ImageNet)
        :param is_training: Boolean, whether or not the model is training
        :param width_multiplier: float in (0, 1], scales the channel counts and so controls the size of model
        :param scope: Optional scope for variables
        """
        self.inputs = inputs  # input data
        self.num_classes = num_classes  # number of classes
        self.is_training = is_training  # training flag
        self.width_multiplier = width_multiplier  # width multiplier for the channel counts
        # construct model
        with tf.variable_scope(scope):
            ######## 1. standard conv layer, stride 2, conv + BN + RELU #########
            ######## 3*3*3*round(32 * width_multiplier): 3*3 kernel, 3 input channels, round(32 * width_multiplier) output channels
            net = conv2d(inputs, "conv_1", round(32 * width_multiplier), filter_size=3,
                         strides=2)  # ->[N, 112, 112, 32]
            net = tf.nn.relu(bacthnorm(net, "conv_1/bn", is_training=self.is_training))  # BN + RELU
            ######## 2. 13 depthwise_separable_conv2d layers ####################
            ######## 3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
            # a. core block, 64 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 64, self.width_multiplier,
                                                   "ds_conv_2")  # ->[N, 112, 112, 64]
            # b. core block, 128 outputs, stride 2, size halved
            net = self._depthwise_separable_conv2d(net, 128, self.width_multiplier,
                                                   "ds_conv_3", downsample=True)  # ->[N, 56, 56, 128]
            # c. core block, 128 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 128, self.width_multiplier,
                                                   "ds_conv_4")  # ->[N, 56, 56, 128]
            # d. core block, 256 outputs, stride 2, size halved
            net = self._depthwise_separable_conv2d(net, 256, self.width_multiplier,
                                                   "ds_conv_5", downsample=True)  # ->[N, 28, 28, 256]
            # e. core block, 256 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 256, self.width_multiplier,
                                                   "ds_conv_6")  # ->[N, 28, 28, 256]
            # f. core block, 512 outputs, stride 2, size halved
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_7", downsample=True)  # ->[N, 14, 14, 512]
            # g. core block, 512 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_8")  # ->[N, 14, 14, 512]
            # h. core block, 512 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_9")  # ->[N, 14, 14, 512]
            # i. core block, 512 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_10")  # ->[N, 14, 14, 512]
            # j. core block, 512 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_11")  # ->[N, 14, 14, 512]
            # k. core block, 512 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 512, self.width_multiplier,
                                                   "ds_conv_12")  # ->[N, 14, 14, 512]
            # l. core block, 1024 outputs, stride 2, size halved
            net = self._depthwise_separable_conv2d(net, 1024, self.width_multiplier,
                                                   "ds_conv_13", downsample=True)  # ->[N, 7, 7, 1024]
            # m. core block, 1024 outputs, stride 1, size unchanged
            net = self._depthwise_separable_conv2d(net, 1024, self.width_multiplier,
                                                   "ds_conv_14")  # ->[N, 7, 7, 1024]
            ######### 3. average pooling with a 7*7 kernel + squeeze
            net = avg_pool(net, 7, "avg_pool_15")  # ->[N, 1, 1, 1024]
            net = tf.squeeze(net, [1, 2], name="SpatialSqueeze")  # [N, 1, 1, 1024] => [N, 1024]
            ######### 4. fully connected layer
            self.logits = fc(net, self.num_classes, "fc_16")  # -> [N, 1000]
            ######### 5. softmax classification, outputs in the range 0 to 1
            self.predictions = tf.nn.softmax(self.logits)

    ###############################################################################
    ######## MobileNet core block
    ######## 3*3*1*in_channels -> BN -> RELU -> 1*1*in_channels*out_channels -> BN -> RELU
    def _depthwise_separable_conv2d(self, inputs, num_filters, width_multiplier,
                                    scope, downsample=False):
        """depthwise separable convolution 2D function"""
        num_filters = round(num_filters * width_multiplier)  # number of output channels
        strides = 2 if downsample else 1  # downsampling decides the stride
        with tf.variable_scope(scope):
            ####### 1. 3*3*1*in_channels depthwise conv2d
            dw_conv = depthwise_conv2d(inputs, "depthwise_conv", strides=strides)
            ####### 2. batch norm
            bn = bacthnorm(dw_conv, "dw_bn", is_training=self.is_training)
            ####### 3. ReLU
            relu = tf.nn.relu(bn)
            ####### 4. pointwise conv2d (1x1): 1*1*in_channels*out_channels
            pw_conv = conv2d(relu, "pointwise_conv", num_filters)
            ####### 5. batch norm
            bn = bacthnorm(pw_conv, "pw_bn", is_training=self.is_training)
            ####### 6. ReLU
            return tf.nn.relu(bn)

if __name__ == "__main__":
    # test data
    inputs = tf.random_normal(shape=[4, 224, 224, 3])  # 4 images, 224*224, 3 channels
    mobileNet = MobileNet(inputs)  # build the model
    writer = tf.summary.FileWriter("./logs", graph=tf.get_default_graph())
    init = tf.global_variables_initializer()  # variable initializer used by sess.run below
    with tf.Session() as sess:
        sess.run(init)
        pred = sess.run(mobileNet.predictions)  # predictions
        print(pred.shape)

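9. MobileNet v2: depthwise separable convolution with a structure borrowed from ResNet

Residual block structure

  f(x) + W*x
  f(x) is two 3x3 convolutions.
  In practice, to reduce the computational cost, the residual block is optimized by replacing the two 3x3 convolution layers with 1x1 + 3x3 + 1x1. The middle 3x3 convolution runs on channels that a first, dimension-reducing 1x1 convolution has cut down, and a second 1x1 convolution restores the channel count, which keeps accuracy while reducing computation.
   _____________________________________>
  |                                     +  f(x) + x
  x-----> 1x1 + 3x3 standard + 1x1 conv ----->
          "compress" → "convolve to extract features" → "expand"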
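To make the saving from the 1x1 + 3x3 + 1x1 bottleneck concrete, here is a small weight-count comparison. It is only a sketch under an explicit assumption: both variants take and produce the same number of channels (256, with a 4x reduction to 64 in the middle, the widths used by block2 of the ResNet50 listing). The helper names are invented for this example, and biases and BN parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def plain_block_params(channels):
    """Two 3x3 convolutions that keep the channel count."""
    return 2 * conv_params(3, channels, channels)


def bottleneck_block_params(channels, reduction=4):
    """1x1 reduce -> 3x3 on the reduced width -> 1x1 restore."""
    mid = channels // reduction
    return (conv_params(1, channels, mid)
            + conv_params(3, mid, mid)
            + conv_params(1, mid, channels))


plain = plain_block_params(256)
bottleneck = bottleneck_block_params(256)
print(plain, bottleneck, round(bottleneck / plain, 3))
```

Under this assumption the bottleneck needs about 6% of the weights of the plain block, because the expensive 3x3 convolution only sees the reduced width.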

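MobileNet v2 core block

  Compared with v1, a 1*1 "expansion" layer is added before the depthwise convolution to raise the channel count and obtain more features; the last layer uses a linear output instead of ReLU so that ReLU does not destroy features.
  Whether x is added depends on the stride of the middle 3x3 DW convolution: x is added when the stride is 1 and not when it is 2.
  1. Stride 1: add x through the shortcut
   ___________________________________>
  |                                     -->  f(x) + x
  x-----> 1x1 + 3x3DW + 1x1 conv ----->
          "expand" → "convolve to extract features" → "compress"
  ResNet is "compress" → "convolve to extract features" → "expand"; MobileNetV2 uses inverted residuals, i.e. "expand" → "convolve to extract features" → "compress"
  2. Stride 2: x is not added
  x-----> 1x1 + 3x3DW (stride 2) + 1x1 conv ----->   output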
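The rules above (expand, depthwise, linear compress, shortcut only at stride 1) can be written down as a small planning function. This is an illustrative sketch that mirrors the logic of `res_block` in ops.py, including its 1x1 projection of x when the channel counts differ; `inverted_residual_plan` itself is a hypothetical helper, not part of that file.

```python
def inverted_residual_plan(in_channels, out_channels, stride, expansion=6, shortcut=True):
    """Describe one block as (layers, shortcut kind).

    Each layer is (description, input channels, output channels).
    """
    hidden = round(expansion * in_channels)   # width of the expanded middle layer
    layers = [
        ("1x1 conv + BN + ReLU6", in_channels, hidden),                       # expand
        ("3x3 depthwise, stride %d + BN + ReLU6" % stride, hidden, hidden),   # extract features
        ("1x1 conv + BN (linear)", hidden, out_channels),                     # compress, no ReLU
    ]
    if shortcut and stride == 1:
        # same channel count: add x directly; otherwise project x with a 1x1 conv first
        kind = "identity" if in_channels == out_channels else "1x1 projection"
    else:
        kind = "none"                                                         # stride 2: output f(x) only
    return layers, kind


layers, kind = inverted_residual_plan(24, 24, stride=1)
for description, c_in, c_out in layers:
    print("%-40s %4d -> %4d" % (description, c_in, c_out))
print("shortcut:", kind)
```

With 24 input channels and expansion 6 the middle layer is 144 channels wide, which is where the "inverted" name comes from: the block is wide in the middle and narrow at both ends.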

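MobileNet v2 structure

1. 2D convolution block = 2D convolution layer + BN + RELU6, 3*3*3*32, stride 2, 32 output channels
2. 1 inverted residual block (depthwise separable), expansion factor 1 (middle width 32*1), 16 output channels
3. 2 inverted residual blocks (depthwise separable), expansion factor 6 (middle width 16*6), 24 output channels
4. 3 inverted residual blocks (depthwise separable), expansion factor 6 (middle width 24*6), 32 output channels
5. 4 inverted residual blocks (depthwise separable), expansion factor 6 (middle width 32*6), 64 output channels
6. 3 inverted residual blocks (depthwise separable), expansion factor 6 (middle width 64*6), 96 output channels
7. 3 inverted residual blocks (depthwise separable), expansion factor 6 (middle width 96*6), 160 output channels
8. 1 inverted residual block (depthwise separable), expansion factor 6 (middle width 160*6), 320 output channels
9. 1 pointwise convolution block = 1*1 PW convolution + BN + RELU6, 1280 output channels
10. Global average pooling (average_pooling2d)
11. 1*1 PW convolution, then flatten to one dimension
12. softmax classification to obtain the class probabilities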
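A quick way to sanity-check the stage list is to trace feature-map sizes through it. The sketch below takes the per-stage strides from the `mobilenetv2()` listing in this section (the first block of the 24, 32, 96 and 160 channel stages uses stride 2) and assumes SAME padding; `STAGES` and `trace_mobilenet_v2` are names made up for the example.

```python
# (expansion, output channels, number of blocks, stride of the first block)
STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 1),
    (6, 96, 3, 2),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def trace_mobilenet_v2(input_size=224):
    """Return a list of (layer name, spatial size, channels)."""
    size = -(-input_size // 2)                   # conv1_1 has stride 2 (ceil division)
    rows = [("conv1_1", size, 32)]
    for stage, (expansion, out_channels, blocks, stride) in enumerate(STAGES, start=2):
        for block in range(1, blocks + 1):
            if block == 1 and stride == 2:       # only the first block of a stage downsamples
                size = -(-size // 2)
            rows.append(("res%d_%d" % (stage, block), size, out_channels))
    rows.append(("conv9_1", size, 1280))         # 1x1 pointwise block
    rows.append(("global_avg", 1, 1280))         # global average pooling
    return rows


for name, size, channels in trace_mobilenet_v2():
    print("%-10s -> [N, %d, %d, %d]" % (name, size, size, channels))
```

For a 224*224 input the last residual block outputs 7*7*320. As far as I recall, the table in the MobileNetV2 paper places the stride-2 step at the 64-channel stage rather than the 96-channel one; the overall 32x downsampling is the same either way.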

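MobileNet v2 code

ops.py

#-*- coding: utf-8 -*-
# Helper functions
"""
Activation function relu6
Batch normalization (BN): subtract the mean, divide by the standard deviation
2D convolution block = 2D convolution layer + BN + RELU6
Pointwise convolution block = 1*1 PW convolution + BN + RELU6
DW depthwise convolution depthwise_conv2d: 3*3*1*in_channels convolution
Inverted residual block (depthwise separable): 1x1 + 3x3DW (stride 1 or 2) + 1x1 convolution
Depthwise separable convolution implemented with the library function
Global average pooling
Flatten to one dimension
Zero padding
"""
import tensorflow as tf

weight_decay = 1e-4

#################################################################################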
# Activation function relu6 = min(max(x, 0), 6): lower bound 0, upper bound 6
# relu = max(x, 0): lower bound 0
def relu(x, name='relu6'):
    return tf.nn.relu6(x, name)

################################################################################
# Batch normalization (BN): subtract the mean, divide by the standard deviation
def batch_norm(x, momentum=0.9, epsilon=1e-5, train=True, name='bn'):
    return tf.layers.batch_normalization(x,
                                         momentum=momentum,
                                         epsilon=epsilon,
                                         scale=True,
                                         training=train,
                                         name=name)

######################################
# 2D convolution layer; arguments: output channels, kernel size, stride, stddev of the initial weights
def conv2d(input_, output_dim, k_h, k_w, d_h, d_w, stddev=0.02, name='conv2d', bias=False):
    with tf.variable_scope(name):
        # truncated-normal initial weights [kernel height, kernel width, input channels, output channels]
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        # 2D convolution
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        # bias
        if bias:
            biases = tf.get_variable('bias', [output_dim], initializer=tf.constant_initializer(0.0))
            conv = tf.nn.bias_add(conv, biases)
        return conv

################################################################################
# 2D convolution block = 2D convolution layer + BN + RELU6
def conv2d_block(input, out_dim, k, s, is_train, name):
    with tf.name_scope(name), tf.variable_scope(name):
        net = conv2d(input, out_dim, k, k, s, s, name='conv2d')  # convolution
        net = batch_norm(net, train=is_train, name='bn')  # batch norm
        net = relu(net)  # activation
        return net

###########################
# 1*1 PW pointwise convolution (1*1 kernel)
def conv_1x1(input, output_dim, name, bias=False):
    with tf.name_scope(name):
        return conv2d(input, output_dim, 1, 1, 1, 1, stddev=0.02, name=name, bias=bias)

################################################################################
# Pointwise convolution block = 1*1 PW convolution + BN + RELU6
def pwise_block(input, output_dim, is_train, name, bias=False):
    with tf.name_scope(name), tf.variable_scope(name):
        out = conv_1x1(input, output_dim, bias=bias, name='pwb')
        out = batch_norm(out, train=is_train, name='bn')  # batch norm
        out = relu(out)  # activation
        return out

################################################################################
##### DW depthwise convolution depthwise_conv2d: 3*3*1*in_channels convolution
def dwise_conv(input, k_h=3, k_w=3, channel_multiplier=1, strides=[1, 1, 1, 1],
               padding='SAME', stddev=0.02, name='dwise_conv', bias=False):
    with tf.variable_scope(name):
        in_channel = input.get_shape().as_list()[-1]  # number of input channels
        # truncated-normal initial weights [kernel height, kernel width, input channels, channel multiplier]
        w = tf.get_variable('w', [k_h, k_w, in_channel, channel_multiplier],
                            regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        # depthwise_conv2d: each kernel is convolved with a single input channel and the results are stacked
        conv = tf.nn.depthwise_conv2d(input, w, strides, padding, rate=None, name=None, data_format=None)
        # bias
        if bias:
            biases = tf.get_variable('bias', [in_channel * channel_multiplier],
                                     initializer=tf.constant_initializer(0.0))
            conv = tf.nn.bias_add(conv, biases)
        return conv

################################################################################
# 1. Stride 1: add x through the shortcut
#   ___________________________________>
#  |                                     -->  f(x) + x   (x may need a 1x1 conv so the channel counts match before adding)
#  x-----> 1x1 + 3x3DW + 1x1 conv ----->
#      "expand" → "convolve to extract features" → "compress"
#  ResNet is "compress" → "convolve to extract features" → "expand"; MobileNetV2 uses inverted residuals,
#  i.e. "expand" → "convolve to extract features" → "compress"
#
# 2. Stride 2: x is not added
# x-----> 1x1 + 3x3DW (stride 2) + 1x1 conv ----->   output
# Inverted residual block (depthwise separable)
def res_block(input, expansion_ratio, output_dim, stride, is_train, name, bias=False, shortcut=True):
    with tf.name_scope(name), tf.variable_scope(name):
        ######## 1. pw 1*1 pointwise convolution (expansion) #########################
        # middle width = expansion_ratio * number of input channels
        bottleneck_dim = round(expansion_ratio * input.get_shape().as_list()[-1])
        net = conv_1x1(input, bottleneck_dim, name='pw', bias=bias)  # pointwise convolution
        net = batch_norm(net, train=is_train, name='pw_bn')  # batch norm
        net = relu(net)  # activation
        ######## 2. dw depthwise convolution depthwise_conv2d, 3*3*1*in_channels ##############
        net = dwise_conv(net, strides=[1, stride, stride, 1], name='dw', bias=bias)
        net = batch_norm(net, train=is_train, name='dw_bn')  # batch norm
        net = relu(net)  # activation
        ######## 3. 1*1 pointwise convolution pw & linear (no ReLU) ############
        net = conv_1x1(net, output_dim, name='pw_linear', bias=bias)
        net = batch_norm(net, train=is_train, name='pw_linear_bn')

        # element wise add, only for stride==1
        if shortcut and stride == 1:  # residual structure: add x
            in_dim = int(input.get_shape().as_list()[-1])  # number of input channels
            if in_dim != output_dim:  # f(x) and x have different channel counts: f(x) + w*x
                ins = conv_1x1(input, output_dim, name='ex_dim')
                net = ins + net  # f(x) + w*x
            else:  # f(x) and x have the same channel count
                net = input + net  # f(x) + x
        # otherwise no residual structure: output f(x) directly
        return net

################################################################################
# Depthwise separable convolution implemented with the library function tf.nn.separable_conv2d
def separable_conv(input, k_size, output_dim, stride, pad='SAME', channel_multiplier=1, name='sep_conv', bias=False):
    with tf.name_scope(name), tf.variable_scope(name):
        in_channel = input.get_shape().as_list()[-1]  # number of input channels
        # dw depthwise kernel parameters
        dwise_filter = tf.get_variable('dw', [k_size, k_size, in_channel, channel_multiplier],
                                       regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                       initializer=tf.truncated_normal_initializer(stddev=0.02))
        # pw 1*1 pointwise kernel parameters
        pwise_filter = tf.get_variable('pw', [1, 1, in_channel * channel_multiplier, output_dim],
                                       regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                       initializer=tf.truncated_normal_initializer(stddev=0.02))
        strides = [1, stride, stride, 1]
        # depthwise convolution followed by pointwise convolution in one library call
        conv = tf.nn.separable_conv2d(input, dwise_filter, pwise_filter, strides, padding=pad, name=name)
        # bias
        if bias:
            biases = tf.get_variable('bias', [output_dim], initializer=tf.constant_initializer(0.0))
            conv = tf.nn.bias_add(conv, biases)
        return conv

################################################################################
# Global average pooling
def global_avg(x):
    with tf.name_scope('global_avg'):
        net = tf.layers.average_pooling2d(x, x.get_shape()[1:-1], 1)
        return net

################################################################################
# Flatten to one dimension
def flatten(x):
    # flattened = tf.reshape(input, [x.get_shape().as_list()[0], -1])  # or, tf.layers.flatten(x)
    return tf.contrib.layers.flatten(x)

################################################################################
# Zero padding
def pad2d(inputs, pad=(0, 0), mode='CONSTANT'):
    paddings = [[0, 0], [pad[0], pad[0]], [pad[1], pad[1]], [0, 0]]
    net = tf.pad(inputs, paddings, mode=mode)
    return net

MobileNet_v2_tf.py

#-*- coding: utf-8 -*-
# MobileNet v2 model structure
# Depthwise separable convolution
"""
1. 2D convolution block = 2D convolution layer + BN + RELU6, 3*3*3*32, stride 2, 32 output channels
2. 1 inverted residual block, expansion factor 1 (middle width 32*1), 16 output channels
3. 2 inverted residual blocks, expansion factor 6 (middle width 16*6), 24 output channels
4. 3 inverted residual blocks, expansion factor 6 (middle width 24*6), 32 output channels
5. 4 inverted residual blocks, expansion factor 6 (middle width 32*6), 64 output channels
6. 3 inverted residual blocks, expansion factor 6 (middle width 64*6), 96 output channels
7. 3 inverted residual blocks, expansion factor 6 (middle width 96*6), 160 output channels
8. 1 inverted residual block, expansion factor 6 (middle width 160*6), 320 output channels
9. 1 pointwise convolution block = 1*1 PW convolution + BN + RELU6, 1280 output channels
10. Global average pooling (average_pooling2d)
11. 1*1 PW convolution, then flatten to one dimension
12. softmax classification to obtain the class probabilities
"""
import tensorflow as tf
from ops import *

## Model structure
def mobilenetv2(inputs, num_classes, is_train=True, reuse=False):
    exp = 6  # expansion ratio
    with tf.variable_scope('mobilenetv2'):
        ####### 1. 2D convolution block, 3*3*3*32, stride 2, 32 output channels ############
        net = conv2d_block(inputs, 32, 3, 2, is_train, name='conv1_1')  # stride 2, size/2
        ####### 2. 1 inverted residual block, expansion 1 (32*1), 16 output channels ############
        net = res_block(net, 1, 16, 1, is_train, name='res2_1')
        ####### 3. 2 inverted residual blocks, expansion 6 (16*6), 24 output channels ############
        net = res_block(net, exp, 24, 2, is_train, name='res3_1')  # stride 2, size/4
        net = res_block(net, exp, 24, 1, is_train, name='res3_2')
        ####### 4. 3 inverted residual blocks, expansion 6 (24*6), 32 output channels ############
        net = res_block(net, exp, 32, 2, is_train, name='res4_1')  # stride 2, size/8
        net = res_block(net, exp, 32, 1, is_train, name='res4_2')
        net = res_block(net, exp, 32, 1, is_train, name='res4_3')
        ####### 5. 4 inverted residual blocks, expansion 6 (32*6), 64 output channels ############
        net = res_block(net, exp, 64, 1, is_train, name='res5_1')
        net = res_block(net, exp, 64, 1, is_train, name='res5_2')
        net = res_block(net, exp, 64, 1, is_train, name='res5_3')
        net = res_block(net, exp, 64, 1, is_train, name='res5_4')
        ####### 6. 3 inverted residual blocks, expansion 6 (64*6), 96 output channels ############
        net = res_block(net, exp, 96, 2, is_train, name='res6_1')  # stride 2, size/16
        net = res_block(net, exp, 96, 1, is_train, name='res6_2')
        net = res_block(net, exp, 96, 1, is_train, name='res6_3')
        ####### 7. 3 inverted residual blocks, expansion 6 (96*6), 160 output channels ###########
        net = res_block(net, exp, 160, 2, is_train, name='res7_1')  # stride 2, size/32
        net = res_block(net, exp, 160, 1, is_train, name='res7_2')
        net = res_block(net, exp, 160, 1, is_train, name='res7_3')
        ####### 8. 1 inverted residual block, expansion 6 (160*6), 320 output channels ##########
        net = res_block(net, exp, 320, 1, is_train, name='res8_1', shortcut=False)  # no residual add, f(x) only
        ####### 9. 1 pointwise convolution block, 1280 output channels ################
        net = pwise_block(net, 1280, is_train, name='conv9_1')
        ####### 10. global average pooling average_pooling2d #########################################
        net = global_avg(net)
        ####### 11. 1*1 PW convolution, then flatten to one dimension ###############################################
        logits = flatten(conv_1x1(net, num_classes, name='logits'))
        ####### 12. softmax classification ###############################################
        pred = tf.nn.softmax(logits, name='prob')
        return logits, pred

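10. Lightweight network: ShuffleNet (grouped pointwise convolution + channel shuffle + depthwise convolution)

10.1 ResNet: a residual network that combines features from different layers

 ________________________________>
|                                 ADD -->  f(x) + x
x-----> 1x1 + 3x3 + 1x1 conv ----->

10.2 MobileNet: ordinary pointwise convolution + depthwise convolution + ordinary pointwise convolution

1. Stride 1: add x through the shortcut
___________________________________>
|                                    ADD -->  f(x) + x
x-----> 1x1 + 3x3DW + 1x1 conv ----->  "expand" → "convolve to extract features" → "compress"
ResNet is "compress" → "convolve to extract features" → "expand"; MobileNetV2 uses inverted residuals, i.e. "expand" → "convolve to extract features" → "compress"
2. Stride 2: x is not added (downsampling version)
x-----> 1x1 + 3x3DW (stride 2) + 1x1 conv ----->   output

10.3 ShuffleNet: ordinary pointwise convolution replaced by grouped pointwise convolution + channel shuffle, followed by depthwise convolution

The ordinary pointwise convolution is still relatively time-consuming.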
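Why grouping helps can be seen from a weight count: splitting a 1x1 convolution into g independent groups divides its weights (and multiply-adds per position) by g. The sketch below is only illustrative; `pointwise_params` is a made-up helper and the 240-channel width is an arbitrary example value, not a figure from this post.

```python
def pointwise_params(c_in, c_out, groups=1):
    """Weights of a 1x1 convolution split into `groups` independent groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # each group maps c_in/groups input channels to c_out/groups output channels
    return groups * (c_in // groups) * (c_out // groups)


for g in (1, 2, 4, 8):
    print("groups=%d -> %d weights" % (g, pointwise_params(240, 240, groups=g)))
```

The price is that each output channel only sees the inputs of its own group, which is the problem the channel shuffle step is meant to address.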

Version 1:

___________________________________________________________>
|                                                            ADD -->  f(x) + x
x-----> 1x1 grouped pointwise conv + channel shuffle + 3x3DW + 1x1 grouped pointwise conv ----->
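The channel shuffle step in the diagram above amounts to a reshape, a transpose and a flatten along the channel axis. Here is a dependency-free sketch on a plain list of channel labels; it follows the same reshape/transpose/reshape order as the `channel_shuffle` function in layers.py, but works on Python lists rather than tensors.

```python
def channel_shuffle(channels, num_groups):
    """Reorder a flat list of channels: reshape to (groups, n), transpose, flatten."""
    if len(channels) % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    # reshape: one row per group
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(num_groups)]
    # transpose: row i now holds the i-th channel of every group
    transposed = [list(column) for column in zip(*grouped)]
    # flatten back to a single channel axis
    return [channel for row in transposed for channel in row]


print(channel_shuffle(list(range(1, 11)), 2))
```

After the shuffle, every block of consecutive channels contains one channel from each original group, so the next grouped convolution mixes information across groups.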

Version 2 (feature-map downsampling):

_____________________3*3AvgPool____________________________>
|                                                            concat -->  f(x) concatenated with x
x-----> 1x1 grouped pointwise conv + channel shuffle + 3x3DW stride 2 + 1x1 grouped pointwise conv ----->

layers.py

#-*- coding:utf-8 -*-
# Implementations of the network layers
#
"""
1. Convolution operation
2. Convolution layer: convolution + BN + RELU + dropout + maxpooling
3. Grouped convolution layer: input and output channels are split evenly across groups; group outputs are concatenated along the channel axis, then BN + ReLU
4. Channel shuffle channel_shuffle: split each group again and interleave parts taken from every group
5. Depthwise convolution operation: each kernel is convolved with only one channel of the input
6. Depthwise convolution layer: depthwise convolution + BN + activation
7. ShuffleNet core unit: 1x1 grouped pointwise conv + channel shuffle + 3x3DW + 1x1 grouped pointwise conv
8. Flatten after the last convolution and before the fully connected layer: (N,H,W,C) ----> (N,D), D = H*W*C
9. Fully connected operation: (N,D)*(D,output_dim) + bias --> (N,output_dim)
10. Fully connected layer: fully connected + BN + activation + dropout
11. Max pooling
12. Average pooling
13. Weight initialization, optionally with an L2 regularization term
14. Parameter summaries: mean, variance, max, min, histogram
"""
import tensorflow as tf
import numpy as np

################################################################################################
# 1. Convolution layer methods ########################################################
def __conv2d_p(name, x, w=None, num_filters=16, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
               initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0, bias=0.0):
    """
    Convolution 2D Wrapper
    :param name: (string) The name scope provided by the upper tf.name_scope('name') as scope.
    :param x: (tf.tensor) The input to the layer (N, H, W, C): batch size, image size, channels.
    :param w: (tf.tensor) pretrained weights (if None, it means no pretrained weights)
    :param num_filters: (integer) No. of filters (This is the output depth)
    :param kernel_size: (integer tuple) The size of the convolving kernel.
    :param padding: (string) The amount of padding required.
    :param stride: (integer tuple) The stride required.
    :param initializer: (tf.contrib initializer) normal or Xavier normal are recommended.
    :param l2_strength:(weight decay) (float) L2 regularization parameter.
    :param bias: (float) Amount of bias. (if not float, it means pretrained bias)
    :return out: The output of the layer. (N, H', W', num_filters)
    """
    with tf.variable_scope(name):
        stride = [1, stride[0], stride[1], 1]  # convolution stride
        kernel_shape = [kernel_size[0], kernel_size[1], x.shape[-1], num_filters]  # kernel shape

        with tf.name_scope('layer_weights'):  # initialize the weights
            if w is None:
                w = __variable_with_weight_decay(kernel_shape, initializer, l2_strength)  # initialize
            __variable_summaries(w)  # record the parameters
        with tf.name_scope('layer_biases'):  # initialize the bias
            if isinstance(bias, float):
                bias = tf.get_variable('biases', [num_filters], initializer=tf.constant_initializer(bias))
            __variable_summaries(bias)  # record the parameters
        with tf.name_scope('layer_conv2d'):
            conv = tf.nn.conv2d(x, w, stride, padding)  # convolution
            out = tf.nn.bias_add(conv, bias)  # add the bias

    return out
###########################################################################################
# 2. Convolution layer: convolution + BN + RELU + dropout + maxpooling ########################################
def conv2d(name, x, w=None, num_filters=16, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
           initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0, bias=0.0,
           activation=None, batchnorm_enabled=False, max_pool_enabled=False, dropout_keep_prob=-1,
           is_training=True):
    """
    This block is responsible for a convolution 2D layer followed by optional (non-linearity, dropout, max-pooling).
    Note that: "is_training" should be passed by a correct value based on being in either training or testing.
    :param name: (string) The name scope provided by the upper tf.name_scope('name') as scope.
    :param x: (tf.tensor) The input to the layer (N, H, W, C).
    :param num_filters: (integer) No. of filters (This is the output depth)
    :param kernel_size: (integer tuple) The size of the convolving kernel.
    :param padding: (string) The amount of padding required.
    :param stride: (integer tuple) The stride required.
    :param initializer: (tf.contrib initializer) The initialization scheme, He et al. normal or Xavier normal are recommended.
    :param l2_strength:(weight decay) (float) L2 regularization parameter.
    :param bias: (float) Amount of bias.
    :param activation: (tf.graph operator) The activation function applied after the convolution operation. If None, linear is applied.
    :param batchnorm_enabled: (boolean) for enabling batch normalization.
    :param max_pool_enabled:  (boolean) for enabling max-pooling 2x2 to decrease width and height by a factor of 2.
    :param dropout_keep_prob: (float) for the probability of keeping neurons. If equals -1, it means no dropout
    :param is_training: (boolean) to diff. between training and testing (important for batch normalization and dropout)
    :return: The output tensor of the layer (N, H', W', C').
    """
    with tf.variable_scope(name) as scope:
        # 2D convolution
        conv_o_b = __conv2d_p('conv', x=x, w=w, num_filters=num_filters, kernel_size=kernel_size, stride=stride,
                              padding=padding,
                              initializer=initializer, l2_strength=l2_strength, bias=bias)
        # BN + activation
        if batchnorm_enabled:
            conv_o_bn = tf.layers.batch_normalization(conv_o_b, training=is_training, epsilon=1e-5)
            if not activation:
                conv_a = conv_o_bn
            else:
                conv_a = activation(conv_o_bn)
        else:
            if not activation:
                conv_a = conv_o_b
            else:
                conv_a = activation(conv_o_b)

        # dropout with the requested keep probability
        def dropout_with_keep():
            return tf.nn.dropout(conv_a, dropout_keep_prob)

        # keep every unit (no dropout)
        def dropout_no_keep():
            return tf.nn.dropout(conv_a, 1.0)

        if dropout_keep_prob != -1:
            # tf.cond expects is_training to be a boolean tensor (e.g. a placeholder) here
            conv_o_dr = tf.cond(is_training, dropout_with_keep, dropout_no_keep)
        else:
            conv_o_dr = conv_a

        conv_o = conv_o_dr
        # max pooling
        if max_pool_enabled:
            conv_o = max_pool_2d(conv_o_dr)

    return conv_o

#######################################################################################
# 3. Grouped convolution layer ###########################################################################
## Input and output channels are split evenly across groups; the group outputs are concatenated along the channel axis, then BN + ReLU
def grouped_conv2d(name, x, w=None, num_filters=16, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
                   initializer=tf.contrib.layers.xavier_initializer(), num_groups=1, l2_strength=0.0, bias=0.0,
                   activation=None, batchnorm_enabled=False, dropout_keep_prob=-1,
                   is_training=True):
    with tf.variable_scope(name) as scope:
        sz = x.get_shape()[3].value // num_groups  # channels per group = total channels / number of groups
        # grouped convolution: each group gets an equal share of the input and output channels
        conv_side_layers = [
            conv2d(name + "_" + str(i), x[:, :, :, i * sz:i * sz + sz], w, num_filters // num_groups, kernel_size,
                   padding,
                   stride,
                   initializer,
                   l2_strength, bias, activation=None,
                   batchnorm_enabled=False, max_pool_enabled=False, dropout_keep_prob=dropout_keep_prob,
                   is_training=is_training) for i in range(num_groups)]
        conv_g = tf.concat(conv_side_layers, axis=-1)  # concatenate the groups along the channel axis
        # BN + activation
        if batchnorm_enabled:
            conv_o_bn = tf.layers.batch_normalization(conv_g, training=is_training, epsilon=1e-5)
            if not activation:
                conv_a = conv_o_bn
            else:
                conv_a = activation(conv_o_bn)
        else:
            if not activation:
                conv_a = conv_g
            else:
                conv_a = activation(conv_g)

        return conv_a

##########################################################################
# 4. Channel shuffle (channel_shuffle) ########################################
##  |||||| |||||| ||||||               groups
##  || || ||   || || ||   || || ||     each group split again into sub-groups
## Take one part from every group and reorder them
def channel_shuffle(name, x, num_groups):
    with tf.variable_scope(name) as scope:
        # Batch size, feature-map height and width, total number of channels (e.g. 1*10)
        n, h, w, c = x.shape.as_list()
        # Split the channel axis into (num_groups, c // num_groups), e.g. 2*5
        x_reshaped = tf.reshape(x, [-1, h, w, num_groups, c // num_groups])
        # Swap the two channel sub-axes, e.g. 2*5 -> 5*2, then flatten back to c channels
        x_transposed = tf.transpose(x_reshaped, [0, 1, 2, 4, 3])
        output = tf.reshape(x_transposed, [-1, h, w, c])
        return output
"""
The basic idea of the shuffle, assuming 2 input groups and 5 output groups:
| group 1    | group 2    |
| 1,2,3,4,5  | 6,7,8,9,10 |
Written as a 2*5 matrix:
1 2 3 4 5
6 7 8 9 10
Transposed into a 5*2 matrix:
1 6
2 7
3 8
4 9
5 10
Flattened:
| group 1 | group 2 | group 3 | group 4 | group 5 |
| 1,6     | 2,7     | 3,8     | 4,9     | 5,10    |
"""
########################################################################
# 5. Depthwise convolution operation (depthwise_conv) #######################################
# Depthwise convolution: each kernel is convolved with exactly one channel of the input
def __depthwise_conv2d_p(name, x, w=None, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
                         initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0, bias=0.0):
    with tf.variable_scope(name):
        # Convolution stride
        stride = [1, stride[0], stride[1], 1]
        kernel_shape = [kernel_size[0], kernel_size[1], x.shape[-1], 1]
        # Initialize the weights
        with tf.name_scope('layer_weights'):
            if w is None:
                w = __variable_with_weight_decay(kernel_shape, initializer, l2_strength)
            __variable_summaries(w)
        # Initialize the biases
        with tf.name_scope('layer_biases'):
            if isinstance(bias, float):
                bias = tf.get_variable('biases', [x.shape[-1]], initializer=tf.constant_initializer(bias))
            __variable_summaries(bias)
        # Depthwise convolution + bias
        with tf.name_scope('layer_conv2d'):
            conv = tf.nn.depthwise_conv2d(x, w, stride, padding)
            out = tf.nn.bias_add(conv, bias)

    return out
###############################################################################
# 6. Depthwise convolution layer ########################################################
# Depthwise convolution + BN + activation
def depthwise_conv2d(name, x, w=None, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
                     initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0, bias=0.0, activation=None,
                     batchnorm_enabled=False, is_training=True):
    with tf.variable_scope(name) as scope:
        # Depthwise convolution operation (DW conv)
        conv_o_b = __depthwise_conv2d_p(name='conv', x=x, w=w, kernel_size=kernel_size, padding=padding,
                                        stride=stride, initializer=initializer, l2_strength=l2_strength, bias=bias)
        # BN + activation
        if batchnorm_enabled:
            conv_o_bn = tf.layers.batch_normalization(conv_o_b, training=is_training, epsilon=1e-5)
            if not activation:
                conv_a = conv_o_bn
            else:
                conv_a = activation(conv_o_bn)
        else:
            if not activation:
                conv_a = conv_o_b
            else:
                conv_a = activation(conv_o_b)
    return conv_a
############################################################################################################
# 7. ShuffleNet unit methods (core module)
#
'''
ShuffleNet replaces the ordinary pointwise convolution with grouped pointwise convolution + channel shuffle,
combined with depthwise convolution, because the ordinary pointwise convolution is still fairly expensive.

Version 1:
 ___________________________________________________________________________________>
|                                                                                     ADD --> f(x) + x
x-----> 1x1 grouped pointwise conv + channel shuffle + 3x3 DW + 1x1 grouped pointwise conv ----->

Version 2 (feature-map downsampling):
 _____________________________3*3 AvgPool___________________________________________>
|                                                                                     concat --> f(x) concatenated with x
x-----> 1x1 grouped pointwise conv + channel shuffle + 3x3 DW stride 2 + 1x1 grouped pointwise conv ----->
'''
############################################################################################################
def shufflenet_unit(name, x, w=None, num_groups=1, group_conv_bottleneck=True, num_filters=16, stride=(1, 1),
                    l2_strength=0.0, bias=0.0, batchnorm_enabled=True, is_training=True, fusion='add'):
    # Paper parameters. If you want to change them feel free to pass them as method parameters.
    activation = tf.nn.relu  # ReLU activation, max(0, x)

    with tf.variable_scope(name) as scope:
        residual = x  # the shortcut branch x of the residual unit
        bottleneck_filters = (num_filters // 4) if fusion == 'add' else (num_filters - residual.get_shape()[
            3].value) // 4

        if group_conv_bottleneck:
            # 1x1 grouped pointwise convolution
            bottleneck = grouped_conv2d('Gbottleneck', x=x, w=None, num_filters=bottleneck_filters, kernel_size=(1, 1),
                                        padding='VALID',
                                        num_groups=num_groups, l2_strength=l2_strength, bias=bias,
                                        activation=activation,
                                        batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            # Channel shuffle
            shuffled = channel_shuffle('channel_shuffle', bottleneck, num_groups)
        else:
            # Ordinary (non-grouped) pointwise convolution
            bottleneck = conv2d('bottleneck', x=x, w=None, num_filters=bottleneck_filters, kernel_size=(1, 1),
                                padding='VALID', l2_strength=l2_strength, bias=bias, activation=activation,
                                batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            shuffled = bottleneck
        # Zero-pad height and width by 1 on each side
        padded = tf.pad(shuffled, [[0, 0], [1, 1], [1, 1], [0, 0]], "CONSTANT")
        # Depthwise convolution layer
        depthwise = depthwise_conv2d('depthwise', x=padded, w=None, stride=stride, l2_strength=l2_strength,
                                     padding='VALID', bias=bias,
                                     activation=None, batchnorm_enabled=batchnorm_enabled, is_training=is_training)
        if stride == (2, 2):
            # Downsampling mode (depthwise stride 2): the shortcut branch must be downsampled as well,
            # using 3x3 average pooling with stride 2
            residual_pooled = avg_pool_2d(residual, size=(3, 3), stride=stride, padding='SAME')
        else:
            # No downsampling: the feature-map size is unchanged
            residual_pooled = residual

        if fusion == 'concat':
            # Second 1x1 grouped pointwise convolution, concatenated with the shortcut along the channel axis, then ReLU
            group_conv1x1 = grouped_conv2d('Gconv1x1', x=depthwise, w=None,
                                           num_filters=num_filters - residual.get_shape()[3].value,
                                           kernel_size=(1, 1),
                                           padding='VALID',
                                           num_groups=num_groups, l2_strength=l2_strength, bias=bias,
                                           activation=None,
                                           batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            return activation(tf.concat([residual_pooled, group_conv1x1], axis=-1))
        elif fusion == 'add':
            # Second 1x1 grouped pointwise convolution, added element-wise to the shortcut, then ReLU
            group_conv1x1 = grouped_conv2d('Gconv1x1', x=depthwise, w=None,
                                           num_filters=num_filters,
                                           kernel_size=(1, 1),
                                           padding='VALID',
                                           num_groups=num_groups, l2_strength=l2_strength, bias=bias,
                                           activation=None,
                                           batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            # Element-wise addition requires x and F(x) to have the same number of channels
            residual_match = residual_pooled
            # This is used if the number of filters of the residual block is different from that
            # of the group convolution.
            if num_filters != residual_pooled.get_shape()[3].value:
                # Convolve x once more so that its channel count matches that of F(x)
                residual_match = conv2d('residual_match', x=residual_pooled, w=None, num_filters=num_filters,
                                        kernel_size=(1, 1),
                                        padding='VALID', l2_strength=l2_strength, bias=bias, activation=None,
                                        batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            return activation(group_conv1x1 + residual_match)
        else:
            raise ValueError("Specify whether the fusion is \'concat\' or \'add\'")
############################################################################################################
# Fully Connected layer Methods ########################
#########################################################################################################################
# 8. Flatten after the last convolution, before the fully connected layer: (N,H,W,C) ----> (N,D), D = H*W*C
###### Convolution output (N,H,W,C) ----> (N,D) ----> fully connected (N,D)*(D,output_dim) ----> (N,output_dim)
def flatten(x):
    """
    Flatten a (N,H,W,C) input into (N,D) output. Used for fully connected layers after convolution layers
    :param x: (tf.tensor) representing input
    :return: flattened output
    """
    # D = H*W*C
    all_dims_exc_first = np.prod([v.value for v in x.get_shape()[1:]])
    o = tf.reshape(x, [-1, all_dims_exc_first])
    return o
###################################################################
# 9. Fully connected operation: (N,D)*(D,output_dim) + bias --> (N,output_dim) ##
def __dense_p(name, x, w=None, output_dim=128, initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0,
              bias=0.0):
    """
    Fully connected layer
    :param name: (string) The name scope provided by the upper tf.name_scope('name') as scope.
    :param x: (tf.tensor) The input to the layer (N, D).
    :param output_dim: (integer) It specifies H, the output second dimension of the fully connected layer [ie:(N, H)]
    :param initializer: (tf.contrib initializer) The initialization scheme, He et al. normal or Xavier normal are recommended.
    :param l2_strength:(weight decay) (float) L2 regularization parameter.
    :param bias: (float) Amount of bias. (if not float, it means pretrained bias)
    :return out: The output of the layer. (N, H)
    """
    # Size of the last dimension of the (N, D) input, i.e. D
    n_in = x.get_shape()[-1].value
    with tf.variable_scope(name):
        if w is None:
            w = __variable_with_weight_decay([n_in, output_dim], initializer, l2_strength)
        __variable_summaries(w)
        if isinstance(bias, float):
            bias = tf.get_variable("layer_biases", [output_dim], tf.float32, tf.constant_initializer(bias))
        __variable_summaries(bias)
        output = tf.nn.bias_add(tf.matmul(x, w), bias)
        return output
##############################################################
# 10. Fully connected layer: fully connected + BN + activation + dropout
def dense(name, x, w=None, output_dim=128, initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0,
          bias=0.0,
          activation=None, batchnorm_enabled=False, dropout_keep_prob=-1,
          is_training=True):
    """
    This block is responsible for a fully connected followed by optional (non-linearity, dropout, max-pooling).
    Note that: "is_training" should be passed by a correct value based on being in either training or testing.
    :param name: (string) The name scope provided by the upper tf.name_scope('name') as scope.
    :param x: (tf.tensor) The input to the layer (N, D).
    :param output_dim: (integer) It specifies H, the output second dimension of the fully connected layer [ie:(N, H)]
    :param initializer: (tf.contrib initializer) The initialization scheme, He et al. normal or Xavier normal are recommended.
    :param l2_strength:(weight decay) (float) L2 regularization parameter.
    :param bias: (float) Amount of bias.
    :param activation: (tf.graph operator) The activation function applied after the convolution operation. If None, linear is applied.
    :param batchnorm_enabled: (boolean) for enabling batch normalization.
    :param dropout_keep_prob: (float) for the probability of keeping neurons. If equals -1, it means no dropout
    :param is_training: (boolean) to differentiate between training and testing (important for batch normalization and dropout)
    :return out: The output of the layer. (N, H)
    """
    with tf.variable_scope(name) as scope:
        # Fully connected operation: (N,D)*(D,output_dim) + bias --> (N,output_dim)
        dense_o_b = __dense_p(name='dense', x=x, w=w, output_dim=output_dim, initializer=initializer,
                              l2_strength=l2_strength,
                              bias=bias)
        # BN + activation
        if batchnorm_enabled:
            dense_o_bn = tf.layers.batch_normalization(dense_o_b, training=is_training, epsilon=1e-5)
            if not activation:
                dense_a = dense_o_bn
            else:
                dense_a = activation(dense_o_bn)
        else:
            if not activation:
                dense_a = dense_o_b
            else:
                dense_a = activation(dense_o_b)

        # Dropout
        def dropout_with_keep():
            return tf.nn.dropout(dense_a, dropout_keep_prob)

        def dropout_no_keep():
            return tf.nn.dropout(dense_a, 1.0)

        if dropout_keep_prob != -1:
            dense_o_dr = tf.cond(is_training, dropout_with_keep, dropout_no_keep)
        else:
            dense_o_dr = dense_a

        dense_o = dense_o_dr
    return dense_o
############################################################################################################
# Pooling Methods: max pooling and average pooling
###########################################################################
# 11. Max pooling
def max_pool_2d(x, size=(2, 2), stride=(2, 2), name='pooling'):
    """
    Max pooling 2D Wrapper
    :param x: (tf.tensor) The input to the layer (N,H,W,C).
    :param size: (tuple) This specifies the size of the filter as well as the stride.
    :param name: (string) Scope name.
    :return: The output is the same input but halved in both width and height (N,H/2,W/2,C).
    """
    size_x, size_y = size
    stride_x, stride_y = stride
    return tf.nn.max_pool(x, ksize=[1, size_x, size_y, 1], strides=[1, stride_x, stride_y, 1], padding='VALID',
                          name=name)
###############################
# 12. Average pooling ##############
def avg_pool_2d(x, size=(2, 2), stride=(2, 2), name='avg_pooling', padding='VALID'):
    """
    Average pooling 2D Wrapper
    :param x: (tf.tensor) The input to the layer (N,H,W,C).
    :param size: (tuple) This specifies the size of the filter as well as the stride.
    :param name: (string) Scope name.
    :return: The output is the same input but halved in both width and height (N,H/2,W/2,C).
    """
    size_x, size_y = size
    stride_x, stride_y = stride
    return tf.nn.avg_pool(x, ksize=[1, size_x, size_y, 1], strides=[1, stride_x, stride_y, 1], padding=padding,
                          name=name)
############################################################################################################
# 13. Weight initialization, optionally with an L2 regularization term
#######################################
def __variable_with_weight_decay(kernel_shape, initializer, wd):
    """
    Create a variable with L2 Regularization (Weight Decay)
    :param kernel_shape: the size of the convolving weight kernel.
    :param initializer: The initialization scheme, He et al. normal or Xavier normal are recommended.
    :param wd:(weight decay) L2 regularization parameter.
    :return: The weights of the kernel initialized. The L2 loss is added to the loss collection.
    """
    w = tf.get_variable('weights', kernel_shape, tf.float32, initializer=initializer)

    collection_name = tf.GraphKeys.REGULARIZATION_LOSSES
    if wd and (not tf.get_variable_scope().reuse):
        weight_decay = tf.multiply(tf.nn.l2_loss(w), wd, name='w_loss')
        tf.add_to_collection(collection_name, weight_decay)
    return w
#######################################################################
# 14. Variable summaries: mean, standard deviation, max, min, histogram
# Summaries for variables
def __variable_summaries(var):
    """
    Attach a lot of summaries to a Tensor (for TensorBoard visualization).
    :param var: variable to be summarized
    :return: None
    """
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)  # mean
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):  # standard deviation
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))  # maximum
        tf.summary.scalar('min', tf.reduce_min(var))  # minimum
        tf.summary.histogram('histogram', var)  # histogram
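
The reshape, transpose, flatten sequence that channel_shuffle applies to the channel axis can be checked without TensorFlow. The following is a minimal pure-Python sketch under stated assumptions: the helper name shuffle_channels is hypothetical (it is not part of layers.py), and it works on a flat list of channel ids instead of an (N, H, W, C) tensor.

```python
def shuffle_channels(channels, num_groups):
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    if len(channels) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = len(channels) // num_groups
    # Reshape: row g holds the channels that belong to group g.
    rows = [channels[g * per_group:(g + 1) * per_group] for g in range(num_groups)]
    # Transpose and flatten: read the (groups, per_group) matrix column by column.
    return [rows[g][i] for i in range(per_group) for g in range(num_groups)]


# Same example as the comment block in layers.py: 10 channels, 2 groups.
print(shuffle_channels(list(range(1, 11)), num_groups=2))
```

After the shuffle, every consecutive pair of channels contains one channel from each original group, so the next grouped convolution sees information from all groups.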

ShuffleNet_tf.py

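#-*- coding:utf-8 -*-
# Lightweight network -- ShuffleNet: grouped pointwise convolution + channel shuffle + depthwise convolution
"""
0. Image preprocessing: subtract the mean, multiply by the normalization scale
1. conv1: 3*3*3*24 convolution, stride 2, BN, ReLU
2. Max pooling: 3*3, stride 2
3. One stride-2 unit with non-grouped pointwise convolution and concat fusion, then 3 stride-1 units with add fusion
4. One stride-2 unit with grouped pointwise convolution and concat fusion, then 7 stride-1 units with add fusion
5. One stride-2 unit with grouped pointwise convolution and concat fusion, then 3 stride-1 units with add fusion
6. Global average pooling: 7*7 kernel, stride 1
7. 1*1 pointwise convolution that outputs one feature map per class
8. Flatten to one dimension
"""
# layers
"""
1. Convolution operation
2. Convolution layer: convolution + BN + ReLU + dropout + max pooling
3. Grouped convolution layer: input and output channels split evenly per group, group outputs concatenated along channels + BN + ReLU
4. Channel shuffle (channel_shuffle): split groups into sub-groups, take one part from every group and reorder
5. Depthwise convolution operation: each kernel is convolved with exactly one input channel
6. Depthwise convolution layer: depthwise convolution + BN + activation
7. ShuffleNet core module: 1x1 grouped pointwise conv + channel shuffle + 3x3 DW + 1x1 grouped pointwise conv
8. Flatten after the last convolution, before the fully connected layer: (N,H,W,C) ----> (N,D), D = H*W*C
9. Fully connected operation: (N,D)*(D,output_dim) + bias --> (N,output_dim)
10. Fully connected layer: fully connected + BN + activation + dropout
11. Max pooling
12. Average pooling
13. Weight initialization, optionally with an L2 regularization term
14. Variable summaries: mean, standard deviation, max, min, histogram
"""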
import tensorflow as tf
# ShuffleNet unit, 2D convolution, max pooling, average pooling, fully connected layer, flatten
from layers import shufflenet_unit, conv2d, max_pool_2d, avg_pool_2d, dense, flatten


class ShuffleNet:
    """ShuffleNet is implemented here!"""
    MEAN = [103.94, 116.78, 123.68]  # per-channel mean subtracted from the pixel values
    NORMALIZER = 0.017  # normalization scale

    def __init__(self, args):
        self.args = args
        self.X = None
        self.y = None
        self.logits = None
        self.is_training = None
        self.loss = None
        self.regularization_loss = None
        self.cross_entropy_loss = None
        self.train_op = None
        self.accuracy = None
        self.y_out_argmax = None
        self.summaries_merged = None

        # A number stands for the num_groups
        # Output channels for conv1 layer
        self.output_channels = {'1': [144, 288, 576], '2': [200, 400, 800], '3': [240, 480, 960], '4': [272, 544, 1088],
                                '8': [384, 768, 1536], 'conv1': 24}

        self.__build()

    # Initialize the inputs
    def __init_input(self):
        batch_size = self.args.batch_size if self.args.train_or_test == 'train' else 1
        with tf.variable_scope('input'):
            # Input images: batch size * height * width * channels
            self.X = tf.placeholder(tf.float32,
                                    [batch_size, self.args.img_height, self.args.img_width,
                                     self.args.num_channels])
            # Labels of the input images
            self.y = tf.placeholder(tf.int32, [batch_size])
            # is_training is for batch normalization and dropout, if they exist
            self.is_training = tf.placeholder(tf.bool)

    # Resize the images
    def __resize(self, x):
        # Bicubic interpolation
        return tf.image.resize_bicubic(x, [224, 224])

    # One stride-2 downsampling unit with concat fusion, followed by `repeat` stride-1 units with add fusion
    def __stage(self, x, stage=2, repeat=3):
        if 2 <= stage <= 4:
            stage_layer = shufflenet_unit('stage' + str(stage) + '_0', x=x, w=None,
                                          num_groups=self.args.num_groups,
                                          # The first unit of stage 2 does not use a grouped pointwise convolution
                                          group_conv_bottleneck=not (stage == 2),
                                          num_filters=
                                          self.output_channels[str(self.args.num_groups)][
                                              stage - 2],
                                          stride=(2, 2),
                                          # Concatenate along the channel axis
                                          fusion='concat', l2_strength=self.args.l2_strength,
                                          bias=self.args.bias,
                                          batchnorm_enabled=self.args.batchnorm_enabled,
                                          is_training=self.is_training)
            for i in range(1, repeat + 1):
                stage_layer = shufflenet_unit('stage' + str(stage) + '_' + str(i),
                                              x=stage_layer, w=None,
                                              num_groups=self.args.num_groups,
                                              # Grouped pointwise convolution
                                              group_conv_bottleneck=True,
                                              num_filters=self.output_channels[
                                                  str(self.args.num_groups)][stage - 2],
                                              stride=(1, 1),
                                              # Element-wise addition
                                              fusion='add',
                                              l2_strength=self.args.l2_strength,
                                              bias=self.args.bias,
                                              batchnorm_enabled=self.args.batchnorm_enabled,
                                              is_training=self.is_training)
            return stage_layer
        else:
            raise ValueError("Stage should be from 2 -> 4")

    # Outputs
    def __init_output(self):
        with tf.variable_scope('output'):
            # Losses
            self.regularization_loss = tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
            self.cross_entropy_loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.logits, labels=self.y, name='loss'))
            self.loss = self.regularization_loss + self.cross_entropy_loss

            # Optimizer
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies(update_ops):
                self.optimizer = tf.train.AdamOptimizer(learning_rate=self.args.learning_rate)
                self.train_op = self.optimizer.minimize(self.loss)
                # This is for debugging NaNs. Check TensorFlow documentation.
                self.check_op = tf.add_check_numerics_ops()

            # Output and Metrics
            self.y_out_softmax = tf.nn.softmax(self.logits)  # softmax class probabilities
            self.y_out_argmax = tf.argmax(self.y_out_softmax, axis=-1, output_type=tf.int32)  # predicted class
            self.accuracy = tf.reduce_mean(tf.cast(tf.equal(self.y, self.y_out_argmax), tf.float32))  # accuracy

        # Summaries
        with tf.name_scope('train-summary-per-iteration'):
            tf.summary.scalar('loss', self.loss)
            tf.summary.scalar('acc', self.accuracy)
            self.summaries_merged = tf.summary.merge_all()

    def __build(self):
        self.__init_global_epoch()
        self.__init_global_step()
        self.__init_input()

        # 0. Image preprocessing: subtract the mean, multiply by the normalization scale
        with tf.name_scope('Preprocessing'):
            # Split into the three color channels
            red, green, blue = tf.split(self.X, num_or_size_splits=3, axis=3)
            # Subtract the mean of each channel, scale it, and concatenate the channels again in BGR order
            preprocessed_input = tf.concat([
                tf.subtract(blue, ShuffleNet.MEAN[0]) * ShuffleNet.NORMALIZER,
                tf.subtract(green, ShuffleNet.MEAN[1]) * ShuffleNet.NORMALIZER,
                tf.subtract(red, ShuffleNet.MEAN[2]) * ShuffleNet.NORMALIZER,
            ], 3)

        # 1. conv1: 3*3*3*24 convolution, stride 2, BN, ReLU
        # Zero-pad height and width by 1 on each side
        x_padded = tf.pad(preprocessed_input, [[0, 0], [1, 1], [1, 1], [0, 0]], "CONSTANT")
        conv1 = conv2d('conv1', x=x_padded, w=None, num_filters=self.output_channels['conv1'], kernel_size=(3, 3),
                       stride=(2, 2), l2_strength=self.args.l2_strength, bias=self.args.bias,
                       batchnorm_enabled=self.args.batchnorm_enabled, is_training=self.is_training,
                       activation=tf.nn.relu, padding='VALID')

        # 2. Max pooling: 3*3, stride 2
        padded = tf.pad(conv1, [[0, 0], [0, 1], [0, 1], [0, 0]], "CONSTANT")
        max_pool = max_pool_2d(padded, size=(3, 3), stride=(2, 2), name='max_pool')

        # 3. One stride-2 unit with non-grouped pointwise convolution and concat fusion, then 3 stride-1 add units
        stage2 = self.__stage(max_pool, stage=2, repeat=3)
        # 4. One stride-2 unit with grouped pointwise convolution and concat fusion, then 7 stride-1 add units
        stage3 = self.__stage(stage2, stage=3, repeat=7)
        # 5. One stride-2 unit with grouped pointwise convolution and concat fusion, then 3 stride-1 add units
        stage4 = self.__stage(stage3, stage=4, repeat=3)

        # 6. Global average pooling: 7*7 kernel, stride 1
        global_pool = avg_pool_2d(stage4, size=(7, 7), stride=(1, 1), name='global_pool', padding='VALID')

        # 7. 1*1 pointwise convolution that outputs one feature map per class
        logits_unflattened = conv2d('fc', global_pool, w=None, num_filters=self.args.num_classes,
                                    kernel_size=(1, 1),
                                    l2_strength=self.args.l2_strength,
                                    bias=self.args.bias,
                                    is_training=self.is_training)
        # 8. Flatten to one dimension
        self.logits = flatten(logits_unflattened)

        # 9. Compute the loss
        self.__init_output()

    def __init_global_epoch(self):
        """
        Create a global epoch tensor to totally save the process of the training
        :return:
        """
        with tf.variable_scope('global_epoch'):
            self.global_epoch_tensor = tf.Variable(-1, trainable=False, name='global_epoch')
            self.global_epoch_input = tf.placeholder('int32', None, name='global_epoch_input')
            self.global_epoch_assign_op = self.global_epoch_tensor.assign(self.global_epoch_input)

    def __init_global_step(self):
        """
        Create a global step variable to be a reference to the number of iterations
        :return:
        """
        with tf.variable_scope('global_step'):
            self.global_step_tensor = tf.Variable(0, trainable=False, name='global_step')
            self.global_step_input = tf.placeholder('int32', None, name='global_step_input')
            self.global_step_assign_op = self.global_step_tensor.assign(self.global_step_input)
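
To see why the global average pool in ShuffleNet_tf.py uses a 7*7 kernel, the feature-map size can be traced through the network with plain arithmetic. The sketch below is only an illustration under stated assumptions: the helper names conv_out and shufflenet_shapes are hypothetical, the input is assumed to be 224*224, and the padding amounts and channel table are copied from the listing above.

```python
def conv_out(size, kernel, stride, pad_total=0):
    """Output length of a 'VALID' convolution or pooling after adding pad_total zeros."""
    return (size + pad_total - kernel) // stride + 1


def shufflenet_shapes(num_groups=3, input_size=224):
    """Return (layer name, spatial size, channels) for conv1, the max pool and stages 2-4."""
    # Stage output channels, copied from ShuffleNet.output_channels.
    output_channels = {1: [144, 288, 576], 2: [200, 400, 800], 3: [240, 480, 960],
                       4: [272, 544, 1088], 8: [384, 768, 1536]}
    shapes = []
    # conv1: pad 1 on each side, 3x3 kernel, stride 2, 24 filters.
    size = conv_out(input_size, kernel=3, stride=2, pad_total=2)
    shapes.append(("conv1", size, 24))
    # Max pool: pad 1 on one side only, 3x3 kernel, stride 2.
    size = conv_out(size, kernel=3, stride=2, pad_total=1)
    shapes.append(("max_pool", size, 24))
    for i, channels in enumerate(output_channels[num_groups]):
        # The first unit of each stage pads by 1 per side and runs a stride-2 3x3 depthwise conv;
        # the following stride-1 units keep the spatial size.
        size = conv_out(size, kernel=3, stride=2, pad_total=2)
        shapes.append(("stage%d" % (i + 2), size, channels))
    return shapes


for name, size, channels in shufflenet_shapes():
    print(name, size, size, channels)
```

With num_groups=3 the first unit of stage 2 receives 24 channels, so its grouped branch produces 240 - 24 = 216 channels and the concat with the pooled shortcut restores the 240 channels listed for that stage.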

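11. SqueezeNet: 1*1 convolutions reduce the channel count

11.1 Common model compression techniques:

1. Singular value decomposition (SVD)
2. Network pruning: prune the network and store the result as sparse matrices
3. Deep compression: network pruning, quantization and Huffman coding
4. Hardware accelerators

Common search methods for exploring the network design space:

1. Bayesian optimization
2. Simulated annealing
3. Random search
4. Genetic algorithms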

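11.2 SqueezeNet: design strategies for reducing model parameters

SqueezeNet uses the following three strategies to reduce the number of parameters:

1. Replace 3*3 convolutions with 1*1 convolutions: a 1*1 filter has 1/9 of the parameters of a 3*3 filter.
2. Reduce the number of input channels seen by the 3*3 filters: this is what the squeeze layers are for.
3. Downsample late in the network so that the convolution layers work on larger activation maps: larger activation maps retain more information and can give higher classification accuracy.

Strategies 1 and 2 cut the parameter count substantially; strategy 3 raises accuracy under a limited parameter budget.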
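
The effect of strategies 1 and 2 on the parameter count can be sketched with simple arithmetic. The helper conv_weights is hypothetical, biases are ignored, and the channel counts (64 in, 64 out, squeezed to 16) are illustrative assumptions rather than values from the paper.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# Strategy 1: a 1x1 convolution needs 1/9 of the weights of a 3x3 convolution.
weights_3x3 = conv_weights(3, 64, 64)
weights_1x1 = conv_weights(1, 64, 64)
print(weights_3x3, weights_1x1, weights_3x3 // weights_1x1)

# Strategy 2: squeeze 64 channels down to 16 with a 1x1 convolution before the 3x3 convolution.
squeezed = conv_weights(1, 64, 16) + conv_weights(3, 16, 64)
print(squeezed)
```

Even after paying for the extra 1x1 squeeze convolution, the squeezed variant uses well under a third of the weights of the direct 3x3 convolution in this example.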

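The core module of SqueezeNet: the Fire Module

1. The squeeze convolution layer is built from 1*1 filters only (parameter reduction, strategy 1).
2. The expand layer is built from a mix of 1*1 and 3*3 filters.
3. Three filter counts are tunable:
   s1: the number of 1*1 filters in the squeeze convolution layer
   e1: the number of 1*1 filters in the expand layer
   e2: the number of 3*3 filters in the expand layer
4. s1 < e1 + e2 (parameter reduction, strategy 2).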
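
As a rough check of how much a Fire Module saves, the weight count can be written in terms of s1, e1 and e2. The functions below are a hypothetical sketch (biases ignored); the example numbers are those of the fire2 module described in section 11.5 (96 input channels, s1 = 16, e1 = e2 = 64).

```python
def fire_weights(in_channels, s1, e1, e2):
    """Weights of a Fire Module: 1x1 squeeze, then parallel 1x1 and 3x3 expand branches."""
    squeeze = 1 * 1 * in_channels * s1   # squeeze layer: s1 filters of size 1x1
    expand_1x1 = 1 * 1 * s1 * e1         # expand layer: e1 filters of size 1x1
    expand_3x3 = 3 * 3 * s1 * e2         # expand layer: e2 filters of size 3x3
    return squeeze + expand_1x1 + expand_3x3


def plain_3x3_weights(in_channels, out_channels):
    """Weights of a single 3x3 convolution with the same input and output depth."""
    return 3 * 3 * in_channels * out_channels


fire2 = fire_weights(in_channels=96, s1=16, e1=64, e2=64)
plain = plain_3x3_weights(in_channels=96, out_channels=64 + 64)
print(fire2, plain)
```

Because s1 is much smaller than the input depth, the 3*3 filters only see 16 channels, which is where most of the saving comes from.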

11.3 Fire Module structure

                                        |-----> 1*1 conv -----|
input -----> 1*1 conv (all channels) ---|                     |---> concat along channels -----> output
                                        |-----> 3*3 conv -----|

Difference from the Inception module

                          part of the output -----> 1*1 conv -----|
input -----> 1*1 conv ----|                                        |---> concat along channels -----> output
                          part of the output -----> 3*3 conv -----|

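11.4 SqueezeNet network structure

1. SqueezeNet starts with a convolution layer (conv1),
2. followed by 8 Fire modules (fire2-9),
3. and ends the convolutional feature extraction with a convolution layer (conv10).
4. Global average pooling + softmax classification then produce the result.

The number of filters per Fire module increases gradually through the network. Max-pooling with stride 2 is applied after conv1, fire4 and fire8, and global average pooling follows conv10.

Placing the pooling layers relatively late in the network is an application of strategy 3 above.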
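
The late downsampling can be made concrete by tracing the spatial size through the three max-pooling layers. This is a minimal sketch with hypothetical helper names; it assumes a 224*224 input, 'SAME' padding for conv1 and unpadded 3*3 stride-2 pooling, which are the choices made in the SqueezeNet_tf.py listing of this article.

```python
import math


def same_out(size, stride):
    """Output length of a convolution with 'SAME' padding."""
    return math.ceil(size / stride)


def valid_out(size, kernel, stride):
    """Output length of a convolution or pooling without padding ('VALID')."""
    return (size - kernel) // stride + 1


def squeezenet_sizes(input_size=224):
    sizes = {}
    size = same_out(input_size, stride=2)          # conv1: 7x7, stride 2, SAME
    sizes["conv1"] = size
    for name in ("maxpool1", "maxpool4", "maxpool8"):
        size = valid_out(size, kernel=3, stride=2)  # 3x3 max pooling, stride 2
        sizes[name] = size
    # Global average pooling with a 13x13 kernel and stride 1.
    sizes["avgpool10"] = valid_out(size, kernel=13, stride=1)
    return sizes


print(squeezenet_sizes())
```

Fire modules 2 to 4 therefore run at 55*55, fire5 to fire8 at 27*27, and only fire9 and conv10 run at 13*13.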

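11.5 The fire2 module as an example

1. The output of maxpool1 is 55*55*96, i.e. 96 output channels.
2. The squeeze layer that follows has 16 kernels of size 1*1*96: 96 input channels, 16 output channels, output size 55*55*16.
3. This output is then sent to both expand branches, 1*1*16 (64 kernels) and 3*3*16 (64 kernels). Note that the 16 channels are not split between the branches.
4. The input of the 3*3*16 convolution is zero-padded by 1, so both branches produce feature maps of the same size, 55*55*64.
5. The two feature maps are concatenated into a feature map of size 55*55*128.

SqueezeNet_tf.py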

#-*- coding: utf-8 -*-
# Paper
# https://arxiv.org/pdf/1602.07360.pdf
# Reference implementation (Caffe model)
# https://github.com/DeepScale/SqueezeNet
"""
2018/04/27
The SqueezeNet work covers the following:
1. It proposes a new building block, the Fire Module, which compresses the model by reducing parameters
2. It further compresses the resulting SqueezeNet model with other methods
3. It explores the design space, mainly the effect of the squeeze ratio and of the share of 3*3 filters

Fire Module structure
                                               |-----> 1*1 conv + ReLU -----|
input -----> 1*1 conv (all channels) + ReLU ---|                            |---> concat along channels -----> output
                                               |-----> 3*3 conv + ReLU -----|

Network structure (expand sizes are per branch; the 1*1 and 3*3 branches are concatenated)
0. conv1: 7*7*3*96, 7*7 convolution, 3 input channels, 96 output channels, stride 2, ReLU
1. maxpool1: max pooling, 3*3 kernel, stride 2
2. fire2: squeeze layer 16 output channels, expand layer 64 output channels
3. fire3: squeeze layer 16 output channels, expand layer 64 output channels
4. fire4: squeeze layer 32 output channels, expand layer 128 output channels
5. maxpool4: max pooling, 3*3 kernel, stride 2
6. fire5: squeeze layer 32 output channels, expand layer 128 output channels
7. fire6: squeeze layer 48 output channels, expand layer 192 output channels
8. fire7: squeeze layer 48 output channels, expand layer 192 output channels
9. fire8: squeeze layer 64 output channels, expand layer 256 output channels
10. maxpool8: max pooling, 3*3 kernel, stride 2
11. fire9: squeeze layer 64 output channels, expand layer 256 output channels
12. dropout: each unit is dropped with probability 0.5
13. conv10: 1*1 pointwise convolution acting like a fully connected layer, one output channel per class (1000 by default) + ReLU
14. avgpool10: average pooling with a 13*13 kernel, 13*13*1000 ---> 1*1*1000
15. softmax: normalized class probabilities
"""
import tensorflow as tf
import numpy as np


class SqueezeNet(object):
    def __init__(self, inputs, nb_classes=1000, is_training=True):
        # 0. conv1: 7*7 convolution, 3 input channels, 96 output channels, stride 2, ReLU
        net = tf.layers.conv2d(inputs, 96, [7, 7], strides=[2, 2],
                               padding="SAME", activation=tf.nn.relu,
                               name="conv1")
        # 224*224*3 >>> 112*112*96
        # 1. maxpool1: max pooling, 3*3 kernel, stride 2 (default 'valid' padding)
        net = tf.layers.max_pooling2d(net, [3, 3], strides=[2, 2],
                                      name="maxpool1")
        # 112*112*96 >>> 55*55*96
        # 2. fire2: squeeze 16 channels, expand 64 channels per branch
        net = self._fire(net, 16, 64, "fire2")
        # 55*55*96 >>> 55*55*128   64+64=128
        # 3. fire3: squeeze 16 channels, expand 64 channels per branch
        net = self._fire(net, 16, 64, "fire3")
        # 55*55*128 >>> 55*55*128
        # 4. fire4: squeeze 32 channels, expand 128 channels per branch
        net = self._fire(net, 32, 128, "fire4")
        # 55*55*128 >>> 55*55*256   128+128=256
        # 5. maxpool4: max pooling, 3*3 kernel, stride 2
        net = tf.layers.max_pooling2d(net, [3, 3], strides=[2, 2],
                                      name="maxpool4")
        # 55*55*256 >>> 27*27*256
        # 6. fire5: squeeze 32 channels, expand 128 channels per branch
        net = self._fire(net, 32, 128, "fire5")
        # 27*27*256 >>> 27*27*256
        # 7. fire6: squeeze 48 channels, expand 192 channels per branch
        net = self._fire(net, 48, 192, "fire6")
        # 27*27*256 >>> 27*27*384   192+192=384
        # 8. fire7: squeeze 48 channels, expand 192 channels per branch
        net = self._fire(net, 48, 192, "fire7")
        # 27*27*384 >>> 27*27*384
        # 9. fire8: squeeze 64 channels, expand 256 channels per branch
        net = self._fire(net, 64, 256, "fire8")
        # 27*27*384 >>> 27*27*512   256+256=512
        # 10. maxpool8: max pooling, 3*3 kernel, stride 2
        net = tf.layers.max_pooling2d(net, [3, 3], strides=[2, 2],
                                      name="maxpool8")
        # 27*27*512 >>> 13*13*512
        # 11. fire9: squeeze 64 channels, expand 256 channels per branch
        net = self._fire(net, 64, 256, "fire9")
        # 12. dropout: each unit is dropped with probability 0.5 during training
        net = tf.layers.dropout(net, 0.5, training=is_training)
        # 13. conv10: 1*1 pointwise convolution acting like a fully connected layer,
        #     one output channel per class + ReLU
        net = tf.layers.conv2d(net, nb_classes, [1, 1], strides=[1, 1],
                               padding="SAME", activation=tf.nn.relu,
                               name="conv10")
        # 14. avgpool10: average pooling with a 13*13 kernel, 13*13*nb_classes ---> 1*1*nb_classes
        net = tf.layers.average_pooling2d(net, [13, 13], strides=[1, 1],
                                          name="avgpool10")
        # Squeeze the spatial axes: 1*1*nb_classes ---> nb_classes
        net = tf.squeeze(net, axis=[1, 2])
        self.logits = net  # logits
        # 15. softmax: normalized class probabilities
        self.prediction = tf.nn.softmax(net)

    # Fire Module structure
    #                                   |-----> 1*1 conv -----|
    #   input -----> 1*1 conv (all) ----|                     |---> concat along channels -----> output
    #                                   |-----> 3*3 conv -----|
    def _fire(self, inputs, squeeze_depth, expand_depth, scope):
        with tf.variable_scope(scope):
            # Squeeze layer: 1*1 convolution + ReLU
            squeeze = tf.layers.conv2d(inputs, squeeze_depth, [1, 1],
                                       strides=[1, 1], padding="SAME",
                                       activation=tf.nn.relu, name="squeeze")
            # Expand layer: 1*1 convolution + ReLU and 3*3 convolution + ReLU
            expand_1x1 = tf.layers.conv2d(squeeze, expand_depth, [1, 1],
                                          strides=[1, 1], padding="SAME",
                                          activation=tf.nn.relu, name="expand_1x1")
            expand_3x3 = tf.layers.conv2d(squeeze, expand_depth, [3, 3],
                                          strides=[1, 1], padding="SAME",
                                          activation=tf.nn.relu, name="expand_3x3")
            # Concatenate the two branches along the channel axis
            return tf.concat([expand_1x1, expand_3x3], axis=3)


if __name__ == "__main__":
    # Random test data: 32 images of size 224*224 with 3 channels
    inputs = tf.random_normal([32, 224, 224, 3])
    # Run the inputs through SqueezeNet
    net = SqueezeNet(inputs)
    # Print the prediction tensor
    print(net.prediction)

