ResNet won the 2015 ImageNet (ILSVRC) classification competition, reducing the top-5 error rate to 3.57%, a result that is even below the reported human error rate of roughly 5% on this task.

Studying classic models such as VGGNet, we can see that as deep learning has developed, models have grown deeper and network structures more complex. Does simply deepening the network always yield better results?

In theory, suppose the newly added layers are all identity mappings; as long as the original layers learn the same parameters as the original model, the deeper model can match the original model's performance. In other words, the original model's solution is a subspace of the new model's solution space, so within the new model's solution space there should be results at least as good as, and possibly better than, those in the original subspace. In practice, however, the training error often rises rather than falls after layers are added. This degradation is commonly attributed to vanishing or exploding gradients, which make very deep plain networks hard to optimize.

Kaiming He et al. proposed the residual network (ResNet) to address this degradation problem; the basic idea is shown in Figure 6.

  • Figure 6(a): when the network is extended, the input x is mapped directly to the output y = F(x).
  • Figure 6(b): improves on Figure 6(a) by outputting y = F(x) + x. Instead of learning the output representation y directly, the layers learn y - x.
    • To reproduce the original model, it suffices to set all parameters of F(x) to 0, which makes the block an identity mapping.
    • F(x) = y - x is called the residual term. When the mapping x → y is close to an identity mapping, learning the residual term as in Figure 6(b) is easier than learning the complete mapping as in Figure 6(a).


Figure 6 Design idea of the residual block
The structure in Figure 6(b) is the foundation of residual networks and is called a residual block. Through its skip connection, the input x can propagate data forward, and gradients backward, more directly. A concrete design of the residual block is shown in Figure 7; this design is also called a bottleneck structure (BottleNeck).

Figure 7 Structure of the residual block
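
To make Figure 7 concrete, here is a minimal sketch of a bottleneck block in Keras. This is our own illustrative code, not the paper's; the function name bottleneck_block and the channel choices are assumptions. The 1x1 convolutions first reduce and then restore the channel count, so the 3x3 convolution operates on fewer channels:

from keras.layers import Conv2D, BatchNormalization, Activation, add

def bottleneck_block(x, filters):
    """Sketch of a bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand.
    Assumes the input x already has 4 * filters channels so the skip
    connection can be added directly; if F(x) learned all-zero weights,
    the block would reduce to an identity mapping."""
    shortcut = x
    y = Conv2D(filters, (1, 1))(x)                  # reduce channels
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)  # 3x3 conv on the reduced channels
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(4 * filters, (1, 1))(y)              # restore channels
    y = BatchNormalization()(y)
    y = add([y, shortcut])                          # y = F(x) + x
    return Activation('relu')(y)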

Deep Residual Learning

Identity mapping by shortcuts


A building block is defined as y = F(x, {Wi}) + x, where F(x, {Wi}) is the residual mapping to be learned. The dimensions of x and F must be equal; if they are not, the shortcut connection can either perform a linear projection Ws to match the dimensions, giving y = F(x, {Wi}) + Ws·x, or pad the input with extra zeros to increase the dimension. In both cases, when the shortcut crosses feature maps of two sizes it is performed with a stride of 2.

The figure below compares the structures of VGG-19, Plain-34 (a 34-layer network without residual connections), and ResNet-34:

Some notes on the figure above:

  1. Compared with VGG-19, ResNet uses no large fully connected stack; it uses a global average pooling layer instead, which removes a large number of parameters, since most of VGG-19's parameters are concentrated in its fully connected layers;

  2. In ResNet-34, a solid skip connection means the identity mapping and the residual mapping have the same number of channels, while a dashed connection means the channel counts differ, so a 1x1 convolution is needed to adjust the channel dimension before the two branches can be added, as sketched below.
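
As a rough sketch of the "dashed" shortcut (our own minimal example, assuming the stage halves the spatial size while changing the channel count; projection_shortcut is a name we made up), a 1x1 convolution with stride 2 projects the input so the two branches can be summed:

from keras.layers import Conv2D, add

def projection_shortcut(x, residual, out_channels):
    """Match the identity branch to the residual branch: a 1x1 convolution
    with stride 2 halves height/width and adjusts the channel count,
    after which the two tensors can be added elementwise."""
    proj = Conv2D(out_channels, (1, 1), strides=(2, 2))(x)
    return add([proj, residual])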

The paper proposes five ResNet variants in total (ResNet-18/34/50/101/152); their parameters are summarized in the table below:

Implementation

The basic settings follow earlier classic networks (see the references in the original paper). Batch normalization (BN) is applied after each convolution and before the activation. The weights are initialized and all plain/residual networks are trained from scratch with SGD using a mini-batch size of 256. The learning rate starts at 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 600,000 iterations. A weight decay of 0.0001 and a momentum of 0.9 are used.
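
A minimal sketch of these settings in Keras (not the authors' code; ReduceLROnPlateau stands in for the paper's manual "divide the learning rate by 10 when the error plateaus" rule, and weight decay is assumed to be applied per layer via kernel_regularizer):

from keras import optimizers
from keras.callbacks import ReduceLROnPlateau

# SGD with momentum 0.9 and an initial learning rate of 0.1
sgd = optimizers.SGD(lr=0.1, momentum=0.9)
# divide the learning rate by 10 whenever the monitored error plateaus
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10)
# model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=256, callbacks=[reduce_lr])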
The figure below shows the structure of ResNet-50, which contains 49 convolutional layers and 1 fully connected layer, hence the name ResNet-50 (the initial 7x7 convolution plus 3+4+6+3 = 16 bottleneck blocks of 3 convolutions each gives 1 + 48 = 49 convolutional layers).


As the figure above shows, the training and validation errors of ResNet are both lower than those of the corresponding plain network, and within the ResNet family the error keeps decreasing as the network gets deeper.

Code implementation

The following code implements ResNet-18 in Keras.

from keras.layers import Input
from keras.layers import Conv2D, MaxPool2D, Dense, BatchNormalization, Activation, add, GlobalAvgPool2D
from keras.models import Model
from keras import regularizers
from keras.utils import plot_model
from keras import backend as K


def conv2d_bn(x, nb_filter, kernel_size, strides=(1, 1), padding='same'):
    """conv2d -> batch normalization -> relu activation"""
    x = Conv2D(nb_filter, kernel_size=kernel_size,
               strides=strides, padding=padding,
               kernel_regularizer=regularizers.l2(0.0001))(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x


def shortcut(input, residual):
    """Shortcut connection, i.e. the identity-mapping branch."""
    input_shape = K.int_shape(input)
    residual_shape = K.int_shape(residual)
    stride_height = int(round(input_shape[1] / residual_shape[1]))
    stride_width = int(round(input_shape[2] / residual_shape[2]))
    equal_channels = input_shape[3] == residual_shape[3]
    identity = input
    # If the two branches differ in spatial size or channel count,
    # use a 1x1 convolution to project the input to a matching shape
    if stride_width > 1 or stride_height > 1 or not equal_channels:
        identity = Conv2D(filters=residual_shape[3],
                          kernel_size=(1, 1),
                          strides=(stride_width, stride_height),
                          padding="valid",
                          kernel_regularizer=regularizers.l2(0.0001))(input)
    return add([identity, residual])


def basic_block(nb_filter, strides=(1, 1)):
    """Basic ResNet building block, used in ResNet-18 and ResNet-34."""
    def f(input):
        conv1 = conv2d_bn(input, nb_filter, kernel_size=(3, 3), strides=strides)
        residual = conv2d_bn(conv1, nb_filter, kernel_size=(3, 3))
        return shortcut(input, residual)
    return f


def residual_block(nb_filter, repetitions, is_first_layer=False):
    """Build one stage of residual blocks, corresponding to
    conv2_x -> conv5_x in the paper's parameter table."""
    def f(input):
        for i in range(repetitions):
            strides = (1, 1)
            # The first block of every stage except the first downsamples by 2
            if i == 0 and not is_first_layer:
                strides = (2, 2)
            input = basic_block(nb_filter, strides)(input)
        return input
    return f


def resnet_18(input_shape=(224, 224, 3), nclass=1000):
    """Build a ResNet-18 model using Keras with the TensorFlow backend.

    :param input_shape: input shape of the network, default (224, 224, 3)
    :param nclass: number of classes (output dimension), default 1000
    :return: ResNet-18 model
    """
    input_ = Input(shape=input_shape)
    conv1 = conv2d_bn(input_, 64, kernel_size=(7, 7), strides=(2, 2))
    pool1 = MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(conv1)
    # conv2_x keeps the resolution (pool1 already downsampled);
    # conv3_x -> conv5_x each downsample by 2 in their first block
    conv2 = residual_block(64, 2, is_first_layer=True)(pool1)
    conv3 = residual_block(128, 2)(conv2)
    conv4 = residual_block(256, 2)(conv3)
    conv5 = residual_block(512, 2)(conv4)
    pool2 = GlobalAvgPool2D()(conv5)
    output_ = Dense(nclass, activation='softmax')(pool2)
    model = Model(inputs=input_, outputs=output_)
    model.summary()
    return model


if __name__ == '__main__':
    model = resnet_18()
    plot_model(model, 'ResNet-18.png')  # save a diagram of the model
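
Note that plot_model needs the pydot and graphviz packages installed in order to write the architecture diagram; calling resnet_18() alone will still print the layer summary via model.summary().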

Using ResNet to classify CIFAR-100

import keras
import argparse
import numpy as np
from keras.datasets import cifar10, cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, GlobalAveragePooling2D
from keras.callbacks import LearningRateScheduler, TensorBoard, ModelCheckpoint
from keras.models import Model
from keras import optimizers, regularizers
from keras import backend as K

# set GPU memory
if 'tensorflow' == K.backend():
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    set_session(sess)  # register the growth-enabled session with Keras

# set parameters via parser
parser = argparse.ArgumentParser()
parser.add_argument('-b', '--batch_size', type=int, default=128, metavar='NUMBER',
                    help='batch size(default: 128)')
parser.add_argument('-e', '--epochs', type=int, default=200, metavar='NUMBER',
                    help='epochs(default: 200)')
parser.add_argument('-n', '--stack_n', type=int, default=5, metavar='NUMBER',
                    help='stack number n, total layers = 6 * n + 2 (default: 5)')
parser.add_argument('-d', '--dataset', type=str, default="cifar10", metavar='STRING',
                    help='dataset. (default: cifar10)')
args = parser.parse_args()

stack_n            = args.stack_n
layers             = 6 * stack_n + 2
num_classes        = 10
img_rows, img_cols = 32, 32
img_channels       = 3
batch_size         = args.batch_size
epochs             = args.epochs
iterations         = 50000 // batch_size + 1
weight_decay       = 1e-4


def color_preprocessing(x_train, x_test):
    # normalize each channel with the dataset's per-channel mean and std
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    mean = [125.307, 122.95, 113.865]
    std  = [62.9932, 62.0887, 66.7048]
    for i in range(3):
        x_train[:, :, :, i] = (x_train[:, :, :, i] - mean[i]) / std[i]
        x_test[:, :, :, i] = (x_test[:, :, :, i] - mean[i]) / std[i]
    return x_train, x_test


def scheduler(epoch):
    # step schedule: 0.1, then 0.01 from epoch 81, then 0.001 from epoch 122
    if epoch < 81:
        return 0.1
    if epoch < 122:
        return 0.01
    return 0.001


def residual_network(img_input, classes_num=10, stack_n=5):
    def residual_block(x, o_filters, increase=False):
        stride = (1, 1)
        if increase:
            stride = (2, 2)
        o1 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(x))
        conv_1 = Conv2D(o_filters, kernel_size=(3, 3), strides=stride, padding='same',
                        kernel_initializer="he_normal",
                        kernel_regularizer=regularizers.l2(weight_decay))(o1)
        o2 = Activation('relu')(BatchNormalization(momentum=0.9, epsilon=1e-5)(conv_1))
        conv_2 = Conv2D(o_filters, kernel_size=(3, 3), strides=(1, 1), padding='same',
                        kernel_initializer="he_normal",
                        kernel_regularizer=regularizers.l2(weight_decay))(o2)
        if increase:
            # projection shortcut: 1x1 conv with stride 2 to match dimensions
            projection = Conv2D(o_filters, kernel_size=(1, 1), strides=(2, 2), padding='same',
                                kernel_initializer="he_normal",
                                kernel_regularizer=regularizers.l2(weight_decay))(o1)
            block = add([conv_2, projection])
        else:
            block = add([conv_2, x])
        return block

    # build model (total layers = stack_n * 3 * 2 + 2)
    # stack_n = 5 by default, total layers = 32
    # input: 32x32x3 output: 32x32x16
    x = Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same',
               kernel_initializer="he_normal",
               kernel_regularizer=regularizers.l2(weight_decay))(img_input)

    # input: 32x32x16 output: 32x32x16
    for _ in range(stack_n):
        x = residual_block(x, 16, False)

    # input: 32x32x16 output: 16x16x32
    x = residual_block(x, 32, True)
    for _ in range(1, stack_n):
        x = residual_block(x, 32, False)

    # input: 16x16x32 output: 8x8x64
    x = residual_block(x, 64, True)
    for _ in range(1, stack_n):
        x = residual_block(x, 64, False)

    x = BatchNormalization(momentum=0.9, epsilon=1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    # input: 64 output: 10
    x = Dense(classes_num, activation='softmax',
              kernel_initializer="he_normal",
              kernel_regularizer=regularizers.l2(weight_decay))(x)
    return x


if __name__ == '__main__':
    print("========================================")
    print("MODEL: Residual Network ({:2d} layers)".format(6 * stack_n + 2))
    print("BATCH SIZE: {:3d}".format(batch_size))
    print("WEIGHT DECAY: {:.4f}".format(weight_decay))
    print("EPOCHS: {:3d}".format(epochs))
    print("DATASET: {:}".format(args.dataset))

    print("== LOADING DATA... ==")
    # load data
    if args.dataset == "cifar100":
        num_classes = 100
        (x_train, y_train), (x_test, y_test) = cifar100.load_data()
    else:
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    print("== DONE! ==\n== COLOR PREPROCESSING... ==")
    # color preprocessing
    x_train, x_test = color_preprocessing(x_train, x_test)

    print("== DONE! ==\n== BUILD MODEL... ==")
    # build network
    img_input = Input(shape=(img_rows, img_cols, img_channels))
    output    = residual_network(img_input, num_classes, stack_n)
    resnet    = Model(img_input, output)

    # print model architecture if you need
    # print(resnet.summary())

    # set optimizer
    sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
    resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

    # set callbacks
    cbks = [TensorBoard(log_dir='./resnet_{:d}_{}/'.format(layers, args.dataset), histogram_freq=0),
            LearningRateScheduler(scheduler)]
    # dump checkpoints if you need (add it to cbks)
    # ModelCheckpoint('./checkpoint-{epoch}.h5', save_best_only=False, mode='auto', period=10)

    # set data augmentation
    print("== USING REAL-TIME DATA AUGMENTATION, START TRAIN... ==")
    datagen = ImageDataGenerator(horizontal_flip=True,
                                 width_shift_range=0.125,
                                 height_shift_range=0.125,
                                 fill_mode='constant', cval=0.)
    datagen.fit(x_train)

    # start training
    resnet.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                         steps_per_epoch=iterations,
                         epochs=epochs,
                         callbacks=cbks,
                         validation_data=(x_test, y_test))
    resnet.save('resnet_{:d}_{}.h5'.format(layers, args.dataset))
