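Final goal: build a face-recognition check-in system for my research group.

Stage 1 of the project is now fully written up. If you spot any mistakes, please let me know~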

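Note: as a responsible blogger I have to point out, even though several months have passed, that the code in this post has a bug. cv2.resize takes its target size as (width, height), whereas img.shape is ordered (height, width, channels).

So in the inference code that builds the Gaussian pyramid, the call as first published was: scale_img=cv2.resize(img,((int(img.shape[0]*scale)),(int(img.shape[1]*scale))))

It should be: scale_img=cv2.resize(img,((int(img.shape[1]*scale)),(int(img.shape[0]*scale)))).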
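To make the ordering concrete, here is a minimal sketch in plain Python. The helper name pyramid_dsize is my own invention for illustration and is not part of the original code; it only computes the size tuple and does not call OpenCV.

```python
def pyramid_dsize(shape, scale):
    """Return the (width, height) tuple that cv2.resize expects as its size.

    `shape` is a NumPy image shape, i.e. (height, width) or
    (height, width, channels). `pyramid_dsize` is a hypothetical helper.
    """
    height, width = shape[0], shape[1]
    return (int(width * scale), int(height * scale))


# An image with 480 rows and 640 columns, scaled by 0.5:
print(pyramid_dsize((480, 640, 3), 0.5))  # (320, 240): width first, then height
```

With such a helper the pyramid call would read cv2.resize(img, pyramid_dsize(img.shape, scale)), which keeps the swap from creeping back in.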

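At the same time, as a lazy blogger: this is only an introductory, just-for-fun project, so I could not be bothered to redo the test sections of the post. In any case, the results after fixing the bug are quite good.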

2019.3.20

Project stage 1: face detection based on AlexNet

Environment and configuration: Windows 10 + GTX 1060 + Python 3.6 + Anaconda 5.2.0 + TensorFlow 1.9 (GPU)

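1. Data acquisition

There are a great many face datasets available online; a few sites for obtaining them are listed below. When searching for and downloading a dataset, also look up how it is meant to be used: where the original images are, where the .txt or .mat annotation files are, and so on. Otherwise it is easy to waste time fumbling around on your own.

These two links cover most of the open-source face datasets: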

http://www.cvmart.net/community/article/detail/148

https://blog.csdn.net//chenriwei2/article/details/50631212

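Both training and validation in stage 1 use the AFLW face dataset, which can be downloaded from the link below. The three files aflw-images-0.tar.gz, aflw-images-2.tar.gz and aflw-images-3.tar.gz contain the images; on Windows, delete the trailing .gz from the file name and they can then be extracted. The AFLW annotations are in the file aflw.txt under aflw/data. As far as I can tell, each annotation gives the x and y coordinates of the top-left corner of the bounding rectangle, followed by its width w and height h.

AFLW: https://pan.baidu.com/s/14McWGRZCnOcP2SBhK2ryrQ

The non-face data is cropped from the face data; the cropping code is given later in this post.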

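2. Building the dataset

The dataset for this project uses TensorFlow's TFRecord format, which is fairly convenient.

First, face and non-face images have to be cropped out of the AFLW dataset. AFLW contains 21,123 images and more than 25,000 faces. I want a training set with a face to non-face ratio of 1:3 (50,000 faces to 150,000 non-faces) and a test set with a ratio of 1:1 (20,000 faces and 20,000 non-faces). Running the program below yields roughly 72,000 face images and roughly 172,000 non-face images.

The usual pipeline requires introducing the concept of IoU (intersection over union) at this point.

The IoU code is as follows: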

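import numpy as np

def IoU(box, boxes):
    """box is the candidate crop [x1, y1, x2, y2]; boxes are the annotated faces [x, y, w, h]."""
    face_area = boxes[:, 2] * boxes[:, 3]
    actual_area = (box[2] - box[0]) ** 2  # the crop is always square
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2] + boxes[:, 0])
    y2 = np.minimum(box[3], boxes[:, 3] + boxes[:, 1])
    w = np.maximum(0, x2 - x1 + 1)
    h = np.maximum(0, y2 - y1 + 1)
    inter_area = w * h
    return inter_area / (face_area + actual_area - inter_area)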

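AFLW's annotations are quite annoying: one image may contain several faces, but the annotations are listed one face at a time. Fortunately, the entries belonging to one image are almost always adjacent. The program ran for 3638.5 seconds. With the thresholds I set, crops with IoU<0.3 are non-face samples and crops with IoU>0.65 are face samples.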
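The two thresholds can be sketched in plain Python as follows. This is only an illustration under my own assumptions: the names iou_crop_vs_face and label_crop are hypothetical, the "discard" label for crops that fall between the thresholds is how I read the rule, the best IoU over all faces is used for both decisions, and the +1 pixel term of the NumPy version is left out to keep the arithmetic easy to follow.

```python
def iou_crop_vs_face(crop, face):
    """IoU between a crop (x1, y1, x2, y2) and an annotation (x, y, w, h)."""
    fx1, fy1, fw, fh = face
    fx2, fy2 = fx1 + fw, fy1 + fh
    inter_w = max(0, min(crop[2], fx2) - max(crop[0], fx1))
    inter_h = max(0, min(crop[3], fy2) - max(crop[1], fy1))
    inter = inter_w * inter_h
    crop_area = (crop[2] - crop[0]) * (crop[3] - crop[1])
    union = crop_area + fw * fh - inter
    return inter / union


def label_crop(crop, faces, neg_thresh=0.3, pos_thresh=0.65):
    """Label a crop from its best IoU with any annotated face (faces must be non-empty)."""
    best = max(iou_crop_vs_face(crop, face) for face in faces)
    if best < neg_thresh:
        return "non-face"
    if best > pos_thresh:
        return "face"
    return "discard"  # too much face for a negative, too little for a positive


faces = [(100, 100, 100, 100)]  # one annotated face: x, y, w, h
print(label_crop((100, 100, 200, 200), faces))  # face (IoU = 1.0)
print(label_crop((300, 300, 400, 400), faces))  # non-face (IoU = 0.0)
print(label_crop((150, 100, 250, 200), faces))  # discard (IoU = 1/3)
```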

The code for building the dataset is as follows:

import os
import cv2
import time
import numpy as np
from numpy.random import randint
import random


def gen_pic(path, boxes, j, k):
    """Crop non-face and face samples from one image; j and k are the running file counters."""
    img = cv2.imread(path)
    if img is None:
        return j, k
    h1, w1, _ = img.shape
    if min(w1, h1) <= 210:
        return j, k

    # Negatives: keep sampling square crops until 8 with IoU < 0.3 have been saved.
    num = 8
    while num:
        size = randint(100, min(w1, h1) // 2)
        x1 = randint(1, w1 - size)
        y1 = randint(1, h1 - size)
        box = np.array([x1, y1, x1 + size, y1 + size])
        if np.max(IoU(box, np.array(boxes))) < 0.3:
            resize_img = cv2.resize(img[y1:y1 + size, x1:x1 + size, :], (224, 224))
            cv2.imwrite('E:\\friedhelm\\Data\\Alexnet_data\\non-face\\non_face_%d.jpg' % j, resize_img)
            j = j + 1
            num = num - 1

    # Positives: for every annotated face, save 2 jittered crops with IoU > 0.65.
    for bbox in boxes:
        x = max(bbox[0], 0)
        y = max(bbox[1], 0)
        w = max(bbox[2], 0)
        h = max(bbox[3], 0)
        num = 2
        while num:
            size = randint(int(np.floor(0.5 * min(w, h))) + 1, int(np.ceil(1.5 * max(w, h))) + 2)
            x1 = randint(int(np.floor(0.5 * x)) + 1, int(np.ceil(1.5 * x)) + 2)
            y1 = randint(int(np.floor(0.5 * y)) + 1, int(np.ceil(1.5 * y)) + 2)
            box = np.array([x1, y1, x1 + size, y1 + size])
            _bbox = np.array(bbox).reshape(1, -1)
            if IoU(box, _bbox) > 0.65:
                resize_img = cv2.resize(img[y1:y1 + size, x1:x1 + size, :], (224, 224))
                cv2.imwrite('E:\\friedhelm\\Data\\Alexnet_data\\face\\face_%d.jpg' % k, resize_img)
                k = k + 1
                # With probability 1/2, also save a horizontally flipped copy.
                if random.choice([0, 1]):
                    resize_img = cv2.flip(resize_img, 1)
                    cv2.imwrite('E:\\friedhelm\\Data\\Alexnet_data\\face\\face_%d.jpg' % k, resize_img)
                    k = k + 1
                num = num - 1
    return j, k


def IoU(box, boxes):
    """box is the candidate crop [x1, y1, x2, y2]; boxes are the annotated faces [x, y, w, h]."""
    face_area = boxes[:, 2] * boxes[:, 3]
    actual_area = (box[2] - box[0]) ** 2
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2] + boxes[:, 0])
    y2 = np.minimum(box[3], boxes[:, 3] + boxes[:, 1])
    w = np.maximum(0, x2 - x1 + 1)
    h = np.maximum(0, y2 - y1 + 1)
    inter_area = w * h
    return inter_area / (face_area + actual_area - inter_area)


def main():
    begin = time.time()
    addr = 'E:\\friedhelm\\AFLW\\'
    with open(r'E:\friedhelm\alfw.txt') as f:
        j = 0
        k = 0
        boxes = []
        path_compare = "1"  # sentinel: no image seen yet
        for line in f.readlines():
            line = line.strip().split()
            path = addr + line[0]
            x = max(int(line[1]), 0)
            y = max(int(line[2]), 0)
            w = int(line[3])
            h = int(line[4])
            if path_compare == path:
                boxes.append([x, y, w, h])
            else:
                # A new image starts: process the boxes collected for the previous one.
                if path_compare != "1":
                    j, k = gen_pic(path_compare, boxes, j, k)
                boxes = []
                path_compare = path
                boxes.append([x, y, w, h])
        # Process the last image, which the loop above never flushes.
        if boxes:
            j, k = gen_pic(path_compare, boxes, j, k)
    print(time.time() - begin)


if __name__ == '__main__':
    if not os.path.exists("E:\\friedhelm\\Data\\Alexnet_data\\face"):
        os.makedirs("E:\\friedhelm\\Data\\Alexnet_data\\face")
    if not os.path.exists("E:\\friedhelm\\Data\\Alexnet_data\\non-face"):
        os.makedirs("E:\\friedhelm\\Data\\Alexnet_data\\non-face")
    main()

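A few points deserve attention:

1. The negative samples will contain plenty of crops that show part of a face. Do not delete them; they are negatives too. If the face takes up too small a share of the crop (or is missing an eye or the nose), it is not considered a face, and that is exactly what the IoU threshold is for;

2. When cropping, follow the usual tricks for choosing negative samples. Do not be lazy and use landscape pictures; the model learns very little from them;

Next, we convert the two kinds of images into TensorFlow's file format, TFRecord.

While doing this I ran into a problem, possibly an Anaconda bug: files produced from Jupyter Notebook always triggered a DATALOSS error. See my other troubleshooting post for details; using the Spyder IDE solves it.

The program is as follows: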

# -*- coding: utf-8 -*-
import os
import tensorflow as tf
import cv2
import time
import random

begin = time.time()
classes = ['non-face', 'face']  # the list index is the label: 0 = non-face, 1 = face
face = os.listdir('E:\\friedhelm\\Data\\Alexnet_data\\face\\')
others = os.listdir('E:\\friedhelm\\Data\\Alexnet_data\\non-face\\')
random.shuffle(face)
random.shuffle(others)

kkk = 0
print('train_start')
writer = tf.python_io.TFRecordWriter("E:\\friedhelm\\Data\\face_train_224.tfrecords")
for i in range(1, 1001):
    if i % 50 == 0:
        print(i)
        print(time.time() - begin)
    for index, name in enumerate(classes):
        # Read from the same directories that were listed above.
        class_path = 'E:\\friedhelm\\Data\\Alexnet_data\\' + name + '\\'
        if name == 'face':
            docu_name = face
            p = list(range(50 * (i - 1), 50 * i))    # 50 faces per step, 50,000 in total
        else:
            docu_name = others
            p = list(range(150 * (i - 1), 150 * i))  # 150 non-faces per step, 150,000 in total
        for q in p:
            img_name = docu_name[q]
            img_path = class_path + img_name
            img = cv2.imread(img_path)
            if img is None:
                continue
            img = cv2.resize(img, (224, 224))
            img_raw = img.tobytes()  # convert the image to raw bytes
            example = tf.train.Example(features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                'img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
            writer.write(example.SerializeToString())  # serialize to a string
            kkk += 1
writer.close()
print('train_end')
print(time.time() - begin)
print(kkk)

kkk = 0
print('test_start')
writer = tf.python_io.TFRecordWriter("E:\\friedhelm\\Data\\face_test_224.tfrecords")
for i in range(1001, 1401):
    if i % 50 == 0:
        print(i)
        print(time.time() - begin)
    for index, name in enumerate(classes):
        class_path = 'E:\\friedhelm\\Data\\Alexnet_data\\' + name + '\\'
        if name == 'face':
            docu_name = face
            start = 50 * (i - 1)               # faces 50,000 onward; training used 0..49,999
        else:
            docu_name = others
            start = 150000 + 50 * (i - 1001)   # training used non-faces 0..149,999
        for img_name in docu_name[start:start + 50]:
            img_path = class_path + img_name
            img = cv2.imread(img_path)
            if img is None:
                continue
            img = cv2.resize(img, (224, 224))
            img_raw = img.tobytes()  # convert the image to raw bytes
            example = tf.train.Example(features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                'img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
            writer.write(example.SerializeToString())  # serialize to a string
            kkk += 1
writer.close()
print('test_end')
print(time.time() - begin)
print(kkk)

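This program ran for 7373 seconds. Anyone who wants a separate test set can build one the same way. Note that my code mixes up the terms validate and test; just do not let that confuse you. One thing is essential, though: the train, validate and test sets must never overlap. If data leaks between them and you are unlucky, the effect on your system is devastating. After the run, having shed plenty of blood and tears in the past, I habitually check the data for errors with the TFRecord test program below: set the number of epochs to 1 and read everything once. If no error occurs, the dataset is ready to use.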

import tensorflow as tf
import numpy as np
import cv2

filename_queue = tf.train.string_input_producer(['E:\\friedhelm\\face_detection_VGG\\face_test_224.tfrecords'], shuffle=True, num_epochs=1)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)  # returns the file name and the record
features = tf.parse_single_example(serialized_example, features={
    'label': tf.FixedLenFeature([], tf.int64),
    'img': tf.FixedLenFeature([], tf.string),
})
img = tf.decode_raw(features['img'], tf.uint8)
label = tf.cast(features['label'], tf.int32)
img = tf.reshape(img, [224, 224, 3])
# img = img_preprocess(img)
min_after_dequeue = 10000
batch_size = 64
capacity = min_after_dequeue + 10 * batch_size
image_batch, label_batch = tf.train.shuffle_batch([img, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue, num_threads=7)

i = 0
with tf.Session() as sess:
    sess.run((tf.global_variables_initializer(), tf.local_variables_initializer()))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while True:
            i = i + 1
            if i % 9 == 1:
                print(sess.run(label_batch))
    except tf.errors.OutOfRangeError:
        # Raised once the single epoch has been read completely.
        print('finished reading the file')
    finally:
        coord.request_stop()
        coord.join(threads)

That wraps up part two, building the dataset.
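On the point that train and test data must never overlap: one simple way to guarantee it is to cut both sets out of the same shuffled list as back-to-back index ranges. The sketch below is only an illustration; split_ranges is a hypothetical helper, and the sizes are the ones quoted earlier (50,000 faces and 150,000 non-faces for training, 20,000 of each for testing).

```python
def split_ranges(n_train, n_test):
    """Return (train, test) index ranges over one shuffled list; they never overlap."""
    train = range(0, n_train)
    test = range(n_train, n_train + n_test)
    return train, test


face_train, face_test = split_ranges(50000, 20000)
nonface_train, nonface_test = split_ranges(150000, 20000)

# The test range starts exactly where the training range stops.
print(face_test[0], nonface_test[0])   # 50000 150000
print(face_test[-1], nonface_test[-1]) # 69999 169999
```

The last indices also show how many images are needed in total: at least 70,000 faces and 170,000 non-faces, which the roughly 72,000 and 172,000 crops produced earlier cover.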

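3. Training the AlexNet model

The AlexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Since the LRN layers in AlexNet have fallen out of use, I dropped them. I also added BN layers to speed up convergence and replaced the fully connected layers with convolutional ones, so the network is fully convolutional. Most of the code uses the native API; I am not very keen on Keras or slim.

The training code is as follows: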

import tensorflow as tf
import numpy as np
import time
from tensorflow.python.framework import graph_util
import cv2

begin = time.time()

# def img_preprocess(img):
#     return tf.image.convert_image_dtype(img, dtype=tf.float32)


def train_doc():
    """Input pipeline for the training TFRecord file (batches of 64)."""
    filename_queue = tf.train.string_input_producer(['E:\\friedhelm\\Data\\face_train_224.tfrecords'], shuffle=True)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # returns the file name and the record
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'img': tf.FixedLenFeature([], tf.string),
    })
    img = tf.decode_raw(features['img'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    img = tf.reshape(img, [224, 224, 3])
    min_after_dequeue = 10000
    batch_size = 64
    capacity = min_after_dequeue + 10 * batch_size
    image_batch, label_batch = tf.train.shuffle_batch([img, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue, num_threads=7)
    return image_batch, label_batch


def test_doc():
    """Input pipeline for the test TFRecord file (batches of 16)."""
    filename_queue = tf.train.string_input_producer(['E:\\friedhelm\\Data\\face_test_224.tfrecords'], shuffle=True, seed=77)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # returns the file name and the record
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'img': tf.FixedLenFeature([], tf.string),
    })
    img = tf.decode_raw(features['img'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    img = tf.reshape(img, [224, 224, 3])
    min_after_dequeue = 1000
    batch_size = 16
    capacity = min_after_dequeue + 10 * batch_size
    image_batch, label_batch = tf.train.shuffle_batch([img, label], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue, num_threads=7)
    return image_batch, label_batch


def model(x, prob, is_training):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        weight1 = tf.get_variable('weight', [11, 11, 3, 64], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias1 = tf.get_variable('bias', [64], initializer=tf.constant_initializer(0.1))
        conv1 = tf.nn.conv2d(x, weight1, strides=[1, 4, 4, 1], padding='SAME')
        he1 = tf.nn.bias_add(conv1, bias1)
        bn1 = tf.layers.batch_normalization(he1, training=is_training)
        relu1 = tf.nn.relu(bn1)
    with tf.variable_scope('pool1', reuse=tf.AUTO_REUSE):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        weight2 = tf.get_variable('weight', [5, 5, 64, 192], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias2 = tf.get_variable('bias', [192], initializer=tf.constant_initializer(0.1))
        conv2 = tf.nn.conv2d(pool1, weight2, strides=[1, 1, 1, 1], padding='SAME')
        he2 = tf.nn.bias_add(conv2, bias2)
        bn2 = tf.layers.batch_normalization(he2, training=is_training)
        relu2 = tf.nn.relu(bn2)
    with tf.variable_scope('pool2', reuse=tf.AUTO_REUSE):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
    with tf.variable_scope('conv3', reuse=tf.AUTO_REUSE):
        weight3 = tf.get_variable('weight', [3, 3, 192, 384], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias3 = tf.get_variable('bias', [384], initializer=tf.constant_initializer(0.1))
        conv3 = tf.nn.conv2d(pool2, weight3, strides=[1, 1, 1, 1], padding='SAME')
        he3 = tf.nn.bias_add(conv3, bias3)
        bn3 = tf.layers.batch_normalization(he3, training=is_training)
        relu3 = tf.nn.relu(bn3)
    with tf.variable_scope('conv4', reuse=tf.AUTO_REUSE):
        weight4 = tf.get_variable('weight', [3, 3, 384, 256], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias4 = tf.get_variable('bias', [256], initializer=tf.constant_initializer(0.1))
        conv4 = tf.nn.conv2d(relu3, weight4, strides=[1, 1, 1, 1], padding='SAME')
        he4 = tf.nn.bias_add(conv4, bias4)
        bn4 = tf.layers.batch_normalization(he4, training=is_training)
        relu4 = tf.nn.relu(bn4)
    with tf.variable_scope('conv5', reuse=tf.AUTO_REUSE):
        weight5 = tf.get_variable('weight', [3, 3, 256, 256], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias5 = tf.get_variable('bias', [256], initializer=tf.constant_initializer(0.1))
        conv5 = tf.nn.conv2d(relu4, weight5, strides=[1, 1, 1, 1], padding='SAME')
        he5 = tf.nn.bias_add(conv5, bias5)
        bn5 = tf.layers.batch_normalization(he5, training=is_training)
        relu5 = tf.nn.relu(bn5)
    with tf.variable_scope('pool3', reuse=tf.AUTO_REUSE):
        pool3 = tf.nn.max_pool(relu5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
    # The three "fc" layers are convolutions, so the network stays fully convolutional.
    with tf.variable_scope('fc1', reuse=tf.AUTO_REUSE):
        weight6 = tf.get_variable('weight', [6, 6, 256, 1024], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias6 = tf.get_variable('bias', [1024], initializer=tf.constant_initializer(0.1))
        conv6 = tf.nn.conv2d(pool3, weight6, strides=[1, 1, 1, 1], padding='VALID')
        he6 = tf.nn.bias_add(conv6, bias6)
        bn6 = tf.layers.batch_normalization(he6, training=is_training)
        relu6 = tf.nn.relu(bn6)
        fc1 = tf.nn.dropout(relu6, prob)
    with tf.variable_scope('fc2', reuse=tf.AUTO_REUSE):
        weight7 = tf.get_variable('weight', [1, 1, 1024, 512], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias7 = tf.get_variable('bias', [512], initializer=tf.constant_initializer(0.1))
        conv7 = tf.nn.conv2d(fc1, weight7, strides=[1, 1, 1, 1], padding='VALID')
        he7 = tf.nn.bias_add(conv7, bias7)
        bn7 = tf.layers.batch_normalization(he7, training=is_training)
        relu7 = tf.nn.relu(bn7)
        fc2 = tf.nn.dropout(relu7, prob)
    with tf.variable_scope('fc3', reuse=tf.AUTO_REUSE):
        weight8 = tf.get_variable('weight', [1, 1, 512, 2], initializer=tf.truncated_normal_initializer(stddev=0.1))
        bias8 = tf.get_variable('bias', [2], initializer=tf.constant_initializer(0.1))
        conv8 = tf.nn.conv2d(fc2, weight8, strides=[1, 1, 1, 1], padding='VALID')
        logit = tf.nn.bias_add(conv8, bias8, name='he')
    return logit


def train(x, y_, prob, is_training):
    y = model(x, prob, is_training)
    _y = tf.reshape(y, (-1, 2))
    output = tf.nn.softmax(_y, name='logit')

    with tf.name_scope('train_loss'):
        # loss_all = tf.add(-y_*tf.log(y_to_loss+1e-9), -(1-y_)*tf.log(1-y_to_loss+1e-9))
        # y_ = tf.stop_gradient(y_)
        loss_all = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=_y)
        loss = tf.reduce_mean(loss_all, name='train_loss')
        tf.summary.scalar('loss', loss)

    # Run the BN moving-average updates together with every optimizer step.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        opt = tf.train.AdamOptimizer(0.01).minimize(loss)

    with tf.name_scope('test_accuracy'):
        test_accuracy = tf.reduce_mean(tf.cast(tf.equal(y_, tf.argmax(output, 1)), tf.float32), name='test_accuracy')
        tf.summary.scalar('test_accuracy', test_accuracy)

    saver = tf.train.Saver()
    merged = tf.summary.merge_all()

    image_batch, label_batch = train_doc()
    image_batch1, label_batch1 = test_doc()

    with tf.Session() as sess:
        sess.run((tf.global_variables_initializer(), tf.local_variables_initializer()))
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        writer_train = tf.summary.FileWriter('E:\\friedhelm\\face_detection_VGG\\train\\', sess.graph)
        writer_test = tf.summary.FileWriter('E:\\friedhelm\\face_detection_VGG\\test\\')
        for i in range(100001):
            # Fetch image and label in the same run call so that they stay paired.
            img, label = sess.run([image_batch, label_batch])
            label = np.reshape(label, (64))
            sess.run(opt, feed_dict={x: img, y_: label, prob: 0.7, is_training: True})
            if i % 100 == 0:
                img1, label1 = sess.run([image_batch1, label_batch1])
                label1 = np.reshape(label1, (16))
                summary = sess.run(merged, feed_dict={x: img, y_: label, prob: 0.7, is_training: True})
                writer_train.add_summary(summary, i)
                summary, _ = sess.run([merged, test_accuracy], feed_dict={x: img1, y_: label1, prob: 1, is_training: False})
                writer_test.add_summary(summary, i)
                if i % 1000 == 0:
                    print('step', i)
                    print('output', sess.run(output, feed_dict={x: img, y_: label, prob: 0.7, is_training: True}))
                    print(label)
                    print('test_accuracy', sess.run(test_accuracy, feed_dict={x: img1, y_: label1, prob: 1, is_training: False}))
                    print('loss', sess.run(loss, feed_dict={x: img, y_: label, prob: 0.7, is_training: True}))
                    print('time', time.time() - begin)
                    if i <= 50000:
                        saver.save(sess, "E:\\friedhelm\\model\\model_5W.ckpt")
                    else:
                        saver.save(sess, "E:\\friedhelm\\model\\model_10W.ckpt")
        writer_train.close()
        writer_test.close()


def main():
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, name='x')
        y_ = tf.placeholder(tf.int64, name='y')
        prob = tf.placeholder(tf.float32, name='prob')
        is_training = tf.placeholder(tf.bool, name='is_train')
    train(x, y_, prob, is_training)


if __name__ == '__main__':
    main()
# tensorboard --logdir=C:\\Users\\312\\Desktop\\
# tensorboard --logdir=train:"E:\\friedhelm\\face_detection_VGG\\train\\",test:"E:\\friedhelm\\face_detection_VGG\\test\\"

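Training took about 10,000 seconds.

Notes on the code:

1. When building the model and swapping in the fully convolutional layers, pay close attention to the output size of the preceding pooling layer. How far the receptive field moves between neighbouring outputs depends on your strides, not on the image size;

2. My labels are not one-hot; they are sparse 0/1 values of type tf.int64;

3. About the BN layers: without BN, convergence is extremely slow. For an explanation of BN, see other people's blogs. With TensorFlow's low-level API you have to write the moving averages yourself, plus a tf.cond switch. tf.layers.batch_normalization(he2,training=is_training,fused=False) is ready-made, and reading its source is a good way to understand it in depth;

4. When reading from the file queue, you must first run the batch tensors to get the data and then feed it in, and image and label must come out of the same run call; otherwise they will not correspond to each other. This follows from how TensorFlow works (blind as I am, I spent two days in this pit convinced that my reading code was wrong, until I noticed the two run calls);

5. Name the tensors you will need later in advance; otherwise you will find you have to retrain when the time comes to use them;

6. The img read from the file must go through tf.reshape, otherwise you get a shape error. Many online tutorials leave this out;

7. If training cannot read data from the file queue for a long time, the TFRecord file may be corrupted; just generate it again;

8. For TensorBoard issues, see the related blog post.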
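For note 1, the output size of every layer can be traced by hand with TensorFlow's padding rules: 'SAME' gives ceil(n / stride) and 'VALID' gives floor((n - kernel) / stride) + 1. The sketch below is plain Python, not TensorFlow; the layer table is transcribed from the model code in this post, and out_size and trace are hypothetical helper names.

```python
import math


def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer under TensorFlow's rules."""
    if padding == "SAME":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # VALID


# (name, kernel, stride, padding), transcribed from the model in this post
LAYERS = [
    ("conv1", 11, 4, "SAME"),
    ("pool1", 3, 2, "VALID"),
    ("conv2", 5, 1, "SAME"),
    ("pool2", 3, 2, "VALID"),
    ("conv3", 3, 1, "SAME"),
    ("conv4", 3, 1, "SAME"),
    ("conv5", 3, 1, "SAME"),
    ("pool3", 3, 2, "VALID"),
    ("fc1", 6, 1, "VALID"),
    ("fc2", 1, 1, "VALID"),
    ("fc3", 1, 1, "VALID"),
]


def trace(n):
    """Return ({layer: output size}, total stride) for an input of side n."""
    sizes = {}
    total_stride = 1
    for name, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        total_stride *= stride
        sizes[name] = n
    return sizes, total_stride


sizes, total_stride = trace(224)
print(sizes["pool3"], sizes["fc3"], total_stride)  # 6 1 32
print(trace(256)[0]["fc3"])                        # 2
```

So a 224 input leaves pool3 at 6x6, which is why fc1 uses a 6x6 kernel, and the product of the strides is 32, which is the step between neighbouring windows when the network slides over a larger image.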

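Pitfalls during training:

Everyone knows the theory, but in practice you run into a pile of errors, so the pitfalls I hit during training deserve a proper write-up.

1. The loss does not change. If the loss sits at 0.69XX and barely moves during training, the model has learned nothing at all: its predicted probability of a face is still 0.5. There are posts online about this. The first thing to check is the data: do img and label match? Next, check whether the activation function or the loss function is written incorrectly. If both checks come back clean, the problem lies with the optimizer and the learning rate: adjust the learning rate, or try another optimizer;

2. Input preprocessing. Normalizing the input images (note: normalization, not standardization) can stop the training loss from diverging when the divergence has some other cause, but it is not a real fix. I tried this method; it only keeps the loss from diverging, and the model still learns nothing;

3. Choice of loss function. Use TensorFlow's built-in loss functions whenever possible, and understand each one's definition and the inputs and outputs it expects. For the softmax loss, see the related blog post;

4. Parameter initialization. This is something of a dark art. If training gets stuck, try a different parameter distribution. I generally initialize with a truncated normal distribution; I have not yet tried the much-praised Xavier initialization.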
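Where the 0.69 in pitfall 1 comes from: the cross-entropy of a two-class prediction of [0.5, 0.5] is -ln(0.5) = ln 2, which is about 0.693, whatever the true label is. A quick check in plain Python (cross_entropy here is a hypothetical helper, not the TensorFlow op):

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one sample: minus the log of the probability given to the true class."""
    return -math.log(probs[label])


print(round(cross_entropy([0.5, 0.5], 1), 4))  # 0.6931, the "stuck" value
print(round(cross_entropy([0.5, 0.5], 0), 4))  # 0.6931, same for the other label
print(round(cross_entropy([0.9, 0.1], 0), 4))  # 0.1054, a model that has learned something
```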

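The other pitfalls have either been written up by others already or are ones I have not fallen into yet, so I will stop the summary here.

The training results are as follows:

Red is the training curve and blue is the validation curve.

After 20,000 iterations the training set has clearly converged, and the validation set reaches its best at 40,000 iterations. It is also evident that AlexNet's capacity is still insufficient: some extreme samples remain unrecognizable.

4. Face detection

The code follows the videos by 唐宇迪. First, a purely hand-made Gaussian pyramid over the input image: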

import tensorflow as tf
import numpy as np
import cv2
import matplotlib.pyplot as plt
import time

begin = time.time()
total_box = []
scales = []
img = cv2.imread(r'C:\\Users\\312\\Desktop\\image00004.jpg')
# img = cv2.resize(img, (224, 224))
factor = 0.9
# large = 5000 / max(img.shape[0:2])
scale = 10
small = 10 * min(img.shape[0:2])
i = j = 0
# Shrink by `factor` per level until the shorter side would drop below 224.
while small >= 224:
    scales.append(scale)
    scale *= factor
    small *= factor
    j += 1
print(j)

graph_path = 'E:\\friedhelm\\model\\model_10W.ckpt.meta'
model_path = 'E:\\friedhelm\\model\\model_10W.ckpt'
saver = tf.train.import_meta_graph(graph_path)
blue = (0, 255, 0)
with tf.Session() as sess:
    saver.restore(sess, model_path)
    graph = tf.get_default_graph()
    for scale in scales:
        # cv2.resize takes the size as (width, height), i.e. (shape[1], shape[0]).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        boxes = featuremap(scale_img, scale, sess, graph)
        if boxes:
            for box in boxes:
                total_box.append(box)
        i += 1
        print(i)
    k = NMS(total_box)
print(time.time() - begin)
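The scale loop in the code above can be isolated into a small function to see how many pyramid levels an image produces. This is a sketch in plain Python; pyramid_scales is a hypothetical name, and the defaults (start scale 10, factor 0.9, window 224) are simply the constants used in the code above.

```python
def pyramid_scales(min_side, start_scale=10.0, factor=0.9, window=224):
    """List the scales for which the shorter image side is still >= window."""
    scales = []
    scale = start_scale
    side = start_scale * min_side
    while side >= window:
        scales.append(scale)
        scale *= factor
        side *= factor
    return scales


# Shorter side 448, starting from the original resolution instead of 10x:
scales = pyramid_scales(448, start_scale=1.0)
print(len(scales))          # 7
print(round(scales[-1], 4)) # 0.5314, i.e. a shorter side of about 238 pixels
```

Starting at scale 10 as the post does adds many very large levels in front of these, which is one reason detection is slow.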

Here featuremap is the feature-map function; it returns the candidate face boxes. Its code is as follows:

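def featuremap(img, scale, sess, graph):
    """Run the network on one pyramid level; return [x1, y1, x2, y2, score] boxes in original-image coordinates."""
    boundingBox = []
    blue = (0, 255, 0)
    stride = 32  # product of the strides: 4 (conv1) x 2 x 2 x 2 (three pooling layers)
    x = graph.get_tensor_by_name("input/x:0")
    y = graph.get_tensor_by_name("input/prob:0")      # dropout keep probability
    p = graph.get_tensor_by_name("input/is_train:0")  # batch-norm training flag
    sliding = graph.get_tensor_by_name("logit:0")     # softmax output, one row per window
    img1 = np.reshape(img, (-1, img.shape[0], img.shape[1], img.shape[2]))
    a = sliding.eval(feed_dict={x: img1, y: 1, p: False})
    # c and d count window positions along the two image axes.
    c = 0
    d = 0
    for prob in a:
        # Advance c while another step of 32 still fits inside img.shape[0];
        # otherwise wrap c back to 0 and advance d.
        if c * 32 + 224 < img.shape[0]:
            c += 1
        else:
            c = 0
            d += 1
        if prob[1] > 0.85:
            # Divide by the pyramid scale to map the window back to the original image.
            boundingBox.append([float(c * stride) / scale, float(d * stride) / scale,
                                float(c * stride + 227) / scale, float(d * stride + 227) / scale,
                                prob[1]])
    return boundingBox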
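For reference, the index arithmetic behind a sliding-window score map can be written down on its own. The sketch below makes several assumptions that are mine, not the post's: the score map is flattened row by row, the number of columns cols is known, the step between windows is 32 and the window side is 224 (the training crop size). cell_to_box is a hypothetical helper.

```python
def cell_to_box(index, cols, scale, stride=32, window=224):
    """Map a flat score-map index to [x1, y1, x2, y2] in original-image coordinates.

    Assumes row-major flattening: index = row * cols + col.
    """
    row, col = divmod(index, cols)
    x1 = col * stride
    y1 = row * stride
    return [x1 / scale, y1 / scale, (x1 + window) / scale, (y1 + window) / scale]


print(cell_to_box(0, cols=3, scale=1.0))  # [0.0, 0.0, 224.0, 224.0]
print(cell_to_box(4, cols=3, scale=0.5))  # [64.0, 64.0, 512.0, 512.0]
```

The second call is row 1, column 1 of a 3-column map on a half-size image: the window starts at (32, 32) in the scaled image, which is (64, 64) in the original.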

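Next comes the NMS function below. It matters a great deal in face detection, and I sincerely recommend that you study it until you understand what every step does. Do as I did: understand it perfectly, and then forget it.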

def NMS(box):
    if len(box) == 0:
        return []
    # xmin, ymin, xmax, ymax, score, cropped_img, scale
    # Sort by score, highest first.
    box.sort(key=lambda x: x[4])
    box.reverse()
    pick = []
    x_min = np.array([box[i][0] for i in range(len(box))], np.float32)
    y_min = np.array([box[i][1] for i in range(len(box))], np.float32)
    x_max = np.array([box[i][2] for i in range(len(box))], np.float32)
    y_max = np.array([box[i][3] for i in range(len(box))], np.float32)
    area = (x_max - x_min) * (y_max - y_min)
    idxs = np.array(range(len(box)))
    while len(idxs) > 0:
        # Keep the best remaining box ...
        i = idxs[0]
        pick.append(i)
        # ... and compute its IoU with all the others.
        xx1 = np.maximum(x_min[i], x_min[idxs[1:]])
        yy1 = np.maximum(y_min[i], y_min[idxs[1:]])
        xx2 = np.minimum(x_max[i], x_max[idxs[1:]])
        yy2 = np.minimum(y_max[i], y_max[idxs[1:]])
        w = np.maximum(xx2 - xx1, 0)
        h = np.maximum(yy2 - yy1, 0)
        overlap = (w * h) / (area[idxs[1:]] + area[i] - w * h)
        # Drop the kept box itself and every box overlapping it by 0.5 or more.
        idxs = np.delete(idxs, np.concatenate(([0], np.where(((overlap >= 0.5) & (overlap <= 1)))[0] + 1)))
    return [box[i] for i in pick]

After the NMS function we have the detection result.
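To see what NMS does step by step, here is a tiny worked example in plain Python. It is a sketch of greedy NMS with the same 0.5 overlap threshold, not a replacement for the NumPy version; iou and greedy_nms are hypothetical names and the three boxes are made up.

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2, ...]."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, thresh=0.5):
    """Keep the best-scoring box, drop everything overlapping it by >= thresh, repeat."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) < thresh]
    return kept


boxes = [
    [1, 1, 11, 11, 0.8],    # overlaps the first face box heavily (IoU = 81/119)
    [0, 0, 10, 10, 0.9],    # best score, kept
    [20, 20, 30, 30, 0.7],  # far away, kept
]
print(greedy_nms(boxes))  # [[0, 0, 10, 10, 0.9], [20, 20, 30, 30, 0.7]]
```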

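Test it with the following code:

blue = (255, 0, 0)  # OpenCV colours are in BGR order
for a in k:
    cv2.rectangle(img, (int(a[0]), int(a[1])), (int(a[2]), int(a[3])), blue, 3, 8, 0)
# OpenCV loads images as BGR, matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()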

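The result is as follows:

As said before, this detection result comes down to insufficient model capacity, too few training samples, and badly chosen training samples.

1. Insufficient model capacity or too few training samples: the model fits extreme samples very poorly, which also shows in the test curve, where results on some extreme samples are especially bad. Another possibility is that the training set contains too few extreme samples. Recent papers propose hard-sample methods aimed at exactly this kind of sample, and I will study and use them when I reproduce MTCNN. Many regions of the image are wrongly judged to be faces, which I think is probably due to the poor quality of the negative samples;

2. Badly chosen training samples: partial faces are also classified as faces, so even after NMS has filtered out nearly 80% of the face boxes, a single face still carries several boxes. The detections show the model's confidence for faces reaching 99.999999%, so the fit on ordinary samples is quite good;

3. Detection takes too long: one image needs 1 to 3 seconds on the GPU, which is painfully slow.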

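5. Summary

This post is basically an introductory piece in which I walked through the whole pipeline once myself. It mainly covers the pitfalls I ran into and does not touch on CNN theory; CNN theory is everywhere these days and easy to look up, so I will not repeat it here. The most important topic is the feature map, which you really must understand thoroughly. For the future experts who want to go on to object detection, fast and faster-rcnn are pits you will have to climb through; I probably won't have the time.

When building a real system, path variables and model parameters should be encapsulated behind interfaces. This post only walks through the pipeline, so the code is rather messy, especially the model part: copy and paste was quick enough that I did not bother to modularize it.

For anything unclear in this post, consult the fast and faster-rcnn papers or search online; 唐宇迪's videos also explain some of it.

In the next post I will try to reproduce the paper "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks", that is, MTCNN. It is one of the better face-recognition papers I have seen so far, and I hope the results will not be as bad as the ones here. (This is only a hobby and a project requirement for me, so I may well have missed better papers. Recommendations would be much appreciated!)

GitHub: https://github.com/friedhelm739/Face-Detection-with-AlexNet

2018.12.28

End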
