
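What a computer "sees" is the numbers an image has been encoded into. It has difficulty understanding high-level semantic concepts, such as whether a target in an image or video frame is a person or an object, and it cannot locate the region of the image where the target appears. The main goal of object detection is to let the computer automatically recognize the categories of all targets in an image or video frame, draw a bounding box around each target, and thereby mark where each target is, as shown in Figure 1.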


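  • Figure 1(a) is an image classification task: the model only needs to recognize that this is a picture of zebras.

  • Figure 1(b) is an object detection task: besides recognizing that this is a picture of zebras, the model must also mark the position of each zebra in the image.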


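The basic image classification pipeline first uses a convolutional neural network to extract image features, then uses these features to predict class probabilities, builds a classification loss from the training labels, and trains the whole model end to end, as shown in Figure 2.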


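For object detection, however, the pipeline in Figure 2 does not work. In image classification, features are extracted from the whole image, and this process does not distinguish between different targets, so there is no way to mark the position of each individual object.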





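  • As shown in Figure 3(a): A, the top-left corner of the rectangle, is fixed at the top-left of the image, and B, the bottom-right corner, traverses every position other than A, generating rectangles A1B1, …, A1Bn, …

  • As shown in Figure 3(b): A is at some position in the middle of the image, and B traverses every position to the lower right of A, generating rectangles AkB1, …, AkBn, …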


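As long as each candidate region is classified accurately enough, we are guaranteed to find a region close enough to the actual object. The exhaustive method can produce correct predictions, but its computational cost is enormous: for a 100×100 image the total number of candidate regions reaches about 2.5×10^7, which makes the method practically useless. Still, it shows that if classification were perfect, detection would be solvable in principle; the open problem is how to design a suitable method for generating candidate regions.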
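The 2.5×10^7 figure can be checked with a short count. This sketch assumes A and B are pixel positions with x1 ≤ x2 and y1 ≤ y2 (so single-pixel boxes are counted). Choosing the column pair (x1, x2) out of 100 columns gives 100·101/2 = 5050 options, and the rows contribute the same factor. The function names are ours, for illustration only.

```python
def count_candidate_boxes(width, height):
    """Number of boxes (x1, y1, x2, y2) with 0 <= x1 <= x2 < width and 0 <= y1 <= y2 < height."""
    x_pairs = width * (width + 1) // 2    # column pairs with x1 <= x2
    y_pairs = height * (height + 1) // 2  # row pairs with y1 <= y2
    return x_pairs * y_pairs


def count_candidate_boxes_brute_force(width, height):
    """Enumerate every (A, B) corner pair explicitly; only practical for tiny images."""
    count = 0
    for x1 in range(width):
        for x2 in range(x1, width):
            for y1 in range(height):
                for y2 in range(y1, height):
                    count += 1
    return count


print(count_candidate_boxes(100, 100))          # 25502500, about 2.55e7
print(count_candidate_boxes_brute_force(5, 4))  # 150, same as count_candidate_boxes(5, 4)
```

Requiring strictly x1 < x2 and y1 < y2 instead gives 4950² = 24,502,500. Either way the count is on the order of 2.5×10^7, and it grows roughly with the square of the number of pixels.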


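  • In 2013, Ross Girshick et al. were among the first to apply CNNs to object detection. They used selective search, a traditional image-processing algorithm, to generate candidate regions and achieved great success. This model is the Region-based Convolutional Neural Network (R-CNN), which has had a far-reaching influence on object detection.

  • In 2015, Ross Girshick improved on this method and proposed Fast R-CNN. Objects in different regions share the convolutional-layer computation, which greatly reduces the amount of computation and increases processing speed. It also trains the regression that refines object positions jointly with classification, further improving the accuracy of location prediction.

  • In 2015, Shaoqing Ren et al. proposed Faster R-CNN, which introduced the Region Proposal Network (RPN) to generate candidate regions. It no longer needs a traditional image-processing algorithm to produce candidate regions, which further improves processing speed.

  • In 2017, Kaiming He et al. proposed Mask R-CNN, which adds only a small amount of computation to Faster R-CNN to perform both object detection and instance segmentation.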

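The models above are well-known members of the R-CNN family and have strongly influenced the development of object detection. Other models, such as SSD, YOLO (v1, v2, v3), and R-FCN, are also popular architectures in the field.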


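  • Basic concepts of object detection: introduces the basic concepts of object detection, including bounding boxes, anchor boxes, and intersection over union (IoU).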



Bounding box

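A detection task must predict both the category and the location of each object, so we need some location-related concepts. An object's location is usually represented by a bounding box (bbox), the rectangle that just encloses the object. Figure 4 shows the bounding box of the face in the image.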



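In a detection task, the bounding boxes given in the training labels are also called ground truth boxes; Figure 4 draws the ground truth box of the person in the image. The model predicts where target objects may appear, and the bounding boxes it predicts are called prediction boxes.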


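  1. Different code may represent boxes in different formats, for example xyxy (the top-left and bottom-right corners [x1, y1, x2, y2]) or xywh (in the code later in this document, the box center [x, y] plus width w and height h). Check which format a given piece of code uses.

  2. The origin of image coordinates is the top-left corner; the positive x-axis points right and the positive y-axis points down.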
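A minimal sketch of converting between the two formats. It assumes xywh means center x, center y, width, height (the convention used by multi_box_iou_xywh later in this document) and treats coordinates as continuous, without the +1 pixel adjustment some code applies. The function names are ours.

```python
import numpy as np


def xyxy_to_xywh(boxes):
    """[x1, y1, x2, y2] corners -> [center_x, center_y, w, h]; works on one box or an (N, 4) array."""
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], axis=-1)


def xywh_to_xyxy(boxes):
    """[center_x, center_y, w, h] -> [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=float)
    cx, cy, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)


print(xyxy_to_xywh([10, 20, 30, 60]))  # [20. 40. 20. 40.]
print(xywh_to_xyxy([20, 40, 20, 40]))  # [10. 20. 30. 60.]
```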

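To complete a detection task, we usually want the model to take an image as input and output a set of predicted bounding boxes, together with the category of the object in each box or the probability that it belongs to a category, for example [L, P, x1, y1, x2, y2], where L is the class label and P is the probability of belonging to that class.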



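Unlike an object's bounding box, an anchor box is an imagined box. We first set the size and shape of the anchor boxes and then draw rectangles centered at some point on the image. For example, the program below generates 3 boxes centered at pixel [300, 500], and anchor box A1 is very close to the target region.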

# Plot showing how to draw bounding boxes and anchor boxes
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor = 'k', facecolor = 'y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained via plt.gca()
    # bbox: the bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill the rectangle
    # linestyle: line style of the border
    # patches.Rectangle takes the top-left corner plus the width and height of the rectangle
    rect=patches.Rectangle((bbox[0], bbox[1]), bbox[2]-bbox[0]+1, bbox[3]-bbox[1]+1, linewidth=1,
                           edgecolor=edgecolor,facecolor=facecolor,fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))

filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)

# Ground truth boxes of the objects, in xyxy format
bbox1 = [214.29, 325.03, 399.82, 631.37]
bbox2 = [40.93, 141.1, 226.99, 515.73]
bbox3 = [247.2, 131.62, 480.0, 639.32]

# Get the current axes so the boxes are drawn on top of the image
currentAxis = plt.gca()

draw_rectangle(currentAxis, bbox1, edgecolor='r')
draw_rectangle(currentAxis, bbox2, edgecolor='r')
draw_rectangle(currentAxis, bbox3, edgecolor='r')

# Draw anchor boxes
def draw_anchor_box(center, length, scales, ratios, img_height, img_width):
    """
    Generate a series of anchor boxes centered at center.
    length specifies a base length,
    scales is a list of size scales,
    ratios is a list of aspect ratios (height / width),
    img_height and img_width are the image size; the anchor boxes are clipped to stay inside the image
    """
    bboxes = []
    for scale in scales:
        for ratio in ratios:
            h = length*scale*math.sqrt(ratio)
            w = length*scale/math.sqrt(ratio)
            x1 = max(center[0] - w/2., 0.)
            y1 = max(center[1] - h/2., 0.)
            x2 = min(center[0] + w/2. - 1.0, img_width - 1.0)
            y2 = min(center[1] + h/2. - 1.0, img_height - 1.0)
            print(center[0], center[1], w, h)
            bboxes.append([x1, y1, x2, y2])

    for bbox in bboxes:
        draw_rectangle(currentAxis, bbox, edgecolor = 'b')

img_height = im.shape[0]
img_width = im.shape[1]
draw_anchor_box([300., 500.], 100., [2.0], [0.5, 1.0, 2.0], img_height, img_width)

################# Text labels and arrows ###############################

plt.text(285, 285, 'G1', color='red', fontsize=20)
plt.arrow(300, 288, 30, 40, color='red', width=0.001, length_includes_head=True, \
         head_width=5, head_length=10, shape='full')

plt.text(190, 320, 'A1', color='blue', fontsize=20)
plt.arrow(200, 320, 30, 40, color='blue', width=0.001, length_includes_head=True, \
         head_width=5, head_length=10, shape='full')

plt.text(160, 370, 'A2', color='blue', fontsize=20)
plt.arrow(170, 370, 30, 40, color='blue', width=0.001, length_includes_head=True, \
         head_width=5, head_length=10, shape='full')

plt.text(115, 420, 'A3', color='blue', fontsize=20)
plt.arrow(127, 420, 30, 40, color='blue', width=0.001, length_includes_head=True, \
         head_width=5, head_length=10, shape='full')

#draw_anchor_box([200., 200.], 100., [2.0], [0.5, 1.0, 2.0])
plt.show()

300.0 500.0 282.84271247461896 141.4213562373095
300.0 500.0 200.0 200.0
300.0 500.0 141.42135623730948 282.842712474619
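The printed widths and heights follow from the two formulas in draw_anchor_box, h = length·scale·√ratio and w = length·scale/√ratio. A minimal sketch with the plotting and clipping removed (the name anchor_sizes is ours) shows what this parameterization guarantees: every anchor of a given scale has the same area (length·scale)², while h / w equals ratio.

```python
import math


def anchor_sizes(length, scales, ratios):
    """Return (w, h) for every scale/ratio pair, using the same formulas as draw_anchor_box."""
    sizes = []
    for scale in scales:
        for ratio in ratios:
            h = length * scale * math.sqrt(ratio)
            w = length * scale / math.sqrt(ratio)
            sizes.append((w, h))
    return sizes


for w, h in anchor_sizes(100., [2.0], [0.5, 1.0, 2.0]):
    # Area stays (100 * 2) ** 2 = 40000 for every ratio; only the shape changes (h / w == ratio)
    print(round(w, 2), round(h, 2), round(w * h, 6), round(h / w, 6))
```

Keeping the area fixed per scale means the ratios only change the shape of an anchor, while the scales alone control its size.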



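Above we drew three anchor boxes centered at point (300, 500), and we can see that anchor box A1 overlaps well with ground truth box G1. How do we measure the relationship between these three anchor boxes and the ground truth box? Detection tasks use Intersection over Union (IoU) as the metric. The concept comes from set theory, where it describes the relationship between two sets A and B: it equals the number of elements in their intersection divided by the number of elements in their union: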

$$IoU = \frac{A\cap B}{A \cup B}$$

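We use this concept to describe how much two boxes overlap. Two boxes can be viewed as two sets of pixels, so their IoU equals the area of their overlap divided by the area of their union. In Figure 5, the red region in (a) is the overlapping area of the two boxes and the blue region in (b) is their combined area; dividing the two areas gives the IoU.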
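To make the "boxes as sets of pixels" view concrete, here is a minimal sketch that builds each box as a Python set of pixel coordinates and applies the set formula directly. It assumes a box covers every integer pixel from its top-left to its bottom-right corner inclusive; the helper names are ours.

```python
def box_to_pixel_set(x1, y1, x2, y2):
    """Set of integer pixel coordinates covered by a box, both corners inclusive (an assumption)."""
    return {(x, y) for x in range(x1, x2 + 1) for y in range(y1, y2 + 1)}


def set_iou(a, b):
    """IoU of two sets: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


box_a = box_to_pixel_set(0, 0, 1, 1)  # 2x2 = 4 pixels
box_b = box_to_pixel_set(1, 1, 2, 2)  # 4 pixels, sharing only pixel (1, 1) with box_a
print(set_iou(box_a, box_b))          # 1 shared pixel / 7 pixels in total = 0.14285714285714285
```

Enumerating pixels is far too slow for real images; it only illustrates the definition, and the coordinate formula that follows computes the same ratio from the box corners.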



Suppose the coordinates of the two boxes A and B are:

$$A: [x_{a1}, y_{a1}, x_{a2}, y_{a2}]$$

$$B: [x_{b1}, y_{b1}, x_{b2}, y_{b2}]$$


Suppose their positions are as shown in Figure 6:



$$x_1 = \max(x_{a1}, x_{b1}), \quad y_1 = \max(y_{a1}, y_{b1})$$
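These coordinates give the top-left corner (x_1, y_1) of the intersection. A minimal sketch of the rest of the calculation, assuming continuous coordinates with no +1 pixel adjustment: the bottom-right corner of the intersection is x_2 = min(x_{a2}, x_{b2}), y_2 = min(y_{a2}, y_{b2}); its width and height are clamped at zero when the boxes do not overlap; and the union is the sum of the two areas minus the intersection. The function name box_iou_xyxy is ours, for illustration.

```python
def box_iou_xyxy(box_a, box_b):
    """IoU of two boxes in xyxy format, treating coordinates as continuous (no +1 pixel adjustment)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Top-left corner of the intersection: the larger of the two top-left corners
    x1 = max(xa1, xb1)
    y1 = max(ya1, yb1)
    # Bottom-right corner of the intersection: the smaller of the two bottom-right corners
    x2 = min(xa2, xb2)
    y2 = min(ya2, yb2)
    # Clamp to zero when the boxes do not overlap
    inter_w = max(x2 - x1, 0.0)
    inter_h = max(y2 - y1, 0.0)
    intersection = inter_w * inter_h
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    union = area_a + area_b - intersection
    return intersection / union


print(box_iou_xyxy([0, 0, 2, 2], [1, 1, 3, 3]))  # 1 / 7 = 0.14285714285714285
print(box_iou_xyxy([0, 0, 1, 1], [2, 2, 3, 3]))  # 0.0, the boxes do not overlap
```

Code that follows an inclusive-pixel convention, such as draw_rectangle above, which adds 1 to the width and height, would add 1 to each side length here as well, so its IoU values differ slightly for small boxes.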





### Data loading
import numpy as np
import cv2

def get_bbox(gt_bbox, gt_class):
    # In a typical detection task an image often contains several target objects.
    # MAX_NUM = 50 means at most 50 ground truth boxes are kept per image; if there
    # are fewer than 50, the remaining entries of gt_bbox and gt_class are set to 0.
    MAX_NUM = 50
    gt_bbox2 = np.zeros((MAX_NUM, 4))
    gt_class2 = np.zeros((MAX_NUM,))
    for i in range(len(gt_bbox)):
        # Stop before writing past the end of the fixed-size arrays
        if i >= MAX_NUM:
            break
        gt_bbox2[i, :] = gt_bbox[i, :]
        gt_class2[i] = gt_class[i]
    return gt_bbox2, gt_class2

def get_img_data_from_file(record):
    """
    record is a dict as following,
      record = {
            'im_file': img_file,
            'im_id': im_id,
            'h': im_h,
            'w': im_w,
            'is_crowd': is_crowd,
            'gt_class': gt_class,
            'gt_bbox': gt_bbox,
            'gt_poly': [],
            'difficult': difficult
            }
    """
    im_file = record['im_file']
    h = record['h']
    w = record['w']
    is_crowd = record['is_crowd']
    gt_class = record['gt_class']
    gt_bbox = record['gt_bbox']
    difficult = record['difficult']

    img = cv2.imread(im_file)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # check if h and w in record equals that read from img
    assert img.shape[0] == int(h), \
             "image height of {} inconsistent in record({}) and img file({})".format(
               im_file, h, img.shape[0])

    assert img.shape[1] == int(w), \
             "image width of {} inconsistent in record({}) and img file({})".format(
               im_file, w, img.shape[1])

    gt_boxes, gt_labels = get_bbox(gt_bbox, gt_class)

    # Convert gt_bbox to values relative to the image width and height
    gt_boxes[:, 0] = gt_boxes[:, 0] / float(w)
    gt_boxes[:, 1] = gt_boxes[:, 1] / float(h)
    gt_boxes[:, 2] = gt_boxes[:, 2] / float(w)
    gt_boxes[:, 3] = gt_boxes[:, 3] / float(h)

    return img, gt_boxes, gt_labels, (h, w)

record = records[0]
img, gt_boxes, gt_labels, scales = get_img_data_from_file(record)
(1268, 1268, 3)
(50, 4)
array([1., 0., 2., 3., 4., 5., 5., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
(1268.0, 1268.0)

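get_img_data_from_file() returns the image data: the image img, the ground truth box coordinates gt_boxes, the categories of the objects in the ground truth boxes gt_labels, and the image size scales.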
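get_bbox pads every image's boxes to a fixed count so that images with different numbers of objects can be stacked into one batch array. A vectorized sketch of the same padding and truncation (the name pad_gt is ours; max_num=50 matches MAX_NUM above):

```python
import numpy as np


def pad_gt(gt_bbox, gt_class, max_num=50):
    """Pad (or truncate) boxes and labels to exactly max_num rows; padded rows are all zeros."""
    gt_bbox = np.asarray(gt_bbox, dtype=float).reshape(-1, 4)
    gt_class = np.asarray(gt_class, dtype=float).reshape(-1)
    n = min(len(gt_bbox), max_num)  # keep at most max_num real boxes
    boxes = np.zeros((max_num, 4))
    labels = np.zeros((max_num,))
    boxes[:n] = gt_bbox[:n]
    labels[:n] = gt_class[:n]
    return boxes, labels


boxes, labels = pad_gt([[0.5, 0.5, 0.2, 0.3]], [3])
print(boxes.shape, labels.shape)  # (50, 4) (50,)
print(labels[:3])                 # [3. 0. 0.]
```

Label 0 is itself a valid class (the gt_labels output above has a 0 among the real labels), so one way to tell padded rows apart is by their all-zero box rather than by their label.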




import numpy as np
import cv2
from PIL import Image, ImageEnhance
import random

# Randomly change brightness, contrast and color
def random_distort(img):
    # Randomly change brightness
    def random_brightness(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Brightness(img).enhance(e)
    # Randomly change contrast
    def random_contrast(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Contrast(img).enhance(e)
    # Randomly change color balance
    def random_color(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Color(img).enhance(e)

    ops = [random_brightness, random_contrast, random_color]
    np.random.shuffle(ops)

    img = Image.fromarray(img)
    img = ops[0](img)
    img = ops[1](img)
    img = ops[2](img)
    img = np.asarray(img)

    return img


# Random expansion: place the image on a larger, padded canvas
def random_expand(img,
                  gtboxes,
                  max_ratio=4.,
                  fill=None,
                  keep_ratio=True,
                  thresh=0.5):
    if random.random() > thresh:
        return img, gtboxes

    if max_ratio < 1.0:
        return img, gtboxes

    h, w, c = img.shape
    ratio_x = random.uniform(1, max_ratio)
    if keep_ratio:
        ratio_y = ratio_x
    else:
        ratio_y = random.uniform(1, max_ratio)
    oh = int(h * ratio_y)
    ow = int(w * ratio_x)
    off_x = random.randint(0, ow - w)
    off_y = random.randint(0, oh - h)

    out_img = np.zeros((oh, ow, c))
    if fill and len(fill) == c:
        for i in range(c):
            out_img[:, :, i] = fill[i] * 255.0

    out_img[off_y:off_y + h, off_x:off_x + w, :] = img
    gtboxes[:, 0] = ((gtboxes[:, 0] * w) + off_x) / float(ow)
    gtboxes[:, 1] = ((gtboxes[:, 1] * h) + off_y) / float(oh)
    gtboxes[:, 2] = gtboxes[:, 2] / ratio_x
    gtboxes[:, 3] = gtboxes[:, 3] / ratio_y

    return out_img.astype('uint8'), gtboxes
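random_expand places the original image at offset (off_x, off_y) inside a larger canvas and rewrites the relative xywh boxes to match. A deterministic sketch of just the box update, with the canvas size and offsets passed in instead of drawn at random (the name expand_boxes is ours, and the scale factors are taken as ow / w and oh / h), makes it easy to check that a box still covers the same pixels:

```python
import numpy as np


def expand_boxes(gtboxes, w, h, ow, oh, off_x, off_y):
    """Same box arithmetic as random_expand, with canvas size and offsets given explicitly."""
    boxes = np.array(gtboxes, dtype=float)  # copy, so the caller's array is not modified
    ratio_x, ratio_y = ow / w, oh / h
    boxes[:, 0] = (boxes[:, 0] * w + off_x) / float(ow)  # center x: to pixels, shift, renormalize
    boxes[:, 1] = (boxes[:, 1] * h + off_y) / float(oh)  # center y: same steps
    boxes[:, 2] = boxes[:, 2] / ratio_x                  # width relative to the larger canvas
    boxes[:, 3] = boxes[:, 3] / ratio_y                  # height relative to the larger canvas
    return boxes


# A 20x20-pixel box centered at (50, 50) in a 100x100 image, pasted at offset (100, 0) on a 200x200 canvas
print(expand_boxes([[0.5, 0.5, 0.2, 0.2]], 100, 100, 200, 200, off_x=100, off_y=0))
# New center is pixel (150, 50), i.e. approximately [0.75, 0.25, 0.1, 0.1] in relative xywh
```

Unlike random_expand, this sketch works on a copy; random_expand updates gtboxes in place.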



import numpy as np

def multi_box_iou_xywh(box1, box2):
    """
    In this case, box1 or box2 can contain multiple boxes.
    Only two cases can be processed in this method:
       1, box1 and box2 have the same shape, box1.shape == box2.shape
       2, either box1 or box2 contains only one box, len(box1) == 1 or len(box2) == 1
    If the shapes of box1 and box2 do not match and both of them contain multiple boxes, the result will be wrong.
    """
    assert box1.shape[-1] == 4, "Box1 shape[-1] should be 4."
    assert box2.shape[-1] == 4, "Box2 shape[-1] should be 4."

    b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
    b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
    b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
    b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2

    inter_x1 = np.maximum(b1_x1, b2_x1)
    inter_x2 = np.minimum(b1_x2, b2_x2)
    inter_y1 = np.maximum(b1_y1, b2_y1)
    inter_y2 = np.minimum(b1_y2, b2_y2)
    inter_w = inter_x2 - inter_x1
    inter_h = inter_y2 - inter_y1
    inter_w = np.clip(inter_w, a_min=0., a_max=None)
    inter_h = np.clip(inter_h, a_min=0., a_max=None)

    inter_area = inter_w * inter_h
    b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)

    return inter_area / (b1_area + b2_area - inter_area)

def box_crop(boxes, labels, crop, img_shape):
    # boxes are relative xywh; crop is (x, y, w, h) in pixels; img_shape is (width, height)
    x, y, w, h = map(float, crop)
    im_w, im_h = map(float, img_shape)

    # Convert the relative xywh boxes to absolute xyxy pixel coordinates
    boxes = boxes.copy()
    boxes[:, 0], boxes[:, 2] = (boxes[:, 0] - boxes[:, 2] / 2) * im_w, (
        boxes[:, 0] + boxes[:, 2] / 2) * im_w
    boxes[:, 1], boxes[:, 3] = (boxes[:, 1] - boxes[:, 3] / 2) * im_h, (
        boxes[:, 1] + boxes[:, 3] / 2) * im_h

    # Keep only boxes whose centers fall inside the crop region
    crop_box = np.array([x, y, x + w, y + h])
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    mask = np.logical_and(crop_box[:2] <= centers, centers <= crop_box[2:]).all(
        axis=1)

    # Clip the boxes to the crop region and shift them into crop coordinates
    boxes[:, :2] = np.maximum(boxes[:, :2], crop_box[:2])
    boxes[:, 2:] = np.minimum(boxes[:, 2:], crop_box[2:])
    boxes[:, :2] -= crop_box[:2]
    boxes[:, 2:] -= crop_box[:2]

    # Drop degenerate boxes, zero out removed ones, and convert back to relative xywh
    mask = np.logical_and(mask, (boxes[:, :2] < boxes[:, 2:]).all(axis=1))
    boxes = boxes * np.expand_dims(mask.astype('float32'), axis=1)
    labels = labels * mask.astype('float32')
    boxes[:, 0], boxes[:, 2] = (boxes[:, 0] + boxes[:, 2]) / 2 / w, (
        boxes[:, 2] - boxes[:, 0]) / w
    boxes[:, 1], boxes[:, 3] = (boxes[:, 1] + boxes[:, 3]) / 2 / h, (
        boxes[:, 3] - boxes[:, 1]) / h

    return boxes, labels, mask.sum()
# Random crop
def random_crop(img,
                boxes,
                labels,
                scales=[0.3, 1.0],
                max_ratio=2.0,
                constraints=None,
                max_trial=50):
    if len(boxes) == 0:
        # Return three values here too, so callers can always unpack (img, boxes, labels)
        return img, boxes, labels

    if not constraints:
        # (min_iou, max_iou) constraints between a candidate crop and the ground truth boxes
        constraints = [(0.1, 1.0), (0.3, 1.0), (0.5, 1.0), (0.7, 1.0),
                       (0.9, 1.0), (0.0, 1.0)]

    img = Image.fromarray(img)
    w, h = img.size
    crops = [(0, 0, w, h)]
    for min_iou, max_iou in constraints:
        for _ in range(max_trial):
            scale = random.uniform(scales[0], scales[1])
            aspect_ratio = random.uniform(max(1 / max_ratio, scale * scale), \
                                          min(max_ratio, 1 / scale / scale))
            crop_h = int(h * scale / np.sqrt(aspect_ratio))
            crop_w = int(w * scale * np.sqrt(aspect_ratio))
            # +1 so that a full-size crop (crop_w == w) does not call randrange(0)
            crop_x = random.randrange(w - crop_w + 1)
            crop_y = random.randrange(h - crop_h + 1)
            crop_box = np.array([[(crop_x + crop_w / 2.0) / w,
                                  (crop_y + crop_h / 2.0) / h,
                                  crop_w / float(w), crop_h / float(h)]])

            iou = multi_box_iou_xywh(crop_box, boxes)
            if min_iou <= iou.min() and max_iou >= iou.max():
                crops.append((crop_x, crop_y, crop_w, crop_h))
                break

    while crops:
        crop = crops.pop(np.random.randint(0, len(crops)))
        crop_boxes, crop_labels, box_num = box_crop(boxes, labels, crop, (w, h))
        if box_num < 1:
            continue
        img = img.crop((crop[0], crop[1], crop[0] + crop[2],
                        crop[1] + crop[3])).resize(img.size, Image.LANCZOS)
        img = np.asarray(img)
        return img, crop_boxes, crop_labels
    img = np.asarray(img)
    return img, boxes, labels
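The bounds on crop_w and crop_h in random_crop come from the range that aspect_ratio is drawn from. Writing s for scale (at most 1 with the default scales), r for aspect_ratio and R for max_ratio, r is sampled from [max(1/R, s²), min(R, 1/s²)], which in particular gives s² ≤ r ≤ 1/s², so:

```latex
s^2 \le r \le \frac{1}{s^2}
\quad\Longrightarrow\quad
w\,s\sqrt{r} \le w\,s\cdot\frac{1}{s} = w,
\qquad
\frac{h\,s}{\sqrt{r}} \le \frac{h\,s}{s} = h
```

Since crop_w = int(w·s·√r) and crop_h = int(h·s/√r), a crop never exceeds the image. When s = 1 the range collapses to r = 1 and the crop is the whole image, so w - crop_w is 0; random.randrange(0) raises ValueError because the range is empty, which is why drawing the offset from randrange(w - crop_w + 1) keeps the full-size crop valid.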


# Random resize: scale the image to size x size with a randomly chosen interpolation method
def random_interp(img, size, interp=None):
    interp_method = [
        cv2.INTER_NEAREST,
        cv2.INTER_LINEAR,
        cv2.INTER_AREA,
        cv2.INTER_CUBIC,
        cv2.INTER_LANCZOS4,
    ]
    # cv2.INTER_NEAREST is 0, so test for None explicitly instead of using "not interp"
    if interp is None or interp not in interp_method:
        interp = interp_method[random.randint(0, len(interp_method) - 1)]
    h, w, _ = img.shape
    im_scale_x = size / float(w)
    im_scale_y = size / float(h)
    img = cv2.resize(
        img, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=interp)
    return img


# Random horizontal flip
def random_flip(img, gtboxes, thresh=0.5):
    if random.random() > thresh:
        img = img[:, ::-1, :]
        # Boxes are relative xywh, so mirroring only moves the center x
        gtboxes[:, 0] = 1.0 - gtboxes[:, 0]
    return img, gtboxes


# Randomly shuffle the order of the ground truth boxes
def shuffle_gtbox(gtbox, gtlabel):
    gt = np.concatenate(
        [gtbox, gtlabel[:, np.newaxis]], axis=1)
    idx = np.arange(gt.shape[0])
    np.random.shuffle(idx)
    gt = gt[idx, :]
    return gt[:, :4], gt[:, 4]



# Get the random resize size shared by the samples of one batch
def get_img_size(mode):
    if (mode == 'train') or (mode == 'valid'):
        inds = np.array([0,1,2,3,4,5,6,7,8,9])
        ii = np.random.choice(inds)
        img_size = 320 + ii * 32
    else:
        img_size = 608
    return img_size

# Convert batch data in list form into a tuple of arrays
def make_array(batch_data):
    img_array = np.array([item[0] for item in batch_data], dtype = 'float32')
    gt_box_array = np.array([item[1] for item in batch_data], dtype = 'float32')
    gt_labels_array = np.array([item[2] for item in batch_data], dtype = 'int32')
    img_scale = np.array([item[3] for item in batch_data], dtype='int32')
    return img_array, gt_box_array, gt_labels_array, img_scale

# Read data in batches. Images within one batch must have the same size;
# the size varies randomly between batches and is produced by the
# get_img_size function defined above
def data_loader(datadir, batch_size= 10, mode='train'):
    cname2cid = get_insect_names()
    records = get_annotations(cname2cid, datadir)

    def reader():
        if mode == 'train':
            np.random.shuffle(records)
        batch_data = []
        img_size = get_img_size(mode)
        for record in records:
            #print(record)
            img, gt_bbox, gt_labels, im_shape = get_img_data(record,
                                                             size=img_size)
            batch_data.append((img, gt_bbox, gt_labels, im_shape))
            if len(batch_data) == batch_size:
                yield make_array(batch_data)
                batch_data = []
                img_size = get_img_size(mode)
        if len(batch_data) > 0:
            yield make_array(batch_data)

    return reader
d = data_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(d())
img.shape, gt_boxes.shape, gt_labels.shape, im_shape.shape
((2, 3, 544, 544), (2, 50, 4), (2, 50), (2, 2))
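data_loader draws one img_size per batch and redraws it only after a batch is emitted, so all images in a batch share a size while different batches vary. A framework-free sketch of that batching logic, using dummy integer records (the names sample_size and batch_with_sizes are ours; sizes follow the same 320 + 32·k rule as get_img_size in 'train' mode):

```python
import random


def sample_size():
    """Same rule as get_img_size in 'train' mode: one of 320, 352, ..., 608."""
    return 320 + random.randint(0, 9) * 32


def batch_with_sizes(records, batch_size):
    """Yield lists of (record, img_size); one size per batch, redrawn after each full batch."""
    batch = []
    img_size = sample_size()
    for record in records:
        batch.append((record, img_size))
        if len(batch) == batch_size:
            yield batch
            batch = []
            img_size = sample_size()
    if batch:  # the last, possibly smaller batch
        yield batch


for batch in batch_with_sizes(list(range(7)), batch_size=3):
    print([size for _, size in batch])  # one size repeated within each batch
```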

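Data preprocessing is time-consuming and can become the bottleneck of training speed, so it is worth optimizing. The Paddle API paddle.reader.xmap_readers can read data with multiple threads; the implementation is as follows.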

import functools
import paddle

# Use paddle.reader.xmap_readers to read data with multiple threads
def multithread_loader(datadir, batch_size= 10, mode='train'):
    cname2cid = get_insect_names()
    records = get_annotations(cname2cid, datadir)
    def reader():
        if mode == 'train':
            np.random.shuffle(records)
        img_size = get_img_size(mode)
        batch_data = []
        for record in records:
            batch_data.append((record, img_size))
            if len(batch_data) == batch_size:
                yield batch_data
                batch_data = []
                img_size = get_img_size(mode)
        if len(batch_data) > 0:
            yield batch_data

    def get_data(samples):
        batch_data = []
        for sample in samples:
            record = sample[0]
            img_size = sample[1]
            img, gt_bbox, gt_labels, im_shape = get_img_data(record, size=img_size)
            batch_data.append((img, gt_bbox, gt_labels, im_shape))
        return make_array(batch_data)

    mapper = functools.partial(get_data, )

    # 8 worker threads, buffer of up to 10 batches
    return paddle.reader.xmap_readers(mapper, reader, 8, 10)
d = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(d())
img.shape, gt_boxes.shape, gt_labels.shape, im_shape.shape
((2, 3, 352, 352), (2, 50, 4), (2, 50), (2, 2))
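paddle.reader.xmap_readers is specific to Paddle. As a framework-independent sketch of the same pattern (a reader yields batches of records and a pool of workers maps each batch to data), Python's concurrent.futures can be used; this is a swapped-in technique, not what the code above does, and load_batch is a dummy stand-in for get_data. Threads only help to the extent that the per-sample work releases Python's GIL (file I/O and many C-extension calls); otherwise a process pool is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_reader(records, batch_size):
    """Yield lists of records, batch_size at a time (stand-in for reader())."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


def load_batch(batch):
    """Dummy stand-in for get_data: squares each record instead of loading images."""
    return [r * r for r in batch]


def parallel_loader(records, batch_size, num_workers=4):
    """Map load_batch over batches with a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for result in pool.map(load_batch, batch_reader(records, batch_size)):
            yield result


print(list(parallel_loader(list(range(5)), batch_size=2)))  # [[0, 1], [4, 9], [16]]
```

Executor.map submits all batches up front, so unlike the buffer size passed to xmap_readers there is no bound on how much work is queued; for large datasets one would submit batches incrementally instead.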

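We have now covered viewing the data in the dataset, extracting annotation information, reading images and annotations from files, data augmentation, and batch reading with acceleration. multithread_loader returns img, gt_boxes, gt_labels, im_shape and related data, which can then be fed into a neural network for a specific algorithm.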


# Test data loading
import os
import numpy as np
import cv2

# Convert batch data in list form into a tuple of arrays
def make_test_array(batch_data):
    img_name_array = np.array([item[0] for item in batch_data])
    img_data_array = np.array([item[1] for item in batch_data], dtype = 'float32')
    img_scale_array = np.array([item[2] for item in batch_data], dtype='int32')
    return img_name_array, img_data_array, img_scale_array

# Data loader for test images
def test_data_loader(datadir, batch_size= 10, test_image_size=608, mode='test'):
    """
    Load the images used for testing; test data has no ground truth labels
    """
    image_names = os.listdir(datadir)
    def reader():
        batch_data = []
        img_size = test_image_size
        for image_name in image_names:
            file_path = os.path.join(datadir, image_name)
            img = cv2.imread(file_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            H = img.shape[0]
            W = img.shape[1]
            img = cv2.resize(img, (img_size, img_size))

            mean = [0.485, 0.456, 0.406]
            std = [0.229, 0.224, 0.225]
            mean = np.array(mean).reshape((1, 1, -1))
            std = np.array(std).reshape((1, 1, -1))
            out_img = (img / 255.0 - mean) / std
            out_img = out_img.astype('float32').transpose((2, 0, 1))
            img = out_img #np.transpose(out_img, (2,0,1))
            im_shape = [H, W]

            batch_data.append((image_name.split('.')[0], img, im_shape))
            if len(batch_data) == batch_size:
                yield make_test_array(batch_data)
                batch_data = []
        if len(batch_data) > 0:
            yield make_test_array(batch_data)

    return reader
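test_data_loader normalizes each pixel with the per-channel mean and std values in the code (the commonly used ImageNet statistics) and converts HWC to CHW. A small sketch of that step plus its inverse, which is handy for displaying a preprocessed image again (normalize and denormalize are our names; a random array stands in for a real image):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406]).reshape((1, 1, -1))
STD = np.array([0.229, 0.224, 0.225]).reshape((1, 1, -1))


def normalize(img):
    """HWC uint8 RGB image -> CHW float32, same arithmetic as test_data_loader."""
    out = (img / 255.0 - MEAN) / STD
    return out.astype('float32').transpose((2, 0, 1))


def denormalize(chw):
    """Inverse of normalize: CHW float32 -> HWC uint8 (rounded and clipped to [0, 255])."""
    hwc = chw.transpose((1, 2, 0)) * STD + MEAN
    return np.clip(np.round(hwc * 255.0), 0, 255).astype('uint8')


img = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)
x = normalize(img)
print(x.shape)                              # (3, 4, 5)
print(np.array_equal(denormalize(x), img))  # True
```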


