Alibaba Tianchi -- Cervical Cancer Detection (based on Faster R-CNN): A Beginner's First Attempt

Alibaba Tianchi -- Cervical Cancer Detection

Data download (help yourself; if the link dies, that's on Alibaba): https://blog.csdn.net/abyss_miracle/article/details/104720413

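Official description:
Competition link: https://blog.csdn.net/xiaosongshine/article/details/102497362
Background
The competition provides large-scale liquid-based thin-layer cytology data for cervical cancer, annotated by professional physicians. Participants are expected to propose and combine object detection, deep learning and related methods to localize abnormal squamous epithelial cells in cervical cytology and to classify cervical cytology images, improving detection speed and accuracy to assist doctors in diagnosis.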

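Data
The competition provides thousands of cervical cytology images together with annotations of the positions of abnormal squamous epithelial cells. The data are in kfb format and must be read with the SDK designated by the competition. Each slide was acquired with a digital scanner at 20× magnification and is 300 to 400 MB in size.

Participants may download the data in the preliminary round, which provides 800 cervical cytology images: 500 positive and 300 negative. Positive images come with several ROI (region of interest) areas, and the abnormal squamous epithelial cells inside those ROIs are annotated; negative images contain no abnormal squamous epithelial cells and have no annotations. The preliminary round mainly covers four classes of abnormal squamous epithelial cells: ASC-US (atypical squamous cells of undetermined significance), LSIL (low-grade squamous intraepithelial lesion), ASC-H (atypical squamous cells, cannot exclude a high-grade lesion) and HSIL (high-grade squamous intraepithelial lesion). (Note: regions of a positive image outside its ROIs are not guaranteed to be free of abnormal squamous epithelial cells.)
The semi-final is held online: participants may not download the data and must train their models online. The online format also supports reproducing participants' code and the engineering work needed to put the results into practice. The semi-final is expected to provide 1,000 cervical cytology samples; by detecting multiple cell classes, participants further determine the class of the whole cytology image.
The organizers split the data into training and test sets and keep the test annotations hidden for evaluation. The preliminary data are divided into train and test: train is for model training and includes the cervical cytology kfb files with their annotation json files; test is used for evaluation. Each annotation json file contains a list recording the position of every ROI and the coordinates of every abnormal squamous epithelial cell (the top-left corner of the cell's bounding rectangle plus its width and height). The class roi denotes a region of interest and pos denotes an abnormal squamous epithelial cell. An example annotation file: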
[{"x": 33842, "y": 31905, "w": 101, "h": 106, "class": "pos"},
{"x": 31755, "y": 31016, "w": 4728, "h": 3696, "class": "roi"},
{"x": 32770, "y": 34121, "w": 84, "h": 71, "class": "pos"},
{"x": 13991, "y": 38929, "w": 131, "h": 115, "class": "pos"},
{"x": 9598, "y": 35063, "w": 5247, "h": 5407, "class": "roi"},
{"x": 25030, "y": 40115, "w": 250, "h": 173, "class": "pos"}]

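The competition also has an additional track, the VNNI track. Its task is the same as the semi-final, but the deep learning framework is restricted (TensorFlow and MXNet), models must be compressed with the model compression tools provided by Intel, and inference is evaluated on Intel's VNNI platform. The VNNI track opens after the semi-final starts and requires separate registration; only the first 30 registered teams are eligible, and each must submit one valid result within 10 days or lose its place, which other teams can then take.
The competition safeguards the medical data. The dataset was de-identified with dedicated data-masking software, and all cervical cytology images were processed strictly according to internationally accepted medical-information de-identification standards. The removed information covers hospital, patient and annotating-physician details, so no data can be traced back, which keeps the data secure and protects patient privacy.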

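Submission instructions
Participants submit a folder of json files packed into a ZIP archive. The folder name is free but must be lowercase English (e.g. tianchi.zip). Each file in the folder holds the detection results for one cervical cytology image and is named by the image id (e.g. T2019_600.json). Each json file contains a list in which every element is one detected abnormal cell, giving in order the top-left corner x, y of the rectangle containing the tumor cell, the rectangle's width w and height h, and the confidence p. Example: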
T2019_600.json
[{"x": 22890, "y": 3877, "w": 396, "h": 255, "p": 0.94135},
{"x": 20411, "y": 2260, "w": 8495, "h": 7683, "p": 0.67213},
{"x": 26583, "y": 7937, "w": 172, "h": 128, "p": 0.73228},
{"x": 2594, "y": 18627, "w": 1296, "h": 1867, "p": 0.23699}]

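Note: there are 200 test images in total, so participants must submit a zip containing 200 prediction json files. Even if no abnormal cell is detected in an image, an empty-list json file must still be submitted for it. Each abnormal-cell dictionary has exactly 5 keys, x, y, w, h and p, all lowercase.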
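The submission rules above can be sketched as a small writer that emits one json file per test image, including empty lists, and zips the folder. This is a sketch under assumptions: the function name write_submission and the layout of the predictions dict are hypothetical, the folder is kept inside the archive as the rules describe, and coordinates are cast to int to match the sample (the rules do not say whether floats are accepted).

```python
import json
import os
import tempfile
import zipfile


def write_submission(predictions, image_ids, out_dir, zip_path, folder_name="tianchi"):
    """Write one <image id>.json per test image, then zip the folder.

    predictions: dict mapping image id -> list of (x, y, w, h, p) tuples in
    whole-slide coordinates (hypothetical layout). Every id in image_ids gets
    a file; ids without detections get an empty list, as the rules require.
    """
    folder = os.path.join(out_dir, folder_name)  # lowercase English folder name
    os.makedirs(folder, exist_ok=True)
    for image_id in image_ids:
        cells = [
            {"x": int(x), "y": int(y), "w": int(w), "h": int(h), "p": float(p)}
            for x, y, w, h, p in predictions.get(image_id, [])
        ]
        with open(os.path.join(folder, f"{image_id}.json"), "w") as fp:
            json.dump(cells, fp)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for image_id in image_ids:
            name = f"{image_id}.json"
            # keep the folder inside the archive, e.g. tianchi/T2019_600.json
            zf.write(os.path.join(folder, name), arcname=f"{folder_name}/{name}")
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        preds = {"T2019_600": [(22890, 3877, 396, 255, 0.94135)]}
        path = write_submission(preds, ["T2019_600", "T2019_601"], tmp,
                                os.path.join(tmp, "tianchi.zip"))
        with zipfile.ZipFile(path) as zf:
            print(sorted(zf.namelist()))
```

In practice image_ids would be the full list of 200 test ids, so images with no detections still produce a file.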

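Evaluation metric
In the preliminary round the organizers use mAP (mean Average Precision), the metric commonly used for object detection, to evaluate cervical cancer cell detection. AP is computed separately at two IoU thresholds (0.3 and 0.5), and their average is the final score. The evaluation code follows the method used since VOC2010 (https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/datasets/voc_eval.py).
Specifically, for each cervical cytology image the participant's detection model outputs multiple predicted boxes with confidences for the whole image; the back-end evaluation randomly generates some ROI regions and computes mAP only inside those ROIs.
How AP is computed: first fix an IoU threshold, compute the IoU between each predicted box and the ground-truth boxes, and use the threshold to decide whether each prediction is correct. Then sort the predictions by confidence and sweep a confidence threshold to obtain a series of recall and precision values; averaging the precision over the different recall levels gives AP.
Recall = TP/(TP+FN)
Precision = TP/(TP+FP)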
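The AP procedure above can be sketched in plain Python. It is modeled on the voc_eval.py logic linked above (all-point interpolation, a strict IoU > threshold test, each ground-truth box matched at most once), but it is a simplified sketch: it ignores the evaluator's randomly generated ROIs and any "difficult" flags, and the function names are hypothetical.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h), the format used in the json files."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """VOC-style AP with all-point interpolation.

    preds: list of (image_id, box, score); gts: dict image_id -> list of boxes.
    Each ground-truth box can be matched once; a second hit counts as a FP.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {k: [False] * len(v) for k, v in gts.items()}
    tp = fp = 0
    recalls, precisions = [], []
    for image_id, box, _ in sorted(preds, key=lambda t: t[2], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(image_id, [])):
            overlap = iou_xywh(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou > iou_thr and not matched[image_id][best_j]:
            matched[image_id][best_j] = True
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # make precision non-increasing from right to left, then sum the area under the curve
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def competition_score(preds, gts, thresholds=(0.3, 0.5)):
    """Average of the APs at the two IoU thresholds named in the rules."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

With a single ground-truth box, ranking the correct prediction first gives AP 1.0, while ranking a false positive above it drops AP to 0.5, which shows why the confidence ordering matters.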

START
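1. The raw data are irregular: ROI and POS entries are mixed together in each json file, and for the positive training data we do not need to consider anything outside the ROIs. So the plan is to first split out each ROI, fix it, and draw the POS boxes that belong to it. Also, because the original slides are huge, the ROI regions are cropped directly without reading the whole image.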

Below is the program that splits the json files, putting the ROI information into a regular form. Note these two lines:
json_file = open(json_path).read()
json_list = json.loads(json_file)

json.loads turns the string into a List whose elements are dictionaries, a form that is easy to index.
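For comparison, json.load (without the s) reads straight from a file object, and a with block closes the file, which the open(...).read() pattern above leaves to the garbage collector. A small sketch using the sample annotation from the competition description; the helper names and the file name T2019_demo.json are made up for illustration.

```python
import json
import os
import tempfile

# The sample annotation from the competition description, with straight quotes
SAMPLE = [
    {"x": 33842, "y": 31905, "w": 101, "h": 106, "class": "pos"},
    {"x": 31755, "y": 31016, "w": 4728, "h": 3696, "class": "roi"},
    {"x": 32770, "y": 34121, "w": 84, "h": 71, "class": "pos"},
    {"x": 13991, "y": 38929, "w": 131, "h": 115, "class": "pos"},
    {"x": 9598, "y": 35063, "w": 5247, "h": 5407, "class": "roi"},
    {"x": 25030, "y": 40115, "w": 250, "h": 173, "class": "pos"},
]


def load_annotations(json_path):
    """json.load parses the open file directly; the with block closes it afterwards."""
    with open(json_path) as fp:
        return json.load(fp)


def split_by_class(json_list):
    """Separate ROI entries from POS entries, like roi_list / pos_list in the script."""
    roi_list = [d for d in json_list if d["class"] == "roi"]
    pos_list = [d for d in json_list if d["class"] == "pos"]
    return roi_list, pos_list


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "T2019_demo.json")
        with open(path, "w") as fp:
            json.dump(SAMPLE, fp)
        annotations = load_annotations(path)
        # index by list position first, then by dict key
        print(annotations[1]["class"], annotations[1]["w"])
        rois, poss = split_by_class(annotations)
        print(len(rois), len(poss))
```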

1. Split each ROI in a json file, together with its POS boxes, into its own json file

import time
import json
import os

start_time = time.time()


def paid_time(start_time, end_time):
    paid_time = end_time - start_time
    print(f'it cost {paid_time}s')


def judge_inRoi(name):
    json_path = f'E:/ali_cervical_carcinoma_data/labels/{name}.json'
    print('json_path is: ', json_path)
    print(f'reading filename is pos_0/{name}.kfb \n')
    json_file = open(json_path).read()
    json_list = json.loads(json_file)  # the string becomes a List of dicts
    # separate the pos and roi entries
    pos_list = []
    roi_list = []
    for i in range(0, len(json_list)):
        if json_list[i]['class'] == 'roi':
            roi_list.append(json_list[i])
        elif json_list[i]['class'] == 'pos':
            pos_list.append(json_list[i])
        else:
            print('there are something wrong')
            continue
    # for each ROI, compute its lower-right corner
    for i in range(0, len(roi_list)):
        corres_list = []
        corres_list.append(roi_list[i])  # the ROI itself is always the first element
        Roi_range_x = roi_list[i]['x']
        Roi_range_y = roi_list[i]['y']
        Roi_range_w = roi_list[i]['w']
        Roi_range_h = roi_list[i]['h']
        rightpoint_x = Roi_range_x + Roi_range_w
        rightpoint_y = Roi_range_y + Roi_range_h
        for j in range(0, len(pos_list)):
            pos_range_x = pos_list[j]['x']
            pos_range_y = pos_list[j]['y']
            # is this POS (its top-left corner) inside the current ROI?
            if Roi_range_x < pos_range_x < rightpoint_x and Roi_range_y < pos_range_y < rightpoint_y:
                corres_list.append(pos_list[j])
        # one json file per ROI: <name>_Roi<i>.json
        jsondata = json.dumps(corres_list)
        f = open(os.path.join(r'E:/ali_cervical_carcinoma_data/corres_labels_0to', f'{name}_Roi{i}.json'), 'w')
        f.write(jsondata)
        f.close()


scale = 20
for number in range(0, 10):
    root = f'E:/ali_cervical_carcinoma_data/pos_{number}'
    all = os.walk(root)
    for _, _, filelist in all:
        for filename in filelist:
            name = filename[:-4]
            path = f'E:/ali_cervical_carcinoma_data/pos_{number}/{name}.kfb'
            # nothing is returned; the ROIs and POS boxes of each slide are split into several json files
            judge_inRoi(name)
end_time = time.time()
paid_time(start_time, end_time)
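The script above assigns a POS to a ROI when the POS's top-left corner lies strictly inside the ROI, so a cell lying across a ROI edge is kept with a box that runs past the crop. A possible alternative, sketched below, tests the box center or the whole box instead; which criterion suits this data is a judgment call, and assign_pos_to_rois is a hypothetical helper, not part of the original code. In the sample annotation, the last POS (x = 25030) falls inside neither ROI under either criterion.

```python
# The sample annotation from the competition description
ANNOTATIONS = [
    {"x": 33842, "y": 31905, "w": 101, "h": 106, "class": "pos"},
    {"x": 31755, "y": 31016, "w": 4728, "h": 3696, "class": "roi"},
    {"x": 32770, "y": 34121, "w": 84, "h": 71, "class": "pos"},
    {"x": 13991, "y": 38929, "w": 131, "h": 115, "class": "pos"},
    {"x": 9598, "y": 35063, "w": 5247, "h": 5407, "class": "roi"},
    {"x": 25030, "y": 40115, "w": 250, "h": 173, "class": "pos"},
]


def assign_pos_to_rois(json_list, mode="center"):
    """Group POS boxes under the ROI that contains them.

    mode="center": the POS box center lies inside the ROI.
    mode="full":   the whole POS box lies inside the ROI.
    Returns one list per ROI, ROI first, in the same layout as corres_list.
    """
    rois = [d for d in json_list if d["class"] == "roi"]
    poss = [d for d in json_list if d["class"] == "pos"]
    groups = []
    for roi in rois:
        x1, y1 = roi["x"], roi["y"]
        x2, y2 = x1 + roi["w"], y1 + roi["h"]
        members = [roi]
        for p in poss:
            if mode == "full":
                inside = (x1 <= p["x"] and y1 <= p["y"]
                          and p["x"] + p["w"] <= x2 and p["y"] + p["h"] <= y2)
            else:
                cx, cy = p["x"] + p["w"] / 2, p["y"] + p["h"] / 2
                inside = x1 <= cx < x2 and y1 <= cy < y2
            if inside:
                members.append(p)
        groups.append(members)
    return groups


if __name__ == "__main__":
    for i, group in enumerate(assign_pos_to_rois(ANNOTATIONS)):
        print(f"Roi{i}: {len(group) - 1} POS boxes")
```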

2. Use the split ROI-POS files to crop out the ROI regions and draw the images.

import kfbReader
import json
import os
import cv2 as cv
import time

total_time = 0


# compute coordinates relative to the ROI's top-left corner
def caculate_relative_position(Roi_x, Roi_y, Pos_x, Pos_y):
    relative_x = Pos_x - Roi_x
    relative_y = Pos_y - Roi_y
    return relative_x, relative_y


# crop the ROI and draw the boxes; takes the label file being processed and its json List
def draw_rectangle(labels_filename, corres_json_list, total_time):
    start_time = time.time()  # time each image
    # T2019_53_Roi0.json -> T2019_53.kfb (also works for Roi10 and above)
    filename = labels_filename.split('_Roi')[0] + '.kfb'
    Roi_x = corres_json_list[0]['x']
    Roi_y = corres_json_list[0]['y']
    Roi_w = corres_json_list[0]['w']
    Roi_h = corres_json_list[0]['h']
    # instantiate the reader class
    path = os.path.join(kfb_image_root, filename)
    image = kfbReader.reader()
    kfbReader.reader.ReadInfo(image, path, Scale, True)
    # after instantiation, read the ROI region of the kfb file as described in the SDK documentation
    draw = image.ReadRoi(Roi_x, Roi_y, Roi_w, Roi_h, scale=20)  # this scale maps the ROI to the matching magnification; it matters a lot
    # draw every POS on the same ROI image
    for i in range(1, len(corres_json_list)):
        Pos_x = corres_json_list[i]['x']
        Pos_y = corres_json_list[i]['y']
        Pos_w = corres_json_list[i]['w']
        Pos_h = corres_json_list[i]['h']
        rela_x, rela_y = caculate_relative_position(Roi_x, Roi_y, Pos_x, Pos_y)
        draw = cv.rectangle(draw, (rela_x, rela_y), (rela_x + Pos_w, rela_y + Pos_h), (255, 0, 0), 10)  # draw the box on the image
    cv.imwrite(f"E:/ali_cervical_carcinoma_data/cut_image_pos_0/{labels_filename}.jpg", draw)  # save the image
    end_time = time.time()
    cost_time = end_time - start_time
    total_time = total_time + cost_time
    print(f'The {labels_filename} done, which cost {cost_time}s')
    return total_time


Scale = 20  # purpose of this Scale is still unclear
kfb_image_root = r'E:/ali_cervical_carcinoma_data/pos_0'  # only pos_0 for now
corres_labels_root = 'E:/ali_cervical_carcinoma_data/corres_labels'  # produced by correspongding_ROI_json_maker.py
# loop over the kfb files
all = os.walk(kfb_image_root)
for _, _, filelist in all:
    for filename in filelist:
        # find the json files in corres_labels whose name contains the kfb name (e.g. T2019_53.kfb) and read their coordinates
        labels_all = os.walk(corres_labels_root)
        for _, _, labelslist in labels_all:
            for labels_filename in labelslist:
                # substring match pairs every ROI json with its slide,
                # but it also pairs T2019_53 with T2019_530 (fixed in the next version)
                if labels_filename.find(filename[:-4]) >= 0:
                    corres_json_path = os.path.join(corres_labels_root, labels_filename)
                    corres_json_file = open(corres_json_path).read()  # read the json
                    corres_json_list = json.loads(corres_json_file)  # string -> List
                    print(f'\n filename is {filename},labels name is {labels_filename} , roi is {corres_json_list[0]}')
                    total_time = draw_rectangle(labels_filename, corres_json_list, total_time)
                else:
                    continue
        print(' =  = ' * 10)
print(f'Total time cost {total_time}s')
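The drawing loop above converts whole-slide POS boxes into ROI-relative coordinates. Because a POS is kept whenever its top-left corner is inside the ROI, its box can extend past the crop; a minimal sketch of the same conversion that also clips to the ROI is below. It assumes, as the script does, that the image returned by ReadRoi has the same pixel scale as the slide coordinates, and to_roi_relative is a hypothetical helper.

```python
def to_roi_relative(roi, pos, clip=True):
    """Convert a whole-slide POS box (x, y, w, h) into [xmin, ymin, xmax, ymax]
    relative to the ROI's top-left corner, optionally clipped to the ROI crop.

    Assumes the cropped ROI image uses the same pixel scale as the slide
    coordinates (the drawing script makes the same assumption).
    """
    xmin = pos["x"] - roi["x"]
    ymin = pos["y"] - roi["y"]
    xmax = xmin + pos["w"]
    ymax = ymin + pos["h"]
    if clip:
        xmin, ymin = max(0, xmin), max(0, ymin)
        xmax, ymax = min(roi["w"], xmax), min(roi["h"], ymax)
    return [xmin, ymin, xmax, ymax]


if __name__ == "__main__":
    roi = {"x": 31755, "y": 31016, "w": 4728, "h": 3696, "class": "roi"}
    pos = {"x": 33842, "y": 31905, "w": 101, "h": 106, "class": "pos"}
    # corner format, usable for cv.rectangle or a detection target
    print(to_roi_relative(roi, pos))  # [2087, 889, 2188, 995]
```

The corner format returned here is the same [xmin, ymin, xmax, ymax] layout that the dataset code in step 4 builds.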

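The version above is mainly for visually inspecting pathological features. What goes into the network does not need drawn boxes; the json files and image files only need to correspond one-to-one.
So this version is used instead: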

import kfbReader
import json
import os
import cv2 as cv
import time

total_time = 0


# compute coordinates relative to the ROI's top-left corner
def caculate_relative_position(Roi_x, Roi_y, Pos_x, Pos_y):
    relative_x = Pos_x - Roi_x
    relative_y = Pos_y - Roi_y
    return relative_x, relative_y


# crop the ROI (no boxes are drawn in this version); takes the label file being processed and its json List
def draw_rectangle(labels_filename, corres_json_list, total_time):
    start_time = time.time()  # time each image
    # read the image; T2019_53_Roi0.json -> T2019_53.kfb (also works for Roi10 and above)
    filename = labels_filename.split('_Roi')[0] + '.kfb'
    Roi_x = corres_json_list[0]['x']
    Roi_y = corres_json_list[0]['y']
    Roi_w = corres_json_list[0]['w']
    Roi_h = corres_json_list[0]['h']
    # instantiate the reader class
    path = os.path.join(kfb_image_root, filename)
    image = kfbReader.reader()
    kfbReader.reader.ReadInfo(image, path, Scale, True)
    # get the magnification the reader uses
    scale = kfbReader.reader.getReadScale(image)
    # after instantiation, read the ROI region of the kfb file as described in the SDK documentation
    draw = image.ReadRoi(Roi_x, Roi_y, Roi_w, Roi_h, scale=scale)  # this scale maps the ROI to the matching magnification; it matters a lot
    # the POS box-drawing loop from the previous version is disabled: training only needs the plain ROI image
    cv.imwrite(f"E:/ali_cervical_carcinoma_data/ROI_image/{labels_filename}.jpg", draw)  # save the image
    end_time = time.time()
    cost_time = end_time - start_time
    total_time = total_time + cost_time
    print(f'The {labels_filename}  done, which cost {cost_time}s')
    return total_time


Scale = 20  # purpose of this Scale is still unclear
corres_labels_root = 'E:/ali_cervical_carcinoma_data/corres_labels_0to9'  # produced by correspongding_ROI_json_maker.py
for k in range(0, 10):  # loop over all positive folders pos_0 ... pos_9
    kfb_image_root = f'E:/ali_cervical_carcinoma_data/pos_{k}'
    # loop over the kfb files
    all = os.walk(kfb_image_root)
    for kfb_root, _, filelist in all:
        for filename in filelist:
            basename_num = filename[:-4].split('_')[1]  # T2019_53.kfb -> '53'
            # find the matching json files in corres_labels and read their coordinates
            labels_all = os.walk(corres_labels_root)
            for _, _, labelslist in labels_all:
                for labels_filename in labelslist:
                    labels_filename_num = labels_filename[:-5].split('_')[1]  # T2019_53_Roi0.json -> '53'
                    # require the slide number in the json name to equal the kfb's number exactly,
                    # so that 53 and 530 are not read together
                    if labels_filename_num == basename_num:
                        corres_json_path = os.path.join(corres_labels_root, labels_filename)
                        corres_json_file = open(corres_json_path).read()  # read the json
                        corres_json_list = json.loads(corres_json_file)  # string -> List
                        print(f'\n filename is {filename},labels name is {labels_filename} ,NOW we are at pos_{k}')
                        total_time = draw_rectangle(labels_filename, corres_json_list, total_time)
                    else:
                        continue
            print(' =  = ' * 10)
print(f'Total time cost {total_time}s')
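Both versions above re-run os.walk over the whole label folder for every kfb file. One way to avoid that, sketched here, is to index the label file names by slide id once (for example from os.listdir(corres_labels_root)) and then look each slide up in a dict. The regex encodes an assumed naming scheme, T<digits>_<digits>_Roi<k>.json, taken from the file names in this post; index_labels is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming scheme from step 1: <slide id>_Roi<k>.json, e.g. T2019_53_Roi0.json
LABEL_NAME = re.compile(r"^(T\d+_\d+)_Roi(\d+)\.json$")


def index_labels(label_filenames):
    """Map each slide id to its ROI label files, sorted by ROI number.

    Exact matching on the slide id keeps T2019_53 and T2019_530 apart, and
    the regex handles ROI numbers with more than one digit.
    """
    index = defaultdict(list)
    for name in label_filenames:
        match = LABEL_NAME.match(name)
        if match:  # skip files that do not follow the naming scheme
            index[match.group(1)].append((int(match.group(2)), name))
    return {slide: [name for _, name in sorted(items)] for slide, items in index.items()}


if __name__ == "__main__":
    names = ["T2019_53_Roi1.json", "T2019_530_Roi0.json", "T2019_53_Roi0.json"]
    labels_by_slide = index_labels(names)
    kfb_name = "T2019_53.kfb"
    # look up the label files for one slide without re-walking the directory
    print(labels_by_slide.get(kfb_name[:-4], []))
```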

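3. Feeding the roughly 13 MB images from the first crop straight into the network runs out of memory, so the resize built into torchvision (GeneralizedRCNNTransform) is used for preprocessing: it shrinks the images and rescales the POS coordinates accordingly.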
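The resize described above can be sketched in plain Python. As far as I understand torchvision's GeneralizedRCNNTransform, it picks one scale factor per image (short side to min_size, unless that pushes the long side past max_size) and rescales boxes by the ratio of new to old size on each axis; the exact output size may differ from this sketch by a pixel because of rounding, and the helper names are hypothetical.

```python
import math


def rcnn_resize_scale(height, width, min_size=800, max_size=1333):
    """Scale so the short side becomes min_size, unless the long side
    would then exceed max_size, in which case the long side is capped."""
    return min(min_size / min(height, width), max_size / max(height, width))


def resized_shape(height, width, min_size=800, max_size=1333):
    scale = rcnn_resize_scale(height, width, min_size, max_size)
    return math.floor(height * scale), math.floor(width * scale)


def resize_boxes(boxes, old_hw, new_hw):
    """Rescale [xmin, ymin, xmax, ymax] boxes by the per-axis size ratios."""
    ratio_h = new_hw[0] / old_hw[0]
    ratio_w = new_hw[1] / old_hw[1]
    return [[x1 * ratio_w, y1 * ratio_h, x2 * ratio_w, y2 * ratio_h]
            for x1, y1, x2, y2 in boxes]


if __name__ == "__main__":
    # the first ROI in the sample annotation is 4728 wide and 3696 high
    h, w = 3696, 4728
    print(rcnn_resize_scale(h, w))  # set by the short side here
    print(resize_boxes([[2087, 889, 2188, 995]], (h, w), resized_shape(h, w)))
```

For that sample ROI the scale is about 0.216, so a 101×106 px cell shrinks to roughly 22×23 px, which is worth keeping in mind when choosing anchor sizes such as the (8,) ... (128,) used in step 5.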

from torchvision.models.detection.transform import GeneralizedRCNNTransform
import json
from dataset_maker import Positive_Roi_Dataset
import cv2 as cv
import time

start_time = time.time()
transforms = GeneralizedRCNNTransform(min_size=800, max_size=1333,
                                      image_mean=[187.462, 187.527, 193.423],
                                      image_std=[83.423, 91.469, 92.234])
data_train = Positive_Roi_Dataset('E:/ali_cervical_carcinoma_data', train=True, transforms=transforms)
data_test = Positive_Roi_Dataset('E:/ali_cervical_carcinoma_data', train=False, transforms=transforms)
for i in range(0, len(data_train)):
    singal_start_time = time.time()
    img, bbox, imgid = data_train[i]
    img = img.cpu().numpy()
    filename = imgid[:-9]  # strip '.json.jpg'
    # the transform changes the layout to e.g. (3, 608, 608), hence transpose(1, 2, 0);
    # *255 is meant to undo the normalization (the exact inverse of (x - mean) / std is x * std + mean)
    cv.imwrite(f'E:/ali_cervical_carcinoma_data/ROI_images_clip/{filename}.jpg', img.transpose(1, 2, 0) * 255)
    boxes = bbox['boxes']
    boxes = boxes.numpy().tolist()
    # list -> string
    jsondata = json.dumps(boxes)
    f = open(f'E:/ali_cervical_carcinoma_data/corres_labels_zero_to9_clip/{filename}.json', 'w')
    f.write(jsondata)
    f.close()
    singal_end_time = time.time()
    print(f'This picture used {singal_end_time - singal_start_time}s, this is {filename}')
end_time = time.time()
print(f'Total used {end_time - start_time}s')
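The loop above treats *255 as undoing the normalization. Assuming the transform normalized 0-255 pixel values as (x - mean) / std with the image_mean and image_std passed to it, the exact inverse is x * std + mean per channel. A minimal NumPy sketch (denormalize_chw is a hypothetical helper); its result could be passed to cv.imwrite, e.g. cv.imwrite(path, denormalize_chw(img, [187.462, 187.527, 193.423], [83.423, 91.469, 92.234])).

```python
import numpy as np


def denormalize_chw(img_chw, mean, std):
    """Undo per-channel (x - mean) / std and return an HWC uint8 image.

    img_chw: array shaped (3, H, W), as it comes out of the transform.
    mean, std: the same 0-255-scale values passed to GeneralizedRCNNTransform.
    """
    img = np.asarray(img_chw, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    restored = img * std + mean  # inverse of the normalization
    restored = np.clip(restored, 0, 255).round().astype(np.uint8)
    # CHW -> HWC for OpenCV; made contiguous so cv.rectangle accepts it
    return np.ascontiguousarray(restored.transpose(1, 2, 0))
```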

4. Next, organize the data into the COCO-style target format (boxes, labels, image_id, area, iscrowd)

dataset_maker
import os
import numpy as np
import torch
import torch.utils.data
import json
import cv2 as cv
import transforms as T
from torchvision.transforms import functional as F
import random


class Positive_Roi_Dataset(torch.utils.data.Dataset):
    def __init__(self, root, train, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        imgs_list = list(sorted(os.listdir(os.path.join(root, 'ROI_images_clip'))))
        labels_list = list(sorted(os.listdir(os.path.join(root, 'corres_labels_zero_to9_clip'))))
        # indices over all 1202 files, in sorted order
        indices = [i for i in range(len(imgs_list))]
        # shuffle randomly
        # random.shuffle(indices)
        if train:
            self.imgs = [imgs_list[i] for i in indices[:-212]]
            self.labels = [labels_list[i] for i in indices[:-212]]
            if transforms is None:  # random horizontal flip
                transforms = T.Compose([T.ToTensor(), T.RandomHorizontalFlip(0.5)])
        else:
            self.imgs = [imgs_list[i] for i in indices[-212:]]
            self.labels = [labels_list[i] for i in indices[-212:]]
            if transforms is None:
                transforms = T.Compose([T.ToTensor()])
        self.transforms = transforms

    def normalize(self, image):
        im_max, im_min = image.max(), image.min()
        image = (((image - im_min) / (im_max - im_min)) * 255).astype(np.uint8)
        return image

    def __getitem__(self, idx):
        # load images and labels
        img_path = os.path.join(self.root, 'ROI_images_clip', self.imgs[idx])
        labels_path = os.path.join(self.root, 'corres_labels_zero_to9_clip', self.labels[idx])
        img = cv.imread(img_path)  # [..., ::-1]
        # open the json file and read the label coordinates
        label_file = open(labels_path).read()
        label_list = json.loads(label_file)
        imgs_id = self.imgs[idx]
        boxes = []
        # COCO style: write the top-left and bottom-right corners of each box
        # note: coordinates must be relative to the ROI, not whole-slide coordinates
        # in the not-yet-resized files in corres_labels_zero_to9, label_list[0] is the ROI and POS entries start at 1:
        # for i in range(1, len(label_list)):
        #     xmin = label_list[i]['x'] - label_list[0]['x']
        #     ymin = label_list[i]['y'] - label_list[0]['y']
        #     xmax = xmin + label_list[i]['w']
        #     ymax = ymin + label_list[i]['h']
        #     boxes.append([xmin, ymin, xmax, ymax])
        # the resized files already hold [xmin, ymin, xmax, ymax] lists
        boxes = label_list
        # reshape keeps an (N, 4) shape even when a ROI has no POS boxes
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        # 1-D tensor of ones (binary: lesion vs. background). With the old labels it had one element
        # fewer than the json list, because the first entry was the ROI; in the resized dataset every
        # entry is a POS box, so no subtraction is needed
        labels = torch.ones((len(label_list),), dtype=torch.int64)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((len(label_list),), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        img, target = self.transforms(img, target)
        # squeeze the leading size-1 dimension on return (cf. img.tensors.squeeze()); target is the dict built above
        return img.squeeze(0), target, imgs_id

    def __len__(self):
        return len(self.imgs)


if __name__ == '__main__':
    from torchvision.models.detection.transform import GeneralizedRCNNTransform
    transforms = GeneralizedRCNNTransform(min_size=800, max_size=1333, image_mean=[0.485, 0.456, 0.406],
                                          image_std=[0.229, 0.224, 0.225])
    dataset = Positive_Roi_Dataset('E:/ali_cervical_carcinoma_data', train=True, transforms=transforms)
    for i in range(len(dataset)):
        target = dataset[i][1]
        image_name = dataset[i][2]
        print(target['boxes'], image_name)
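In the dataset above, the train/test split takes the sorted file list by position (random.shuffle is commented out), and if shuffling were switched on, ROIs cut from the same slide could end up on both sides of the split. A hedged alternative is to split by slide id so all ROIs of a slide stay together. The function name, the 0.18 test fraction (roughly 212 of 1202) and the seed are illustrative choices, not the author's.

```python
import random
from collections import defaultdict


def split_by_slide(filenames, test_fraction=0.18, seed=0):
    """Split ROI files into train/test lists so that every ROI of a slide
    stays on the same side. Slide id = the part before '_Roi',
    e.g. 'T2019_53_Roi0.jpg' -> 'T2019_53'.
    """
    groups = defaultdict(list)
    for name in filenames:
        groups[name.split("_Roi")[0]].append(name)
    slide_ids = sorted(groups)
    random.Random(seed).shuffle(slide_ids)  # reproducible shuffle of slides, not files
    n_test = max(1, round(len(slide_ids) * test_fraction))
    test_ids = set(slide_ids[:n_test])
    train = sorted(n for s in slide_ids if s not in test_ids for n in groups[s])
    test = sorted(n for s in test_ids for n in groups[s])
    return train, test
```

Because an image such as T2019_53_Roi0.jpg and its label T2019_53_Roi0.json share a stem, the label list can be derived from the image split (for example by swapping the extension), which keeps the two lists aligned.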

5. Main program
ROI_training

import os
import torch
from torch.utils import data
from torchvision.models.detection import faster_rcnn
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models.detection.transform import GeneralizedRCNNTransform
from torch import nn
import utils
from dataset_maker import Positive_Roi_Dataset
from engine import evaluate, train_one_epoch

# hyperparameters up front, so they are easy to change
num_classes = 2  # background + abnormal cell (pos)
epochs = 1000
step_size = 1000
print_freq = 50
min_size = 800
max_size = 1333
# per-channel mean/std on the 0-255 scale (see caculate_pixel_mean below); if the dataset's
# default ToTensor() scales pixels to [0, 1], these values would need to be divided by 255
image_mean = [146.863, 141.212, 139.139]
image_std = [32.170, 36.919, 38.612]
sizes = ((8,), (16,), (32,), (64,), (128,))  # one anchor size per FPN feature map (five levels)
aspect_ratios = ((0.5, 1.0, 2.0),) * 5
device = torch.device('cuda')
start_epoch = 0

# faster_rcnn.resnet_fpn_backbone freezes the first stages of the backbone (conv1 and layer1), which are not updated
backbone = faster_rcnn.resnet_fpn_backbone(backbone_name='resnet50', pretrained=True)
rpn_anchor_generator = AnchorGenerator(sizes=sizes, aspect_ratios=aspect_ratios)
model = faster_rcnn.FasterRCNN(backbone=backbone, num_classes=num_classes, min_size=min_size, max_size=max_size,
                               image_mean=image_mean, image_std=image_std, rpn_anchor_generator=rpn_anchor_generator)

data_train = Positive_Roi_Dataset('E:/ali_cervical_carcinoma', train=True)
data_test = Positive_Roi_Dataset('E:/ali_cervical_carcinoma', train=False)
# print('data_test num=', len(data_test), '\nfileds:\n', data_test[0][1])
trainLoader = data.DataLoader(data_train, batch_size=2, shuffle=True, collate_fn=utils.collate_fn)
testLoader = data.DataLoader(data_test, batch_size=2, shuffle=False, collate_fn=utils.collate_fn)

model.to(device)
print(model)

# params = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.SGD(model.parameters(), lr=0.0004,
#                             momentum=0.9, weight_decay=0.00005)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-5)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1)

# load a previous training result (so training can be split into several runs);
# in model_{number}.pth, number is the last epoch you reached
ckpt = 'E:/ali_cervical_carcinoma/Kfbreader-win10-python36/checkpoints/model_178.pth'
checkpoint = torch.load(ckpt)
# print(f'model loaded from "{ckpt}"')
model_dict = checkpoint['model']
model.load_state_dict(model_dict)
# optimizer_dict = checkpoint['optimizer']
# optimizer.load_state_dict(optimizer_dict)
# lr_scheduler_dict = checkpoint['lr_scheduler']
# lr_scheduler.load_state_dict(lr_scheduler_dict)
start_epoch = checkpoint['epoch'] + 1  # the saved epoch is already finished, so resume at the next one

print("starting to train model......")
for epoch in range(start_epoch, epochs):
    train_one_epoch(model, optimizer, trainLoader, device, epoch, print_freq=print_freq)
    lr_scheduler.step()
    # gpu_tracker.track()
    utils.save_on_master({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'lr_scheduler': lr_scheduler.state_dict(),
        'epoch': epoch},
        os.path.join('checkpoints', 'model_{}.pth'.format(epoch)))
    evaluate(model, testLoader, device=device)

Here image_mean = [146.863, 141.212, 139.139] and image_std = [32.170, 36.919, 38.612]. Because the compressed images are not all exactly the same size (each lies between the minimum and maximum size limits), these values are only rough estimates, produced by the following script, which was written as if all images had the same size:
caculate_pixel_mean
import os
import cv2 as cv
import numpy as np

filepath = 'E:/ali_cervical_carcinoma_data/ROI_images_clip'  # dataset directory
pathDir = os.listdir(filepath)

# note: cv.imread returns channels in BGR order, so index 0 (named R below) is actually blue and
# index 2 (named B) is red; the order still matches the images the dataset feeds to the model
Rsum_mean = 0
Gsum_mean = 0
Bsum_mean = 0
Rsum_std = 0
Gsum_std = 0
Bsum_std = 0
for idx in range(len(pathDir)):
    filename = pathDir[idx]
    img = cv.imread(os.path.join(filepath, filename))
    # per-image channel mean and std
    R_channel_mean = np.mean(img[:, :, 0])
    G_channel_mean = np.mean(img[:, :, 1])
    B_channel_mean = np.mean(img[:, :, 2])
    R_channel_std = np.std(img[:, :, 0])
    G_channel_std = np.std(img[:, :, 1])
    B_channel_std = np.std(img[:, :, 2])
    Rsum_mean = Rsum_mean + R_channel_mean
    Gsum_mean = Gsum_mean + G_channel_mean
    Bsum_mean = Bsum_mean + B_channel_mean
    Rsum_std = Rsum_std + R_channel_std
    Gsum_std = Gsum_std + G_channel_std
    Bsum_std = Bsum_std + B_channel_std
    # word = f'{filename} |||   MEAN  R {R_channel_mean} ,G {G_channel_mean} ,B {B_channel_mean} \n ' \
    #        f'                        std   R {R_channel_std} ,G {G_channel_std} ,B {B_channel_std}   \n    '
    # print(word)
    # file = open('D:/ali_cervical_carcinoma/pixel_mean.txt', 'a+')
    # file.write(word)
num = len(pathDir)
# average of the per-image statistics (every image weighted equally, regardless of its size)
print(f'Rmean{Rsum_mean/num},Gmean{Gsum_mean/num},Bmean{Bsum_mean/num}')
print(f'Rstd{Rsum_std/num},Gstd{Gsum_std/num},Bstd{Bsum_std/num}')
print('done')

A check of whether the boxes are drawn in the right place:
from torchvision.models.detection.transform import GeneralizedRCNNTransform
import json
import numpy as np
from dataset_maker import Positive_Roi_Dataset
import cv2 as cv


def caculate_relative_position(Roi_x, Roi_y, Pos_x, Pos_y):
    relative_x = Pos_x - Roi_x
    relative_y = Pos_y - Roi_y
    return relative_x, relative_y


# image after one transform (step 3 saves these images as .jpg)
draw = cv.imread('E:/ali_cervical_carcinoma_data/ROI_images_clip/T2019_7_Roi2.jpg')
json_file = open('E:/ali_cervical_carcinoma_data/corres_labels_zero_to9_clip/T2019_7_Roi2.json').read()
json_list = json.loads(json_file)
xmin = json_list[0][0]
ymin = json_list[0][1]
xmax = json_list[0][2]
ymax = json_list[0][3]
print(xmin, ymin, xmax, ymax)
# draw the POS box
draw = cv.rectangle(draw, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (255, 0, 0), 2)
cv.imwrite("./test.jpg", draw)
print('done')

# original image (not transformed)
draw_ori = cv.imread('E:/ali_cervical_carcinoma_data/ROI_images/T2019_104_Roi0.json.jpg')
json_file = open('E:/ali_cervical_carcinoma_data/corres_labels_zero_to9/T2019_104_Roi0.json').read()
json_list = json.loads(json_file)
Roi_x = json_list[0]['x']
Roi_y = json_list[0]['y']
Roi_w = json_list[0]['w']
Roi_h = json_list[0]['h']
Pos_x = json_list[1]['x']
Pos_y = json_list[1]['y']
Pos_w = json_list[1]['w']
Pos_h = json_list[1]['h']
rela_x, rela_y = caculate_relative_position(Roi_x, Roi_y, Pos_x, Pos_y)
draw_ori = cv.rectangle(draw_ori, (rela_x, rela_y), (rela_x + Pos_w, rela_y + Pos_h), (255, 0, 0), 10)  # draw the box on the image
cv.imwrite("./test_ori.jpg", draw_ori)
print('done')

# image after two transforms
transforms = GeneralizedRCNNTransform(min_size=800, max_size=1333,
                                      image_mean=[187.462, 187.527, 193.423],
                                      image_std=[83.423, 91.469, 92.234])
data_train = Positive_Roi_Dataset('E:/ali_cervical_carcinoma_data', train=True, transforms=transforms)
data_test = Positive_Roi_Dataset('E:/ali_cervical_carcinoma_data', train=False, transforms=transforms)
# pick T2019_104_Roi0.json
img, bbox, imgid = data_train[0]
# print(img, bbox, imgid)
# CHW -> HWC; cv.rectangle needs a contiguous array, which a plain transpose does not give
img = np.ascontiguousarray(img.numpy().transpose(1, 2, 0))
# print(img, img.shape)
cv.imwrite("./test_doubleimg.jpg", img)
boxes = bbox['boxes']
boxes = boxes.numpy().tolist()[0]
xmin = boxes[0]
ymin = boxes[1]
xmax = boxes[2]
ymax = boxes[3]
double_tras_img = cv.rectangle(img, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (255, 0, 0), 5)  # image with its POS box after two transforms
cv.imwrite("./test_double_trans.jpg", double_tras_img)
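Back to the caculate_pixel_mean script in step 5: it averages per-image means and per-image stds, which the author notes is only approximate when image sizes differ. Averaging per-image stds also ignores the spread between image means, so it can underestimate the pooled std even for equal-sized images. A minimal NumPy sketch of exact pooled statistics, accumulating per-channel sums and sums of squares (pooled_channel_stats is a hypothetical helper):

```python
import numpy as np


def pooled_channel_stats(images):
    """Per-channel mean and std over all pixels of all images.

    Accumulates per-channel sums and sums of squares, so images of different
    sizes are weighted by their pixel counts. Channel order is whatever the
    images use (BGR for cv.imread).
    """
    total = np.zeros(3)
    total_sq = np.zeros(3)
    count = 0
    for img in images:
        pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # guard against tiny negative values from floating-point rounding
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std
```

Passing a generator such as (cv.imread(os.path.join(filepath, f)) for f in pathDir) keeps only one image in memory at a time.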
