LUNA16_Challange数据预处理2

ps 直接上好像有点困难，那么先整理下LUNA16_Challange中平安科技公司的技术说明中预处理部分（还是比较好理解，理解错误欢迎指正）

Data Preprocessing

At first, we get the lung area by using traditional methods, and then preprocessing is performed on the lung area. Use -600HU as a threshold to get a 0-1 3D map, based on the size of the 3D map block and the average distance traveled by the tile to the center, and make up for all small cave depths. The final 0-1 three-dimensional map is the lung area. As the CT picture up and down there will be some slices connected with the outside world, should be removed. The final image pixel values are clip to [-1200,600], then zoom to [0,255]. Pixels for non-lung regions are set to 170. Pretreatment can eliminate the noise, such as the bright spots of the bones, the metal lines of the CT bed. And finally, we get 128*128*128 cube.

For false positive reduction track, we use a multi-scale strategy, and prepare two sizes little cube: 36*48*48 and 20*36*36. We crop little cube from the whole lung area to feed into the two classification networks. Obtain different fields of vision size of nodules to predict the overall results.

More importantly, the training dataset has extremely high false positive to true positive ratio (735418:1557). To solve the problem of category imbalance, in addition to the focal loss function, we used oversampling to increase the number of positive samples. Specific methods are sliding window crop, flip respect to x-axis, y-axis and z-axis, rotate 90, 180, 270 degrees, multi-scale transform. Finally, we expanded the positive sample more than 300 times.

数据预处理

把-600hu作为阈值获得一个0-1二值三维图。并把其中的小块都补上，最终得到一个仅含肺区域三维图。其他链接外部组织的切片去掉，最终图像值在[-1200,600]，然后缩放到[0,255],非肺区域设置为170.预处理可以消除噪声，如骨骼的亮点，CT床的金属线。最终得到128*128*128的立方体。

这短短几句话其实工作量还是有点的，很多讲得不够细。下面先了解下hu值。参考：http://shartoo.github.io/medical_image_process/

substance HU 空气-1000肺-500脂肪-100到-50水0CSF15肾30血液+30到+45肌肉+10到+40灰质+37到+45白质+20到+30Liver+40到+60软组织,constrast+100到+300骨头+700(软质骨)到+3000(皮质骨) 这是具体值。

计算方法像素值转换到hu值：Hounsfield Unit = pixel_value * rescale_slope + rescale_intercept

可以查看病人的扫描HU值分布情况代码代码如下：

# -*- coding:utf-8 -*-
'''
this script is used for basic process of lung 2017 in Data Science Bowl
'''
import glob
import os
import pandas as pd
import SimpleITK as sitk
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import skimage, os
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, reconstruction, binary_closing
from skimage.measure import label,regionprops, perimeter
from skimage.morphology import binary_dilation, binary_opening
from skimage.filters import roberts, sobel
from skimage import measure, feature
from skimage.segmentation import clear_border
from skimage import data
from scipy import ndimage as ndi
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import pydicom
import scipy.misc
import numpy as npdef load_scan(path):slices = [pydicom.read_file(path + '/' + s) for s in os.listdir(path)]slices.sort(key = lambda x: int(x.ImagePositionPatient[2]))try:slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])except:slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)for s in slices:s.SliceThickness = slice_thicknessreturn slicesdef get_pixels_hu(slices):image = np.stack([s.pixel_array for s in slices])# Convert to int16 (from sometimes int16),# should be possible as values should always be low enough (<32k)image = image.astype(np.int16)# Set outside-of-scan pixels to 0# The intercept is usually -1024, so air is approximately 0image[image == -2000] = 0# Convert to Hounsfield units (HU)for slice_number in range(len(slices)):intercept = slices[slice_number].RescaleInterceptslope = slices[slice_number].RescaleSlopeif slope != 1:image[slice_number] = slope * image[slice_number].astype(np.float64)image[slice_number] = image[slice_number].astype(np.int16)image[slice_number] += np.int16(intercept)return np.array(image, dtype=np.int16)first_patient = load_scan('E:/DcmData/xlc/Fracture_data/Me/3004276169/3302845/')
first_patient_pixels = get_pixels_hu(first_patient)
plt.hist(first_patient_pixels.flatten(), bins=80, color='c')
plt.xlabel("Hounsfield Units (HU)")
plt.ylabel("Frequency")
plt.show()

结果如下：

这里的读入数据是一个ct序列（dcm文件）就是在E:/DcmData/xlc/Fracture_data/Me/3004276169/3302845/路径下，由于是公司的数据这里就不共享了。转为luna16数据集该怎么处理呢。一直在寻找MHD格式数据的处理方法，对于dicom格式的CT有很多论文根据其HU值域可以轻易地分割肺、骨头、血液等，但是对于MHD没有这样的参考。从LUNA16论坛得到的解释是，LUNA16的MHD数据已经转换为HU值了。

图像值在[-1200,600]，然后缩放到[0,255],非肺区域设置为170.代码：

import SimpleITK as sitk
import matplotlib.pyplot as plt
import numpy as np
# -*- coding:utf-8 -*-
'''
this script is used for basic process of lung 2017 in Data Science Bowl
'''
import glob
import os
import pandas as pd
import SimpleITK as sitk
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import skimage, os
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, reconstruction, binary_closing
from skimage.measure import label,regionprops, perimeter
from skimage.morphology import binary_dilation, binary_opening
from skimage.filters import roberts, sobel
from skimage import measure, feature
from skimage.segmentation import clear_border
from skimage import data
from scipy import ndimage as ndi
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import pydicom
import scipy.misc
import numpy as npfilename='E:\\JLS\\dcm_data\\luna\\subsets\\subset1\\1.3.6.1.4.1.14519.5.2.1.6279.6001.173106154739244262091404659845.mhd'
#filename='E:\\DcmData\\xlc\\Fracture_data\\mhd_raw\\1.3.12.2.1107.5.1.4.75751.30000018103122041563700126965.mhd'
itkimage = sitk.ReadImage(filename)#读取.mhd文件
numpyImage = sitk.GetArrayFromImage(itkimage)#获取数据，自动从同名的.raw文件读取#numpyImage[numpyImage > -600] = 1
#numpyImage[numpyImage <= -600] = 0
def get_segmented_lungs(im, plot=False):'''This funtion segments the lungs from the given 2D slice.'''if plot == True:f, plots = plt.subplots(4, 2, figsize=(5, 40))'''Step 1: Convert into a binary image.'''binary = im < -600if plot == True:plots[0][0].axis('off')plots[0][0].set_title('binary image')plots[0][0].imshow(binary, cmap=plt.cm.bone)'''Step 2: Remove the blobs connected to the border of the image.'''cleared = clear_border(binary)if plot == True:plots[0][1].axis('off')plots[0][1].set_title('after clear border')plots[0][1].imshow(cleared, cmap=plt.cm.bone)'''Step 3: Label the image.'''label_image = label(cleared)if plot == True:plots[1][0].axis('off')plots[1][0].set_title('found all connective graph')plots[1][0].imshow(label_image, cmap=plt.cm.bone)'''Step 4: Keep the labels with 2 largest areas.'''areas = [r.area for r in regionprops(label_image)]areas.sort()if len(areas) > 2:for region in regionprops(label_image):if region.area < areas[-2]:for coordinates in region.coords:label_image[coordinates[0], coordinates[1]] = 0binary = label_image > 0if plot == True:plots[1][1].axis('off')plots[1][1].set_title(' Keep the labels with 2 largest areas')plots[1][1].imshow(binary, cmap=plt.cm.bone)'''Step 5: Erosion operation with a disk of radius 2. This operation isseperate the lung nodules attached to the blood vessels.'''selem = disk(2)binary = binary_erosion(binary, selem)if plot == True:plots[2][0].axis('off')plots[2][0].set_title('seperate the lung nodules attached to the blood vessels')plots[2][0].imshow(binary, cmap=plt.cm.bone)'''Step 6: Closure operation with a disk of radius 10. This operation isto keep nodules attached to the lung wall.'''selem = disk(10)binary = binary_closing(binary, selem)if plot == True:plots[2][1].axis('off')plots[2][1].set_title('keep nodules attached to the lung wall')plots[2][1].imshow(binary, cmap=plt.cm.bone)'''Step 7: Fill in the small holes inside the binary mask of lungs.'''edges = roberts(binary)binary = ndi.binary_fill_holes(edges)if plot == True:plots[3][0].axis('off')plots[3][0].set_title('Fill in the small holes inside the binary mask of lungs')plots[3][0].imshow(binary, cmap=plt.cm.bone)'''Step 8: Superimpose the binary mask on the input image.'''im=(255/1800 *im +1200*255/1800)get_high_vals = binary == 0im[get_high_vals] = 170im=np.rint(im)if plot == True:plots[3][1].axis('off')plots[3][1].set_title('Superimpose the binary mask on the input image')plots[3][1].imshow(im, cmap=plt.cm.bone)return im
def plot_ct_scan(scan):'''plot a few more images of the slices:param scan::return:'''f, plots = plt.subplots(int(scan.shape[0] / 20) + 1, 4, figsize=(50, 50))for i in range(0, scan.shape[0], 5):plots[int(i / 20), int((i % 20) / 5)].axis('off')plots[int(i / 20), int((i % 20) / 5)].imshow(scan[i])
for idx in range(1):print(len(numpyImage))plot_ct_scan(numpyImage)data = numpyImage[50]print(data[314][303])plt.figure(100)plt.imshow(data, cmap='gray')im=get_segmented_lungs(data, plot=True)print(im[7][7])print(im[314][303],np.max(np.max(im)),np.min(np.min(im)))plt.figure(200)plt.imshow(im, cmap='gray')
plt.show()

结果如下：

原图与最终结果如下：

没缩放到[0,255]时的最终结果与缩放之后的图是一致的，原图与最终结果中间颜色不太一致应该是画图自适应的缘故，中间部分值是一样的（没缩放到[0,255]之前）。

把-600hu作为阈值获得一个0-1二值三维图。并把其中的小块都补上，最终得到一个仅含肺区域三维图这步直接做卡在了参考的http://shartoo.github.io/medical_image_process/中measure.marching_cubes函数已经被移除无法画出3d图，结果很不直观，其实上面一步代码中已经用到-600hu作为阈值，而并把其中的小块都补上，最终得到一个仅含肺区域三维图这一步在二维层面也已经得到了。重新拼接起来就是3维。

其他链接外部组织的切片去掉这步比较麻烦，感觉还是在上面代码中根据肺部面积占总面积的比重来判断，上面代码中添加代码如下：

    '''Step 8: Superimpose the binary mask on the input image.'''sum=0for r in regionprops(label_image):sum=sum+r.areaproportion=sum/(512*512)LP.append(proportion)

函数添加一个LP=[] 参数，最后返回LP（lung proportion），最后要选取128*128*128.根据比重最大值来选那128个切片。

但是还有一个问题是还没有归一化尺度，感觉是要放之之前要做的。那么先归一化尺度为[1，0.5556，0.5556]（zxy）另一片技术文档。代码是根据上一篇博文中的参考中的代码修改的：

from os import listdir
import numpy as np
import Provider
from scipy import ndimage# Directories
baseSubsetDirectory = r'E:/JLS/dcm_data/luna/subsets/subset'
targetSubsetDirBase = r'E:/JLS/dcm_data/luna/Resampled_1_0.7_0.7/subsets/subset'RESIZE_SPACING = [1, 0.5556, 0.5556]
NUM_SUBSET = 10for setNumber in range(NUM_SUBSET):subsetDirectory = baseSubsetDirectory + str(setNumber)list = listdir(subsetDirectory)subsetList = []# Create Subset Listfor file in list:if file.endswith(".mhd"):subsetList.append(file)for file in subsetList:try:fileName = file[:-4]filePath = subsetDirectory + '/' + filevolumeImage, numpyOrigin, numpySpacing = Provider.load_itk_image(filePath)resize_factor = numpySpacing / RESIZE_SPACINGnew_real_shape = volumeImage.shape * resize_factor# volumeImage[volumeImage <= -600]= 0# volumeImage[volumeImage > -600] = 1new_shape = np.round(new_real_shape)real_resize = new_shape / volumeImage.shapenew_volume = ndimage.zoom(volumeImage, zoom=real_resize)

然后将两段代码拼接起来，博文太长了就先不放了，等下篇博文再放。获得128*128*128的立方体的代码也是。

LUNA16_Challange数据预处理2相关推荐

LUNA16_Challange数据预处理1
ps 搞了一段时间用自己的dcm数据生成LUNA16_Challange提供的数据样式,现在对LUNA16的数据有了更加直观的理解.现在继续搞LUNA16数据的预处理. LUNA16_Challang ...
LUNA16_Challange数据预处理4
1.导入官方mask Mask,origin,spacing,isflip = load_itk_image(os.path.join(luna_segment,name+'.mhd')) 2.求取左 ...
机器学习PAL数据预处理
机器学习PAL数据预处理本文介绍如何对原始数据进行数据预处理,得到模型训练集和模型预测集. 前提条件完成数据准备,详情请参见准备数据. 操作步骤登录PAI控制台. 在左侧导航栏,选择模型开发和训 ...
深度学习——数据预处理篇
深度学习--数据预处理篇文章目录深度学习--数据预处理篇一.前言二.常用的数据预处理方法零均值化(中心化) 数据归一化(normalization) 主成分分析(PCA.Principal ...
目标检测之Faster-RCNN的pytorch代码详解(数据预处理篇)
首先贴上代码原作者的github:https://github.com/chenyuntc/simple-faster-rcnn-pytorch(非代码作者,博文只解释代码) 今天看完了simple- ...
第七篇：数据预处理(四) - 数据归约(PCA/EFA为例)
前言这部分也许是数据预处理最为关键的一个阶段. 如何对数据降维是一个很有挑战,很有深度的话题,很多理论书本均有详细深入的讲解分析. 本文仅介绍主成分分析法(PCA)和探索性因子分析法(EFA),并给 ...
数据预处理--噪声_为什么数据对您的业务很重要-以及如何处理数据
数据预处理--噪声 YES! Data is extremely important for your business. 是! 数据对您的业务极为重要. A human body has five ...
数据预处理（完整步骤）
原文:http://dataunion.org/5009.html 一:为什么要预处理数据? (1)现实世界的数据是肮脏的(不完整,含噪声,不一致) (2)没有高质量的数据,就没有高质量的挖掘结果(高 ...
3D目标检测深度学习方法数据预处理综述
作者 | 蒋天元来源 | 3D视觉工坊(ID: QYong_2014) 这一篇的内容主要要讲一点在深度学习的3D目标检测网络中,我们都采用了哪些数据预处理的方法,主要讲两个方面的知识,第一个是rep ...

LUNA16_Challange数据预处理2

LUNA16_Challange数据预处理2相关推荐

最新文章

热门文章