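PS: Diving straight in seemed a bit difficult, so I will first go through the preprocessing part of the technical description from Ping An Technology in the LUNA16 Challenge (it is fairly easy to follow; corrections are welcome if I have misunderstood anything).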

Data Preprocessing

First, we extract the lung area using traditional methods, and then perform preprocessing on it. Using -600 HU as a threshold, we obtain a 0-1 3D map; the connected blocks in this map are selected based on their size and their average distance to the center, and all small holes are filled. The resulting 0-1 3D map is the lung area. Some slices at the top and bottom of the CT scan are connected with the outside world and should be removed. The final image pixel values are clipped to [-1200, 600] and then scaled to [0, 255]. Pixels in non-lung regions are set to 170. This preprocessing eliminates noise such as the bright spots of bones and the metal lines of the CT bed. Finally, we get a 128*128*128 cube.
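To check my own understanding of the paragraph above, here is a minimal sketch of the thresholding, block selection, hole filling and intensity normalization. It is not Ping An's code: the function names, the `min_voxels` and `max_mean_dist` cut-offs, and the use of a 3D distance to the volume center are all my assumptions, and the removal of top/bottom slices and the final 128*128*128 crop are left out.

```python
import numpy as np
from scipy import ndimage


def select_lung_components(binary_map, min_voxels=100, max_mean_dist=60.0):
    """Keep connected blocks that are large enough and close to the center.

    min_voxels and max_mean_dist are illustrative values, not from the source.
    """
    labels, count = ndimage.label(binary_map)
    center = (np.array(binary_map.shape) - 1) / 2.0
    keep = np.zeros(binary_map.shape, dtype=bool)
    for index in range(1, count + 1):
        coords = np.argwhere(labels == index)
        mean_dist = np.linalg.norm(coords - center, axis=1).mean()
        if len(coords) >= min_voxels and mean_dist <= max_mean_dist:
            keep |= labels == index
    # Fill small enclosed holes (e.g. dense structures inside the lung)
    return ndimage.binary_fill_holes(keep)


def normalize_volume(volume_hu, lung_mask, low=-1200, high=600, pad_value=170):
    """Clip HU to [low, high], scale to [0, 255], set non-lung voxels to pad_value."""
    clipped = np.clip(volume_hu.astype(np.float32), low, high)
    scaled = np.rint((clipped - low) * 255.0 / (high - low)).astype(np.uint8)
    scaled[~lung_mask.astype(bool)] = pad_value
    return scaled


def preprocess(volume_hu, threshold=-600, **selection_kwargs):
    """Threshold at -600 HU, select the lung blocks, then normalize."""
    binary_map = volume_hu < threshold
    lung_mask = select_lung_components(binary_map, **selection_kwargs)
    return normalize_volume(volume_hu, lung_mask), lung_mask
```

One thing I noticed while writing this: with the [-1200, 600] to [0, 255] mapping, 0 HU lands exactly on 170, so filling non-lung pixels with 170 makes them look like water-density tissue.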

For the false positive reduction track, we use a multi-scale strategy and prepare small cubes of two sizes: 36*48*48 and 20*36*36. We crop the small cubes from the whole lung area and feed them into the two classification networks. The different field-of-view sizes around each nodule are used together to predict the overall result.
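The two-size cropping described above can be sketched roughly as follows. This is only my illustration: I assume the sizes are given in (z, y, x) order, that the cube is centered on the candidate position, and that regions falling outside the volume are padded with 170 (the non-lung fill value from the preprocessing step). `crop_cube` and `multi_scale_crops` are hypothetical helper names.

```python
import numpy as np

# The two cube sizes quoted in the text, assumed to be in (z, y, x) order
CUBE_SIZES = [(36, 48, 48), (20, 36, 36)]


def crop_cube(volume, center, size, pad_value=170):
    """Crop a cube of `size` centered on `center` (which must lie inside the volume).

    Parts of the cube that fall outside the volume are filled with pad_value.
    """
    cube = np.full(size, pad_value, dtype=volume.dtype)
    src, dst = [], []
    for c, s, dim in zip(center, size, volume.shape):
        start = int(c) - s // 2
        stop = start + s
        src_start, src_stop = max(start, 0), min(stop, dim)
        src.append(slice(src_start, src_stop))
        dst.append(slice(src_start - start, src_stop - start))
    cube[tuple(dst)] = volume[tuple(src)]
    return cube


def multi_scale_crops(volume, center):
    """Return one cube per scale, all centered on the same candidate."""
    return [crop_cube(volume, center, size) for size in CUBE_SIZES]
```

Each candidate then yields one cube per network, with the larger cube carrying more context around the nodule.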

More importantly, the training dataset has an extremely high ratio of false positives to true positives (735418:1557). To address this class imbalance, in addition to using the focal loss function, we used oversampling to increase the number of positive samples. The specific methods are sliding-window crops, flips with respect to the x-axis, y-axis and z-axis, rotations by 90, 180 and 270 degrees, and multi-scale transforms. In the end, we expanded the positive samples by more than 300 times.
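For reference, 735418:1557 works out to roughly 472 negatives per positive. Below is a small sketch of two of the ideas in this paragraph: flip/rotation oversampling and a binary focal loss. The description gives no implementation details, so the rotation plane (y-x), the function names, and the focal loss parameters `gamma=2.0` and `alpha=0.25` (the defaults from the focal loss paper) are my assumptions. Sliding-window crops and multi-scale transforms are not shown, so this alone yields 7 variants per sample, far from the 300x reported.

```python
import numpy as np


def augment_positive(cube):
    """Return the original cube plus flipped and rotated copies (z, y, x order)."""
    variants = [cube]
    # Flip along the z-, y- and x-axis
    for axis in range(3):
        variants.append(np.flip(cube, axis=axis))
    # Rotate by 90, 180 and 270 degrees in the y-x plane (assumed plane)
    for k in (1, 2, 3):
        variants.append(np.rot90(cube, k=k, axes=(1, 2)))
    return variants


def focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Per-sample binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    prob = np.clip(prob, 1e-7, 1 - 1e-7)
    p_t = np.where(label == 1, prob, 1 - prob)
    alpha_t = np.where(label == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)


# Negatives per positive in the training set quoted above
imbalance_ratio = 735418 / 1557
```

The focal loss down-weights samples the network already classifies confidently, which is why it is a natural fit when easy negatives dominate.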




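Substance                 HU
Air                       -1000
Lung                      -500
Fat                       -100 to -50
Water                     0
CSF                       15
Kidney                    30
Blood                     +30 to +45
Muscle                    +10 to +40
Grey matter               +37 to +45
White matter              +20 to +30
Liver                     +40 to +60
Soft tissue, contrast     +100 to +300
Bone                      +700 (cancellous bone) to +3000 (cortical bone)

These are the specific values.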

To convert pixel values to HU values: Hounsfield Unit = pixel_value * rescale_slope + rescale_intercept
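A tiny worked example of this formula, assuming a slope of 1 and an intercept of -1024 (a common pair in CT DICOM headers, but the real values must be read from each file's RescaleSlope and RescaleIntercept tags). `to_hounsfield` is just a name I made up for illustration.

```python
import numpy as np


def to_hounsfield(pixel_values, rescale_slope, rescale_intercept):
    """Apply HU = pixel_value * rescale_slope + rescale_intercept."""
    return pixel_values.astype(np.float64) * rescale_slope + rescale_intercept


# Assumed example values: stored pixels 0, 1024 and 2024
raw = np.array([0, 1024, 2024], dtype=np.int16)
hu = to_hounsfield(raw, rescale_slope=1.0, rescale_intercept=-1024.0)
```

With these assumed values a stored pixel of 1024 corresponds to water (0 HU) and a stored pixel of 0 sits close to air.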


# -*- coding:utf-8 -*-
# This script is used for basic processing of the lung data from the Data Science Bowl 2017
import glob
import os
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import SimpleITK as sitk
import numpy as np  # linear algebra
import skimage
from skimage.morphology import ball, disk, dilation, binary_erosion, remove_small_objects, erosion, closing, reconstruction, binary_closing
from skimage.measure import label, regionprops, perimeter
from skimage.morphology import binary_dilation, binary_opening
from skimage.filters import roberts, sobel
from skimage import measure, feature
from skimage.segmentation import clear_border
from skimage import data
from scipy import ndimage as ndi
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import pydicom
import scipy.misc


def load_scan(path):
    # Read every DICOM slice in the directory and sort the slices along the z-axis
    slices = [pydicom.dcmread(os.path.join(path, s)) for s in os.listdir(path)]
    slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except Exception:
        # Fall back to SliceLocation when ImagePositionPatient is unavailable
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices


def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices])
    # Convert to int16 (from sometimes uint16),
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)
    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    # Convert to Hounsfield units (HU)
    for slice_number in range(len(slices)):
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)
        image[slice_number] += np.int16(intercept)
    return np.array(image, dtype=np.int16)


first_patient = load_scan('E:/DcmData/xlc/Fracture_data/Me/3004276169/3302845/')
first_patient_pixels = get_pixels_hu(first_patient)
plt.hist(first_patient_pixels.flatten(), bins=80, color='c')
plt.xlabel("Hounsfield Units (HU)")
plt.show()




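    '''Step 8: Superimpose the binary mask on the input image.'''
    # Sum the area of all labelled lung regions in this slice
    lung_area = 0
    for r in regionprops(label_image):
        lung_area = lung_area + r.area
    # Fraction of the 512*512 slice that is lung
    proportion = lung_area / (512 * 512)
    LP.append(proportion)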

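Add an LP=[] parameter to the function and return LP (lung proportion) at the end. The final volume has to be 128*128*128, so the 128 slices are chosen according to the maximum lung proportion.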
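One possible way to turn the LP list into a choice of 128 slices is sketched below. I am assuming that "according to the maximum lung proportion" means taking the run of 128 consecutive slices whose summed proportion is largest; centering the window on the single slice with the largest proportion would be another reasonable reading. `best_slice_window` is a hypothetical helper name.

```python
import numpy as np


def best_slice_window(lung_proportions, window=128):
    """Return (start, stop) of the consecutive slices with the largest total LP."""
    lp = np.asarray(lung_proportions, dtype=np.float64)
    if len(lp) <= window:
        # Fewer slices than the window: keep everything
        return 0, len(lp)
    # A cumulative sum makes every window sum a single subtraction
    cumulative = np.concatenate(([0.0], np.cumsum(lp)))
    window_sums = cumulative[window:] - cumulative[:-window]
    start = int(np.argmax(window_sums))
    return start, start + window
```

The returned pair can be used directly as `volume[start:stop]` on the slice axis.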


from os import listdir
import numpy as np
import Provider
from scipy import ndimage

# Directories
baseSubsetDirectory = r'E:/JLS/dcm_data/luna/subsets/subset'
targetSubsetDirBase = r'E:/JLS/dcm_data/luna/Resampled_1_0.7_0.7/subsets/subset'

RESIZE_SPACING = [1, 0.5556, 0.5556]
NUM_SUBSET = 10

for setNumber in range(NUM_SUBSET):
    subsetDirectory = baseSubsetDirectory + str(setNumber)
    list = listdir(subsetDirectory)
    subsetList = []

    # Create Subset List
    for file in list:
        if file.endswith(".mhd"):
            subsetList.append(file)

    for file in subsetList:
        try:
            fileName = file[:-4]
            filePath = subsetDirectory + '/' + file
            volumeImage, numpyOrigin, numpySpacing = Provider.load_itk_image(filePath)
            resize_factor = numpySpacing / RESIZE_SPACING
            new_real_shape = volumeImage.shape * resize_factor
            # volumeImage[volumeImage <= -600]= 0
            # volumeImage[volumeImage > -600] = 1
            new_shape = np.round(new_real_shape)
            real_resize = new_shape / volumeImage.shape
            new_volume = ndimage.zoom(volumeImage, zoom=real_resize)
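The resampling arithmetic in the loop above can be tried on its own without the LUNA files. The sketch below wraps the same steps in a function and also returns the spacing actually achieved after rounding the shape. The function name `resample` and the choices `order=1` and `mode="nearest"` are my own assumptions (the listing calls `ndimage.zoom` with its defaults).

```python
import numpy as np
from scipy import ndimage


def resample(volume, spacing, new_spacing):
    """Resample a (z, y, x) volume from `spacing` to approximately `new_spacing`."""
    spacing = np.asarray(spacing, dtype=np.float64)
    new_spacing = np.asarray(new_spacing, dtype=np.float64)
    old_shape = np.array(volume.shape)
    # Ideal zoom factor, then round the target shape to whole voxels
    resize_factor = spacing / new_spacing
    new_shape = np.round(old_shape * resize_factor)
    # Zoom factor that produces exactly the rounded shape
    real_resize = new_shape / old_shape
    achieved_spacing = spacing / real_resize
    # Linear interpolation (order=1) is an assumption made for speed
    resampled = ndimage.zoom(volume, zoom=real_resize, order=1, mode="nearest")
    return resampled, achieved_spacing
```

For example, a 10*20*20 volume with spacing [2.5, 0.7, 0.7] resampled towards [1, 0.5556, 0.5556] becomes 25*25*25, and the achieved in-plane spacing is 0.56 rather than exactly 0.5556 because the shape has to be a whole number of voxels.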



