Implementing a 3D-Convolution U-Net in TensorFlow 2.6 for Brain Tumor Segmentation, with Model Parallelism

  • Overview
  • The U-Net network
    • Network structure
    • Code
  • Model training
    • Training environment
    • Data loading and preprocessing
    • Training
    • Training results
  • Model-parallel version
    • Splitting the model
    • Code
    • Problems encountered

Overview

The network architecture and experiment code below are taken from my undergraduate thesis project. Building on that, the two halves of the model are placed on two GPUs for model-parallel training, but the model-parallel version did not deliver the expected benefit. If an experienced reader spots the mistake, I would be grateful to have it pointed out. Thanks.

The U-Net network

U-Net is a convolutional neural network in which sampling is carried out by convolution operations. It was proposed in 2015 in the paper U-Net: Convolutional Networks for Biomedical Image Segmentation.

Network structure

The U-Net is implemented with 3D convolutions because each sample is a 4-dimensional MRI volume, and a model implemented in TensorFlow 2 takes a batch of such volumes as input. The model input is therefore a 5-dimensional tensor with shape (number of volumes, volume dim 1, volume dim 2, volume dim 3, volume dim 4).
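
For instance, stacking two of these 4-D volumes yields the 5-D model input (a minimal sketch; the per-volume shape (240, 240, 155, 4) is the one used throughout this post):

    import tensorflow as tf

    # A single MRI volume: 240 x 240 x 155 voxels with 4 modalities (channels).
    volume = tf.zeros((240, 240, 155, 4))

    # A batch of two volumes is the 5-D tensor the model consumes:
    # (number of volumes, dim 1, dim 2, dim 3, dim 4).
    batch = tf.stack([volume, volume])
    print(batch.shape)  # (2, 240, 240, 155, 4)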

The network structure is shown below.

Figure 1: The 3D-convolution version of the U-Net built here

Stage    | #  | Layer              | Kernel shape | Output tensor shape
Encoder  | 1  | batchnormalization |              | (batch size, 240, 240, 155, 4)
         | 2  | conv3d             | (3,3,3)      | (batch size, 240, 240, 155, 8)
         | 3  | batchnormalization |              | (batch size, 240, 240, 155, 8)
         | 4  | conv3d             | (3,3,3)      | (batch size, 240, 240, 155, 16)
         | 5  | batchnormalization |              | (batch size, 240, 240, 155, 16)
         | 6  | conv3d             | (3,3,2)      | (batch size, 238, 238, 155, 16)
         | 7  | batchnormalization |              | (batch size, 238, 238, 155, 16)
         | 8  | conv3d             | (3,3,1)      | (batch size, 118, 118, 77, 32)
         | 9  | batchnormalization |              | (batch size, 118, 118, 77, 32)
         | 10 | conv3d             | (3,3,1)      | (batch size, 58, 58, 39, 64)
         | 11 | batchnormalization |              | (batch size, 58, 58, 39, 64)
         | 12 | maxpooling3d       | (2,2,1)      | (batch size, 29, 29, 39, 64)
Decoder  | 13 | batchnormalization |              | (batch size, 29, 29, 39, 64)
         | 14 | upsampling3d       | (2,2,1)      | (batch size, 58, 58, 39, 64)
         | 15 | conv3dTranspose    | (3,3,1)      | (batch size, 58, 58, 39, 32)
         | 16 | concat             |              | (batch size, 58, 58, 39, 128)
         | 17 | batchnormalization |              | (batch size, 58, 58, 39, 32)
         | 18 | upsampling3d       | (2,2,2)      | (batch size, 116, 116, 78, 64)
         | 19 | conv3dTranspose    | (3,3,1)      | (batch size, 118, 118, 78, 32)
         | 20 | conv3d             | (3,3,3)      | (batch size, 116, 116, 76, 32)
         | 21 | batchnormalization |              | (batch size, 116, 116, 76, 32)
         | 22 | conv3dTranspose    | (3,3,2)      | (batch size, 118, 118, 77, 16)
         | 23 | concat             |              | (batch size, 118, 118, 77, 32)
         | 24 | batchnormalization |              | (batch size, 118, 118, 77, 16)
         | 25 | upsampling3d       | (2,2,2)      | (batch size, 236, 236, 154, 16)
         | 26 | conv3dTranspose    | (3,3,1)      | (batch size, 238, 238, 154, 16)
         | 27 | concat             |              | (batch size, 238, 238, 154, 32)
         | 28 | batchnormalization |              | (batch size, 238, 238, 154, 16)
         | 29 | conv3dTranspose    | (3,3,1)      | (batch size, 240, 240, 154, 8)
         | 30 | conv3dTranspose    | (1,1,5)      | (batch size, 240, 240, 158, 4)
         | 31 | conv3d             | (1,1,4)      | (batch size, 240, 240, 155, 1)

Table 1: Per-layer information for the 3D-convolution U-Net built here

Code

  1. Load the framework

    from tensorflow.keras.layers import BatchNormalization,Conv3D,MaxPooling3D,Conv3DTranspose,UpSampling3D
    import tensorflow as tf
    
  2. Encoder
    class unet_encoder(tf.keras.Model):
        def __init__(self):
            super(unet_encoder, self).__init__()
            self.b1 = BatchNormalization()
            self.conv1 = Conv3D(8, 3, activation='relu', padding='same')
            self.b2 = BatchNormalization()
            self.conv2 = Conv3D(16, 3, activation='relu', padding='same')
            self.b3 = BatchNormalization()
            self.conv3 = Conv3D(16, (3, 3, 2), activation='relu')
            self.b4 = BatchNormalization()
            self.conv4 = Conv3D(32, (3, 3, 1), activation='relu', strides=2)
            self.b5 = BatchNormalization()
            self.conv5 = Conv3D(64, (3, 3, 1), activation='relu', strides=2)
            self.b6 = BatchNormalization()
            self.maxpool1 = MaxPooling3D((2, 2, 1))

        def call(self, x, features):
            x = self.b1(x)
            x = self.conv1(x)
            x = self.b2(x)
            x = self.conv2(x)
            x = self.b3(x)
            # First skip-connection feature map
            x = self.conv3(x)
            x = self.b4(x)
            features.append(x)
            # Second skip-connection feature map
            x = self.conv4(x)
            x = self.b5(x)
            features.append(x)
            # Third skip-connection feature map
            x = self.conv5(x)
            x = self.b6(x)
            features.append(x)
            # Output
            outputs = self.maxpool1(x)
            return outputs
    
  3. Decoder
    class unet_decoder(tf.keras.Model):
        def __init__(self):
            super(unet_decoder, self).__init__()
            self.b1 = BatchNormalization()
            self.up1 = UpSampling3D((2, 2, 1))
            self.conv1tp = Conv3DTranspose(64, (3, 3, 1), activation='relu', padding='same')
            self.b2 = BatchNormalization()
            self.up2 = UpSampling3D((2, 2, 2))
            self.conv2tp = Conv3DTranspose(32, (3, 3, 1), activation='relu')
            self.conv2 = Conv3D(32, 3, activation='relu')
            self.b3 = BatchNormalization()
            self.conv3tp = Conv3DTranspose(16, (3, 3, 2), activation='relu')
            self.b4 = BatchNormalization()
            self.up4 = UpSampling3D((2, 2, 2))
            self.conv4tp = Conv3DTranspose(16, (3, 3, 1), activation='relu')
            self.b5 = BatchNormalization()
            self.conv5tp = Conv3DTranspose(8, (3, 3, 1), activation='relu')
            self.conv6tp = Conv3DTranspose(4, (1, 1, 5), activation='relu')
            self.conv_out = Conv3D(1, (1, 1, 4), activation='relu')

        def call(self, x, features):
            x = self.b1(x)
            x = self.up1(x)
            x = self.conv1tp(x)
            x = tf.concat((features[-1], x), axis=-1)
            x = self.b2(x)
            x = self.up2(x)
            x = self.conv2tp(x)
            x = self.conv2(x)
            x = self.b3(x)
            x = self.conv3tp(x)
            x = tf.concat((features[-2], x), axis=-1)
            x = self.b4(x)
            x = self.up4(x)
            x = self.conv4tp(x)
            x = tf.concat((features[-3], x), axis=-1)
            x = self.b5(x)
            x = self.conv5tp(x)
            x = self.conv6tp(x)
            x = self.conv_out(x)
            outputs = x
            return outputs
    
  4. Wrapper class that combines the two parts (a usage sketch follows this list)
    class Unet3D(tf.keras.Model):
        def __init__(self, encoder, decoder):
            super(Unet3D, self).__init__()
            # Skip-connection feature maps collected by the encoder. Note that this
            # list grows by three entries on every forward pass; the decoder only
            # reads the last three.
            self.features = []
            self.encoder = encoder
            self.decoder = decoder

        def call(self, x):
            x = self.encoder(x, self.features)
            outputs = self.decoder(x, self.features)
            return outputs
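
As a usage sketch (not part of the original code), the model can be built and given one dummy volume to sanity-check the output shape; this assumes enough memory for a single full-resolution volume:

    import tensorflow as tf

    # Build the full model from the two sub-models.
    unet = Unet3D(unet_encoder(), unet_decoder())
    unet.build(input_shape=(None, 240, 240, 155, 4))

    # One random volume in, one segmentation volume out.
    x = tf.random.normal((1, 240, 240, 155, 4))
    y = unet(x)
    print(y.shape)  # expected per Table 1: (1, 240, 240, 155, 1)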
    

Model training

Training environment

Software   | Version
Python     | 3.8.11
TensorFlow | 2.6.0-gpu
CUDA       | 11.2
cuDNN      | 8.1.0
nibabel    | 3.2.2

Table 2: Main software used

Processor | Model                   | VRAM
GPU       | NVIDIA GeForce RTX 3090 | 24 GB

Table 3: GPU information

Data loading and preprocessing

  1. Data source
    The MSD brain tumor dataset (Baidu PaddlePaddle AI Studio).

  2. Volume rescaling
    Nearest-neighbour interpolation is used to rescale selected dimensions of a volume, so that the input tensor shape cannot differ from the input shape the model was designed for.

    import nibabel as nib
    import numpy as np
    def nearest_4d(img, size):
        res = np.zeros(size)
        for i in range(res.shape[0]):
            for j in range(res.shape[1]):
                for k in range(res.shape[2]):
                    idx = i * img.shape[0] // res.shape[0]
                    idy = j * img.shape[1] // res.shape[1]
                    idz = k * img.shape[2] // res.shape[2]
                    res[i, j, k, :] = img[idx, idy, idz, :]
        return res
    
  3. Data generator
    A generator and an iterator are used to read a fixed number of volumes at a time from disk into memory; a usage sketch follows the class definitions below.

    # Read samples lazily, as an iterator, from their file paths
    class DataIterator:
        def __init__(self, image_paths, label_paths, size=None, transp_shape=[0, 1, 2, 3], mode='nib'):
            self.image_paths = image_paths
            self.label_paths = label_paths
            self.size = size
            self.transp = transp_shape
            self.mode = mode

        def read_and_resize(self, img_path, lbl_path):
            if self.mode == 'nib':
                img = nib.load(img_path)
                lbl = nib.load(lbl_path)
                img = img.get_fdata(caching='fill', dtype='float32')
                lbl = lbl.get_fdata(caching='fill', dtype='float32')
            elif self.mode == 'np':
                img = np.load(img_path)
                lbl = np.load(lbl_path)
            else:
                return None, None
            # Normalize to [0, 1]
            img /= np.max(img)
            lbl /= np.max(lbl)
            img = img.transpose(self.transp)
            if len(lbl.shape) < len(img.shape):
                lbl = np.expand_dims(lbl, axis=-1)
            lbl = lbl.transpose(self.transp)
            if self.size is not None:
                if len(self.size) == 3:
                    # nearest_3d is the 3-D counterpart of nearest_4d (defined elsewhere)
                    img = nearest_3d(img, self.size)
                    lbl = nearest_3d(lbl, self.size)
                else:
                    img = nearest_4d(img, self.size)
                    lbl = nearest_4d(lbl, self.size)
            return img, lbl

        def __iter__(self):
            for img_path, lbl_path in zip(self.image_paths, self.label_paths):
                img, lbl = self.read_and_resize(img_path, lbl_path)
                if isinstance(img, np.ndarray) and isinstance(lbl, np.ndarray):
                    yield (img, lbl)
                else:
                    return
    # Data generator. The training labels have one dimension fewer than the images,
    # so dimensions are expanded before the batch is returned.
    class DataGenerator:
        def __init__(self, image_paths, label_paths, size=None, batch_size=32, transp_shape=[0, 1, 2, 3], mode='nib'):
            dataiter = DataIterator(image_paths, label_paths, size, transp_shape, mode)
            self.batch_size = batch_size
            self.dataiter = iter(dataiter)

        def __iter__(self):
            while 1:
                i = 0
                imgs = []
                lbls = []
                for img, lbl in self.dataiter:
                    imgs.append(img)
                    lbls.append(lbl)
                    i += 1
                    if i >= self.batch_size:
                        break
                if i == 0:
                    break
                imgs = np.stack(imgs)
                lbls = np.stack(lbls)
                if len(imgs.shape) < 5:
                    imgs = np.expand_dims(imgs, axis=-1)
                    lbls = np.expand_dims(lbls, axis=-1)
                yield (imgs, lbls)
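
A usage sketch for the generator (the directory layout mirrors the training script below; passing a size such as (240, 240, 155, 4) would force resizing through nearest_4d):

    import os

    image_dir = './data/train/'
    label_dir = './data/labels/'
    # Assumes image and label files are named so that sorting pairs them up.
    image_paths = [image_dir + p for p in sorted(os.listdir(image_dir))]
    label_paths = [label_dir + p for p in sorted(os.listdir(label_dir))]

    # batch_size=1 matches the training configuration; size=None keeps the native shape.
    gen = DataGenerator(image_paths, label_paths, size=None, batch_size=1,
                        transp_shape=[0, 1, 2, 3], mode='nib')
    for imgs, lbls in gen:
        print(imgs.shape, lbls.shape)  # e.g. (1, 240, 240, 155, 4) (1, 240, 240, 155, 1)
        break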
    

Training

  1. Load dependencies

    import tensorflow as tf
    from tensorflow.keras import losses,optimizers
    from model import unet_encoder, unet_decoder, Unet3D
    from DataGenerator import DataGenerator
    from datetime import datetime
    from time import time
    import os
    
  2. Prepare the data paths
    # Data paths
    image_dir_path = './data/train/'
    label_dir_path = './data/labels/'
    images_paths = os.listdir(image_dir_path)
    labels_paths = os.listdir(label_dir_path)
    image_paths = [image_dir_path+p for p in images_paths]
    label_paths = [label_dir_path+p for p in labels_paths]
    
  3. Log files
    # Log files (per-epoch and per-step records)
    log1 = open('./log/epoch_file_form','w',encoding='utf-8')
    log2 = open('./log/step_file_form','w',encoding='utf-8')
    date_mark = str(datetime.now())
    log1.write(date_mark+'\n')
    log2.write(date_mark+'\n')
    
  4. Model definition
    # Model definition
    encoder_model = unet_encoder()
    decoder_model = unet_decoder()
    unet = Unet3D(encoder_model,decoder_model)
    unet.build(input_shape=(None,240,240,155,4))
    unet.summary()
    
  5. Optimizer and loss function
    The Adam optimizer is used with a learning rate of 1e-5, and the loss is binary cross-entropy.

    # Optimizer and loss function
    optimizer = optimizers.Adam(learning_rate=1e-5)
    losser = losses.BinaryCrossentropy()
    
  6. Training loop
    # Training
    epochs = 30
    s1 = time()
    for i in range(epochs):
        s2 = time()
        loss_sum = 0
        step = 0
        datagener = iter(DataGenerator(image_paths, label_paths, None, 1, [0, 1, 2, 3]))
        for batch in datagener:
            s3 = time()
            step += 1
            x = batch[0]
            y = batch[1]
            with tf.GradientTape() as tape:
                out = unet(x)
                loss = losser(y_pred=out, y_true=y)
            grads = tape.gradient(loss, unet.trainable_variables)
            optimizer.apply_gradients(zip(grads, unet.trainable_variables))
            e3 = time()
            loss_sum += loss
            info_step = f'step:{step:03}\tloss:{loss}\t running time: {e3-s3:.3f} s'
            log2.write(info_step + '\n')
            print(' ' * 80, end='\r')   # clear the console line
            print(info_step, end='\r')
        e2 = time()
        avg_loss = loss_sum / step if step != 0 else 'no samples'
        info_epoch = f'epoch {i+1:02}\t average loss {avg_loss}\t running time {e2-s2:.3f} s'
        log1.write(info_epoch + '\n')
        print(' ' * 80, end='\r')       # clear the console line
        print(info_epoch)
    e1 = time()
    all_time = f'Training time {e1-s1:.3f} s'
    log1.write(all_time + '\n')
    log2.write(all_time + '\n')
    print(all_time)
    log1.close()
    log2.close()
    # Save the model weights (a reload sketch follows below)
    encoder_model.save_weights('./models/encoder_params_formal')
    decoder_model.save_weights('./models/decoder_params_formal')
    unet.save_weights('./models/unet_params_formal')
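
For completeness, a sketch (not in the original script) of how the saved weights could be reloaded later for inference, assuming the same model classes and the checkpoint paths used above:

    # Rebuild the architecture and restore the trained weights.
    encoder_model = unet_encoder()
    decoder_model = unet_decoder()
    unet = Unet3D(encoder_model, decoder_model)
    unet.build(input_shape=(None, 240, 240, 155, 4))
    unet.load_weights('./models/unet_params_formal')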

Training results

  1. Average training loss and training time for each of the 30 training epochs, visualized below.

Figure 2: Visualization of the training metrics

  2. Comparison of the outputs of models trained for different numbers of epochs.

Figure 3: Comparison of model outputs across training epochs

Model-parallel version

Splitting the model

Two GPUs are used: the encoder is placed on GPU 0 and the decoder on GPU 1. A minimal sketch of the placement pattern follows; the full implementation is in the next subsection.
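
The intended placement, assuming two visible GPUs (the actual implementation wraps this pattern inside the Unet3DParallel class below):

    import tensorflow as tf

    # Construct each half under the device it is meant to run on.
    with tf.device('/gpu:0'):
        encoder = unet_encoder()   # encoder on GPU 0
    with tf.device('/gpu:1'):
        decoder = unet_decoder()   # decoder on GPU 1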

Code

  1. Model implementation (a usage sketch follows this list)

    from tensorflow.keras.layers import BatchNormalization,Conv3D,MaxPooling3D,Conv3DTranspose,UpSampling3D
    import tensorflow as tf

    # Copy a tensor onto the given GPU by creating a zero tensor there and adding.
    def copy_tensor_to_gpu(tensor, gpu_id):
        with tf.device(f'/gpu:{gpu_id}'):
            res = tf.zeros_like(tensor)
            res = res + tensor
        return res

    def copy_tensor_to_cpu(tensor, cpu_id):
        with tf.device(f'/cpu:{cpu_id}'):
            res = tf.zeros_like(tensor)
            res = res + tensor
        return res

    class unet_encoder(tf.keras.Model):
        def __init__(self):
            super(unet_encoder, self).__init__()
            self.b1 = BatchNormalization()
            self.conv1 = Conv3D(8, 3, activation='relu', padding='same')
            self.b2 = BatchNormalization()
            self.conv2 = Conv3D(16, 3, activation='relu', padding='same')
            self.b3 = BatchNormalization()
            self.conv3 = Conv3D(16, (3, 3, 2), activation='relu')
            self.b4 = BatchNormalization()
            self.conv4 = Conv3D(32, (3, 3, 1), activation='relu', strides=2)
            self.b5 = BatchNormalization()
            self.conv5 = Conv3D(64, (3, 3, 1), activation='relu', strides=2)
            self.b6 = BatchNormalization()
            self.maxpool1 = MaxPooling3D((2, 2, 1))

        def call(self, x, features, gpu_id):
            x = self.b1(x)
            x = self.conv1(x)
            x = self.b2(x)
            x = self.conv2(x)
            x = self.b3(x)
            # First skip-connection feature map (copied to the decoder's GPU)
            x = self.conv3(x)
            x = self.b4(x)
            features[0] = copy_tensor_to_gpu(x, gpu_id)
            # Second skip-connection feature map
            x = self.conv4(x)
            x = self.b5(x)
            features[1] = copy_tensor_to_gpu(x, gpu_id)
            # Third skip-connection feature map
            x = self.conv5(x)
            x = self.b6(x)
            features[2] = copy_tensor_to_gpu(x, gpu_id)
            # Output
            outputs = self.maxpool1(x)
            return outputs

    class unet_decoder(tf.keras.Model):
        def __init__(self):
            super(unet_decoder, self).__init__()
            self.b1 = BatchNormalization()
            self.up1 = UpSampling3D((2, 2, 1))
            self.conv1tp = Conv3DTranspose(64, (3, 3, 1), activation='relu', padding='same')
            self.b2 = BatchNormalization()
            self.up2 = UpSampling3D((2, 2, 2))
            self.conv2tp = Conv3DTranspose(32, (3, 3, 1), activation='relu')
            self.conv2 = Conv3D(32, 3, activation='relu')
            self.b3 = BatchNormalization()
            self.conv3tp = Conv3DTranspose(16, (3, 3, 2), activation='relu')
            self.b4 = BatchNormalization()
            self.up4 = UpSampling3D((2, 2, 2))
            self.conv4tp = Conv3DTranspose(16, (3, 3, 1), activation='relu')
            self.b5 = BatchNormalization()
            self.conv5tp = Conv3DTranspose(8, (3, 3, 1), activation='relu')
            self.conv6tp = Conv3DTranspose(4, (1, 1, 5), activation='relu')
            self.conv_out = Conv3D(1, (1, 1, 4), activation='relu')

        def call(self, x, features):
            x = self.b1(x)
            x = self.up1(x)
            x = self.conv1tp(x)
            x = tf.concat((features[-1], x), axis=-1)
            x = self.b2(x)
            x = self.up2(x)
            x = self.conv2tp(x)
            x = self.conv2(x)
            x = self.b3(x)
            x = self.conv3tp(x)
            x = tf.concat((features[-2], x), axis=-1)
            x = self.b4(x)
            x = self.up4(x)
            x = self.conv4tp(x)
            x = tf.concat((features[-3], x), axis=-1)
            x = self.b5(x)
            x = self.conv5tp(x)
            x = self.conv6tp(x)
            x = self.conv_out(x)
            outputs = x
            return outputs

    class Unet3DParallel(tf.keras.Model):
        def __init__(self, gpu_group):
            super(Unet3DParallel, self).__init__()
            self.gpus = gpu_group
            # The skip-connection feature maps live on the decoder's GPU
            with tf.device(f'/gpu:{gpu_group[1]}'):
                self.features = [None for i in range(3)]
            with tf.device(f'/gpu:{gpu_group[0]}'):
                self.encoder = unet_encoder()
            with tf.device(f'/gpu:{gpu_group[1]}'):
                self.decoder = unet_decoder()

        def call(self, x):
            x = self.encoder(x, self.features, self.gpus[1])
            outputs = self.decoder(x, self.features)
            return outputs
    
  2. The training procedure is the same as above, except that GPU memory growth is enabled:
    import tensorflow as tf
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
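
A usage sketch for the parallel model, with GPU indices 0 and 1 (matching the summary shown in the next subsection); the training loop itself is unchanged:

    # Build the model-parallel U-Net: encoder on GPU 0, decoder on GPU 1.
    unet = Unet3DParallel([0, 1])
    unet.build(input_shape=(None, 240, 240, 155, 4))
    unet.summary()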
    

Problems encountered

  1. The two halves of the model have different numbers of trainable parameters (the encoder has fewer than the decoder), yet GPU 0 uses more memory, runs at higher utilization, and draws more power than GPU 1.

    # Trainable parameter counts
    Model: "unet3d_parallel"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    unet_encoder (unet_encoder)  multiple                  32664
    _________________________________________________________________
    unet_decoder (unet_decoder)  multiple                  121373
    =================================================================
    Total params: 154,037
    Trainable params: 153,149
    Non-trainable params: 888
    _________________________________________________________________
    # GPU utilization reported by nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:3E:00.0 Off |                  N/A |
    | 59%   61C    P2   211W / 350W |  23746MiB / 24268MiB |     67%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  On   | 00000000:88:00.0 Off |                  N/A |
    | 46%   56C    P2   120W / 350W |   4504MiB / 24268MiB |     22%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
  2. In principle, once the model is split, the two GPUs should form a pipeline, so from the second training step onward each step should take less time than a single-GPU step. The experiment shows the opposite: a single GPU takes roughly 0.7 s per step, while the split model takes roughly 1.2 s per step (a different machine was used, so these times are not directly comparable with the training charts above). A diagnostic sketch follows.
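
One diagnostic that may help narrow this down (a sketch, not something from the original experiment): TensorFlow can log the device each op actually executes on, which makes it possible to check whether the forward and backward passes really run on the intended GPUs, given that only layer construction is wrapped in tf.device inside Unet3DParallel.

    import tensorflow as tf

    # Must be called before any tensors or ops are created.
    tf.debugging.set_log_device_placement(True)

    # ...then build Unet3DParallel and run a single training step as above; every op
    # is logged together with its device, e.g. /job:localhost/replica:0/task:0/device:GPU:1.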
