PyTorch Data Parallel

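• Optional: Data Parallelism
• This tutorial shows how to use multiple GPUs with DataParallel. Using GPUs with PyTorch is very easy. You can put a model on a GPU:
• device = torch.device("cuda:0")
• model.to(device)
• Then you can copy all your tensors to the GPU:
•
• mytensor = my_tensor.to(device)
•
• Note that calling my_tensor.to(device) returns a new copy of my_tensor on the GPU rather than rewriting my_tensor. You need to assign the result to a new tensor and use that tensor on the GPU.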
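• The point about .to() not rewriting the original tensor can be checked with a small sketch. It assumes a fallback to the CPU when no GPU is present, so the variable name moved and the fallback are illustrative additions. On a CPU-only machine .to(device) simply returns the same tensor, since it is already on the requested device.

```python
import torch

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

my_tensor = torch.randn(3, 5)   # created on the CPU
moved = my_tensor.to(device)    # the result has to be assigned to a name

# my_tensor itself is never relocated by .to(); only `moved` lives on `device`.
print(my_tensor.device.type)    # cpu
print(moved.device.type == device.type)
```

• Forgetting the assignment (writing only my_tensor.to(device)) silently leaves the data where it was.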
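• Running forward and backward propagation on multiple GPUs is natural. However, PyTorch uses only one GPU by default. You can easily run your operations on multiple GPUs by making your model run in parallel with DataParallel:
• model = nn.DataParallel(model)
•
• This is the core of the tutorial, and the following sections explain it in detail.
• Imports and parameters
• Import the PyTorch modules and define the parameters.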
• import torch
• import torch.nn as nn
• from torch.utils.data import Dataset, DataLoader
•
• # Parameters
• input_size = 5
• output_size = 2
•
• batch_size = 30
• data_size = 100
• # Device
• device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
•
• Dummy dataset
• Make a dummy (random) dataset. You only need to implement __getitem__ (plus __len__, so that DataLoader knows the dataset size).
•
• class RandomDataset(Dataset):
•
•     def __init__(self, size, length):
•         self.len = length
•         self.data = torch.randn(length, size)
•
•     def __getitem__(self, index):
•         return self.data[index]
•
•     def __len__(self):
•         return self.len
•
• rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
•                          batch_size=batch_size, shuffle=True)
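• To see what the loader hands to the model, the following sketch repeats the toy dataset so that it runs on its own and lists the batch shapes. The name batch_shapes is an illustrative addition. With 100 samples and a batch size of 30, the last batch holds only the 10 remaining samples.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDataset(Dataset):
    """Same toy dataset as in the text, repeated so this snippet is standalone."""

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


loader = DataLoader(dataset=RandomDataset(5, 100), batch_size=30, shuffle=True)

# Collect the shape of every batch produced in one pass over the data.
batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)  # [(30, 5), (30, 5), (30, 5), (10, 5)]
```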
• Simple model
• For the demo, the model just takes an input, performs a linear operation, and returns an output. However, you can use DataParallel with any model (CNN, RNN, Capsule Net, etc.).
• A print statement is placed inside the model to monitor the sizes of the input and output tensors. Pay attention to what is printed at batch rank 0.
•
• class Model(nn.Module):
•     # Our model
•
•     def __init__(self, input_size, output_size):
•         super(Model, self).__init__()
•         self.fc = nn.Linear(input_size, output_size)
•
•     def forward(self, input):
•         output = self.fc(input)
•         print("\tIn Model: input size", input.size(),
•               "output size", output.size())
•
•         return output
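• As a baseline before any parallelism, here is a minimal CPU-only sketch of the same model fed a full batch. It is not wrapped in DataParallel, so the model sees all 30 rows at once; the variable name batch is an illustrative addition.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    """Same single-layer model as in the text, repeated to be standalone."""

    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output


model = Model(5, 2)
batch = torch.randn(30, 5)   # one full batch: 30 samples, 5 features each
output = model(batch)        # a single "In Model" line is printed
print(tuple(output.shape))   # (30, 2)
```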
• Create the model and wrap it in DataParallel
• This is the core part of the tutorial. First, make a model instance and check whether there are multiple GPUs. If there are, wrap the model with nn.DataParallel. Then put the model on the GPUs with model.to(device).
• model = Model(input_size, output_size)
• if torch.cuda.device_count() > 1:
•     print("Let's use", torch.cuda.device_count(), "GPUs!")
•     # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
•     model = nn.DataParallel(model)
•
• model.to(device)
• Output:
•
• Let's use 2 GPUs!
•
• Run the model: now you can see the sizes of the input and output tensors.
• for data in rand_loader:
•     input = data.to(device)
•     output = model(input)
•     print("Outside: input size", input.size(),
•           "output_size", output.size())
• Output:
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
• In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
• Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
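• Results
• If you have no GPU or only one GPU, a batch of 30 inputs reaches the model as 30 inputs and yields 30 outputs, as expected. With multiple GPUs, you get results like the following.
• Multiple GPUs
• With 2 GPUs, you will see:
• # on 2 GPUs
• Let's use 2 GPUs!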
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
• In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
• Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
•
• With 3 GPUs, you will see:
• Let's use 3 GPUs!
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
• With 8 GPUs, you will see:
• Let's use 8 GPUs!
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
• Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
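• The uneven sizes in the logs above (for example seven chunks of 4 and one chunk of 2 on 8 GPUs) are consistent with a ceiling-division split. The sketch below reproduces that arithmetic in plain Python. It assumes the batch is divided the way torch.chunk divides a tensor; chunk_sizes is a hypothetical helper written for illustration, not a PyTorch function.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Sizes of the pieces when a batch is split torch.chunk-style.

    Every piece gets ceil(batch_size / num_gpus) rows until the rows run out,
    so the last piece may be smaller and some GPUs may receive nothing.
    """
    per_chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(per_chunk, remaining))
        remaining -= sizes[-1]
    return sizes


print(chunk_sizes(30, 2))  # [15, 15]
print(chunk_sizes(30, 3))  # [10, 10, 10]
print(chunk_sizes(30, 8))  # [4, 4, 4, 4, 4, 4, 4, 2]
print(chunk_sizes(10, 3))  # [4, 4, 2]
print(chunk_sizes(10, 8))  # [2, 2, 2, 2, 2]
```

• The last case matches the final batch in the 8-GPU log, where only five "In Model" lines appear. The order of the "In Model" lines within a batch can vary because the replicas run concurrently.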
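• Summary
• DataParallel automatically splits your data and sends the job orders to multiple model replicas on several GPUs. After each replica finishes its job, DataParallel collects and merges the results before returning them to you.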
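• The split, compute, and merge steps can be imitated on a single CPU to show that the merged result has the same shape and values as one pass over the whole batch. This is only a conceptual sketch: it reuses one nn.Linear layer in place of per-GPU replicas, and the names chunks, partial, and merged are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(5, 2)
batch = torch.randn(30, 5)

# "Scatter": split the batch along dim 0, as if for 3 devices.
chunks = batch.chunk(3, dim=0)

# Each "replica" runs the same layer on its own chunk.
partial = [fc(chunk) for chunk in chunks]

# "Gather": concatenate the partial outputs back along dim 0.
merged = torch.cat(partial, dim=0)

print(tuple(merged.shape))  # (30, 2)
# Compare against a single forward pass over the full batch.
print(torch.allclose(merged, fc(batch), atol=1e-5))
```

• In the real DataParallel the replicas are copies of the module placed on different GPUs, and the merged output is returned on a single device.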
