torch.cuda

Reference: torch.cuda - 云+社区 - 腾讯云

torch.cuda.current_blas_handle()

torch.cuda.current_device()

torch.cuda.current_stream(device=None)

torch.cuda.default_stream(device=None)

class torch.cuda.device(device)

torch.cuda.device_count()

class torch.cuda.device_of(obj)

torch.cuda.empty_cache()

torch.cuda.get_device_capability(device=None)

torch.cuda.get_device_name(device=None)

torch.cuda.init()

torch.cuda.ipc_collect()

torch.cuda.is_available()

torch.cuda.max_memory_allocated(device=None)

torch.cuda.max_memory_cached(device=None)

torch.cuda.memory_allocated(device=None)

torch.cuda.memory_cached(device=None)

torch.cuda.reset_max_memory_allocated(device=None)

torch.cuda.reset_max_memory_cached(device=None)

torch.cuda.set_device(device)

torch.cuda.stream(stream)

torch.cuda.synchronize(device=None)

Random Number Generator

torch.cuda.get_rng_state(device='cuda')

torch.cuda.get_rng_state_all()

torch.cuda.set_rng_state(new_state, device='cuda')

torch.cuda.set_rng_state_all(new_states)

torch.cuda.manual_seed(seed)

torch.cuda.seed()

torch.cuda.seed_all()

torch.cuda.initial_seed()

Communication collectives

torch.cuda.comm.broadcast(tensor, devices)

torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)

torch.cuda.comm.reduce_add(inputs, destination=None)

torch.cuda.comm.scatter(tensor, devices, chunk_sizes=None, dim=0, streams=None)

torch.cuda.comm.gather(tensors, dim=0, destination=None)

Streams and events

class torch.cuda.Stream

query()

record_event(event=None)

synchronize()

wait_event(event)

wait_stream(stream)

class torch.cuda.Event

elapsed_time(end_event)

ipc_handle()

query()

record(stream=None)

synchronize()

wait(stream=None)

Memory management

torch.cuda.empty_cache()

torch.cuda.memory_allocated(device=None)

torch.cuda.max_memory_allocated(device=None)

torch.cuda.reset_max_memory_allocated(device=None)

torch.cuda.memory_cached(device=None)

torch.cuda.max_memory_cached(device=None)

torch.cuda.reset_max_memory_cached(device=None)

NVIDIA Tools Extension (NVTX)

torch.cuda.nvtx.mark(msg)

torch.cuda.nvtx.range_push(msg)

torch.cuda.nvtx.range_pop()

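This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation. It is lazily initialized, so you can always import it and use is_available() to determine whether your system supports CUDA. CUDA semantics has more details about working with CUDA.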
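As a minimal sketch of the lazy-initialization point above, the snippet below checks is_available() once and falls back to the CPU when no GPU is usable. The tensor shape and variable names are illustrative assumptions, not part of the original text.

```python
import torch

# Importing torch.cuda does not require a GPU;
# is_available() is the runtime check for a usable CUDA device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The same tensor code runs on either device.
x = torch.ones(2, 3, device=device)
total = (x * 2).sum().item()
print(f"ran on {device.type}, total = {total}")
```

Selecting the device once and passing it around keeps the rest of the code free of CUDA-specific branches.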

torch.cuda.current_blas_handle()

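Returns a cublasHandle_t pointer to the current cuBLAS handle.

torch.cuda.current_device()

Returns the index of the currently selected device.

torch.cuda.current_stream(device=None)

Returns the currently selected stream for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the currently selected stream for the current device, given by current_device().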

`torch.cuda.default_stream`(device=None)

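Returns the default stream for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the default stream for the current device, given by current_device().

class `torch.cuda.device`(device)

Context manager that changes the selected device.

Parameters:

device (torch.device or int) – The device index to select. It is a no-op if this argument is a negative integer or None.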
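The context manager can be illustrated with a short sketch that records the current device before, inside, and after the `with` block. The helper name `device_trace` is hypothetical, and the function returns None on a machine without CUDA.

```python
import torch

def device_trace(index=0):
    """Return (before, inside, after) device indices, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    before = torch.cuda.current_device()
    with torch.cuda.device(index):
        # Inside the block, `index` is the selected device.
        inside = torch.cuda.current_device()
    # On exit, the previously selected device is restored.
    after = torch.cuda.current_device()
    return before, inside, after

trace = device_trace(0)
print(trace)
```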

`torch.cuda.device_count`()

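Returns the number of GPUs available.

class `torch.cuda.device_of`(obj)

Context manager that changes the current device to that of the given object. Both tensors and storages can be used as arguments. If the given object is not allocated on a GPU, this is a no-op.

Parameters:

obj (Tensor or Storage) – An object allocated on the selected device.

`torch.cuda.empty_cache`()

Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.

Note

empty_cache() does not increase the amount of GPU memory available to PyTorch. See Memory management for more details about GPU memory management.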

`torch.cuda.get_device_capability`(device=None)

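Gets the CUDA capability of a device.

Parameters:

device (torch.device or int, optional) – The device whose capability is returned. This function is a no-op if this argument is a negative integer. If device is None (default), the current device, given by current_device(), is used.

Returns:

The major and minor CUDA capability of the device.

Return type:

tuple(int, int)

`torch.cuda.get_device_name`(device=None)

Gets the name of a device.

Parameters:

device (torch.device or int, optional) – The device whose name is returned. This function is a no-op if this argument is a negative integer. If device is None (default), the current device, given by current_device(), is used.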
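The device query functions above combine naturally into a small inventory routine. The following is a sketch only; `describe_gpus` is a hypothetical helper, and it returns an empty list when CUDA is unavailable.

```python
import torch

def describe_gpus():
    """List (index, name, 'major.minor' capability) for every visible GPU."""
    rows = []
    if torch.cuda.is_available():
        for idx in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(idx)
            rows.append((idx, torch.cuda.get_device_name(idx), f"{major}.{minor}"))
    return rows

gpus = describe_gpus()
for idx, name, capability in gpus:
    print(f"cuda:{idx}  {name}  compute capability {capability}")
```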

`torch.cuda.init`()

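Initializes PyTorch's CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, because the Python bindings for CUDA functionality are not available until this initialization takes place. Ordinary users should not need this, since all of PyTorch's CUDA methods automatically initialize CUDA state on demand. Does nothing if the CUDA state is already initialized.

`torch.cuda.ipc_collect`()

Force collects GPU memory after it has been released by CUDA IPC.

Note:

Checks whether any sent CUDA tensors can be cleaned from memory. Forces the shared memory file used for reference counting to close if there are no active counters. Useful when the producer process has stopped actively sending tensors and wants to release unused memory.

`torch.cuda.is_available`()

Returns a bool indicating whether CUDA is currently available.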

`torch.cuda.max_memory_allocated`(device=None)

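Returns the maximum GPU memory occupied by tensors, in bytes, for a given device. By default, this returns the peak allocated memory since the beginning of the program. reset_max_memory_allocated() can be used to reset the starting point for tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.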
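The per-iteration measurement mentioned above could look like the sketch below, which resets the peak counter with reset_max_memory_allocated() at the start of each step and reads max_memory_allocated() at the end. The matrix multiplication stands in for a real training step, and `peak_bytes_per_step` is a hypothetical name. Recent PyTorch releases may emit a deprecation warning for the reset call.

```python
import torch

def peak_bytes_per_step(steps=3, n=1024):
    """Return the peak allocated bytes of each step (empty list without CUDA)."""
    peaks = []
    if not torch.cuda.is_available():
        return peaks
    for _ in range(steps):
        # Start a fresh measurement window for this iteration.
        torch.cuda.reset_max_memory_allocated()
        a = torch.randn(n, n, device="cuda")
        b = a @ a                      # placeholder for a forward/backward pass
        torch.cuda.synchronize()       # make sure the work has actually run
        peaks.append(torch.cuda.max_memory_allocated())
        del a, b
    return peaks

peaks = peak_bytes_per_step()
print(peaks)
```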

`torch.cuda.max_memory_cached`(device=None)

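Returns the maximum GPU memory managed by the caching allocator, in bytes, for a given device. By default, this returns the peak cached memory since the beginning of the program. reset_max_memory_cached() can be used to reset the starting point for tracking this metric. For example, these two functions can measure the peak cached memory of each iteration in a training loop.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.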

`torch.cuda.memory_allocated`(device=None)

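Returns the current GPU memory occupied by tensors, in bytes, for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

This is likely less than the amount shown in nvidia-smi, since some unused memory can be held by the caching allocator and some context needs to be created on the GPU. See Memory management for more details about GPU memory management.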
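To make the distinction between tensor memory and cached memory concrete, the sketch below samples memory_allocated() before an allocation, while the tensor is alive, and after it has been freed and empty_cache() has been called. The helper name and tensor size are assumptions for illustration.

```python
import torch

def allocation_trace(n=1024):
    """Return (before, during, after) allocated bytes, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    before = torch.cuda.memory_allocated()
    x = torch.empty(n, n, device="cuda")   # float32: at least 4 * n * n bytes
    during = torch.cuda.memory_allocated()
    del x                                  # the block returns to the caching allocator
    torch.cuda.empty_cache()               # hand unused cached blocks back to the driver
    after = torch.cuda.memory_allocated()
    return before, during, after

trace = allocation_trace()
print(trace)
```

Only the step that deletes the tensor lowers memory_allocated(); empty_cache() affects what other processes and nvidia-smi see, not the amount occupied by live tensors.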

`torch.cuda.memory_cached`(device=None)

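Returns the current GPU memory managed by the caching allocator, in bytes, for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note:

See Memory management for more details about GPU memory management.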

`torch.cuda.reset_max_memory_allocated`(device=None)

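Resets the starting point for tracking the maximum GPU memory occupied by tensors for a given device. See max_memory_allocated() for details.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.

`torch.cuda.reset_max_memory_cached`(device=None)

Resets the starting point for tracking the maximum GPU memory managed by the caching allocator for a given device. See max_memory_cached() for details.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.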

`torch.cuda.set_device`(device)

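Sets the current device. Usage of this function is discouraged in favor of the device context manager. In most cases it is better to use the CUDA_VISIBLE_DEVICES environment variable.

Parameters:

device (torch.device or int) – The selected device. This function is a no-op if this argument is negative.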

`torch.cuda.stream`(stream)

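Context manager that selects a given stream. All CUDA kernels queued within its context will be enqueued on the selected stream.

Parameters:

stream (Stream) – The selected stream. This manager is a no-op if it is None.

Note

Streams are per-device. If the selected stream is not on the current device, this function also changes the current device to match the stream.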
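A hedged sketch of the stream context manager follows: a reduction is enqueued on a side stream, with wait_stream() calls on both sides so that the input is ready before the side stream reads it and the result is ready before the current stream uses it. `sum_on_side_stream` is a hypothetical helper that returns None when CUDA is unavailable.

```python
import torch

def sum_on_side_stream(n=1000):
    """Sum a vector of ones on a non-default stream; None without CUDA."""
    if not torch.cuda.is_available():
        return None
    side = torch.cuda.Stream()           # a new stream on the current device
    x = torch.ones(n, device="cuda")     # produced on the current stream
    # The side stream must not start before x has been filled.
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        total = x.sum()                  # this kernel is enqueued on `side`
    # The current stream waits for the side stream before reading the result.
    torch.cuda.current_stream().wait_stream(side)
    return total.item()

result = sum_on_side_stream()
print(result)
```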

`torch.cuda.synchronize`(device=None)

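Waits for all kernels in all streams on a CUDA device to complete.

Parameters:

device (torch.device or int, optional) – The device to synchronize. If device is None (default), the current device, given by current_device(), is used.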

Random Number Generator

`torch.cuda.get_rng_state`(device='cuda')

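Returns the random number generator state of the specified GPU as a ByteTensor.

Parameters:

device (torch.device or int, optional) – The device to return the RNG state of. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).

Warning

This function eagerly initializes CUDA.

`torch.cuda.get_rng_state_all`()

Returns a tuple of ByteTensors representing the random number states of all devices.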

`torch.cuda.set_rng_state`(new_state, device='cuda')

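Sets the random number generator state of the specified GPU.

Parameters:

new_state (torch.ByteTensor) – The desired state.
device (torch.device or int, optional) – The device to set the RNG state of. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).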
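One common use of get_rng_state() and set_rng_state() is to replay a random draw. The sketch below saves the state, samples, restores the state, and samples again; `replay_random` is a hypothetical helper and returns None without CUDA.

```python
import torch

def replay_random(n=4):
    """Return True if restoring the RNG state reproduces the same sample."""
    if not torch.cuda.is_available():
        return None
    state = torch.cuda.get_rng_state()      # ByteTensor snapshot for the current GPU
    first = torch.rand(n, device="cuda")
    torch.cuda.set_rng_state(state)         # rewind the generator
    second = torch.rand(n, device="cuda")
    return torch.equal(first, second)

same = replay_random()
print(same)
```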

`torch.cuda.set_rng_state_all`(new_states)

Sets the random number generator state of all devices.

Parameters:

new_states (tuple of torch.ByteTensor) – The desired state for each device.

`torch.cuda.manual_seed`(seed)

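Sets the seed for generating random numbers for the current GPU. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Parameters:

seed (int) – The desired seed.

Warning

If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use manual_seed_all().

`torch.cuda.manual_seed_all`(seed)

Sets the seed for generating random numbers on all GPUs. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Parameters:

seed (int) – The desired seed.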
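A small sketch of seeding for repeatable runs: because manual_seed_all() is silently ignored without CUDA, the same helper works on CPU-only machines. The function name `seeded_sample` and the choice to also seed the CPU generator with torch.manual_seed() are assumptions made for this illustration.

```python
import torch

def seeded_sample(seed, n=3):
    """Draw n uniform numbers after seeding; identical seeds give identical draws."""
    torch.manual_seed(seed)             # seeds the CPU generator
    torch.cuda.manual_seed_all(seed)    # seeds every GPU; ignored when CUDA is absent
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.rand(n, device=device).cpu()

a = seeded_sample(1234)
b = seeded_sample(1234)
print(torch.equal(a, b))
```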

`torch.cuda.seed`()

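Sets the seed for generating random numbers to a random number for the current GPU. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Warning

If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use seed_all().

`torch.cuda.seed_all`()

Sets the seed for generating random numbers to a random number on all GPUs. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

`torch.cuda.initial_seed`()

Returns the current random seed of the current GPU.

Warning

This function eagerly initializes CUDA.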

Communication collectives

`torch.cuda.comm.broadcast`(tensor, devices)

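Broadcasts a tensor to a number of GPUs.

Parameters:

tensor (Tensor) – The tensor to broadcast.
devices (Iterable) – An iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from.

Returns:

A tuple containing copies of the tensor, placed on devices corresponding to indices from devices.

`torch.cuda.comm.broadcast_coalesced`(tensors, devices, buffer_size=10485760)

Broadcasts a sequence of tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations.

Parameters:

tensors (sequence) – The tensors to broadcast.
devices (Iterable) – An iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from.
buffer_size (int) – The maximum size of the buffer used for coalescing.

Returns:

A tuple containing copies of the tensors, placed on devices corresponding to indices from devices.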

`torch.cuda.comm.reduce_add`(inputs, destination=None)

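Sums tensors from multiple GPUs. All inputs should have matching shapes.

Parameters:

inputs (Iterable[Tensor]) – An iterable of tensors to add.
destination (int, optional) – The device on which the output will be placed (default: current device).

Returns:

A tensor containing the element-wise sum of all inputs, placed on the destination device.

`torch.cuda.comm.scatter`(tensor, devices, chunk_sizes=None, dim=0, streams=None)

Scatters a tensor across multiple GPUs.

Parameters:

tensor (Tensor) – The tensor to scatter.
devices (Iterable[int]) – An iterable of ints specifying the devices among which the tensor should be scattered.
chunk_sizes (Iterable[int], optional) – The sizes of the chunks to be placed on each device. It should match devices in length and sum to tensor.size(dim). If not specified, the tensor will be divided into equal chunks.
dim (int, optional) – The dimension along which to chunk the tensor.

Returns:

A tuple containing chunks of the tensor, spread across the given devices.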

`torch.cuda.comm.gather`(tensors, dim=0, destination=None)

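Gathers tensors from multiple GPUs. Tensor sizes in all dimensions other than dim have to match.

Parameters:

tensors (Iterable[Tensor]) – An iterable of tensors to gather.
dim (int) – The dimension along which the tensors are concatenated.
destination (int, optional) – The output device (-1 means CPU, default: current device).

Returns:

A tensor located on the destination device that is the result of concatenating tensors along dim.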
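As a rough sketch of how scatter() and gather() relate, the round trip below splits a tensor across every visible GPU and concatenates the chunks again on the first one. The helper name is hypothetical, the explicit `import torch.cuda.comm` is a precaution in case the submodule is not loaded automatically, and the function returns None without CUDA.

```python
import torch
import torch.cuda.comm

def scatter_gather_roundtrip():
    """Scatter a vector over all GPUs, gather it back, and compare with the input."""
    if not torch.cuda.is_available():
        return None
    devices = list(range(torch.cuda.device_count()))
    # Four elements per device, so the default equal-sized chunks divide evenly.
    x = torch.arange(4 * len(devices), dtype=torch.float32)
    chunks = torch.cuda.comm.scatter(x, devices)          # one chunk per device
    y = torch.cuda.comm.gather(chunks, dim=0, destination=devices[0])
    return torch.equal(x, y.cpu())

ok = scatter_gather_roundtrip()
print(ok)
```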

Streams and events

class `torch.cuda.Stream`

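Wrapper around a CUDA stream. A CUDA stream is a linear sequence of execution that belongs to a specific device, independent of other streams. See CUDA semantics for details.

Parameters:

device (torch.device or int, optional) – The device on which to allocate the stream. If device is None (default) or a negative integer, the current device is used.
priority (int, optional) – The priority of the stream. Lower numbers represent higher priorities.

`query`()

Checks whether all the work submitted has been completed.

Returns:

A boolean indicating whether all kernels in this stream have completed.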

`record_event`(event=None)

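Records an event.

Parameters:

event (Event, optional) – The event to record. If not given, a new one will be allocated.

Returns:

The recorded event.

`synchronize`()

Waits for all the kernels in this stream to complete.

Note

This is a wrapper around cudaStreamSynchronize(); see the CUDA documentation for more information.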

`wait_event`(event)

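Makes all future work submitted to the stream wait for an event.

Parameters:

event (Event) – The event to wait for.

Note:

This is a wrapper around cudaStreamWaitEvent(); see the CUDA documentation for more information. This function returns without waiting for the event: only future operations are affected.

`wait_stream`(stream)

Synchronizes with another stream. All future work submitted to this stream will wait until all kernels submitted to the given stream at the time of the call have completed.

Parameters:

stream (Stream) – The stream to synchronize with.

Note:

This function returns without waiting for the kernels currently enqueued in stream: only future operations are affected.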

class `torch.cuda.Event`

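Wrapper around a CUDA event. CUDA events are synchronization markers that can be used to monitor the device's progress, to measure timing accurately, and to synchronize CUDA streams. The underlying CUDA event is lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event.

Parameters:

enable_timing (bool, optional) – Indicates whether the event should measure time (default: False).
blocking (bool, optional) – If True, wait() will be blocking (default: False).
interprocess (bool) – If True, the event can be shared between processes (default: False).

`elapsed_time`(end_event)

Returns the time elapsed, in milliseconds, after this event was recorded and before end_event was recorded.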
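Timing GPU work with events might look like the following sketch. Both events are created with enable_timing=True, and the end event is synchronized before elapsed_time() is read. The matrix size and the helper name `time_matmul_ms` are illustrative assumptions; the function returns None without CUDA.

```python
import torch

def time_matmul_ms(n=512):
    """Return the GPU time of one n x n matmul in milliseconds, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    a = torch.randn(n, n, device="cuda")
    start.record()        # recorded on the current stream
    b = a @ a
    end.record()
    end.synchronize()     # block the CPU until the end event has completed
    return start.elapsed_time(end)

elapsed = time_matmul_ms()
print(elapsed)
```

Events measure time on the device itself, so the result is not distorted by the asynchronous kernel launch returning early on the CPU side.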

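classmethod `from_ipc_handle`(device, handle)

Reconstructs an event from an IPC handle on the given device.

`ipc_handle`()

Returns an IPC handle of this event. If not yet recorded, the event will use the current device.

`query`()

Checks whether all work currently captured by the event has completed.

Returns:

A boolean indicating whether all work currently captured by the event has completed.

`record`(stream=None)

Records the event in a given stream. Uses torch.cuda.current_stream() if no stream is specified. The stream's device must match the event's device.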

`synchronize`()

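Waits for the event to complete, that is, until all work currently captured in this event has finished. This prevents the CPU thread from proceeding until the event completes.

Note

This is a wrapper around cudaEventSynchronize(); see the CUDA documentation for more information.

`wait`(stream=None)

Makes all future work submitted to the given stream wait for this event. Uses torch.cuda.current_stream() if no stream is specified.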
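The record() and wait() pair can order work across two streams without blocking the CPU. In the sketch below, a producer stream fills a tensor and records an event, and a consumer stream waits on that event before reducing the tensor. The stream and helper names are hypothetical; the function returns None without CUDA.

```python
import torch

def produce_then_consume(n=1000):
    """Fill a tensor on one stream, reduce it on another, ordered by an event."""
    if not torch.cuda.is_available():
        return None
    producer = torch.cuda.Stream()
    consumer = torch.cuda.Stream()
    done = torch.cuda.Event()
    with torch.cuda.stream(producer):
        data = torch.full((n,), 2.0, device="cuda")
        done.record()             # no stream given: uses the current (producer) stream
    done.wait(consumer)           # future work on `consumer` waits for the event
    with torch.cuda.stream(consumer):
        total = data.sum()
    consumer.synchronize()        # wait on the CPU for the consumer stream to finish
    return total.item()

total = produce_then_consume()
print(total)
```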

Memory management

`torch.cuda.empty_cache`()

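Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.

Note

empty_cache() does not increase the amount of GPU memory available to PyTorch. See Memory management for more details about GPU memory management.

`torch.cuda.memory_allocated`(device=None)

Returns the current GPU memory occupied by tensors, in bytes, for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

This is likely less than the amount shown in nvidia-smi, since some unused memory can be held by the caching allocator and some context needs to be created on the GPU. See Memory management for more details about GPU memory management.

`torch.cuda.max_memory_allocated`(device=None)

Returns the maximum GPU memory occupied by tensors, in bytes, for a given device. By default, this returns the peak allocated memory since the beginning of the program. reset_max_memory_allocated() can be used to reset the starting point for tracking this metric.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.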

`torch.cuda.reset_max_memory_allocated`(device=None)

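Resets the starting point for tracking the maximum GPU memory occupied by tensors for a given device. See max_memory_allocated() for details.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.

`torch.cuda.memory_cached`(device=None)

Returns the current GPU memory managed by the caching allocator, in bytes, for a given device.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note:

See Memory management for more details about GPU memory management.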

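`torch.cuda.max_memory_cached`(device=None)

Returns the maximum GPU memory managed by the caching allocator, in bytes, for a given device. By default, this returns the peak cached memory since the beginning of the program. reset_max_memory_cached() can be used to reset the starting point for tracking this metric.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note:

See Memory management for more details about GPU memory management.

`torch.cuda.reset_max_memory_cached`(device=None)

Resets the starting point for tracking the maximum GPU memory managed by the caching allocator for a given device. See max_memory_cached() for details.

Parameters:

device (torch.device or int, optional) – The selected device. If device is None (default), returns the statistic for the current device, given by current_device().

Note

See Memory management for more details about GPU memory management.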

NVIDIA Tools Extension (NVTX)

`torch.cuda.nvtx.mark`(msg)

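Describes an instantaneous event that occurred at some point.

Parameters:

msg (string) – ASCII message to associate with the event.

`torch.cuda.nvtx.range_push`(msg)

Pushes a range onto a stack of nested range spans. Returns the zero-based depth of the range that is started.

Parameters:

msg (string) – ASCII message to associate with the range.

`torch.cuda.nvtx.range_pop`()

Pops a range off a stack of nested range spans. Returns the zero-based depth of the range that is ended.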
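Since every range_push() needs a matching range_pop(), one possible convenience is a small context manager that pairs them, sketched below. The name `nvtx_range` is hypothetical, and the NVTX calls are skipped when CUDA is unavailable so the wrapped code still runs on CPU-only machines.

```python
from contextlib import contextmanager

import torch

@contextmanager
def nvtx_range(msg):
    """Wrap a block in an NVTX range when CUDA is available; otherwise do nothing."""
    enabled = torch.cuda.is_available()
    if enabled:
        torch.cuda.nvtx.range_push(msg)
    try:
        yield
    finally:
        # Pop even if the block raised, so the range stack stays balanced.
        if enabled:
            torch.cuda.nvtx.range_pop()

with nvtx_range("forward"):
    out = torch.ones(4).sum().item()
print(out)
```

The ranges only become visible when the program is run under an NVTX-aware profiler; without one, the calls have no observable effect.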
