PyCUDA gives you easy, Pythonic access to Nvidia’s CUDA parallel computation API. Several wrappers of the CUDA API already exist, so why the need for PyCUDA?

  • Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. PyCUDA knows about dependencies, too, so (for example) it won’t detach from a context before all memory allocated in it is also freed.

  • Convenience. Abstractions like pycuda.compiler.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with Nvidia’s C-based runtime.

  • Completeness. PyCUDA puts the full power of CUDA’s driver API at your disposal, if you wish.

  • Automatic Error Checking. All CUDA errors are automatically translated into Python exceptions.

  • Speed. PyCUDA’s base layer is written in C++, so all the niceties above are virtually free.

  • Helpful Documentation. You’re looking at it. ;)

Here’s an example, to give you an impression:

import pycuda.autoinit
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400, 1, 1), grid=(1, 1))

print(dest - a * b)

(This example is examples/hello_gpu.py in the PyCUDA source distribution.)

On the surface, this program will print a screenful of zeros. Behind the scenes, a lot more interesting stuff is going on:

  • PyCUDA has compiled the CUDA source code and uploaded it to the card.

    Note

    This code doesn’t have to be a constant; you can easily have Python generate the code you want to compile. See Metaprogramming.

  • PyCUDA’s numpy interaction code has automatically allocated space on the device, copied the numpy arrays a and b over, launched a 400x1x1 single-block grid, and copied dest back.

    Note that you can just as well keep your data on the card between kernel invocations–no need to copy data all the time.

  • See how there’s no cleanup code in the example? That’s not because we were lazy and just skipped it. It simply isn’t needed. PyCUDA will automatically infer what cleanup is necessary and do it for you.
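The note above about generated code deserves a quick sketch. Nothing below talks to the GPU; it only shows Python assembling CUDA source text that could then be handed to pycuda.compiler.SourceModule. The make_multiply_source helper and its unroll parameter are hypothetical, purely for illustration:

```python
# Hypothetical helper (not part of PyCUDA): build CUDA source in Python.
# Each thread handles `unroll` consecutive elements; the loop body is
# unrolled at code-generation time rather than by the compiler.
def make_multiply_source(unroll):
    body = "\n".join(
        "    dest[i + %d] = a[i + %d] * b[i + %d];" % (j, j, j)
        for j in range(unroll))
    return """\
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x * %d;
%s
}
""" % (unroll, body)

src = make_multiply_source(4)
print(src)  # pass this string to SourceModule(src) on a CUDA machine
```

Because the kernel is just a Python string until compile time, tuning parameters like the unroll factor can be chosen at runtime, which is the core idea behind the Metaprogramming section.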

Curious? Let’s get started.

Contents

  • Installation
  • Tutorial
    • Getting started
    • Transferring Data
    • Executing a Kernel
    • Bonus: Abstracting Away the Complications
    • Advanced Topics
    • Where to go from here
  • Device Interface
    • Version Queries
    • Error Reporting
    • Constants
    • Devices and Contexts
    • Concurrency and Streams
    • Memory
    • Code on the Device: Modules and Functions
  • Profiler Control
  • Just-in-time Compilation
  • Built-in Utilities
    • Automatic Initialization
    • Choice of Device
    • Kernel Caching
    • Testing
    • Device Metadata and Occupancy
    • Memory Pools
  • OpenGL
    • Automatic Initialization
    • Old-style (pre-CUDA 3.0) API
  • GPU Arrays
    • Vector Types
    • The GPUArray Array Class
    • Constructing GPUArray Instances
    • Elementwise Functions on GPUArray Instances
    • Generating Arrays of Random Numbers
    • Single-pass Custom Expression Evaluation
    • Custom Reductions
    • Parallel Scan / Prefix Sum
    • Custom data types in Reduction and Scan
    • GPGPU Algorithms
  • Metaprogramming
    • Why Metaprogramming?
    • Metaprogramming using a Templating Engine
    • Metaprogramming using codepy
  • Changes
    • Version 2020.1
    • Version 2019.1
    • Version 2018.1
    • Version 2017.2
    • Version 2016.2
    • Version 2016.1
    • Version 2014.1
    • Version 2013.1.1
    • Version 2013.1
    • Version 2012.1
    • Version 2011.2
    • Version 2011.1.2
    • Version 2011.1.1
    • Version 2011.1
    • Version 0.94.2
    • Version 0.94.1
    • Version 0.94
    • Version 0.93
    • Version 0.92
    • Version 0.91
  • Acknowledgments
  • Licensing
  • Frequently Asked Questions
  • Citing PyCUDA

Note that this guide will not explain CUDA programming and technology. Please refer to Nvidia’s programming documentation for that.

PyCUDA also has its own web site, where you can find updates, new versions, documentation, and support.

