PyCUDA gives you easy, Pythonic access to Nvidia’s CUDA parallel computation API. Several wrappers of the CUDA API already exist, so why the need for PyCUDA?

  • Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. PyCUDA knows about dependencies, too, so (for example) it won’t detach from a context before all memory allocated in it is also freed.

  • Convenience. Abstractions like pycuda.compiler.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with Nvidia’s C-based runtime.

  • Completeness. PyCUDA puts the full power of CUDA’s driver API at your disposal, if you wish.

  • Automatic Error Checking. All CUDA errors are automatically translated into Python exceptions.

  • Speed. PyCUDA’s base layer is written in C++, so all the niceties above are virtually free.

  • Helpful Documentation. You’re looking at it. ;)

Here’s an example, to give you an impression:

import pycuda.autoinit
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400, 1, 1), grid=(1, 1))

print(dest - a * b)

(This example is examples/hello_gpu.py in the PyCUDA source distribution.)

On the surface, this program will print a screenful of zeros. Behind the scenes, a lot more interesting stuff is going on:

  • PyCUDA has compiled the CUDA source code and uploaded it to the card.

    Note

    This code doesn’t have to be a constant; you can easily have Python generate the code you want to compile. See Metaprogramming.

  • PyCUDA’s numpy interaction code has automatically allocated space on the device, copied the numpy arrays a and b over, launched a 400x1x1 single-block grid, and copied dest back.

    Note that you can just as well keep your data on the card between kernel invocations–no need to copy data all the time.

  • See how there’s no cleanup code in the example? That’s not because we were lazy and just skipped it. It simply isn’t needed. PyCUDA will automatically infer what cleanup is necessary and do it for you.
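The note above about generated code deserves a quick sketch. Nothing below talks to the GPU; it only shows Python assembling CUDA source text that could then be handed to pycuda.compiler.SourceModule. The make_multiply_source helper and its unroll parameter are hypothetical, purely for illustration:

```python
# Hypothetical helper (not part of PyCUDA): build CUDA source in Python.
# Each thread handles `unroll` consecutive elements; the loop body is
# unrolled at code-generation time rather than by the compiler.
def make_multiply_source(unroll):
    body = "\n".join(
        "    dest[i + %d] = a[i + %d] * b[i + %d];" % (j, j, j)
        for j in range(unroll))
    return """\
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x * %d;
%s
}
""" % (unroll, body)

src = make_multiply_source(4)
print(src)  # pass this string to SourceModule(src) on a CUDA machine
```

Because the kernel is just a Python string until compile time, tuning parameters like the unroll factor can be chosen at runtime, which is the core idea behind the Metaprogramming section.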

Curious? Let’s get started.

Contents

  • Installation
  • Tutorial
    • Getting started
    • Transferring Data
    • Executing a Kernel
    • Bonus: Abstracting Away the Complications
    • Advanced Topics
    • Where to go from here
  • Device Interface
    • Version Queries
    • Error Reporting
    • Constants
    • Devices and Contexts
    • Concurrency and Streams
    • Memory
    • Code on the Device: Modules and Functions
  • Profiler Control
  • Just-in-time Compilation
  • Built-in Utilities
    • Automatic Initialization
    • Choice of Device
    • Kernel Caching
    • Testing
    • Device Metadata and Occupancy
    • Memory Pools
  • OpenGL
    • Automatic Initialization
    • Old-style (pre-CUDA 3.0) API
  • GPU Arrays
    • Vector Types
    • The GPUArray Array Class
    • Constructing GPUArray Instances
    • Elementwise Functions on GPUArray Instances
    • Generating Arrays of Random Numbers
    • Single-pass Custom Expression Evaluation
    • Custom Reductions
    • Parallel Scan / Prefix Sum
    • Custom data types in Reduction and Scan
    • GPGPU Algorithms
  • Metaprogramming
    • Why Metaprogramming?
    • Metaprogramming using a Templating Engine
    • Metaprogramming using codepy
  • Changes
    • Version 2020.1
    • Version 2019.1
    • Version 2018.1
    • Version 2017.2
    • Version 2016.2
    • Version 2016.1
    • Version 2014.1
    • Version 2013.1.1
    • Version 2013.1
    • Version 2012.1
    • Version 2011.2
    • Version 2011.1.2
    • Version 2011.1.1
    • Version 2011.1
    • Version 0.94.2
    • Version 0.94.1
    • Version 0.94
    • Version 0.93
    • Version 0.92
    • Version 0.91
  • Acknowledgments
  • Licensing
  • Frequently Asked Questions
  • Citing PyCUDA

Note that this guide will not explain CUDA programming and technology. Please refer to Nvidia’s programming documentation for that.

PyCUDA also has its own web site, where you can find updates, new versions, documentation, and support.

