PyCUDA Documentation
PyCUDA gives you easy, Pythonic access to Nvidia’s CUDA parallel computation API. Several wrappers of the CUDA API already exist, so why the need for PyCUDA?
- Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. PyCUDA knows about dependencies, too, so (for example) it won’t detach from a context before all memory allocated in it is also freed.
- Convenience. Abstractions like pycuda.compiler.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with Nvidia’s C-based runtime.
- Completeness. PyCUDA puts the full power of CUDA’s driver API at your disposal, if you wish.
- Automatic Error Checking. All CUDA errors are automatically translated into Python exceptions.
- Speed. PyCUDA’s base layer is written in C++, so all the niceties above are virtually free.
- Helpful Documentation. You’re looking at it. ;)
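The lifetime-based cleanup described in the first point can be pictured with a plain-Python analogy. This is only a sketch of the idea, not PyCUDA's implementation: `DemoContext` and `DemoBuffer` are hypothetical stand-ins, and the ordering shown relies on CPython's reference counting.

```python
# Plain-Python analogy only: DemoContext and DemoBuffer are hypothetical
# stand-ins for a context and a device allocation, not PyCUDA classes.

class DemoContext:
    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Runs when the last reference to the context disappears.
        self.log.append("detach context")


class DemoBuffer:
    def __init__(self, context, nbytes, log):
        # Holding a reference keeps the context alive while the buffer exists.
        self.context = context
        self.nbytes = nbytes
        self.log = log

    def __del__(self):
        self.log.append("free buffer")


events = []
ctx = DemoContext(events)
buf = DemoBuffer(ctx, 1024, events)

del ctx  # the buffer still references the context, so nothing is released yet
events_after_first_del = list(events)

del buf  # the buffer is released first, then the now-unreferenced context
print(events)
```

Because the buffer holds a reference to its context, the context cannot go away before the buffer does, whatever order the names are deleted in. That mirrors the dependency described above, under the stated CPython assumption.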
Here’s an example, to give you an impression:
import pycuda.autoinit
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule

# Compile the CUDA source; each thread multiplies one pair of elements.
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))

print(dest-a*b)
(This example is examples/hello_gpu.py in the PyCUDA source distribution.)
On the surface, this program will print a screenful of zeros. Behind the scenes, a lot more interesting stuff is going on:
- PyCUDA has compiled the CUDA source code and uploaded it to the card.

  Note: This code doesn’t have to be a constant. You can easily have Python generate the code you want to compile. See Metaprogramming.

- PyCUDA’s numpy interaction code has automatically allocated space on the device, copied the numpy arrays a and b over, launched a 400x1x1 single-block grid, and copied dest back.

  Note that you can just as well keep your data on the card between kernel invocations, so there is no need to copy data all the time.

- See how there’s no cleanup code in the example? That’s not because we were lazy and just skipped it. It simply isn’t needed. PyCUDA will automatically infer what cleanup is necessary and do it for you.
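Since the kernel source is an ordinary Python string, generating it from Python can be as simple as filling in a template. The following is a minimal sketch using only the standard library; `KERNEL_TEMPLATE` and `make_kernel_source` are hypothetical names introduced for illustration and are not part of PyCUDA.

```python
from string import Template

# Hypothetical template: the kernel name and the binary operator are
# filled in from Python before the source is handed to a compiler.
KERNEL_TEMPLATE = Template("""
__global__ void ${name}(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] ${op} b[i];
}
""")


def make_kernel_source(name, op):
    """Return CUDA C source for an elementwise kernel (illustrative helper)."""
    if op not in ("+", "-", "*", "/"):
        raise ValueError("unsupported operator: %r" % (op,))
    return KERNEL_TEMPLATE.substitute(name=name, op=op)


source = make_kernel_source("add_them", "+")
print(source)

# With a CUDA device available, this string could be passed to
# pycuda.compiler.SourceModule(source) in place of the constant source
# used in the example above.
```

The sketch itself only builds and prints the source text, so it runs without a GPU. Restricting `op` to a fixed set of operators keeps arbitrary text out of the generated kernel.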
Curious? Let’s get started.
Contents
- Installation
- Tutorial
  - Getting started
  - Transferring Data
  - Executing a Kernel
  - Bonus: Abstracting Away the Complications
  - Advanced Topics
  - Where to go from here
- Device Interface
  - Version Queries
  - Error Reporting
  - Constants
  - Devices and Contexts
  - Concurrency and Streams
  - Memory
  - Code on the Device: Modules and Functions
- Profiler Control
- Just-in-time Compilation
- Built-in Utilities
  - Automatic Initialization
  - Choice of Device
  - Kernel Caching
  - Testing
  - Device Metadata and Occupancy
  - Memory Pools
- OpenGL
  - Automatic Initialization
  - Old-style (pre-CUDA 3.0) API
- GPU Arrays
  - Vector Types
  - The GPUArray Array Class
  - Constructing GPUArray Instances
  - Elementwise Functions on GPUArray Instances
  - Generating Arrays of Random Numbers
  - Single-pass Custom Expression Evaluation
  - Custom Reductions
  - Parallel Scan / Prefix Sum
  - Custom data types in Reduction and Scan
  - GPGPU Algorithms
- Metaprogramming
  - Why Metaprogramming?
  - Metaprogramming using a Templating Engine
  - Metaprogramming using codepy
- Changes
  - Version 2020.1
  - Version 2019.1
  - Version 2018.1
  - Version 2017.2
  - Version 2016.2
  - Version 2016.1
  - Version 2014.1
  - Version 2013.1.1
  - Version 2013.1
  - Version 2012.1
  - Version 2011.2
  - Version 2011.1.2
  - Version 2011.1.1
  - Version 2011.1
  - Version 0.94.2
  - Version 0.94.1
  - Version 0.94
  - Version 0.93
  - Version 0.92
  - Version 0.91
- Acknowledgments
- Licensing
- Frequently Asked Questions
- Citing PyCUDA
Note that this guide will not explain CUDA programming and technology. Please refer to Nvidia’s programming documentation for that.
PyCUDA also has its own web site, where you can find updates, new versions, documentation, and support.