It should still have been simple; by rights it's just

pip install pyopencl

  But it didn't work: the error said something called mako wasn't installed. Supposedly it's optional, but since it seemed like no trouble I installed it anyway. The errors continued.
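  For the record, that's just one more pip one-liner (mako being the package the error was complaining about):

pip install mako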

  Apparently, to install pyopencl you first need OpenCL itself, so I downloaded the OpenCL SDK from AMD's official site (version 2.9.1; the versions are laid out in a table, easy to spot). The install path doesn't seem to be selectable; it goes into the Program Files (x86) folder. The errors continued.

  This time the error surfaced inside VS; should I count myself lucky for having installed VS Community Edition... It said CL/cl.h could not be found, and the program it was trying to run was also called cl...

  So that's a header file... and cl.exe, could that stand for compile + link? Might as well write a hello world and try compiling it... (the one-step compile-and-link tool in MASM is also called cl, if I remember right). The compile failed.

  The message was that some xxx.h or xxx.lib could not be found. Tutorials for this are easy to find: put the lib and include folders into the environment variables. So I created a new Include variable and a new Lib variable in the environment variables and, following a tutorial, added a pile of directories for the libraries bundled with Windows (mostly from the Microsoft SDK and the Windows Kits, as far as I can tell). In the end helloworld.c compiled: cl really is one-shot compile-and-link, like gcc except you don't even have to name the output file.

  Now, remember the AMD SDK folder (called AMD APP SDK)? Open it up and it also has an include directory and a lib directory. The include directory can go straight into the environment variable, but under lib there is one more level: x86 or x86_64, try it for yourselves; I still don't understand why my 64-bit machine needed x86... One way to verify: add #include <CL/cl.h> at the top of helloworld.c. If it still compiles successfully, then...

pip install pyopencl

  At least that's how it installed successfully for me, with no warnings.
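  If you'd rather not edit helloworld.c by hand each time, here is a minimal sketch of that header check in Python (my own addition, not part of the original walkthrough: it assumes cl.exe is on PATH and that the Include/Lib variables are set up as above; the file name hello.c is arbitrary):

import os
import subprocess
import tempfile

# A C file whose only job is to prove that CL/cl.h can be found.
SRC = r"""
#include <stdio.h>
#include <CL/cl.h>   /* fails to compile if the AMD APP SDK include dir is missing */

int main(void)
{
    printf("hello, world\n");
    return 0;
}
"""

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hello.c"), "w") as f:
        f.write(SRC)
    # cl compiles and links in one step; no output name needed, just like above
    result = subprocess.run(["cl", "hello.c"], cwd=tmp,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(result.stdout.decode(errors="replace"))
    print("header check:", "OK" if result.returncode == 0 else "FAILED")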


  Example program (from http://ju.outofmemory.cn/entry/106475, with the Python 2 prints changed to Python 3 here):

# example provided by Eilif Muller
from __future__ import division

KERNEL_CODE = """
// Thread block size
#define BLOCK_SIZE %(block_size)d

// Matrix dimensions
// (chosen as multiples of the thread block size for simplicity)
#define WA %(w_a)d // Matrix A width
#define HA %(h_a)d // Matrix A height
#define WB %(w_b)d // Matrix B width
#define HB WA  // Matrix B height
#define WC WB  // Matrix C width
#define HC HA  // Matrix C height

/*
 * Copyright 1993-2009 NVIDIA Corporation.  All rights reserved.
 *
 * NVIDIA Corporation and its licensors retain all intellectual property and
 * proprietary rights in and to this software and related documentation.
 * Any use, reproduction, disclosure, or distribution of this software
 * and related documentation without an express license agreement from
 * NVIDIA Corporation is strictly prohibited.
 *
 * Please refer to the applicable NVIDIA end user license agreement (EULA)
 * associated with this source code for terms and conditions that govern
 * your use of this NVIDIA software.
 */

/* Matrix multiplication: C = A * B.
 * Device code.
 */

#define AS(j, i) As[i + j * BLOCK_SIZE]
#define BS(j, i) Bs[i + j * BLOCK_SIZE]

//! Matrix multiplication on the device: C = A * B
//! WA is A's width and WB is B's width
__kernel __attribute__((reqd_work_group_size(BLOCK_SIZE,BLOCK_SIZE,1)))
void
matrixMul( __global float* C, __global float* A, __global float* B)
{
    __local float As[BLOCK_SIZE*BLOCK_SIZE];
    __local float Bs[BLOCK_SIZE*BLOCK_SIZE];

    // Block index
    int bx = get_group_id(0);
    int by = get_group_id(1);

    // Thread index
    int tx = get_local_id(0);
    int ty = get_local_id(1);

    // Index of the first sub-matrix of A processed by the block
    int aBegin = WA * BLOCK_SIZE * by;

    // Index of the last sub-matrix of A processed by the block
    int aEnd   = aBegin + WA - 1;

    // Step size used to iterate through the sub-matrices of A
    int aStep  = BLOCK_SIZE;

    // Index of the first sub-matrix of B processed by the block
    int bBegin = BLOCK_SIZE * bx;

    // Step size used to iterate through the sub-matrices of B
    int bStep  = BLOCK_SIZE * WB;

    // Csub is used to store the element of the block sub-matrix
    // that is computed by the thread
    float Csub = 0.0f;

    // Loop over all the sub-matrices of A and B
    // required to compute the block sub-matrix
    for (int a = aBegin, b = bBegin;
             a <= aEnd;
             a += aStep, b += bStep) {

        // Load the matrices from device memory
        // to shared memory; each thread loads
        // one element of each matrix
        AS(ty, tx) = A[a + WA * ty + tx];
        BS(ty, tx) = B[b + WB * ty + tx];

        // Synchronize to make sure the matrices are loaded
        barrier(CLK_LOCAL_MEM_FENCE);

        // Multiply the two matrices together;
        // each thread computes one element
        // of the block sub-matrix
        for (int k = 0; k < BLOCK_SIZE; ++k)
            Csub += AS(ty, k) * BS(k, tx);

        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of A and B in the next iteration
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Write the block sub-matrix to device memory;
    // each thread writes one element
    C[get_global_id(1) * get_global_size(0) + get_global_id(0)] = Csub;
}
"""

import pyopencl as cl
from time import time
import numpy

block_size = 16

ctx = cl.create_some_context()

for dev in ctx.devices:
    assert dev.local_mem_size > 0

queue = cl.CommandQueue(ctx,
        properties=cl.command_queue_properties.PROFILING_ENABLE)
#queue = cl.CommandQueue(ctx)

if False:
    a_height = 4096
    #a_height = 1024
    a_width = 2048
    #a_width = 256
    #b_height == a_width
    b_width = a_height
elif False:
    # like PyCUDA
    a_height = 2516
    a_width = 1472
    b_height = a_width
    b_width = 2144
else:
    # CL SDK
    a_width = 50*block_size
    a_height = 100*block_size
    b_width = 50*block_size
    b_height = a_width

c_width = b_width
c_height = a_height

h_a = numpy.random.rand(a_height, a_width).astype(numpy.float32)
h_b = numpy.random.rand(b_height, b_width).astype(numpy.float32)
h_c = numpy.empty((c_height, c_width)).astype(numpy.float32)

kernel_params = {"block_size": block_size,
        "w_a": a_width, "h_a": a_height, "w_b": b_width}

if "NVIDIA" in queue.device.vendor:
    options = "-cl-mad-enable -cl-fast-relaxed-math"
else:
    options = ""
prg = cl.Program(ctx, KERNEL_CODE % kernel_params,
        ).build(options=options)
kernel = prg.matrixMul
#print prg.binaries[0]

assert a_width % block_size == 0
assert a_height % block_size == 0
assert b_width % block_size == 0

# transfer host -> device -----------------------------------------------------
mf = cl.mem_flags

t1 = time()
d_a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)
d_b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_b)
d_c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=h_c.nbytes)
push_time = time()-t1

# warmup ----------------------------------------------------------------------
for i in range(5):
    event = kernel(queue, h_c.shape[::-1], (block_size, block_size),
            d_c_buf, d_a_buf, d_b_buf)
    event.wait()

queue.finish()

# actual benchmark ------------------------------------------------------------
t1 = time()

count = 20
for i in range(count):
    event = kernel(queue, h_c.shape[::-1], (block_size, block_size),
            d_c_buf, d_a_buf, d_b_buf)

event.wait()
gpu_time = (time()-t1)/count

# transfer device -> host -----------------------------------------------------
t1 = time()
cl.enqueue_copy(queue, h_c, d_c_buf)
pull_time = time()-t1

# timing output ---------------------------------------------------------------
gpu_total_time = gpu_time+push_time+pull_time

print("GPU push+compute+pull total [s]:", gpu_total_time)
print("GPU push [s]:", push_time)
print("GPU pull [s]:", pull_time)
print("GPU compute (host-timed) [s]:", gpu_time)
print("GPU compute (event-timed) [s]: ", (event.profile.end-event.profile.start)*1e-9)

gflop = h_c.size * (a_width * 2.) / (1000**3.)
gflops = gflop / gpu_time

print()
print("GFlops/s:", gflops)

# cpu comparison --------------------------------------------------------------
t1 = time()
h_c_cpu = numpy.dot(h_a, h_b)
cpu_time = time()-t1

print()
print("GPU==CPU:", numpy.allclose(h_c, h_c_cpu))
print()
print("CPU time (s)", cpu_time)
print()
print("GPU speedup (with transfer): ", cpu_time/gpu_total_time)
print("GPU speedup (without transfer): ", cpu_time/gpu_time)

  Don't panic if you find yourself staring at an input prompt: look at the options... The number is in square brackets, the description follows, and your job is to type a number and press Enter. For example, I entered 0 twice, and at the end there was a hint:

Choose platform:
[0] <pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7feee7e3168>
Choice [0]:0
Choose device(s):
[0] <pyopencl.Device 'Capeverde' on 'AMD Accelerated Parallel Processing' at 0x9f28700>
[1] <pyopencl.Device 'Intel(R) Pentium(R) CPU G4560 @ 3.50GHz' on 'AMD Accelerated Parallel Processing' at 0xa627610>
Choice, comma-separated [0]:0
Set the environment variable PYOPENCL_CTX='0:0' to avoid being asked again.

  Meaning: if you declare your choice in an environment variable ahead of time, you won't be asked again. Setting the environment variable got me nowhere, but adding the following code at the top of the file did the trick (thanks, Stack Overflow):

import os
os.environ['PYOPENCL_CTX'] = '0:0'  # platform 0, device 0

  Run it again and that whole prompt section is gone; you get the results directly. I haven't really read through the code, but GPU==CPU presumably means the CPU and the GPU produced the same result (it comes from numpy.allclose, which checks elementwise agreement within a tolerance), and numpy actually prints an ellipsis in the middle of a large array...
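  About those ellipses: numpy abbreviates any large array when printing, and you can turn that off. A quick illustration (numpy.set_printoptions and numpy.allclose are standard numpy API; the arrays here are made up):

import numpy

a = numpy.arange(10000.0).reshape(100, 100)
print(a)  # rows and columns in the middle are elided with "..."

numpy.set_printoptions(threshold=numpy.inf)
print(a)  # full array, no ellipsis

# This is the same check the benchmark uses to compare GPU and CPU results:
# True when every element agrees within a small relative/absolute tolerance.
print(numpy.allclose([1.0, 2.0], [1.0, 2.0 + 1e-9]))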

(2018-2-2, on Earth)

Reposted from: https://blog.51cto.com/13535617/2068039
