References

  1. Installing OpenMPI and UCX under ROCm
    https://github.com/openucx/ucx/wiki/Build-and-run-ROCM-UCX-OpenMPI

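1. Check whether the GPU has large BAR enabled (a single machine does not need UCX, so this feature is not needed either; skip straight to step 3)

Following the reference document, I created the test file check_large_bar_rocm.c, but compiling it failed: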

(base) [jrf@cu06 ~] gcc $(/opt/rocm/bin/hipconfig --cpp_config) -L/opt/rocm/lib/ -lhip_hcc check_large_bar_rocm.c -o check_large_bar_rocm
In file included from /opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h:28:0,
                 from /opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h:38,
                 from /opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:44,
                 from /opt/rocm/hip/include/hip/hip_runtime_api.h:342,
                 from /opt/rocm/hip/include/hip/hip_runtime.h:64,
                 from check_large_bar_rocm.c:2:
/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h:38:24: error: missing binary operator before token "("
 #if __has_attribute(ext_vector_type)
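
The `__has_attribute` error points at the compiler rather than at ROCm: GCC only gained `__has_attribute` in version 5, so an older gcc such as 4.8.5 cannot preprocess this header. A minimal sketch of a check that could be run before compiling; the helper names `gcc_major` and `supports_has_attribute` are made up for illustration:

```shell
#!/bin/sh
# gcc_major: hypothetical helper, prints the major version of a string like "4.8.5".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# supports_has_attribute: succeeds if the given GCC version is 5 or newer.
supports_has_attribute() {
    [ "$(gcc_major "$1")" -ge 5 ]
}

# Check the gcc found on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if supports_has_attribute "$ver"; then
        echo "gcc $ver: __has_attribute is available"
    else
        echo "gcc $ver: too old for the HIP headers"
    fi
fi
```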

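**Trying to update gcc**
I tried updating gcc to 9.2. conda had no newer gcc package, so I built it from source.
(Only later did I find that the system already has a usable 8.3.0 installed. Awkward.)
Reference: https://blog.csdn.net/l919898756/article/details/81015617
Download the source from the official site, unpack it, create a build directory and enter it:

../configure --disable-multilib --prefix=something
make
make install

This fails with: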

“Verify that you have permission to grant a GFDL license for all
new text in tm.texi, then copy it to $(srcdir)/doc/tm.texi.”

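I copied the file over as instructed, which produced another error:

“You should edit $(srcdir)/doc/tm.texi.in rather than $(srcdir)/doc/tm.texi.”

The article at http://www.hellogcc.org/?p=63 explains the message but gives no fix.
Following the page below, I replaced the source tree with a freshly unpacked copy,
https://wiki.osdev.org/Talk:GCC_Cross-Compiler
and ran again:

make install

which fails with:

g++: error: unrecognized command line option ‘-no-pie’

Switching to 7.2.0
From what I could find, this happens because the existing gcc and g++ are too old. I did not know how to work around it, so I installed 7.2.0 instead. https://github.com/xd009642/tarpaulin/issues/7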

wget  http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-7.2.0/gcc-7.2.0.tar.gz
tar -xzf gcc-7.2.0.tar.gz
cd gcc-7.2.0
mkdir build
cd build
../configure --disable-checking --enable-languages=c,c++ --disable-multilib --prefix=/home/jrf/tools/gcc-7.2.0 --enable-threads=posix
make -j24
make install

The install went smoothly and ended with this notice:

Libraries have been installed in:
   /home/jrf/tools/gcc-7.2.0/lib/../lib64

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
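
To actually use the freshly installed compiler, its `bin` and `lib64` directories have to come first on `PATH` and `LD_LIBRARY_PATH`, as the notice suggests. A small sketch, assuming the install prefix from the configure line; `prepend_dir` is a hypothetical helper that avoids leaving an empty entry in the list when the variable was previously unset:

```shell
#!/bin/sh
# prepend_dir VAR DIR: hypothetical helper that puts DIR at the front of the
# colon-separated list stored in the variable named VAR, then exports it.
prepend_dir() {
    eval "cur=\${$1:-}"
    if [ -n "$cur" ]; then
        eval "$1=\"\$2:\$cur\""
    else
        eval "$1=\"\$2\""
    fi
    export "$1"
}

# Install prefix taken from the configure command in this post.
GCC_HOME=/home/jrf/tools/gcc-7.2.0
prepend_dir PATH "$GCC_HOME/bin"
prepend_dir LD_LIBRARY_PATH "$GCC_HOME/lib64"
```

After sourcing this, `gcc --version` should report the new version if the prefix exists on the machine.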

gcc updated to 7.2.0; back to the large BAR test
Recompiling the BAR check file now succeeds, but running it fails:

./check_large_bar_rocm: error while loading shared libraries: libhip_hcc.so: cannot open shared object file: No such file or directory

So add the directory to the dynamic loader's library path:

export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
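
Before and after changing `LD_LIBRARY_PATH`, `ldd` can show which shared libraries the loader still fails to resolve. A minimal sketch; `missing_libs` is an illustrative helper name, not part of any tool:

```shell
#!/bin/sh
# missing_libs: reads `ldd` output on stdin and prints the names of the
# libraries that the dynamic loader reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run against the real binary only if it exists in the current directory.
if [ -x ./check_large_bar_rocm ]; then
    ldd ./check_large_bar_rocm | missing_libs
fi
```

An empty result means every dependency was found with the current search path.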

The program now starts, but hits a segmentation fault:

(base) [jrf@cu06 ~]$ ./check_large_bar_rocm
address buf 0x7f7bb5200000
Segmentation fault (core dumped)

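... A senior labmate said a single machine does not need UCX, and according to the tutorials OpenMPI does not have to be built against ROCm, so the large BAR feature should not be required. I skipped it and went straight to the ROCm build. Because linking against the ROCm libraries had caused trouble when the check program was built with gcc 4.8.5, this time I used the system's 8.3.0 (under /opt/soft/) to rebuild OpenBLAS and OpenMPI, and then build NWChem with them.

2. Build UCX (skipped)

3. Build OpenBLAS and LAPACK with gcc 7.2.0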

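  1. OpenBLAS
    NWChem ships a basic BLAS library of its own, but for better speed it is best to build and install one yourself.
    Download the source from the official repository and build it:
# download
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
# build
make
# install to a chosen prefix
make install PREFIX=/home/jrf/tools/openblas-gcc7.2.0
  2. LAPACK
    OpenBLAS already includes LAPACK, so there is no need to install it separately. When building NWChem, just point the LAPACK and BLAS settings at the same directory.
    See https://www.jianshu.com/p/fe6c4f42aa0b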
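
Whether a particular OpenBLAS build really contains LAPACK can be checked by looking for a LAPACK routine among its exported symbols; a build made without a Fortran compiler or with `NO_LAPACK=1` would lack them. A sketch, assuming the install prefix used above and using `dgesv_` as the probe symbol; `has_symbol` is a hypothetical helper:

```shell
#!/bin/sh
# has_symbol NAME: reads `nm` output on stdin and succeeds if NAME appears
# as a defined code symbol (type T, t or W).
has_symbol() {
    awk -v sym="$1" '
        $NF == sym && $(NF-1) ~ /^[TtW]$/ { found = 1 }
        END { exit !found }
    '
}

# Path assumed from the PREFIX used in this post.
lib=/home/jrf/tools/openblas-gcc7.2.0/lib/libopenblas.so
if [ -f "$lib" ]; then
    if nm -D --defined-only "$lib" | has_symbol dgesv_; then
        echo "LAPACK routines are present in $lib"
    else
        echo "no LAPACK routines found in $lib"
    fi
fi
```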

4. Build OpenMPI

Notes
  1. A single machine does not need UCX.
  2. CUDA is supported from version 1.7 on (see the guide to installing a CUDA-aware OpenMPI).
1) Steps
  1. Download the source
    Recommended:
git clone --recursive https://github.com/open-mpi/ompi.git
  2. autogen.pl
cd ompi
./autogen.pl
  3. Configure
    If supported, a CUDA-aware OpenMPI can be built:
mkdir build
cd build
../configure --prefix=/home/jrf/tools/ompi-gcc8.3.0  --with-cuda=/home/apps/jinrf/tools/cuda/cuda-11.0/include --enable-mpi-ext=cuda
  4. Build and install
make
make install
2) Bugs
  1. Missing libtool
    ./autogen.sh fails with:

    Updating build configuration files, please wait....
    configure.ac:38: warning: macro 'AM_PROG_LIBTOOL' not found in library
    configure.ac:38: error: possibly undefined macro: AM_PROG_LIBTOOL
          If this token and others are legitimate, please use m4_pattern_allow.
          See the Autoconf documentation.
    autoreconf: /usr/bin/autoconf failed with exit status: 1

    This means libtool is missing; installing it fixes the problem:

    sudo apt-get install libtool  # on Ubuntu

  2. cannot find -lnuma
    make fails with:

    /usr/bin/ld: cannot find -lnuma

    Fix: install the matching package:
    sudo apt-get install libnuma-dev

  3. "mca_pml_ob1_recv_request_ack": too few arguments
    This showed up with OpenMPI 5.0.0. Reading the source shows that one OpenMPI function calls another OpenMPI function, mca_pml_ob1_recv_request_ack, with three arguments where four are expected. Perhaps a newer gcc would accept this? I did not pursue it and went back to OpenMPI 4.1.0, which had built cleanly before.
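
The first bug above comes down to a build tool missing from `PATH`, which can be detected before running autogen. A minimal sketch; `missing_tools` is a hypothetical helper, and the tool list is an assumption about what a git checkout needs rather than an official list:

```shell
#!/bin/sh
# missing_tools: prints each of the given command names that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list; libtoolize is installed by the libtool package.
missing=$(missing_tools autoconf automake libtoolize m4)
if [ -n "$missing" ]; then
    echo "install these before running autogen:"
    echo "$missing"
fi
```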

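5. Installing NWChem

1. Download the source
Note: ROCm support is in the master branch.

git clone https://github.com/nwchemgit/nwchem.git

2. Edit the configuration file
Notes:

  1. Starting with NWChem 7.0.0, if BLASOPT is set then LAPACK_LIB must be set too. OpenBLAS includes LAPACK, so both can point to the same location.
  2. Important: initially LAPACK_LIB was not set here, on the assumption that OpenBLAS's partial LAPACK implementation would be enough. Earlier builds succeeded only because I had manually built NWChem's bundled LAPACK library, libnwclapack.a, and added -lnwclapack to BLASOPT. I have since removed the nwclapack lookup because I want a faster third-party library. If that does not work, see the pccompile in section 3.2 of "Building NWChem on a cluster" and use LAPACK_LIB to point at a hand-built third-party LAPACK.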
#!/bin/bash
# The documentation says this must be set to "nwchem-6.8.1".
export NWCHEM_TOP=/home/jrf/Quantum_Soft/nwchem-hip
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_MPI=yes
export USE_MPIF=yes
export USE_MPIF4=yes
export TCE_HIP=yes
export NWCHEM_MODULES=all
export BLASOPT="-L/home/jrf/tools/openblas-gcc8.3.0/lib -lopenblas "
export LIBRARY_PATH=/opt/rocm/hip/lib:$LIBRARY_PATH
export LIBS="-lhip_hcc"
export LD_LIBRARY_PATH=/home/jrf/tools/ompi-gcc8.3.0/lib/:$LD_LIBRARY_PATH
export PATH=$PATH:/home/jrf/tools/ompi-gcc8.3.0/bin/
export HIP_INCLUDE="-I/opt/rocm/hip/include"
export LAPACK_LIB=/home/jrf/lapack-3.9.0
export C_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJC_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export CPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJCPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include

Run:

make nwchem_config

which reports:

config/makefile.h:220: /home/jrf/Quantum_Soft/nwchem-hip/src/config/nwchem_config.h: No such file or directory
config/makefile.h:2739: *** Please define LAPACK_LIB if you have defined BLASOPT or BLAS_LIB.  Stop.

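So I built LAPACK, following https://www.jianshu.com/p/fe6c4f42aa0b
and edited pccompile to add the LAPACK_LIB path (the version shown above is the final one).
Running the command above again now passes, so the build can continue.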
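
The rule behind the "Please define LAPACK_LIB" error can be checked in pccompile itself, before make is ever started. A sketch of such a guard; `check_blas_env` is a made-up name and the message text is mine, not NWChem's:

```shell
#!/bin/sh
# check_blas_env: fails if BLASOPT or BLAS_LIB is set while LAPACK_LIB is not,
# mirroring the condition reported by config/makefile.h.
check_blas_env() {
    if [ -n "${BLASOPT:-}${BLAS_LIB:-}" ] && [ -z "${LAPACK_LIB:-}" ]; then
        echo "LAPACK_LIB must be set when BLASOPT or BLAS_LIB is set" >&2
        return 1
    fi
    return 0
}

check_blas_env || echo "fix pccompile before running make nwchem_config"
```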
Run:

make

An error occurs for this command, run in /src/tce/ccsd_t:

hipcc -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp
Warning: Type mismatch in argument ‘deltat’ at (1); passed REAL(8) to INTEGER(8) [-Wargument-mismatch]
Compiling ccsd_t_kernels_omp.F...
Compiling tce_hashnsort.F...
Compiling ccsd_t_pstat.F...
Compiling ccsd_t_dot.F...
Compiling ccsd_t_neword.F...
Compiling hybrid.c...
hipcc -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp
In file included from memory.hip.cpp:1:
In file included from ./header.h:5:
In file included from /opt/rocm/hip/include/hip/hip_runtime_api.h:342:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:44:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h:38:
In file included from /opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h:28:
/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h:45:14: fatal error: 'array' file not found
    #include <array>
             ^~~~~~~
1 error generated.
In file included from memory.hip.cpp:1:
In file included from ./header.h:5:
In file included from /opt/rocm/hip/include/hip/hip_runtime_api.h:342:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:44:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h:38:
In file included from /opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h:28:
/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h:45:14: fatal error: 'array' file not found
    #include <array>
             ^~~~~~~
1 error generated.
make[3]: *** [/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o)] Error 1
make[3]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce/ccsd_t'
make[2]: *** [optimized] Error 2
make[2]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce/ccsd_t'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src/tce'
make: *** [libraries] Error 1

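This looks like a wrong header search path in hipcc. AMD extended Clang+LLVM into HIP's underlying compiler to target AMD GPUs. In a ROCm environment HIP actually has three platform modes, selected by the HIP_PLATFORM environment variable: clang, hcc and nvcc. The hipcc command that HIP provides is really a Perl script that uses HIP_PLATFORM and related environment variables to call the right underlying compiler, giving one uniform way to compile. So the likely culprit is clang's search path. To check whether clang itself can compile the file: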

clang -c -DTCE_HIP -fno-gpu-rdc -o memory.o memory.hip.cpp

clang fails the same way. The following commands show the include paths used by gcc and clang respectively:

gcc -v -x c++ /dev/null -fsyntax-only
clang -v -x c++ /dev/null -fsyntax-only

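They do differ. According to the official documentation, the search path can be set through four environment variables:
(C_INCLUDE_PATH,
OBJC_INCLUDE_PATH,
CPLUS_INCLUDE_PATH,
OBJCPLUS_INCLUDE_PATH)
With the header search path given via -I, memory.hip.cpp compiles, but a new issue appears: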

memory.hip.cpp:132:18: warning: 'hipMallocHost' is deprecated: use hipHostMalloc instead [-Wdeprecated-declarations]
    ptr = morecore(hipMallocHost, bytes);
                   ^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:1115:1: note: 'hipMallocHost' has been explicitly marked deprecated here
DEPRECATED("use hipHostMalloc instead")
^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:55:41: note: expanded from macro 'DEPRECATED'
#define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
                                        ^
1 warning generated.
memory.hip.cpp:132:18: warning: 'hipMallocHost' is deprecated: use hipHostMalloc instead [-Wdeprecated-declarations]
    ptr = morecore(hipMallocHost, bytes);
                   ^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:1115:1: note: 'hipMallocHost' has been explicitly marked deprecated here
DEPRECATED("use hipHostMalloc instead")
^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h:55:41: note: expanded from macro 'DEPRECATED'
#define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
                                        ^
1 warning generated.

I cannot change this myself (no permission), and since it is only a warning I let it go.
Add the following lines to the pccompile file and repeat the build:

export C_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJC_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export CPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
export OBJCPLUS_INCLUDE_PATH=/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include:/usr/local/include:/opt/soft/gcc-8.3.0/build/include:/opt/soft/gcc-8.3.0/build/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include-fixed:/usr/include
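
Rather than copying the long path lists by hand, they can be derived from gcc's own verbose output. A sketch of one way to do it, assuming the gcc on `PATH` is the one whose headers clang should see; `include_dirs` is a hypothetical helper:

```shell
#!/bin/sh
# include_dirs: reads the output of `gcc -v -x c++ /dev/null -fsyntax-only`
# on stdin and prints the directories listed between
# "#include <...> search starts here:" and "End of search list.",
# joined with colons.
include_dirs() {
    awk '
        /^#include <\.\.\.> search starts here:/ { on = 1; next }
        /^End of search list\./                   { on = 0 }
        on { sub(/^[ \t]+/, ""); out = out (out ? ":" : "") $0 }
        END { print out }
    '
}

# gcc prints the search list on stderr, hence the 2>&1.
if command -v gcc >/dev/null 2>&1; then
    dirs=$(gcc -v -x c++ /dev/null -fsyntax-only 2>&1 | include_dirs)
    if [ -n "$dirs" ]; then
        echo "export CPLUS_INCLUDE_PATH=$dirs"
    fi
fi
```

The printed line can be pasted into pccompile; the same value was used for all four variables in this post.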

Another problem: -lnwclapack cannot be found

make[1]: 进入目录“/home/jrf/Quantum_Soft/nwchem-hip/src”
gfortran -m64 -ffast-math  -Warray-bounds -std=legacy -fdefault-integer-8 -fno-tree-dominator-opts  -finline-functions -O2 -g -fno-aggressive-loop-optimizations -fno-tree-dominator-opts  -g -O   -I.  -I/home/jrf/Quantum_Soft/nwchem-hip/src/include -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DTCE_HIP  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/home/jrf/Quantum_Soft/nwchem-hip'" -DNWCHEM_BRANCH="'7.0.0'"  -c -o nwchem.o nwchem.F
gfortran -m64 -ffast-math  -Warray-bounds -std=legacy -fdefault-integer-8 -fno-tree-dominator-opts  -finline-functions -O2 -g -fno-aggressive-loop-optimizations -fno-tree-dominator-opts  -g -O   -I.  -I/home/jrf/Quantum_Soft/nwchem-hip/src/include -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DTCE_HIP  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/home/jrf/Quantum_Soft/nwchem-hip'" -DNWCHEM_BRANCH="'7.0.0'"  -c -o stubs.o stubs.F
make[1]: 离开目录“/home/jrf/Quantum_Soft/nwchem-hip/src”
gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64 -L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib   -o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci -lpeigs -lperfm -lcons -lbq -lnwcutil    -L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   /home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi/lib -lmpi_usempi -lmpi_mpifh -lmpi     -lcomex -lmpi_usempi -lmpi_mpifh -lmpi -lrt -lpthread -lm -lpthread  -lstdc++
/usr/bin/ld: 找不到 -lnwclapack
/home/jrf/lapack-3.9.0: 文件无法辨识: 是一个目录
collect2: 错误:ld 返回 1
make: *** [all] 错误 1

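Judging by the name, this should be a LAPACK-like library that NWChem builds itself. The earlier CPU-only build does have it in its lib directory, under ~/Quantum_Soft/nwchem_6.8.1/lib. So I went to ~/Quantum_Soft/nwchem-hip/src/lapack and tried make, which printed: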

NWChem's Performance is degraded by not setting BLASOPT
Please consider using ATLAS, GotoBLAS2, OpenBLAS, Intel MKL,
IBM ESSL, AMD ACML, etc. to improve performance.
If you decide to not use a fast implementation of BLAS/LAPACK,
please define USE_INTERNALBLAS=y and the internal Netlib will be used.
/home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/depend.x  -I/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/include > dependencies
NWChem's Performance is degraded by not setting BLASOPT
Please consider using ATLAS, GotoBLAS2, OpenBLAS, Intel MKL,
IBM ESSL, AMD ACML, etc. to improve performance.
If you decide to not use a fast implementation of BLAS/LAPACK,
please define USE_INTERNALBLAS=y and the internal Netlib will be used.
make: Nothing to be done for `errordgemm'.

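Odd: I had definitely set that variable. I ran the pccompile file again and ran make, and this time libnwclapack.a was generated. Why wasn't it built before?

Looking back a few months later: why did this happen?
-lnwclapack looks for the LAPACK library bundled with NWChem, a bare-bones implementation that is comparatively slow. That explains the warning printed when running make in /src/lapack.
The cause:
pccompile set the path to a third-party BLAS library and did not set USE_INTERNALBLAS to y, so when building NWChem, make assumed I wanted an external LAPACK and never entered src/lapack to build the bundled one. Yet I was also, inconsistently, asking the linker for the bundled library with -lnwclapack, which naturally could not be found.
Fixes:
1) Install a LAPACK library, set LAPACK_LIB to its path, and change -lnwclapack to -llapack (tested, works).
2) The installed OpenBLAS implements part of LAPACK, so simply removing -llapack may be worth a try (untested).
Moving on: running make again produces a long list of undefined functions, all from the HIP API.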

lib/LINUX64/libtce.a(sd_t_total.o): In function `hip_impl::kernarg hip_impl::make_kernarg<15ul, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int, (void*)0>(std::tuple<int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int> const&, hip_impl::kernargs_size_align const&, hip_impl::kernarg)':
(.text+0x3ebb3): undefined reference to `hip_impl::kernarg::resize(unsigned long)'
(.text+0x3f41b): undefined reference to `hip_impl::kernarg::~kernarg()'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(sd_t_total.o): In function `hip_impl::kernarg hip_impl::make_kernarg<20ul, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int, (void*)0>(std::tuple<int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, double*, double*, double*, int, int> const&, hip_impl::kernargs_size_align const&, hip_impl::kernarg)':
(.text+0x3f4c2): undefined reference to `hip_impl::kernarg::kernarg(hip_impl::kernarg&&)'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `getGpuMem':
(.text+0x274): undefined reference to `hipMalloc'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `morecore(hipError_t (*)(void**, unsigned long), unsigned long)':
memory.hip.cpp:(.text+0x579): undefined reference to `hipGetLastError'
memory.hip.cpp:(.text+0x580): undefined reference to `hipGetErrorString'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `getHostMem':
(.text+0x8f4): undefined reference to `hipMallocHost'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `clearGpuFreeList()':
memory.hip.cpp:(.text+0xf35): undefined reference to `hipFree'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(memory.o): In function `clearHostFreeList()':
memory.hip.cpp:(.text+0x1015): undefined reference to `hipHostFree'
/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a(hybrid.o): In function `device_init_':
hybrid.c:(.text+0x48): undefined reference to `hipGetDeviceCount'
hybrid.c:(.text+0x8f): undefined reference to `hipSetDevice'
collect2: error: ld returned 1 exit status

They come from the command below, the final link of the nwchem executable:

gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64 -L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib   -o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci -lpeigs -lperfm -lcons -lbq -lnwcutil    -L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   -L/home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi/lib -lmpi_usempi -lmpi_mpifh -lmpi     -lcomex -lmpi_usempi -lmpi_mpifh -lmpi -lrt -lpthread -lm -lpthread  -lstdc++

warning: ‘hipMallocHost’ is deprecated: use hipHostMalloc instead [-Wdeprecated-declarations]

The post below has the same symptom but a different cause (two HIP versions installed, it seems?). It does, however, show how to use nm.


MIGraphX fails to link for Vega10 on ROCm 2.3


Inspecting the static library libtce.a with nm shows that one of the reported hip_impl functions is a weak symbol:

(base) [jrf@cu06 LINUX64]$ nm -A libtce.a | c++filt | grep hip_impl::hipLaunchKernelGGLImpl
libtce.a:sd_t_total.o:00000000000144a0 W hip_impl::hipLaunchKernelGGLImpl(unsigned long, dim3 const&, dim3 const&, unsigned int, ihipStream_t*, void**)

nm then shows that libhip_hcc.so in /opt/rocm/hip/lib contains the implementation of this function, so that library needs to be on the search path. Add to pccompile:

export LIBRARY_PATH=/opt/rocm/hip/lib:$LIBRARY_PATH
export LIBS="-lhip_hcc"
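
The same nm-based inspection can list every HIP symbol that libtce.a still needs from elsewhere, which makes it easy to compare against what libhip_hcc.so defines. A sketch, assuming GNU nm's `-A` output layout (file and member, then type letter, then name); `undefined_symbols` is an illustrative helper:

```shell
#!/bin/sh
# undefined_symbols PREFIX: reads `nm -A` output (optionally piped through
# c++filt) on stdin and prints the unique undefined (type U) symbol names
# that start with PREFIX.
undefined_symbols() {
    awk -v pfx="$1" '
        $2 == "U" && index($3, pfx) == 1 {
            name = $3
            # demangled names may contain spaces, so keep the remaining fields
            for (i = 4; i <= NF; i++) name = name " " $i
            print name
        }
    ' | sort -u
}

# Archive path taken from the link errors in this post.
lib=/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64/libtce.a
if [ -f "$lib" ]; then
    nm -A "$lib" | c++filt | undefined_symbols hip
fi
```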

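A further error followed. Judging by the final combined link command, the two environment variables above had no effect. Also, for reasons I do not understand, libmpi_usempi.so does not exist in the OpenMPI built with gcc 8.3.0, only in the 4.8.5 build, yet dropping this library does not affect compilation.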

make[1]: Leaving directory `/home/jrf/Quantum_Soft/nwchem-hip/src'
gfortran   -L/home/jrf/Quantum_Soft/nwchem-hip/lib/LINUX64              \
-L/home/jrf/Quantum_Soft/nwchem-hip/src/tools/install/lib               \
-o /home/jrf/Quantum_Soft/nwchem-hip/bin/LINUX64/nwchem nwchem.o stubs.o \
-lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim \
-lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian  \
-lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd   \
-lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze   \
-lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce    \
-lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lnwcutil -lga -larmci    \
-lpeigs -lperfm -lcons -lbq -lnwcutil    \
-L/home/jrf/tools/openblas-gcc8.3.0/lib -lnwclapack -lopenblas   \
-L/home/jrf/lapack-3.9.0 -lnwcblas   -L/home/jrf/tools/ompi-gcc8.3.0/lib  \
-lmpi_usempif08  -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi     \
-lcomex -lmpi_usempi -lmpi_mpifh   \
-lmpi -lrt -lpthread -lm -lpthread  -lstdc++
/usr/bin/ld: cannot find -lnwclapack
/usr/bin/ld: cannot find -lmpi_usempi
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

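I then linked the final executable by hand to see whether it would run.
...
The hand-linked build ran successfully.
The run finished more than ten minutes faster than the CPU-only version.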
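
The exact hand-written link command is not recorded here. As a guess that only combines the fixes named earlier in this post (drop -lmpi_usempi, use -llapack in place of -lnwclapack, and add the HIP runtime from /opt/rocm/hip/lib), the flag list could be rewritten like this; `fix_link_flags` is a hypothetical helper and not the command that was actually run:

```shell
#!/bin/sh
# fix_link_flags: takes linker flags as arguments and prints them with
#   -lmpi_usempi removed (missing from the gcc 8.3.0 OpenMPI build),
#   -lnwclapack replaced by -llapack,
#   and the HIP runtime library appended at the end.
fix_link_flags() {
    out=""
    for flag in "$@"; do
        case "$flag" in
            -lmpi_usempi) continue ;;
            -lnwclapack)  flag="-llapack" ;;
        esac
        out="$out $flag"
    done
    printf '%s\n' "${out# } -L/opt/rocm/hip/lib -lhip_hcc"
}

# Example with a shortened flag list:
fix_link_flags -ltce -lnwclapack -lopenblas -lmpi_usempi -lmpi_mpifh -lmpi
```

The HIP library goes last so that the references coming from libtce.a, which appears earlier on the line, can be resolved against it.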
