Using cuBLAS (3)

This chapter describes the Level-2 Basic Linear Algebra Subprograms (BLAS2) functions, which perform matrix-vector operations.

cublas<t>gbmv()

cublasStatus_t cublasSgbmv(cublasHandle_t handle, cublasOperation_t trans,int m, int n, int kl, int ku,const float           *alpha,const float           *A, int lda,const float           *x, int incx,const float           *beta,float           *y, int incy)
cublasStatus_t cublasDgbmv(cublasHandle_t handle, cublasOperation_t trans,int m, int n, int kl, int ku,const double          *alpha,const double          *A, int lda,const double          *x, int incx,const double          *beta,double          *y, int incy)
cublasStatus_t cublasCgbmv(cublasHandle_t handle, cublasOperation_t trans,int m, int n, int kl, int ku,const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *x, int incx,const cuComplex       *beta,cuComplex       *y, int incy)
cublasStatus_t cublasZgbmv(cublasHandle_t handle, cublasOperation_t trans,int m, int n, int kl, int ku,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *x, int incx,const cuDoubleComplex *beta,cuDoubleComplex *y, int incy)

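This function supports the 64-bit Integer Interface.
This function performs the banded matrix-vector multiplication `y = alpha * op(A) * x + beta * y`, where `A` is an `m x n` band matrix with `kl` subdiagonals and `ku` superdiagonals.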

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
m		input	number of rows of matrix `A`.
n		input	number of columns of matrix `A`.
kl		input	number of subdiagonals of matrix `A`.
ku		input	number of superdiagonals of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n` with `lda>=kl+ku+1`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements if `trans==CUBLAS_OP_N` and `m` elements otherwise.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `m` elements if `trans==CUBLAS_OP_N` and `n` elements otherwise.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `m` < 0 or `n` < 0 or `kl` < 0 or `ku` < 0 or if `lda` < (`kl` + `ku` + 1) or if `incx` = 0 or `incy` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_T`, `CUBLAS_OP_C` or if `alpha` == NULL or `beta` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
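
To make the band layout concrete, here is a minimal host-side sketch of the non-transposed case. It assumes unit strides (`incx = incy = 1`) and the usual column-major band storage, in which element `A(i, j)` is kept at row `ku + i - j` of column `j` (0-based). The name `ref_sgbmv_n` is hypothetical; it is a CPU reference for checking results, not part of cuBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for y = alpha * A * x + beta * y (no transpose).
 * A is an m x n band matrix with kl sub- and ku super-diagonals.
 * Band storage, column-major, 0-based: A(i, j) is AB[(ku + i - j) + j * lda].
 * ref_sgbmv_n is a hypothetical helper name, not a cuBLAS function. */
void ref_sgbmv_n(int m, int n, int kl, int ku, float alpha,
                 const float *AB, int lda,
                 const float *x, float beta, float *y)
{
    assert(m >= 0 && n >= 0 && kl >= 0 && ku >= 0);
    assert(lda >= kl + ku + 1);

    /* When beta == 0 the old contents of y are never read. */
    for (int i = 0; i < m; ++i) {
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    }

    for (int j = 0; j < n; ++j) {
        /* Only rows inside the band of column j are stored. */
        int i_lo = (j - ku > 0) ? j - ku : 0;
        int i_hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = i_lo; i <= i_hi; ++i) {
            y[i] += alpha * AB[(size_t)(ku + i - j) + (size_t)j * lda] * x[j];
        }
    }
}
```

For a 3 x 3 tridiagonal matrix (`kl = ku = 1`, `lda = 3`), each column of `AB` holds the superdiagonal, diagonal and subdiagonal entry of that column, in that order; the unused corner slots are never read.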

cublas<t>spr()

cublasStatus_t cublasSspr(cublasHandle_t handle, cublasFillMode_t uplo,int n, const float  *alpha,const float  *x, int incx, float  *AP)
cublasStatus_t cublasDspr(cublasHandle_t handle, cublasFillMode_t uplo,int n, const double *alpha,const double *x, int incx, double *AP)

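This function supports the 64-bit Integer Interface.
This function performs the packed symmetric rank-1 update `A = alpha * x * x^T + A`, where `A` is an `n x n` symmetric matrix stored in packed format.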

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
AP	device	in/out	<type> array with `A` stored in packed format.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `alpha` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
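
The packed format stores only one triangle of `A`, column by column, without gaps. The sketch below is a host-side reference that assumes unit stride (`incx = 1`) and 0-based indices; the offsets are derived by counting the stored elements in the preceding columns. `ref_fill_t`, `packed_index` and `ref_sspr` are hypothetical names introduced here so the example builds without CUDA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cublasFillMode_t so the sketch builds without CUDA. */
typedef enum { REF_FILL_LOWER, REF_FILL_UPPER } ref_fill_t;

/* Offset of A(i, j) in packed column-major storage (0-based indices).
 * Lower: column j holds rows j..n-1 and starts at j * (2n - j + 1) / 2.
 * Upper: column j holds rows 0..j and starts at j * (j + 1) / 2. */
size_t packed_index(ref_fill_t uplo, int n, int i, int j)
{
    if (uplo == REF_FILL_LOWER) {
        assert(j <= i && i < n);
        return (size_t)i + ((size_t)j * (size_t)(2 * n - j - 1)) / 2;
    }
    assert(i <= j && j < n);
    return (size_t)i + ((size_t)j * (size_t)(j + 1)) / 2;
}

/* CPU reference for the packed symmetric rank-1 update
 * A = alpha * x * x^T + A; only the stored triangle is touched. */
void ref_sspr(ref_fill_t uplo, int n, float alpha, const float *x, float *AP)
{
    for (int j = 0; j < n; ++j) {
        int i_lo = (uplo == REF_FILL_LOWER) ? j : 0;
        int i_hi = (uplo == REF_FILL_LOWER) ? n - 1 : j;
        for (int i = i_lo; i <= i_hi; ++i) {
            AP[packed_index(uplo, n, i, j)] += alpha * x[i] * x[j];
        }
    }
}
```

A packed array for an `n x n` matrix therefore needs `n * (n + 1) / 2` elements, regardless of the fill mode.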

cublas<t>spr2()

cublasStatus_t cublasSspr2(cublasHandle_t handle, cublasFillMode_t uplo,int n, const float  *alpha,const float  *x, int incx,const float  *y, int incy, float  *AP)
cublasStatus_t cublasDspr2(cublasHandle_t handle, cublasFillMode_t uplo,int n, const double *alpha,const double *x, int incx,const double *y, int incy, double *AP)

This function supports the 64-bit Integer Interface.
This function performs the packed symmetric rank-2 update `A = alpha * (x * y^T + y * x^T) + A`, where `A` is an `n x n` symmetric matrix stored in packed format.

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
y	device	input	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.
AP	device	in/out	<type> array with `A` stored in packed format.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `alpha` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
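
The rank-2 update adds `alpha * (x[i] * y[j] + y[i] * x[j])` to every stored element `A(i, j)`. Below is a minimal host-side sketch for the upper-triangle case only, assuming unit strides and the 0-based packed offset `i + j * (j + 1) / 2`. The name `ref_sspr2_upper` is hypothetical and the code is a CPU reference, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for A = alpha * (x * y^T + y * x^T) + A with the upper
 * triangle of A packed column by column: A(i, j), i <= j, is stored at
 * AP[i + j * (j + 1) / 2] (0-based). Hypothetical helper, unit strides. */
void ref_sspr2_upper(int n, float alpha, const float *x,
                     const float *y, float *AP)
{
    assert(n >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            size_t idx = (size_t)i + ((size_t)j * (size_t)(j + 1)) / 2;
            AP[idx] += alpha * (x[i] * y[j] + y[i] * x[j]);
        }
    }
}
```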

cublas<t>sbmv()

cublasStatus_t cublasSsbmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, int k, const float  *alpha,const float  *A, int lda,const float  *x, int incx,const float  *beta, float *y, int incy)
cublasStatus_t cublasDsbmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, int k, const double *alpha,const double *A, int lda,const double *x, int incx,const double *beta, double *y, int incy)

This function supports the 64-bit Integer Interface.
This function performs the symmetric banded matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` symmetric band matrix with `k` subdiagonals and superdiagonals.

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
k		input	number of sub- and super-diagonals of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n` with `lda >= k+1`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or `k` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `alpha` == NULL or `beta` == NULL or `lda` < (1 + `k`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
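
As an illustration of why `lda >= k + 1` is enough, the following host-side sketch handles the lower-storage case: the diagonal sits in row 0 of the band array and the `k` subdiagonals in rows 1 to `k`, so `A(i, j)` with `j <= i <= j + k` is read from row `i - j` of column `j`. Unit strides are assumed and `ref_ssbmv_lower` is a hypothetical name, not a cuBLAS routine.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for y = alpha * A * x + beta * y, A symmetric banded with
 * k sub-diagonals, lower part stored: A(i, j) is AB[(i - j) + j * lda]
 * for j <= i <= min(n - 1, j + k). Hypothetical helper, unit strides. */
void ref_ssbmv_lower(int n, int k, float alpha, const float *AB, int lda,
                     const float *x, float beta, float *y)
{
    assert(n >= 0 && k >= 0 && lda >= k + 1);

    for (int i = 0; i < n; ++i) {
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    }

    for (int j = 0; j < n; ++j) {
        int i_hi = (j + k < n - 1) ? j + k : n - 1;
        for (int i = j; i <= i_hi; ++i) {
            float a = AB[(size_t)(i - j) + (size_t)j * lda];
            y[i] += alpha * a * x[j];
            /* The mirrored element A(j, i) is inferred by symmetry. */
            if (i != j) {
                y[j] += alpha * a * x[i];
            }
        }
    }
}
```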

cublas<t>spmv()

cublasStatus_t cublasSspmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const float  *alpha, const float  *AP,const float  *x, int incx, const float  *beta,float  *y, int incy)
cublasStatus_t cublasDspmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const double *alpha, const double *AP,const double *x, int incx, const double *beta,double *y, int incy)

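This function supports the 64-bit Integer Interface.
This function performs the symmetric packed matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` symmetric matrix stored in packed format.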

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
AP	device	input	<type> array with `A` stored in packed format.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `alpha` == NULL or `beta` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
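
A minimal host-side sketch of the lower-storage case follows. It assumes unit strides and the 0-based packed offset `i + j * (2n - j - 1) / 2` for `A(i, j)` with `j <= i`, obtained by counting the elements stored in columns `0..j-1`. The name `ref_sspmv_lower` is hypothetical; the routine is only a CPU reference for comparing against GPU output.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for y = alpha * A * x + beta * y with the lower triangle of
 * the symmetric matrix A packed column by column (0-based):
 * A(i, j), j <= i, is AP[i + j * (2n - j - 1) / 2]. Hypothetical helper. */
void ref_sspmv_lower(int n, float alpha, const float *AP,
                     const float *x, float beta, float *y)
{
    assert(n >= 0);

    for (int i = 0; i < n; ++i) {
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    }

    for (int j = 0; j < n; ++j) {
        size_t col = ((size_t)j * (size_t)(2 * n - j - 1)) / 2;
        for (int i = j; i < n; ++i) {
            float a = AP[col + (size_t)i];
            y[i] += alpha * a * x[j];
            /* Upper-triangle contribution comes from symmetry. */
            if (i != j) {
                y[j] += alpha * a * x[i];
            }
        }
    }
}
```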

cublas<t>gemvStridedBatched()

cublasStatus_t cublasSgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const float           *alpha,const float           *A, int lda,long long int         strideA,const float           *x, int incx,long long int         stridex,const float           *beta,float                 *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasDgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const double          *alpha,const double          *A, int lda,long long int         strideA,const double          *x, int incx,long long int         stridex,const double          *beta,double                *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasCgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const cuComplex       *alpha,const cuComplex       *A, int lda,long long int         strideA,const cuComplex       *x, int incx,long long int         stridex,const cuComplex       *beta,cuComplex             *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasZgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,long long int         strideA,const cuDoubleComplex *x, int incx,long long int         stridex,const cuDoubleComplex *beta,cuDoubleComplex       *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasHSHgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const float           *alpha,const __half          *A, int lda,long long int         strideA,const __half          *x, int incx,long long int         stridex,const float           *beta,__half                *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasHSSgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const float           *alpha,const __half          *A, int lda,long long int         strideA,const __half          *x, int incx,long long int         stridex,const float           *beta,float                 *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasTSTgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const float           *alpha,const __nv_bfloat16   *A, int lda,long long int         strideA,const __nv_bfloat16   *x, int incx,long long int         stridex,const float           *beta,__nv_bfloat16         *y, int incy,long long int         stridey,int batchCount)
cublasStatus_t cublasTSSgemvStridedBatched(cublasHandle_t handle,cublasOperation_t trans,int m, int n,const float           *alpha,const __nv_bfloat16   *A, int lda,long long int         strideA,const __nv_bfloat16   *x, int incx,long long int         stridex,const float           *beta,float                 *y, int incy,long long int         stridey,int batchCount)

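This function supports the 64-bit Integer Interface.
This function performs the matrix-vector multiplication of a batch of matrices and vectors, `y[i] = alpha * op(A[i]) * x[i] + beta * y[i]` for `i` in `[0, batchCount - 1]`. The batch is considered to be "uniform", i.e. all instances have the same dimensions (m, n), leading dimension (lda), increments (incx, incy) and transposition (trans) for their respective A matrix, x and y vectors. The input matrix A and vector x, and the output vector y, of each instance are located at a fixed offset in number of elements from their location in the previous instance. The user passes pointers to the A matrix and the x and y vectors of the first instance, together with the offsets in number of elements (strideA, stridex and stridey) that determine where the input matrices and vectors and the output vectors of the subsequent instances are located.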

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
trans		input	operation op(`A[i]`) that is non- or (conj.) transpose.
m		input	number of rows of matrix `A[i]`.
n		input	number of columns of matrix `A[i]`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type>* pointer to the A matrix corresponding to the first instance of the batch, with dimensions `lda x n` with `lda>=max(1,m)`.
lda		input	leading dimension of two-dimensional array used to store each matrix `A[i]`.
strideA		input	Value of type long long int that gives the offset in number of elements between `A[i]` and `A[i+1]`
x	device	input	<type>* pointer to the x vector corresponding to the first instance of the batch, with each dimension `n` if `trans==CUBLAS_OP_N` and `m` otherwise.
incx		input	stride of each one-dimensional array x[i].
stridex		input	Value of type long long int that gives the offset in number of elements between `x[i]` and `x[i+1]`
beta	host or device	input	<type> scalar used for multiplication. If `beta == 0`, `y` does not have to be a valid input.
y	device	in/out	<type>* pointer to the y vector corresponding to the first instance of the batch, with each dimension `m` if `trans==CUBLAS_OP_N` and `n` otherwise. Vectors `y[i]` should not overlap; otherwise, undefined behavior is expected.
incy		input	stride of each one-dimensional array y[i].
stridey		input	Value of type long long int that gives the offset in number of elements between `y[i]` and `y[i+1]`
batchCount		input	number of GEMVs to perform in the batch.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	the parameters `m,n,batchCount<0`
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
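
The stride arithmetic described above amounts to `A[i] = A + i * strideA`, `x[i] = x + i * stridex` and `y[i] = y + i * stridey`. The following host-side sketch spells this out for the non-transposed case with unit increments (`incx = incy = 1`). The name `ref_sgemv_strided_batched_n` is hypothetical; this is a sequential CPU reference, not how the library schedules the batch.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for y[b] = alpha * A[b] * x[b] + beta * y[b], b = 0..batchCount-1,
 * where instance b starts b * stride elements after instance 0.
 * Column-major A, no transpose, unit increments. Hypothetical helper. */
void ref_sgemv_strided_batched_n(int m, int n, float alpha,
                                 const float *A, int lda, long long strideA,
                                 const float *x, long long stridex,
                                 float beta,
                                 float *y, long long stridey,
                                 int batchCount)
{
    assert(m >= 0 && n >= 0 && batchCount >= 0);
    for (int b = 0; b < batchCount; ++b) {
        const float *Ab = A + (long long)b * strideA;
        const float *xb = x + (long long)b * stridex;
        float *yb = y + (long long)b * stridey;
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int j = 0; j < n; ++j) {
                acc += Ab[(size_t)i + (size_t)j * lda] * xb[j];
            }
            /* When beta == 0 the old contents of y are not read. */
            yb[i] = alpha * acc + ((beta == 0.0f) ? 0.0f : beta * yb[i]);
        }
    }
}
```

With densely packed instances one would typically pass `strideA = lda * n`, `stridex = n` and `stridey = m`; larger strides leave gaps between instances, which is allowed as long as the `y[i]` vectors do not overlap.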

cublas<t>gemvBatched()

cublasStatus_t cublasSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const float           *alpha,const float           *Aarray[], int lda,const float           *xarray[], int incx,const float           *beta,float           *yarray[], int incy,int batchCount)
cublasStatus_t cublasDgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const double          *alpha,const double          *Aarray[], int lda,const double          *xarray[], int incx,const double          *beta,double          *yarray[], int incy,int batchCount)
cublasStatus_t cublasCgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const cuComplex       *alpha,const cuComplex       *Aarray[], int lda,const cuComplex       *xarray[], int incx,const cuComplex       *beta,cuComplex       *yarray[], int incy,int batchCount)
cublasStatus_t cublasZgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const cuDoubleComplex *alpha,const cuDoubleComplex *Aarray[], int lda,const cuDoubleComplex *xarray[], int incx,const cuDoubleComplex *beta,cuDoubleComplex *yarray[], int incy,int batchCount)
cublasStatus_t cublasHSHgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const float           *alpha,const __half          *Aarray[], int lda,const __half          *xarray[], int incx,const float           *beta,__half                *yarray[], int incy,int batchCount)
cublasStatus_t cublasHSSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const float           *alpha,const __half          *Aarray[], int lda,const __half          *xarray[], int incx,const float           *beta,float                 *yarray[], int incy,int batchCount)
cublasStatus_t cublasTSTgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const float           *alpha,const __nv_bfloat16   *Aarray[], int lda,const __nv_bfloat16   *xarray[], int incx,const float           *beta,__nv_bfloat16         *yarray[], int incy,int batchCount)
cublasStatus_t cublasTSSgemvBatched(cublasHandle_t handle, cublasOperation_t trans,int m, int n,const float           *alpha,const __nv_bfloat16   *Aarray[], int lda,const __nv_bfloat16   *xarray[], int incx,const float           *beta,float                 *yarray[], int incy,int batchCount)

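This function supports the 64-bit Integer Interface.
This function performs the matrix-vector multiplication of a batch of matrices and vectors, `y[i] = alpha * op(A[i]) * x[i] + beta * y[i]` for `i` in `[0, batchCount - 1]`. The batch is considered to be "uniform", i.e. all instances have the same dimensions (m, n), leading dimension (lda), increments (incx, incy) and transposition (trans) for their respective A matrix, x and y vectors. The addresses of the input matrix and vector, and of the output vector, of each instance are read from arrays of pointers passed to the function by the caller.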

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
trans		input	operation op(`A[i]`) that is non- or (conj.) transpose.
m		input	number of rows of matrix `A[i]`.
n		input	number of columns of matrix `A[i]`.
alpha	host or device	input	<type> scalar used for multiplication.
Aarray	device	input	array of pointers to <type> array, with each array of dim. `lda x n` with `lda>=max(1,m)`. All pointers must meet certain alignment criteria. Please see below for details.
lda		input	leading dimension of two-dimensional array used to store each matrix `A[i]`.
xarray	device	input	array of pointers to <type> array, with each dimension `n` if `trans==CUBLAS_OP_N` and `m` otherwise. All pointers must meet certain alignment criteria. Please see below for details.
incx		input	stride of each one-dimensional array x[i].
beta	host or device	input	<type> scalar used for multiplication. If `beta == 0`, `y` does not have to be a valid input.
yarray	device	in/out	array of pointers to <type> array. It has dimensions `m` if `trans==CUBLAS_OP_N` and `n` otherwise. Vectors `y[i]` should not overlap; otherwise, undefined behavior is expected. All pointers must meet certain alignment criteria. Please see below for details.
incy		input	stride of each one-dimensional array y[i].
batchCount		input	number of pointers contained in Aarray, xarray and yarray.

If math mode enables fast math modes when using cublasSgemvBatched(), pointers (not the pointer arrays) placed in the GPU memory must be properly aligned to avoid misaligned memory access errors. Ideally all pointers are aligned to at least 16 Bytes. Otherwise it is recommended that they meet the following rule:

if k % 4==0 then ensure intptr_t(ptr) % 16 == 0.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	the parameters `m,n,batchCount<0`
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
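
In contrast to the strided variant, the instances here may live anywhere in memory because each one is located through a pointer array. The sketch below is a host-side reference for the non-transposed case with unit increments; in a real cuBLAS call the pointer arrays themselves must reside in device memory, which this CPU example does not model. `ref_sgemv_batched_n` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for y[b] = alpha * A[b] * x[b] + beta * y[b], where the
 * address of each instance is read from Aarray[b], xarray[b], yarray[b].
 * Column-major A, no transpose, unit increments. Hypothetical helper. */
void ref_sgemv_batched_n(int m, int n, float alpha,
                         const float *const Aarray[], int lda,
                         const float *const xarray[],
                         float beta,
                         float *const yarray[],
                         int batchCount)
{
    assert(m >= 0 && n >= 0 && batchCount >= 0);
    for (int b = 0; b < batchCount; ++b) {
        const float *Ab = Aarray[b];
        const float *xb = xarray[b];
        float *yb = yarray[b];
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int j = 0; j < n; ++j) {
                acc += Ab[(size_t)i + (size_t)j * lda] * xb[j];
            }
            yb[i] = alpha * acc + ((beta == 0.0f) ? 0.0f : beta * yb[i]);
        }
    }
}
```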

cublas<t>hpr()

cublasStatus_t cublasChpr(cublasHandle_t handle, cublasFillMode_t uplo,int n, const float *alpha,const cuComplex       *x, int incx,cuComplex       *AP)
cublasStatus_t cublasZhpr(cublasHandle_t handle, cublasFillMode_t uplo,int n, const double *alpha,const cuDoubleComplex *x, int incx,cuDoubleComplex *AP)

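This function supports the 64-bit Integer Interface.
This function performs the packed Hermitian rank-1 update `A = alpha * x * x^H + A`, where `A` is an `n x n` Hermitian matrix stored in packed format and `alpha` is a real scalar.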

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
AP	device	in/out	<type> array with `A` stored in packed format. The imaginary parts of the diagonal elements are assumed and set to zero.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` == 0 or if `uplo` != `CUBLAS_FILL_MODE_UPPER`, `CUBLAS_FILL_MODE_LOWER` or `alpha` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>her2()

cublasStatus_t cublasCher2(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuComplex       *alpha,const cuComplex       *x, int incx,const cuComplex       *y, int incy,cuComplex       *A, int lda)
cublasStatus_t cublasZher2(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuDoubleComplex *alpha,const cuDoubleComplex *x, int incx,const cuDoubleComplex *y, int incy,cuDoubleComplex *A, int lda)

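This function supports the 64-bit Integer Interface.
This function performs the Hermitian rank-2 update `A = alpha * x * y^H + conj(alpha) * y * x^H + A`, where `A` is an `n x n` Hermitian matrix stored in column-major format.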

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
y	device	input	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.
A	device	in/out	<type> array of dimension `lda x n` with `lda>=max(1,n)`. The imaginary parts of the diagonal elements are assumed and set to zero.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` == 0 or `incy` == 0 or if `uplo` != `CUBLAS_FILL_MODE_UPPER`, `CUBLAS_FILL_MODE_LOWER` or if `lda` < max(1, `n`) or `alpha` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>hpmv()

cublasStatus_t cublasChpmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuComplex       *alpha,const cuComplex       *AP,const cuComplex       *x, int incx,const cuComplex       *beta,cuComplex       *y, int incy)
cublasStatus_t cublasZhpmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuDoubleComplex *alpha,const cuDoubleComplex *AP,const cuDoubleComplex *x, int incx,const cuDoubleComplex *beta,cuDoubleComplex *y, int incy)

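This function supports the 64-bit Integer Interface.
This function performs the Hermitian packed matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` Hermitian matrix stored in packed format.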

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
AP	device	input	<type> array with `A` stored in packed format. The imaginary parts of the diagonal elements are assumed to be zero.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` == 0 or `incy` == 0 or if `uplo` != `CUBLAS_FILL_MODE_UPPER`, `CUBLAS_FILL_MODE_LOWER` or `alpha` == NULL or `beta` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>hbmv()

cublasStatus_t cublasChbmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, int k, const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *x, int incx,const cuComplex       *beta,cuComplex       *y, int incy)
cublasStatus_t cublasZhbmv(cublasHandle_t handle, cublasFillMode_t uplo,int n, int k, const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *x, int incx,const cuDoubleComplex *beta,cuDoubleComplex *y, int incy)

This function supports the 64-bit Integer Interface.
This function performs the Hermitian banded matrix-vector multiplication

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
k		input	number of sub- and super-diagonals of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimensions `lda x n`, with `lda>=k+1`. The imaginary parts of the diagonal elements are assumed to be zero.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or `k` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `lda` < (`k` + 1) or `alpha` == NULL or `beta` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>hemv()

cublasStatus_t cublasChemv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuComplex       *alpha,const cuComplex       *A, int lda,const cuComplex       *x, int incx,const cuComplex       *beta,cuComplex       *y, int incy)
cublasStatus_t cublasZhemv(cublasHandle_t handle, cublasFillMode_t uplo,int n, const cuDoubleComplex *alpha,const cuDoubleComplex *A, int lda,const cuDoubleComplex *x, int incx,const cuDoubleComplex *beta,cuDoubleComplex *y, int incy)

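This function supports the 64-bit Integer Interface.
This function performs the Hermitian matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` Hermitian matrix stored in lower or upper mode.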

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other Hermitian part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n`, with `lda>=max(1,n)`. The imaginary parts of the diagonal elements are assumed to be zero.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `lda` < `n`
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
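
Two points from the table are easy to overlook: only one triangle of `A` is read, and the imaginary parts of the diagonal are treated as zero. The host-side sketch below illustrates both for the lower-storage case using C99 `float complex` in place of `cuComplex`. Unit strides are assumed and `ref_chemv_lower` is a hypothetical name, not a cuBLAS function.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* CPU reference for y = alpha * A * x + beta * y with A Hermitian and only
 * its lower triangle referenced (column-major, A(i, j) at A[i + j * lda]).
 * The upper triangle is inferred as A(j, i) = conj(A(i, j)).
 * Hypothetical helper, unit strides. */
void ref_chemv_lower(int n, float complex alpha,
                     const float complex *A, int lda,
                     const float complex *x,
                     float complex beta, float complex *y)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));

    for (int i = 0; i < n; ++i) {
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    }

    for (int j = 0; j < n; ++j) {
        /* Diagonal: the imaginary part is assumed to be zero and ignored. */
        y[j] += alpha * crealf(A[(size_t)j + (size_t)j * lda]) * x[j];
        for (int i = j + 1; i < n; ++i) {
            float complex a = A[(size_t)i + (size_t)j * lda];
            y[i] += alpha * a * x[j];
            y[j] += alpha * conjf(a) * x[i];
        }
    }
}
```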

cublas<t>trsv()

cublasStatus_t cublasStrsv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const float           *A, int lda,float           *x, int incx)
cublasStatus_t cublasDtrsv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const double          *A, int lda,double          *x, int incx)
cublasStatus_t cublasCtrsv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const cuComplex       *A, int lda,cuComplex       *x, int incx)
cublasStatus_t cublasZtrsv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const cuDoubleComplex *A, int lda,cuDoubleComplex *x, int incx)

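This function supports the 64-bit Integer Interface.
This function solves the triangular linear system with a single right-hand side, `op(A) * x = b`, where `A` is a triangular matrix stored in lower or upper mode. The solution `x` overwrites the right-hand side `b` on exit.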

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
A	device	input	<type> array of dimension `lda x n`, with `lda>=max(1,n)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_C`, `CUBLAS_OP_T` or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `diag` != `CUBLAS_DIAG_UNIT`, `CUBLAS_DIAG_NON_UNIT` or `lda` < max(1, `n`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
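
For a lower-triangular `A` without transposition, the solve is a forward substitution that overwrites the right-hand side in place. The sketch below is a host-side reference with unit stride; the `unit_diag` flag plays the role of `diag == CUBLAS_DIAG_UNIT`, in which case the diagonal entries are never read. `ref_strsv_lower_n` is a hypothetical name, and like the library routine it performs no singularity check.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for solving A * x = b in place (x holds b on entry),
 * A lower triangular, column-major, no transpose.
 * unit_diag != 0 means the diagonal is taken as 1 and not accessed.
 * Hypothetical helper, unit stride. */
void ref_strsv_lower_n(int unit_diag, int n, const float *A, int lda, float *x)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j) {
        if (!unit_diag) {
            x[j] /= A[(size_t)j + (size_t)j * lda];
        }
        /* Eliminate the now-known x[j] from the remaining equations. */
        for (int i = j + 1; i < n; ++i) {
            x[i] -= A[(size_t)i + (size_t)j * lda] * x[j];
        }
    }
}
```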

cublas<t>trmv()

cublasStatus_t cublasStrmv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const float           *A, int lda,float           *x, int incx)
cublasStatus_t cublasDtrmv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const double          *A, int lda,double          *x, int incx)
cublasStatus_t cublasCtrmv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const cuComplex       *A, int lda,cuComplex       *x, int incx)
cublasStatus_t cublasZtrmv(cublasHandle_t handle, cublasFillMode_t uplo,cublasOperation_t trans, cublasDiagType_t diag,int n, const cuDoubleComplex *A, int lda,cuDoubleComplex *x, int incx)

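This function supports the 64-bit Integer Interface.
This function performs the triangular matrix-vector multiplication `x = op(A) * x`, where `A` is a triangular matrix stored in lower or upper mode.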

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
A	device	input	<type> array of dimensions `lda x n` , with `lda>=max(1,n)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_C`, `CUBLAS_OP_T` or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `diag` != `CUBLAS_DIAG_UNIT`, `CUBLAS_DIAG_NON_UNIT` or `lda` < max(1, `n`)
`CUBLAS_STATUS_ALLOC_FAILED`	the allocation of internal scratch memory failed
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
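
Because `x` is both input and output, a triangular multiply can overwrite the vector as it goes, provided the rows are visited in the right order. The sketch below is a plain C reference, not cuBLAS code. `ref_trmv_upper` is a hypothetical name, and it assumes an upper-triangular, column-major matrix and `incx = 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Computes x = A * x (transpose == 0) or x = A^T * x (transpose != 0)
 * in place for an upper-triangular, column-major A. */
void ref_trmv_upper(int transpose, int unit_diag, int n,
                    const double *A, int lda, double *x)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    if (!transpose) {
        /* x[i] needs x[j] for j >= i only, so ascending i is safe. */
        for (int i = 0; i < n; ++i) {
            double acc = unit_diag ? x[i] : A[i + (size_t)i * lda] * x[i];
            for (int j = i + 1; j < n; ++j)
                acc += A[i + (size_t)j * lda] * x[j];
            x[i] = acc;
        }
    } else {
        /* A^T is lower triangular: x[i] needs x[j] for j <= i, so descend. */
        for (int i = n - 1; i >= 0; --i) {
            double acc = unit_diag ? x[i] : A[i + (size_t)i * lda] * x[i];
            for (int j = 0; j < i; ++j)
                acc += A[j + (size_t)i * lda] * x[j];
            x[i] = acc;
        }
    }
}
```

With rows (1, 2) and (0, 3) and `x = (1, 1)`, the non-transposed call yields (3, 3) and the transposed call yields (1, 5).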

cublas<t>tpsv()

cublasStatus_t cublasStpsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const float *AP, float *x, int incx)
cublasStatus_t cublasDtpsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const double *AP, double *x, int incx)
cublasStatus_t cublasCtpsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const cuComplex *AP, cuComplex *x, int incx)
cublasStatus_t cublasZtpsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const cuDoubleComplex *AP, cuDoubleComplex *x, int incx)

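This function supports the 64-bit Integer Interface.
This function solves the packed triangular linear system with a single right-hand side, `op(A) x = b`, where `A` is a triangular matrix stored in packed format and `x` and `b` are vectors. The solution `x` overwrites the right-hand side `b` on exit.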

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
AP	device	input	<type> array with `A` stored in packed format.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_C`, `CUBLAS_OP_T` or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `diag` != `CUBLAS_DIAG_UNIT`, `CUBLAS_DIAG_NON_UNIT`
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
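
The `AP` argument holds only the stored triangle, packed column by column without gaps. The following plain C sketch (not cuBLAS code) shows one way to index such an array with 0-based indices and uses it for a forward substitution. Both `packed_lower_index` and `ref_tpsv_lower` are hypothetical names, and the sketch assumes lower storage, no transpose, a non-unit diagonal, and `incx = 1`.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based position of A(i,j), i >= j, in a lower-packed array.
 * Column j starts after j full columns of lengths n, n-1, ..., n-j+1. */
size_t packed_lower_index(int n, int i, int j)
{
    return (size_t)i + ((size_t)j * (size_t)(2 * n - j - 1)) / 2;
}

/* Hypothetical CPU reference (not a cuBLAS function).
 * Solves A * x = b in place; on entry x holds b. */
void ref_tpsv_lower(int n, const double *AP, double *x)
{
    assert(n >= 0);
    for (int j = 0; j < n; ++j) {
        x[j] /= AP[packed_lower_index(n, j, j)];
        for (int i = j + 1; i < n; ++i)
            x[i] -= AP[packed_lower_index(n, i, j)] * x[j];
    }
}
```

For `n = 3` the packed array has 6 entries in the order A(0,0), A(1,0), A(2,0), A(1,1), A(2,1), A(2,2).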

cublas<t>tpmv()

cublasStatus_t cublasStpmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const float *AP, float *x, int incx)
cublasStatus_t cublasDtpmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const double *AP, double *x, int incx)
cublasStatus_t cublasCtpmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const cuComplex *AP, cuComplex *x, int incx)
cublasStatus_t cublasZtpmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const cuDoubleComplex *AP, cuDoubleComplex *x, int incx)

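This function supports the 64-bit Integer Interface.
This function performs the triangular packed matrix-vector multiplication `x = op(A) x`, where `A` is a triangular matrix stored in packed format and `x` is a vector.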

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
AP	device	input	<type> array with `A` stored in packed format.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n < 0` or if `incx == 0` or if `uplo != CUBLAS_FILL_MODE_UPPER, CUBLAS_FILL_MODE_LOWER` or if `trans != CUBLAS_OP_N, CUBLAS_OP_T, CUBLAS_OP_C` or `diag != CUBLAS_DIAG_UNIT, CUBLAS_DIAG_NON_UNIT`
`CUBLAS_STATUS_ALLOC_FAILED`	the allocation of internal scratch memory failed
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
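
For upper-packed storage the index arithmetic is simpler, because column `j` holds exactly `j + 1` entries. The sketch below is a plain C reference, not cuBLAS code. `packed_upper_index` and `ref_tpmv_upper` are hypothetical names, and it assumes upper storage, no transpose, a non-unit diagonal, and `incx = 1`.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based position of A(i,j), i <= j, in an upper-packed array. */
size_t packed_upper_index(int i, int j)
{
    return (size_t)i + ((size_t)j * ((size_t)j + 1)) / 2;
}

/* Hypothetical CPU reference (not a cuBLAS function).
 * Computes x = A * x in place for an upper-triangular packed A. */
void ref_tpmv_upper(int n, const double *AP, double *x)
{
    assert(n >= 0);
    /* Row i only reads x[j] with j >= i, so ascending order is safe. */
    for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int j = i; j < n; ++j)
            acc += AP[packed_upper_index(i, j)] * x[j];
        x[i] = acc;
    }
}
```

With rows (1, 2, 3), (0, 4, 5), (0, 0, 6) packed as `{1, 2, 4, 3, 5, 6}` and `x = (1, 1, 1)`, the result is (6, 9, 6).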

cublas<t>tbsv()

cublasStatus_t cublasStbsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const float *A, int lda, float *x, int incx)
cublasStatus_t cublasDtbsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const double *A, int lda, double *x, int incx)
cublasStatus_t cublasCtbsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const cuComplex *A, int lda, cuComplex *x, int incx)
cublasStatus_t cublasZtbsv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const cuDoubleComplex *A, int lda, cuDoubleComplex *x, int incx)

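This function supports the 64-bit Integer Interface.
This function solves the triangular banded linear system with a single right-hand side, `op(A) x = b`, where `A` is a triangular banded matrix and `x` and `b` are vectors. The solution `x` overwrites the right-hand side `b` on exit.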

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
k		input	number of sub- and super-diagonals of matrix `A`.
A	device	input	<type> array of dimension `lda x n`, with `lda >= k+1`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or `k` < 0 or if `incx` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_C`, `CUBLAS_OP_T` or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `diag` != `CUBLAS_DIAG_UNIT`, `CUBLAS_DIAG_NON_UNIT` or `lda` < (1 + `k`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
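
The requirement `lda >= k+1` follows from the banded layout: each column of the array stores only the diagonal entry and the `k` entries below it. The following plain C sketch (not cuBLAS code) illustrates that layout for the lower case with a forward substitution. `ref_tbsv_lower` is a hypothetical name, and it assumes no transpose, a non-unit diagonal, and `incx = 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Lower banded storage: A(i,j) lives at A[(i - j) + j * lda]
 * for j <= i <= min(n - 1, j + k). Solves A * x = b in place. */
void ref_tbsv_lower(int n, int k, const double *A, int lda, double *x)
{
    assert(n >= 0 && k >= 0 && lda >= k + 1);
    for (int j = 0; j < n; ++j) {
        const double *col = A + (size_t)j * lda; /* col[d] is A(j + d, j) */
        x[j] /= col[0];
        int last = (j + k < n - 1) ? j + k : n - 1;
        for (int i = j + 1; i <= last; ++i)
            x[i] -= col[i - j] * x[j];
    }
}
```

For a 3 x 3 matrix with 2 on the diagonal and 1 on the first subdiagonal (`k = 1`, `lda = 2`), the array is `{2, 1, 2, 1, 2, 0}`, where the last slot is padding that is never read. The right-hand side (2, 3, 3) gives the solution (1, 1, 1).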

cublas<t>tbmv()

cublasStatus_t cublasStbmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const float *A, int lda, float *x, int incx)
cublasStatus_t cublasDtbmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const double *A, int lda, double *x, int incx)
cublasStatus_t cublasCtbmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const cuComplex *A, int lda, cuComplex *x, int incx)
cublasStatus_t cublasZtbmv(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, int k, const cuDoubleComplex *A, int lda, cuDoubleComplex *x, int incx)

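This function supports the 64-bit Integer Interface.
This function performs the triangular banded matrix-vector multiplication `x = op(A) x`, where `A` is a triangular banded matrix and `x` is a vector.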

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other part is not referenced and is inferred from the stored elements.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
diag		input	indicates if the elements on the main diagonal of matrix `A` are unity and should not be accessed.
n		input	number of rows and columns of matrix `A`.
k		input	number of sub- and super-diagonals of matrix `A`.
A	device	input	<type> array of dimension `lda x n`, with `lda>=k+1`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	in/out	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or `k` < 0 or if `incx` = 0 or if `trans` != `CUBLAS_OP_N`, `CUBLAS_OP_C`, `CUBLAS_OP_T` or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `diag` != `CUBLAS_DIAG_UNIT`, `CUBLAS_DIAG_NON_UNIT` or `lda` < (1 + `k`)
`CUBLAS_STATUS_ALLOC_FAILED`	the allocation of internal scratch memory failed
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
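
In the upper banded layout the main diagonal sits in the last of the `k + 1` stored rows and the superdiagonals sit above it. The sketch below is a plain C reference, not cuBLAS code. `ref_tbmv_upper` is a hypothetical name, and it assumes no transpose, a non-unit diagonal, and `incx = 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Upper banded storage: A(i,j) lives at A[(k + i - j) + j * lda]
 * for max(0, j - k) <= i <= j. Computes x = A * x in place. */
void ref_tbmv_upper(int n, int k, const double *A, int lda, double *x)
{
    assert(n >= 0 && k >= 0 && lda >= k + 1);
    for (int i = 0; i < n; ++i) {
        int last = (i + k < n - 1) ? i + k : n - 1;
        double acc = 0.0;
        /* Row i touches the diagonal and at most k superdiagonals. */
        for (int j = i; j <= last; ++j)
            acc += A[(size_t)(k + i - j) + (size_t)j * lda] * x[j];
        x[i] = acc;
    }
}
```

With rows (1, 2, 0), (0, 3, 4), (0, 0, 5) stored as `{0, 1, 2, 3, 4, 5}` (`k = 1`, `lda = 2`, first slot unused) and `x = (1, 1, 1)`, the result is (3, 7, 5).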

cublas<t>syr2()

cublasStatus_t cublasSsyr2(cublasHandle_t handle, cublasFillMode_t uplo, int n, const float *alpha, const float *x, int incx, const float *y, int incy, float *A, int lda)
cublasStatus_t cublasDsyr2(cublasHandle_t handle, cublasFillMode_t uplo, int n, const double *alpha, const double *x, int incx, const double *y, int incy, double *A, int lda)
cublasStatus_t cublasCsyr2(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuComplex *alpha, const cuComplex *x, int incx, const cuComplex *y, int incy, cuComplex *A, int lda)
cublasStatus_t cublasZsyr2(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *x, int incx, const cuDoubleComplex *y, int incy, cuDoubleComplex *A, int lda)

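This function supports the 64-bit Integer Interface.
This function performs the symmetric rank-2 update `A = alpha * (x * y^T + y * x^T) + A`, where `A` is an `n x n` symmetric matrix stored in column-major format, `x` and `y` are vectors, and `alpha` is a scalar.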

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
y	device	input	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.
A	device	in/out	<type> array of dimensions `lda x n`, with `lda>=max(1,n)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `alpha` == NULL or `lda` < max(1, `n`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
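
Two details of the parameter table are easy to miss: only the triangle selected by `uplo` is written, and `incx` / `incy` are element strides rather than byte offsets. The following plain C sketch (not cuBLAS code) shows both for the lower case. `ref_syr2_lower` is a hypothetical name, and negative strides are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Rank-2 update A += alpha * (x * y^T + y * x^T) on the lower triangle
 * of a column-major matrix. Assumes incx > 0 and incy > 0. */
void ref_syr2_lower(int n, double alpha, const double *x, int incx,
                    const double *y, int incy, double *A, int lda)
{
    assert(n >= 0 && incx > 0 && incy > 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j) {
        double xj = x[(size_t)j * incx];
        double yj = y[(size_t)j * incy];
        /* Rows i >= j only: the strictly upper part is left untouched. */
        for (int i = j; i < n; ++i)
            A[i + (size_t)j * lda] +=
                alpha * (x[(size_t)i * incx] * yj + y[(size_t)i * incy] * xj);
    }
}
```

With `x` read from `{1, -1, 2}` at stride 2 (so the logical vector is (1, 2)), `y = (3, 4)`, and `alpha = 1`, a zero matrix whose upper entry holds a sentinel 7 becomes `{6, 10, 7, 16}` in column-major order.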

cublas<t>syr()

cublasStatus_t cublasSsyr(cublasHandle_t handle, cublasFillMode_t uplo, int n, const float *alpha, const float *x, int incx, float *A, int lda)
cublasStatus_t cublasDsyr(cublasHandle_t handle, cublasFillMode_t uplo, int n, const double *alpha, const double *x, int incx, double *A, int lda)
cublasStatus_t cublasCsyr(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuComplex *alpha, const cuComplex *x, int incx, cuComplex *A, int lda)
cublasStatus_t cublasZsyr(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *x, int incx, cuDoubleComplex *A, int lda)

This function supports the 64-bit Integer Interface.
This function performs the symmetric rank-1 update `A = alpha * x * x^T + A`, where `A` is an `n x n` symmetric matrix stored in column-major format, `x` is a vector, and `alpha` is a scalar.

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
A	device	in/out	<type> array of dimensions `lda x n`, with `lda>=max(1,n)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `lda` < max(1, `n`) or `alpha` == NULL
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>symv()

cublasStatus_t cublasSsymv(cublasHandle_t handle, cublasFillMode_t uplo, int n, const float *alpha, const float *A, int lda, const float *x, int incx, const float *beta, float *y, int incy)
cublasStatus_t cublasDsymv(cublasHandle_t handle, cublasFillMode_t uplo, int n, const double *alpha, const double *A, int lda, const double *x, int incx, const double *beta, double *y, int incy)
cublasStatus_t cublasCsymv(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuComplex *alpha, const cuComplex *A, int lda, const cuComplex *x, int incx, const cuComplex *beta, cuComplex *y, int incy)
cublasStatus_t cublasZsymv(cublasHandle_t handle, cublasFillMode_t uplo, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *A, int lda, const cuDoubleComplex *x, int incx, const cuDoubleComplex *beta, cuDoubleComplex *y, int incy)

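This function supports the 64-bit Integer Interface.
This function performs the symmetric matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` symmetric matrix stored in lower or upper mode, `x` and `y` are vectors, and `alpha` and `beta` are scalars.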

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n` with `lda>=max(1,n)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or `lda` < `n`
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
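
The phrase "inferred from the stored elements" means that an entry below the diagonal is read from its mirror above it (or the other way round). A minimal plain C sketch of this, which is not cuBLAS code, is given below. `ref_symv_upper` is a hypothetical name, and it assumes upper storage and unit strides. It also shows one way to honor the `beta == 0` rule from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Computes y = alpha * A * x + beta * y for a symmetric, column-major A
 * of which only the upper triangle is read. */
void ref_symv_upper(int n, double alpha, const double *A, int lda,
                    const double *x, double beta, double *y)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int j = 0; j < n; ++j) {
            /* A(i,j) with i > j is taken from its mirror A(j,i). */
            double a = (i <= j) ? A[i + (size_t)j * lda]
                                : A[j + (size_t)i * lda];
            acc += a * x[j];
        }
        /* With beta == 0 the old y is never read. */
        y[i] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[i];
    }
}
```

For the symmetric matrix with rows (1, 2) and (2, 3), `x = (1, 1)`, `alpha = 1`, and `beta = 0`, the result is (3, 5) regardless of what the strictly lower entry of the array contains.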

cublas<t>sbmv()

cublasStatus_t cublasSsbmv(cublasHandle_t handle, cublasFillMode_t uplo, int n, int k, const float *alpha, const float *A, int lda, const float *x, int incx, const float *beta, float *y, int incy)
cublasStatus_t cublasDsbmv(cublasHandle_t handle, cublasFillMode_t uplo, int n, int k, const double *alpha, const double *A, int lda, const double *x, int incx, const double *beta, double *y, int incy)

This function supports the 64-bit Integer Interface.

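This function performs the symmetric banded matrix-vector multiplication `y = alpha * A * x + beta * y`, where `A` is an `n x n` symmetric banded matrix with `k` subdiagonals and superdiagonals, `x` and `y` are vectors, and `alpha` and `beta` are scalars.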

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
uplo		input	indicates if matrix `A` lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.
n		input	number of rows and columns of matrix `A`.
k		input	number of sub- and super-diagonals of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n` with `lda >= k+1`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.
x	device	input	<type> vector with `n` elements.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `n` < 0 or `k` < 0 or if `incx` = 0 or `incy` = 0 or if `uplo` != `CUBLAS_FILL_MODE_LOWER`, `CUBLAS_FILL_MODE_UPPER` or if `alpha` == NULL or `beta` == NULL or `lda` < (1 + `k`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU

cublas<t>ger()

cublasStatus_t cublasSger(cublasHandle_t handle, int m, int n, const float *alpha, const float *x, int incx, const float *y, int incy, float *A, int lda)
cublasStatus_t cublasDger(cublasHandle_t handle, int m, int n, const double *alpha, const double *x, int incx, const double *y, int incy, double *A, int lda)
cublasStatus_t cublasCgeru(cublasHandle_t handle, int m, int n, const cuComplex *alpha, const cuComplex *x, int incx, const cuComplex *y, int incy, cuComplex *A, int lda)
cublasStatus_t cublasCgerc(cublasHandle_t handle, int m, int n, const cuComplex *alpha, const cuComplex *x, int incx, const cuComplex *y, int incy, cuComplex *A, int lda)
cublasStatus_t cublasZgeru(cublasHandle_t handle, int m, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *x, int incx, const cuDoubleComplex *y, int incy, cuDoubleComplex *A, int lda)
cublasStatus_t cublasZgerc(cublasHandle_t handle, int m, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *x, int incx, const cuDoubleComplex *y, int incy, cuDoubleComplex *A, int lda)

This function supports the 64-bit Integer Interface.

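This function performs the rank-1 update `A = alpha * x * y^T + A` (for `ger()` and `geru()`) or `A = alpha * x * y^H + A` (for `gerc()`), where `A` is an `m x n` matrix stored in column-major format, `x` and `y` are vectors, and `alpha` is a scalar.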

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
m		input	number of rows of matrix `A`.
n		input	number of columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
x	device	input	<type> vector with `m` elements.
incx		input	stride between consecutive elements of `x`.
y	device	input	<type> vector with `n` elements.
incy		input	stride between consecutive elements of `y`.
A	device	in/out	<type> array of dimension `lda x n` with `lda >= max(1,m)`.
lda		input	leading dimension of two-dimensional array used to store matrix `A`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `m` < 0 or `n` < 0 or if `incx` = 0 or `incy` = 0 or if `alpha` == NULL or `lda` < max(1, `m`)
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
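
The complex variants come in pairs because the update can use `y` as is (`geru`) or its complex conjugate (`gerc`). The difference is easiest to see on a tiny example. The sketch below is plain C99 with `<complex.h>`, not cuBLAS code. `ref_ger` is a hypothetical name, and unit strides are assumed.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * conj_y == 0: A += alpha * x * y^T  (the geru-style update)
 * conj_y != 0: A += alpha * x * y^H  (the gerc-style update)
 * A is m x n, column-major. */
void ref_ger(int conj_y, int m, int n, double complex alpha,
             const double complex *x, const double complex *y,
             double complex *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        double complex yj = conj_y ? conj(y[j]) : y[j];
        for (int i = 0; i < m; ++i)
            A[i + (size_t)j * lda] += alpha * x[i] * yj;
    }
}
```

With `x = (1, i)`, `y = (i)`, `alpha = 1`, and a zero 2 x 1 matrix, the unconjugated update produces the column (i, -1) while the conjugated update produces (-i, 1).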

cublas<t>gemv()

cublasStatus_t cublasSgemv(cublasHandle_t handle, cublasOperation_t trans, int m, int n, const float *alpha, const float *A, int lda, const float *x, int incx, const float *beta, float *y, int incy)
cublasStatus_t cublasDgemv(cublasHandle_t handle, cublasOperation_t trans, int m, int n, const double *alpha, const double *A, int lda, const double *x, int incx, const double *beta, double *y, int incy)
cublasStatus_t cublasCgemv(cublasHandle_t handle, cublasOperation_t trans, int m, int n, const cuComplex *alpha, const cuComplex *A, int lda, const cuComplex *x, int incx, const cuComplex *beta, cuComplex *y, int incy)
cublasStatus_t cublasZgemv(cublasHandle_t handle, cublasOperation_t trans, int m, int n, const cuDoubleComplex *alpha, const cuDoubleComplex *A, int lda, const cuDoubleComplex *x, int incx, const cuDoubleComplex *beta, cuDoubleComplex *y, int incy)

This function supports the 64-bit Integer Interface.
This function performs the matrix-vector multiplication `y = alpha * op(A) * x + beta * y`, where `A` is an `m x n` matrix stored in column-major format, `x` and `y` are vectors, and `alpha` and `beta` are scalars.

Param.	Memory	In/out	Meaning
handle		input	handle to the cuBLAS library context.
trans		input	operation op(`A`) that is non- or (conj.) transpose.
m		input	number of rows of matrix `A`.
n		input	number of columns of matrix `A`.
alpha	host or device	input	<type> scalar used for multiplication.
A	device	input	<type> array of dimension `lda x n` with `lda >= max(1,m)`. Before entry, the leading `m` by `n` part of the array `A` must contain the matrix of coefficients. Unchanged on exit.
lda		input	leading dimension of two-dimensional array used to store matrix `A`. `lda` must be at least `max(1,m)`.
x	device	input	<type> vector with at least `(1+(n-1)*abs(incx))` elements if `trans==CUBLAS_OP_N` and at least `(1+(m-1)*abs(incx))` elements otherwise.
incx		input	stride between consecutive elements of `x`.
beta	host or device	input	<type> scalar used for multiplication, if `beta==0` then `y` does not have to be a valid input.
y	device	in/out	<type> vector with at least `(1+(m-1)*abs(incy))` elements if `trans==CUBLAS_OP_N` and at least `(1+(n-1)*abs(incy))` elements otherwise.
incy		input	stride between consecutive elements of `y`.

The possible error values returned by this function and their meanings are listed below.

Error Value	Meaning
`CUBLAS_STATUS_SUCCESS`	the operation completed successfully
`CUBLAS_STATUS_NOT_INITIALIZED`	the library was not initialized
`CUBLAS_STATUS_INVALID_VALUE`	If `m` < 0 or `n` < 0 or if `incx` = 0 or `incy` = 0
`CUBLAS_STATUS_EXECUTION_FAILED`	the function failed to launch on the GPU
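
The length rules for `x` and `y` in the table follow from the shape of op(`A`): without a transpose it is `m x n`, with one it is `n x m`. The following plain C sketch (not cuBLAS code) makes this explicit. `ref_gemv` is a hypothetical name, only the real-valued no-transpose and transpose cases are covered, and negative strides are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not a cuBLAS function).
 * Computes y = alpha * op(A) * x + beta * y for a column-major m x n A.
 * Assumes incx > 0 and incy > 0. */
void ref_gemv(int transpose, int m, int n, double alpha,
              const double *A, int lda, const double *x, int incx,
              double beta, double *y, int incy)
{
    assert(m >= 0 && n >= 0 && incx > 0 && incy > 0);
    assert(lda >= (m > 1 ? m : 1));
    int len_x = transpose ? m : n; /* columns of op(A) */
    int len_y = transpose ? n : m; /* rows of op(A)    */
    for (int r = 0; r < len_y; ++r) {
        double acc = 0.0;
        for (int c = 0; c < len_x; ++c) {
            /* op(A)(r,c) is A(r,c) without transpose, A(c,r) with it. */
            double a = transpose ? A[c + (size_t)r * lda]
                                 : A[r + (size_t)c * lda];
            acc += a * x[(size_t)c * incx];
        }
        double *yr = &y[(size_t)r * incy];
        /* With beta == 0 the old y is never read. */
        *yr = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *yr;
    }
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the no-transpose case takes a 3-element `x` of ones and yields (6, 15), while the transpose case takes a 2-element `x` of ones and yields (5, 7, 9).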
