ROCm Libraries

rocBLAS

Please refer to the rocBLAS GitHub repository.

A BLAS implementation on top of AMD’s Radeon Open Compute ROCm runtime and toolchains. rocBLAS is implemented in the HIP programming language and optimized for AMD’s latest discrete GPUs.

Prerequisites

  • A ROCm-enabled platform; see the ROCm documentation for more information.

  • Base software stack, which includes HIP

Installing pre-built packages

Download pre-built packages either from ROCm’s package servers or manually from the releases tab of the GitHub repository, which may carry newer builds. Release notes are available for each release on the releases tab.

sudo apt update && sudo apt install rocblas

Quickstart rocBLAS build

Bash helper build script (Ubuntu only)

The root of this repository has a helper bash script install.sh to build and install rocBLAS on Ubuntu with a single command. It does not take a lot of options and hard-codes configuration that can be specified through invoking cmake directly, but it’s a great way to get started quickly and can serve as an example of how to build/install. A few commands in the script need sudo access, so it may prompt you for a password.

./install.sh -h -- shows help
./install.sh -id -- build library, build dependencies and install (-d flag only needs to be passed once on a system)

Manual build (all supported platforms)

If you use a distro other than Ubuntu, or would like more control over the build process, the rocblas build wiki has helpful information on how to configure cmake and manually build.

Functions supported

A list of the functions exported by rocBLAS can be found on the wiki.

rocBLAS interface examples

In general, the rocBLAS interface is compatible with the CPU-oriented Netlib BLAS and the cuBLAS-v2 API, with the explicit exception that traditional BLAS interfaces do not accept handles. cuBLAS’s cublasHandle_t is replaced with rocblas_handle everywhere. Thus, porting a CUDA application that calls the cuBLAS API to a HIP application that calls the rocBLAS API should be relatively straightforward. For example, the rocBLAS SGEMV interface is:

GEMV API

rocblas_status
rocblas_sgemv(rocblas_handle handle,
              rocblas_operation trans,
              rocblas_int m, rocblas_int n,
              const float* alpha,
              const float* A, rocblas_int lda,
              const float* x, rocblas_int incx,
              const float* beta,
              float* y, rocblas_int incy);

Batched and strided GEMM API

rocBLAS GEMM can process matrices in batches with regular strides. There are several permutations of these APIs; the following example is the most general form, taking every stride and batch parameter:

rocblas_status
rocblas_sgemm_strided_batched(
    rocblas_handle handle,
    rocblas_operation transa, rocblas_operation transb,
    rocblas_int m, rocblas_int n, rocblas_int k,
    const float* alpha,
    const float* A, rocblas_int ls_a, rocblas_int ld_a, rocblas_int bs_a,
    const float* B, rocblas_int ls_b, rocblas_int ld_b, rocblas_int bs_b,
    const float* beta,
          float* C, rocblas_int ls_c, rocblas_int ld_c, rocblas_int bs_c,
    rocblas_int batch_count );

rocBLAS assumes that matrices such as A and vectors such as x and y are already allocated in GPU memory and filled with data. Users are responsible for copying data between host and device memory. HIP provides memcpy-style APIs to facilitate data management.

Asynchronous API

With the exception of a few routines (like TRSM) whose internal memory allocation prevents asynchronicity, most of the library routines (like BLAS-1 SCAL, BLAS-2 GEMV, BLAS-3 GEMM) are configured to operate asynchronously with respect to the CPU, meaning these library functions return immediately.

For more information about the rocBLAS library and the corresponding API documentation, refer to the rocBLAS documentation.

API

This section provides details of the library API.

Types

Definitions
rocblas_int
typedef int32_t rocblas_int

To specify whether int32 or int64 is used.

rocblas_long
typedef int64_t rocblas_long
rocblas_float_complex
typedef float2 rocblas_float_complex
rocblas_double_complex
typedef double2 rocblas_double_complex
rocblas_half
typedef uint16_t rocblas_half
rocblas_half_complex
typedef float2 rocblas_half_complex
rocblas_handle
typedef struct _rocblas_handle *rocblas_handle
Enums

Enumeration constants have numbering that is consistent with CBLAS, ACML and most standard C BLAS libraries.

rocblas_operation
enum rocblas_operation

Used to specify whether the matrix is to be transposed or not.

Parameter constants. Numbering is consistent with CBLAS, ACML and most standard C BLAS libraries.

Values:

rocblas_operation_none = 111

Operate with the matrix.

rocblas_operation_transpose = 112

Operate with the transpose of the matrix.

rocblas_operation_conjugate_transpose = 113

Operate with the conjugate transpose of the matrix.

rocblas_fill
enum rocblas_fill

Used by the Hermitian, symmetric and triangular matrix routines to specify whether the upper or lower triangle is being referenced.

Values:

rocblas_fill_upper = 121

Upper triangle.

rocblas_fill_lower = 122

Lower triangle.

rocblas_fill_full = 123
rocblas_diagonal
enum rocblas_diagonal

It is used by the triangular matrix routines to specify whether the matrix is unit triangular.

Values:

rocblas_diagonal_non_unit = 131

Non-unit triangular.

rocblas_diagonal_unit = 132

Unit triangular.

rocblas_side
enum rocblas_side

Indicates the side matrix A is located relative to matrix B during multiplication.

Values:

rocblas_side_left = 141

Multiply general matrix by symmetric, Hermitian or triangular matrix on the left.

rocblas_side_right = 142

Multiply general matrix by symmetric, Hermitian or triangular matrix on the right.

rocblas_side_both = 143
rocblas_status
enum rocblas_status

rocblas status codes definition

Values:

rocblas_status_success = 0

success

rocblas_status_invalid_handle = 1

handle not initialized, invalid or null

rocblas_status_not_implemented = 2

function is not implemented

rocblas_status_invalid_pointer = 3

invalid pointer parameter

rocblas_status_invalid_size = 4

invalid size parameter

rocblas_status_memory_error = 5

failed internal memory allocation, copy or dealloc

rocblas_status_internal_error = 6

other internal library failure

rocblas_datatype
enum rocblas_datatype

Indicates the precision width of data stored in a blas type.

Values:

rocblas_datatype_f16_r = 150
rocblas_datatype_f32_r = 151
rocblas_datatype_f64_r = 152
rocblas_datatype_f16_c = 153
rocblas_datatype_f32_c = 154
rocblas_datatype_f64_c = 155
rocblas_datatype_i8_r = 160
rocblas_datatype_u8_r = 161
rocblas_datatype_i32_r = 162
rocblas_datatype_u32_r = 163
rocblas_datatype_i8_c = 164
rocblas_datatype_u8_c = 165
rocblas_datatype_i32_c = 166
rocblas_datatype_u32_c = 167
rocblas_pointer_mode
enum rocblas_pointer_mode

Indicates whether the pointer is a device pointer or a host pointer.

Values:

rocblas_pointer_mode_host = 0
rocblas_pointer_mode_device = 1
rocblas_layer_mode
enum rocblas_layer_mode

Indicates whether a layer is active, using a bitmask.

Values:

rocblas_layer_mode_none = 0b0000000000
rocblas_layer_mode_log_trace = 0b0000000001
rocblas_layer_mode_log_bench = 0b0000000010
rocblas_layer_mode_log_profile = 0b0000000100
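
Because the layer modes are bit flags, several of them can be combined with bitwise OR and tested with bitwise AND. The sketch below mirrors the enum values locally so that it compiles without the rocBLAS header; demo_layer_mode and layer_is_active are hypothetical names used only for illustration, and hex literals stand in for the binary ones because 0b literals are not standard C before C23.

```c
#include <assert.h>

/* Local mirror of the values listed above (assumption: a real program
   would include the rocBLAS header and use rocblas_layer_mode instead). */
typedef enum {
    demo_layer_mode_none        = 0x0,
    demo_layer_mode_log_trace   = 0x1,
    demo_layer_mode_log_bench   = 0x2,
    demo_layer_mode_log_profile = 0x4
} demo_layer_mode;

/* Returns nonzero when the single-bit flag is set in mask. */
int layer_is_active(unsigned mask, demo_layer_mode flag)
{
    /* Only the three single-bit flags are meaningful to test for. */
    assert(flag == demo_layer_mode_log_trace ||
           flag == demo_layer_mode_log_bench ||
           flag == demo_layer_mode_log_profile);
    return (mask & (unsigned)flag) != 0;
}
```

For example, a mask built as demo_layer_mode_log_trace | demo_layer_mode_log_profile reports trace and profile logging as active and bench logging as inactive.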
rocblas_gemm_algo
enum rocblas_gemm_algo

Indicates which algorithm the GEMM extension routines use.

Values:

rocblas_gemm_algo_standard = 0b0000000000

Functions

Level 1 BLAS
rocblas_<type>scal()
rocblas_status rocblas_dscal(rocblas_handle handle, rocblas_int n, const double *alpha, double *x, rocblas_int incx)
rocblas_status rocblas_sscal(rocblas_handle handle, rocblas_int n, const float *alpha, float *x, rocblas_int incx)

BLAS Level 1 API.

scal scales each element x[i] of the vector x by the scalar alpha, for i = 1, …, n

x := alpha * x

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [inout] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.
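
The role of incx is easiest to see in a host-side reference loop. The following is a minimal CPU sketch of the scal semantics, not the rocBLAS implementation; ref_sscal is a hypothetical helper that takes alpha by value and handles positive increments only.

```c
#include <assert.h>

/* CPU reference for x := alpha * x over n elements spaced incx apart.
   Hypothetical helper, not part of rocBLAS. */
void ref_sscal(int n, float alpha, float *x, int incx)
{
    assert(incx > 0);              /* this sketch handles positive increments only */
    for (int i = 0; i < n; ++i)    /* n <= 0 leaves x untouched */
        x[i * incx] *= alpha;
}
```

With n = 3 and incx = 2, only elements 0, 2 and 4 of the array are scaled; the elements in between are left unchanged.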

rocblas_<type>copy()
rocblas_status rocblas_dcopy(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_scopy(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *y, rocblas_int incy)

BLAS Level 1 API.

copy copies the vector x into the vector y, for i = 1 , … , n

y := x,

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [out] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

rocblas_<type>dot()
rocblas_status rocblas_ddot(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *result)
rocblas_status rocblas_sdot(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *result)

BLAS Level 1 API.

dot(u) performs the dot product of vectors x and y

result = x * y;

dotc performs the dot product of complex vectors x and y, conjugating x

result = conjugate (x) * y;

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

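  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [in] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.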

  • [inout] result: store the dot product. either on the host CPU or device GPU. return is 0.0 if n <= 0.
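
As a rough illustration of the real-valued dot semantics, including independent increments for x and y, here is a CPU reference sketch. ref_sdot is a hypothetical helper; unlike the rocBLAS routine it returns the result by value instead of writing through a result pointer.

```c
#include <assert.h>

/* CPU reference for result = sum over i of x[i*incx] * y[i*incy].
   Hypothetical helper, not part of rocBLAS. */
float ref_sdot(int n, const float *x, int incx, const float *y, int incy)
{
    assert(incx > 0 && incy > 0);  /* positive increments only in this sketch */
    float result = 0.0f;           /* stays 0.0 when n <= 0 */
    for (int i = 0; i < n; ++i)
        result += x[i * incx] * y[i * incy];
    return result;
}
```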

rocblas_<type>swap()
rocblas_status rocblas_dswap(rocblas_handle handle, rocblas_int n, double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_sswap(rocblas_handle handle, rocblas_int n, float *x, rocblas_int incx, float *y, rocblas_int incy)

BLAS Level 1 API.

swap interchanges vectors x[i] and y[i], for i = 1 , … , n

y := x; x := y

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [inout] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

rocblas_<type>axpy()
rocblas_status rocblas_daxpy(rocblas_handle handle, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_saxpy(rocblas_handle handle, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *y, rocblas_int incy)
rocblas_status rocblas_haxpy(rocblas_handle handle, rocblas_int n, const rocblas_half *alpha, const rocblas_half *x, rocblas_int incx, rocblas_half *y, rocblas_int incy)

BLAS Level 1 API.

axpy computes y := alpha * x + y

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.
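
Since axpy both reads and writes y, a short CPU reference makes the update explicit. This is a sketch of the semantics only; ref_saxpy is a hypothetical helper that takes alpha by value.

```c
#include <assert.h>

/* CPU reference for y := alpha * x + y over n strided elements.
   Hypothetical helper, not part of rocBLAS. */
void ref_saxpy(int n, float alpha, const float *x, int incx, float *y, int incy)
{
    assert(incx > 0 && incy > 0);  /* positive increments only in this sketch */
    for (int i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
```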

rocblas_<type>asum()
rocblas_status rocblas_dasum(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *result)
rocblas_status rocblas_sasum(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *result)

BLAS Level 1 API.

asum computes the sum of the magnitudes of elements of a real vector x, or the sum of magnitudes of the real and imaginary parts of elements if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the asum result, either on the host CPU or device GPU. The result is 0.0 if n <= 0 or incx <= 0.

rocblas_<type>nrm2()
rocblas_status rocblas_dnrm2(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *result)
rocblas_status rocblas_snrm2(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *result)

BLAS Level 1 API.

nrm2 computes the Euclidean norm of a real or complex vector

:= sqrt( x’*x ) for real vector
:= sqrt( x**H*x ) for complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the nrm2 result, either on the host CPU or device GPU. The result is 0.0 if n <= 0 or incx <= 0.

rocblas_i<type>amax()
rocblas_status rocblas_idamax(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, rocblas_int *result)
rocblas_status rocblas_isamax(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, rocblas_int *result)

BLAS Level 1 API.

amax finds the first index of the element of maximum magnitude of a real vector x, or of the maximum sum of the magnitudes of the real and imaginary parts if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the amax index, either on the host CPU or device GPU. The result is 0 if n <= 0 or incx <= 0.
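
The phrase "first index" matters when several elements share the maximum magnitude. The CPU sketch below assumes the Fortran BLAS convention of a 1-based result, which is consistent with 0 being reserved for invalid n or incx; that convention is an assumption here and ref_isamax is a hypothetical helper.

```c
#include <assert.h>

/* Returns the 1-based position of the first element with the largest |x[i]|,
   or 0 when n <= 0 or incx <= 0. Hypothetical helper, not part of rocBLAS. */
int ref_isamax(int n, const float *x, int incx)
{
    if (n <= 0 || incx <= 0)
        return 0;
    assert(x != 0);
    int best = 0;
    float best_mag = x[0] < 0.0f ? -x[0] : x[0];
    for (int i = 1; i < n; ++i) {
        float v = x[i * incx];
        float mag = v < 0.0f ? -v : v;
        if (mag > best_mag) {      /* strict comparison keeps the first maximum */
            best = i;
            best_mag = mag;
        }
    }
    return best + 1;               /* convert to a 1-based index */
}
```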

rocblas_i<type>amin()
rocblas_status rocblas_idamin(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, rocblas_int *result)
rocblas_status rocblas_isamin(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, rocblas_int *result)

BLAS Level 1 API.

amin finds the first index of the element of minimum magnitude of a real vector x, or of the minimum sum of the magnitudes of the real and imaginary parts if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the amin index, either on the host CPU or device GPU. The result is 0 if n <= 0 or incx <= 0.

Level 2 BLAS
rocblas_<type>gemv()
rocblas_status rocblas_dgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)
rocblas_status rocblas_sgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)

BLAS Level 2 API.

xGEMV performs one of the matrix-vector operations

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] trans: rocblas_operation

  • [in] m: rocblas_int

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [in] beta: specifies the scalar beta.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.
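
To make the roles of trans, lda and the increments concrete, here is a CPU reference sketch of the real-valued GEMV operation. It assumes column-major storage, as in Netlib BLAS, so element (r, c) of A lives at A[r + c*lda]. ref_sgemv is a hypothetical helper: it takes alpha and beta by value and uses a plain int flag in place of rocblas_operation.

```c
#include <assert.h>

/* y := alpha*op(A)*x + beta*y for a column-major m-by-n matrix A.
   trans == 0 selects op(A) = A; nonzero selects op(A) = A**T.
   Hypothetical helper, not part of rocBLAS. */
void ref_sgemv(int trans, int m, int n, float alpha,
               const float *A, int lda, const float *x, int incx,
               float beta, float *y, int incy)
{
    assert(lda >= (m > 1 ? m : 1) && incx > 0 && incy > 0);
    int rows = trans ? n : m;   /* length of y */
    int cols = trans ? m : n;   /* length of x */
    for (int i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < cols; ++j) {
            /* op(A)(i, j) is A(j, i) when transposed, A(i, j) otherwise */
            float a = trans ? A[j + i * lda] : A[i + j * lda];
            acc += a * x[j * incx];
        }
        y[i * incy] = alpha * acc + beta * y[i * incy];
    }
}
```

Note that x has n elements and y has m elements without transposition, and the lengths swap when op(A) = A**T.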

rocblas_<type>trsv()
rocblas_status rocblas_dtrsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, const double *A, rocblas_int lda, double *x, rocblas_int incx)
rocblas_status rocblas_strsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, const float *A, rocblas_int lda, float *x, rocblas_int incx)

BLAS Level 2 API.

trsv solves

 A*x = b or A**T*x = b,

where x and b are vectors and A is a triangular matrix.

The solution x overwrites b, which is passed in through the argument x.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.

  • [in] transA: rocblas_operation

  • [in] diag: rocblas_diagonal. rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.

  • [in] m: rocblas_int m specifies the number of rows of b. m >= 0.

  • [in] A: pointer storing matrix A on the GPU, of dimension ( lda, m )

  • [in] lda: rocblas_int specifies the leading dimension of A. lda >= max( 1, m ).

  • [inout] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.
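
For one concrete case of the triangular solve, the sketch below performs forward substitution on the CPU for a lower triangular, non-transposed A stored column-major (an assumption carried over from Netlib BLAS). ref_strsv_lower is a hypothetical helper covering only this case; the unit_diag flag plays the role of rocblas_diagonal_unit.

```c
#include <assert.h>

/* Solves A*x = b in place for a column-major lower triangular m-by-m matrix A.
   On entry x holds b; on exit it holds the solution.
   unit_diag != 0 treats the diagonal as all ones and never reads it.
   Hypothetical helper, not part of rocBLAS. */
void ref_strsv_lower(int unit_diag, int m, const float *A, int lda,
                     float *x, int incx)
{
    assert(lda >= (m > 1 ? m : 1) && incx > 0);
    for (int i = 0; i < m; ++i) {
        float s = x[i * incx];
        for (int j = 0; j < i; ++j)          /* subtract already-solved terms */
            s -= A[i + j * lda] * x[j * incx];
        x[i * incx] = unit_diag ? s : s / A[i + i * lda];
    }
}
```

The upper triangular case is the mirror image: back substitution running from the last row to the first.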

rocblas_<type>ger()
rocblas_status rocblas_dger(rocblas_handle handle, rocblas_int m, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *A, rocblas_int lda)
rocblas_status rocblas_sger(rocblas_handle handle, rocblas_int m, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *A, rocblas_int lda)

BLAS Level 2 API.

xHE(SY)MV performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian(Symmetric) matrix.

BLAS Level 2 API

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper or lower

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [in] beta: specifies the scalar beta.

  • [out] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

xGER performs the matrix-vector operations

A := A + alpha*x*y**T

where alpha is a scalar, x and y are vectors, and A is an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] m: rocblas_int

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [in] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

  • [inout] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

rocblas_<type>syr()
rocblas_status rocblas_dsyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *A, rocblas_int lda)
rocblas_status rocblas_ssyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *A, rocblas_int lda)

BLAS Level 2 API.

xSYR performs the matrix-vector operations

A := A + alpha*x*x**T

where alpha is a scalar, x is a vector, and A is an n by n symmetric matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper or lower triangle of A is referenced.

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

Level 3 BLAS
rocblas_<type>trtri_batched()
rocblas_status rocblas_dtrtri_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const double *A, rocblas_int lda, rocblas_int stride_a, double *invA, rocblas_int ldinvA, rocblas_int bsinvA, rocblas_int batch_count)
rocblas_status rocblas_strtri_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const float *A, rocblas_int lda, rocblas_int stride_a, float *invA, rocblas_int ldinvA, rocblas_int bsinvA, rocblas_int batch_count)

BLAS Level 3 API.

trtri computes the inverse of a triangular matrix A

inv(A);

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

  • [in] diag: rocblas_diagonal. = ‘rocblas_diagonal_non_unit’, A is non-unit triangular; = ‘rocblas_diagonal_unit’, A is unit triangular;

  • [in] n: rocblas_int.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] stride_a: rocblas_int “batch stride a”: stride from the start of one “A” matrix to the next

rocblas_<type>trsm()
rocblas_status rocblas_dtrsm(rocblas_handle handle, rocblas_side side, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, double *B, rocblas_int ldb)
rocblas_status rocblas_strsm(rocblas_handle handle, rocblas_side side, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, float *B, rocblas_int ldb)

BLAS Level 3 API.

trsm solves

op(A)*X = alpha*B or  X*op(A) = alpha*B,

where alpha is a scalar, X and B are m by n matrices, A is a triangular matrix and op(A) is one of

op( A ) = A   or   op( A ) = A^T   or   op( A ) = A^H.

The solution matrix X overwrites B.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] side: rocblas_side. rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.

  • [in] uplo: rocblas_fill. rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.

  • [in] transA: rocblas_operation. rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.

  • [in] diag: rocblas_diagonal. rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.

  • [in] m: rocblas_int. m specifies the number of rows of B. m >= 0.

  • [in] n: rocblas_int. n specifies the number of columns of B. n >= 0.

  • [in] alpha: alpha specifies the scalar alpha. When alpha is zero, A is not referenced and B need not be set before entry.

  • [in] A: pointer storing matrix A on the GPU, of dimension ( lda, k ), where k is m when rocblas_side_left and n when rocblas_side_right. Only the upper/lower triangular part is accessed.

  • [in] lda: rocblas_int. lda specifies the first dimension of A. if side = rocblas_side_left, lda >= max( 1, m ), if side = rocblas_side_right, lda >= max( 1, n ).

rocblas_<type>gemm()
rocblas_status rocblas_dgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, const double *B, rocblas_int ldb, const double *beta, double *C, rocblas_int ldc)
rocblas_status rocblas_sgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, const float *B, rocblas_int ldb, const float *beta, float *C, rocblas_int ldc)
rocblas_status rocblas_hgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, const rocblas_half *B, rocblas_int ldb, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc)

BLAS Level 3 API.

xGEMM performs one of the matrix-matrix operations

C = alpha*op( A )*op( B ) + beta*C,

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.

Parameters
  • [in] handle: rocblas_handle, handle to the rocblas library context queue.

  • [in] transA: rocblas_operation, specifies the form of op( A )

  • [in] transB: rocblas_operation, specifies the form of op( B )

  • [in] m: rocblas_int, number of rows of matrices op( A ) and C

  • [in] n: rocblas_int, number of columns of matrices op( B ) and C

  • [in] k: rocblas_int, number of columns of matrix op( A ) and number of rows of matrix op( B )

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int, specifies the leading dimension of A.

  • [in] B: pointer storing matrix B on the GPU.

  • [in] ldb: rocblas_int, specifies the leading dimension of B.

  • [in] beta: specifies the scalar beta.

  • [inout] C: pointer storing matrix C on the GPU.

  • [in] ldc: rocblas_int, specifies the leading dimension of C.
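
The interplay of transa, transb and the three leading dimensions can be checked against a small CPU reference. The sketch below assumes column-major storage, as in Netlib BLAS, and real-valued data; op_at and ref_sgemm are hypothetical helpers that use int flags in place of rocblas_operation and take alpha and beta by value.

```c
#include <assert.h>

/* Element (r, c) of op(X), where X is column-major with leading dimension ld. */
float op_at(const float *X, int ld, int trans, int r, int c)
{
    return trans ? X[c + r * ld] : X[r + c * ld];
}

/* C := alpha*op(A)*op(B) + beta*C, with op(A) m-by-k, op(B) k-by-n, C m-by-n.
   Hypothetical helper, not part of rocBLAS. */
void ref_sgemm(int transa, int transb, int m, int n, int k, float alpha,
               const float *A, int lda, const float *B, int ldb,
               float beta, float *C, int ldc)
{
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_at(A, lda, transa, i, p) * op_at(B, ldb, transb, p, j);
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}
```

A reference like this is mainly useful for validating GPU results on small inputs; it makes no attempt at performance.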

rocblas_<type>gemm_strided_batched()
rocblas_status rocblas_dgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, rocblas_int stride_a, const double *B, rocblas_int ldb, rocblas_int stride_b, const double *beta, double *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_sgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, rocblas_int stride_a, const float *B, rocblas_int ldb, rocblas_int stride_b, const float *beta, float *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_hgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, rocblas_int stride_a, const rocblas_half *B, rocblas_int ldb, rocblas_int stride_b, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)

BLAS Level 3 API.

xGEMM_STRIDED_BATCHED performs one of the strided batched matrix-matrix operations

C[i*stride_c] = alpha*op( A[i*stride_a] )*op( B[i*stride_b] ) + beta*C[i*stride_c], for i in [0,batch_count-1]

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are strided batched matrices, with op( A ) an m by k by batch_count strided_batched matrix, op( B ) a k by n by batch_count strided_batched matrix and C an m by n by batch_count strided_batched matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m.

  • [in] n: rocblas_int. matrix dimension n.

  • [in] k: rocblas_int. matrix dimension k.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing strided batched matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of “A”.

  • [in] stride_a: rocblas_int stride from the start of one “A” matrix to the next

  • [in] B: pointer storing strided batched matrix B on the GPU.

  • [in] ldb: rocblas_int specifies the leading dimension of “B”.

  • [in] stride_b: rocblas_int stride from the start of one “B” matrix to the next

  • [in] beta: specifies the scalar beta.

  • [inout] C: pointer storing strided batched matrix C on the GPU.

  • [in] ldc: rocblas_int specifies the leading dimension of “C”.

  • [in] stride_c: rocblas_int stride from the start of one “C” matrix to the next

  • [in] batch_count: rocblas_int number of gemm operations in the batch
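
The stride parameters simply offset the base pointers for each batch entry. The CPU sketch below shows this for the non-transposed case only, assuming column-major storage as in Netlib BLAS; ref_sgemm_strided_batched_nn is a hypothetical helper that takes alpha and beta by value.

```c
#include <assert.h>

/* For each b in [0, batch_count): C_b := alpha*A_b*B_b + beta*C_b, where
   X_b starts at X + b*stride_x and is stored column-major (no transposes).
   Hypothetical helper, not part of rocBLAS. */
void ref_sgemm_strided_batched_nn(int m, int n, int k, float alpha,
                                  const float *A, int lda, int stride_a,
                                  const float *B, int ldb, int stride_b,
                                  float beta,
                                  float *C, int ldc, int stride_c,
                                  int batch_count)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int b = 0; b < batch_count; ++b) {
        const float *Ab = A + (long)b * stride_a;   /* start of matrix A_b */
        const float *Bb = B + (long)b * stride_b;   /* start of matrix B_b */
        float *Cb = C + (long)b * stride_c;         /* start of matrix C_b */
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += Ab[i + p * lda] * Bb[p + j * ldb];
                Cb[i + j * ldc] = alpha * acc + beta * Cb[i + j * ldc];
            }
    }
}
```

For densely packed batches, a natural choice is stride_a = lda * k, stride_b = ldb * n and stride_c = ldc * n in the non-transposed case, so that consecutive matrices sit back to back in memory.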

rocblas_<type>gemm_kernel_name()
rocblas_status rocblas_dgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, rocblas_int stride_a, const double *B, rocblas_int ldb, rocblas_int stride_b, const double *beta, double *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_sgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, rocblas_int stride_a, const float *B, rocblas_int ldb, rocblas_int stride_b, const float *beta, float *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_hgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, rocblas_int stride_a, const rocblas_half *B, rocblas_int ldb, rocblas_int stride_b, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_<type>geam()
rocblas_status rocblas_dgeam(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *beta, const double *B, rocblas_int ldb, double *C, rocblas_int ldc)
rocblas_status rocblas_sgeam(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *beta, const float *B, rocblas_int ldb, float *C, rocblas_int ldc)

BLAS Level 3 API.

xGEAM performs one of the matrix-matrix operations

C = alpha*op( A ) + beta*op( B ),

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by n matrix, op( B ) an m by n matrix, and C an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int.

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] beta: specifies the scalar beta.

  • [in] B: pointer storing matrix B on the GPU.

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [out] C: pointer storing matrix C on the GPU.

  • [in] ldc: rocblas_int specifies the leading dimension of C.
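
One common use of the GEAM operation is an out-of-place transpose, obtained with alpha = 1, beta = 0 and op( A ) = A**T. The CPU sketch below illustrates the semantics under the assumption of column-major storage, as in Netlib BLAS; ref_sgeam is a hypothetical helper that uses int flags in place of rocblas_operation and takes alpha and beta by value.

```c
#include <assert.h>

/* C := alpha*op(A) + beta*op(B), with op(A), op(B) and C all m-by-n.
   A nonzero trans flag selects the transpose of that operand.
   Hypothetical helper, not part of rocBLAS. */
void ref_sgeam(int transa, int transb, int m, int n,
               float alpha, const float *A, int lda,
               float beta, const float *B, int ldb,
               float *C, int ldc)
{
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            /* op(X)(i, j) is X(j, i) when transposed, X(i, j) otherwise */
            float a = transa ? A[j + i * lda] : A[i + j * lda];
            float b = transb ? B[j + i * ldb] : B[i + j * ldb];
            C[i + j * ldc] = alpha * a + beta * b;
        }
}
```

When A is transposed, the stored matrix A is n by m, so its leading dimension must be at least n rather than m.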

BLAS Extensions
rocblas_gemm_ex()
rocblas_status rocblas_gemm_ex(rocblas_handle handle, rocblas_operation trans_a, rocblas_operation trans_b, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, const void *b, rocblas_datatype b_type, rocblas_int ldb, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_datatype compute_type, rocblas_gemm_algo algo, int32_t solution_index, uint32_t flags, size_t *workspace_size, void *workspace)
rocblas_gemm_strided_batched_ex()
rocblas_status rocblas_gemm_strided_batched_ex(rocblas_handle handle, rocblas_operation trans_a, rocblas_operation trans_b, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, rocblas_long stride_a, const void *b, rocblas_datatype b_type, rocblas_int ldb, rocblas_long stride_b, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, rocblas_long stride_c, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_long stride_d, rocblas_int batch_count, rocblas_datatype compute_type, rocblas_gemm_algo algo, int32_t solution_index, uint32_t flags, size_t *workspace_size, void *workspace)

BLAS EX API.

GEMM_EX performs one of the matrix-matrix operations

D = alpha*op( A )*op( B ) + beta*C,

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B, C, and D are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C and D are m by n matrices.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] trans_a: rocblas_operation specifies the form of op( A )

  • [in] trans_b: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m

  • [in] n: rocblas_int. matrix dimension n

  • [in] k: rocblas_int. matrix dimension k

  • [in] alpha: const void * specifies the scalar alpha. Same datatype as compute_type.

  • [in] a: void * pointer storing matrix A on the GPU.

  • [in] a_type: rocblas_datatype specifies the datatype of matrix A

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] b: void * pointer storing matrix B on the GPU.

  • [in] b_type: rocblas_datatype specifies the datatype of matrix B

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [in] beta: const void * specifies the scalar beta. Same datatype as compute_type.

  • [in] c: void * pointer storing matrix C on the GPU.

  • [in] c_type: rocblas_datatype specifies the datatype of matrix C

  • [in] ldc: rocblas_int specifies the leading dimension of C.

  • [out] d: void * pointer storing matrix D on the GPU.

  • [in] d_type: rocblas_datatype specifies the datatype of matrix D

  • [in] ldd: rocblas_int specifies the leading dimension of D.

  • [in] compute_type: rocblas_datatype specifies the datatype of computation

  • [in] algo: rocblas_gemm_algo enumerant specifying the algorithm type.

  • [in] solution_index: int32_t reserved for future use

  • [in] flags: uint32_t reserved for future use
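The operation D = alpha*op( A )*op( B ) + beta*C can be sketched on the CPU as follows. This is only an illustration of the arithmetic and of column-major indexing with leading dimensions; it is not how rocBLAS implements the routine. The names ref_op, ref_op_at and ref_gemm_ex_f32 are hypothetical, every datatype is assumed to be float, and the conjugate transpose is left out because it equals the plain transpose for real data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; these are not rocBLAS symbols. */
typedef enum { REF_OP_NONE, REF_OP_TRANSPOSE } ref_op;

/* Element (row, col) of op(X), where X is column-major with leading dimension ld. */
static float ref_op_at(const float *x, int ld, ref_op op, int row, int col)
{
    return op == REF_OP_NONE ? x[row + (size_t)col * ld]
                             : x[col + (size_t)row * ld];
}

/* D = alpha * op(A) * op(B) + beta * C, with op(A) m by k and op(B) k by n.
 * Assumption: all matrices are float and stored column-major. */
void ref_gemm_ex_f32(ref_op trans_a, ref_op trans_b, int m, int n, int k,
                     float alpha, const float *a, int lda,
                     const float *b, int ldb,
                     float beta, const float *c, int ldc,
                     float *d, int ldd)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += ref_op_at(a, lda, trans_a, i, p) *
                       ref_op_at(b, ldb, trans_b, p, j);
            /* C is only read; the result goes to the separate matrix D. */
            d[i + (size_t)j * ldd] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

Unlike the xGEMM routines, the input matrix C and the output matrix D are separate arguments here, each with its own leading dimension.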

Build Information
rocblas_get_version_string()
rocblas_status rocblas_get_version_string(char *buf, size_t len)

BLAS EX API.

GEMM_STRIDED_BATCHED_EX (rocblas_gemm_strided_batched_ex) performs one of the strided_batched matrix-matrix operations

D[i*stride_d] = alpha*op(A[i*stride_a])*op(B[i*stride_b]) + beta*C[i*stride_c], for i in [0,batch_count-1]

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B, C, and D are strided_batched matrices, with op( A ) an m by k by batch_count strided_batched matrix, op( B ) a k by n by batch_count strided_batched matrix and C and D are m by n by batch_count strided_batched matrices.

The strided_batched matrices are multiple matrices separated by a constant stride. The number of matrices is batch_count.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] trans_a: rocblas_operation specifies the form of op( A )

  • [in] trans_b: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m

  • [in] n: rocblas_int. matrix dimension n

  • [in] k: rocblas_int. matrix dimension k

  • [in] alpha: const void * specifies the scalar alpha. Same datatype as compute_type.

  • [in] a: void * pointer storing matrix A on the GPU.

  • [in] a_type: rocblas_datatype specifies the datatype of matrix A

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] stride_a: rocblas_long specifies stride from start of one “A” matrix to the next

  • [in] b: void * pointer storing matrix B on the GPU.

  • [in] b_type: rocblas_datatype specifies the datatype of matrix B

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [in] stride_b: rocblas_long specifies stride from start of one “B” matrix to the next

  • [in] beta: const void * specifies the scalar beta. Same datatype as compute_type.

  • [in] c: void * pointer storing matrix C on the GPU.

  • [in] c_type: rocblas_datatype specifies the datatype of matrix C

  • [in] ldc: rocblas_int specifies the leading dimension of C.

  • [in] stride_c: rocblas_long specifies stride from start of one “C” matrix to the next

  • [out] d: void * pointer storing matrix D on the GPU.

  • [in] d_type: rocblas_datatype specifies the datatype of matrix D

  • [in] ldd: rocblas_int specifies the leading dimension of D.

  • [in] stride_d: rocblas_long specifies stride from start of one “D” matrix to the next

  • [in] batch_count: rocblas_int number of gemm operations in the batch

  • [in] compute_type: rocblas_datatype specifies the datatype of computation

  • [in] algo: rocblas_gemm_algo enumerant specifying the algorithm type.

  • [in] solution_index: int32_t reserved for future use

  • [in] flags: uint32_t reserved for future use
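The role of the stride arguments may be easier to see in a small CPU sketch. Matrix i of each operand starts i*stride elements after the base pointer, and every matrix in the batch shares the same dimensions and leading dimension. The function name ref_gemm_strided_batched_f32 is hypothetical, and the sketch assumes float data, column-major storage and no transposition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * D_i = alpha * A_i * B_i + beta * C_i for i in [0, batch_count-1].
 * Assumptions: float data, column-major storage, op(X) = X. */
void ref_gemm_strided_batched_f32(int m, int n, int k, float alpha,
                                  const float *a, int lda, long stride_a,
                                  const float *b, int ldb, long stride_b,
                                  float beta,
                                  const float *c, int ldc, long stride_c,
                                  float *d, int ldd, long stride_d,
                                  int batch_count)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
    for (int i = 0; i < batch_count; ++i) {
        /* Each matrix in the batch starts a constant stride after the previous one. */
        const float *ai = a + i * stride_a;
        const float *bi = b + i * stride_b;
        const float *ci = c + i * stride_c;
        float *di = d + i * stride_d;
        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += ai[row + (size_t)p * lda] * bi[p + (size_t)col * ldb];
                di[row + (size_t)col * ldd] =
                    alpha * acc + beta * ci[row + (size_t)col * ldc];
            }
        }
    }
}
```

A stride that is at least as large as one matrix (for example lda times the number of columns of A) keeps the matrices of a batch from overlapping.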

Auxiliary
rocblas_pointer_to_mode()
rocblas_pointer_mode rocblas_pointer_to_mode(void *ptr)

Indicates whether the pointer is on the host or the device. Currently the HIP API can only recognize whether the input ptr is on the device; it cannot recognize whether it is on the host.

rocblas_create_handle()
rocblas_status rocblas_create_handle(rocblas_handle *handle)
rocblas_destroy_handle()
rocblas_status rocblas_destroy_handle(rocblas_handle handle)
rocblas_add_stream()
rocblas_status rocblas_add_stream(rocblas_handle handle, hipStream_t stream)
rocblas_set_stream()
rocblas_status rocblas_set_stream(rocblas_handle handle, hipStream_t stream)
rocblas_get_stream()
rocblas_status rocblas_get_stream(rocblas_handle handle, hipStream_t *stream)
rocblas_set_pointer_mode()
rocblas_status rocblas_set_pointer_mode(rocblas_handle handle, rocblas_pointer_mode pointer_mode)
rocblas_get_pointer_mode()
rocblas_status rocblas_get_pointer_mode(rocblas_handle handle, rocblas_pointer_mode *pointer_mode)
rocblas_set_vector()
rocblas_status rocblas_set_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
rocblas_get_vector()
rocblas_status rocblas_get_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
rocblas_set_matrix()
rocblas_status rocblas_set_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)
rocblas_get_matrix()
rocblas_status rocblas_get_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)

All API

namespace rocblas

Functions

void reinit_logs()
file rocblas-auxiliary.h
#include <hip/hip_runtime_api.h>
#include "rocblas-types.h"

rocblas-auxiliary.h provides auxiliary functions in rocblas

Defines

_ROCBLAS_AUXILIARY_H_

Functions

rocblas_pointer_mode rocblas_pointer_to_mode(void *ptr)

Indicates whether the pointer is on the host or the device. Currently the HIP API can only recognize whether the input ptr is on the device; it cannot recognize whether it is on the host.

rocblas_status rocblas_create_handle(rocblas_handle *handle)
rocblas_status rocblas_destroy_handle(rocblas_handle handle)
rocblas_status rocblas_add_stream(rocblas_handle handle, hipStream_t stream)
rocblas_status rocblas_set_stream(rocblas_handle handle, hipStream_t stream)
rocblas_status rocblas_get_stream(rocblas_handle handle, hipStream_t *stream)
rocblas_status rocblas_set_pointer_mode(rocblas_handle handle, rocblas_pointer_mode pointer_mode)
rocblas_status rocblas_get_pointer_mode(rocblas_handle handle, rocblas_pointer_mode *pointer_mode)
rocblas_status rocblas_set_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
rocblas_status rocblas_get_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
rocblas_status rocblas_set_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)
rocblas_status rocblas_get_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)
file rocblas-functions.h
#include "rocblas-types.h"

rocblas-functions.h provides Basic Linear Algebra Subprograms of Level 1, 2 and 3, using HIP optimized for AMD HCC-based GPU hardware. This library can also run on CUDA-based NVIDIA GPUs. This file exposes a C89 BLAS interface.

Defines

_ROCBLAS_FUNCTIONS_H_

Functions

rocblas_status rocblas_sscal(rocblas_handle handle, rocblas_int n, const float *alpha, float *x, rocblas_int incx)

BLAS Level 1 API.

scal scales each element x[i] of the vector x by the scalar alpha, for i = 1, …, n

x := alpha * x ,

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [inout] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

rocblas_status rocblas_dscal(rocblas_handle handle, rocblas_int n, const double *alpha, double *x, rocblas_int incx)
rocblas_status rocblas_scopy(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *y, rocblas_int incy)

BLAS Level 1 API.

copy copies the vector x into the vector y, for i = 1 , … , n

y := x,

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [out] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

rocblas_status rocblas_dcopy(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_sdot(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *result)

BLAS Level 1 API.

dot(u) performs the dot product of vectors x and y

result = x * y;

dotc performs the dot product of complex vectors x and y, conjugating x

result = conjugate (x) * y;

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

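  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [in] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

  • [inout] result: stores the dot product, either on the host CPU or the device GPU. The returned value is 0.0 if n <= 0.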
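As a rough illustration of how the increments select elements, the real-valued dot product can be written on the CPU like this. The name ref_sdot is hypothetical, and the sketch assumes positive increments; it returns the value directly instead of writing through a result pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * Returns sum over i of x[i*incx] * y[i*incy], or 0.0 if n <= 0.
 * Assumption: incx and incy are positive. */
float ref_sdot(int n, const float *x, int incx, const float *y, int incy)
{
    float result = 0.0f;
    if (n <= 0)
        return result;
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i)
        result += x[(size_t)i * incx] * y[(size_t)i * incy];
    return result;
}
```

With incx = 2 the routine reads every second element of x, which is how a row of a column-major matrix or an interleaved buffer can be used without copying.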

rocblas_status rocblas_ddot(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *result)
rocblas_status rocblas_sswap(rocblas_handle handle, rocblas_int n, float *x, rocblas_int incx, float *y, rocblas_int incy)

BLAS Level 1 API.

swap interchanges the vector elements x[i] and y[i], for i = 1, …, n

y := x; x := y

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [inout] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

rocblas_status rocblas_dswap(rocblas_handle handle, rocblas_int n, double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_haxpy(rocblas_handle handle, rocblas_int n, const rocblas_half *alpha, const rocblas_half *x, rocblas_int incx, rocblas_half *y, rocblas_int incy)

BLAS Level 1 API.

axpy computes y := alpha * x + y

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.
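A minimal CPU sketch of the axpy update, assuming positive increments, shows that y is both read and written while x is only read. The name ref_saxpy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * y[i*incy] := alpha * x[i*incx] + y[i*incy] for i = 0 .. n-1.
 * Assumption: incx and incy are positive. */
void ref_saxpy(int n, float alpha, const float *x, int incx, float *y, int incy)
{
    if (n <= 0)
        return;
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i)
        y[(size_t)i * incy] += alpha * x[(size_t)i * incx];
}
```

Elements of y that fall between the strided positions are left untouched.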

rocblas_status rocblas_saxpy(rocblas_handle handle, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *y, rocblas_int incy)
rocblas_status rocblas_daxpy(rocblas_handle handle, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *y, rocblas_int incy)
rocblas_status rocblas_sasum(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *result)

BLAS Level 1 API.

asum computes the sum of the magnitudes of elements of a real vector x, or the sum of magnitudes of the real and imaginary parts of elements if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the asum result, either on the host CPU or the device GPU. The returned value is 0.0 if n <= 0 or incx <= 0.

rocblas_status rocblas_dasum(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *result)
rocblas_status rocblas_snrm2(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, float *result)

BLAS Level 1 API.

nrm2 computes the Euclidean norm of a real or complex vector

:= sqrt( x'*x ) for a real vector
:= sqrt( x**H*x ) for a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the nrm2 result, either on the host CPU or the device GPU. The returned value is 0.0 if n <= 0 or incx <= 0.

rocblas_status rocblas_dnrm2(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, double *result)
rocblas_status rocblas_isamax(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, rocblas_int *result)

BLAS Level 1 API.

amax finds the first index of the element of maximum magnitude of a real vector x, or of the maximum sum of the magnitudes of the real and imaginary parts if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the amax index, either on the host CPU or the device GPU. The returned value is 0 if n <= 0 or incx <= 0.
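The "first index" rule can be sketched on the CPU as below. The name ref_isamax is hypothetical. The sketch assumes the index is 1-based, following the Fortran BLAS convention, so that 0 can signal n <= 0 or incx <= 0; that indexing base is an assumption here, since this section does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * Returns the 1-based position (assumed convention) of the first element
 * with the largest magnitude, or 0 if n <= 0 or incx <= 0. */
int ref_isamax(int n, const float *x, int incx)
{
    if (n <= 0 || incx <= 0)
        return 0;
    assert(x != NULL);
    int best = 0;
    float best_mag = x[0] < 0.0f ? -x[0] : x[0];
    for (int i = 1; i < n; ++i) {
        float v = x[(size_t)i * incx];
        float mag = v < 0.0f ? -v : v;
        /* Strict comparison keeps the first of several equal maxima. */
        if (mag > best_mag) {
            best = i;
            best_mag = mag;
        }
    }
    return best + 1;
}
```

For the vector (1, -7, 3, 7) the elements -7 and 7 tie in magnitude, and the strict comparison selects the earlier one.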

rocblas_status rocblas_idamax(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, rocblas_int *result)
rocblas_status rocblas_isamin(rocblas_handle handle, rocblas_int n, const float *x, rocblas_int incx, rocblas_int *result)

BLAS Level 1 API.

amin finds the first index of the element of minimum magnitude of a real vector x, or of the minimum sum of the magnitudes of the real and imaginary parts if x is a complex vector

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] result: stores the amin index, either on the host CPU or the device GPU. The returned value is 0 if n <= 0 or incx <= 0.

rocblas_status rocblas_idamin(rocblas_handle handle, rocblas_int n, const double *x, rocblas_int incx, rocblas_int *result)
rocblas_status rocblas_sgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)

BLAS Level 2 API.

xGEMV performs one of the matrix-vector operations

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] trans: rocblas_operation

  • [in] m: rocblas_int

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [in] beta: specifies the scalar beta.

  • [inout] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.
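The two real-valued forms of the operation can be sketched on the CPU as follows, mainly to show how the lengths of x and y swap when A is transposed. The name ref_sgemv and the integer trans flag are hypothetical; the sketch assumes column-major storage and positive increments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * trans == 0: y := alpha*A*x    + beta*y  (x has n elements, y has m)
 * trans != 0: y := alpha*A**T*x + beta*y  (x has m elements, y has n)
 * A is m by n, column-major, with leading dimension lda >= max(1, m). */
void ref_sgemv(int trans, int m, int n, float alpha, const float *A, int lda,
               const float *x, int incx, float beta, float *y, int incy)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 0 ? m : 1));
    assert(incx > 0 && incy > 0);
    int rows = trans ? n : m; /* length of y */
    int cols = trans ? m : n; /* length of x */
    for (int i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < cols; ++j) {
            /* op(A)(i, j) is A(i, j) without transposition, A(j, i) with it. */
            float a = trans ? A[j + (size_t)i * lda] : A[i + (size_t)j * lda];
            acc += a * x[(size_t)j * incx];
        }
        y[(size_t)i * incy] = alpha * acc + beta * y[(size_t)i * incy];
    }
}
```

Because beta*y is part of the result, y has to hold valid data on entry unless beta is zero.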

rocblas_status rocblas_dgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)
rocblas_status rocblas_strsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, const float *A, rocblas_int lda, float *x, rocblas_int incx)

BLAS Level 2 API.

trsv solves

 A*x = alpha*b or A**T*x = alpha*b,

where x and b are vectors and A is a triangular matrix.

The solution vector x overwrites b.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.

  • [in] transA: rocblas_operation

  • [in] diag: rocblas_diagonal. rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.

  • [in] m: rocblas_int m specifies the number of rows of b. m >= 0.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU, of dimension ( lda, m )

  • [in] lda: rocblas_int specifies the leading dimension of A. lda >= max( 1, m ).

  • [inout] x: pointer storing vector x on the GPU. On entry it holds b; on exit it holds the solution x.

  • [in] incx: specifies the increment for the elements of x.
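For a lower triangular, non-transposed A, the solve amounts to forward substitution, sketched here on the CPU. The name ref_strsv_lower and the unit_diag flag are hypothetical; the sketch follows the function signature, which has no alpha argument, so it solves A*x = b with b supplied in x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * Solves A*x = b by forward substitution for a lower triangular,
 * column-major A. On entry x holds b; on exit it holds the solution.
 * unit_diag != 0 means the diagonal of A is taken to be all ones. */
void ref_strsv_lower(int unit_diag, int m, const float *A, int lda,
                     float *x, int incx)
{
    assert(m >= 0 && lda >= (m > 0 ? m : 1) && incx > 0);
    for (int i = 0; i < m; ++i) {
        float sum = x[(size_t)i * incx];
        /* Subtract the contributions of the already solved unknowns. */
        for (int j = 0; j < i; ++j)
            sum -= A[i + (size_t)j * lda] * x[(size_t)j * incx];
        x[(size_t)i * incx] = unit_diag ? sum : sum / A[i + (size_t)i * lda];
    }
}
```

Only the lower triangle of A is read, so the strictly upper part may contain arbitrary values.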

rocblas_status rocblas_dtrsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, const double *A, rocblas_int lda, double *x, rocblas_int incx)
rocblas_status rocblas_sger(rocblas_handle handle, rocblas_int m, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *A, rocblas_int lda)

BLAS Level 2 API.

xHE(SY)MV performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian(Symmetric) matrix.

BLAS Level 2 API

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper or lower

  • [in] n: rocblas_int.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: specifies the increment for the elements of x.

  • [in] beta: specifies the scalar beta.

  • [out] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

xGER performs the matrix-vector operations

A := A + alpha*x*y**T

where alpha is a scalar, x and y are vectors, and A is an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] m: rocblas_int

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [in] y: pointer storing vector y on the GPU.

  • [in] incy: rocblas_int specifies the increment for the elements of y.

  • [inout] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.
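The rank-1 update A := A + alpha*x*y**T can be sketched on the CPU as below. The name ref_sger is hypothetical, and the sketch assumes column-major storage and positive increments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not a rocBLAS symbol.
 * A(i, j) := A(i, j) + alpha * x[i] * y[j] for an m by n column-major A.
 * Assumption: incx and incy are positive. */
void ref_sger(int m, int n, float alpha, const float *x, int incx,
              const float *y, int incy, float *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 0 ? m : 1));
    assert(incx > 0 && incy > 0);
    for (int j = 0; j < n; ++j) {
        float scaled_y = alpha * y[(size_t)j * incy];
        for (int i = 0; i < m; ++i)
            A[i + (size_t)j * lda] += x[(size_t)i * incx] * scaled_y;
    }
}
```

The inner loop walks down one column of A, which matches the column-major layout and keeps memory accesses contiguous.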

rocblas_status rocblas_dger(rocblas_handle handle, rocblas_int m, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *A, rocblas_int lda)
rocblas_status rocblas_ssyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *A, rocblas_int lda)

BLAS Level 2 API.

xSYR performs the matrix-vector operations

A := A + alpha*x*x**T

where alpha is a scalar, x is a vector, and A is an n by n symmetric matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] n: rocblas_int

  • [in] alpha: specifies the scalar alpha.

  • [in] x: pointer storing vector x on the GPU.

  • [in] incx: rocblas_int specifies the increment for the elements of x.

  • [inout] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

rocblas_status rocblas_dsyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *A, rocblas_int lda)
rocblas_status rocblas_strtri(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const float *A, rocblas_int lda, float *invA, rocblas_int ldinvA)

BLAS Level 3 API.

trtri computes the inverse of a triangular matrix A and writes the result into invA.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper (‘rocblas_fill_upper’) or lower (‘rocblas_fill_lower’) triangular part of A is used. If rocblas_fill_upper, the lower part of A is not referenced; if rocblas_fill_lower, the upper part of A is not referenced.

  • [in] diag: rocblas_diagonal. ‘rocblas_diagonal_non_unit’: A is non-unit triangular; ‘rocblas_diagonal_unit’: A is unit triangular.

  • [in] n: rocblas_int. size of matrix A and invA

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

rocblas_status rocblas_dtrtri(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const double *A, rocblas_int lda, double *invA, rocblas_int ldinvA)
rocblas_status rocblas_strtri_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const float *A, rocblas_int lda, rocblas_int stride_a, float *invA, rocblas_int ldinvA, rocblas_int bsinvA, rocblas_int batch_count)

BLAS Level 3 API.

trtri_batched computes the inverse, inv(A), of each triangular matrix A in the batch.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] uplo: rocblas_fill. specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

  • [in] diag: rocblas_diagonal. = ‘rocblas_diagonal_non_unit’, A is non-unit triangular; = ‘rocblas_diagonal_unit’, A is unit triangular;

  • [in] n: rocblas_int.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] stride_a: rocblas_int “batch stride a”: stride from the start of one “A” matrix to the next

rocblas_status rocblas_dtrtri_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_diagonal diag, rocblas_int n, const double *A, rocblas_int lda, rocblas_int stride_a, double *invA, rocblas_int ldinvA, rocblas_int bsinvA, rocblas_int batch_count)
rocblas_status rocblas_strsm(rocblas_handle handle, rocblas_side side, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, float *B, rocblas_int ldb)

BLAS Level 3 API.

trsm solves

op(A)*X = alpha*B or  X*op(A) = alpha*B,

where alpha is a scalar, X and B are m by n matrices, A is a triangular matrix and op(A) is one of

op( A ) = A   or   op( A ) = A^T   or   op( A ) = A^H.

The solution matrix X overwrites B.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] side: rocblas_side. rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.

  • [in] uplo: rocblas_fill. rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.

  • [in] transA: rocblas_operation. rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.

  • [in] diag: rocblas_diagonal. rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.

  • [in] m: rocblas_int. m specifies the number of rows of B. m >= 0.

  • [in] n: rocblas_int. n specifies the number of columns of B. n >= 0.

  • [in] alpha: alpha specifies the scalar alpha. When alpha is zero, A is not referenced and B need not be set before entry.

  • [in] A: pointer storing matrix A on the GPU, of dimension ( lda, k ), where k is m when rocblas_side_left and n when rocblas_side_right. Only the upper/lower triangular part is accessed.

  • [in] lda: rocblas_int. lda specifies the leading dimension of A. If side = rocblas_side_left, lda >= max( 1, m ); if side = rocblas_side_right, lda >= max( 1, n ).

rocblas_status rocblas_dtrsm(rocblas_handle handle, rocblas_side side, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, double *B, rocblas_int ldb)
rocblas_status rocblas_hgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, const rocblas_half *B, rocblas_int ldb, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc)

BLAS Level 3 API.

xGEMM performs one of the matrix-matrix operations

C = alpha*op( A )*op( B ) + beta*C,

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.

Parameters
  • [in] handle: rocblas_handle, handle to the rocblas library context queue.

  • [in] transA: rocblas_operation, specifies the form of op( A )

  • [in] transB: rocblas_operation, specifies the form of op( B )

  • [in] m: rocblas_int, number of rows of matrices op( A ) and C

  • [in] n: rocblas_int, number of columns of matrices op( B ) and C

  • [in] k: rocblas_int, number of columns of matrix op( A ) and number of rows of matrix op( B )

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int, specifies the leading dimension of A.

  • [in] B: pointer storing matrix B on the GPU.

  • [in] ldb: rocblas_int, specifies the leading dimension of B.

  • [in] beta: specifies the scalar beta.

  • [inout] C: pointer storing matrix C on the GPU.

  • [in] ldc: rocblas_int, specifies the leading dimension of C.

rocblas_status rocblas_sgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, const float *B, rocblas_int ldb, const float *beta, float *C, rocblas_int ldc)
rocblas_status rocblas_dgemm(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, const double *B, rocblas_int ldb, const double *beta, double *C, rocblas_int ldc)
rocblas_status rocblas_hgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, rocblas_int stride_a, const rocblas_half *B, rocblas_int ldb, rocblas_int stride_b, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)

BLAS Level 3 API.

xGEMM_STRIDED_BATCHED performs one of the strided batched matrix-matrix operations

C[i*stride_c] = alpha*op( A[i*stride_a] )*op( B[i*stride_b] ) + beta*C[i*stride_c], for i in [0,batch_count-1]

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are strided batched matrices, with op( A ) an m by k by batch_count strided_batched matrix, op( B ) a k by n by batch_count strided_batched matrix and C an m by n by batch_count strided_batched matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m.

  • [in] n: rocblas_int. matrix dimension n.

  • [in] k: rocblas_int. matrix dimension k.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing strided batched matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of “A”.

  • [in] stride_a: rocblas_int stride from the start of one “A” matrix to the next

  • [in] B: pointer storing strided batched matrix B on the GPU.

  • [in] ldb: rocblas_int specifies the leading dimension of “B”.

  • [in] stride_b: rocblas_int stride from the start of one “B” matrix to the next

  • [in] beta: specifies the scalar beta.

  • [inout] C: pointer storing strided batched matrix C on the GPU.

  • [in] ldc: rocblas_int specifies the leading dimension of “C”.

  • [in] stride_c: rocblas_int stride from the start of one “C” matrix to the next

  • [in] batch_count: rocblas_int number of gemm operations in the batch

rocblas_status rocblas_sgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, rocblas_int stride_a, const float *B, rocblas_int ldb, rocblas_int stride_b, const float *beta, float *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_dgemm_strided_batched(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, rocblas_int stride_a, const double *B, rocblas_int ldb, rocblas_int stride_b, const double *beta, double *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_hgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const rocblas_half *alpha, const rocblas_half *A, rocblas_int lda, rocblas_int stride_a, const rocblas_half *B, rocblas_int ldb, rocblas_int stride_b, const rocblas_half *beta, rocblas_half *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_sgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, rocblas_int stride_a, const float *B, rocblas_int ldb, rocblas_int stride_b, const float *beta, float *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_dgemm_kernel_name(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, rocblas_int stride_a, const double *B, rocblas_int ldb, rocblas_int stride_b, const double *beta, double *C, rocblas_int ldc, rocblas_int stride_c, rocblas_int batch_count)
rocblas_status rocblas_sgeam(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *beta, const float *B, rocblas_int ldb, float *C, rocblas_int ldc)

BLAS Level 3 API.

xGEAM performs one of the matrix-matrix operations

C = alpha*op( A ) + beta*op( B ),

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by n matrix, op( B ) an m by n matrix, and C an m by n matrix.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. number of rows of op( A ), op( B ) and C.

  • [in] n: rocblas_int. number of columns of op( A ), op( B ) and C.

  • [in] alpha: specifies the scalar alpha.

  • [in] A: pointer storing matrix A on the GPU.

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] beta: specifies the scalar beta.

  • [in] B: pointer storing matrix B on the GPU.

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [inout] C: pointer storing matrix C on the GPU.

  • [in] ldc: rocblas_int specifies the leading dimension of C.
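As a rough illustration of the operation described above, the following CPU sketch computes C = alpha*op( A ) + beta*op( B ) for real single-precision data. It assumes column-major storage (the usual BLAS convention); the names Op, op_at and geam_reference are hypothetical helpers for this example and are not part of rocBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local stand-in for rocblas_operation (values from rocblas-types.h).
enum class Op { none = 111, transpose = 112 };

// Element (i, j) of op(X) for a column-major matrix X with leading dimension ld.
inline float op_at(const std::vector<float>& X, int ld, Op op, int i, int j)
{
    return op == Op::none ? X[i + static_cast<std::size_t>(j) * ld]
                          : X[j + static_cast<std::size_t>(i) * ld];
}

// CPU reference for C = alpha*op(A) + beta*op(B), with C an m-by-n matrix.
void geam_reference(Op transA, Op transB, int m, int n,
                    float alpha, const std::vector<float>& A, int lda,
                    float beta, const std::vector<float>& B, int ldb,
                    std::vector<float>& C, int ldc)
{
    assert(m >= 0 && n >= 0 && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            C[i + static_cast<std::size_t>(j) * ldc] =
                alpha * op_at(A, lda, transA, i, j) + beta * op_at(B, ldb, transB, i, j);
}
```

A reference of this kind can be useful for checking GPU results on small inputs; it is not meant to reflect how the library implements the routine.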

rocblas_status rocblas_dgeam(rocblas_handle handle, rocblas_operation transa, rocblas_operation transb, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *beta, const double *B, rocblas_int ldb, double *C, rocblas_int ldc)
rocblas_status rocblas_gemm_ex(rocblas_handle handle, rocblas_operation trans_a, rocblas_operation trans_b, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, const void *b, rocblas_datatype b_type, rocblas_int ldb, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_datatype compute_type, rocblas_gemm_algo algo, int32_t solution_index, uint32_t flags, size_t *workspace_size, void *workspace)
rocblas_status rocblas_gemm_strided_batched_ex(rocblas_handle handle, rocblas_operation trans_a, rocblas_operation trans_b, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, rocblas_long stride_a, const void *b, rocblas_datatype b_type, rocblas_int ldb, rocblas_long stride_b, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, rocblas_long stride_c, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_long stride_d, rocblas_int batch_count, rocblas_datatype compute_type, rocblas_gemm_algo algo, int32_t solution_index, uint32_t flags, size_t *workspace_size, void *workspace)

BLAS EX API.

GEMM_EX performs one of the matrix-matrix operations

D = alpha*op( A )*op( B ) + beta*C,

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B, C, and D are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C and D are m by n matrices.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m

  • [in] n: rocblas_int. matrix dimension n

  • [in] k: rocblas_int. matrix dimension k

  • [in] alpha: const void * specifies the scalar alpha. Same datatype as compute_type.

  • [in] a: void * pointer storing matrix A on the GPU.

  • [in] a_type: rocblas_datatype specifies the datatype of matrix A

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] b: void * pointer storing matrix B on the GPU.

  • [in] b_type: rocblas_datatype specifies the datatype of matrix B

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [in] beta: const void * specifies the scalar beta. Same datatype as compute_type.

  • [in] c: void * pointer storing matrix C on the GPU.

  • [in] c_type: rocblas_datatype specifies the datatype of matrix C

  • [in] ldc: rocblas_int specifies the leading dimension of C.

  • [out] d: void * pointer storing matrix D on the GPU.

  • [in] d_type: rocblas_datatype specifies the datatype of matrix D

  • [in] ldd: rocblas_int specifies the leading dimension of D.

  • [in] compute_type: rocblas_datatype specifies the datatype of computation

  • [in] algo: rocblas_gemm_algo enumerant specifying the algorithm type.

  • [in] solution_index: int32_t reserved for future use

  • [in] flags: uint32_t reserved for future use
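The distinguishing feature of GEMM_EX is that the result is written to a separate matrix D while C is only read. The sketch below shows this on the CPU for a single datatype (float everywhere), assuming column-major storage. Op, op_at and gemm_ex_reference are hypothetical names for this example; the mixed-precision handling of the real routine is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local stand-in for rocblas_operation (values from rocblas-types.h).
enum class Op { none = 111, transpose = 112 };

// Element (i, j) of op(X) for a column-major matrix X with leading dimension ld.
inline float op_at(const std::vector<float>& X, int ld, Op op, int i, int j)
{
    return op == Op::none ? X[i + static_cast<std::size_t>(j) * ld]
                          : X[j + static_cast<std::size_t>(i) * ld];
}

// D = alpha*op(A)*op(B) + beta*C, with op(A) m-by-k, op(B) k-by-n, C and D m-by-n.
void gemm_ex_reference(Op transA, Op transB, int m, int n, int k,
                       float alpha, const std::vector<float>& A, int lda,
                       const std::vector<float>& B, int ldb,
                       float beta, const std::vector<float>& C, int ldc,
                       std::vector<float>& D, int ldd)
{
    assert(ldc >= m && ldd >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_at(A, lda, transA, i, p) * op_at(B, ldb, transB, p, j);
            // C is read only; the result goes to D.
            D[i + static_cast<std::size_t>(j) * ldd] =
                alpha * acc + beta * C[i + static_cast<std::size_t>(j) * ldc];
        }
    }
}
```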

rocblas_status rocblas_get_version_string(char *buf, size_t len)

BLAS EX API.

GEMM_STRIDED_BATCHED_EX (rocblas_gemm_strided_batched_ex) performs one of the strided_batched matrix-matrix operations

D[i*stride_d] = alpha*op(A[i*stride_a])*op(B[i*stride_b]) + beta*C[i*stride_c], for i in [0,batch_count-1]

where op( X ) is one of

op( X ) = X      or
op( X ) = X**T   or
op( X ) = X**H,

alpha and beta are scalars, and A, B, C, and D are strided_batched matrices, with op( A ) an m by k by batch_count strided_batched matrix, op( B ) a k by n by batch_count strided_batched matrix and C and D are m by n by batch_count strided_batched matrices.

The strided_batched matrices are multiple matrices separated by a constant stride. The number of matrices is batch_count.

Parameters
  • [in] handle: rocblas_handle. handle to the rocblas library context queue.

  • [in] transA: rocblas_operation specifies the form of op( A )

  • [in] transB: rocblas_operation specifies the form of op( B )

  • [in] m: rocblas_int. matrix dimension m

  • [in] n: rocblas_int. matrix dimension n

  • [in] k: rocblas_int. matrix dimension k

  • [in] alpha: const void * specifies the scalar alpha. Same datatype as compute_type.

  • [in] a: void * pointer storing matrix A on the GPU.

  • [in] a_type: rocblas_datatype specifies the datatype of matrix A

  • [in] lda: rocblas_int specifies the leading dimension of A.

  • [in] stride_a: rocblas_long specifies stride from start of one “A” matrix to the next

  • [in] b: void * pointer storing matrix B on the GPU.

  • [in] b_type: rocblas_datatype specifies the datatype of matrix B

  • [in] ldb: rocblas_int specifies the leading dimension of B.

  • [in] stride_b: rocblas_long specifies stride from start of one “B” matrix to the next

  • [in] beta: const void * specifies the scalar beta. Same datatype as compute_type.

  • [in] c: void * pointer storing matrix C on the GPU.

  • [in] c_type: rocblas_datatype specifies the datatype of matrix C

  • [in] ldc: rocblas_int specifies the leading dimension of C.

  • [in] stride_c: rocblas_long specifies stride from start of one “C” matrix to the next

  • [out] d: void * pointer storing matrix D on the GPU.

  • [in] d_type: rocblas_datatype specifies the datatype of matrix D

  • [in] ldd: rocblas_int specifies the leading dimension of D.

  • [in] stride_d: rocblas_long specifies stride from start of one “D” matrix to the next

  • [in] batch_count: rocblas_int number of gemm operations in the batch

  • [in] compute_type: rocblas_datatype specifies the datatype of computation

  • [in] algo: rocblas_gemm_algo enumerant specifying the algorithm type.

  • [in] solution_index: int32_t reserved for future use

  • [in] flags: uint32_t reserved for future use
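To make the stride arithmetic concrete, the sketch below walks a strided_batched buffer on the CPU: matrix i of a batch starts i*stride elements after the start of the buffer. It covers only the non-transposed, single-datatype case in column-major storage. batch_offset and gemm_strided_batched_reference are hypothetical names for this example, not rocBLAS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset (in elements) of the first element of matrix i in a strided_batched buffer.
inline std::size_t batch_offset(std::int64_t stride, int i)
{
    assert(stride >= 0 && i >= 0);
    return static_cast<std::size_t>(stride) * static_cast<std::size_t>(i);
}

// D_i = alpha*A_i*B_i + beta*C_i for i in [0, batch_count-1], no transposes.
void gemm_strided_batched_reference(
    int m, int n, int k, float alpha,
    const std::vector<float>& A, int lda, std::int64_t stride_a,
    const std::vector<float>& B, int ldb, std::int64_t stride_b,
    float beta,
    const std::vector<float>& C, int ldc, std::int64_t stride_c,
    std::vector<float>& D, int ldd, std::int64_t stride_d,
    int batch_count)
{
    for (int b = 0; b < batch_count; ++b) {
        // Locate the b-th matrix of each operand.
        const float* a = A.data() + batch_offset(stride_a, b);
        const float* bm = B.data() + batch_offset(stride_b, b);
        const float* c = C.data() + batch_offset(stride_c, b);
        float* d = D.data() + batch_offset(stride_d, b);
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += a[i + static_cast<std::size_t>(p) * lda] *
                           bm[p + static_cast<std::size_t>(j) * ldb];
                d[i + static_cast<std::size_t>(j) * ldd] =
                    alpha * acc + beta * c[i + static_cast<std::size_t>(j) * ldc];
            }
        }
    }
}
```

Note that a stride may be larger than the matrix itself, in which case the elements between consecutive matrices are simply skipped.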

file rocblas-types.h
#include <stddef.h>
#include <stdint.h>
#include <hip/hip_vector_types.h>

rocblas-types.h defines data types used by rocblas

Defines

_ROCBLAS_TYPES_H_

Typedefs

typedef int32_t rocblas_int

To specify whether int32 or int64 is used.

typedef int64_t rocblas_long
typedef float2 rocblas_float_complex
typedef double2 rocblas_double_complex
typedef uint16_t rocblas_half
typedef float2 rocblas_half_complex
typedef struct _rocblas_handle *rocblas_handle

Enums

enum rocblas_operation

Used to specify whether the matrix is to be transposed or not.

Parameter constants. Numbering is consistent with CBLAS, ACML and most standard C BLAS libraries.

Values:

rocblas_operation_none = 111

Operate with the matrix.

rocblas_operation_transpose = 112

Operate with the transpose of the matrix.

rocblas_operation_conjugate_transpose = 113

Operate with the conjugate transpose of the matrix.

enum rocblas_fill

Used by the Hermitian, symmetric and triangular matrix routines to specify whether the upper or lower triangle is being referenced.

Values:

rocblas_fill_upper = 121

Upper triangle.

rocblas_fill_lower = 122

Lower triangle.

rocblas_fill_full = 123

enum rocblas_diagonal

It is used by the triangular matrix routines to specify whether the matrix is unit triangular.

Values:

rocblas_diagonal_non_unit = 131

Non-unit triangular.

rocblas_diagonal_unit = 132

Unit triangular.

enum rocblas_side

Indicates the side matrix A is located relative to matrix B during multiplication.

Values:

rocblas_side_left = 141

Multiply general matrix by symmetric, Hermitian or triangular matrix on the left.

rocblas_side_right = 142

Multiply general matrix by symmetric, Hermitian or triangular matrix on the right.

rocblas_side_both = 143

enum rocblas_status

rocblas status codes definition

Values:

rocblas_status_success = 0

success

rocblas_status_invalid_handle = 1

handle not initialized, invalid or null

rocblas_status_not_implemented = 2

function is not implemented

rocblas_status_invalid_pointer = 3

invalid pointer parameter

rocblas_status_invalid_size = 4

invalid size parameter

rocblas_status_memory_error = 5

failed internal memory allocation, copy or dealloc

rocblas_status_internal_error = 6

other internal library failure
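When reporting errors it is often convenient to turn a status value into readable text. The following sketch mirrors the values and descriptions listed above in a local enum. demo_status, status_to_string and expect_success are hypothetical names for this example; real code would use rocblas_status from rocblas-types.h instead of the local mirror.

```cpp
#include <cassert>
#include <string>

// Local mirror of enum rocblas_status; values copied from the listing above.
enum demo_status {
    demo_status_success = 0,
    demo_status_invalid_handle = 1,
    demo_status_not_implemented = 2,
    demo_status_invalid_pointer = 3,
    demo_status_invalid_size = 4,
    demo_status_memory_error = 5,
    demo_status_internal_error = 6
};

// Map a status value to the description given in the enum documentation.
inline std::string status_to_string(int status)
{
    switch (status) {
    case demo_status_success: return "success";
    case demo_status_invalid_handle: return "handle not initialized, invalid or null";
    case demo_status_not_implemented: return "function is not implemented";
    case demo_status_invalid_pointer: return "invalid pointer parameter";
    case demo_status_invalid_size: return "invalid size parameter";
    case demo_status_memory_error: return "failed internal memory allocation, copy or dealloc";
    case demo_status_internal_error: return "other internal library failure";
    default: return "unknown status";
    }
}

// Debug-build check that a call returned success.
inline void expect_success(demo_status status)
{
    assert(status == demo_status_success);
}
```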

enum rocblas_datatype

Indicates the precision width of data stored in a blas type.

Values:

rocblas_datatype_f16_r = 150
rocblas_datatype_f32_r = 151
rocblas_datatype_f64_r = 152
rocblas_datatype_f16_c = 153
rocblas_datatype_f32_c = 154
rocblas_datatype_f64_c = 155
rocblas_datatype_i8_r = 160
rocblas_datatype_u8_r = 161
rocblas_datatype_i32_r = 162
rocblas_datatype_u32_r = 163
rocblas_datatype_i8_c = 164
rocblas_datatype_u8_c = 165
rocblas_datatype_i32_c = 166
rocblas_datatype_u32_c = 167

enum rocblas_pointer_mode

Indicates whether the pointer is a device pointer or a host pointer.

Values:

rocblas_pointer_mode_host = 0
rocblas_pointer_mode_device = 1

enum rocblas_layer_mode

Indicates if layer is active with bitmask.

Values:

rocblas_layer_mode_none = 0b0000000000
rocblas_layer_mode_log_trace = 0b0000000001
rocblas_layer_mode_log_bench = 0b0000000010
rocblas_layer_mode_log_profile = 0b0000000100
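Because the layer modes are single bits, several of them can be combined with bitwise OR and tested with bitwise AND. The sketch below shows this with a local mirror of the values above (written in hexadecimal). layer_mode and layer_enabled are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of rocblas_layer_mode; values copied from the listing above.
enum layer_mode : std::uint32_t {
    layer_mode_none = 0x0,
    layer_mode_log_trace = 0x1,
    layer_mode_log_bench = 0x2,
    layer_mode_log_profile = 0x4
};

// True when `flag` is set in the combined mask.
inline bool layer_enabled(std::uint32_t mask, layer_mode flag)
{
    assert(flag != layer_mode_none); // "none" is the absence of flags, not a bit
    return (mask & flag) != 0;
}
```

For instance, a mask of layer_mode_log_trace | layer_mode_log_profile enables trace and profile logging but leaves bench logging off.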
enum rocblas_gemm_algo

Indicates the algorithm type used by the GEMM_EX routines.

Values:

rocblas_gemm_algo_standard = 0b0000000000

file rocblas.h
#include <stdbool.h>
#include "rocblas-export.h"
#include "rocblas-version.h"
#include "rocblas-types.h"
#include "rocblas-auxiliary.h"
#include "rocblas-functions.h"

rocblas.h includes other *.h and exposes a common interface

Defines

_ROCBLAS_H_

file buildinfo.cpp
#include <stdio.h>
#include <sstream>
#include <string.h>
#include "definitions.h"
#include "rocblas-types.h"
#include "rocblas-functions.h"
#include "rocblas-version.h"

Defines

TO_STR2(x)
TO_STR(x)
VERSION_STRING

Functions

rocblas_status rocblas_get_version_string(char *buf, size_t len)

file handle.cpp
#include "handle.h"
#include <cstdlib>

Functions

static void open_log_stream(const char *environment_variable_name, std::ostream *&log_os, std::ofstream &log_ofs)

Logging function.

open_log_stream opens the stream log_os for logging. If the environment variable named environment_variable_name is not set, log_os streams to std::cerr. Otherwise a file is opened at the full logfile path contained in the environment variable. If opening the file succeeds, log_os streams to the file; otherwise it streams to std::cerr.

Parameters
  • [in] environment_variable_name: const char* Name of environment variable that contains the full logfile path.

  • [out] log_os: std::ostream*& Output stream. Streams to std::cerr if environment_variable_name is not set, else is set to stream to log_ofs.

  • [out] log_ofs: std::ofstream& Output file stream. If log_ofs.is_open()==true, then log_os will stream to log_ofs. Else it will stream to std::cerr.
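The behaviour described above can be sketched with the standard library alone. This is an illustration written from the description, not the library's actual source; open_log_stream_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>

// Point log_os at a file named by an environment variable, or at std::cerr.
static void open_log_stream_sketch(const char* environment_variable_name,
                                   std::ostream*& log_os,
                                   std::ofstream& log_ofs)
{
    assert(environment_variable_name != nullptr);

    // Default: used when the variable is unset or the file cannot be opened.
    log_os = &std::cerr;

    const char* path = std::getenv(environment_variable_name);
    if (path != nullptr) {
        log_ofs.open(path);
        if (log_ofs.is_open())
            log_os = &log_ofs;
    }
}
```

Falling back to std::cerr means a caller can always write through log_os without checking whether the file was opened.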

file rocblas_auxiliary.cpp
#include <stdio.h>
#include <hip/hip_runtime.h>
#include "definitions.h"
#include "rocblas-types.h"
#include "handle.h"
#include "logging.h"
#include "utility.h"
#include "rocblas_unique_ptr.hpp"
#include "rocblas-auxiliary.h"

Functions

rocblas_pointer_mode rocblas_pointer_to_mode(void *ptr)

Indicates whether the pointer is on the host or the device. Currently the HIP API can only recognize whether the input ptr is on the device; it cannot recognize whether it is on the host.

rocblas_status rocblas_get_pointer_mode(rocblas_handle handle, rocblas_pointer_mode *mode)
rocblas_status rocblas_set_pointer_mode(rocblas_handle handle, rocblas_pointer_mode mode)
rocblas_status rocblas_create_handle(rocblas_handle *handle)
rocblas_status rocblas_destroy_handle(rocblas_handle handle)
rocblas_status rocblas_set_stream(rocblas_handle handle, hipStream_t stream_id)
rocblas_status rocblas_get_stream(rocblas_handle handle, hipStream_t *stream_id)
__global__ void copy_void_ptr_vector_kernel(rocblas_int n, rocblas_int elem_size, const void * x, rocblas_int incx, void * y, rocblas_int incy)
rocblas_status rocblas_set_vector(rocblas_int n, rocblas_int elem_size, const void *x_h, rocblas_int incx, void *y_d, rocblas_int incy)
rocblas_status rocblas_get_vector(rocblas_int n, rocblas_int elem_size, const void *x_d, rocblas_int incx, void *y_h, rocblas_int incy)
__global__ void copy_void_ptr_matrix_kernel(rocblas_int rows, rocblas_int cols, size_t elem_size, const void * a, rocblas_int lda, void * b, rocblas_int ldb)
rocblas_status rocblas_set_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a_h, rocblas_int lda, void *b_d, rocblas_int ldb)
rocblas_status rocblas_get_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a_d, rocblas_int lda, void *b_h, rocblas_int ldb)

Variables

constexpr size_t VEC_BUFF_MAX_BYTES = 1048576
constexpr rocblas_int NB_X = 256
constexpr size_t MAT_BUFF_MAX_BYTES = 1048576
constexpr rocblas_int MATRIX_DIM_X = 128
constexpr rocblas_int MATRIX_DIM_Y = 8

file status.cpp
#include <hip/hip_runtime_api.h>
#include "rocblas.h"
#include "status.h"

Functions

rocblas_status get_rocblas_status_for_hip_status(hipError_t status)

hipBLAS

Introduction

Please refer to the hipBLAS GitHub repository.

hipBLAS is a BLAS marshalling library, with multiple supported backends. It sits between the application and a ‘worker’ BLAS library, marshalling inputs into the backend library and marshalling results back to the application. hipBLAS exports an interface that does not require the client to change, regardless of the chosen backend. Currently, hipBLAS supports rocBLAS and cuBLAS as backends.

Installing pre-built packages

Download pre-built packages either from ROCm’s package servers or by clicking the github releases tab and manually downloading, which could be newer. Release notes are available for each release on the releases tab.

sudo apt update && sudo apt install hipblas

Quickstart hipBLAS build

Bash helper build script (Ubuntu only)

The root of this repository has a helper bash script install.sh to build and install hipBLAS on Ubuntu with a single command. It does not take a lot of options and hard-codes configuration that can be specified through invoking cmake directly, but it’s a great way to get started quickly and can serve as an example of how to build/install. A few commands in the script need sudo access, so it may prompt you for a password.

./install.sh -h -- shows help
./install.sh -id -- build library, build dependencies and install (-d flag only needs to be passed once on a system)

Manual build (all supported platforms)

If you use a distro other than Ubuntu, or would like more control over the build process, the hipblas build wiki has helpful information on how to configure cmake and manually build.

Build

Dependencies For Building Library

CMake 3.5 or later

The build infrastructure for hipBLAS is based on CMake v3.5. This is the version of cmake available on ROCm supported platforms. If you are on a headless machine without the X Window System, we recommend using ccmake; if you have access to X, we recommend using cmake-gui.

One-line cmake install commands:

Ubuntu: sudo apt install cmake-qt-gui
Fedora: sudo dnf install cmake-gui

Build Library Using Script (Ubuntu only)

The root of this repository has a helper bash script install.sh to build and install hipBLAS on Ubuntu with a single command. It does not take a lot of options and hard-codes configuration that can be specified through invoking cmake directly, but it’s a great way to get started quickly and can serve as an example of how to build/install. A few commands in the script need sudo access, so it may prompt you for a password.

./install.sh -h -- shows help
./install.sh -id -- build library, build dependencies and install (-d flag only needs to be passed once on a system)

Build Library Using Individual Commands

mkdir -p [HIPBLAS_BUILD_DIR]/release
cd [HIPBLAS_BUILD_DIR]/release
# Default install location is in /opt/rocm, define -DCMAKE_INSTALL_PREFIX=<path> to specify other
# Default build config is 'Release', define -DCMAKE_BUILD_TYPE=<config> to specify other
CXX=/opt/rocm/bin/hcc ccmake [HIPBLAS_SOURCE]
make -j$(nproc)
sudo make install # sudo required if installing into system directory such as /opt/rocm

Build Library + Tests + Benchmarks + Samples Using Individual Commands

The repository contains source for clients that serve as samples, tests and benchmarks. Clients source can be found in the clients subdir.

Dependencies (only necessary for hipBLAS clients)

The hipBLAS samples have no external dependencies, but our unit test and benchmarking applications do. These clients introduce the following dependencies:

  1. boost

  2. lapack
    • lapack itself brings a dependency on a fortran compiler

  3. googletest

Linux distros typically have an easy installation mechanism for boost through the native package manager.

Ubuntu: sudo apt install libboost-program-options-dev
Fedora: sudo dnf install boost-program-options

Unfortunately, googletest and lapack are not as easy to install. Many distros do not provide a googletest package with pre-compiled libraries, and the lapack packages do not have the necessary cmake config files for cmake to configure linking the cblas library. hipBLAS provides a cmake script that builds the above dependencies from source. This is an optional step; users can provide their own builds of these dependencies and help cmake find them by setting the CMAKE_PREFIX_PATH definition. The following is a sequence of steps to build dependencies and install them to the cmake default /usr/local.

(optional, one time only)

mkdir -p [HIPBLAS_BUILD_DIR]/release/deps
cd [HIPBLAS_BUILD_DIR]/release/deps
ccmake -DBUILD_BOOST=OFF [HIPBLAS_SOURCE]/deps   # assuming boost is installed through package manager as above
make -j$(nproc) install

Once dependencies are available on the system, it is possible to configure the clients to build. This requires a few extra cmake flags to the library cmake configure script. If the dependencies are not installed into system defaults (like /usr/local ), you should pass the CMAKE_PREFIX_PATH to cmake to help find them.

-DCMAKE_PREFIX_PATH="<semicolon separated paths>"

# Default install location is in /opt/rocm, use -DCMAKE_INSTALL_PREFIX=<path> to specify other
CXX=/opt/rocm/bin/hcc ccmake -DBUILD_CLIENTS_TESTS=ON -DBUILD_CLIENTS_BENCHMARKS=ON [HIPBLAS_SOURCE]
make -j$(nproc)
sudo make install   # sudo required if installing into system directory such as /opt/rocm

Common build problems

  • Issue: HIP (/opt/rocm/hip) was built using hcc 1.0.xxx-xxx-xxx-xxx, but you are using /opt/rocm/hcc/hcc with version 1.0.yyy-yyy-yyy-yyy from hipcc. (version does not match) . Please rebuild HIP including cmake or update HCC_HOME variable.

Solution: Download HIP from GitHub, build it from source with hcc, and then use the newly built HIP instead of the one in /opt/rocm/hip, or simply overwrite that location with the new build.

  • Issue: For Carrizo - HCC RUNTIME ERROR: Fail to find compatible kernel

Solution: Add the following to the cmake command when configuring: -DCMAKE_CXX_FLAGS="--amdgpu-target=gfx801"

  • Issue: For MI25 (Vega10 Server) - HCC RUNTIME ERROR: Fail to find compatible kernel

Solution: export HCC_AMDGPU_TARGET=gfx900

Running

Notice

This page assumes that hipBLAS with the client applications has been successfully built as described in Build hipBLAS libraries and verification code.

Samples

cd [BUILD_DIR]/clients/staging
./example-sscal

For example code that calls hipBLAS, see also the blog post Example C code calling hipBLAS routine.

Unit tests

Run tests with the following:

cd [BUILD_DIR]/clients/staging
./hipblas-test

To run specific tests, use --gtest_filter=match where match is a ':'-separated list of wildcard patterns (called the positive patterns) optionally followed by a '-' and another ':'-separated pattern list (called the negative patterns). For example, run gemv tests with the following:

cd [BUILD_DIR]/clients/staging
./hipblas-test --gtest_filter=*gemv*
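The filter rule can be hard to picture from the description alone, so here is a small sketch of how such a filter selects test names: a name is selected if it matches at least one positive pattern and no negative pattern, and an empty positive list is treated as '*'. This is a simplified re-implementation for illustration, not Google Test's own code; wildcard_match, matches_any and filter_selects are hypothetical names, and the test names used with it are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Match `name` against one pattern: '*' matches any run, '?' any single character.
inline bool wildcard_match(const char* pattern, const char* name)
{
    if (*pattern == '\0') return *name == '\0';
    if (*pattern == '*')
        return wildcard_match(pattern + 1, name) ||
               (*name != '\0' && wildcard_match(pattern, name + 1));
    return *name != '\0' && (*pattern == '?' || *pattern == *name) &&
           wildcard_match(pattern + 1, name + 1);
}

// True when `name` matches at least one pattern of a ':'-separated list.
inline bool matches_any(const std::string& list, const std::string& name)
{
    std::size_t begin = 0;
    while (begin <= list.size()) {
        std::size_t end = list.find(':', begin);
        if (end == std::string::npos) end = list.size();
        if (wildcard_match(list.substr(begin, end - begin).c_str(), name.c_str()))
            return true;
        begin = end + 1;
    }
    return false;
}

// Apply "positive[-negative]" filter semantics to a test name.
inline bool filter_selects(const std::string& filter, const std::string& name)
{
    assert(!name.empty());
    const std::size_t dash = filter.find('-');
    std::string positive = filter.substr(0, dash);
    if (positive.empty()) positive = "*"; // filter starts with '-': everything is positive
    if (!matches_any(positive, name)) return false;
    if (dash == std::string::npos) return true;
    return !matches_any(filter.substr(dash + 1), name);
}
```

Under these rules, a filter such as *gemv*-*double* would select a name containing gemv unless it also contains double.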

Functions supported

A list of exported functions from hipblas can be found on the wiki

Platform: rocBLAS or cuBLAS

hipBLAS is a marshalling library, so it runs with either rocBLAS or cuBLAS configured as the backend BLAS library, chosen at cmake configure time.

hipBLAS interface examples

The hipBLAS interface is compatible with rocBLAS and cuBLAS-v2 APIs. Porting a CUDA application which originally calls the cuBLAS API to an application calling hipBLAS API should be relatively straightforward. For example, the hipBLAS SGEMV interface is

GEMV API

hipblasStatus_t
hipblasSgemv( hipblasHandle_t handle,
             hipblasOperation_t trans,
             int m, int n, const float *alpha,
             const float *A, int lda,
             const float *x, int incx, const float *beta,
             float *y, int incy );
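For readers less familiar with the BLAS conventions behind these parameters, the following CPU sketch shows what the non-transposed GEMV computes, y = alpha*A*x + beta*y, with A stored column-major with leading dimension lda and the vectors accessed with positive increments incx and incy. sgemv_reference is a hypothetical helper for this example and is not part of hipBLAS; the handle and trans arguments are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for y = alpha*A*x + beta*y, A being m-by-n and column-major.
void sgemv_reference(int m, int n, float alpha,
                     const std::vector<float>& A, int lda,
                     const std::vector<float>& x, int incx,
                     float beta, std::vector<float>& y, int incy)
{
    assert(lda >= m && incx > 0 && incy > 0);
    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += A[i + static_cast<std::size_t>(j) * lda] *
                   x[static_cast<std::size_t>(j) * incx];
        float& yi = y[static_cast<std::size_t>(i) * incy];
        yi = alpha * acc + beta * yi;
    }
}
```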

Batched and strided GEMM API

hipBLAS GEMM can process matrices in batches with regular strides. There are several permutations of these APIs; the following example takes all of the batch and stride parameters.

hipblasStatus_t
hipblasSgemmStridedBatched( hipblasHandle_t handle,
             hipblasOperation_t transa, hipblasOperation_t transb,
             int m, int n, int k, const float *alpha,
             const float *A, int lda, long long bsa,
             const float *B, int ldb, long long bsb, const float *beta,
             float *C, int ldc, long long bsc,
             int batchCount);

hipBLAS assumes that matrices A and vectors x, y are already allocated in GPU memory and filled with data. Users are responsible for copying data between host and device memory.

rocRAND

The rocRAND project provides functions that generate pseudo-random and quasi-random numbers.

The rocRAND library is implemented in the HIP programming language and optimised for AMD’s latest discrete GPUs. It is designed to run on top of AMD’s Radeon Open Compute ROCm runtime, but it also works on CUDA enabled GPUs.

Additionally, the project includes a wrapper library called hipRAND, which allows users to easily port CUDA applications that use the cuRAND library to the HIP layer. In a ROCm environment hipRAND uses rocRAND; in a CUDA environment cuRAND is used instead.

Supported Random Number Generators

  • XORWOW

  • MRG32k3a

  • Mersenne Twister for Graphic Processors (MTGP32)

  • Philox (4x32, 10 rounds)

  • Sobol32

Requirements

  • Git

  • cmake (3.0.2 or later)

  • C++ compiler with C++11 support

  • For AMD platforms:
    • ROCm (1.7 or later)

    • HCC compiler, which must be set as C++ compiler on ROCm platform.

  • For CUDA platforms:
    • HIP (hcc is not required)

    • Latest CUDA SDK

Optional:

  • GTest (required only for tests; building tests is enabled by default)
    • Use GTEST_ROOT to specify GTest location (also see FindGTest)

    • Note: If GTest is not already installed, it will be automatically downloaded and built

  • TestU01 (required only for crush tests)
    • Use TESTU01_ROOT_DIR to specify TestU01 location

    • Note: If TestU01 is not already installed, it will be automatically downloaded and built

  • Fortran compiler (required only for Fortran wrapper)
    • gfortran is recommended.

  • Python 2.7+ or 3.5+ (required only for Python wrapper)

If some dependencies are missing, the cmake script automatically downloads, builds and installs them. Setting the DEPENDENCIES_FORCE_DOWNLOAD option to ON forces the script to download all dependencies instead of using system-installed libraries.

Build and Install

git clone https://github.com/ROCmSoftwarePlatform/rocRAND.git

# Go to rocRAND directory, create and go to build directory
cd rocRAND; mkdir build; cd build

# Configure rocRAND, setup options for your system
# Build options: BUILD_TEST, BUILD_BENCHMARK (off by default), BUILD_CRUSH_TEST (off by default)
#
# ! IMPORTANT !
# On ROCm platform set C++ compiler to HCC. You can do it by adding 'CXX=<path-to-hcc>' or just
# `CXX=hcc` before 'cmake', or setting cmake option 'CMAKE_CXX_COMPILER' to path to the HCC compiler.
#
[CXX=hcc] cmake -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
# Build
# For ROCM-1.6, if a HCC runtime error is caught, consider setting
# HCC_AMDGPU_TARGET=<arch> in front of make as a workaround
make -j4
# Optionally, run tests if they're enabled
ctest --output-on-failure
# Install
[sudo] make install

Note: An existing gtest library on the system (especially static gtest libraries built with other compilers) may cause build failures; if errors are encountered with an existing gtest library or other dependencies, the DEPENDENCIES_FORCE_DOWNLOAD flag can be passed to cmake, as mentioned before, to help solve the problem.

Note: To disable inline assembly optimisations in rocRAND (for both the host library and the device functions provided in rocrand_kernel.h) set cmake option ENABLE_INLINE_ASM to OFF.

Running Unit Tests

# Go to rocRAND build directory
cd rocRAND; cd build
# To run all tests
ctest
# To run unit tests
./test/<unit-test-name>

Running Benchmarks

# Go to rocRAND build directory
cd rocRAND; cd build
# To run benchmark for generate functions:
# engine -> all, xorwow, mrg32k3a, mtgp32, philox, sobol32
# distribution -> all, uniform-uint, uniform-float, uniform-double, normal-float, normal-double,
#                 log-normal-float, log-normal-double, poisson
# Further option can be found using --help
./benchmark/benchmark_rocrand_generate --engine <engine> --dis <distribution>
# To run benchmark for device kernel functions:
# engine -> all, xorwow, mrg32k3a, mtgp32, philox, sobol32
# distribution -> all, uniform-uint, uniform-float, uniform-double, normal-float, normal-double,
#                 log-normal-float, log-normal-double, poisson, discrete-poisson, discrete-custom
# further option can be found using --help
./benchmark/benchmark_rocrand_kernel --engine <engine> --dis <distribution>
# To compare against cuRAND (cuRAND must be supported):
./benchmark/benchmark_curand_generate --engine <engine> --dis <distribution>
./benchmark/benchmark_curand_kernel --engine <engine> --dis <distribution>

Running Statistical Tests

# Go to rocRAND build directory
cd rocRAND; cd build
# To run "crush" test, which verifies that generated pseudorandom
# numbers are of high quality:
# engine -> all, xorwow, mrg32k3a, mtgp32, philox
./test/crush_test_rocrand --engine <engine>
# To run Pearson Chi-squared and Anderson-Darling tests, which verify
# that distribution of random number agrees with the requested distribution:
# engine -> all, xorwow, mrg32k3a, mtgp32, philox, sobol32
# distribution -> all, uniform-float, uniform-double, normal-float, normal-double,
#                 log-normal-float, log-normal-double, poisson
./test/stat_test_rocrand_generate --engine <engine> --dis <distribution>

Documentation

# go to rocRAND doc directory
cd rocRAND; cd doc
# run doxygen
doxygen Doxyfile
# open html/index.html

Wrappers

Support

Bugs and feature requests can be reported through the issue tracker.

rocFFT

rocFFT is a software library for computing Fast Fourier Transforms (FFT) written in HIP. It is part of AMD’s software ecosystem based on ROCm. In addition to AMD GPU devices, the library can also be compiled with the CUDA compiler using HIP tools for running on Nvidia GPU devices.

The rocFFT library:

  • Provides a fast and accurate platform for calculating discrete FFTs.

  • Supports single and double precision floating point formats.

  • Supports 1D, 2D, and 3D transforms.

  • Supports computation of transforms in batches.

  • Supports real and complex FFTs.

  • Supports lengths that are any combination of powers of 2, 3, 5.
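A quick way to check whether a transform length falls into the last category is to divide out the factors 2, 3 and 5 and see whether anything is left. The sketch below does this; is_radix_235_length and next_radix_235_length are hypothetical helper names and are not part of the rocFFT API.

```cpp
#include <cassert>
#include <cstddef>

// True when n = 2^a * 3^b * 5^c for some non-negative integers a, b, c.
inline bool is_radix_235_length(std::size_t n)
{
    if (n == 0) return false;
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t r : radices)
        while (n % r == 0) n /= r;
    return n == 1;
}

// Smallest length >= n that is a product of powers of 2, 3 and 5.
inline std::size_t next_radix_235_length(std::size_t n)
{
    assert(n > 0);
    while (!is_radix_235_length(n)) ++n;
    return n;
}
```

For example, 360 = 2^3 * 3^2 * 5 qualifies, while 14 = 2 * 7 does not.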

API design

Please refer to the rocFFT API design for current documentation. Work in progress.

Installing pre-built packages

Download pre-built packages either from ROCm’s package servers or by clicking the github releases tab and manually downloading, which could be newer. Release notes are available for each release on the releases tab.

sudo apt update && sudo apt install rocfft

Quickstart rocFFT build

Bash helper build script (Ubuntu only)

The root of this repository has a helper bash script install.sh to build and install rocFFT on Ubuntu with a single command. It does not take a lot of options and hard-codes configuration that can be specified through invoking cmake directly, but it’s a great way to get started quickly and can serve as an example of how to build/install. A few commands in the script need sudo access, so it may prompt you for a password.

./install.sh -h -- shows help
./install.sh -id -- build library, build dependencies and install globally (-d flag only needs to be specified once on a system)
./install.sh -c --cuda -- build library and clients for cuda backend into a local directory

Manual build (all supported platforms)

If you use a distro other than Ubuntu, or would like more control over the build process, the rocfft build wiki has helpful information on how to configure cmake and manually build.

Library and API Documentation

Please refer to the Library documentation for current documentation.

Example

The following simple example shows how to use rocFFT to compute a 1D single-precision 16-point complex forward transform.

#include <iostream>
#include <vector>
#include "hip/hip_runtime_api.h"
#include "hip/hip_vector_types.h"
#include "rocfft.h"

int main()
{
        // rocFFT gpu compute
        // ========================================

        size_t N = 16;
        size_t Nbytes = N * sizeof(float2);

        // Create HIP device buffer
        float2 *x;
        hipMalloc(&x, Nbytes);

        // Initialize data
        std::vector<float2> cx(N);
        for (size_t i = 0; i < N; i++)
        {
                cx[i].x = 1;
                cx[i].y = -1;
        }

        //  Copy data to device
        hipMemcpy(x, cx.data(), Nbytes, hipMemcpyHostToDevice);

        // Create rocFFT plan
        rocfft_plan plan = NULL;
        size_t length = N;
        rocfft_plan_create(&plan,
                           rocfft_placement_inplace,
                           rocfft_transform_type_complex_forward,
                           rocfft_precision_single,
                           1,       // dimensions
                           &length, // lengths
                           1,       // number_of_transforms
                           NULL);   // description (not needed for a simple transform)

        // Execute plan
        rocfft_execute(plan, (void**) &x, NULL, NULL);

        // Wait for execution to finish
        hipDeviceSynchronize();

        // Destroy plan
        rocfft_plan_destroy(plan);

        // Copy result back to host
        std::vector<float2> y(N);
        hipMemcpy(y.data(), x, Nbytes, hipMemcpyDeviceToHost);

        // Print results
        for (size_t i = 0; i < N; i++)
        {
                std::cout << y[i].x << ", " << y[i].y << std::endl;
        }

        // Free device buffer
        hipFree(x);

        return 0;
}
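
To sanity-check the output of the example above, a naive O(N^2) discrete Fourier transform on the host can serve as a reference. The sketch below assumes the usual forward convention (negative exponent, no scaling); naive_dft_forward is a hypothetical helper and not part of rocFFT.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).
// Assumes an unscaled forward transform with a negative exponent.
std::vector<std::complex<double>>
    naive_dft_forward(const std::vector<std::complex<double>>& in)
{
    const std::size_t n  = in.size();
    const double      pi = std::acos(-1.0);

    std::vector<std::complex<double>> out(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        std::complex<double> sum(0.0, 0.0);
        for(std::size_t j = 0; j < n; ++j)
        {
            const double angle = -2.0 * pi * static_cast<double>(k * j)
                                 / static_cast<double>(n);
            sum += in[j] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}
```

For the constant input used in the example (every element equal to 1 - i, N = 16), this reference yields 16 - 16i in the first output element and values close to zero elsewhere, which is what the device result can be compared against.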

API

This section provides details of the library API.

Types

There are a few data structures that are internal to the library. The pointer types to these structures are given below. The user needs to use these types to create handles and pass them between different library functions.

typedef struct rocfft_plan_t *rocfft_plan

Pointer type to plan structure.

This type is used to declare a plan handle that can be initialized with rocfft_plan_create.

typedef struct rocfft_plan_description_t *rocfft_plan_description

Pointer type to plan description structure.

This type is used to declare a plan description handle that can be initialized with rocfft_plan_description_create.

typedef struct rocfft_execution_info_t *rocfft_execution_info

Pointer type to execution info structure.

This type is used to declare an execution info handle that can be initialized with rocfft_execution_info_create.

Library Setup and Cleanup

The following functions deal with initialization and cleanup of the library.

rocfft_status rocfft_setup()

Library setup function, called once in program before start of library use.

rocfft_status rocfft_cleanup()

Library cleanup function, called once in program after end of library use.

Plan

The following functions are used to create and destroy plan objects.

rocfft_status rocfft_plan_create(rocfft_plan *plan, rocfft_result_placement placement, rocfft_transform_type transform_type, rocfft_precision precision, size_t dimensions, const size_t *lengths, size_t number_of_transforms, const rocfft_plan_description description)

Create an FFT plan.

This API creates a plan, which the user can execute subsequently. This function takes many of the fundamental parameters needed to specify a transform. The dimensions parameter can take a value of 1, 2 or 3. The lengths array specifies the size of the data in each dimension. Note that lengths[0] is the size of the innermost dimension, lengths[1] is the next higher dimension, and so on. The number_of_transforms parameter specifies how many transforms (of the same kind) need to be computed. By specifying a value greater than 1, a batch of transforms can be computed with a single API call. Additionally, a handle to a plan description can be passed for more detailed transforms. For simple transforms, this parameter can be set to a null pointer.

Parameters
  • [out] plan: plan handle

  • [in] placement: placement of result

  • [in] transform_type: type of transform

  • [in] precision: precision

  • [in] dimensions: dimensions

  • [in] lengths: dimensions sized array of transform lengths

  • [in] number_of_transforms: number of transforms

  • [in] description: description handle created by rocfft_plan_description_create; can be null ptr for simple transforms

rocfft_status rocfft_plan_destroy(rocfft_plan plan)

Destroy an FFT plan.

This API frees the plan. This function destructs a plan after it is no longer needed.

Parameters
  • [in] plan: plan handle

The following functions are used to query for information after a plan is created.

rocfft_status rocfft_plan_get_work_buffer_size(const rocfft_plan plan, size_t *size_in_bytes)

Get work buffer size.

This is one of the plan query functions used to obtain information regarding a plan. This API gets the work buffer size.

Parameters
  • [in] plan: plan handle

  • [out] size_in_bytes: size of needed work buffer in bytes

rocfft_status rocfft_plan_get_print(const rocfft_plan plan)

Print all plan information.

This is one of the plan query functions used to obtain information regarding a plan. This API prints all plan information to stdout to help the user verify the plan specification.

Parameters
  • [in] plan: plan handle

Plan description

Most of the time, rocfft_plan_create() is all that is needed to fully specify a transform, and the description object can be skipped. When a transform specification needs more detail, a description object has to be created and set up, and its handle passed to rocfft_plan_create(). The functions below can be used to manage a plan description in order to specify more transform details. The plan description object can be safely destroyed after the call to rocfft_plan_create().

rocfft_status rocfft_plan_description_create(rocfft_plan_description *description)

Create plan description.

This API creates a plan description with which the user can set more plan properties.

Parameters
  • [out] description: plan description handle

rocfft_status rocfft_plan_description_destroy(rocfft_plan_description description)

Destroy a plan description.

This API frees the plan description.

Parameters
  • [in] description: plan description handle

rocfft_status rocfft_plan_description_set_data_layout(rocfft_plan_description description, rocfft_array_type in_array_type, rocfft_array_type out_array_type, const size_t *in_offsets, const size_t *out_offsets, size_t in_strides_size, const size_t *in_strides, size_t in_distance, size_t out_strides_size, const size_t *out_strides, size_t out_distance)

Set data layout.

This is one of the plan description functions used to specify optional additional plan properties through the description handle. This API specifies the layout of buffers. It can be used to specify input and output array types. Not all combinations of array types are supported, and an error code is returned for unsupported cases. Additionally, input and output buffer offsets can be specified. The function can also be used to specify a custom layout of data, with a stride between consecutive elements in each dimension and a distance between the start of the data of consecutive transforms. The library chooses appropriate defaults if offsets/strides are set to a null pointer and/or distances are set to 0.

Parameters
  • [in] description: description handle

  • [in] in_array_type: array type of input buffer

  • [in] out_array_type: array type of output buffer

  • [in] in_offsets: offsets, in element units, to start of data in input buffer

  • [in] out_offsets: offsets, in element units, to start of data in output buffer

  • [in] in_strides_size: size of in_strides array (must be equal to transform dimensions)

  • [in] in_strides: array of strides, in each dimension, of input buffer; if set to null ptr library chooses defaults

  • [in] in_distance: distance between start of each data instance in input buffer

  • [in] out_strides_size: size of out_strides array (must be equal to transform dimensions)

  • [in] out_strides: array of strides, in each dimension, of output buffer; if set to null ptr library chooses defaults

  • [in] out_distance: distance between start of each data instance in output buffer
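
The relationship between lengths, strides and distance can be sketched on the host as follows. The helper names are hypothetical, and the packed values computed here are only an assumption about what a contiguous default layout looks like (lengths[0] innermost, no padding); they are not taken from the library source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: strides, in element units, of a densely packed array
// whose innermost dimension is lengths[0].
std::vector<std::size_t> packed_strides(const std::vector<std::size_t>& lengths)
{
    std::vector<std::size_t> strides(lengths.size());
    std::size_t              step = 1;
    for(std::size_t d = 0; d < lengths.size(); ++d)
    {
        strides[d] = step;
        step *= lengths[d];
    }
    return strides;
}

// Hypothetical helper: distance between consecutive transforms when the
// batch is packed back to back (product of all lengths).
std::size_t packed_distance(const std::vector<std::size_t>& lengths)
{
    std::size_t total = 1;
    for(std::size_t len : lengths)
        total *= len;
    return total;
}

// Element offset of index[] inside transform number `batch`, for arbitrary
// strides and distance (all in element units).
std::size_t element_offset(const std::vector<std::size_t>& index,
                           const std::vector<std::size_t>& strides,
                           std::size_t                     distance,
                           std::size_t                     batch)
{
    std::size_t offset = batch * distance;
    for(std::size_t d = 0; d < index.size(); ++d)
        offset += index[d] * strides[d];
    return offset;
}
```

For a 2D transform with lengths {4, 3}, the packed strides are {1, 4} and the packed distance is 12. Padding each row to 6 elements would instead use strides {1, 6}, with a correspondingly larger distance.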

Execution

The following details the execution function. After a plan has been created, it can be used to compute a transform on specified data. Aspects of the execution can be controlled and any useful information returned to the user.

rocfft_status rocfft_execute(const rocfft_plan plan, void *in_buffer[], void *out_buffer[], rocfft_execution_info info)

Execute an FFT plan.

This API executes an FFT plan on buffers given by the user. If the transform is in-place, only the input buffer is needed and the output buffer parameter can be set to NULL. For not-in-place transforms, output buffers have to be specified. Note that both the input and output buffer parameters are arrays of pointers; this facilitates passing planar buffers, where real and imaginary parts are in 2 separate buffers. For the default interleaved format, just a unit-sized array holding the pointer to the input/output buffer needs to be passed. The final parameter in this function is an execution_info handle. This parameter serves as a way for the user to control execution, as well as for the library to pass any execution-related information back to the user.

Parameters
  • [in] plan: plan handle

  • [inout] in_buffer: array (of size 1 for interleaved data, of size 2 for planar data) of input buffers

  • [inout] out_buffer: array (of size 1 for interleaved data, of size 2 for planar data) of output buffers, can be nullptr for inplace result placement

  • [in] info: execution info handle created by rocfft_execution_info_create
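
The difference between the interleaved and planar layouts can be illustrated on the host. In the sketch below, PlanarBuffers and to_planar are hypothetical names; real buffers passed to rocfft_execute would be device allocations, which this host-only example does not create.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical container for the planar layout: real and imaginary parts
// live in two separate arrays.
struct PlanarBuffers
{
    std::vector<float> re;
    std::vector<float> im;
};

// Split interleaved data (re0, im0, re1, im1, ...) into planar form.
PlanarBuffers to_planar(const std::vector<std::complex<float>>& interleaved)
{
    PlanarBuffers planar;
    planar.re.reserve(interleaved.size());
    planar.im.reserve(interleaved.size());
    for(const std::complex<float>& value : interleaved)
    {
        planar.re.push_back(value.real());
        planar.im.push_back(value.imag());
    }
    return planar;
}
```

With interleaved data the buffer array has a single entry pointing at the complex data; with planar data it has two entries, one for the real parts and one for the imaginary parts, in the spirit of `void* buffers[2] = {re_ptr, im_ptr}`.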

Execution info

The execution API rocfft_execute() takes a rocfft_execution_info parameter. This parameter needs to be created and set up by the user and passed to the execution API. The execution info handle encapsulates information such as the execution mode, a pointer to any work buffer, etc. It can also hold information that is a side effect of execution, such as event objects. The following functions deal with managing the execution info object. Note that the set functions below need to be called before execution and the get functions after execution.

rocfft_status rocfft_execution_info_create(rocfft_execution_info *info)

Create execution info.

This API creates an execution info with which the user can control plan execution and retrieve execution information.

Parameters
  • [out] info: execution info handle

rocfft_status rocfft_execution_info_destroy(rocfft_execution_info info)

Destroy an execution info.

This API frees the execution info.

Parameters
  • [in] info: execution info handle

rocfft_status rocfft_execution_info_set_work_buffer(rocfft_execution_info info, void *work_buffer, size_t size_in_bytes)

Set work buffer in execution info.

This is one of the execution info functions used to specify optional additional information to control execution. This API specifies the work buffer needed. It has to be called before the call to rocfft_execute. When a non-zero value is obtained from rocfft_plan_get_work_buffer_size, the library needs a work buffer to compute the transform. In this case, the user has to allocate the work buffer and pass it to the library via this API.

Parameters
  • [in] info: execution info handle

  • [in] work_buffer: work buffer

  • [in] size_in_bytes: size of work buffer in bytes

rocfft_status rocfft_execution_info_set_stream(rocfft_execution_info info, void *stream)

Set stream in execution info.

This is one of the execution info functions to specify optional additional information to control execution. This API specifies compute stream. It has to be called before the call to rocfft_execute. It is the underlying device queue/stream where the library computations would be inserted. The library assumes user has created such a stream in the program and merely assigns work to the stream.

Parameters
  • [in] info: execution info handle

  • [in] stream: underlying compute stream

Enumerations

This section provides all the enumerations used.

enum rocfft_status

rocfft status/error codes

Values:

rocfft_status_success
rocfft_status_failure
rocfft_status_invalid_arg_value
rocfft_status_invalid_dimensions
rocfft_status_invalid_array_type
rocfft_status_invalid_strides
rocfft_status_invalid_distance
rocfft_status_invalid_offset

enum rocfft_transform_type

Type of transform.

Values:

rocfft_transform_type_complex_forward
rocfft_transform_type_complex_inverse
rocfft_transform_type_real_forward
rocfft_transform_type_real_inverse

enum rocfft_precision

Precision.

Values:

rocfft_precision_single
rocfft_precision_double

enum rocfft_result_placement

Result placement.

Values:

rocfft_placement_inplace
rocfft_placement_notinplace

enum rocfft_array_type

Array type.

Values:

rocfft_array_type_complex_interleaved
rocfft_array_type_complex_planar
rocfft_array_type_real
rocfft_array_type_hermitian_interleaved
rocfft_array_type_hermitian_planar

enum rocfft_execution_mode

Execution mode.

Values:

rocfft_exec_mode_nonblocking
rocfft_exec_mode_nonblocking_with_flush
rocfft_exec_mode_blocking

rocSPARSE

Introduction

rocSPARSE is a library that contains basic linear algebra subroutines for sparse matrices and vectors, written in HIP for GPU devices. It is designed to be used from C and C++ code.

The functionality of rocSPARSE is organized in the following categories:

The code is open and hosted here: https://github.com/ROCmSoftwarePlatform/rocSPARSE

Device and Stream Management

hipSetDevice() and hipGetDevice() are HIP device management APIs. They are NOT part of the rocSPARSE API.

Asynchronous Execution

All rocSPARSE library functions, unless otherwise stated, are non-blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. To force synchronization, hipDeviceSynchronize() or hipStreamSynchronize() can be used. This ensures that all previously executed rocSPARSE functions on the device, or on the particular stream, have completed.

HIP Device Management

Before a HIP kernel invocation, users call hipSetDevice() to select a device, e.g. device 1. If hipSetDevice() is not called explicitly, the system uses device 0 by default, and all HIP kernels are launched on device 0.

The above is a HIP (and CUDA) device management approach and has nothing to do with rocSPARSE. rocSPARSE honors the approach above and assumes users have already set the device before a rocSPARSE routine call.

Once users set the device, they create a handle with rocsparse_create_handle().

Subsequent rocSPARSE routines take this handle as an input parameter. rocSPARSE ONLY queries (by hipGetDevice()) the user’s device; rocSPARSE does NOT set the device for users. If rocSPARSE does not see a valid device, it returns an error message. It is the users’ responsibility to provide a valid device to rocSPARSE and to ensure device safety.

Users CANNOT switch devices between rocsparse_create_handle() and rocsparse_destroy_handle(). If users want to change device, they must destroy the current handle and create another rocSPARSE handle.

HIP Stream Management

HIP kernels are always launched in a queue (also known as stream).

If users do not explicitly specify a stream, the system provides a default stream, maintained by the system. Users cannot create or destroy the default stream. However, users can freely create new streams (with hipStreamCreate()) and bind them to the rocSPARSE handle. HIP kernels are invoked in rocSPARSE routines. The rocSPARSE handle is always associated with a stream, and rocSPARSE passes its stream to the kernels inside the routine. One rocSPARSE routine only takes one stream in a single invocation. If users create a stream, they are responsible for destroying it.

Multiple Streams and Multiple Devices

If the system under test has multiple HIP devices, users can run multiple rocSPARSE handles concurrently, but can NOT run a single rocSPARSE handle on different discrete devices. Each handle is associated with a particular singular device, and a new handle should be created for each additional device.

Building and Installing

Installing from AMD ROCm repositories

rocSPARSE can be installed from AMD ROCm repositories by

sudo apt install rocsparse

Building rocSPARSE from Open-Source repository

Download rocSPARSE

The rocSPARSE source code is available at the rocSPARSE GitHub page. Download the master branch using:

git clone -b master https://github.com/ROCmSoftwarePlatform/rocSPARSE.git
cd rocSPARSE

Note that if you want to contribute to rocSPARSE, you will need to check out the develop branch instead of the master branch.

Below are steps to build different packages of the library, including dependencies and clients. It is recommended to install rocSPARSE using the install.sh script.

Using install.sh to build dependencies + library

The following table lists common uses of install.sh to build dependencies + library.

Using install.sh to build dependencies + library + client

The client contains example code, unit tests and benchmarks. Common uses of install.sh to build them are listed in the table below.

Command

Description

./install.sh -h

Print help information.

./install.sh -dc

Build dependencies, library and client in your local directory. The -d flag only needs to be used once. For subsequent invocations of install.sh it is not necessary to rebuild the dependencies.

./install.sh -c

Build library and client in your local directory. It is assumed dependencies are available.

./install.sh -idc

Build library, dependencies and client, then build and install rocSPARSE package in /opt/rocm/rocsparse. You will be prompted for sudo access. This will install for all users.

./install.sh -ic

Build library and client, then build and install rocSPARSE package in /opt/rocm/rocsparse. You will be prompted for sudo access. This will install for all users.

Using individual commands to build rocSPARSE

CMake 3.5 or later is required in order to build rocSPARSE. The rocSPARSE library contains both host and device code; therefore, the HCC compiler must be specified during the cmake configuration process.

rocSPARSE can be built using the following commands:

# Create and change to build directory
mkdir -p build/release ; cd build/release

# Default install path is /opt/rocm, use -DCMAKE_INSTALL_PREFIX=<path> to adjust it
CXX=/opt/rocm/bin/hcc cmake ../..

# Compile rocSPARSE library
make -j$(nproc)

# Install rocSPARSE to /opt/rocm
sudo make install

Boost and GoogleTest are required in order to build the rocSPARSE client.

rocSPARSE with dependencies and client can be built using the following commands:

# Install boost on Ubuntu
sudo apt install libboost-program-options-dev
# Install boost on Fedora
sudo dnf install boost-program-options

# Install googletest
mkdir -p build/release/deps ; cd build/release/deps
cmake -DBUILD_BOOST=OFF ../../../deps
sudo make -j$(nproc) install

# Change to build directory
cd ..

# Configure rocSPARSE
# Build options:
#   BUILD_CLIENTS_TESTS      - build tests (OFF)
#   BUILD_CLIENTS_BENCHMARKS - build benchmarks (OFF)
#   BUILD_CLIENTS_SAMPLES    - build examples (ON)
#   BUILD_VERBOSE            - verbose output (OFF)
#   BUILD_SHARED_LIBS        - build rocSPARSE as a shared library (ON)

# Default install path is /opt/rocm, use -DCMAKE_INSTALL_PREFIX=<path> to adjust it
CXX=/opt/rocm/bin/hcc cmake ../.. -DBUILD_CLIENTS_TESTS=ON \
                                  -DBUILD_CLIENTS_BENCHMARKS=ON \
                                  -DBUILD_CLIENTS_SAMPLES=ON \
                                  -DBUILD_VERBOSE=OFF \
                                  -DBUILD_SHARED_LIBS=ON

# Compile rocSPARSE library
make -j$(nproc)

# Install rocSPARSE to /opt/rocm
sudo make install

Common build problems

  1. Issue: HIP (/opt/rocm/hip) was built using hcc 1.0.xxx-xxx-xxx-xxx, but you are using /opt/rocm/bin/hcc with version 1.0.yyy-yyy-yyy-yyy from hipcc (version mismatch). Please rebuild HIP including cmake or update HCC_HOME variable.

    Solution: Download HIP from github and use hcc to build from source and then use the built HIP instead of /opt/rocm/hip.

  2. Issue: For Carrizo - HCC RUNTIME ERROR: Failed to find compatible kernel

    Solution: Add the following to the cmake command when configuring: -DCMAKE_CXX_FLAGS="--amdgpu-target=gfx801"

  3. Issue: For MI25 (Vega10 Server) - HCC RUNTIME ERROR: Failed to find compatible kernel

    Solution: export HCC_AMDGPU_TARGET=gfx900

  4. Issue: Could not find a package configuration file provided by “ROCM” with any of the following names:

    ROCMConfig.cmake
    rocm-config.cmake

    Solution: Install ROCm cmake modules

Unit tests

To run unit tests, rocSPARSE has to be built with option -DBUILD_CLIENTS_TESTS=ON.

# Go to rocSPARSE build directory
cd rocSPARSE; cd build/release

# Run all tests
./clients/staging/rocsparse-test

Benchmarks

To run benchmarks, rocSPARSE has to be built with option -DBUILD_CLIENTS_BENCHMARKS=ON.

# Go to rocSPARSE build directory
cd rocSPARSE/build/release

# Run benchmark, e.g.
./clients/staging/rocsparse-bench -f hybmv --laplacian-dim 2000 -i 200

Storage Formats

COO storage format

The Coordinate (COO) storage format represents a \(m \times n\) matrix by

m

number of rows (integer).

n

number of columns (integer).

nnz

number of non-zero elements (integer).

coo_val

array of nnz elements containing the data (floating point).

coo_row_ind

array of nnz elements containing the row indices (integer).

coo_col_ind

array of nnz elements containing the column indices (integer).

The COO matrix is expected to be sorted by row indices and column indices per row. Furthermore, each pair of indices should appear only once. Consider the following \(3 \times 5\) matrix and the corresponding COO structures, with \(m = 3, n = 5\) and \(\text{nnz} = 8\) using zero based indexing:

\[\begin{split}A = \begin{pmatrix} 1.0 & 2.0 & 0.0 & 3.0 & 0.0 \\ 0.0 & 4.0 & 5.0 & 0.0 & 0.0 \\ 6.0 & 0.0 & 0.0 & 7.0 & 8.0 \\ \end{pmatrix}\end{split}\]

where

\[\begin{split}\begin{array}{ll} coo\_val[8] & = \{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0\} \\ coo\_row\_ind[8] & = \{0, 0, 0, 1, 1, 2, 2, 2\} \\ coo\_col\_ind[8] & = \{0, 1, 3, 1, 2, 0, 3, 4\} \end{array}\end{split}\]
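
The COO arrays above can be reproduced from the dense matrix with a short host-side sketch. CooMatrix and dense_to_coo are hypothetical names used only for illustration; they are not rocSPARSE types or functions. Scanning the dense matrix row by row yields entries sorted by row index and by column index within each row, as the format expects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side container mirroring the COO description above.
struct CooMatrix
{
    int                 m   = 0;
    int                 n   = 0;
    int                 nnz = 0;
    std::vector<double> coo_val;
    std::vector<int>    coo_row_ind;
    std::vector<int>    coo_col_ind;
};

// Build zero-based COO arrays from a dense row-major matrix.
CooMatrix dense_to_coo(const std::vector<std::vector<double>>& dense)
{
    CooMatrix coo;
    coo.m = static_cast<int>(dense.size());
    coo.n = dense.empty() ? 0 : static_cast<int>(dense[0].size());

    for(int i = 0; i < coo.m; ++i)
    {
        for(int j = 0; j < coo.n; ++j)
        {
            if(dense[i][j] != 0.0)
            {
                coo.coo_val.push_back(dense[i][j]);
                coo.coo_row_ind.push_back(i);
                coo.coo_col_ind.push_back(j);
            }
        }
    }
    coo.nnz = static_cast<int>(coo.coo_val.size());
    return coo;
}
```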

CSR storage format

The Compressed Sparse Row (CSR) storage format represents a \(m \times n\) matrix by

m

number of rows (integer).

n

number of columns (integer).

nnz

number of non-zero elements (integer).

csr_val

array of nnz elements containing the data (floating point).

csr_row_ptr

array of m+1 elements that point to the start of every row (integer).

csr_col_ind

array of nnz elements containing the column indices (integer).

The CSR matrix is expected to be sorted by column indices within each row. Furthermore, each pair of indices should appear only once. Consider the following \(3 \times 5\) matrix and the corresponding CSR structures, with \(m = 3, n = 5\) and \(\text{nnz} = 8\) using one based indexing:

\[\begin{split}A = \begin{pmatrix} 1.0 & 2.0 & 0.0 & 3.0 & 0.0 \\ 0.0 & 4.0 & 5.0 & 0.0 & 0.0 \\ 6.0 & 0.0 & 0.0 & 7.0 & 8.0 \\ \end{pmatrix}\end{split}\]

where

\[\begin{split}\begin{array}{ll} csr\_val[8] & = \{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0\} \\ csr\_row\_ptr[4] & = \{1, 4, 6, 9\} \\ csr\_col\_ind[8] & = \{1, 2, 4, 2, 3, 1, 4, 5\} \end{array}\end{split}\]
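
The csr_row_ptr array is the only structural difference from COO: it replaces the per-element row indices with one offset per row. The sketch below derives it from sorted COO row indices on the host; coo_to_csr_row_ptr is a hypothetical helper, and the base argument selects zero-based or one-based indexing for both input and output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: compress sorted COO row indices into a CSR row
// pointer array of m + 1 entries. `base` is 0 or 1 and applies to both the
// input row indices and the resulting row pointers.
std::vector<int> coo_to_csr_row_ptr(const std::vector<int>& coo_row_ind, int m, int base)
{
    std::vector<int> row_ptr(static_cast<std::size_t>(m) + 1, 0);

    // Count the entries of each row, stored one slot to the right
    for(int row : coo_row_ind)
        ++row_ptr[row - base + 1];

    // Prefix sum turns the counts into start offsets
    for(int i = 0; i < m; ++i)
        row_ptr[i + 1] += row_ptr[i];

    // Shift the offsets to the requested index base
    for(int& p : row_ptr)
        p += base;

    return row_ptr;
}
```

With the one-based row indices {1, 1, 1, 2, 2, 3, 3, 3} of the example matrix, this gives {1, 4, 6, 9}, matching csr_row_ptr above.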

ELL storage format

The Ellpack-Itpack (ELL) storage format represents a \(m \times n\) matrix by

m

number of rows (integer).

n

number of columns (integer).

ell_width

maximum number of non-zero elements per row (integer)

ell_val

array of m times ell_width elements containing the data (floating point).

ell_col_ind

array of m times ell_width elements containing the column indices (integer).

The ELL matrix is assumed to be stored in column-major format. Rows with fewer than ell_width non-zero elements are padded with zeros (ell_val) and \(-1\) (ell_col_ind). Consider the following \(3 \times 5\) matrix and the corresponding ELL structures, with \(m = 3, n = 5\) and \(\text{ell\_width} = 3\) using zero based indexing:

\[\begin{split}A = \begin{pmatrix} 1.0 & 2.0 & 0.0 & 3.0 & 0.0 \\ 0.0 & 4.0 & 5.0 & 0.0 & 0.0 \\ 6.0 & 0.0 & 0.0 & 7.0 & 8.0 \\ \end{pmatrix}\end{split}\]

where

\[\begin{split}\begin{array}{ll} ell\_val[9] & = \{1.0, 4.0, 6.0, 2.0, 5.0, 7.0, 3.0, 0.0, 8.0\} \\ ell\_col\_ind[9] & = \{0, 1, 0, 1, 2, 3, 3, -1, 4\} \end{array}\end{split}\]
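
The column-major placement and the padding rule can be made concrete with a host-side sketch. EllMatrix and dense_to_ell are hypothetical names, not rocSPARSE types or functions; the k-th non-zero of row i is stored at position k * m + i.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side container mirroring the ELL description above.
struct EllMatrix
{
    int                 m         = 0;
    int                 n         = 0;
    int                 ell_width = 0;
    std::vector<double> ell_val;
    std::vector<int>    ell_col_ind;
};

// Build zero-based, column-major ELL arrays from a dense row-major matrix.
EllMatrix dense_to_ell(const std::vector<std::vector<double>>& dense)
{
    EllMatrix ell;
    ell.m = static_cast<int>(dense.size());
    ell.n = dense.empty() ? 0 : static_cast<int>(dense[0].size());

    // ell_width is the maximum number of non-zeros found in any row
    for(const std::vector<double>& row : dense)
    {
        const int count = static_cast<int>(
            std::count_if(row.begin(), row.end(), [](double v) { return v != 0.0; }));
        ell.ell_width = std::max(ell.ell_width, count);
    }

    // Pre-fill with the padding values: 0 for data, -1 for column indices
    const std::size_t size = static_cast<std::size_t>(ell.m) * ell.ell_width;
    ell.ell_val.assign(size, 0.0);
    ell.ell_col_ind.assign(size, -1);

    for(int i = 0; i < ell.m; ++i)
    {
        int k = 0; // position of the next non-zero within row i
        for(int j = 0; j < ell.n; ++j)
        {
            if(dense[i][j] != 0.0)
            {
                const std::size_t idx = static_cast<std::size_t>(k) * ell.m + i;
                ell.ell_val[idx]      = dense[i][j];
                ell.ell_col_ind[idx]  = j;
                ++k;
            }
        }
    }
    return ell;
}
```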

HYB storage format

The Hybrid (HYB) storage format represents a \(m \times n\) matrix by

m

number of rows (integer).

n

number of columns (integer).

nnz

number of non-zero elements of the COO part (integer)

ell_width

maximum number of non-zero elements per row of the ELL part (integer)

ell_val

array of m times ell_width elements containing the ELL part data (floating point).

ell_col_ind

array of m times ell_width elements containing the ELL part column indices (integer).

coo_val

array of nnz elements containing the COO part data (floating point).

coo_row_ind

array of nnz elements containing the COO part row indices (integer).

coo_col_ind

array of nnz elements containing the COO part column indices (integer).

The HYB format is a combination of the ELL and COO sparse matrix formats. Typically, the regular part of the matrix is stored in ELL storage format, and the irregular part of the matrix is stored in COO storage format. Three different partitioning schemes can be applied when converting a CSR matrix to a matrix in HYB storage format. For further details on the partitioning schemes, see rocsparse_hyb_partition.
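
A minimal host-side sketch of the split, assuming a user-given ELL width: the first ell_width non-zeros of each row go to the ELL part and the remainder spills into the COO part. HybMatrix and csr_to_hyb_user are hypothetical names, and the padding values follow the ELL section above; the layout the library uses internally may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side container mirroring the HYB description above.
struct HybMatrix
{
    int                 m         = 0;
    int                 nnz       = 0; // non-zeros of the COO part
    int                 ell_width = 0;
    std::vector<double> ell_val;
    std::vector<int>    ell_col_ind;
    std::vector<double> coo_val;
    std::vector<int>    coo_row_ind;
    std::vector<int>    coo_col_ind;
};

// Split a zero-based CSR matrix into an ELL part of the given width and a
// COO part holding whatever does not fit.
HybMatrix csr_to_hyb_user(int                        m,
                          const std::vector<int>&    csr_row_ptr,
                          const std::vector<int>&    csr_col_ind,
                          const std::vector<double>& csr_val,
                          int                        ell_width)
{
    HybMatrix hyb;
    hyb.m         = m;
    hyb.ell_width = ell_width;

    const std::size_t size = static_cast<std::size_t>(m) * ell_width;
    hyb.ell_val.assign(size, 0.0);
    hyb.ell_col_ind.assign(size, -1);

    for(int i = 0; i < m; ++i)
    {
        int k = 0; // non-zeros of row i already placed in the ELL part
        for(int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p)
        {
            if(k < ell_width)
            {
                const std::size_t idx = static_cast<std::size_t>(k) * m + i;
                hyb.ell_val[idx]      = csr_val[p];
                hyb.ell_col_ind[idx]  = csr_col_ind[p];
                ++k;
            }
            else
            {
                hyb.coo_val.push_back(csr_val[p]);
                hyb.coo_row_ind.push_back(i);
                hyb.coo_col_ind.push_back(csr_col_ind[p]);
            }
        }
    }
    hyb.nnz = static_cast<int>(hyb.coo_val.size());
    return hyb;
}
```

Applied to the example matrix from the previous sections with an ELL width of 2, the entries 3.0 (row 0) and 8.0 (row 2) end up in the COO part and everything else in the ELL part.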

Types

rocsparse_handle

typedef struct _rocsparse_handle *rocsparse_handle

Handle to the rocSPARSE library context queue.

The rocSPARSE handle is a structure holding the rocSPARSE library context. It must be initialized using rocsparse_create_handle() and the returned handle must be passed to all subsequent library function calls. It should be destroyed at the end using rocsparse_destroy_handle().

rocsparse_mat_descr

typedef struct _rocsparse_mat_descr *rocsparse_mat_descr

Descriptor of the matrix.

The rocSPARSE matrix descriptor is a structure holding all properties of a matrix. It must be initialized using rocsparse_create_mat_descr() and the returned descriptor must be passed to all subsequent library calls that involve the matrix. It should be destroyed at the end using rocsparse_destroy_mat_descr().

rocsparse_mat_info

typedef struct _rocsparse_mat_info *rocsparse_mat_info

Info structure to hold all matrix meta data.

The rocSPARSE matrix info is a structure holding all matrix information that is gathered during analysis routines. It must be initialized using rocsparse_create_mat_info() and the returned info structure must be passed to all subsequent library calls that require additional matrix information. It should be destroyed at the end using rocsparse_destroy_mat_info().

rocsparse_hyb_mat

typedef struct _rocsparse_hyb_mat *rocsparse_hyb_mat

HYB matrix storage format.

The rocSPARSE HYB matrix structure holds the HYB matrix. It must be initialized using rocsparse_create_hyb_mat() and the returned HYB matrix must be passed to all subsequent library calls that involve the matrix. It should be destroyed at the end using rocsparse_destroy_hyb_mat().

rocsparse_action

enum rocsparse_action

Specify what the operation is performed on.

The rocsparse_action indicates whether the operation is performed on the full matrix, or only on the sparsity pattern of the matrix.

Values:

rocsparse_action_symbolic = 0

Operate only on indices.

rocsparse_action_numeric = 1

Operate on data and indices.

rocsparse_hyb_partition

enum rocsparse_hyb_partition

HYB matrix partitioning type.

The rocsparse_hyb_partition type indicates how the hybrid format partitioning between COO and ELL storage formats is performed.

Values:

rocsparse_hyb_partition_auto = 0

automatically decide on ELL nnz per row.

rocsparse_hyb_partition_user = 1

user given ELL nnz per row.

rocsparse_hyb_partition_max = 2

max ELL nnz per row, no COO part.

rocsparse_index_base

enum rocsparse_index_base

Specify the matrix index base.

The rocsparse_index_base indicates the index base of the indices. For a given rocsparse_mat_descr, the rocsparse_index_base can be set using rocsparse_set_mat_index_base(). The current rocsparse_index_base of a matrix can be obtained by rocsparse_get_mat_index_base().

Values:

rocsparse_index_base_zero = 0

zero based indexing.

rocsparse_index_base_one = 1

one based indexing.

rocsparse_matrix_type

enum rocsparse_matrix_type

Specify the matrix type.

The rocsparse_matrix_type indicates the type of a matrix. For a given rocsparse_mat_descr, the rocsparse_matrix_type can be set using rocsparse_set_mat_type(). The current rocsparse_matrix_type of a matrix can be obtained by rocsparse_get_mat_type().

Values:

rocsparse_matrix_type_general = 0

general matrix type.

rocsparse_matrix_type_symmetric = 1

symmetric matrix type.

rocsparse_matrix_type_hermitian = 2

hermitian matrix type.

rocsparse_matrix_type_triangular = 3

triangular matrix type.

rocsparse_fill_mode

enum rocsparse_fill_mode

Specify the matrix fill mode.

The rocsparse_fill_mode indicates whether the lower or the upper part is stored in a sparse triangular matrix. For a given rocsparse_mat_descr, the rocsparse_fill_mode can be set using rocsparse_set_mat_fill_mode(). The current rocsparse_fill_mode of a matrix can be obtained by rocsparse_get_mat_fill_mode().

Values:

rocsparse_fill_mode_lower = 0

lower triangular part is stored.

rocsparse_fill_mode_upper = 1

upper triangular part is stored.

rocsparse_diag_type

enum rocsparse_diag_type

Indicates if the diagonal entries are unity.

The rocsparse_diag_type indicates whether the diagonal entries of a matrix are unity or not. If rocsparse_diag_type_unit is specified, all present diagonal values will be ignored. For a given rocsparse_mat_descr, the rocsparse_diag_type can be set using rocsparse_set_mat_diag_type(). The current rocsparse_diag_type of a matrix can be obtained by rocsparse_get_mat_diag_type().

Values:

rocsparse_diag_type_non_unit = 0

diagonal entries are non-unity.

rocsparse_diag_type_unit = 1

diagonal entries are unity.

rocsparse_operation

enum rocsparse_operation

Specify whether the matrix is to be transposed or not.

The rocsparse_operation indicates the operation performed with the given matrix.

Values:

rocsparse_operation_none = 111

Operate with matrix.

rocsparse_operation_transpose = 112

Operate with transpose.

rocsparse_operation_conjugate_transpose = 113

Operate with conj. transpose.

rocsparse_pointer_mode

enum rocsparse_pointer_mode

Indicates if the pointer is device pointer or host pointer.

The rocsparse_pointer_mode indicates whether scalar values are passed by reference on the host or device. The rocsparse_pointer_mode can be changed by rocsparse_set_pointer_mode(). The currently used pointer mode can be obtained by rocsparse_get_pointer_mode().

Values:

rocsparse_pointer_mode_host = 0

scalar pointers are in host memory.

rocsparse_pointer_mode_device = 1

scalar pointers are in device memory.

rocsparse_analysis_policy

enum rocsparse_analysis_policy

Specify policy in analysis functions.

The rocsparse_analysis_policy specifies whether gathered analysis data should be re-used or not. If meta data from a previous analysis call, e.g. rocsparse_csrilu0_analysis(), is available, it can be re-used for subsequent calls to other analysis functions, e.g. rocsparse_csrsv_analysis(), which can greatly improve the performance of the analysis function.

Values:

rocsparse_analysis_policy_reuse = 0

try to re-use meta data.

rocsparse_analysis_policy_force = 1

force to re-build meta data.

rocsparse_solve_policy

enum rocsparse_solve_policy

Specify policy in triangular solvers and factorizations.

This is a placeholder.

Values:

rocsparse_solve_policy_auto = 0

automatically decide on level information.

rocsparse_layer_mode

enum rocsparse_layer_mode

Indicates which logging layers are active, using a bit mask.

The rocsparse_layer_mode bit mask indicates the logging characteristics.

Values:

rocsparse_layer_mode_none = 0x0

layer is not active.

rocsparse_layer_mode_log_trace = 0x1

layer is in logging mode.

rocsparse_layer_mode_log_bench = 0x2

layer is in benchmarking mode.

rocsparse_status

enum rocsparse_status

List of rocsparse status code definitions.

This is a list of the rocsparse_status types that are used by the rocSPARSE library.

Values:

rocsparse_status_success = 0

success.

rocsparse_status_invalid_handle = 1

handle not initialized, invalid or null.

rocsparse_status_not_implemented = 2

function is not implemented.

rocsparse_status_invalid_pointer = 3

invalid pointer parameter.

rocsparse_status_invalid_size = 4

invalid size parameter.

rocsparse_status_memory_error = 5

failed memory allocation, copy, dealloc.

rocsparse_status_internal_error = 6

other internal library failure.

rocsparse_status_invalid_value = 7

invalid value parameter.

rocsparse_status_arch_mismatch = 8

device arch is not supported.

rocsparse_status_zero_pivot = 9

encountered zero pivot.
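
The status values listed above can be turned into readable messages when checking return codes. The following is a minimal sketch: status_description is a hypothetical helper, not part of rocSPARSE, and it takes the numeric code as a plain int so that the example compiles without the rocSPARSE headers.

```c
#include <stddef.h>

/* Descriptions copied from the rocsparse_status value list, indexed by code. */
static const char* const status_text[] = {
    "success.",
    "handle not initialized, invalid or null.",
    "function is not implemented.",
    "invalid pointer parameter.",
    "invalid size parameter.",
    "failed memory allocation, copy, dealloc.",
    "other internal library failure.",
    "invalid value parameter.",
    "device arch is not supported.",
    "encountered zero pivot."
};

/* Hypothetical helper: map a numeric status code (0..9) to its description. */
const char* status_description(int code)
{
    size_t count = sizeof(status_text) / sizeof(status_text[0]);

    if(code < 0 || (size_t)code >= count)
    {
        return "unknown status code";
    }

    return status_text[code];
}
```

In an application, the value returned by a rocSPARSE call could be cast to int and passed to such a helper before being logged.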

Logging

Three different environment variables can be set to enable logging in rocSPARSE: ROCSPARSE_LAYER, ROCSPARSE_LOG_TRACE_PATH and ROCSPARSE_LOG_BENCH_PATH.

ROCSPARSE_LAYER is a bit mask, where several logging modes can be combined as follows:

ROCSPARSE_LAYER unset

logging is disabled.

ROCSPARSE_LAYER set to 1

trace logging is enabled.

ROCSPARSE_LAYER set to 2

bench logging is enabled.

ROCSPARSE_LAYER set to 3

trace logging and bench logging are enabled.

When logging is enabled, each rocSPARSE function call will write the function name as well as function arguments to the logging stream. The default logging stream is stderr.

If the user sets the environment variable ROCSPARSE_LOG_TRACE_PATH to the full path name for a file, the file is opened and trace logging is streamed to that file. If the user sets the environment variable ROCSPARSE_LOG_BENCH_PATH to the full path name for a file, the file is opened and bench logging is streamed to that file. If the file cannot be opened, logging output is streamed to stderr.

Note that performance will degrade when logging is enabled. By default, the environment variable ROCSPARSE_LAYER is unset and logging is disabled.
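
The bit mask semantics of ROCSPARSE_LAYER can be illustrated with a small host-side sketch. The helper names below (parse_layer_mask, trace_enabled, bench_enabled) are hypothetical and only mirror the table above; how the library itself parses the variable is not described here, and decimal parsing with strtol is an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Bit values as listed for rocsparse_layer_mode. */
enum
{
    LAYER_LOG_TRACE = 0x1,
    LAYER_LOG_BENCH = 0x2
};

/* Hypothetical helper: convert the text of ROCSPARSE_LAYER into a bit mask.
   A NULL pointer (variable unset) yields 0, i.e. logging is disabled. */
int parse_layer_mask(const char* text)
{
    return text ? (int)strtol(text, NULL, 10) : 0;
}

/* True if the trace logging bit is set in the mask. */
bool trace_enabled(int mask)
{
    return (mask & LAYER_LOG_TRACE) != 0;
}

/* True if the bench logging bit is set in the mask. */
bool bench_enabled(int mask)
{
    return (mask & LAYER_LOG_BENCH) != 0;
}
```

A caller could pass the result of getenv("ROCSPARSE_LAYER") directly to parse_layer_mask, since getenv returns NULL when the variable is unset.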

Sparse Auxiliary Functions

This module holds all sparse auxiliary functions.

The auxiliary module contains all helper functions that are required for subsequent library calls.

rocsparse_create_handle()

rocsparse_status rocsparse_create_handle(rocsparse_handle *handle)

Create a rocsparse handle.

rocsparse_create_handle creates the rocSPARSE library context. The handle must be created before any other rocSPARSE API function is invoked and must be passed to all subsequent library function calls. The handle should be destroyed at the end using rocsparse_destroy_handle().

Parameters
  • [out] handle: the pointer to the handle to the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the initialization succeeded.

  • rocsparse_status_invalid_handle: handle pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_destroy_handle()

rocsparse_status rocsparse_destroy_handle(rocsparse_handle handle)

Destroy a rocsparse handle.

rocsparse_destroy_handle destroys the rocSPARSE library context and releases all resources used by the rocSPARSE library.

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_set_stream()

rocsparse_status rocsparse_set_stream(rocsparse_handle handle, hipStream_t stream)

Specify a user-defined HIP stream.

rocsparse_set_stream specifies the stream to be used by the rocSPARSE library context and all subsequent function calls.

Example

This example illustrates how a user-defined stream can be used in rocSPARSE.

// Create rocSPARSE handle
rocsparse_handle handle;
rocsparse_create_handle(&handle);

// Create stream
hipStream_t stream;
hipStreamCreate(&stream);

// Set stream to rocSPARSE handle
rocsparse_set_stream(handle, stream);

// Do some work
// ...

// Clean up
rocsparse_destroy_handle(handle);
hipStreamDestroy(stream);

Parameters
  • [inout] handle: the handle to the rocSPARSE library context.

  • [in] stream: the stream to be used by the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

rocsparse_get_stream()

rocsparse_status rocsparse_get_stream(rocsparse_handle handle, hipStream_t *stream)

Get current stream from library context.

rocsparse_get_stream gets the rocSPARSE library context stream which is currently used for all subsequent function calls.

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

  • [out] stream: the stream currently used by the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

rocsparse_set_pointer_mode()

rocsparse_status rocsparse_set_pointer_mode(rocsparse_handle handle, rocsparse_pointer_mode pointer_mode)

Specify pointer mode.

rocsparse_set_pointer_mode specifies the pointer mode to be used by the rocSPARSE library context and all subsequent function calls. By default, all values are passed by reference on the host. Valid pointer modes are rocsparse_pointer_mode_host or rocsparse_pointer_mode_device.

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

  • [in] pointer_mode: the pointer mode to be used by the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

rocsparse_get_pointer_mode()

rocsparse_status rocsparse_get_pointer_mode(rocsparse_handle handle, rocsparse_pointer_mode *pointer_mode)

Get current pointer mode from library context.

rocsparse_get_pointer_mode gets the rocSPARSE library context pointer mode which is currently used for all subsequent function calls.

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

  • [out] pointer_mode: the pointer mode that is currently used by the rocSPARSE library context.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

rocsparse_get_version()

rocsparse_status rocsparse_get_version(rocsparse_handle handle, int *version)

Get rocSPARSE version.

rocsparse_get_version gets the rocSPARSE library version number.

  • patch = version % 100

  • minor = version / 100 % 1000

  • major = version / 100000

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

  • [out] version: the version number of the rocSPARSE library.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.
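
The decomposition of the version number given above can be sketched as follows. The struct and function names (version_parts, decode_version) are hypothetical and used only for illustration; the arithmetic is taken directly from the three formulas.

```c
/* Components of a packed rocSPARSE version number. */
struct version_parts
{
    int major;
    int minor;
    int patch;
};

/* Hypothetical helper: split the value written by rocsparse_get_version()
   into major, minor and patch using the formulas from the text. */
struct version_parts decode_version(int version)
{
    struct version_parts v;

    v.patch = version % 100;
    v.minor = version / 100 % 1000;
    v.major = version / 100000;

    return v;
}
```

For example, a version value of 100203 decodes to major 1, minor 2 and patch 3.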

rocsparse_get_git_rev()

rocsparse_status rocsparse_get_git_rev(rocsparse_handle handle, char *rev)

Get rocSPARSE git revision.

rocsparse_get_git_rev gets the rocSPARSE library git commit revision (SHA-1).

Parameters
  • [in] handle: the handle to the rocSPARSE library context.

  • [out] rev: the git commit revision (SHA-1).

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: handle is invalid.

rocsparse_create_mat_descr()

rocsparse_status rocsparse_create_mat_descr(rocsparse_mat_descr *descr)

Create a matrix descriptor.

rocsparse_create_mat_descr creates a matrix descriptor. It initializes rocsparse_matrix_type to rocsparse_matrix_type_general and rocsparse_index_base to rocsparse_index_base_zero. It should be destroyed at the end using rocsparse_destroy_mat_descr().

Parameters
  • [out] descr: the pointer to the matrix descriptor.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr pointer is invalid.

rocsparse_destroy_mat_descr()

rocsparse_status rocsparse_destroy_mat_descr(rocsparse_mat_descr descr)

Destroy a matrix descriptor.

rocsparse_destroy_mat_descr destroys a matrix descriptor and releases all resources used by the descriptor.

Parameters
  • [in] descr: the matrix descriptor.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr is invalid.

rocsparse_copy_mat_descr()

rocsparse_status rocsparse_copy_mat_descr(rocsparse_mat_descr dest, const rocsparse_mat_descr src)

Copy a matrix descriptor.

rocsparse_copy_mat_descr copies a matrix descriptor. Both the source and destination matrix descriptors must be initialized prior to calling rocsparse_copy_mat_descr.

Parameters
  • [out] dest: the pointer to the destination matrix descriptor.

  • [in] src: the pointer to the source matrix descriptor.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: src or dest pointer is invalid.

rocsparse_set_mat_index_base()

rocsparse_status rocsparse_set_mat_index_base(rocsparse_mat_descr descr, rocsparse_index_base base)

Specify the index base of a matrix descriptor.

rocsparse_set_mat_index_base sets the index base of a matrix descriptor. Valid options are rocsparse_index_base_zero or rocsparse_index_base_one.

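Parameters
  • [inout] descr: the matrix descriptor.

  • [in] base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value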
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr pointer is invalid.

  • rocsparse_status_invalid_value: base is invalid.

rocsparse_get_mat_index_base()

rocsparse_index_base rocsparse_get_mat_index_base(const rocsparse_mat_descr descr)

Get the index base of a matrix descriptor.

rocsparse_get_mat_index_base returns the index base of a matrix descriptor.

Return

rocsparse_index_base_zero or rocsparse_index_base_one.

Parameters
  • [in] descr: the matrix descriptor.

rocsparse_set_mat_type()

rocsparse_status rocsparse_set_mat_type(rocsparse_mat_descr descr, rocsparse_matrix_type type)

Specify the matrix type of a matrix descriptor.

rocsparse_set_mat_type sets the matrix type of a matrix descriptor. Valid matrix types are rocsparse_matrix_type_general, rocsparse_matrix_type_symmetric, rocsparse_matrix_type_hermitian or rocsparse_matrix_type_triangular.

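Parameters
  • [inout] descr: the matrix descriptor.

  • [in] type: rocsparse_matrix_type_general, rocsparse_matrix_type_symmetric, rocsparse_matrix_type_hermitian or rocsparse_matrix_type_triangular.

Return Value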
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr pointer is invalid.

  • rocsparse_status_invalid_value: type is invalid.

rocsparse_get_mat_type()

rocsparse_matrix_type rocsparse_get_mat_type(const rocsparse_mat_descr descr)

Get the matrix type of a matrix descriptor.

rocsparse_get_mat_type returns the matrix type of a matrix descriptor.

Return

rocsparse_matrix_type_general, rocsparse_matrix_type_symmetric, rocsparse_matrix_type_hermitian or rocsparse_matrix_type_triangular.

Parameters
  • [in] descr: the matrix descriptor.

rocsparse_set_mat_fill_mode()

rocsparse_status rocsparse_set_mat_fill_mode(rocsparse_mat_descr descr, rocsparse_fill_mode fill_mode)

Specify the matrix fill mode of a matrix descriptor.

rocsparse_set_mat_fill_mode sets the matrix fill mode of a matrix descriptor. Valid fill modes are rocsparse_fill_mode_lower or rocsparse_fill_mode_upper.

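Parameters
  • [inout] descr: the matrix descriptor.

  • [in] fill_mode: rocsparse_fill_mode_lower or rocsparse_fill_mode_upper.

Return Value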
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr pointer is invalid.

  • rocsparse_status_invalid_value: fill_mode is invalid.

rocsparse_get_mat_fill_mode()

rocsparse_fill_mode rocsparse_get_mat_fill_mode(const rocsparse_mat_descr descr)

Get the matrix fill mode of a matrix descriptor.

rocsparse_get_mat_fill_mode returns the matrix fill mode of a matrix descriptor.

Return

rocsparse_fill_mode_lower or rocsparse_fill_mode_upper.

Parameters
  • [in] descr: the matrix descriptor.

rocsparse_set_mat_diag_type()

rocsparse_status rocsparse_set_mat_diag_type(rocsparse_mat_descr descr, rocsparse_diag_type diag_type)

Specify the matrix diagonal type of a matrix descriptor.

rocsparse_set_mat_diag_type sets the matrix diagonal type of a matrix descriptor. Valid diagonal types are rocsparse_diag_type_unit or rocsparse_diag_type_non_unit.

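Parameters
  • [inout] descr: the matrix descriptor.

  • [in] diag_type: rocsparse_diag_type_unit or rocsparse_diag_type_non_unit.

Return Value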
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: descr pointer is invalid.

  • rocsparse_status_invalid_value: diag_type is invalid.

rocsparse_get_mat_diag_type()

rocsparse_diag_type rocsparse_get_mat_diag_type(const rocsparse_mat_descr descr)

Get the matrix diagonal type of a matrix descriptor.

rocsparse_get_mat_diag_type returns the matrix diagonal type of a matrix descriptor.

Return

rocsparse_diag_type_unit or rocsparse_diag_type_non_unit.

Parameters
  • [in] descr: the matrix descriptor.

rocsparse_create_hyb_mat()

rocsparse_status rocsparse_create_hyb_mat(rocsparse_hyb_mat *hyb)

Create a HYB matrix structure.

rocsparse_create_hyb_mat creates a structure that holds the matrix in HYB storage format. It should be destroyed at the end using rocsparse_destroy_hyb_mat().

Parameters
  • [inout] hyb: the pointer to the hybrid matrix.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: hyb pointer is invalid.

rocsparse_destroy_hyb_mat()

rocsparse_status rocsparse_destroy_hyb_mat(rocsparse_hyb_mat hyb)

Destroy a HYB matrix structure.

rocsparse_destroy_hyb_mat destroys a HYB structure.

Parameters
  • [in] hyb: the hybrid matrix structure.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: hyb pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_create_mat_info()

rocsparse_status rocsparse_create_mat_info(rocsparse_mat_info *info)

Create a matrix info structure.

rocsparse_create_mat_info creates a structure that holds the matrix info data gathered by the available analysis routines. It should be destroyed at the end using rocsparse_destroy_mat_info().

Parameters
  • [inout] info: the pointer to the info structure.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: info pointer is invalid.

rocsparse_destroy_mat_info()

rocsparse_status rocsparse_destroy_mat_info(rocsparse_mat_info info)

Destroy a matrix info structure.

rocsparse_destroy_mat_info destroys a matrix info structure.

Parameters
  • [in] info: the info structure.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_pointer: info pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

Sparse Level 1 Functions

The sparse level 1 routines describe operations between a vector in sparse format and a vector in dense format. This section describes all rocSPARSE level 1 sparse linear algebra functions.

rocsparse_axpyi()

rocsparse_status rocsparse_saxpyi(rocsparse_handle handle, rocsparse_int nnz, const float *alpha, const float *x_val, const rocsparse_int *x_ind, float *y, rocsparse_index_base idx_base)
rocsparse_status rocsparse_daxpyi(rocsparse_handle handle, rocsparse_int nnz, const double *alpha, const double *x_val, const rocsparse_int *x_ind, double *y, rocsparse_index_base idx_base)

Scale a sparse vector and add it to a dense vector.

rocsparse_axpyi multiplies the sparse vector \(x\) with scalar \(\alpha\) and adds the result to the dense vector \(y\), such that

\[ y := y + \alpha \cdot x \]

for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] = y[x_ind[i]] + alpha * x_val[i];
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of vector \(x\).

  • [in] alpha: scalar \(\alpha\).

  • [in] x_val: array of nnz elements containing the values of \(x\).

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [inout] y: array of values in dense format.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: alpha, x_val, x_ind or y pointer is invalid.
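
The loop shown for rocsparse_axpyi omits the index base. A minimal host-side reference, assuming idx_base is passed as a plain integer offset (0 for rocsparse_index_base_zero, 1 for rocsparse_index_base_one) and alpha is passed by value, might look as follows. axpyi_ref is a hypothetical name; the function is not the library's GPU implementation and runs synchronously on the host, which makes it suitable for checking results.

```c
/* Host-side reference for y := y + alpha * x with a sparse x.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void axpyi_ref(int nnz, float alpha, const float* x_val, const int* x_ind,
               float* y, int idx_base)
{
    for(int i = 0; i < nnz; ++i)
    {
        /* Shift the stored index to a zero-based position in y. */
        y[x_ind[i] - idx_base] += alpha * x_val[i];
    }
}
```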

rocsparse_doti()

rocsparse_status rocsparse_sdoti(rocsparse_handle handle, rocsparse_int nnz, const float *x_val, const rocsparse_int *x_ind, const float *y, float *result, rocsparse_index_base idx_base)
rocsparse_status rocsparse_ddoti(rocsparse_handle handle, rocsparse_int nnz, const double *x_val, const rocsparse_int *x_ind, const double *y, double *result, rocsparse_index_base idx_base)

Compute the dot product of a sparse vector with a dense vector.

rocsparse_doti computes the dot product of the sparse vector \(x\) with the dense vector \(y\), such that

\[ \text{result} := y^T x \]

for(i = 0; i < nnz; ++i)
{
    result += x_val[i] * y[x_ind[i]];
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of vector \(x\).

  • [in] x_val: array of nnz values.

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [in] y: array of values in dense format.

  • [out] result: pointer to the result, can be host or device memory.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: x_val, x_ind, y or result pointer is invalid.

  • rocsparse_status_memory_error: the buffer for the dot product reduction could not be allocated.

  • rocsparse_status_internal_error: an internal error occurred.
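
The dot product loop shown for rocsparse_doti accumulates into result, which therefore has to start at zero. A minimal host-side sketch, assuming idx_base is a plain integer offset (0 or 1) and using the hypothetical name doti_ref, could be written as follows. It is intended for verifying results and is not the library's implementation.

```c
/* Host-side reference for result := y^T x with a sparse x.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
float doti_ref(int nnz, const float* x_val, const int* x_ind,
               const float* y, int idx_base)
{
    float result = 0.0f; /* accumulator starts at zero */

    for(int i = 0; i < nnz; ++i)
    {
        result += x_val[i] * y[x_ind[i] - idx_base];
    }

    return result;
}
```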

rocsparse_gthr()

rocsparse_status rocsparse_sgthr(rocsparse_handle handle, rocsparse_int nnz, const float *y, float *x_val, const rocsparse_int *x_ind, rocsparse_index_base idx_base)
rocsparse_status rocsparse_dgthr(rocsparse_handle handle, rocsparse_int nnz, const double *y, double *x_val, const rocsparse_int *x_ind, rocsparse_index_base idx_base)

Gather elements from a dense vector and store them into a sparse vector.

rocsparse_gthr gathers the elements that are listed in x_ind from the dense vector \(y\) and stores them in the sparse vector \(x\).

for(i = 0; i < nnz; ++i)
{
    x_val[i] = y[x_ind[i]];
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of \(x\).

  • [in] y: array of values in dense format.

  • [out] x_val: array of nnz elements containing the values of \(x\).

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: y, x_val or x_ind pointer is invalid.
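
A host-side version of the gather loop, including the index base that the loop leaves out, is sketched below. gthr_ref is a hypothetical name, and idx_base is assumed to be a plain integer offset (0 for zero-based, 1 for one-based indices); this is an illustration, not the library's implementation.

```c
/* Host-side reference: gather the entries of y listed in x_ind into x_val.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void gthr_ref(int nnz, const float* y, float* x_val, const int* x_ind,
              int idx_base)
{
    for(int i = 0; i < nnz; ++i)
    {
        x_val[i] = y[x_ind[i] - idx_base];
    }
}
```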

rocsparse_gthrz()

rocsparse_status rocsparse_sgthrz(rocsparse_handle handle, rocsparse_int nnz, float *y, float *x_val, const rocsparse_int *x_ind, rocsparse_index_base idx_base)
rocsparse_status rocsparse_dgthrz(rocsparse_handle handle, rocsparse_int nnz, double *y, double *x_val, const rocsparse_int *x_ind, rocsparse_index_base idx_base)

Gather and zero out elements from a dense vector and store them into a sparse vector.

rocsparse_gthrz gathers the elements that are listed in x_ind from the dense vector \(y\) and stores them in the sparse vector \(x\). The gathered elements in \(y\) are replaced by zero.

for(i = 0; i < nnz; ++i)
{
    x_val[i]    = y[x_ind[i]];
    y[x_ind[i]] = 0;
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of \(x\).

  • [inout] y: array of values in dense format.

  • [out] x_val: array of nnz elements containing the non-zero values of \(x\).

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: y, x_val or x_ind pointer is invalid.
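
The gather-and-zero behaviour of rocsparse_gthrz can be checked against a small host-side sketch such as the one below. gthrz_ref is a hypothetical name, and idx_base is assumed to be a plain integer offset (0 or 1); the function mirrors the loop shown for rocsparse_gthrz and is not the library's implementation.

```c
/* Host-side reference: gather the entries of y listed in x_ind into x_val
   and replace the gathered entries of y by zero.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void gthrz_ref(int nnz, float* y, float* x_val, const int* x_ind,
               int idx_base)
{
    for(int i = 0; i < nnz; ++i)
    {
        int k = x_ind[i] - idx_base;

        x_val[i] = y[k];
        y[k]     = 0.0f;
    }
}
```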

rocsparse_roti()

rocsparse_status rocsparse_sroti(rocsparse_handle handle, rocsparse_int nnz, float *x_val, const rocsparse_int *x_ind, float *y, const float *c, const float *s, rocsparse_index_base idx_base)
rocsparse_status rocsparse_droti(rocsparse_handle handle, rocsparse_int nnz, double *x_val, const rocsparse_int *x_ind, double *y, const double *c, const double *s, rocsparse_index_base idx_base)

Apply Givens rotation to a dense and a sparse vector.

rocsparse_roti applies the Givens rotation matrix \(G\) to the sparse vector \(x\) and the dense vector \(y\), where

\[\begin{split} G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \end{split}\]

for(i = 0; i < nnz; ++i)
{
    x_tmp = x_val[i];
    y_tmp = y[x_ind[i]];

    x_val[i]    = c * x_tmp + s * y_tmp;
    y[x_ind[i]] = c * y_tmp - s * x_tmp;
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of \(x\).

  • [inout] x_val: array of nnz elements containing the non-zero values of \(x\).

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [inout] y: array of values in dense format.

  • [in] c: pointer to the cosine element of \(G\), can be on host or device.

  • [in] s: pointer to the sine element of \(G\), can be on host or device.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: c, s, x_val, x_ind or y pointer is invalid.
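
The rotation can be reproduced on the host with the sketch below. roti_ref is a hypothetical name; c and s are passed by value and idx_base is assumed to be a plain integer offset (0 or 1). The temporaries matter: both updated entries must be computed from the old values of x and y.

```c
/* Host-side reference for applying the Givens rotation G = (c s; -s c)
   to a sparse vector x and a dense vector y.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void roti_ref(int nnz, float* x_val, const int* x_ind, float* y,
              float c, float s, int idx_base)
{
    for(int i = 0; i < nnz; ++i)
    {
        int   k     = x_ind[i] - idx_base;
        float x_tmp = x_val[i]; /* old value of x */
        float y_tmp = y[k];     /* old value of y */

        x_val[i] = c * x_tmp + s * y_tmp;
        y[k]     = c * y_tmp - s * x_tmp;
    }
}
```

With c = 0 and s = 1, each pair is mapped from (x, y) to (y, -x), which gives a quick sanity check.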

rocsparse_sctr()

rocsparse_status rocsparse_ssctr(rocsparse_handle handle, rocsparse_int nnz, const float *x_val, const rocsparse_int *x_ind, float *y, rocsparse_index_base idx_base)
rocsparse_status rocsparse_dsctr(rocsparse_handle handle, rocsparse_int nnz, const double *x_val, const rocsparse_int *x_ind, double *y, rocsparse_index_base idx_base)

Scatter elements from a sparse vector into a dense vector.

rocsparse_sctr scatters the elements that are listed in x_ind from the sparse vector \(x\) into the dense vector \(y\). Indices of \(y\) that are not listed in x_ind remain unchanged.

for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] = x_val[i];
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] nnz: number of non-zero entries of \(x\).

  • [in] x_val: array of nnz elements containing the non-zero values of \(x\).

  • [in] x_ind: array of nnz elements containing the indices of the non-zero values of \(x\).

  • [inout] y: array of values in dense format.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_value: idx_base is invalid.

  • rocsparse_status_invalid_size: nnz is invalid.

  • rocsparse_status_invalid_pointer: x_val, x_ind or y pointer is invalid.
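
A host-side sketch of the scatter operation, including the index base, is given below. sctr_ref is a hypothetical name, and idx_base is assumed to be a plain integer offset (0 or 1). Entries of y whose indices are not listed in x_ind are left untouched.

```c
/* Host-side reference: scatter the values of the sparse vector x into y.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void sctr_ref(int nnz, const float* x_val, const int* x_ind, float* y,
              int idx_base)
{
    for(int i = 0; i < nnz; ++i)
    {
        y[x_ind[i] - idx_base] = x_val[i];
    }
}
```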

Sparse Level 2 Functions

This module holds all sparse level 2 routines.

The sparse level 2 routines describe operations between a matrix in sparse format and a vector in dense format.

rocsparse_coomv()

rocsparse_status rocsparse_scoomv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const float *alpha, const rocsparse_mat_descr descr, const float *coo_val, const rocsparse_int *coo_row_ind, const rocsparse_int *coo_col_ind, const float *x, const float *beta, float *y)
rocsparse_status rocsparse_dcoomv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const double *alpha, const rocsparse_mat_descr descr, const double *coo_val, const rocsparse_int *coo_row_ind, const rocsparse_int *coo_col_ind, const double *x, const double *beta, double *y)

Sparse matrix vector multiplication using COO storage format.

rocsparse_coomv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix, defined in COO storage format, and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans == rocsparse\_operation\_none \\ A^T, & if\: trans == rocsparse\_operation\_transpose \\ A^H, & if\: trans == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]

The COO matrix has to be sorted by row indices. This can be achieved by using rocsparse_coosort_by_row().

for(i = 0; i < m; ++i)
{
    y[i] = beta * y[i];
}

for(i = 0; i < nnz; ++i)
{
    y[coo_row_ind[i]] += alpha * coo_val[i] * x[coo_col_ind[i]];
}

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans == rocsparse_operation_none is supported.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse COO matrix.

  • [in] n: number of columns of the sparse COO matrix.

  • [in] nnz: number of non-zero entries of the sparse COO matrix.

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse COO matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] coo_val: array of nnz elements of the sparse COO matrix.

  • [in] coo_row_ind: array of nnz elements containing the row indices of the sparse COO matrix.

  • [in] coo_col_ind: array of nnz elements containing the column indices of the sparse COO matrix.

  • [in] x: array of n elements ( \(op(A) = A\)) or m elements ( \(op(A) = A^T\) or \(op(A) = A^H\)).

  • [in] beta: scalar \(\beta\).

  • [inout] y: array of m elements ( \(op(A) = A\)) or n elements ( \(op(A) = A^T\) or \(op(A) = A^H\)).

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, alpha, coo_val, coo_row_ind, coo_col_ind, x, beta or y pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.
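
For verification purposes, the COO matrix vector product can be computed on the host as sketched below for op(A) = A. coomv_ref is a hypothetical name. In this sketch the scalars are passed by value and the index base is passed as a plain integer offset (0 or 1), whereas the library call takes pointers to the scalars and a matrix descriptor. The number of columns is not needed by the loop and is omitted.

```c
/* Host-side reference for y := alpha * A * x + beta * y with A in COO format.
   idx_base is assumed to be 0 (zero-based) or 1 (one-based). */
void coomv_ref(int m, int nnz, float alpha, const float* coo_val,
               const int* coo_row_ind, const int* coo_col_ind,
               const float* x, float beta, float* y, int idx_base)
{
    /* Scale y by beta first. */
    for(int i = 0; i < m; ++i)
    {
        y[i] = beta * y[i];
    }

    /* Accumulate every non-zero entry into its row of y. */
    for(int i = 0; i < nnz; ++i)
    {
        int row = coo_row_ind[i] - idx_base;
        int col = coo_col_ind[i] - idx_base;

        y[row] += alpha * coo_val[i] * x[col];
    }
}
```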

rocsparse_csrmv_analysis()

rocsparse_status rocsparse_scsrmv_analysis(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info)
rocsparse_status rocsparse_dcsrmv_analysis(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info)

Analysis step for sparse matrix vector multiplication using CSR storage format.

rocsparse_csrmv_analysis performs the analysis step for rocsparse_scsrmv() and rocsparse_dcsrmv(). It is expected that this function will be executed only once for a given matrix and particular operation type. The gathered analysis meta data can be cleared by rocsparse_csrmv_clear().

Note

If the matrix sparsity pattern changes, the gathered information will become invalid.

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] n: number of columns of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] info: structure that holds the information collected during the analysis step.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_val, csr_row_ptr, csr_col_ind or info pointer is invalid.

  • rocsparse_status_memory_error: the buffer for the gathered information could not be allocated.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrmv()

rocsparse_status rocsparse_scsrmv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const float *alpha, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, const float *x, const float *beta, float *y)
rocsparse_status rocsparse_dcsrmv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const double *alpha, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, const double *x, const double *beta, double *y)

Sparse matrix vector multiplication using CSR storage format.

rocsparse_csrmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix, defined in CSR storage format, and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans == rocsparse\_operation\_none \\ A^T, & if\: trans == rocsparse\_operation\_transpose \\ A^H, & if\: trans == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]

The info parameter is optional and contains information collected by rocsparse_scsrmv_analysis() or rocsparse_dcsrmv_analysis(). If present, the information will be used to speed up the csrmv computation. If info == NULL, the general csrmv routine will be used instead.

for(i = 0; i < m; ++i)
{
    y[i] = beta * y[i];

    for(j = csr_row_ptr[i]; j < csr_row_ptr[i + 1]; ++j)
    {
        y[i] = y[i] + alpha * csr_val[j] * x[csr_col_ind[j]];
    }
}
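The loop above can be turned into a small host-side reference, which is useful for checking device results on tiny inputs. This is a minimal sketch, assuming zero-based indices and trans == rocsparse_operation_none; csrmv_host and csrmv_example are hypothetical helper names and are not part of rocSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for y := alpha * A * x + beta * y, A in zero-based CSR.
// csrmv_host is a hypothetical helper for checking results, not a rocSPARSE call.
template <typename T>
void csrmv_host(int m, int n, T alpha,
                const std::vector<T>&   csr_val,
                const std::vector<int>& csr_row_ptr,
                const std::vector<int>& csr_col_ind,
                const std::vector<T>&   x,
                T                       beta,
                std::vector<T>&         y)
{
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(x.size() == static_cast<std::size_t>(n));
    assert(y.size() == static_cast<std::size_t>(m));

    for(int i = 0; i < m; ++i)
    {
        // Scale the existing entry of y, then accumulate row i of A times x
        y[i] = beta * y[i];
        for(int j = csr_row_ptr[i]; j < csr_row_ptr[i + 1]; ++j)
        {
            y[i] += alpha * csr_val[j] * x[csr_col_ind[j]];
        }
    }
}

// 3 x 5 example matrix:
//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8
// With x = (1, 1, 1, 1, 1), y = (1, 1, 1), alpha = 2 and beta = 1,
// the row sums 6, 9, 21 give y = (13, 19, 43).
std::vector<double> csrmv_example()
{
    const std::vector<int>    row_ptr = {0, 3, 5, 8};
    const std::vector<int>    col_ind = {0, 1, 3, 1, 2, 0, 3, 4};
    const std::vector<double> val     = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::vector<double> x(5, 1.0);
    std::vector<double>       y(3, 1.0);

    csrmv_host(3, 5, 2.0, val, row_ptr, col_ind, x, 1.0, y);
    return y;
}
```

Because the device routine may accumulate in a different order, a comparison against such a reference should normally use a tolerance rather than exact equality.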

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans == rocsparse_operation_none is supported.

Example

This example performs a sparse matrix vector multiplication in CSR format using additional meta data to improve performance.

// Create matrix info structure
rocsparse_mat_info info;
rocsparse_create_mat_info(&info);

// Perform analysis step to obtain meta data
rocsparse_scsrmv_analysis(handle,
                          rocsparse_operation_none,
                          m,
                          n,
                          nnz,
                          descr,
                          csr_val,
                          csr_row_ptr,
                          csr_col_ind,
                          info);

// Compute y = alpha * A * x + beta * y
rocsparse_scsrmv(handle,
                 rocsparse_operation_none,
                 m,
                 n,
                 nnz,
                 &alpha,
                 descr,
                 csr_val,
                 csr_row_ptr,
                 csr_col_ind,
                 info,
                 x,
                 &beta,
                 y);

// Do more work
// ...

// Clean up
rocsparse_destroy_mat_info(info);

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] n: number of columns of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse CSR matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [in] info: information collected by rocsparse_scsrmv_analysis() or rocsparse_dcsrmv_analysis(), can be NULL if no information is available.

  • [in] x: array of n elements ( \(op(A) == A\)) or m elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).

  • [in] beta: scalar \(\beta\).

  • [inout] y: array of m elements ( \(op(A) == A\)) or n elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, alpha, csr_val, csr_row_ptr, csr_col_ind, x, beta or y pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrmv_clear()

rocsparse_status rocsparse_csrmv_clear(rocsparse_handle handle, rocsparse_mat_info info)

Sparse matrix vector multiplication using CSR storage format.

rocsparse_csrmv_clear deallocates all memory that was allocated by rocsparse_scsrmv_analysis() or rocsparse_dcsrmv_analysis(). This is especially useful if memory is an issue and the analysis data is no longer required for further computation, e.g. when switching to another sparse matrix format.

Note

Calling rocsparse_csrmv_clear is optional. All allocated resources will be cleared when the opaque rocsparse_mat_info struct is destroyed using rocsparse_destroy_mat_info().

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [inout] info: structure that holds the information collected during analysis step.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_pointer: info pointer is invalid.

  • rocsparse_status_memory_error: the buffer for the gathered information could not be deallocated.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_ellmv()

rocsparse_status rocsparse_sellmv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, const float *alpha, const rocsparse_mat_descr descr, const float *ell_val, const rocsparse_int *ell_col_ind, rocsparse_int ell_width, const float *x, const float *beta, float *y)
rocsparse_status rocsparse_dellmv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int n, const double *alpha, const rocsparse_mat_descr descr, const double *ell_val, const rocsparse_int *ell_col_ind, rocsparse_int ell_width, const double *x, const double *beta, double *y)

Sparse matrix vector multiplication using ELL storage format.

rocsparse_ellmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix, defined in ELL storage format, and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans == rocsparse\_operation\_none \\ A^T, & if\: trans == rocsparse\_operation\_transpose \\ A^H, & if\: trans == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]

for(i = 0; i < m; ++i)
{
    y[i] = beta * y[i];

    for(p = 0; p < ell_width; ++p)
    {
        idx = p * m + i;

        if((ell_col_ind[idx] >= 0) && (ell_col_ind[idx] < n))
        {
            y[i] = y[i] + alpha * ell_val[idx] * x[ell_col_ind[idx]];
        }
    }
}
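The ELL loop above indexes the arrays column by column (idx = p * m + i) and skips padded entries. The following host-side sketch mirrors it for a small matrix; it assumes zero-based indices and trans == rocsparse_operation_none, and ellmv_host and ellmv_example are hypothetical helper names, not rocSPARSE functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for y := alpha * A * x + beta * y, A in ELL format.
// ellmv_host is a hypothetical helper, not a rocSPARSE call.
template <typename T>
void ellmv_host(int m, int n, T alpha,
                const std::vector<T>&   ell_val,
                const std::vector<int>& ell_col_ind,
                int                     ell_width,
                const std::vector<T>&   x,
                T                       beta,
                std::vector<T>&         y)
{
    assert(ell_val.size() == static_cast<std::size_t>(m) * ell_width);
    assert(ell_col_ind.size() == ell_val.size());
    assert(x.size() == static_cast<std::size_t>(n));
    assert(y.size() == static_cast<std::size_t>(m));

    for(int i = 0; i < m; ++i)
    {
        y[i] = beta * y[i];
        for(int p = 0; p < ell_width; ++p)
        {
            // Entry p of row i lives at p * m + i
            const std::size_t idx = static_cast<std::size_t>(p) * m + i;
            const int         col = ell_col_ind[idx];
            if(col >= 0 && col < n) // padded entries carry column index -1
            {
                y[i] += alpha * ell_val[idx] * x[col];
            }
        }
    }
}

// 3 x 5 example matrix with ell_width = 3 (row 1 is padded once):
//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8
std::vector<double> ellmv_example()
{
    const std::vector<int>    col_ind = {0, 1, 0, 1, 2, 3, 3, -1, 4};
    const std::vector<double> val     = {1, 4, 6, 2, 5, 7, 3, 0, 8};
    const std::vector<double> x       = {1, 2, 3, 4, 5};
    std::vector<double>       y(3, 0.0);

    ellmv_host(3, 5, 1.0, val, col_ind, 3, x, 0.0, y);
    return y; // (17, 23, 74)
}
```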

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans == rocsparse_operation_none is supported.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse ELL matrix.

  • [in] n: number of columns of the sparse ELL matrix.

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse ELL matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] ell_val: array that contains the elements of the sparse ELL matrix. Padded elements should be zero.

  • [in] ell_col_ind: array that contains the column indices of the sparse ELL matrix. Padded column indices should be -1.

  • [in] ell_width: number of non-zero elements per row of the sparse ELL matrix.

  • [in] x: array of n elements ( \(op(A) == A\)) or m elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).

  • [in] beta: scalar \(\beta\).

  • [inout] y: array of m elements ( \(op(A) == A\)) or n elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).
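The padding conventions listed for ell_val and ell_col_ind can be illustrated by building the ELL arrays from a CSR matrix on the host. This is only a sketch under stated assumptions: zero-based indices, ell_width chosen as the longest row, and the hypothetical names ell_matrix_host and csr_to_ell_host, which are not part of rocSPARSE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container for the ELL arrays.
struct ell_matrix_host
{
    int                 ell_width;
    std::vector<int>    col_ind; // padded with -1
    std::vector<double> val;     // padded with 0
};

// Converts a zero-based CSR matrix with m rows into ELL arrays laid out
// so that entry p of row i is stored at index p * m + i.
ell_matrix_host csr_to_ell_host(int                        m,
                                const std::vector<int>&    csr_row_ptr,
                                const std::vector<int>&    csr_col_ind,
                                const std::vector<double>& csr_val)
{
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(csr_col_ind.size() == csr_val.size());

    ell_matrix_host ell;
    ell.ell_width = 0;
    for(int i = 0; i < m; ++i)
    {
        ell.ell_width = std::max(ell.ell_width, csr_row_ptr[i + 1] - csr_row_ptr[i]);
    }

    const std::size_t size = static_cast<std::size_t>(m) * ell.ell_width;
    ell.col_ind.assign(size, -1);
    ell.val.assign(size, 0.0);

    for(int i = 0; i < m; ++i)
    {
        for(int j = csr_row_ptr[i]; j < csr_row_ptr[i + 1]; ++j)
        {
            const std::size_t p   = static_cast<std::size_t>(j - csr_row_ptr[i]);
            const std::size_t idx = p * m + i;
            ell.col_ind[idx]      = csr_col_ind[j];
            ell.val[idx]          = csr_val[j];
        }
    }
    return ell;
}
```

For a CSR matrix with row pointers {0, 3, 5, 8}, the second row holds only two entries, so its third slot is padded with value 0 and column index -1.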

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or ell_width is invalid.

  • rocsparse_status_invalid_pointer: descr, alpha, ell_val, ell_col_ind, x, beta or y pointer is invalid.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_hybmv()

rocsparse_status rocsparse_shybmv(rocsparse_handle handle, rocsparse_operation trans, const float *alpha, const rocsparse_mat_descr descr, const rocsparse_hyb_mat hyb, const float *x, const float *beta, float *y)
rocsparse_status rocsparse_dhybmv(rocsparse_handle handle, rocsparse_operation trans, const double *alpha, const rocsparse_mat_descr descr, const rocsparse_hyb_mat hyb, const double *x, const double *beta, double *y)

Sparse matrix vector multiplication using HYB storage format.

rocsparse_hybmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix, defined in HYB storage format, and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans == rocsparse\_operation\_none \\ A^T, & if\: trans == rocsparse\_operation\_transpose \\ A^H, & if\: trans == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]
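As a rough illustration of the operation, the sketch below assumes that a HYB matrix is the sum of a regular ELL part and an irregular COO part, and evaluates both on the host. The struct hyb_matrix_host and the functions hybmv_host and hybmv_example are hypothetical; they do not reflect the internal layout of rocsparse_hyb_mat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side view of a HYB matrix: ELL part plus COO part.
struct hyb_matrix_host
{
    int                 m, n;
    int                 ell_width;
    std::vector<int>    ell_col_ind; // padded with -1, entry p of row i at p * m + i
    std::vector<double> ell_val;
    std::vector<int>    coo_row_ind;
    std::vector<int>    coo_col_ind;
    std::vector<double> coo_val;
};

// y := alpha * A * x + beta * y for trans == rocsparse_operation_none
void hybmv_host(double alpha, const hyb_matrix_host& hyb,
                const std::vector<double>& x, double beta, std::vector<double>& y)
{
    assert(x.size() == static_cast<std::size_t>(hyb.n));
    assert(y.size() == static_cast<std::size_t>(hyb.m));

    for(int i = 0; i < hyb.m; ++i)
    {
        y[i] = beta * y[i];
        // Regular part: at most ell_width entries per row
        for(int p = 0; p < hyb.ell_width; ++p)
        {
            const std::size_t idx = static_cast<std::size_t>(p) * hyb.m + i;
            const int         col = hyb.ell_col_ind[idx];
            if(col >= 0 && col < hyb.n)
            {
                y[i] += alpha * hyb.ell_val[idx] * x[col];
            }
        }
    }
    // Irregular part: remaining entries stored as (row, column, value) triplets
    for(std::size_t e = 0; e < hyb.coo_val.size(); ++e)
    {
        y[hyb.coo_row_ind[e]] += alpha * hyb.coo_val[e] * x[hyb.coo_col_ind[e]];
    }
}

// A = [1 2 0 3 0; 0 4 5 0 0; 6 0 0 7 8] split with ell_width = 2:
// the third entry of rows 0 and 2 goes to the COO part.
std::vector<double> hybmv_example()
{
    hyb_matrix_host hyb;
    hyb.m           = 3;
    hyb.n           = 5;
    hyb.ell_width   = 2;
    hyb.ell_col_ind = {0, 1, 0, 1, 2, 3};
    hyb.ell_val     = {1, 4, 6, 2, 5, 7};
    hyb.coo_row_ind = {0, 2};
    hyb.coo_col_ind = {3, 4};
    hyb.coo_val     = {3, 8};

    const std::vector<double> x = {1, 2, 3, 4, 5};
    std::vector<double>       y(3, 0.0);
    hybmv_host(1.0, hyb, x, 0.0, y);
    return y; // (17, 23, 74)
}
```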

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans == rocsparse_operation_none is supported.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse HYB matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] hyb: matrix in HYB storage format.

  • [in] x: array of n elements ( \(op(A) == A\)) or m elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).

  • [in] beta: scalar \(\beta\).

  • [inout] y: array of m elements ( \(op(A) == A\)) or n elements ( \(op(A) == A^T\) or \(op(A) == A^H\)).

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: hyb structure was not initialized with valid matrix sizes.

  • rocsparse_status_invalid_pointer: descr, alpha, hyb, x, beta or y pointer is invalid.

  • rocsparse_status_invalid_value: hyb structure was not initialized with a valid partitioning type.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_memory_error: the buffer could not be allocated.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrsv_zero_pivot()

rocsparse_status rocsparse_csrsv_zero_pivot(rocsparse_handle handle, const rocsparse_mat_descr descr, rocsparse_mat_info info, rocsparse_int *position)

Sparse triangular solve using CSR storage format.

rocsparse_csrsv_zero_pivot returns rocsparse_status_zero_pivot if either a structural or numerical zero has been found during the rocsparse_scsrsv_solve() or rocsparse_dcsrsv_solve() computation. The first zero pivot \(j\) at \(A_{j,j}\) is stored in position, using the same index base as the CSR matrix.

position can be in host or device memory. If no zero pivot has been found, position is set to -1 and rocsparse_status_success is returned instead.
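To make the two kinds of zero pivot concrete, the following host-side sketch scans the diagonal of a CSR matrix and reports the first row whose diagonal entry is either missing (structural zero) or stored as 0 (numerical zero). It only illustrates the reported value; the library detects the pivot during the solve itself. The function first_zero_pivot_host and its base parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first index j (expressed in index base `base`) for which A_{j,j}
// is missing or equal to zero, or -1 if every diagonal entry is non-zero.
// Hypothetical helper, not a rocSPARSE call.
int first_zero_pivot_host(int                        m,
                          const std::vector<int>&    csr_row_ptr,
                          const std::vector<int>&    csr_col_ind,
                          const std::vector<double>& csr_val,
                          int                        base)
{
    assert(base == 0 || base == 1);
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(csr_col_ind.size() == csr_val.size());

    for(int i = 0; i < m; ++i)
    {
        bool nonzero_diag = false;
        for(int j = csr_row_ptr[i] - base; j < csr_row_ptr[i + 1] - base; ++j)
        {
            if(csr_col_ind[j] - base == i && csr_val[j] != 0.0)
            {
                nonzero_diag = true;
                break;
            }
        }
        if(!nonzero_diag)
        {
            return i + base; // same index base as the CSR matrix
        }
    }
    return -1; // no zero pivot
}
```

With zero-based arrays row_ptr = {0, 1, 2, 4} and col_ind = {0, 0, 1, 2}, row 1 has no diagonal entry, so the sketch returns 1; the same pattern stored one-based yields 2.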

Note

rocsparse_csrsv_zero_pivot is a blocking function. It might influence performance negatively.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] info: structure that holds the information collected during the analysis step.

  • [inout] position: pointer to zero pivot \(j\), can be in host or device memory.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_pointer: info or position pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_zero_pivot: zero pivot has been found.

rocsparse_csrsv_buffer_size()

rocsparse_status rocsparse_scsrsv_buffer_size(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, size_t *buffer_size)
rocsparse_status rocsparse_dcsrsv_buffer_size(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, size_t *buffer_size)

Sparse triangular solve using CSR storage format.

rocsparse_csrsv_buffer_size returns the size of the temporary storage buffer that is required by rocsparse_scsrsv_analysis(), rocsparse_dcsrsv_analysis(), rocsparse_scsrsv_solve() and rocsparse_dcsrsv_solve(). The temporary storage buffer must be allocated by the user. The size of the temporary storage buffer is identical to the size returned by rocsparse_scsrilu0_buffer_size() and rocsparse_dcsrilu0_buffer_size() if the matrix sparsity pattern is identical. The user allocated buffer can thus be shared between subsequent calls to those functions.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] info: structure that holds the information collected during the analysis step.

  • [out] buffer_size: number of bytes of the temporary storage buffer required by rocsparse_scsrsv_analysis(), rocsparse_dcsrsv_analysis(), rocsparse_scsrsv_solve() and rocsparse_dcsrsv_solve().

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_val, csr_row_ptr, csr_col_ind, info or buffer_size pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrsv_analysis()

rocsparse_status rocsparse_scsrsv_analysis(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_analysis_policy analysis, rocsparse_solve_policy solve, void *temp_buffer)
rocsparse_status rocsparse_dcsrsv_analysis(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_analysis_policy analysis, rocsparse_solve_policy solve, void *temp_buffer)

Sparse triangular solve using CSR storage format.

rocsparse_csrsv_analysis performs the analysis step for rocsparse_scsrsv_solve() and rocsparse_dcsrsv_solve(). It is expected that this function will be executed only once for a given matrix and particular operation type. The analysis meta data can be cleared by rocsparse_csrsv_clear().

rocsparse_csrsv_analysis can share its meta data with rocsparse_scsrilu0_analysis() and rocsparse_dcsrilu0_analysis(). Selecting the rocsparse_analysis_policy_reuse policy can greatly improve the performance of the meta data computation. However, the user needs to make sure that the sparsity pattern remains unchanged. If this cannot be assured, rocsparse_analysis_policy_force has to be used.

Note

If the matrix sparsity pattern changes, the gathered information will become invalid.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] info: structure that holds the information collected during the analysis step.

  • [in] analysis: rocsparse_analysis_policy_reuse or rocsparse_analysis_policy_force.

  • [in] solve: rocsparse_solve_policy_auto.

  • [in] temp_buffer: temporary storage buffer allocated by the user.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_row_ptr, csr_col_ind, info or temp_buffer pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrsv_solve()

rocsparse_status rocsparse_scsrsv_solve(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const float *alpha, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, const float *x, float *y, rocsparse_solve_policy policy, void *temp_buffer)
rocsparse_status rocsparse_dcsrsv_solve(rocsparse_handle handle, rocsparse_operation trans, rocsparse_int m, rocsparse_int nnz, const double *alpha, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, const double *x, double *y, rocsparse_solve_policy policy, void *temp_buffer)

Sparse triangular solve using CSR storage format.

rocsparse_csrsv_solve solves a sparse triangular linear system of a sparse \(m \times m\) matrix, defined in CSR storage format, a dense solution vector \(y\) and the right-hand side \(x\) that is multiplied by \(\alpha\), such that

\[ op(A) \cdot y = \alpha \cdot x, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans == rocsparse\_operation\_none \\ A^T, & if\: trans == rocsparse\_operation\_transpose \\ A^H, & if\: trans == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]

rocsparse_csrsv_solve requires a user allocated temporary buffer. Its size is returned by rocsparse_scsrsv_buffer_size() or rocsparse_dcsrsv_buffer_size(). Furthermore, analysis meta data is required. It can be obtained by rocsparse_scsrsv_analysis() or rocsparse_dcsrsv_analysis(). rocsparse_csrsv_solve reports the first zero pivot (either numerical or structural zero). The zero pivot status can be checked by calling rocsparse_csrsv_zero_pivot(). If rocsparse_diag_type == rocsparse_diag_type_unit, no zero pivot will be reported, even if \(A_{j,j} = 0\) for some \(j\).
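For a lower triangular matrix, the system \(A \cdot y = \alpha \cdot x\) can be solved row by row with forward substitution. The host-side sketch below shows this for a sorted, zero-based CSR matrix and also demonstrates why a unit diagonal never reports a zero pivot. It is a sequential illustration only and says nothing about how the device routine is implemented; csrsv_lower_host and its unit_diag flag are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward substitution for A * y = alpha * x, A lower triangular in sorted CSR.
// Returns -1 on success, or the index of the first zero pivot, in which case
// the entries of y from that row onwards are left untouched.
// Hypothetical helper, not a rocSPARSE call.
int csrsv_lower_host(int                        m,
                     double                     alpha,
                     const std::vector<double>& csr_val,
                     const std::vector<int>&    csr_row_ptr,
                     const std::vector<int>&    csr_col_ind,
                     const std::vector<double>& x,
                     std::vector<double>&       y,
                     bool                       unit_diag)
{
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(x.size() == static_cast<std::size_t>(m));
    assert(y.size() == static_cast<std::size_t>(m));

    for(int i = 0; i < m; ++i)
    {
        double sum  = alpha * x[i];
        double diag = 0.0; // stays 0 if the diagonal entry is structurally missing

        for(int j = csr_row_ptr[i]; j < csr_row_ptr[i + 1]; ++j)
        {
            const int col = csr_col_ind[j];
            if(col < i)
            {
                sum -= csr_val[j] * y[col]; // y[col] is already known
            }
            else
            {
                if(col == i)
                {
                    diag = csr_val[j];
                }
                break; // sorted row: only upper triangular entries remain
            }
        }

        if(unit_diag)
        {
            y[i] = sum; // stored diagonal is ignored, so no zero pivot is reported
        }
        else if(diag == 0.0)
        {
            return i; // structural or numerical zero pivot
        }
        else
        {
            y[i] = sum / diag;
        }
    }
    return -1;
}
```

The early break relies on the column indices of each row being sorted, which is the same requirement the library places on its input.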

Note

The sparse CSR matrix has to be sorted. This can be achieved by calling rocsparse_csrsort().

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans == rocsparse_operation_none is supported.

Example

Consider the lower triangular \(m \times m\) matrix \(L\), stored in CSR storage format with unit diagonal. The following example solves \(L \cdot y = x\).

// Create rocSPARSE handle
rocsparse_handle handle;
rocsparse_create_handle(&handle);

// Create matrix descriptor
rocsparse_mat_descr descr;
rocsparse_create_mat_descr(&descr);
rocsparse_set_mat_fill_mode(descr, rocsparse_fill_mode_lower);
rocsparse_set_mat_diag_type(descr, rocsparse_diag_type_unit);

// Create matrix info structure
rocsparse_mat_info info;
rocsparse_create_mat_info(&info);

// Obtain required buffer size
size_t buffer_size;
rocsparse_dcsrsv_buffer_size(handle,
                             rocsparse_operation_none,
                             m,
                             nnz,
                             descr,
                             csr_val,
                             csr_row_ptr,
                             csr_col_ind,
                             info,
                             &buffer_size);

// Allocate temporary buffer
void* temp_buffer;
hipMalloc(&temp_buffer, buffer_size);

// Perform analysis step
rocsparse_dcsrsv_analysis(handle,
                          rocsparse_operation_none,
                          m,
                          nnz,
                          descr,
                          csr_val,
                          csr_row_ptr,
                          csr_col_ind,
                          info,
                          rocsparse_analysis_policy_reuse,
                          rocsparse_solve_policy_auto,
                          temp_buffer);

// Solve Ly = x
rocsparse_dcsrsv_solve(handle,
                       rocsparse_operation_none,
                       m,
                       nnz,
                       &alpha,
                       descr,
                       csr_val,
                       csr_row_ptr,
                       csr_col_ind,
                       info,
                       x,
                       y,
                       rocsparse_solve_policy_auto,
                       temp_buffer);

// No zero pivot should be found, with L having unit diagonal

// Clean up
hipFree(temp_buffer);
rocsparse_destroy_mat_info(info);
rocsparse_destroy_mat_descr(descr);
rocsparse_destroy_handle(handle);

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans: matrix operation type.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [in] info: structure that holds the information collected during the analysis step.

  • [in] x: array of m elements, holding the right-hand side.

  • [out] y: array of m elements, holding the solution.

  • [in] policy: rocsparse_solve_policy_auto.

  • [in] temp_buffer: temporary storage buffer allocated by the user.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, alpha, csr_val, csr_row_ptr, csr_col_ind, x or y pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrsv_clear()

rocsparse_status rocsparse_csrsv_clear(rocsparse_handle handle, const rocsparse_mat_descr descr, rocsparse_mat_info info)

Sparse triangular solve using CSR storage format.

rocsparse_csrsv_clear deallocates all memory that was allocated by rocsparse_scsrsv_analysis() or rocsparse_dcsrsv_analysis(). This is especially useful if memory is an issue and the analysis data is not required for further computation, e.g. when switching to another sparse matrix format. Calling rocsparse_csrsv_clear is optional. All allocated resources will be cleared when the opaque rocsparse_mat_info struct is destroyed using rocsparse_destroy_mat_info().

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [inout] info: structure that holds the information collected during the analysis step.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_pointer: info pointer is invalid.

  • rocsparse_status_memory_error: the buffer holding the meta data could not be deallocated.

  • rocsparse_status_internal_error: an internal error occurred.

Sparse Level 3 Functions

This module holds all sparse level 3 routines.

The sparse level 3 routines describe operations between a matrix in sparse format and multiple vectors in dense format that can also be seen as a dense matrix.

rocsparse_csrmm()

rocsparse_status rocsparse_scsrmm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, rocsparse_int m, rocsparse_int n, rocsparse_int k, rocsparse_int nnz, const float *alpha, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, const float *B, rocsparse_int ldb, const float *beta, float *C, rocsparse_int ldc)
rocsparse_status rocsparse_dcsrmm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, rocsparse_int m, rocsparse_int n, rocsparse_int k, rocsparse_int nnz, const double *alpha, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, const double *B, rocsparse_int ldb, const double *beta, double *C, rocsparse_int ldc)

Sparse matrix dense matrix multiplication using CSR storage format.

rocsparse_csrmm multiplies the scalar \(\alpha\) with a sparse \(m \times k\) matrix \(A\), defined in CSR storage format, and the dense \(k \times n\) matrix \(B\) and adds the result to the dense \(m \times n\) matrix \(C\) that is multiplied by the scalar \(\beta\), such that

\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot C, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & if\: trans\_A == rocsparse\_operation\_none \\ A^T, & if\: trans\_A == rocsparse\_operation\_transpose \\ A^H, & if\: trans\_A == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]
and
\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & if\: trans\_B == rocsparse\_operation\_none \\ B^T, & if\: trans\_B == rocsparse\_operation\_transpose \\ B^H, & if\: trans\_B == rocsparse\_operation\_conjugate\_transpose \end{array} \right. \end{split}\]

for(i = 0; i < m; ++i)
{
    for(j = 0; j < n; ++j)
    {
        C[i][j] = beta * C[i][j];

        for(k = csr_row_ptr[i]; k < csr_row_ptr[i + 1]; ++k)
        {
            C[i][j] += alpha * csr_val[k] * B[csr_col_ind[k]][j];
        }
    }
}
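The pseudocode above writes \(B\) and \(C\) as two-dimensional arrays. In a flat array the leading dimensions ldb and ldc determine where each column starts. The host-side sketch below assumes column-major storage, so that element (i, j) of \(C\) is C[i + j * ldc], with zero-based indices and no transposition of \(A\) or \(B\); csrmm_host and csrmm_example are hypothetical names, not rocSPARSE functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for C := alpha * A * B + beta * C with A (m x k) in CSR
// and B (k x n), C (m x n) stored column-major with leading dimensions ldb, ldc.
// Hypothetical helper, not a rocSPARSE call.
void csrmm_host(int m, int n, int k, double alpha,
                const std::vector<double>& csr_val,
                const std::vector<int>&    csr_row_ptr,
                const std::vector<int>&    csr_col_ind,
                const std::vector<double>& B, int ldb,
                double                     beta,
                std::vector<double>&       C, int ldc)
{
    assert(ldb >= std::max(1, k) && ldc >= std::max(1, m));
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(B.size() >= static_cast<std::size_t>(ldb) * n);
    assert(C.size() >= static_cast<std::size_t>(ldc) * n);

    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            const std::size_t c = i + static_cast<std::size_t>(j) * ldc;
            C[c]                = beta * C[c];
            for(int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p)
            {
                const std::size_t b = csr_col_ind[p] + static_cast<std::size_t>(j) * ldb;
                C[c] += alpha * csr_val[p] * B[b];
            }
        }
    }
}

// A = [1 2 0 3 0; 0 4 5 0 0; 6 0 0 7 8], B has columns (1,1,1,1,1) and (1,2,3,4,5)
std::vector<double> csrmm_example()
{
    const std::vector<int>    row_ptr = {0, 3, 5, 8};
    const std::vector<int>    col_ind = {0, 1, 3, 1, 2, 0, 3, 4};
    const std::vector<double> val     = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::vector<double> B       = {1, 1, 1, 1, 1, 1, 2, 3, 4, 5};
    std::vector<double>       C(3 * 2, 0.0);

    csrmm_host(3, 2, 5, 1.0, val, row_ptr, col_ind, B, 5, 0.0, C, 3);
    return C; // columns (6, 9, 21) and (17, 23, 74)
}
```

Each column of \(C\) is the result of one sparse matrix vector product with the matching column of \(B\), which is why the routine can be viewed as csrmv applied to multiple vectors.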

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Currently, only trans_A == rocsparse_operation_none is supported.

Example

This example multiplies a CSR matrix with a dense matrix.

//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8

rocsparse_int m   = 3;
rocsparse_int k   = 5;
rocsparse_int nnz = 8;

csr_row_ptr[m+1] = {0, 3, 5, 8};             // device memory
csr_col_ind[nnz] = {0, 1, 3, 1, 2, 0, 3, 4}; // device memory
csr_val[nnz]     = {1, 2, 3, 4, 5, 6, 7, 8}; // device memory

// Set dimension n of B
rocsparse_int n = 64;

// Allocate and generate dense matrix B
std::vector<float> hB(k * n);
for(rocsparse_int i = 0; i < k * n; ++i)
{
    hB[i] = static_cast<float>(rand()) / RAND_MAX;
}

// Copy B to the device
float* B;
hipMalloc((void**)&B, sizeof(float) * k * n);
hipMemcpy(B, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice);

// alpha and beta
float alpha = 1.0f;
float beta  = 0.0f;

// Allocate memory for the resulting matrix C
float* C;
hipMalloc((void**)&C, sizeof(float) * m * n);

// Perform the matrix multiplication
rocsparse_scsrmm(handle,
                 rocsparse_operation_none,
                 rocsparse_operation_none,
                 m,
                 n,
                 k,
                 nnz,
                 &alpha,
                 descr,
                 csr_val,
                 csr_row_ptr,
                 csr_col_ind,
                 B,
                 k,
                 &beta,
                 C,
                 m);
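The CSR arrays used in this example can be derived from the dense form of \(A\) on the host before they are copied to device memory. The sketch below is one possible way to do that, assuming a row-major dense input and zero-based indices; csr_matrix_host and dense_to_csr_host are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container for the three CSR arrays.
struct csr_matrix_host
{
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Builds zero-based CSR arrays from a row-major dense m x k matrix.
csr_matrix_host dense_to_csr_host(int m, int k, const std::vector<float>& dense)
{
    assert(dense.size() == static_cast<std::size_t>(m) * k);

    csr_matrix_host csr;
    csr.row_ptr.push_back(0);
    for(int i = 0; i < m; ++i)
    {
        for(int j = 0; j < k; ++j)
        {
            const float v = dense[static_cast<std::size_t>(i) * k + j];
            if(v != 0.0f)
            {
                csr.col_ind.push_back(j);
                csr.val.push_back(v);
            }
        }
        // Row i ends where the next row starts
        csr.row_ptr.push_back(static_cast<int>(csr.col_ind.size()));
    }
    return csr;
}
```

For the 3 x 5 matrix shown at the top of the example, this yields row pointers {0, 3, 5, 8}, column indices {0, 1, 3, 1, 2, 0, 3, 4} and values 1 through 8, matching the arrays passed to rocsparse_scsrmm.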

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] trans_A: matrix \(A\) operation type.

  • [in] trans_B: matrix \(B\) operation type.

  • [in] m: number of rows of the sparse CSR matrix \(A\).

  • [in] n: number of columns of the dense matrix \(op(B)\) and \(C\).

  • [in] k: number of columns of the sparse CSR matrix \(A\).

  • [in] nnz: number of non-zero entries of the sparse CSR matrix \(A\).

  • [in] alpha: scalar \(\alpha\).

  • [in] descr: descriptor of the sparse CSR matrix \(A\). Currently, only rocsparse_matrix_type_general is supported.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix \(A\).

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix \(A\).

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix \(A\).

  • [in] B: array of dimension \(ldb \times n\) ( \(op(B) == B\)) or \(ldb \times k\) ( \(op(B) == B^T\) or \(op(B) == B^H\)).

  • [in] ldb: leading dimension of \(B\), must be at least \(\max{(1, k)}\) ( \(op(A) == A\)) or \(\max{(1, m)}\) ( \(op(A) == A^T\) or \(op(A) == A^H\)).

  • [in] beta: scalar \(\beta\).

  • [inout] C: array of dimension \(ldc \times n\).

  • [in] ldc: leading dimension of \(C\), must be at least \(\max{(1, m)}\) ( \(op(A) == A\)) or \(\max{(1, k)}\) ( \(op(A) == A^T\) or \(op(A) == A^H\)).

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n, k, nnz, ldb or ldc is invalid.

  • rocsparse_status_invalid_pointer: descr, alpha, csr_val, csr_row_ptr, csr_col_ind, B, beta or C pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_not_implemented: trans_A != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

Preconditioner Functions

This module holds all sparse preconditioners.

The sparse preconditioners describe manipulations on a matrix in sparse format to obtain a sparse preconditioner matrix.

rocsparse_csrilu0_zero_pivot()

rocsparse_status rocsparse_csrilu0_zero_pivot(rocsparse_handle handle, rocsparse_mat_info info, rocsparse_int *position)

Incomplete LU factorization with 0 fill-ins and no pivoting using CSR storage format.

rocsparse_csrilu0_zero_pivot returns rocsparse_status_zero_pivot if either a structural or numerical zero has been found during rocsparse_scsrilu0() or rocsparse_dcsrilu0() computation. The first zero pivot \(j\) at \(A_{j,j}\) is stored in position, using the same index base as the CSR matrix.

position can be in host or device memory. If no zero pivot has been found, position is set to -1 and rocsparse_status_success is returned instead.

Note

rocsparse_csrilu0_zero_pivot is a blocking function. It might influence performance negatively.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] info: structure that holds the information collected during the analysis step.

  • [inout] position: pointer to zero pivot \(j\), can be in host or device memory.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_pointer: info or position pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_zero_pivot: zero pivot has been found.

rocsparse_csrilu0_buffer_size()

rocsparse_status rocsparse_scsrilu0_buffer_size(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, size_t *buffer_size)
rocsparse_status rocsparse_dcsrilu0_buffer_size(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, size_t *buffer_size)

Incomplete LU factorization with 0 fill-ins and no pivoting using CSR storage format.

rocsparse_csrilu0_buffer_size returns the size of the temporary storage buffer that is required by rocsparse_scsrilu0_analysis(), rocsparse_dcsrilu0_analysis(), rocsparse_scsrilu0() and rocsparse_dcsrilu0(). The temporary storage buffer must be allocated by the user. The size of the temporary storage buffer is identical to the size returned by rocsparse_scsrsv_buffer_size() and rocsparse_dcsrsv_buffer_size() if the matrix sparsity pattern is identical. The user allocated buffer can thus be shared between subsequent calls to those functions.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] info: structure that holds the information collected during the analysis step.

  • [out] buffer_size: number of bytes of the temporary storage buffer required by rocsparse_scsrilu0_analysis(), rocsparse_dcsrilu0_analysis(), rocsparse_scsrilu0() and rocsparse_dcsrilu0().

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_val, csr_row_ptr, csr_col_ind, info or buffer_size pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrilu0_analysis()

rocsparse_status rocsparse_scsrilu0_analysis(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_analysis_policy analysis, rocsparse_solve_policy solve, void *temp_buffer)
rocsparse_status rocsparse_dcsrilu0_analysis(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_analysis_policy analysis, rocsparse_solve_policy solve, void *temp_buffer)

Incomplete LU factorization with 0 fill-ins and no pivoting using CSR storage format.

rocsparse_csrilu0_analysis performs the analysis step for rocsparse_scsrilu0() and rocsparse_dcsrilu0(). It is expected that this function will be executed only once for a given matrix and particular operation type. The analysis meta data can be cleared by rocsparse_csrilu0_clear().

rocsparse_csrilu0_analysis can share its meta data with rocsparse_scsrsv_analysis() and rocsparse_dcsrsv_analysis(). Selecting the rocsparse_analysis_policy_reuse policy can greatly reduce the cost of computing the meta data. However, the user needs to make sure that the sparsity pattern remains unchanged. If this cannot be assured, rocsparse_analysis_policy_force has to be used.

Note

If the matrix sparsity pattern changes, the gathered information will become invalid.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] info: structure that holds the information collected during the analysis step.

  • [in] analysis: rocsparse_analysis_policy_reuse or rocsparse_analysis_policy_force.

  • [in] solve: rocsparse_solve_policy_auto.

  • [in] temp_buffer: temporary storage buffer allocated by the user.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_val, csr_row_ptr, csr_col_ind, info or temp_buffer pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrilu0()

rocsparse_status rocsparse_scsrilu0(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_solve_policy policy, void *temp_buffer)
rocsparse_status rocsparse_dcsrilu0(rocsparse_handle handle, rocsparse_int m, rocsparse_int nnz, const rocsparse_mat_descr descr, double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_mat_info info, rocsparse_solve_policy policy, void *temp_buffer)

Incomplete LU factorization with 0 fill-ins and no pivoting using CSR storage format.

rocsparse_csrilu0 computes the incomplete LU factorization with 0 fill-ins and no pivoting of a sparse \(m \times m\) CSR matrix \(A\), such that

\[ A \approx LU \]

rocsparse_csrilu0 requires a user allocated temporary buffer. Its size is returned by rocsparse_scsrilu0_buffer_size() or rocsparse_dcsrilu0_buffer_size(). Furthermore, analysis meta data is required. It can be obtained by rocsparse_scsrilu0_analysis() or rocsparse_dcsrilu0_analysis(). rocsparse_csrilu0 reports the first zero pivot (either numerical or structural zero). The zero pivot status can be obtained by calling rocsparse_csrilu0_zero_pivot().
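To make the factorization and the zero pivot report concrete, here is a minimal host-side sketch of ILU(0) on a CSR matrix. It is a textbook row-wise variant, not the algorithm rocSPARSE runs on the device, and `ilu0_reference` is a hypothetical name. It assumes a square matrix, zero-based indices, and sorted column indices within each row. Unlike the library, it stops at the first zero pivot and returns its index directly, or -1 if none was found.

```cpp
#include <vector>

// Hypothetical host reference (not a rocSPARSE function).
// In-place ILU(0): on return csr_val holds the strictly lower part of L
// (unit diagonal implied) and the upper part of U, on the pattern of A.
// Returns -1 on success, or the row index of the first zero pivot.
int ilu0_reference(int m,
                   std::vector<double>& csr_val,
                   const std::vector<int>& csr_row_ptr,
                   const std::vector<int>& csr_col_ind)
{
    std::vector<int> diag(m, -1);  // position of A(i,i) inside csr_val
    std::vector<int> where(m, -1); // column -> position in the current row

    for (int i = 0; i < m; ++i) {
        const int begin = csr_row_ptr[i];
        const int end   = csr_row_ptr[i + 1];

        // Record where each column of row i lives.
        for (int p = begin; p < end; ++p) where[csr_col_ind[p]] = p;

        // Eliminate with every previous row k that appears in row i.
        for (int p = begin; p < end && csr_col_ind[p] < i; ++p) {
            const int k = csr_col_ind[p];
            csr_val[p] /= csr_val[diag[k]]; // L(i,k); pivot k is non-zero here
            for (int q = diag[k] + 1; q < csr_row_ptr[k + 1]; ++q) {
                const int pos = where[csr_col_ind[q]];
                // Updates outside the pattern of A are dropped (0 fill-in).
                if (pos >= 0) csr_val[pos] -= csr_val[p] * csr_val[q];
            }
        }

        diag[i] = where[i];
        for (int p = begin; p < end; ++p) where[csr_col_ind[p]] = -1;

        // Structural zero (no diagonal entry) or numerical zero.
        if (diag[i] < 0 || csr_val[diag[i]] == 0.0) return i;
    }
    return -1;
}
```

For a tridiagonal matrix the exact LU factorization produces no fill-in, so this sketch reproduces it exactly, which makes tridiagonal inputs a convenient sanity check.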

Note

The sparse CSR matrix has to be sorted. This can be achieved by calling rocsparse_csrsort().

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Example

Consider the sparse \(m \times m\) matrix \(A\), stored in CSR storage format. The following example computes the incomplete LU factorization \(M \approx LU\) and solves the preconditioned system \(My = x\).

// Create rocSPARSE handle
rocsparse_handle handle;
rocsparse_create_handle(&handle);

// Create matrix descriptor for M
rocsparse_mat_descr descr_M;
rocsparse_create_mat_descr(&descr_M);

// Create matrix descriptor for L
rocsparse_mat_descr descr_L;
rocsparse_create_mat_descr(&descr_L);
rocsparse_set_mat_fill_mode(descr_L, rocsparse_fill_mode_lower);
rocsparse_set_mat_diag_type(descr_L, rocsparse_diag_type_unit);

// Create matrix descriptor for U
rocsparse_mat_descr descr_U;
rocsparse_create_mat_descr(&descr_U);
rocsparse_set_mat_fill_mode(descr_U, rocsparse_fill_mode_upper);
rocsparse_set_mat_diag_type(descr_U, rocsparse_diag_type_non_unit);

// Create matrix info structure
rocsparse_mat_info info;
rocsparse_create_mat_info(&info);

// Obtain required buffer size
size_t buffer_size_M;
size_t buffer_size_L;
size_t buffer_size_U;
rocsparse_dcsrilu0_buffer_size(handle,
                              m,
                              nnz,
                              descr_M,
                              csr_val,
                              csr_row_ptr,
                              csr_col_ind,
                              info,
                              &buffer_size_M);
rocsparse_dcsrsv_buffer_size(handle,
                             rocsparse_operation_none,
                             m,
                             nnz,
                             descr_L,
                             csr_val,
                             csr_row_ptr,
                             csr_col_ind,
                             info,
                             &buffer_size_L);
rocsparse_dcsrsv_buffer_size(handle,
                             rocsparse_operation_none,
                             m,
                             nnz,
                             descr_U,
                             csr_val,
                             csr_row_ptr,
                             csr_col_ind,
                             info,
                             &buffer_size_U);

size_t buffer_size = max(buffer_size_M, max(buffer_size_L, buffer_size_U));

// Allocate temporary buffer
void* temp_buffer;
hipMalloc(&temp_buffer, buffer_size);

// Perform analysis steps, using rocsparse_analysis_policy_reuse to improve
// computation performance
rocsparse_dcsrilu0_analysis(handle,
                            m,
                            nnz,
                            descr_M,
                            csr_val,
                            csr_row_ptr,
                            csr_col_ind,
                            info,
                            rocsparse_analysis_policy_reuse,
                            rocsparse_solve_policy_auto,
                            temp_buffer);
rocsparse_dcsrsv_analysis(handle,
                          rocsparse_operation_none,
                          m,
                          nnz,
                          descr_L,
                          csr_val,
                          csr_row_ptr,
                          csr_col_ind,
                          info,
                          rocsparse_analysis_policy_reuse,
                          rocsparse_solve_policy_auto,
                          temp_buffer);
rocsparse_dcsrsv_analysis(handle,
                          rocsparse_operation_none,
                          m,
                          nnz,
                          descr_U,
                          csr_val,
                          csr_row_ptr,
                          csr_col_ind,
                          info,
                          rocsparse_analysis_policy_reuse,
                          rocsparse_solve_policy_auto,
                          temp_buffer);

// Check for zero pivot
rocsparse_int position;
if(rocsparse_status_zero_pivot == rocsparse_csrilu0_zero_pivot(handle,
                                                               info,
                                                               &position))
{
    printf("A has structural zero at A(%d,%d)\n", position, position);
}

// Compute incomplete LU factorization
rocsparse_dcsrilu0(handle,
                   m,
                   nnz,
                   descr_M,
                   csr_val,
                   csr_row_ptr,
                   csr_col_ind,
                   info,
                   rocsparse_solve_policy_auto,
                   temp_buffer);

// Check for zero pivot
if(rocsparse_status_zero_pivot == rocsparse_csrilu0_zero_pivot(handle,
                                                               info,
                                                               &position))
{
    printf("U has structural and/or numerical zero at U(%d,%d)\n",
           position,
           position);
}

// Solve Lz = x
rocsparse_dcsrsv_solve(handle,
                       rocsparse_operation_none,
                       m,
                       nnz,
                       &alpha,
                       descr_L,
                       csr_val,
                       csr_row_ptr,
                       csr_col_ind,
                       info,
                       x,
                       z,
                       rocsparse_solve_policy_auto,
                       temp_buffer);

// Solve Uy = z
rocsparse_dcsrsv_solve(handle,
                       rocsparse_operation_none,
                       m,
                       nnz,
                       &alpha,
                       descr_U,
                       csr_val,
                       csr_row_ptr,
                       csr_col_ind,
                       info,
                       z,
                       y,
                       rocsparse_solve_policy_auto,
                       temp_buffer);

// Clean up
hipFree(temp_buffer);
rocsparse_destroy_mat_info(info);
rocsparse_destroy_mat_descr(descr_M);
rocsparse_destroy_mat_descr(descr_L);
rocsparse_destroy_mat_descr(descr_U);
rocsparse_destroy_handle(handle);
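The two triangular solves at the end of the example can be sketched on the host as follows. This is an illustration only: `ilu0_apply_reference` is a hypothetical helper, and it assumes \(\alpha = 1\), zero-based indices, sorted columns, and that csr_val already holds the combined factors (unit lower \(L\) below the diagonal, non-unit upper \(U\) on and above it), matching the fill mode and diagonal type set on descr_L and descr_U.

```cpp
#include <vector>

// Hypothetical host reference (not a rocSPARSE function).
// Solves L z = x followed by U y = z, where L and U share one CSR matrix.
std::vector<double> ilu0_apply_reference(int m,
                                         const std::vector<double>& csr_val,
                                         const std::vector<int>& csr_row_ptr,
                                         const std::vector<int>& csr_col_ind,
                                         const std::vector<double>& x)
{
    std::vector<double> y(x);

    // Forward substitution with the strictly lower part; diagonal of L is 1.
    for (int i = 0; i < m; ++i) {
        for (int p = csr_row_ptr[i];
             p < csr_row_ptr[i + 1] && csr_col_ind[p] < i; ++p) {
            y[i] -= csr_val[p] * y[csr_col_ind[p]];
        }
    }

    // Backward substitution with the upper part, including its diagonal.
    for (int i = m - 1; i >= 0; --i) {
        double diag = 1.0;
        for (int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p) {
            const int c = csr_col_ind[p];
            if (c == i) {
                diag = csr_val[p];
            } else if (c > i) {
                y[i] -= csr_val[p] * y[c];
            }
        }
        y[i] /= diag;
    }
    return y;
}
```

Because both factors live in the same arrays, the same csr_val, csr_row_ptr and csr_col_ind are passed to both solves in the example; only the descriptor decides which triangle is read.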

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] descr: descriptor of the sparse CSR matrix.

  • [inout] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [in] info: structure that holds the information collected during the analysis step.

  • [in] policy: rocsparse_solve_policy_auto.

  • [in] temp_buffer: temporary storage buffer allocated by the user.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: descr, csr_val, csr_row_ptr or csr_col_ind pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: trans != rocsparse_operation_none or rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csrilu0_clear()

rocsparse_status rocsparse_csrilu0_clear(rocsparse_handle handle, rocsparse_mat_info info)

Incomplete LU factorization with 0 fill-ins and no pivoting using CSR storage format.

rocsparse_csrilu0_clear deallocates all memory that was allocated by rocsparse_scsrilu0_analysis() or rocsparse_dcsrilu0_analysis(). This is especially useful if memory is an issue and the analysis data is not required for further computation.

Note

Calling rocsparse_csrilu0_clear is optional. All allocated resources will be cleared when the opaque rocsparse_mat_info struct is destroyed using rocsparse_destroy_mat_info().

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [inout] info: structure that holds the information collected during the analysis step.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_pointer: info pointer is invalid.

  • rocsparse_status_memory_error: the buffer holding the meta data could not be deallocated.

  • rocsparse_status_internal_error: an internal error occurred.

Sparse Conversion Functions

This module holds all sparse conversion routines.

The sparse conversion routines describe operations on a matrix in sparse format to obtain a matrix in a different sparse format.

rocsparse_csr2coo()

rocsparse_status rocsparse_csr2coo(rocsparse_handle handle, const rocsparse_int *csr_row_ptr, rocsparse_int nnz, rocsparse_int m, rocsparse_int *coo_row_ind, rocsparse_index_base idx_base)

Convert a sparse CSR matrix into a sparse COO matrix.

rocsparse_csr2coo converts the CSR array of row offsets, which point to the start of every row, into a COO array of row indices.

Note

It can also be used to convert a CSC array containing the column offsets into a COO array of column indices.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Example

This example converts a CSR matrix into a COO matrix.

//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8

rocsparse_int m   = 3;
rocsparse_int n   = 5;
rocsparse_int nnz = 8;

csr_row_ptr[m+1] = {0, 3, 5, 8};             // device memory
csr_col_ind[nnz] = {0, 1, 3, 1, 2, 0, 3, 4}; // device memory
csr_val[nnz]     = {1, 2, 3, 4, 5, 6, 7, 8}; // device memory

// Allocate COO matrix arrays
rocsparse_int* coo_row_ind;
rocsparse_int* coo_col_ind;
float* coo_val;

hipMalloc((void**)&coo_row_ind, sizeof(rocsparse_int) * nnz);
hipMalloc((void**)&coo_col_ind, sizeof(rocsparse_int) * nnz);
hipMalloc((void**)&coo_val, sizeof(float) * nnz);

// Convert the csr row offsets into coo row indices
rocsparse_csr2coo(handle,
                  csr_row_ptr,
                  nnz,
                  m,
                  coo_row_ind,
                  rocsparse_index_base_zero);

// Copy the column and value arrays
hipMemcpy(coo_col_ind,
          csr_col_ind,
          sizeof(rocsparse_int) * nnz,
          hipMemcpyDeviceToDevice);

hipMemcpy(coo_val,
          csr_val,
          sizeof(float) * nnz,
          hipMemcpyDeviceToDevice);
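What the conversion does to the row array can be shown with a short host-side sketch. `csr2coo_reference` is a hypothetical helper, not a rocSPARSE function; idx_base is passed as a plain integer (0 or 1) instead of rocsparse_index_base.

```cpp
#include <vector>

// Hypothetical host reference (not a rocSPARSE function).
// Expands CSR row offsets into one row index per non-zero entry.
std::vector<int> csr2coo_reference(const std::vector<int>& csr_row_ptr,
                                   int nnz, int m, int idx_base)
{
    std::vector<int> coo_row_ind(nnz);
    for (int i = 0; i < m; ++i) {
        // Entries csr_row_ptr[i] .. csr_row_ptr[i+1]-1 all belong to row i.
        for (int p = csr_row_ptr[i] - idx_base;
             p < csr_row_ptr[i + 1] - idx_base; ++p) {
            coo_row_ind[p] = i + idx_base;
        }
    }
    return coo_row_ind;
}
```

For the matrix in the example, the offsets {0, 3, 5, 8} expand to the row indices {0, 0, 0, 1, 1, 2, 2, 2}. The column and value arrays are identical in both formats, which is why the example only copies them.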

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] m: number of rows of the sparse CSR matrix.

  • [out] coo_row_ind: array of nnz elements containing the row indices of the sparse COO matrix.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: csr_row_ptr or coo_row_ind pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

rocsparse_coo2csr()

rocsparse_status rocsparse_coo2csr(rocsparse_handle handle, const rocsparse_int *coo_row_ind, rocsparse_int nnz, rocsparse_int m, rocsparse_int *csr_row_ptr, rocsparse_index_base idx_base)

Convert a sparse COO matrix into a sparse CSR matrix.

rocsparse_coo2csr converts the COO array of row indices into a CSR array of row offsets, which point to the start of every row. It is assumed that the COO row index array is sorted.

Note

It can also be used to convert a COO array containing the column indices into a CSC array of column offsets, which point to the start of every column. In that case, it is assumed that the COO column index array is sorted instead.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Example

This example converts a COO matrix into a CSR matrix.

//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8

rocsparse_int m   = 3;
rocsparse_int n   = 5;
rocsparse_int nnz = 8;

coo_row_ind[nnz] = {0, 0, 0, 1, 1, 2, 2, 2}; // device memory
coo_col_ind[nnz] = {0, 1, 3, 1, 2, 0, 3, 4}; // device memory
coo_val[nnz]     = {1, 2, 3, 4, 5, 6, 7, 8}; // device memory

// Allocate CSR matrix arrays
rocsparse_int* csr_row_ptr;
rocsparse_int* csr_col_ind;
float* csr_val;

hipMalloc((void**)&csr_row_ptr, sizeof(rocsparse_int) * (m + 1));
hipMalloc((void**)&csr_col_ind, sizeof(rocsparse_int) * nnz);
hipMalloc((void**)&csr_val, sizeof(float) * nnz);

// Convert the coo row indices into csr row offsets
rocsparse_coo2csr(handle,
                  coo_row_ind,
                  nnz,
                  m,
                  csr_row_ptr,
                  rocsparse_index_base_zero);

// Copy the column and value arrays
hipMemcpy(csr_col_ind,
          coo_col_ind,
          sizeof(rocsparse_int) * nnz,
          hipMemcpyDeviceToDevice);

hipMemcpy(csr_val,
          coo_val,
          sizeof(float) * nnz,
          hipMemcpyDeviceToDevice);
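The reverse direction amounts to counting the entries per row and taking a prefix sum. The sketch below is a host-side illustration under that reading; `coo2csr_reference` is a hypothetical name and idx_base is a plain integer (0 or 1).

```cpp
#include <vector>

// Hypothetical host reference (not a rocSPARSE function).
// Compresses sorted COO row indices into CSR row offsets.
std::vector<int> coo2csr_reference(const std::vector<int>& coo_row_ind,
                                   int nnz, int m, int idx_base)
{
    std::vector<int> csr_row_ptr(m + 1, 0);

    // Count the entries of each row, shifted by one slot.
    for (int p = 0; p < nnz; ++p) {
        ++csr_row_ptr[coo_row_ind[p] - idx_base + 1];
    }
    // Prefix sum turns counts into offsets.
    for (int i = 0; i < m; ++i) {
        csr_row_ptr[i + 1] += csr_row_ptr[i];
    }
    // Apply the requested index base to every offset.
    for (int& offset : csr_row_ptr) {
        offset += idx_base;
    }
    return csr_row_ptr;
}
```

The offsets are only meaningful together with the unchanged column and value arrays if coo_row_ind is sorted, which is the same assumption the library function makes.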

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] coo_row_ind: array of nnz elements containing the row indices of the sparse COO matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] m: number of rows of the sparse CSR matrix.

  • [out] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or nnz is invalid.

  • rocsparse_status_invalid_pointer: coo_row_ind or csr_row_ptr pointer is invalid.

rocsparse_csr2csc_buffer_size()

rocsparse_status rocsparse_csr2csc_buffer_size(rocsparse_handle handle, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_action copy_values, size_t *buffer_size)

Convert a sparse CSR matrix into a sparse CSC matrix.

rocsparse_csr2csc_buffer_size returns the size of the temporary storage buffer required by rocsparse_scsr2csc() and rocsparse_dcsr2csc(). The temporary storage buffer must be allocated by the user.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] n: number of columns of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [in] copy_values: rocsparse_action_symbolic or rocsparse_action_numeric.

  • [out] buffer_size: number of bytes of the temporary storage buffer required by rocsparse_scsr2csc() and rocsparse_dcsr2csc().

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or nnz is invalid.

  • rocsparse_status_invalid_pointer: csr_row_ptr, csr_col_ind or buffer_size pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_csr2csc()

rocsparse_status rocsparse_scsr2csc(rocsparse_handle handle, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, float *csc_val, rocsparse_int *csc_row_ind, rocsparse_int *csc_col_ptr, rocsparse_action copy_values, rocsparse_index_base idx_base, void *temp_buffer)
rocsparse_status rocsparse_dcsr2csc(rocsparse_handle handle, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, double *csc_val, rocsparse_int *csc_row_ind, rocsparse_int *csc_col_ptr, rocsparse_action copy_values, rocsparse_index_base idx_base, void *temp_buffer)

Convert a sparse CSR matrix into a sparse CSC matrix.

rocsparse_csr2csc converts a CSR matrix into a CSC matrix. rocsparse_csr2csc can also be used to convert a CSC matrix into a CSR matrix. copy_values decides whether csc_val is being filled during conversion (rocsparse_action_numeric) or not (rocsparse_action_symbolic).

rocsparse_csr2csc requires an extra temporary storage buffer that has to be allocated by the user. The required buffer size can be determined by rocsparse_csr2csc_buffer_size().

Note

The resulting matrix can also be seen as the transpose of the input matrix.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Example

This example computes the transpose of a CSR matrix.

//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8

rocsparse_int m_A   = 3;
rocsparse_int n_A   = 5;
rocsparse_int nnz_A = 8;

csr_row_ptr_A[m_A+1] = {0, 3, 5, 8};             // device memory
csr_col_ind_A[nnz_A] = {0, 1, 3, 1, 2, 0, 3, 4}; // device memory
csr_val_A[nnz_A]     = {1, 2, 3, 4, 5, 6, 7, 8}; // device memory

// Allocate memory for transposed CSR matrix
rocsparse_int m_T   = n_A;
rocsparse_int n_T   = m_A;
rocsparse_int nnz_T = nnz_A;

rocsparse_int* csr_row_ptr_T;
rocsparse_int* csr_col_ind_T;
float* csr_val_T;

hipMalloc((void**)&csr_row_ptr_T, sizeof(rocsparse_int) * (m_T + 1));
hipMalloc((void**)&csr_col_ind_T, sizeof(rocsparse_int) * nnz_T);
hipMalloc((void**)&csr_val_T, sizeof(float) * nnz_T);

// Obtain the temporary buffer size
size_t buffer_size;
rocsparse_csr2csc_buffer_size(handle,
                              m_A,
                              n_A,
                              nnz_A,
                              csr_row_ptr_A,
                              csr_col_ind_A,
                              rocsparse_action_numeric,
                              &buffer_size);

// Allocate temporary buffer
void* temp_buffer;
hipMalloc(&temp_buffer, buffer_size);

rocsparse_scsr2csc(handle,
                   m_A,
                   n_A,
                   nnz_A,
                   csr_val_A,
                   csr_row_ptr_A,
                   csr_col_ind_A,
                   csr_val_T,
                   csr_col_ind_T,
                   csr_row_ptr_T,
                   rocsparse_action_numeric,
                   rocsparse_index_base_zero,
                   temp_buffer);
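The result of the conversion can be checked against a small host-side reference. The sketch below uses a counting sort by column; it is not how rocSPARSE implements the routine, `CscMatrix` and `csr2csc_reference` are hypothetical names, and it assumes zero-based indices and the numeric action (values are copied).

```cpp
#include <vector>

// Hypothetical container for the output arrays (not a rocSPARSE type).
struct CscMatrix {
    std::vector<double> val;
    std::vector<int>    row_ind;
    std::vector<int>    col_ptr;
};

// Hypothetical host reference (not a rocSPARSE function).
// Converts an m x n zero-based CSR matrix into CSC.
CscMatrix csr2csc_reference(int m, int n,
                            const std::vector<double>& csr_val,
                            const std::vector<int>& csr_row_ptr,
                            const std::vector<int>& csr_col_ind)
{
    const int nnz = csr_row_ptr[m];
    CscMatrix csc;
    csc.val.resize(nnz);
    csc.row_ind.resize(nnz);
    csc.col_ptr.assign(n + 1, 0);

    // Count entries per column, then prefix-sum into column offsets.
    for (int p = 0; p < nnz; ++p) ++csc.col_ptr[csr_col_ind[p] + 1];
    for (int j = 0; j < n; ++j) csc.col_ptr[j + 1] += csc.col_ptr[j];

    // Scatter each entry to the next free slot of its column. Visiting rows
    // in order keeps the row indices sorted within every column.
    std::vector<int> next(csc.col_ptr.begin(), csc.col_ptr.end() - 1);
    for (int i = 0; i < m; ++i) {
        for (int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p) {
            const int dst = next[csr_col_ind[p]]++;
            csc.row_ind[dst] = i;
            csc.val[dst]     = csr_val[p];
        }
    }
    return csc;
}
```

Reading the three output arrays as csr_row_ptr_T, csr_col_ind_T and csr_val_T gives the CSR form of the transposed matrix, which is what the example above relies on.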

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] n: number of columns of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] csr_val: array of nnz elements of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array of nnz elements containing the column indices of the sparse CSR matrix.

  • [out] csc_val: array of nnz elements of the sparse CSC matrix.

  • [out] csc_row_ind: array of nnz elements containing the row indices of the sparse CSC matrix.

  • [out] csc_col_ptr: array of n+1 elements that point to the start of every column of the sparse CSC matrix.

  • [in] copy_values: rocsparse_action_symbolic or rocsparse_action_numeric.

  • [in] idx_base: rocsparse_index_base_zero or rocsparse_index_base_one.

  • [in] temp_buffer: temporary storage buffer allocated by the user, size is returned by rocsparse_csr2csc_buffer_size().

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or nnz is invalid.

  • rocsparse_status_invalid_pointer: csr_val, csr_row_ptr, csr_col_ind, csc_val, csc_row_ind, csc_col_ptr or temp_buffer pointer is invalid.

  • rocsparse_status_arch_mismatch: the device is not supported.

  • rocsparse_status_internal_error: an internal error occurred.

rocsparse_csr2ell_width()

rocsparse_status rocsparse_csr2ell_width(rocsparse_handle handle, rocsparse_int m, const rocsparse_mat_descr csr_descr, const rocsparse_int *csr_row_ptr, const rocsparse_mat_descr ell_descr, rocsparse_int *ell_width)

Convert a sparse CSR matrix into a sparse ELL matrix.

rocsparse_csr2ell_width computes the ELL width of a given CSR matrix, that is, the maximum number of non-zero elements per row over all rows.
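As a host-side illustration, the ELL width follows directly from the differences of consecutive row offsets. `csr2ell_width_reference` is a hypothetical helper, not a rocSPARSE function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical host reference (not a rocSPARSE function).
// Returns the largest number of stored entries in any row.
int csr2ell_width_reference(int m, const std::vector<int>& csr_row_ptr)
{
    int width = 0;
    for (int i = 0; i < m; ++i) {
        // Row i holds csr_row_ptr[i+1] - csr_row_ptr[i] entries,
        // independent of the index base.
        width = std::max(width, csr_row_ptr[i + 1] - csr_row_ptr[i]);
    }
    return width;
}
```

For row offsets {0, 3, 5, 8} the rows hold 3, 2 and 3 entries, so the width is 3.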

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] csr_descr: descriptor of the sparse CSR matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] ell_descr: descriptor of the sparse ELL matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [out] ell_width: pointer to the number of non-zero elements per row in ELL storage format.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m is invalid.

  • rocsparse_status_invalid_pointer: csr_descr, csr_row_ptr, or ell_width pointer is invalid.

  • rocsparse_status_internal_error: an internal error occurred.

  • rocsparse_status_not_implemented: rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_csr2ell()

rocsparse_status rocsparse_scsr2ell(rocsparse_handle handle, rocsparse_int m, const rocsparse_mat_descr csr_descr, const float *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, const rocsparse_mat_descr ell_descr, rocsparse_int ell_width, float *ell_val, rocsparse_int *ell_col_ind)
rocsparse_status rocsparse_dcsr2ell(rocsparse_handle handle, rocsparse_int m, const rocsparse_mat_descr csr_descr, const double *csr_val, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, const rocsparse_mat_descr ell_descr, rocsparse_int ell_width, double *ell_val, rocsparse_int *ell_col_ind)

Convert a sparse CSR matrix into a sparse ELL matrix.

rocsparse_csr2ell converts a CSR matrix into an ELL matrix. It is assumed that ell_val and ell_col_ind are allocated. The allocation size is the number of rows times the number of ELL non-zero elements per row, such that \( nnz_{ELL} = m \cdot ell\_width\). The number of ELL non-zero elements per row is obtained by rocsparse_csr2ell_width().

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Example

This example converts a CSR matrix into an ELL matrix.

//     1 2 0 3 0
// A = 0 4 5 0 0
//     6 0 0 7 8

rocsparse_int m   = 3;
rocsparse_int n   = 5;
rocsparse_int nnz = 8;

csr_row_ptr[m+1] = {0, 3, 5, 8};             // device memory
csr_col_ind[nnz] = {0, 1, 3, 1, 2, 0, 3, 4}; // device memory
csr_val[nnz]     = {1, 2, 3, 4, 5, 6, 7, 8}; // device memory

// Create ELL matrix descriptor
rocsparse_mat_descr ell_descr;
rocsparse_create_mat_descr(&ell_descr);

// Obtain the ELL width
rocsparse_int ell_width;
rocsparse_csr2ell_width(handle,
                        m,
                        csr_descr,
                        csr_row_ptr,
                        ell_descr,
                        &ell_width);

// Compute ELL non-zero entries
rocsparse_int ell_nnz = m * ell_width;

// Allocate ELL column and value arrays
rocsparse_int* ell_col_ind;
hipMalloc((void**)&ell_col_ind, sizeof(rocsparse_int) * ell_nnz);

float* ell_val;
hipMalloc((void**)&ell_val, sizeof(float) * ell_nnz);

// Format conversion
rocsparse_scsr2ell(handle,
                   m,
                   csr_descr,
                   csr_val,
                   csr_row_ptr,
                   csr_col_ind,
                   ell_descr,
                   ell_width,
                   ell_val,
                   ell_col_ind);

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] csr_descr: descriptor of the sparse CSR matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] csr_val: array containing the values of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [in] csr_col_ind: array containing the column indices of the sparse CSR matrix.

  • [in] ell_descr: descriptor of the sparse ELL matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] ell_width: number of non-zero elements per row in ELL storage format.

  • [out] ell_val: array of m times ell_width elements of the sparse ELL matrix.

  • [out] ell_col_ind: array of m times ell_width elements containing the column indices of the sparse ELL matrix.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m or ell_width is invalid.

  • rocsparse_status_invalid_pointer: csr_descr, csr_val, csr_row_ptr, csr_col_ind, ell_descr, ell_val or ell_col_ind pointer is invalid.

  • rocsparse_status_not_implemented: rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_ell2csr_nnz()

rocsparse_status rocsparse_ell2csr_nnz(rocsparse_handle handle, rocsparse_int m, rocsparse_int n, const rocsparse_mat_descr ell_descr, rocsparse_int ell_width, const rocsparse_int *ell_col_ind, const rocsparse_mat_descr csr_descr, rocsparse_int *csr_row_ptr, rocsparse_int *csr_nnz)

Convert a sparse ELL matrix into a sparse CSR matrix.

rocsparse_ell2csr_nnz computes the total number of CSR non-zero elements and the CSR row offsets, which point to the start of every row of the sparse CSR matrix, for a given ELL matrix. It is assumed that csr_row_ptr has been allocated with size m + 1.
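
The row-offset computation can be sketched on the host as follows. The sketch assumes zero-based indices, column-major ELL storage, and that padded entries carry a column index outside the range [0, n), for example -1. These are assumptions, and host_ell2csr_nnz is a hypothetical helper rather than a rocSPARSE function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not a rocSPARSE function.
// Fills csr_row_ptr (m + 1 entries) and returns the total CSR nnz.
std::int32_t host_ell2csr_nnz(std::int32_t                     m,
                              std::int32_t                     n,
                              std::int32_t                     ell_width,
                              const std::vector<std::int32_t>& ell_col_ind,
                              std::vector<std::int32_t>&       csr_row_ptr)
{
    csr_row_ptr.assign(static_cast<std::size_t>(m) + 1, 0);
    for(std::int32_t i = 0; i < m; ++i)
    {
        std::int32_t row_nnz = 0;
        for(std::int32_t p = 0; p < ell_width; ++p)
        {
            // Assumed column-major layout: entry p of row i
            const std::int32_t col = ell_col_ind[static_cast<std::size_t>(p) * m + i];
            if(col >= 0 && col < n)
            {
                ++row_nnz; // valid entry, not padding
            }
        }
        // Running sum gives the start offset of the next row
        csr_row_ptr[i + 1] = csr_row_ptr[i] + row_nnz;
    }
    return csr_row_ptr[m];
}
```

The returned count is what a caller would use to size the CSR value and column index arrays before the actual ELL to CSR conversion.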

Note

This function is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse ELL matrix.

  • [in] n: number of columns of the sparse ELL matrix.

  • [in] ell_descr: descriptor of the sparse ELL matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [in] ell_width: number of non-zero elements per row in ELL storage format.

  • [in] ell_col_ind: array of m times ell_width elements containing the column indices of the sparse ELL matrix.

  • [in] csr_descr: descriptor of the sparse CSR matrix. Currently, only rocsparse_matrix_type_general is supported.

  • [out] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.

  • [out] csr_nnz: pointer to the total number of non-zero elements in CSR storage format.

Return Value
  • rocsparse_status_success: the operation completed successfully.

  • rocsparse_status_invalid_handle: the library context was not initialized.

  • rocsparse_status_invalid_size: m, n or ell_width is invalid.

  • rocsparse_status_invalid_pointer: ell_descr, ell_col_ind, csr_descr, csr_row_ptr or csr_nnz pointer is invalid.

  • rocsparse_status_not_implemented: rocsparse_matrix_type != rocsparse_matrix_type_general.

rocsparse_ell2csr()

rocsparse_csr2csc_buffer_size()

rocsparse_status rocsparse_csr2csc_buffer_size(rocsparse_handle handle, rocsparse_int m, rocsparse_int n, rocsparse_int nnz, const rocsparse_int *csr_row_ptr, const rocsparse_int *csr_col_ind, rocsparse_action copy_values, size_t *buffer_size)

Convert a sparse CSR matrix into a sparse CSC matrix.

rocsparse_csr2csc_buffer_size returns the size of the temporary storage buffer required by rocsparse_scsr2csc() and rocsparse_dcsr2csc(). The temporary storage buffer must be allocated by the user.

Parameters
  • [in] handle: handle to the rocsparse library context queue.

  • [in] m: number of rows of the sparse CSR matrix.

  • [in] n: number of columns of the sparse CSR matrix.

  • [in] nnz: number of non-zero entries of the sparse CSR matrix.

  • [in] csr_row_ptr: array of m+1 elements that point to the start of every row of the sparse CSR matrix.