HIP terminology comparison with OpenCL, CUDA, C++ AMP and HCC

| Term | CUDA | HIP | HC | C++AMP | OpenCL |
|---|---|---|---|---|---|
| Device | int deviceId | int deviceId | hc::accelerator | concurrency::accelerator | cl_device_id |
| Queue | cudaStream_t | hipStream_t | hc::accelerator_view | concurrency::accelerator_view | cl_command_queue |
| Event | cudaEvent_t | hipEvent_t | hc::completion_future | concurrency::completion_future | cl_event |
| Memory | void * | void * | void *; hc::array; hc::array_view | concurrency::array; concurrency::array_view | cl_mem |
|  | grid | grid | extent | extent | NDRange |
|  | block | block | tile | tile | work-group |
|  | thread | thread | thread | thread | work-item |
|  | warp | warp | wavefront | N/A | sub-group |
| Thread index | threadIdx.x | hipThreadIdx_x | t_idx.local[0] | t_idx.local[0] | get_local_id(0) |
| Block index | blockIdx.x | hipBlockIdx_x | t_idx.tile[0] | t_idx.tile[0] | get_group_id(0) |
| Block dim | blockDim.x | hipBlockDim_x | t_ext.tile_dim[0] | t_ext.tile_dim0 | get_local_size(0) |
| Grid dim | gridDim.x | hipGridDim_x | t_ext[0] | t_ext[0] | get_global_size(0) |
| Device Function | __device__ | __device__ | [[hc]] (detected automatically in many cases) | restrict(amp) | Implied in device compilation |
| Host Function | __host__ (default) | __host__ (default) | [[cpu]] (default) | restrict(cpu) (default) | Implied in host compilation |
| Host + Device Function | __host__ __device__ | __host__ __device__ | [[hc]] [[cpu]] | restrict(amp,cpu) | No equivalent |
| Kernel Launch | <<< >>> | hipLaunchKernelGGL | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
| Global Memory | __global__ | __global__ | Unnecessary / Implied | Unnecessary / Implied | __global |
| Group Memory | __shared__ | __shared__ | tile_static | tile_static | __local |
| Constant | __constant__ | __constant__ | Unnecessary / Implied | Unnecessary / Implied | __constant |
|  | __syncthreads | __syncthreads | t_idx.barrier.wait() | t_idx.barrier.wait() | barrier(CLK_LOCAL_MEM_FENCE) |
| Atomic Builtins | atomicAdd | atomicAdd | hc::atomic_fetch_add | concurrency::atomic_fetch_add | atomic_add |
| Precise Math | cos(f) | cos(f) | hc::precise_math::cos(f) | concurrency::precise_math::cos(f) | cos(f) |
| Fast Math | __cosf(f) | __cosf(f) | hc::fast_math::cos(f) | concurrency::fast_math::cos(f) | native_cos(f) |
| Vector | float4 | float4 | hc::short_vector::float4 | concurrency::graphics::float_4 | float4 |
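
The Thread index, Block index and Block dim entries in the table are usually combined into one flat index per thread. The following is a minimal host-side C++ sketch of that arithmetic for the 1D case. It is not device code: the struct `LaunchPoint1D` and the helpers `global_index` and `blocks_for` are hypothetical names introduced here for illustration, with comments giving the CUDA and OpenCL built-ins each field stands in for.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for the per-thread built-in values.
// On a device these would be read from the built-ins named in the comments.
struct LaunchPoint1D {
    std::size_t block_index;   // CUDA blockIdx.x  / OpenCL get_group_id(0)
    std::size_t block_dim;     // CUDA blockDim.x  / OpenCL get_local_size(0)
    std::size_t thread_index;  // CUDA threadIdx.x / OpenCL get_local_id(0)
};

// Flat 1D index of a thread across the whole grid.
inline std::size_t global_index(const LaunchPoint1D& p) {
    return p.block_index * p.block_dim + p.thread_index;
}

// Number of blocks needed to cover n elements, rounding up.
// Assumes block_dim is greater than zero.
inline std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}
```

For example, thread 5 of block 2 with 256 threads per block has flat index 2 * 256 + 5 = 517. One caveat when porting launch parameters: in CUDA, gridDim.x counts blocks, while OpenCL's get_global_size(0) counts work-items, so the two entries in the Grid dim row are the closest counterparts rather than equal values.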

Notes

  • For HC and C++AMP, assume a captured tiled_ext named “t_ext” and a captured extent named “ext”; the table also uses “t_idx” for the tiled index passed to the kernel. These languages pass information to the kernel through captured variables rather than special built-in functions, so the exact variable names may vary.

  • The indexing entries (from Thread index onward) show the terminology for a 1D grid. For 3D grids, some APIs reverse the order in which x, y, z map to indices 0, 1, 2.

  • HC allows tile dimensions to be specified at runtime, while C++AMP requires them to be specified at compile time. Thus the HC syntax for tile dimensions is t_ext.tile_dim[0], while the C++AMP syntax is t_ext.tile_dim0.
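
The last note, on runtime versus compile-time tile dimensions, can be pictured with a small host-side C++ sketch. The two structs below are hypothetical stand-ins and not the real HC or C++AMP classes. They only mirror the member spellings from the note (`tile_dim0` as a compile-time constant, `tile_dim[0]` as a runtime value), and `tile_count` is an illustrative helper.

```cpp
#include <cassert>

// Hypothetical C++AMP-style tiled extent: the tile size is a template
// parameter, so it is fixed when the code is compiled.
template <int D0>
struct StaticTiledExtent {
    static constexpr int tile_dim0 = D0;  // compile-time constant
    int size;                             // total number of threads
};

// Hypothetical HC-style tiled extent: the tile size is ordinary data,
// so it can be chosen while the program runs.
struct DynamicTiledExtent {
    int size;         // total number of threads
    int tile_dim[1];  // runtime value
};

// Number of tiles needed to cover the extent, rounding up.
template <int D0>
int tile_count(const StaticTiledExtent<D0>& e) {
    return (e.size + D0 - 1) / D0;
}

inline int tile_count(const DynamicTiledExtent& e) {
    return (e.size + e.tile_dim[0] - 1) / e.tile_dim[0];
}
```

In this sketch a `StaticTiledExtent<64>` always uses tiles of 64, and changing that means recompiling, whereas a `DynamicTiledExtent` can take its tile size from a configuration value or a device query made at run time.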