Virtual Reality, Visualization and Imaging Research Centre

Department of Computer Science & Engineering, CUHK.

 

Yongming Xie

 


CUDA OpenGL Tutorials

by Xie Yongming 

26/02 2007

 

Introduction

      CUDA (Compute Unified Device Architecture) is a GPGPU technology that lets a programmer use the C programming language to code algorithms for execution on the GPU. CUDA was developed by NVIDIA, and using it requires an NVIDIA GPU and special stream-processing drivers. At the time of writing, CUDA works only with the new GeForce 8 Series, featuring G8X GPUs; NVIDIA guarantees that programs developed for the GeForce 8 Series will also work without modification on all future NVIDIA video cards. CUDA gives developers unfettered access to the native instruction set and memory of the massively parallel computational elements in CUDA GPUs. Using CUDA, NVIDIA GeForce-based GPUs effectively become powerful, programmable open architectures like today's CPUs (Central Processing Units). By opening up the architecture, CUDA gives developers both the low-level, deterministic, repeatable access to hardware and the API needed to develop essential high-level programming tools such as compilers, debuggers, math libraries, and application platforms.

CUDA OpenGL pipeline

CUDA

Device
Both CUDA APIs (the runtime API and the driver API) provide a way to enumerate the devices available on the system, query their properties, and select one of them for kernel execution.
Memory
Device memory can be allocated either as linear memory or as arrays.
Linear memory exists on the device in a 32-bit address space, so separately allocated entities can reference one another via pointers, for example, in a binary tree.
Arrays are opaque memory layouts optimized for texture fetching. They are one-dimensional or two-dimensional and composed of elements, each of which has 1, 2 or 4 components that may be signed or unsigned 8-, 16- or 32-bit integers, 16-bit floats (driver API only), or 32-bit floats. Kernels can read arrays only through texture fetching.
Texture
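Both linear memory and arrays can be bound to texture references, and several distinct texture references may be bound to the same texture or to textures that overlap in memory.
Reading device memory through texture fetching has several advantages over reading global or constant memory: texture reads are cached, so they can deliver higher bandwidth when the fetches have locality, and textures bound to arrays can use hardware features such as filtering, normalized coordinates, and addressing modes.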
OpenGL Interoperability
OpenGL buffer objects may be mapped into the address space of CUDA, either to enable CUDA to read data written by OpenGL or to enable CUDA to write data for consumption by OpenGL.
Direct3D Interoperability
Direct3D 9.0 vertex buffers may be mapped into the address space of CUDA, either to enable CUDA to read data written by Direct3D or to enable CUDA to write data for consumption by Direct3D.
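
The device and memory concepts above can be sketched with the runtime API. This is a minimal illustration, not code from the demo: the helper names selectLargestDevice, roundTripLinear and allocFloatArray are hypothetical, and picking the device with the most multiprocessors is only one possible selection rule.

```cuda
#include <cuda_runtime.h>

// Enumerates the devices, queries their properties, and makes the one with
// the most multiprocessors current. Returns its index, or -1 on failure.
int selectLargestDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1, bestSMs = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        if (prop.multiProcessorCount > bestSMs) {
            bestSMs = prop.multiProcessorCount;
            best = dev;
        }
    }
    if (best >= 0 && cudaSetDevice(best) != cudaSuccess)
        return -1;
    return best;
}

// Linear memory: copies n floats to the device and back again.
bool roundTripLinear(const float* src, float* dst, int n)
{
    float* d_buf = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void**)&d_buf, bytes) != cudaSuccess)
        return false;
    bool ok = cudaMemcpy(d_buf, src, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dst, d_buf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_buf);
    return ok;
}

// Array: allocates a width x height CUDA array whose elements have one
// 32-bit float component. The caller releases it with cudaFreeArray.
cudaArray* allocFloatArray(int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* arr = 0;
    if (cudaMallocArray(&arr, &desc, width, height) != cudaSuccess)
        return 0;
    return arr;
}
```

Linear memory is addressed through an ordinary pointer, whereas the array handle is opaque and is only useful to a kernel once it has been bound to a texture.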

Thread Block
A thread block is a batch of threads that can cooperate by efficiently sharing data through fast shared memory and by synchronizing their execution to coordinate memory accesses. More precisely, one can specify synchronization points in the kernel, where threads in a block are suspended until they all reach the synchronization point.
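
As an illustration of this cooperation, the sketch below sums an array block by block: the threads of a block stage their inputs in shared memory and use __syncthreads() as the synchronization point between reduction steps. The names blockSum and sumOnDevice and the block size of 256 are assumptions made for this example, not part of the demo.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // threads per block; must be a power of two here

// Each block sums up to BLOCK_SIZE consecutive inputs into one output element.
__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float cache[BLOCK_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads are visible before the reduction starts

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();  // finish this step before the next one reads it
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];
}

// Host helper: returns the sum of n > 0 floats, or -1.0f on a CUDA error.
float sumOnDevice(const float* h_in, int n)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float* d_in = 0;
    float* d_out = 0;
    float* h_out = new float[blocks];
    float total = -1.0f;

    if (cudaMalloc((void**)&d_in, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void**)&d_out, blocks * sizeof(float)) == cudaSuccess) {
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
        blockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        if (cudaMemcpy(h_out, d_out, blocks * sizeof(float),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            total = 0.0f;
            for (int b = 0; b < blocks; ++b)  // add the per-block partial sums
                total += h_out[b];
        }
    }
    cudaFree(d_in);
    cudaFree(d_out);
    delete[] h_out;
    return total;
}
```

Without the two __syncthreads() calls, a thread could read a shared-memory slot before the thread responsible for it has written it.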

Memory Model
A thread that executes on the device has access only to the device's DRAM and on-chip memory, through the following memory spaces:
Read-write per-thread registers,
Read-write per-thread local memory,
Read-write per-block shared memory,
Read-write per-grid global memory,
Read-only per-grid constant memory,
Read-only per-grid texture memory.
The global, constant, and texture memory spaces can be read from or written to by the host and are persistent across kernel calls by the same application.
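
A small sketch can make some of these spaces concrete. It assumes a hypothetical kernel scaleInPlace that multiplies an array by a factor; the names c_scale, scaleInPlace and scaleOnDevice are illustrative only.

```cuda
#include <cuda_runtime.h>

// Read-only per-grid constant memory, written by the host before the launch.
__constant__ float c_scale;

// data points into per-grid global memory; i and v are per-thread values
// that the compiler normally keeps in registers.
__global__ void scaleInPlace(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        data[i] = v * c_scale;
    }
}

// Scales n floats in place on the device; returns true on success.
bool scaleOnDevice(float* h_data, int n, float scale)
{
    float* d_data = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void**)&d_data, bytes) != cudaSuccess)
        return false;
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));         // host writes constant memory
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // host writes global memory
    scaleInPlace<<<(n + 127) / 128, 128>>>(d_data, n);
    bool ok = cudaMemcpy(h_data, d_data, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;  // host reads global memory
    cudaFree(d_data);
    return ok;
}
```

Because global and constant memory persist across kernel calls, c_scale would keep its value for any further launches in the same application until the host overwrites it.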


OpenGL Interoperability


cudaError_t cudaGLRegisterBufferObject(GLuint bufferObj);
registers the buffer object of ID bufferObj for access by CUDA. This function must be called before CUDA can map the buffer object. While it is registered, the buffer object cannot be used by any OpenGL commands except as a data source for OpenGL drawing commands.
cudaError_t cudaGLMapBufferObject(void** devPtr,unsigned int* size,GLuint bufferObj);
maps the buffer object of ID bufferObj into the address space of CUDA and returns in *devPtr and *size the base pointer and size of the resulting mapping.
cudaError_t cudaGLUnmapBufferObject(GLuint bufferObj);
unmaps the buffer object of ID bufferObj for access by CUDA.
cudaError_t cudaGLUnregisterBufferObject(GLuint bufferObj);
unregisters the buffer object of ID bufferObj for access by CUDA.
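
The usual order of these calls is register, map, run the kernel, unmap, unregister. The sketch below follows that order under several assumptions: the caller already has a current OpenGL context and a buffer object large enough for n floats, the helper fillBufferFromCuda and the kernel fillRamp are hypothetical, and the map call uses the two-argument runtime form cudaGLMapBufferObject(void** devPtr, GLuint bufferObj) found in later toolkits, which does not return the size. Later toolkits also mark this family of functions as deprecated.

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Writes the ramp 0, 1, 2, ... into a float buffer.
__global__ void fillRamp(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = (float)i;
}

// Hypothetical helper: lets CUDA write n floats into an existing OpenGL
// buffer object. Assumes a current OpenGL context and a buffer that was
// already created and sized by the caller.
bool fillBufferFromCuda(GLuint bufferObj, int n)
{
    if (cudaGLRegisterBufferObject(bufferObj) != cudaSuccess)
        return false;

    float* devPtr = 0;
    bool ok = cudaGLMapBufferObject((void**)&devPtr, bufferObj) == cudaSuccess;
    if (ok) {
        fillRamp<<<(n + 255) / 256, 256>>>(devPtr, n);
        // Unmapping hands the buffer back to OpenGL for drawing.
        ok = cudaGLUnmapBufferObject(bufferObj) == cudaSuccess;
    }
    cudaGLUnregisterBufferObject(bufferObj);
    return ok;
}
```

In a render loop the buffer would normally be registered once at start-up and unregistered at shutdown, with only the map, kernel launch and unmap steps repeated every frame.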


Sobel Edge Detector



 
Figure 1 Sobel convolution masks
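
The figure is not reproduced here. The standard 3x3 Sobel masks, assumed to be the ones it shows (sign conventions vary between references), are:

$$
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix},
\qquad
G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}
$$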

 
These masks are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one mask for each of the two perpendicular orientations. The masks can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these $G_x$ and $G_y$). These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

$$|G| = \sqrt{G_x^2 + G_y^2}$$

Typically, however, an approximate magnitude is computed using:

$$|G| = |G_x| + |G_y|$$

which is much faster to compute.
The angle of orientation of the edge (relative to the pixel grid) giving rise to the spatial gradient is given by:

$$\theta = \arctan(G_y / G_x)$$


CUDA Sobel Edge Detector kernel
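
The kernel listing itself is not reproduced here. The following is a minimal sketch of how such a kernel might look, not the demo's actual code: it assumes an 8-bit grayscale image stored row by row in linear memory, the standard 3x3 Sobel masks, the approximate magnitude (sum of the absolute values of the two gradient components), and border pixels set to 0. The names sobelKernel and sobelOnDevice are hypothetical.

```cuda
#include <cuda_runtime.h>

// One thread per output pixel.
__global__ void sobelKernel(const unsigned char* in, unsigned char* out,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
        out[y * width + x] = 0;  // incomplete 3x3 neighbourhood
        return;
    }

    // 3x3 neighbourhood around (x, y); the centre pixel has weight 0.
    int p00 = in[(y - 1) * width + x - 1];
    int p01 = in[(y - 1) * width + x];
    int p02 = in[(y - 1) * width + x + 1];
    int p10 = in[y * width + x - 1];
    int p12 = in[y * width + x + 1];
    int p20 = in[(y + 1) * width + x - 1];
    int p21 = in[(y + 1) * width + x];
    int p22 = in[(y + 1) * width + x + 1];

    int gx = (p02 + 2 * p12 + p22) - (p00 + 2 * p10 + p20);  // right minus left
    int gy = (p00 + 2 * p01 + p02) - (p20 + 2 * p21 + p22);  // top minus bottom
    int mag = abs(gx) + abs(gy);                             // approximate magnitude
    out[y * width + x] = (unsigned char)(mag > 255 ? 255 : mag);
}

// Host wrapper: filters a width x height image; returns true on success.
bool sobelOnDevice(const unsigned char* h_in, unsigned char* h_out,
                   int width, int height)
{
    size_t bytes = (size_t)width * height;
    unsigned char* d_in = 0;
    unsigned char* d_out = 0;
    bool ok = cudaMalloc((void**)&d_in, bytes) == cudaSuccess
           && cudaMalloc((void**)&d_out, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        sobelKernel<<<grid, block>>>(d_in, d_out, width, height);
        ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

To display the result without a copy back to the host, the output pointer could instead come from an OpenGL buffer object mapped into CUDA's address space.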


Demo and source code

CUDA Image Processing

CUDA Sobel

Download demo: Demo.zip (built for Windows XP and a GeForce 8800 GTX).

