Course Outline


Introduction

  • What is CUDA?
  • CUDA vs OpenCL vs SYCL
  • Overview of CUDA features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new CUDA project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf
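
As a rough illustration of the compile-and-run step, the sketch below launches a tiny kernel that prints from the GPU with printf and reports launch failures on the host with fprintf. The file name hello.cu and the helper run_hello are assumptions made for this example, not part of the course material.

```cuda
// Build (assuming the file is saved as hello.cu): nvcc hello.cu -o hello
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its own block and thread index from the device.
__global__ void hello_kernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches the kernel and returns 0 on success.
int run_hello() {
    hello_kernel<<<2, 4>>>();                 // 2 blocks of 4 threads
    cudaError_t err = cudaGetLastError();     // catches launch errors
    if (err == cudaSuccess) {
        // Waiting for the kernel also flushes the device-side printf buffer.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main() {
    return run_hello();
}
```

The order in which the eight lines appear is not guaranteed, since blocks and warps may be scheduled in any order.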


CUDA API

  • Understanding the role of the CUDA API in the host program
  • Using CUDA API to query device information and capabilities
  • Using CUDA API to allocate and deallocate device memory
  • Using CUDA API to copy data between host and device
  • Using CUDA API to launch kernels and synchronize the host with the device
  • Using CUDA API to check and handle errors (cudaError_t return codes)
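
The host-side steps listed above can be sketched in one small program: query the device, allocate device memory, copy data in, launch a kernel, synchronize, copy results back, and free the memory. The names check, scale, and scale_on_gpu are hypothetical helpers introduced for this sketch; the runtime calls themselves are standard CUDA Runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Multiplies every element by `factor`, one thread per element.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical wrapper showing the allocate / copy / launch / copy / free cycle.
std::vector<float> scale_on_gpu(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    int n = static_cast<int>(in.size());
    if (n == 0) return out;
    size_t bytes = in.size() * sizeof(float);

    float* dev = nullptr;
    check(cudaMalloc(&dev, bytes), "cudaMalloc");
    check(cudaMemcpy(dev, in.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover all elements
    scale<<<blocks, threads>>>(dev, factor, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(dev), "cudaFree");
    return out;
}

int main() {
    int count = 0;
    check(cudaGetDeviceCount(&count), "cudaGetDeviceCount");
    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties");
    std::printf("%d device(s); device 0: %s, compute capability %d.%d\n",
                count, prop.name, prop.major, prop.minor);

    std::vector<float> result = scale_on_gpu({1.0f, 2.0f, 3.0f}, 2.0f);
    std::printf("%.1f %.1f %.1f\n", result[0], result[1], result[2]);
    return 0;
}
```

Runtime API functions report problems through cudaError_t return values rather than C++ exceptions, which is why every call here is passed through the checking helper.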


CUDA C/C++

  • Understanding the role of CUDA C/C++ in the device program
  • Using CUDA C/C++ to write kernels that execute on the GPU and manipulate data
  • Using CUDA C/C++ data types, qualifiers, operators, and expressions
  • Using CUDA C/C++ built-in functions, such as math, atomic, and warp-level functions
  • Using CUDA C/C++ built-in variables, such as threadIdx, blockIdx, blockDim, etc.
  • Using CUDA C/C++ libraries, such as cuBLAS, cuFFT, cuRAND, etc.
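
To make the device-side topics above more concrete, here is a minimal sketch of a kernel that combines built-in variables (threadIdx, blockIdx, blockDim, warpSize), a math function (sqrtf), a warp-level function (__shfl_down_sync), and an atomic (atomicAdd). The names sum_sqrt and sum_sqrt_on_gpu are hypothetical, and the sketch assumes the block size is a multiple of the warp size so that the full-warp mask is valid.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort if a runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Accumulates the sum of sqrt(in[i]) into *result.
// Assumes blockDim.x is a multiple of warpSize (32).
__global__ void sum_sqrt(const float* in, float* result, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads contribute 0 but still take part in the shuffles.
    float v = (i < n) ? sqrtf(in[i]) : 0.0f;

    // Warp-level tree reduction: lane 0 ends up with the warp's total.
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    // One atomic per warp instead of one per thread.
    if (threadIdx.x % warpSize == 0) {
        atomicAdd(result, v);
    }
}

// Hypothetical wrapper around the kernel.
float sum_sqrt_on_gpu(const std::vector<float>& in) {
    int n = static_cast<int>(in.size());
    if (n == 0) return 0.0f;
    size_t bytes = in.size() * sizeof(float);

    float* d_in = nullptr;
    float* d_sum = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_sum, sizeof(float)));
    check(cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(d_sum, 0, sizeof(float)));   // accumulator starts at 0.0f

    int threads = 256;                            // multiple of 32
    int blocks = (n + threads - 1) / threads;
    sum_sqrt<<<blocks, threads>>>(d_in, d_sum, n);
    check(cudaGetLastError());

    float sum = 0.0f;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    check(cudaMemcpy(&sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_sum));
    return sum;
}

int main() {
    std::vector<float> data(1024, 4.0f);
    std::printf("sum of square roots = %.1f\n", sum_sqrt_on_gpu(data));
    return 0;
}
```

Because floating-point atomics are applied in an unspecified order, results for general inputs can differ in the last bits between runs.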

CUDA Memory Model

  • Understanding the difference between host and device memory models
  • Using CUDA memory spaces, such as global, shared, constant, and local
  • Using CUDA memory objects, such as pointers, arrays, textures, and surfaces
  • Using CUDA memory access modes, such as read-only, write-only, read-write, etc.
  • Using CUDA memory consistency model and synchronization mechanisms
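
The memory spaces named above can be seen side by side in a short sketch: input and output arrays live in global memory, a scale factor lives in constant memory, and each block stages its tile in shared memory before writing it back reversed. The names c_scale, TILE, reverse_tiles, and reverse_tiles_on_gpu are assumptions made for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;          // threads per block and elements per tile

__constant__ float c_scale;        // constant memory: read-only in kernels

// Hypothetical helper: abort if a runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Reverses each TILE-sized chunk of `in` and multiplies it by c_scale.
__global__ void reverse_tiles(const float* in, float* out, int n) {
    __shared__ float tile[TILE];               // shared memory: one copy per block
    int base = blockIdx.x * TILE;              // local variable, normally a register
    int valid = min(TILE, n - base);           // elements actually in this tile

    if (threadIdx.x < valid) tile[threadIdx.x] = in[base + threadIdx.x];
    __syncthreads();                           // all loads visible before any read

    if (threadIdx.x < valid) {
        out[base + threadIdx.x] = c_scale * tile[valid - 1 - threadIdx.x];
    }
}

// Hypothetical wrapper around the kernel.
std::vector<float> reverse_tiles_on_gpu(const std::vector<float>& in, float scale) {
    std::vector<float> out(in.size());
    int n = static_cast<int>(in.size());
    if (n == 0) return out;
    size_t bytes = in.size() * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));           // global memory allocations
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpyToSymbol(c_scale, &scale, sizeof(float)));

    int blocks = (n + TILE - 1) / TILE;
    reverse_tiles<<<blocks, TILE>>>(d_in, d_out, n);
    check(cudaGetLastError());
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return out;
}

int main() {
    std::vector<float> r = reverse_tiles_on_gpu({1, 2, 3, 4, 5}, 2.0f);
    for (float v : r) std::printf("%.0f ", v);
    std::printf("\n");
    return 0;
}
```

The __syncthreads() call is the piece that ties the memory model to synchronization: without it, a thread could read a shared-memory slot before the thread responsible for it has written it.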

CUDA Execution Model

  • Understanding the difference between host and device execution models
  • Using CUDA threads, blocks, and grids to define the parallelism
  • Using CUDA built-in index variables, such as threadIdx, blockIdx, blockDim, etc.
  • Using CUDA block-level functions, such as __syncthreads, __threadfence_block, etc.
  • Using CUDA grid-level features, such as gridDim and grid-wide synchronization with cooperative groups
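
One way to picture how threads, blocks, and grids define parallelism is a 2D launch in which every thread derives its own global row and column from the built-in variables. The sketch below is illustrative only; fill_coords and fill_on_gpu are hypothetical names, and the 16x16 block shape is an arbitrary choice.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort if a runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each thread writes row * 1000 + col into its own cell of a row-major matrix.
__global__ void fill_coords(int* out, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // global x coordinate
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // global y coordinate
    // The grid is rounded up, so threads past the edge must do nothing.
    if (col < width && row < height) {
        out[row * width + col] = row * 1000 + col;
    }
}

// Hypothetical wrapper that picks the grid shape from the matrix shape.
std::vector<int> fill_on_gpu(int width, int height) {
    std::vector<int> out(static_cast<size_t>(width) * height);
    if (out.empty()) return out;
    size_t bytes = out.size() * sizeof(int);

    int* dev = nullptr;
    check(cudaMalloc(&dev, bytes));

    dim3 block(16, 16);                                // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,         // blocks needed along x
              (height + block.y - 1) / block.y);       // blocks needed along y
    fill_coords<<<grid, block>>>(dev, width, height);
    check(cudaGetLastError());

    check(cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(dev));
    return out;
}

int main() {
    std::vector<int> m = fill_on_gpu(100, 37);
    std::printf("cell (36, 99) = %d\n", m[36 * 100 + 99]);
    return 0;
}
```

Rounding the grid up and guarding with a bounds check is a common pattern when the problem size is not a multiple of the block size.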


Debugging

  • Understanding the common errors and bugs in CUDA programs
  • Using Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
  • Using CUDA-GDB to debug CUDA programs on Linux
  • Using CUDA-MEMCHECK (superseded by Compute Sanitizer in recent CUDA toolkits) to detect memory errors and leaks
  • Using NVIDIA Nsight to debug and analyze CUDA programs on Windows


Optimization

  • Understanding the factors that affect the performance of CUDA programs
  • Using CUDA coalescing techniques to improve memory throughput
  • Using CUDA caching and prefetching techniques to reduce memory latency
  • Using CUDA shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using CUDA profiling tools to measure and improve execution time and resource utilization
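
As a small, hedged illustration of coalescing and of measuring execution time, the sketch below times two copy kernels with CUDA events: one where neighbouring threads touch neighbouring addresses, and one where they touch addresses 33 elements apart. The names copy_coalesced, copy_strided, and time_copy_ms are hypothetical, and the sketch assumes n is a power of two so that the odd stride visits every element exactly once. No timings are quoted because they depend on the GPU, and the first launch in a process may include one-time startup cost.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort if a runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Neighbouring threads access neighbouring elements (coalesced).
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Neighbouring threads access elements `stride` apart (poorly coalesced).
// With n a power of two and an odd stride, j covers each index exactly once.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = static_cast<int>((static_cast<long long>(i) * stride) % n);
        out[j] = in[j];
    }
}

// Runs one copy, returns kernel time in ms, and reports whether the copy is correct.
float time_copy_ms(bool strided, int n, bool* ok) {
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i % 1024);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(d_out, 0, bytes));

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start));
    check(cudaEventCreate(&stop));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    check(cudaEventRecord(start));
    if (strided) copy_strided<<<blocks, threads>>>(d_in, d_out, n, 33);
    else         copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);
    check(cudaEventRecord(stop));
    check(cudaGetLastError());
    check(cudaEventSynchronize(stop));        // wait until the kernel has finished

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop));

    std::vector<float> back(n);
    check(cudaMemcpy(back.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    *ok = (back == host);

    check(cudaEventDestroy(start));
    check(cudaEventDestroy(stop));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return ms;
}

int main() {
    const int n = 1 << 24;                    // power of two, as assumed above
    bool ok = false;
    time_copy_ms(false, n, &ok);              // warm-up run, result discarded
    float fast = time_copy_ms(false, n, &ok);
    float slow = time_copy_ms(true, n, &ok);
    std::printf("coalesced: %.3f ms, strided: %.3f ms\n", fast, slow);
    return 0;
}
```

Event timing gives a first impression only; a dedicated profiler reports the memory transactions behind the difference.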

Summary and Next Steps


Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors


Audience

  • Developers who wish to learn how to use CUDA to program NVIDIA GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different CUDA devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

Duration

  • 28 hours
