Course Outline
Introduction
- What is ROCm?
- What is HIP?
- ROCm vs CUDA vs OpenCL
- Overview of ROCm and HIP features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new ROCm project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying the output using printf and fprintf
ROCm API
- Understanding the role of ROCm API in the host program
- Using ROCm API to query device information and capabilities
- Using ROCm API to allocate and deallocate device memory
- Using ROCm API to copy data between host and device
- Using ROCm API to launch kernels and synchronize the host with the device
- Using ROCm API to check error codes and handle failures
HIP Language
- Understanding the role of HIP language in the device program
- Using HIP language to write kernels that execute on the GPU and manipulate data
- Using HIP data types, qualifiers, operators, and expressions
- Using HIP built-in functions, variables, and libraries to perform common tasks and operations
ROCm and HIP Memory Model
- Understanding the difference between host and device memory models
- Using ROCm and HIP memory spaces, such as global, shared, constant, and local
- Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
- Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, etc.
- Using ROCm and HIP memory consistency model and synchronization mechanisms
ROCm and HIP Execution Model
- Understanding the difference between host and device execution models
- Using ROCm and HIP threads, blocks, and grids to define the parallelism
- Using ROCm and HIP built-in thread variables, such as threadIdx.x, blockIdx.x, blockDim.x, etc. (hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x in legacy code)
- Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, etc.
- Using ROCm and HIP grid-level features, such as gridDim.x (hipGridDim_x in legacy code), grid-wide synchronization with cooperative groups, etc.
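To make the thread, block, and grid arithmetic above concrete, the following host-only C++ sketch emulates a one-dimensional launch on the CPU. It does not use HIP or a GPU. The function names `emulate_1d_launch` and `blocks_for` are hypothetical, and the loop variables merely stand in for the built-in block and thread indices a real kernel would read.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int block_dim) {
    assert(block_dim > 0 && n >= 0);
    return (n + block_dim - 1) / block_dim;
}

// Host-only emulation of a 1D launch: visit every (block, thread) pair and
// count how many of them would claim each element index.
std::vector<int> emulate_1d_launch(int grid_dim, int block_dim, int n) {
    assert(grid_dim > 0 && block_dim > 0 && n >= 0);
    std::vector<int> owner_count(n, 0);
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same formula a kernel would use for its global index.
            int i = block_idx * block_dim + thread_idx;
            // Bounds check: the last block may have surplus threads.
            if (i < n) ++owner_count[i];
        }
    }
    return owner_count;
}
```

With n = 1000 and 256 threads per block, `blocks_for` gives 4 blocks (1024 threads); the bounds check idles the 24 surplus threads, so every element is claimed exactly once.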
Debugging
- Understanding the common errors and bugs in ROCm and HIP programs
- Using Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
- Using ROCm Debugger (ROCgdb) to debug ROCm and HIP programs on AMD devices
- Using ROCm Profiler (rocprof) to analyze ROCm and HIP programs on AMD devices
Optimization
- Understanding the factors that affect the performance of ROCm and HIP programs
- Coalescing global memory accesses to improve memory throughput
- Using caching and prefetching techniques to reduce memory latency
- Using shared memory and local memory to reduce global memory traffic and make better use of bandwidth
- Using ROCm and HIP profiling tools to measure and improve execution time and resource utilization
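The effect of coalescing can be estimated with simple arithmetic. The host-only C++ sketch below counts how many distinct memory segments one wavefront touches for a given access stride. The function name `segments_touched` is hypothetical. The values used in the note after the code (64 lanes per wavefront, 4-byte elements, 64-byte segments) are illustrative assumptions; actual wavefront widths and cache-line sizes vary between AMD architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model: lane k reads the element at index k * stride_elems.
// Returns how many distinct segment_bytes-sized segments the wavefront
// touches. Fewer segments means fewer memory transactions in this model.
std::size_t segments_touched(std::size_t lanes, std::size_t stride_elems,
                             std::size_t elem_bytes, std::size_t segment_bytes) {
    assert(segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        std::size_t byte_addr = lane * stride_elems * elem_bytes;
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, `segments_touched(64, 1, 4, 64)` returns 4 for unit-stride access, while `segments_touched(64, 16, 4, 64)` returns 64, because each lane lands in its own segment. Real hardware behaviour is more involved, so treat this as a first-order estimate and confirm with a profiler.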
Summary and Next Steps
Requirements
- An understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and exploit their parallelism
- Developers who wish to write high-performance and scalable code that can run on different AMD devices
- Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
28 Hours