Course Outline
Introduction
- What is OpenCL?
- OpenCL vs CUDA vs SYCL
- Overview of OpenCL features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new OpenCL project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying the output using printf and fprintf
OpenCL API
- Understanding the role of OpenCL API in the host program
- Using OpenCL API to query device information and capabilities
- Using OpenCL API to create contexts, command queues, buffers, kernels, and events
- Using OpenCL API to enqueue commands, such as read, write, copy, map, unmap, execute, and wait
- Using OpenCL API to handle errors through return codes (and exceptions in the C++ bindings)
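As a rough illustration of the error-handling topic above: the OpenCL C API reports failures as negative `cl_int` status codes, so host programs usually wrap calls in a small check. The sketch below is only an assumption about how such a wrapper might look. `cl_error_name` and `cl_check` are hypothetical helper names, and the fallback `#define`s copy a handful of values from `CL/cl.h` so that the sketch compiles without an OpenCL SDK installed.

```c
#include <assert.h>
#include <stdio.h>

/* Fallback definitions (assumption: no OpenCL SDK is installed).
   If <CL/cl.h> is included first, its own macros are used instead. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                          0
#define CL_DEVICE_NOT_FOUND                -1
#define CL_MEM_OBJECT_ALLOCATION_FAILURE   -4
#define CL_OUT_OF_RESOURCES                -5
#define CL_OUT_OF_HOST_MEMORY              -6
#define CL_BUILD_PROGRAM_FAILURE          -11
#define CL_INVALID_VALUE                  -30
#define CL_INVALID_WORK_GROUP_SIZE        -54
#endif

/* Hypothetical helper: map a status code to a readable name. */
static const char *cl_error_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_DEVICE_NOT_FOUND:              return "CL_DEVICE_NOT_FOUND";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    case CL_BUILD_PROGRAM_FAILURE:         return "CL_BUILD_PROGRAM_FAILURE";
    case CL_INVALID_VALUE:                 return "CL_INVALID_VALUE";
    case CL_INVALID_WORK_GROUP_SIZE:       return "CL_INVALID_WORK_GROUP_SIZE";
    default:                               return "unknown OpenCL error";
    }
}

/* Hypothetical helper: returns 1 on success; on failure prints a
   diagnostic to stderr and returns 0. */
static int cl_check(int status, const char *what)
{
    assert(what != NULL);
    if (status == CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_error_name(status), status);
    return 0;
}
```

In a real host program the status passed to `cl_check` would come from an API call such as `clGetPlatformIDs` or `clEnqueueNDRangeKernel`.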
OpenCL C
- Understanding the role of OpenCL C in the device program
- Using OpenCL C to write kernels that execute on the device and manipulate data
- Using OpenCL C data types, qualifiers, operators, and expressions
- Using OpenCL C built-in functions, such as math, geometric, relational, etc.
- Using OpenCL C extensions and libraries, such as atomic, image, cl_khr_fp16, etc.
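To make the kernel-writing topics above concrete, here is a minimal sketch of a vector-addition kernel. So that it can be tried without a GPU or OpenCL runtime, the file stubs out the OpenCL C qualifiers and `get_global_id` when compiled as ordinary C. Those shims, together with `emu_global_id` and `emu_run_vec_add`, are assumptions made for illustration and are not part of OpenCL.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shims (illustrative assumptions): they let the kernel below
   compile as plain C. A real OpenCL C compiler provides all of these. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t emu_global_id;  /* hypothetical: set by the emulation loop */
static size_t get_global_id(unsigned int dim) { (void)dim; return emu_global_id; }
#endif

/* Each work-item adds one pair of elements. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n)  /* guard: the global size may be rounded up past n */
        c[i] = a[i] + b[i];
}

#ifndef __OPENCL_VERSION__
/* Hypothetical sequential stand-in for launching a 1-D ND-range:
   runs the kernel once per work-item id. */
static void emu_run_vec_add(size_t global_size, const float *a,
                            const float *b, float *c, unsigned int n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (emu_global_id = 0; emu_global_id < global_size; ++emu_global_id)
        vec_add(a, b, c, n);
}
#endif
```

On a device, the host would instead build the kernel source and launch it with `clEnqueueNDRangeKernel`, and the work-items would run in parallel rather than in a loop.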
OpenCL Memory Model
- Understanding the difference between host and device memory models
- Using OpenCL memory spaces, such as global, local, constant, and private
- Using OpenCL memory objects, such as buffers, images, and pipes
- Using OpenCL memory access modes, such as read-only, write-only, read-write, etc.
- Using OpenCL memory consistency model and synchronization mechanisms
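The interplay of local memory and synchronization listed above can be modelled on the host. The sketch below is a sequential approximation of one work-group summing its values. `scratch` stands in for a `__local` array, and the comments mark where a kernel would call `barrier(CLK_LOCAL_MEM_FENCE)`. `LOCAL_SIZE` and `emu_group_sum` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 8  /* hypothetical work-group size; must be a power of two */

/* Sequential model of one work-group summing LOCAL_SIZE values. */
static float emu_group_sum(const float *global_in)
{
    float scratch[LOCAL_SIZE];  /* plays the role of __local memory */

    assert((LOCAL_SIZE & (LOCAL_SIZE - 1)) == 0);

    /* Step 0: every work-item copies one element from global to local memory. */
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        scratch[lid] = global_in[lid];
    /* On a device: barrier(CLK_LOCAL_MEM_FENCE); */

    /* Tree reduction: the lower half accumulates the upper half,
       then the active range is halved. */
    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
        /* On a device: barrier(CLK_LOCAL_MEM_FENCE) after each step. */
    }

    return scratch[0];  /* work-item 0 would write this to global memory */
}
```

Without the barriers, a real work-item could read a `scratch` slot before its neighbour has finished writing it. The sequential loops hide that hazard, which is why the comments mark where each barrier belongs.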
OpenCL Execution Model
- Understanding the difference between host and device execution models
- Using OpenCL work-items, work-groups, and ND-ranges to define the parallelism
- Using OpenCL work-item functions, such as get_global_id, get_local_id, get_group_id, etc.
- Using OpenCL work-group functions, such as barrier, work_group_reduce_add, work_group_scan_inclusive_add, etc.
- Using OpenCL size-query functions, such as get_num_groups, get_global_size, get_local_size, etc.
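The relationship between the index functions named above can be written as plain arithmetic. The sketch below assumes a zero global offset and uniform work-group sizes. The helper names (`round_up_global_size`, `global_id_from`, `num_groups`) are hypothetical and exist only to illustrate the formulas.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local_size, a common way to pick a
   global size (the kernel then guards against out-of-range ids). */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Under the stated assumptions:
   get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d) */
static size_t global_id_from(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Number of work-groups in one dimension, i.e. what get_num_groups(d) reports. */
static size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For example, 1000 elements with a work-group size of 64 give a global size of 1024 and 16 work-groups. Work-item 5 of group 3 has global id 197.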
Debugging
- Understanding the common errors and bugs in OpenCL programs
- Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
- Using CodeXL to debug and analyze OpenCL programs on AMD devices
- Using Intel VTune to profile and analyze OpenCL programs on Intel devices
- Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices
Optimization
- Understanding the factors that affect the performance of OpenCL programs
- Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput
- Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality
- Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth
- Using OpenCL profiling and profiling tools to measure and improve the execution time and resource utilization
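Loop tiling, mentioned in the list above, can be illustrated with a host-side matrix multiplication. The sketch below compares a plain triple loop with a tiled variant that works on `TILE` x `TILE` blocks. `N`, `TILE`, and both function names are hypothetical values and names chosen for the example. In an OpenCL kernel, each tile would typically be staged in local memory by one work-group.

```c
#include <assert.h>
#include <stddef.h>

#define N    8  /* hypothetical matrix dimension */
#define TILE 4  /* hypothetical tile edge; must divide N */

/* Straightforward triple loop: c = a * b for row-major N x N matrices. */
static void matmul_naive(const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < N; ++k)
                acc += a[i * N + k] * b[k * N + j];
            c[i * N + j] = acc;
        }
}

/* Tiled variant: iterates over TILE x TILE blocks so that each block of
   a and b is reused while it is still in fast memory. */
static void matmul_tiled(const float *a, const float *b, float *c)
{
    assert(N % TILE == 0);
    for (size_t i = 0; i < N * N; ++i)
        c[i] = 0.0f;

    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t kk = 0; kk < N; kk += TILE)
                /* Multiply one tile of a by one tile of b. */
                for (size_t i = ii; i < ii + TILE; ++i)
                    for (size_t j = jj; j < jj + TILE; ++j) {
                        float acc = c[i * N + j];
                        for (size_t k = kk; k < kk + TILE; ++k)
                            acc += a[i * N + k] * b[k * N + j];
                        c[i * N + j] = acc;
                    }
}
```

Both functions accumulate each output element in the same order of `k`, so they should produce the same values. Any performance difference comes only from the memory access pattern, and it would need to be measured on the target device.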
Summary and Next Steps
Requirements
- An understanding of C/C++ and parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism
- Developers who wish to write portable and scalable code that can run on different platforms and devices
- Programmers who wish to explore the low-level aspects of heterogeneous programming and optimize their code performance
28 Hours
Testimonials (2)
Very interactive with various examples, with a good progression in complexity between the start and the end of the training.
Jenny - Andheo
Course - GPU Programming with CUDA and Python
The trainer's energy and humor.