Course Outline

Introduction

  • What is GPU programming?
  • Why use CUDA with Python?
  • Key concepts: Threads, Blocks, Grids

Overview of CUDA Features and Architecture

  • GPU vs CPU architecture
  • Understanding SIMT (Single Instruction, Multiple Threads)
  • CUDA programming model

Setting up the Development Environment

  • Installing CUDA Toolkit and drivers
  • Installing Python and Numba
  • Setting up and verifying the environment

Parallel Programming Fundamentals

  • Introduction to parallel execution
  • Understanding threads and thread hierarchies
  • Working with warps and synchronization

Working with the Numba Compiler

  • Introduction to Numba
  • Writing CUDA kernels with Numba
  • Understanding the @cuda.jit decorator

Building a Custom CUDA Kernel

  • Writing and launching a basic kernel
  • Using threads for element-wise operations
  • Managing grid and block dimensions

Memory Management

  • Types of GPU memory (global, shared, local, constant)
  • Memory transfer between host and device
  • Optimizing memory usage and avoiding bottlenecks

Advanced Topics in GPU Acceleration

  • Shared memory and synchronization
  • Using streams for asynchronous execution
  • Multi-GPU programming basics

Converting CPU-based Applications to GPU

  • Profiling CPU code
  • Identifying parallelizable sections
  • Porting logic to CUDA kernels

Troubleshooting

  • Debugging CUDA applications
  • Common errors and how to resolve them
  • Tools and techniques for testing and validation

Summary and Next Steps

  • Review of key concepts
  • Best practices in GPU programming
  • Resources for continued learning

Requirements

  • Python programming experience
  • Experience with NumPy (ndarrays, ufuncs, etc.)

Audience

  • Developers

Duration

  • 14 Hours
