This module focuses on mastering the GPU memory hierarchy and optimizing memory performance: shared memory tiling, memory coalescing, texture/read-only memory usage, unified memory, and bandwidth optimization.
After completing this module, you will be able to:
- Organize GPU threads in multidimensional grids
- Map threads efficiently to data structures
- Implement image processing algorithms on GPU
- Create optimized matrix multiplication kernels
- Handle boundary conditions in multidimensional algorithms
- content.md - Complete module content
- examples/ - Practical code examples
- NVIDIA GPU with CUDA support OR AMD GPU with ROCm support
- CUDA Toolkit 13.0+ or ROCm 7.0+ (Docker images provide CUDA 13.0.1 and ROCm 7.0)
- C/C++ compiler (GCC, Clang, or MSVC)
Recommended: use our Docker dev environment.

```bash
./docker/scripts/run.sh --auto
cd modules/module2/examples
make   # auto-detects your GPU and builds accordingly

# Run a few examples (binaries in build/)
./build/01_shared_memory_transpose_cuda   # or _hip on AMD
./build/02_memory_coalescing_cuda         # or _hip on AMD
./build/04_unified_memory_cuda
```

- 2D and 3D thread block configurations
- Grid size calculations for arbitrary data sizes
- Thread-to-data mapping strategies
- Coalesced vs strided access
- Structure of Arrays vs Array of Structures
- Read-only/texture cache benefits
- Tiled transpose with bank-conflict avoidance
- Block-level cooperation and synchronization
- Padding strategies to avoid bank conflicts
- Unified memory prefetch and advice
- Measuring and optimizing memory bandwidth
- Analyzing profiler metrics for memory performance
Duration: 6-8 hours
Difficulty: Beginner-Intermediate
Prerequisites: Module 1 completion