GPU Programming · Expert
Habana Gaudi2 Memory Subsystem: Optimization Strategies for LLM Inference
A deep dive into Gaudi2's HBM architecture, on-chip SRAM hierarchy, and TPC memory access patterns, with practical techniques for maximizing memory bandwidth utilization in transformer workloads.
GPU Programming · Advanced
CUDA Graphs for Inference: Eliminating CPU Launch Overhead
A deep dive into CUDA graph capture and replay, and the specific challenges of applying graphs to dynamic LLM inference workloads, including capture strategies and performance measurements.
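The capture-and-replay pattern the article covers can be sketched with the standard CUDA runtime API: record a fixed sequence of kernel launches into a graph via stream capture, instantiate it once, then replay the whole sequence with a single `cudaGraphLaunch` call instead of paying per-kernel CPU launch overhead. This is a minimal sketch, not the article's code; the `scale` kernel and iteration counts are illustrative placeholders.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture a short sequence of kernel launches into a graph.
    // Work enqueued on the stream between Begin/EndCapture is recorded,
    // not executed.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < 8; ++k)
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d, 1.0001f, n);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then replay many times: one launch call per
    // iteration replaces eight individual kernel launches.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaFree(d);
    cudaStreamDestroy(stream);
    return 0;
}
```

The catch for dynamic LLM inference, as the article's framing suggests, is that a captured graph bakes in launch parameters, so varying batch sizes or sequence lengths require re-capture, graph updates, or padding to a fixed set of shapes.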
GPU Programming · Advanced
Writing Efficient CUDA Kernels: From Naive to Optimized
A step-by-step optimization of a CUDA kernel, progressing through memory coalescing, shared memory, occupancy tuning, and instruction-level parallelism.
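The first two steps in that progression are captured by the classic matrix-transpose example (a sketch under my own choice of kernel, not necessarily the one the article uses): the naive version issues strided global stores, while the tiled version stages a block in shared memory so both the load and the store are coalesced.

```cuda
#include <cuda_runtime.h>

#define TILE 32

// Naive transpose: global loads are coalesced, but each warp's stores
// stride through memory, touching 32 separate cache lines.
__global__ void transpose_naive(const float* in, float* out, int n) {
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

// Tiled transpose: stage a TILE x TILE block in shared memory so both
// the global load and the global store are coalesced. The +1 padding
// avoids shared-memory bank conflicts on the transposed access.
__global__ void transpose_tiled(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;  // swap block roles for the write
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}
```

Occupancy tuning and instruction-level parallelism then build on this baseline, e.g. by adjusting `TILE` and block shape, or having each thread transpose several elements per iteration.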