⚡ Fridays with Faraday
  • Posts
  • Categories
  • Series
  • About

🎮

GPU Programming

CUDA, Habana Gaudi, GPU architecture, and accelerator programming.

3 articles

GPU Programming · Expert

Habana Gaudi2 Memory Subsystem: Optimization Strategies for LLM Inference

Deep dive into Gaudi2's HBM architecture, SRAM hierarchy, and TPC memory access patterns. Practical optimization techniques for maximizing memory bandwidth utilization in transformer workloads.

Nov 11, 2024 · 26 min read
#gaudi #habana #hpu +3
GPU Programming · Advanced

CUDA Graphs for Inference: Eliminating CPU Launch Overhead

Deep dive into CUDA graph capture, replay, and the specific challenges of applying graphs to dynamic LLM inference workloads. Includes capture strategies and performance measurements.

Nov 8, 2024 · 18 min read
#cuda #cuda-graphs #inference +2
GPU Programming · Advanced

Writing Efficient CUDA Kernels: From Naive to Optimized

Step-by-step optimization of a CUDA kernel using memory coalescing, shared memory, occupancy tuning, and instruction-level parallelism.

Nov 2, 2024 · 20 min read
#cuda #kernel #optimization +2

Deep technical explorations in systems performance optimization, from bare-metal microcontrollers to large-scale LLM inference systems.

© 2025 Fridays with Faraday. Built with Astro.

"Measure. Optimize. Repeat."