Categories | Fridays with Faraday

Microcontrollers

Bare-metal programming, register-level optimization, and embedded systems deep dives. ESP32, ARM Cortex, and low-power design.

vLLM internals, PagedAttention, continuous batching, and inference serving optimization.

Large language model inference optimization, KV cache management, and transformer performance.

Attention mechanisms, architecture analysis, and transformer implementation details.

Performance analysis with perf, DTrace, eBPF, flame graphs, and systematic debugging.

Distributed inference, tensor parallelism, model sharding, and cluster optimization.

CUDA, Habana Gaudi, GPU architecture, and accelerator programming.