3 posts
Microcontrollers
Bare-metal programming, register-level optimization, and embedded systems deep dives. ESP32, ARM Cortex, and low-power design.
Browse articles →
2 posts
vLLM
vLLM internals, PagedAttention, continuous batching, and inference serving optimization.
Browse articles →
3 posts
LLM Inference
Large language model inference optimization, KV cache management, and transformer performance.
Browse articles →
3 posts
Transformers
Attention mechanisms, architecture analysis, and transformer implementation details.
Browse articles →
2 posts
Profiling
Performance analysis with perf, DTrace, eBPF, flame graphs, and systematic debugging.
Browse articles →
2 posts
Distributed Systems
Distributed inference, tensor parallelism, model sharding, and cluster optimization.
Browse articles →
3 posts
GPU Programming
CUDA, Habana Gaudi, GPU architecture, and accelerator programming.
Browse articles →