vllm expert
Dissecting vLLM's PagedAttention: A Memory-Level Analysis
An in-depth analysis of PagedAttention's memory management, covering KV cache fragmentation, page table implementation details, and performance implications across the GPU memory hierarchy.
vllm expert
Implementing Continuous Batching: From Scheduling Theory to vLLM Practice
A detailed analysis of continuous batching algorithms, covering iteration-level scheduling, preemption strategies, and the interaction between the scheduler and the memory manager in production inference systems.