⚡ Fridays with Faraday
  • Posts
  • Categories
  • Series
  • About

🚀 vLLM

vLLM internals, PagedAttention, continuous batching, and inference serving optimization.

2 articles

vllm · expert

Dissecting vLLM's PagedAttention: A Memory-Level Analysis

A deep technical analysis of PagedAttention's memory management, covering KV cache fragmentation, page-table implementation details, and performance implications at the GPU memory hierarchy level.

Nov 15, 2024 · 25 min read
#vllm #pagedattention #kv-cache +2
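The core idea the article dissects can be sketched in a few lines: the KV cache is carved into fixed-size blocks, and each sequence keeps a per-sequence block table mapping logical block indices to physical blocks, so fragmentation is bounded to at most one partially filled block per sequence. This is an illustrative toy, not vLLM's actual implementation; all class and variable names here are hypothetical.

```python
BLOCK_SIZE = 16  # tokens per KV cache block (vLLM's default block size)

class BlockAllocator:
    """Toy pool of physical KV cache blocks."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("out of KV cache blocks")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)

class Sequence:
    """Tracks one request's logical-to-physical block table."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical index -> physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the last one fills,
        # so internal fragmentation is capped at BLOCK_SIZE - 1 slots.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

alloc = BlockAllocator(num_blocks=8)
seq = Sequence(alloc)
for _ in range(33):          # 33 tokens -> ceil(33 / 16) = 3 blocks
    seq.append_token()
print(len(seq.block_table))  # 3
print(len(alloc.free))       # 5
```

Contrast this with contiguous preallocation, where every sequence would reserve max-sequence-length worth of KV cache up front regardless of how many tokens it actually generates.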
vllm · expert

Implementing Continuous Batching: From Scheduling Theory to vLLM Practice

A detailed analysis of continuous batching algorithms, covering iteration-level scheduling, preemption strategies, and the interaction between the scheduler and the memory manager in production inference systems.

Nov 10, 2024 · 23 min read
#vllm #continuous-batching #scheduling +2
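The scheduling idea behind this article fits in a short sketch: with iteration-level scheduling, the batch is re-formed between every decode step, so finished requests free their slots immediately and waiting requests join without stalling behind the longest sequence. A minimal toy model (hypothetical names, not vLLM's scheduler):

```python
from collections import deque

MAX_BATCH = 2  # illustrative batch-size cap

def continuous_batching(requests: dict[str, int]) -> list[list[str]]:
    """requests maps request id -> decode steps needed.
    Returns the batch composition at each iteration."""
    waiting = deque(requests)
    remaining = dict(requests)
    running: list[str] = []
    trace: list[list[str]] = []
    while waiting or running:
        # Iteration-level join: admit waiting requests up to the cap.
        while waiting and len(running) < MAX_BATCH:
            running.append(waiting.popleft())
        trace.append(list(running))
        # One decode step for every running sequence.
        for rid in running:
            remaining[rid] -= 1
        # Retire finished sequences immediately, freeing their slots.
        running = [r for r in running if remaining[r] > 0]
    return trace

trace = continuous_batching({"A": 3, "B": 1, "C": 2})
print(trace)  # [['A', 'B'], ['A', 'C'], ['A', 'C']]
```

Note that "B" finishes after one step and "C" takes its slot on the very next iteration; with static batching, "C" would have waited until "A", the longest request, completed.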
← All Categories
⚡ Fridays with Faraday

Deep technical explorations in systems performance optimization, from bare-metal microcontrollers to large-scale LLM inference systems.

Categories

  • Microcontrollers
  • vLLM
  • LLM Inference
  • Hardware
  • Profiling
  • GPU Programming

Resources

  • About
  • RSS Feed
  • Sitemap
  • GitHub

© 2025 Fridays with Faraday. Built with Astro.

"Measure. Optimize. Repeat."