⚡ Fridays with Faraday
  • Posts
  • Categories
  • Series
  • About

Transformers

Attention mechanisms, architecture analysis, and transformer implementation details.

3 articles

transformers · advanced

Attention Variants Compared: MHA, MQA, GQA, and MLA

Technical comparison of Multi-Head Attention, Multi-Query Attention, Grouped-Query Attention, and Multi-head Latent Attention. Analysis of memory-compute trade-offs and implementation considerations.

Nov 7, 2024 · 22 min read
#attention #mha #mqa +4
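The memory side of this comparison comes down to how many K/V heads each variant caches. A back-of-the-envelope sketch, using an illustrative Llama-2-7B-like shape (these numbers are assumptions, not benchmarks from the article):

```python
# Illustrative model shape (assumed, Llama-2-7B-like), fp16 cache.
n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 4096
bytes_per_elem = 2  # fp16

def kv_cache_bytes(n_kv_heads: int) -> int:
    # Two tensors (K and V) per layer, each [seq_len, n_kv_heads, head_dim].
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

mha = kv_cache_bytes(32)  # MHA: one K/V head per query head
gqa = kv_cache_bytes(8)   # GQA: 8 K/V heads, each shared by a group of 4 query heads
mqa = kv_cache_bytes(1)   # MQA: a single K/V head shared by all query heads

print(f"MHA: {mha / 2**30:.2f} GiB per sequence")
print(f"GQA: {gqa / 2**30:.2f} GiB per sequence")
print(f"MQA: {mqa / 2**30:.4f} GiB per sequence")
```

MLA goes further by caching a low-rank latent instead of full K/V heads, so it does not fit the same `n_kv_heads` formula.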
transformers · advanced

Grouped Query Attention: Memory-Throughput Trade-offs

Analysis of GQA's KV cache reduction mechanism, optimal group sizes, and performance implications for inference at scale. Includes benchmarks across different model sizes.

Nov 6, 2024 · 16 min read
#attention #gqa #mha +3
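The grouping mechanism itself is just a static index mapping from query heads to shared K/V heads. A minimal sketch, assuming 32 query heads and 8 K/V heads (a group size of 4; these counts are illustrative):

```python
n_q_heads, n_kv_heads = 32, 8          # illustrative head counts
group_size = n_q_heads // n_kv_heads   # 4 query heads share each K/V head

def kv_head_for(q_head: int) -> int:
    """Return the index of the K/V head that query head `q_head` attends with."""
    return q_head // group_size

# Query heads 0-3 read K/V head 0, heads 4-7 read K/V head 1, and so on.
mapping = [kv_head_for(q) for q in range(n_q_heads)]
```

At inference time this is typically realized by repeating each cached K/V head `group_size` times (or by a grouped attention kernel), which is where the cache reduction comes from.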
transformers · advanced

RoPE Embeddings: Implementation and Long Context Scaling

Understanding Rotary Position Embeddings, their efficient implementation, and techniques for extending context length including YaRN and Dynamic NTK scaling.

Oct 31, 2024 · 17 min read
#rope #positional-encoding #transformers +2
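The core of RoPE fits in a few lines: each consecutive pair of vector dimensions is rotated by a position-dependent angle, so query-key dot products end up depending only on relative position. A minimal reference sketch (single vector, plain Python; `base=10000.0` is the conventional default):

```python
import math

def rope_rotate(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Apply rotary position embedding to vector x at position `pos`.

    Each pair (x[2i], x[2i+1]) is rotated by angle pos * base^(-2i/d),
    where d is the vector dimension.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s,
                x[i] * s + x[i + 1] * c]
    return out
```

Because rotations are norm-preserving and compose additively in the angle, `rot(q, m) · rot(k, n)` depends only on `m - n`; context-extension tricks like YaRN and Dynamic NTK work by rescaling the per-pair angles `theta`.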
← All Categories
⚡ Fridays with Faraday

Deep technical explorations in systems performance optimization, from bare-metal microcontrollers to large-scale LLM inference systems.

Categories

  • Microcontrollers
  • vLLM
  • LLM Inference
  • Hardware
  • Profiling
  • GPU Programming

Resources

  • About
  • RSS Feed
  • Sitemap
  • GitHub

© 2025 Fridays with Faraday. Built with Astro.

"Measure. Optimize. Repeat."