Welcome to the new Fridays with Faraday. We have upgraded the platform to support the rigorous needs of systems performance engineering.
1. Interactive Performance Analysis
Understanding whether a kernel is memory-bound or compute-bound is easier with interactive tools. Use the slider below to see how arithmetic intensity (FLOPs performed per byte of memory traffic) shifts the “performance dot” along the roofline curve.
[Interactive figure: Native Roofline Model]
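For readers without the interactive widget, the relationship the slider explores can be sketched in a few lines of Python. This is a minimal illustration of the standard roofline bound, not the code behind the widget; the function names and the machine numbers (10,000 GFLOP/s peak, 1,000 GB/s bandwidth) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity      -- arithmetic intensity in FLOP per byte
    peak_gflops    -- peak compute throughput in GFLOP/s
    bandwidth_gbs  -- peak memory bandwidth in GB/s
    """
    # GB/s * FLOP/byte = GFLOP/s, so both roofs share the same unit.
    return min(peak_gflops, bandwidth_gbs * intensity)


def classify(intensity, peak_gflops, bandwidth_gbs):
    """Label a kernel by which roof limits it at this intensity."""
    ridge = peak_gflops / bandwidth_gbs  # FLOP/byte where the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical machine parameters (assumptions, not measurements).
PEAK_GFLOPS = 10_000.0
BANDWIDTH_GBS = 1_000.0

if __name__ == "__main__":
    # Sweep the "slider": watch the dot climb the slope, then hit the flat roof.
    for ai in (0.25, 1.0, 4.0, 10.0, 64.0):
        perf = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        label = classify(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(f"AI = {ai:6.2f} FLOP/byte -> {perf:8.1f} GFLOP/s ({label})")
```

Below the ridge, attainable performance grows linearly with arithmetic intensity; above it, performance is capped at the compute peak no matter how much further the intensity rises.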
2. Rigorous Mathematical Notation
For complex performance proofs, we now use specialized Theorem and Proof containers to distinguish theory from implementation.
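For a processor with peak performance $P_{\text{peak}}$ (TFLOPS) and peak memory bandwidth $B_{\text{peak}}$ (GB/s), the ridge point, the arithmetic intensity at which an algorithm transitions from memory-bound to compute-bound, is defined as:

$$I_{\text{ridge}} = \frac{P_{\text{peak}}}{B_{\text{peak}}}$$

With both quantities converted to base units (FLOP/s and bytes/s), $I_{\text{ridge}}$ is expressed in FLOP per byte.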
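The ridge point is the ratio of peak compute throughput to peak memory bandwidth, and the only subtlety in computing it is the unit conversion. The sketch below assumes decimal prefixes (1 TFLOPS = 10^12 FLOP/s, 1 GB/s = 10^9 bytes/s); the function name and the example hardware numbers are hypothetical.

```python
def ridge_point(peak_tflops, peak_bandwidth_gbs):
    """Return the ridge point in FLOP/byte.

    Assumes decimal prefixes: 1 TFLOPS = 1e12 FLOP/s, 1 GB/s = 1e9 bytes/s.
    """
    peak_flops_per_s = peak_tflops * 1e12
    peak_bytes_per_s = peak_bandwidth_gbs * 1e9
    return peak_flops_per_s / peak_bytes_per_s


# Hypothetical processor: 10 TFLOPS peak, 1000 GB/s peak bandwidth.
ridge = ridge_point(10.0, 1000.0)  # 10.0 FLOP/byte

# Single-precision AXPY (y = a*x + y) does 2 FLOPs per element while moving
# 12 bytes (read x, read y, write y), so its intensity is 1/6 FLOP/byte.
axpy_intensity = 2 / 12

if __name__ == "__main__":
    print(f"ridge point: {ridge} FLOP/byte")
    bound = "memory-bound" if axpy_intensity < ridge else "compute-bound"
    print(f"AXPY at {axpy_intensity:.3f} FLOP/byte is {bound}")
```

On this hypothetical machine, any kernel performing fewer than 10 FLOPs per byte moved sits on the bandwidth-limited slope of the roofline, which is why streaming kernels such as AXPY rarely approach peak compute.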
3. Reproducibility & Colab Integration
Notice the “Run in Google Colab” button at the top of this post. It dynamically links to the Jupyter Notebook associated with this technical analysis, allowing you to run the benchmarks yourself.
4. Reading Progress
As you scroll down this post, watch the accent-blue bar at the very top of your browser. It provides visual feedback for long-form technical deep-dives (especially useful for our 25+ minute “expert” series).
5. Multi-Author Support
At the bottom of this page, you will see a dynamic Author Bio. By moving authors to a separate data collection, we now support guest contributions from the wider performance engineering community.
What’s next? Try the new Search feature in the header to find other posts involving CUDA or Registers.