Readings

A collection of interesting papers and articles

Beej's Guide to C Programming
Lightweight HTML e-book format of Beej's Guide to C Programming.
How to Think About GPUs - A special section of 'How to Scale Your Model'
Part 12 of How To Scale Your Model.
NVIDIA Tensor Core Evolution: From Volta To Blackwell
Technical overview of how NVIDIA's Tensor Core architecture evolved from Volta to Blackwell, highlighting the key architectural changes, performance improvements, and programming model advancements across generations.
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
Iterative optimization of a CUDA matrix multiplication (SGEMM) kernel to achieve near-peak performance on a GPU, exploring techniques such as global memory coalescing, shared memory caching, and increasing arithmetic intensity.
LLM Training on GPU Clusters: Ultra-Scale Playbook
A playbook for training LLMs on GPU clusters at ultra scale, covering the theory, the code, and efficiency benchmarking.
How to Scale Your Model
A systems view of LLMs on TPUs (and GPUs) - a playbook.