Some may also find Junji Hashimoto's GPU programming library in lean (w/ webgpu) interesting:
https://github.com/Verilean/hesper
Even includes an example of transformer inference (quantized 1.5 bit):
https://github.com/Verilean/hesper/blob/a688ce9848d6416b2e95...