**Show HN: Hlb-CIFAR10 0.2.0: New world record (~<12.38s) on single-GPU CIFAR10**
Hello everyone,
After recreating the accuracy and rough speed of David Page's implementation in hlb-CIFAR10 0.1.0 (18.1s on an A100 SXM4 in Colab), it came down to some basic NVIDIA kernel profiling to figure out which operations were the long poles in the tent. Perhaps (somewhat?) unsurprisingly, the NCHW <-> NHWC thrash was the worst part, but unfortunately the GhostBatchNorm was a barrier even when using the faster-on-Ampere channels ...
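The sketch below tries to make the NCHW <-> NHWC "thrash" concrete. It is a stdlib-only illustration, not code from hlb-CIFAR10: the function names are hypothetical and the tiny shape is chosen for readability. It shows that the same element sits at different flat offsets in the two layouts, so switching layouts between kernels means a full permuting copy of the tensor.

```python
# Hypothetical illustration (not from hlb-CIFAR10): flat offsets of element
# (n, c, h, w) in contiguous NCHW vs NHWC buffers, plus a layout conversion.

def offset_nchw(n, c, h, w, C, H, W):
    """Flat offset of (n, c, h, w) in a contiguous NCHW buffer (W fastest)."""
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w, C, H, W):
    """Flat offset of (n, c, h, w) in a contiguous NHWC buffer (C fastest)."""
    return ((n * H + h) * W + w) * C + c

def nchw_to_nhwc(buf, N, C, H, W):
    """Copy a flat NCHW buffer into a new flat NHWC buffer.

    Every one of the N*C*H*W elements is read and written once, which is
    why converting back and forth between kernels adds up."""
    out = [None] * (N * C * H * W)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = offset_nchw(n, c, h, w, C, H, W)
                    dst = offset_nhwc(n, c, h, w, C, H, W)
                    out[dst] = buf[src]
    return out

if __name__ == "__main__":
    N, C, H, W = 1, 3, 2, 2  # toy shape for illustration
    nchw = list(range(N * C * H * W))  # values equal their NCHW offsets
    print(nchw_to_nhwc(nchw, N, C, H, W))
    # prints [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

In NHWC the channel values of a pixel are adjacent in memory, which is the layout Ampere's tensor-core convolutions generally prefer. Any op that only runs in NCHW forces a copy like this on either side of it.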