Show HN: hlb-CIFAR10 0.2.0: New world record (~<12.38s) on single-GPU CIFAR10
Hello everyone,

After recreating the accuracy and rough speed of David Page’s implementation in hlb-CIFAR10 0.1.0 (18.1s on an A100, SXM4, Colab), it came down to some basic NVIDIA kernel profiling to figure out which operations were the long poles in the tent. Perhaps (somewhat?) unsurprisingly, the NCHW <-> NHWC thrash was the worst part, but unfortunately the GhostBatchNorm was a barrier even using the faster-on-Ampere channels…
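As a rough illustration of why an NCHW <-> NHWC conversion is expensive, here is a minimal pure-Python sketch. It is not code from hlb-CIFAR10, and the helper names (`offset_nchw`, `offset_nhwc`, `nchw_to_nhwc`) are hypothetical. Both layouts describe the same logical (N, C, H, W) tensor; they differ only in which index varies fastest in memory, so switching between them means physically rewriting every element.

```python
# Hypothetical helpers illustrating two memory layouts of the same
# logical 4-D tensor with shape (N, C, H, W).

def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NCHW (channels-first) buffer."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NHWC (channels-last) buffer."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(buf, N, C, H, W):
    """Copy a flat NCHW buffer into a new flat NHWC buffer.

    Every element is read once and written once, so each conversion is a
    full pass over the tensor; repeated round trips between layouts pay
    this cost again each time.
    """
    out = [None] * (N * C * H * W)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = offset_nchw(n, c, h, w, C, H, W)
                    dst = offset_nhwc(n, c, h, w, C, H, W)
                    out[dst] = buf[src]
    return out


# Two channels, one row, two columns: NCHW stores channel 0 then channel 1,
# while NHWC interleaves the channels at each spatial position.
print(nchw_to_nhwc([0, 1, 2, 3], N=1, C=2, H=1, W=2))  # [0, 2, 1, 3]
```

In PyTorch, the channels-last layout for a 4-D tensor is requested with `tensor.to(memory_format=torch.channels_last)`; keeping the whole network in one layout avoids paying for conversions like the one above between layers.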