**Show HN: Hlb-CIFAR10 0.2.0: New world record (~<12.38s) on single-GPU CIFAR10** Hello everyone. After recreating the accuracy and rough speed of David Page's implementation in hlb-CIFAR10 0.1.0 (18.1s on an A100, SXM4, Colab), it came down to some basic NVIDIA kernel profiling to figure out which operations were the long poles in the tent. Perhaps (somewhat?) unsurprisingly, the NCHW <-> NHWC thrash was the worst part, but unfortunately the GhostBatchNorm was a barrier even using the faster-on-Ampere channels\ ...
