DAWNBench Makes Way for MLPerf

Article By : Sally Ward-Foxton

Stanford AI accelerator benchmark steps aside to consolidate benchmarking efforts.

DAWNBench, the AI accelerator benchmark, is being retired to make room for MLPerf, according to its creators. DAWNBench will stop accepting rolling submissions on March 27, 2020, in order to help consolidate benchmarking efforts across the industry.

Created as part of the five-year DAWN project at Stanford, DAWNBench launched in 2017 and was the first benchmark to compare end-to-end training and inference across multiple deep learning frameworks and tasks. It allowed optimizations across model architectures, optimization procedures, software frameworks and hardware platforms.

DAWNBench offered benchmark specifications for image classification and question answering, with systems benchmarked on accuracy, computation time and cost (previous AI accelerator benchmarks had focused purely on accuracy). Submissions to the benchmark came from companies such as Alibaba, Huawei, Myrtle and Apple.

Nvidia’s Tesla V100 Tensor Core GPU was a popular hardware choice for systems benchmarked by DAWNBench (Image: Nvidia)

MLPerf was directly inspired by DAWNBench, but considers more tasks, models and scenarios. MLPerf’s creators say that unlike previous machine learning benchmarks such as DAWNBench, their benchmark includes a range of scenarios designed to represent real-world use cases, without focusing too narrowly on specific machine learning applications, such as computer vision, or specific domains, such as embedded inference.

“Building on our experience with DAWNBench, we helped create MLPerf as an industry-standard for measuring machine learning system performance. Now that both the MLPerf Training and Inference benchmark suites have successfully launched, we have decided to end rolling submissions to DAWNBench on 3/27/2020 to consolidate benchmarking efforts,” said DAWNBench’s creators Cody Coleman, Daniel Kang, Deepak Narayanan, Peter Bailis, and Matei Zaharia, in a blog post.

In the same blog post, DAWNBench’s creators note that results on ImageNet have seen particularly significant improvements over the last two years. ImageNet training time dropped from 30 minutes to under 3 minutes, and ImageNet inference latency dropped by 20x.

The other parts of Stanford’s DAWN project, which aims to create infrastructure and tools that make machine learning easier to use, are unaffected.
