
NVIDIA – “[Intel] Should Get Their Facts Straight” on Machine Learning Benchmarks

NVIDIA responds to the machine learning benchmark results presented by Intel at ISC’16: “It’s great that Intel is now working on deep learning. This is the most important computing revolution with the era of AI upon us and deep learning is too big to ignore. But they should get their facts straight.” (Source: NVIDIA)

NVIDIA notes further that, “While we can correct each of their wrong claims, we think deep learning testing against old Kepler GPUs and outdated software versions are mistakes that are easily fixed in order to keep the industry up to date.” (Source: NVIDIA)

We expect to see many more benchmark comparisons now that NVIDIA Pascal and the newest Intel Xeon Phi are available to third parties. Sign up for access to the newest Intel Xeon Phi on the Intel Machine Learning Portal; now that Pascal is in the wild, third-party Pascal benchmarks should follow soon.

The Intel benchmarks were announced at ISC’16.

Intel Furthers Machine Learning Capabilities

NVIDIA Information

Fresh vs Stale Caffe

Intel used Caffe AlexNet data that is 18 months old, comparing a system with four Maxwell GPUs to four Xeon Phi servers. With the more recent implementation of Caffe AlexNet, publicly available here, Intel would have discovered that the same system with four Maxwell GPUs delivers 30% faster training time than four Xeon Phi servers.

In fact, a system with four Pascal-based NVIDIA TITAN X GPUs trains 90% faster and a single NVIDIA DGX-1 is over 5x faster than four Xeon Phi servers.
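Benchmark numbers like these depend heavily on which Caffe build and network definition are used. As a rough illustration of how per-iteration AlexNet training time is typically measured, here is a minimal pycaffe timing sketch; the model path and iteration count are placeholders, and it assumes a CUDA-enabled Caffe build with the data sources referenced by the prototxt available locally.

```python
# Minimal sketch: timing AlexNet forward/backward passes with pycaffe.
# Placeholder paths; assumes a CUDA-enabled Caffe build and that the
# LMDB data sources named in the prototxt exist.
import time
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)  # benchmark on the first GPU

# Load the network in TRAIN phase so backward passes are exercised too.
net = caffe.Net('models/bvlc_alexnet/train_val.prototxt', caffe.TRAIN)

# Warm-up iteration so one-time allocations don't skew the measurement.
net.forward()
net.backward()

iterations = 50
start = time.time()
for _ in range(iterations):
    net.forward()
    net.backward()
elapsed = time.time() - start

print('average time per training iteration: %.1f ms'
      % (1000.0 * elapsed / iterations))
```

This roughly mirrors what Caffe’s built-in `caffe time` command reports, which is why comparing numbers produced by Caffe versions 18 months apart is misleading.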

38% Better Scaling

Intel compared Caffe GoogLeNet training performance on 32 Xeon Phi servers with 32 nodes of Oak Ridge National Laboratory’s Titan supercomputer. Titan uses four-year-old GPUs (Tesla K20X) and an interconnect inherited from the earlier Jaguar supercomputer, while the Xeon Phi results were based on recent interconnect technology.

Using more recent Maxwell GPUs and interconnect, Baidu has shown that its speech training workload scales almost linearly up to 128 GPUs.

[Figure: NVIDIA ML scaling]

Source: Persistent RNNs: Stashing Recurrent Weights On-Chip, G. Diamos

Scalability relies on the interconnect and architectural optimizations in the code as much as the underlying processor. GPUs are delivering great scaling for customers like Baidu.
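Claims like “near-linear” and “50x on 128 servers” are easier to compare once normalized into speedup and parallel efficiency. A minimal sketch of that normalization, using illustrative placeholder timings rather than any published numbers:

```python
# Sketch: converting measured per-iteration times into speedup and
# parallel efficiency. The timings below are illustrative placeholders,
# not published benchmark results.
def scaling_report(node_counts, times):
    """times[i] is the measured time per iteration on node_counts[i] nodes."""
    base_nodes, base_time = node_counts[0], times[0]
    for n, t in zip(node_counts, times):
        speedup = base_time / t
        ideal = float(n) / base_nodes
        efficiency = speedup / ideal
        print('%4d nodes: speedup %6.1fx, efficiency %5.1f%%'
              % (n, speedup, 100.0 * efficiency))

# Near-linear scaling keeps efficiency close to 100%; for reference, a
# 50x speedup on 128 nodes works out to roughly 39% efficiency.
scaling_report([1, 32, 128], [1.0, 1.0 / 30.0, 1.0 / 50.0])
```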

Strong-Scaling to 128 Nodes

Intel claims that 128 Xeon Phi servers deliver 50x faster performance than a single Xeon Phi server, and that no such scaling data exists for GPUs. As noted above, Baidu has already published results showing near-linear scaling up to 128 GPUs.

For strong-scaling, we believe strong nodes are better than weak nodes. A single strong server with numerous powerful GPUs delivers better performance than many weak nodes, each with one or two sockets of less-capable processors such as Xeon Phi. For example, a single DGX-1 system offers better strong-scaling performance than at least 21 Xeon Phi servers (DGX-1 is 5.3x faster than four Xeon Phi servers).
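The “at least 21 Xeon Phi servers” figure follows from simple arithmetic on the quoted 5.3x number, generously assuming perfect linear scaling on the Xeon Phi side:

```python
# Arithmetic behind the "at least 21 Xeon Phi servers" claim: if one
# DGX-1 is 5.3x faster than a 4-server Xeon Phi cluster, matching it
# under ideal linear scaling takes 5.3 * 4 = 21.2 servers; real-world
# scaling losses would push that number higher.
dgx1_vs_4_phi = 5.3
phi_servers_needed = dgx1_vs_4_phi * 4
print('Xeon Phi servers to match one DGX-1 (ideal scaling): %.1f'
      % phi_servers_needed)
```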

For more information, see the NVIDIA blog:

Correcting Some Mistakes

Intel information

By way of comparison, TechEnablement has been running an Intel-sponsored series on the benefits of the Intel Scalable System Framework (which includes Intel Xeon Phi). The articles in the series are:

Faster Deep Learning with the Intel® Scalable System Framework: Next Generation Processors

How the Intel® OPA Fabric Facilitates Distributed Training

How Lustre and DAOS Enable Faster Deep Learning

How Intel® MPI Enables Scalable Distributed Machine Learning

 
