Skylake server processors deliver an average 65% performance boost over Intel’s prior Broadwell chips, according to the company, while top-end versions of the new Xeon Scalable family nudge ahead of rival AMD’s recently released Epyc CPUs in performance but don’t pack as much I/O.

The results suggest that Intel will have no problem maintaining its dominance in the lucrative data centre. Nevertheless, AMD’s Epyc and a rising tide of ARM-based server chips from Qualcomm and others are expected to find significant footholds in the broad and diverse cloud computing sector.

Platinum 8180 and 8160 versions of Skylake edged out AMD’s Epyc 7601 by 2% to 28% in performance and by 12% to 22% in performance/watt on the Specint_rate2006 benchmark. The results could be skewed by Intel’s tendency to use optimised compilers for its benchmarks, whereas AMD uses standard ones.

The high-end 8100 series packs 28 cores running at up to 3.6GHz with up to 48 PCIe 3.0 lanes and six channels of DDR4-2666 memory. By contrast, AMD’s high-end Epyc packs up to 32 cores, and all nine members of the Epyc family support 128 PCIe 3.0 lanes and eight DDR4-2666 channels.
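
The I/O gap can be put in rough numbers. The sketch below is a back-of-envelope estimate of theoretical peak bandwidth per socket, not a measurement. It assumes 64-bit DDR4 channels and PCIe 3.0 running at 8 GT/s per lane with 128b/130b encoding; neither assumption comes from Intel's or AMD's figures above, and the function names are purely illustrative.

```python
# Back-of-envelope peak bandwidth per socket, using the lane and channel
# counts quoted in the article.
# Assumptions (not from the article): 64-bit (8-byte) DDR4 channels, and
# PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding.


def ddr4_peak_gb_s(channels: int, mega_transfers_per_s: int = 2666) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = 8  # assumed 64-bit channel
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def pcie3_peak_gb_s(lanes: int) -> float:
    """Theoretical peak PCIe 3.0 bandwidth per direction in GB/s."""
    gigatransfers_per_s = 8.0          # assumed PCIe 3.0 signalling rate
    encoding_efficiency = 128 / 130    # assumed 128b/130b line coding
    return lanes * gigatransfers_per_s * encoding_efficiency / 8  # bits -> bytes


skylake = {"memory": ddr4_peak_gb_s(6), "pcie": pcie3_peak_gb_s(48)}
epyc = {"memory": ddr4_peak_gb_s(8), "pcie": pcie3_peak_gb_s(128)}

for name, chip in (("Skylake 8100", skylake), ("Epyc", epyc)):
    print(f"{name}: {chip['memory']:.1f} GB/s memory, {chip['pcie']:.1f} GB/s PCIe")

print(f"Epyc/Skylake memory ratio: {epyc['memory'] / skylake['memory']:.2f}x")
print(f"Epyc/Skylake PCIe ratio:   {epyc['pcie'] / skylake['pcie']:.2f}x")
```

Whatever the per-lane and per-channel assumptions, the ratios depend only on the counts: eight channels against six, and 128 lanes against 48.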

Intel showed tests with two dozen companies, each with different workloads. Results ranged from Skylake beating Broadwell chips by 1.4x for Ansys manufacturing software to 2.2x for apps using Skylake along with Intel’s proprietary Optane solid-state memory drives.

The results are “impressive … the increase over Broadwell is much better than they have had in typical generations, especially when you consider [that] these are both 14nm parts,” said Nathan Brookwood, principal of market watcher Insight64.

Figure 1: Skylake edges past AMD's Epyc, but Intel's optimised compilers may skew results in its favour. (Source: Intel)

“AMD got 25% of the server market when it had a vastly superior product with Opteron, but I don’t think Epyc is vastly better than Skylake,” said Brookwood.

Last month, AMD showed a range of benchmarks for Epyc that averaged around 45% more performance than Broadwell. However, the server sector includes a wide range of markets and requirements, many where Intel will have an edge and a few where AMD may score hits.

For example, AMD hopes to use its advantage in PCIe and DDR4 to replace dual-socket Broadwell with single-socket Epyc servers. However, Skylake’s new AVX-512 vector processing extensions far outstrip Epyc’s abilities in floating-point intensive jobs.

Architecturally, Skylake uses a single processor die with a separate I/O chip. Epyc packs four dies in a package including I/O, giving AMD greater flexibility and lower cost at the expense of latency in some operations.

Intel has already shipped more than 500,000 of the chips, which are running in data centres at Alibaba, Amazon, AT&T and Google. They are in use at more than 30 customers, including a system in Barcelona ranked as the world’s 13th fastest supercomputer.

Figure 2: Mainstream applications generally scored lower than more rarified HPC benchmarks in Intel’s calculations of average performance. Intel planned to update exact figures just prior to the Skylake launch.

A brief tour of what’s new in Skylake

Skylake’s gains come from a laundry list of generally step-wise innovations, including an upgraded microarchitecture and expanded instruction set. The chips use a mesh network-on-chip that Intel says provides more bandwidth and more consistent low latencies than prior ring buses.

Figure 3: Skylake's mesh on-chip network replaces Broadwell's dual ring.

AVX-512 doubles single- and double-precision performance to 64 and 32 flops/cycle, respectively, over the AVX2 on Broadwell. It does this at the same power levels as Intel’s previous chips and at lower clock frequencies.
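
The 64 and 32 flops/cycle figures follow from the vector width. As a rough sketch, assuming two fused multiply-add (FMA) units per core and counting each FMA as two floating-point operations (neither detail is given above, and the helper name is illustrative), the arithmetic looks like this:

```python
# Peak floating-point operations per cycle per core.
# Assumptions (not stated in the article): two FMA units per core, and each
# fused multiply-add counted as two floating-point operations.


def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         fma_units: int = 2) -> int:
    lanes = vector_bits // element_bits  # elements held in one vector register
    flops_per_fma = 2                    # one multiply plus one add per lane
    return lanes * flops_per_fma * fma_units


for isa, width in (("AVX2", 256), ("AVX-512", 512)):
    sp = peak_flops_per_cycle(width, 32)  # single precision: 32-bit elements
    dp = peak_flops_per_cycle(width, 64)  # double precision: 64-bit elements
    print(f"{isa}: {sp} single-precision, {dp} double-precision flops/cycle")
```

Under those assumptions, doubling the register width from 256 to 512 bits doubles the number of lanes and therefore the peak rate, which matches the doubling Intel cites.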

The extensions support up to 85.33 INT8 and 64 FP32 operations/cycle per core, boosting performance on machine-learning training and inference jobs, said Intel, adding that Skylake gives a 3.4x boost over Broadwell in integer general matrix multiply tasks.
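
The odd-looking 85.33 figure is consistent with a simple ratio. One reading that reproduces it, offered here as an assumption rather than as Intel's stated derivation, is that an INT8 multiply-accumulate takes three vector instructions on two 512-bit units, with the multiply and the add each counted as one operation:

```python
# One way to arrive at 85.33 INT8 operations/cycle per core.
# Assumptions (not from the article): two 512-bit vector units, multiply and
# accumulate counted as two operations, and three instructions needed for
# each INT8 multiply-accumulate step.
from fractions import Fraction


def int8_ops_per_cycle(vector_bits: int = 512, units: int = 2,
                       instructions_per_mac: int = 3) -> Fraction:
    lanes = vector_bits // 8  # 8-bit elements per vector register
    ops_per_mac = 2           # one multiply plus one add per lane
    return Fraction(lanes * ops_per_mac * units, instructions_per_mac)


ops = int8_ops_per_cycle()
print(f"{float(ops):.2f} INT8 ops/cycle per core")
```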

Rather than expanding cache size, Intel revamped its approach to caching. Thus, the chips use slightly less cache memory, but it is better optimised for data centres.

The companion I/O chip for Skylake, called Lewisburg, supports four 10G Ethernet ports compared to a single GE port for the Broadwell I/O chip. It is also the first to integrate the crypto and compression functions that Intel calls its Quick Assist technology.

Intel also boosted its processor bus, now called the Ultra Path Interconnect, from 9.6 to 10.4 GTransfers/second. It put up to three of the links on high-end chips.
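
For scale, the jump in transfer rate is modest. The sketch below works out the percentage increase and a rough per-link bandwidth; the 2 bytes of data per transfer per direction is an assumption not given above, as are the variable and function names.

```python
# Rough comparison of the old and new processor-bus transfer rates.
# Assumption (not from the article): each link carries 2 bytes of data per
# transfer in each direction.

OLD_GT_PER_S = 9.6    # previous processor bus, from the article
UPI_GT_PER_S = 10.4   # Ultra Path Interconnect, from the article
MAX_LINKS = 3         # links on high-end chips, from the article


def link_gb_s(gigatransfers_per_s: float, bytes_per_transfer: int = 2) -> float:
    """Theoretical peak bandwidth of one link, per direction, in GB/s."""
    return gigatransfers_per_s * bytes_per_transfer


increase_pct = (UPI_GT_PER_S / OLD_GT_PER_S - 1) * 100
print(f"Transfer-rate increase: {increase_pct:.1f}%")
print(f"Per link: {link_gb_s(OLD_GT_PER_S):.1f} -> {link_gb_s(UPI_GT_PER_S):.1f} GB/s")
print(f"Three links: {MAX_LINKS * link_gb_s(UPI_GT_PER_S):.1f} GB/s per direction")
```

The percentage increase does not depend on the bytes-per-transfer assumption, since it cancels out of the ratio.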

Figure 4: AVX-512 doubles floating-point performance while holding down power consumption and frequency.

The Xeon Scalable family consists of nearly 50 versions made in different variants of Intel’s 14nm process. Prices range from nearly ₹6.12 lakh ($9,000) for eight-socket versions to about ₹27,197.93 ($400) for entry-level parts.

Power consumption ranges from 70W to 205W. The low-end Bronze 3100 series uses up to eight cores running at 1.7GHz, supporting DDR4-2133 but not dual threading.

A handful of the new devices put Intel’s Omnipath interconnect in the same package as the processor for high-performance computing. Intel is sampling versions that put an FPGA in the package but won’t ship the products until early next year.

Figure 5: Compared to Broadwell, Skylake sports a larger branch predictor, higher throughput decoder, more load/store bandwidth, deeper load/store buffers and an improved scheduler, execution engine and prefetcher.

Your Skylake secret decoder ring

[Image: Intel's key to Skylake product names]

Figuring out which of the nearly 50 versions of Skylake (below) your system should use may require a decoder ring for product names, which Intel provided (above).

[Image: list of Skylake versions]

First published by EE Times U.S.