Two and a half years ago, Xilinx only had a handful of data center applications, perhaps one-tenth the required critical mass. Then, Amazon AWS started offering F1 acceleration to reach a larger community of developers.

Now Xilinx needs to land a couple more Super 7 data centers to add momentum to the joint Intel/Xilinx win at Microsoft. A tall order, but I believe the pieces of the puzzle are starting to fall into place.

The Alveo U50 is a compact acceleration card targeting cloud and on-premises applications. It is part of a data-center-first strategy Xilinx CEO Victor Peng forged, focusing on applications that need about 10x the performance of a CPU at lower power and with lower latencies.

Adopters, however, must clear formidable development hurdles to realize these gains. Xilinx has a three-pronged strategy to surmount these barriers.

First, the company secured design wins at major cloud providers such as Amazon AWS, Baidu, and Alibaba. Then it began working with partners to develop software and hardware stacks that run in those cloud instances. Finally, Xilinx launched Alveo PCIe cards in late 2018 to ease adoption for applications that run on-premises.

The new U50 is based on an UltraScale+ FPGA equipped with 8 GBytes of HBM2 on a low-profile, half-length card. It consumes 75 watts and supports CCIX, PCIe Gen 4, and 100GbE networking. Xilinx is initially targeting compute, storage, and low-latency networking applications.

Unlike some accelerators, FPGAs can optimize math precision to fit the job, which can significantly increase performance and efficiency. Combined with the ability to build custom memory hierarchies and data paths, this flexibility has let FPGAs begin to demonstrate very impressive performance in data centers, including at Microsoft, where millions of FPGAs are believed to have been installed.
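To make the precision trade-off concrete, here is a minimal Python sketch of symmetric fixed-point quantization at several bit widths. It is an illustration only, not Xilinx's tool flow: the function names and the sample weights are hypothetical, and a real FPGA design would choose bit widths per layer or per data path with vendor tools.

```python
def quantize(values, bits):
    """Map floats to signed integer codes of the given bit width (symmetric scale)."""
    qmax = 2 ** (bits - 1) - 1                  # largest positive code, e.g. 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0        # avoid dividing by zero for all-zero input
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Convert integer codes back to approximate float values."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Worst-case absolute error introduced by quantizing to the given width."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical sample weights, made up for illustration.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]

for bits in (16, 8, 4):
    print(f"{bits:2d}-bit: max error = {max_error(weights, bits):.6f}")
```

Narrower widths cost accuracy but need less logic and memory bandwidth per operation. A device whose data paths can be sized to the narrowest width a workload tolerates can trade one for the other.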

Beyond the hardware, Xilinx has expanded its software stack for accelerators from a handful to nearly 30 applications and IP blocks in the last 30 months. This surge in features is due in part to the availability of FPGAs as a cloud service and in part to the successful rollout of the initial Alveo cards a year ago.

In benchmarks, Xilinx claims its products outperform an Intel CPU by about 4x to 20x, and are even 10x more efficient in performance/watt than a state-of-the-art Nvidia GPU on speech translation using Google’s Transformer NMT. Raw performance data on other machine-learning models (such as CNNs and RNNs) will be needed to assess the general applicability of this latter claim, since GPUs are very fast but not always the most power efficient.

Xilinx also claimed a relative cost benefit of 40% for Hadoop acceleration and a whopping 8x advantage in video transcoding. The latter result came from the exceptional performance of software from NGCodec, which Xilinx liked so much that it bought the private company last month.

Karl Freund is a senior analyst at Moor Insights & Strategy.