Bifrost boosts graphics, bridges to machine learning

Article By : Peter Clarke

Bifrost includes maths capabilities that could be used by other software as part of a heterogeneous system architecture, possibly including neural network software.

ARM has rolled out a new architecture, Bifrost, to support the Vulkan API from the industry-run Khronos Group.

The architecture includes maths capabilities that could be used by other software as part of a heterogeneous system architecture. That could include neural network software but ARM executives stressed that Bifrost is first and foremost an architecture for raster, tile-based graphics processing units (GPUs).

The previous architecture, Midgard, is the one that underlies ARM's T-series Mali GPUs and has up to 16 unified shader cores and a SIMD [single-instruction, multiple-data] instruction set architecture (ISA). Bifrost supports up to 32 unified shader cores with a scalar ISA, full hardware cache coherency and something called clause execution.

The primary goal, according to Sean Ellis, GPU architect with ARM, was to achieve more performance per square millimetre of silicon and per line of "real-world" shader code. This has been achieved to the tune of about 50 per cent through the use of a new scalar, clause-based ISA with quad-based arithmetic units.

[EETI Bifrost 01]
__Figure 1:__ *Top level architecture of Bifrost showing up to 32 universal shader cores. (Source: ARM)*

Whereas Midgard GPUs use SIMD vectorisation, Bifrost GPUs will use quad vectorisation, in which four scalar threads from a 2 by 2 pixel quad are executed in lock step. Each thread fills one 32-bit lane of the hardware, so four threads doing a vec3 FP32 add take three cycles. In short, quad vectorisation is compiler friendly and improves resource utilization.
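The utilization argument can be made concrete with a toy cycle count. The sketch below is illustrative only: it assumes a four-lane, 32-bit-per-lane arithmetic unit in both cases and one operation issued per cycle, and the function names are hypothetical. It is not a model of ARM's actual pipeline timings.

```python
# Toy model: compare lane utilization for a vector op issued per thread (SIMD)
# against the same op issued per component across a quad of threads.
# LANES and the one-op-per-cycle cost are simplifying assumptions.
LANES = 4  # 32-bit lanes in the arithmetic unit (assumed for both models)


def simd_vec_op(components, threads=4):
    """SIMD style: each thread issues one vector op; unused lanes sit idle."""
    cycles = threads                      # one cycle per thread
    useful_lane_ops = threads * components
    return cycles, useful_lane_ops / (cycles * LANES)


def quad_scalar_op(components, threads=4):
    """Quad style: four threads in lock step, one component per cycle."""
    assert threads == LANES, "a quad fills every lane with one thread each"
    cycles = components                   # one cycle per vector component
    useful_lane_ops = threads * components
    return cycles, useful_lane_ops / (cycles * LANES)


if __name__ == "__main__":
    for width in (1, 2, 3, 4):
        print(f"vec{width}: simd={simd_vec_op(width)} quad={quad_scalar_op(width)}")
```

Under these assumptions a vec3 FP32 add across four threads takes three cycles with every lane busy in the quad model, matching the figure quoted above, whereas the per-thread SIMD model takes four cycles with a quarter of the lanes idle.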

Clause execution is another refinement that is used to reduce overhead compared with the previous graphics architecture. A "clause" is defined as a sequence of instructions that are self-dependent and without variable latency. Whereas previously architecturally visible state was guaranteed after every instruction, under Bifrost it is only guaranteed after each clause, with temporary registers carrying values in between. The back-to-back execution of instructions within a clause allows for aggressive optimization and saves power. Clause boundaries are decided in the compiler, Ellis told journalists and analysts.
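As a rough illustration of a compiler choosing clause boundaries, the sketch below splits an instruction stream using a single toy rule: a clause is closed immediately after any variable-latency instruction, so that the wait for its result falls on a boundary. The rule, the `Instr` type and the opcode names are assumptions made for this example. ARM has not described its compiler's actual heuristics here.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    op: str
    variable_latency: bool = False  # e.g. a texture fetch or memory access


def split_into_clauses(instrs):
    """Group instructions into clauses, ending a clause after each
    variable-latency instruction (toy rule, not ARM's algorithm)."""
    clauses, current = [], []
    for ins in instrs:
        current.append(ins)
        if ins.variable_latency:
            clauses.append(current)
            current = []
    if current:  # trailing fixed-latency instructions form a final clause
        clauses.append(current)
    return clauses


if __name__ == "__main__":
    program = [
        Instr("fmul"), Instr("fadd"), Instr("tex_fetch", variable_latency=True),
        Instr("fmul"), Instr("fadd"),
    ]
    for i, clause in enumerate(split_into_clauses(program)):
        print(i, [ins.op for ins in clause])
```

In this toy model the instructions inside each clause can run back to back, and the scheduler only needs to check for outstanding results at the boundaries.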

When asked if there was specific support within Bifrost for GPU-computing, where the GPU is used to run software to which it may be better suited than the CPU core cluster, ARM executives said that decisions had been taken to include support for a variety of data types that are not generally used in graphics. These include 8-, 16- and 32-bit integers as well as 16-bit floating point.

FP16 can be used for some pixel shaders at twice the nominal throughput. Similarly, Bifrost supports 64-bit floating-point precision at half the nominal throughput. Meanwhile, the integer maths and FP16 are useful for deep learning applications, Ellis said.
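FP16 suits only some shaders because the extra throughput is paid for in precision and range. The sketch below shows the rounding involved using Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. It assumes Bifrost's FP16 follows that same format, which the article does not state, and the helper name `to_fp16` is made up for the example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, then widen it
    back to a Python float so the rounding error can be inspected."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    # 65504.0 is the largest finite half-precision value.
    for value in (0.1, 1.0, 2049.0, 65504.0):
        rounded = to_fp16(value)
        print(f"{value!r:>8} -> {rounded!r} (error {abs(rounded - value):.3g})")
```

Values such as colours in the range 0 to 1 survive with an error of a few parts in ten thousand, which is often acceptable for pixel shading and for neural network weights, while large integers and fine positional detail do not round-trip exactly.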

ARM has never been particularly keen on the raytracing approach to graphics rendering, which is a completely different approach to tile-based rendering. Indeed, in 2013 it acquired Geomerics Ltd., a leader in software engines for lighting effects in games. Ellis told EE Times Europe: "Ray tracing is not explicitly excluded [from Bifrost]. But we can do lighting, shadowing, glare effects in other ways."

Vulkan

Vulkan is a 3D graphics API for the next 20 years, said Jem Davies, ARM Fellow and vice president of technology for media processing. "Vulkan 1.0 was released in February with unprecedented support. It is available on the desktop in Windows and Linux and will be supported in the upcoming N generation of the Android operating system.

"In 2014, the traditional 3D APIs were in trouble with unpredictable performance and the emergence of proprietary efforts such as Mantle, DX12." So a crash effort in a next-generation OpenGL initiative was launched. AMD donated its Mantle technology.

The major result is that under Vulkan more responsibility is given to the application, making for a lower-overhead driver. The application handles memory allocation, resources, and thread management to generate command buffers. Vulkan is multithread and multicore friendly and error checking is opt-in, said Davies. "Vulkan is a great fit for mobile graphics architectures because there is no wasted effort trying to look like a desktop GPU," he added.

ARM already has Vulkan drivers for T880/T860/T760 and the Mali-G71 driver is ready and awaiting silicon.

[EETI Bifrost 02]
__Figure 2:__ *Inside the shader core showing quad-thread fragment management and execution engines. (Source: ARM)*

And progress continues with Vulkan 1.1 expected soon, said Davies. "I think we will see features added to further reduce power and bandwidth. Thermal throttling of processors is a big deal." Davies said that texture compression helps in this regard and that AFBC [ARM Frame Buffer Compression] is becoming commonly supported but, when asked if AFBC would be standardized within Vulkan 1.1, he said: "We would welcome AFBC being established as standard but it's unlikely."

However, Vulkan 1.1 could also include further developments to support GPU-compute. "The GPU-compute voice is getting louder as time goes on," Davies said.
