Bringing Feng Shui to HPEC System Architecture

9 October 2013

One school of thought in designing processing elements for HPEC systems is that bigger is always better—more cores, more memory, more bandwidth, etc. While this is broadly true, it is important to take a balanced view of things. This is especially true when some architectural elements are maximized at the expense of others. It is rare to be able to maximize all design parameters simultaneously, which can lead to imbalance and architectures that do one thing well, but others only acceptably. In computer architecture, just as in life, harmony and balance are desirable.

Take embedded computing boards designed for HPEC systems, for example. Given the very high performance expected from HPEC systems, you might expect that, for our latest products—such as the DSP281 multiprocessor—we’d choose server-class components and, especially, server-class processors from Intel. They have to be the best choice—right?

Well, not necessarily. For the most part, we at GE Intelligent Platforms favor designs that use ball grid array (BGA) devices that can be soldered down, allowing us to perform the magic that makes it possible for commercial or industrial grade silicon to survive reliably in harsh environments with extremes of shock, vibration and temperatures.

This leads to some component selections that at first blush appear to be compromises, but are they really? As an example, take the current crop of Intel-based multiprocessor designs. Given what our customers expect from our boards in terms of ruggedness and absolute reliability, BGA is a “must have.” But going down the BGA path restricts our choices to mobile chipsets for the most part, that is to say, devices primarily designed for applications in laptops and the like.

This means forgoing some of the goodies that are available in the server-class product lines. Server processors can add such features as more and faster memory and inter-processor connections that enable symmetric multiprocessor (SMP) designs. The memory enhancements can potentially increase system performance in applications that are memory-bound (either by data size or by the ratio of memory accesses to compute operations). SMP can sometimes simplify software design by allowing two or more multicore processors to appear to the operating system and applications as one processor with a core count that is the sum of the individual processors.
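One hedged way to make “memory-bound” concrete is a roofline-style estimate, which this post does not use by name: attainable throughput is the lesser of the peak compute rate and the memory bandwidth multiplied by the number of operations performed per byte moved. The sketch below is illustrative only. The function names and the peak and bandwidth figures are assumptions, not data for any real processor.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


def is_memory_bound(peak_gflops, mem_bw_gbs, flops_per_byte):
    """True when memory bandwidth, not the cores, limits the kernel."""
    return mem_bw_gbs * flops_per_byte < peak_gflops


# Hypothetical processor: 300 GFLOPS peak, 25 GB/s memory bandwidth.
PEAK, BW = 300.0, 25.0

# Low-intensity kernel: 0.25 FLOP per byte moved (memory-bound).
low = attainable_gflops(PEAK, BW, 0.25)

# High-intensity kernel: 32 FLOP per byte moved (compute-bound).
high = attainable_gflops(PEAK, BW, 32.0)

print(low, is_memory_bound(PEAK, BW, 0.25))
print(high, is_memory_bound(PEAK, BW, 32.0))
```

Under these assumed figures, only the low-intensity kernel would benefit from more or faster memory; the high-intensity kernel is limited by the cores.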

The tradeoff is that server-class devices typically come only in land grid array (LGA) packages, which are intended to be attached to the motherboard through sockets. That is not generally a very rugged arrangement, and it is prone to issues under vibration. The alternative approach of having a third party convert LGA devices to pseudo-BGA is seen as risky and is frowned upon by many integrators.

So, it would seem that, by prioritizing ruggedness and reliability, we’re sacrificing performance. Certainly, the following table appears to confirm that, listing some relevant parameters for the latest Intel Core i7 and Xeon® processors being used in embedded systems. The launch of Xeon LGA devices tends to lag that of mobile BGA processors, so boards are currently available with 4th generation Core i7s and 3rd generation Xeons.

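[Table: selected parameters of 4th generation Core i7 and 3rd generation Xeon processors; image not reproduced]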

It certainly seems to support the argument that more is better. However, if we look through a lens that focuses on what is important in mil-aero systems, a different picture emerges.

First: SWaP-C (size, weight, power and cost) is of primary concern to most programs, so in addition to raw processing rate, we should also factor in performance per watt and per dollar (the P and C of SWaP-C, respectively).

Second: When it comes to assessing bandwidth, we should factor in the number of cores that have to share each link.

Third: The Xeon’s advantage in core count is offset by the Core i7’s higher number of operations per cycle, courtesy of the fused multiply-add (FMA) instructions that arrived with the AVX2 generation of processors.
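The factoring described above can be sketched in a few lines. Everything here is an assumption made for round arithmetic: the `Processor` class, the clock, power, price and link figures are placeholders, not values from the table. The operations-per-cycle figures assume two 256-bit FMA units per core on the FMA-capable part (32 single-precision operations per cycle) and one 256-bit add plus one 256-bit multiply per cycle on the other (16).

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    cores: int
    clock_ghz: float
    flops_per_cycle: int  # single-precision FLOPs per core per cycle (assumed)
    watts: float          # placeholder power figure
    dollars: float        # placeholder price
    link_gbs: float       # bandwidth of one link shared by all cores, GB/s

    def peak_gflops(self):
        return self.cores * self.clock_ghz * self.flops_per_cycle

    def gflops_per_watt(self):
        return self.peak_gflops() / self.watts

    def gflops_per_dollar(self):
        return self.peak_gflops() / self.dollars

    def link_gbs_per_core(self):
        return self.link_gbs / self.cores


# Placeholder parts: half the cores, but twice the operations per cycle.
mobile = Processor("mobile, with FMA", 4, 2.0, 32, 50.0, 400.0, 8.0)
server = Processor("server, no FMA", 8, 2.0, 16, 80.0, 1000.0, 8.0)

for p in (mobile, server):
    print(p.name, p.peak_gflops(), p.gflops_per_watt(),
          p.gflops_per_dollar(), p.link_gbs_per_core())
```

With these made-up inputs the two parts have the same peak rate, and the part with fewer cores comes out ahead on every factored metric. Real parts will differ, which is why each metric needs to be computed from actual figures.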

The following chart compares factored parameters for two board types that represent the state of the art for dual-processor 6U OpenVPX platforms. Each parameter is normalized to the larger of the two values under comparison, so further from the graph origin is better.
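Normalizing to the larger of two values is simple to sketch. The helper name `normalize_to_larger` and the metric values below are hypothetical, chosen only to show the mechanics, and every metric is assumed to be higher-is-better.

```python
def normalize_to_larger(a, b):
    """Scale a pair of higher-is-better values so the larger becomes 1.0."""
    largest = max(a, b)
    if largest <= 0:
        raise ValueError("expected at least one positive value")
    return a / largest, b / largest


# Hypothetical factored metrics as (board A, board B) pairs.
metrics = {
    "GFLOPS per watt": (5.12, 3.2),
    "GFLOPS per dollar": (0.64, 0.256),
    "link GB/s per core": (2.0, 1.0),
    "memory GB": (16.0, 32.0),
}

normalized = {name: normalize_to_larger(a, b)
              for name, (a, b) in metrics.items()}

for name, (a, b) in normalized.items():
    print(f"{name:20s} A={a:.2f} B={b:.2f}")
```

Because each pair is scaled independently, the better board on any axis always sits at 1.0 and the other at its fraction of that, which lets metrics with very different units share one chart.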

[Chart: normalized, factored parameters for the mobile-class and server-class boards; image not reproduced]

As we can see, from this perspective, mobile matches or beats out server class in all but one measurement.

This performance is reflected in the DSP281, our latest HPEC dual-processor board, which combines two quad-core 4th generation Core i7s with two Mellanox ConnectX-3 network interfaces to provide best-of-breed SWaP-C. The new AVX2 instruction set can be harnessed by using the AXISLib-AVX 2.0 math library.

“Compromise” is, for many people, a dirty word, as it implies that something, somewhere is not as good as it could or should be. Prioritizing ruggedness and reliability, as GE Intelligent Platforms does, appears to involve a compromise in processing power, because server-class silicon, which doesn’t have the characteristics required for reliable operation in harsh environments, seems on the surface to offer more in terms of performance. But, as we’ve seen, performance is not an absolute: it’s relative to the requirements of the application. Mil/aero applications aren’t server applications. They have their own unique requirements and characteristics, and, as it turns out, the mobile chipset, with its superior resistance to shock and vibration, delivers a level of performance in mil/aero applications very similar to what could be expected from Xeon-class devices.

So, next time you evaluate multiprocessor boards for integration into a system that must perform all day, every day in critical deployments, don’t forget to consider balance. Feng shui is, after all, about creating harmony with the surrounding environment.