Heterogeneous computing

From Wikipedia, the free encyclopedia

Heterogeneous computing systems are electronic systems that use several different types of computational units. A computational unit could be a general-purpose processor (GPP), a special-purpose processor (e.g. a digital signal processor (DSP) or graphics processing unit (GPU)), a co-processor, or custom acceleration logic (an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA)). In general, a heterogeneous computing platform consists of processors with different instruction set architectures (ISAs).

The demand for increased heterogeneity in computing systems is partially due to the need for high-performance, highly reactive systems that interact with other environments (audio/video systems, control systems, networked applications, etc.). In the past, huge advances in technology and frequency scaling allowed the majority of computer applications to increase in performance without requiring structural changes or custom hardware acceleration. While these advances continue, their effect on modern applications is less dramatic because other obstacles, such as the memory wall and the power wall, come into play.[1][2] With these additional constraints, the primary method of gaining extra performance out of computing systems is to introduce additional specialized resources, thus making a computing system heterogeneous.[3][4] This allows a designer to use multiple types of processing elements, each able to perform the tasks that it is best suited for.[5] Because these extra computing resources operate independently, most heterogeneous systems can also be considered parallel computing or multi-core systems. Another term sometimes seen for this type of computing is "hybrid computing".[6] Hybrid-core computing is a form of heterogeneous computing wherein asymmetric computational units coexist with a "commodity" processor.

The level of heterogeneity in modern computing systems is gradually rising, as increases in chip area and further scaling of fabrication technologies allow formerly discrete components to become integrated parts of a system-on-chip (SoC). For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, cryptography co-processors, programmable network processors, A/V encoders/decoders, etc.).


Common features

Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:[7]

ISA or instruction set architecture
Compute elements may have different instruction set architectures, leading to binary incompatibility.
ABI or application binary interface
Compute elements may interpret memory in different ways. Differences may include endianness, calling convention, and memory layout, and depend on both the architecture and the compiler being used.
API or application programming interface
Library and OS services may not be uniformly available to all compute elements.
Low-Level Implementation of Language Features
Language features such as functions and threads are often implemented using function pointers, a mechanism which requires additional translation or abstraction when used in heterogeneous environments.
Memory Interface and Hierarchy
Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte, word, or burst accesses.
Interconnect
Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. These may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories.
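
The ABI item above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the two "compute elements" are simulated with the standard `struct` module, and the helper names `encode_for_transfer` and `decode_from_transfer` are hypothetical. It shows the same four bytes yielding different values under different byte orders, and one common workaround, which is to agree on an explicit byte order for shared data.

```python
import struct

# Hypothetical helpers: both sides agree on big-endian for shared data,
# regardless of the native byte order of each compute element.
def encode_for_transfer(value: int) -> bytes:
    """Serialize a 32-bit unsigned integer in an explicit big-endian order."""
    return struct.pack(">I", value)

def decode_from_transfer(payload: bytes) -> int:
    """Deserialize a 32-bit unsigned integer written by encode_for_transfer."""
    return struct.unpack(">I", payload)[0]

value = 0x01020304

# Bytes as a little-endian element would store the value in memory.
little = struct.pack("<I", value)

# The same bytes read by an element that assumes big-endian layout.
misread = struct.unpack(">I", little)[0]

# With an agreed byte order, the value survives the transfer.
roundtrip = decode_from_transfer(encode_for_transfer(value))

print(hex(misread))    # the bytes are reversed relative to the original value
print(hex(roundtrip))  # matches the original value
```

Calling convention and structure padding raise similar problems, but they depend on the specific compilers involved and are not modelled here.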

Heterogeneous platforms often require the use of multiple compilers in order to target the different types of compute elements found in such platforms. This results in a more complicated development process than for homogeneous systems, as multiple compilers and linkers must be used together in a cohesive way in order to properly target a heterogeneous platform. Interpretive techniques can be used to hide heterogeneity, but the cost (overhead) of interpretation often requires the use of just-in-time compilation mechanisms, which result in a more complex run-time system that may be unsuitable in embedded or real-time scenarios.
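
As a rough illustration of the interpretive approach, the following Python sketch runs one portable instruction sequence through a tiny stack interpreter. The instruction set (`push`, `add`, `mul`) and the function `interpret` are invented for this example and do not correspond to any real system. The idea is that any compute element that hosts the interpreter can run the same program without a separate compiler, at the cost of dispatching every instruction at run time.

```python
from typing import List, Optional, Sequence, Tuple

# A made-up portable instruction format: (opcode, optional integer operand).
Instruction = Tuple[str, Optional[int]]

def interpret(program: Sequence[Instruction], inputs: Sequence[int]) -> int:
    """Execute a program on a simple operand stack and return the top value."""
    stack: List[int] = list(inputs)
    for op, arg in program:
        # This per-instruction dispatch is the interpretation overhead.
        if op == "push":
            if arg is None:
                raise ValueError("push requires an operand")
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# Computes (x + 3) * 2 for an input x already on the stack.
PROGRAM: List[Instruction] = [
    ("push", 3), ("add", None),
    ("push", 2), ("mul", None),
]

result = interpret(PROGRAM, [4])
print(result)  # 14
```

A just-in-time compiler would replace the dispatch loop with native code generated for each target, which removes most of the overhead but adds the run-time complexity mentioned above.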

Heterogeneous computing platforms

Programming Heterogeneous Computing Architectures

Programming heterogeneous machines can be difficult, since developing programs that make the best use of the characteristics of different processors increases the programmer's burden. Requiring hardware-specific code to be interleaved throughout application code increases the complexity and decreases the portability of software on heterogeneous architectures.[8] Balancing the application workload across processors can be problematic, especially given that they typically have different performance characteristics. There are different conceptual models for dealing with the problem, for example using a coordination language and program building blocks (programming libraries and/or higher-order functions). Each block can have a different native implementation for each processor type.[9] Users simply program using these abstractions, and an intelligent compiler chooses the best implementation based on the context.[10]
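
The building-block idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the mechanism of any cited system: the `Variant` class, the device labels `"cpu"` and `"gpu"`, and the cost models are all hypothetical, the "GPU" variant is a plain Python stand-in, and the selection happens at run time rather than in a compiler.

```python
from dataclasses import dataclass
from typing import Callable, Collection, List, Sequence

@dataclass(frozen=True)
class Variant:
    """One implementation of a building block for one processor type."""
    device: str                               # hypothetical device label
    run: Callable[[Sequence[float]], List[float]]
    estimated_cost: Callable[[int], float]    # made-up model: input size -> cost

def scale_cpu(xs: Sequence[float]) -> List[float]:
    return [2.0 * x for x in xs]

def scale_gpu(xs: Sequence[float]) -> List[float]:
    # Stand-in only: a real variant would launch a device kernel here.
    return [2.0 * x for x in xs]

# Assumed costs: the accelerator has a fixed launch overhead
# but a lower per-element cost.
VARIANTS = [
    Variant("cpu", scale_cpu, lambda n: 1.0 * n),
    Variant("gpu", scale_gpu, lambda n: 50.0 + 0.1 * n),
]

def select_variant(n: int, available: Collection[str]) -> Variant:
    """Pick the cheapest variant for input size n among available devices."""
    candidates = [v for v in VARIANTS if v.device in available]
    if not candidates:
        raise RuntimeError("no implementation for the available devices")
    return min(candidates, key=lambda v: v.estimated_cost(n))

def scale(xs: Sequence[float],
          available: Collection[str] = ("cpu", "gpu")) -> List[float]:
    """The abstraction the user programs against."""
    return select_variant(len(xs), available).run(xs)

print(select_variant(10, {"cpu", "gpu"}).device)    # cpu
print(select_variant(1000, {"cpu", "gpu"}).device)  # gpu
print(scale([1.0, 2.0]))                            # [2.0, 4.0]
```

The caller only sees `scale`; which variant runs depends on the input size and on which devices are present, which is the "context" in this simplified setting.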

References


  1. ^ IBM. "Cell Broadband Engine Programming Tutorial". http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/FC857AE550F7EB83872571A80061F788. Retrieved 2009-05-06.[dead link]
  2. ^ John Shalf. "The New Landscape of Parallel Computer Architecture". http://www.iop.org/EJ/article/1742-6596/78/1/012066/jpconf7_78_012066.pdf. Retrieved 2010-02-25.
  3. ^ Michael Gschwind. "The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor". International Journal of Parallel Programming. http://domino.research.ibm.com/library/cyberdig.nsf/papers/1B2480A9DBF5B9538525723D0051A8C1/$File/rc24128.pdf.
  4. ^ Brodtkorb, André Rigland; Christopher Dyken, Trond R. Hagen, Jon M. Hjelmervik, Olaf O. Storaasli (May 2010). "State-of-the-Art in Heterogeneous Computing". Scientific Programming 18: 1–33. http://iospress.metapress.com/content/t2502035x8708hh1/?p=67327ef9301f4e5aa7b8362b4e2d6f3b&pi=0.
  5. ^ "Heterogeneous Processing: a Strategy for Augmenting Moore's Law". Linux Journal. http://www.linuxjournal.com/article/8368. Retrieved 2007-10-03.
  6. ^ "Visions for Application Development on Hybrid Computing Systems". http://rssi.ncsa.uiuc.edu/proceedings/posters/rssi07_12_poster.pdf. Retrieved 2009-02-09.
  7. ^ Brian Flachs, Godfried Goldrian, Peter Hofstee, Jorg-Stephan Vogt. "Bringing Heterogeneous Multiprocessors Into The Mainstream". 2009 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'09). http://www.cs.utk.edu/~dongarra/WEB-PAGES/cscads-libtune-09/talk08-flachs.pdf.
  8. ^ Kunzman, D. M.; Kale, L. V. (2011). "Programming Heterogeneous Systems". 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. p. 2061. doi:10.1109/IPDPS.2011.377. ISBN 978-1-61284-425-1.
  9. ^ Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Andrew Richards, Raymond Namyst, Beverly Bachmayer, Christoph Kessler, David Moloney, Peter Sanders (2012), "The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures", Advances in Parallel Computing, IOS Press 22: 361–368, http://www.booksonline.iospress.nl/Content/View.aspx?piid=30416
  10. ^ John Darlington, Moustafa Ghanem, Yike Guo, Hing Wing To (1996), "Guided Resource Organisation in Heterogeneous Parallel Computing", Journal of High Performance Computing 4 (1): 13–23, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=