AHPCRC Projects

Project 4-1: Stream Programming for High Performance Computing
Principal Investigators: Alex Aiken, William Dally, Pat Hanrahan (Stanford)

[Figure: Sequoia machine model, showing virtual levels of the memory hierarchy. Graphics on this page courtesy Alex Aiken, William Dally, Pat Hanrahan (Stanford University).]

Parallel programming is an intrinsic part of high performance computing (HPC)—codes must be designed to run accurately, reliably, and efficiently on systems with tens to thousands of processors working cooperatively. An HPC programmer must design code that fits the specific characteristics of a given system architecture. Code that works especially well on one architecture may not achieve nearly the same level of performance on a system with a different size or structure. Conversely, programs written to be highly portable may not perform optimally on any system. The Sequoia language seeks to address this problem by allowing programmers to write code that is functionally correct on any system, then tune the performance to the characteristics of a specific system.

High-performance parallel architectures increase performance and efficiency by allowing software to manage a hierarchy of memories. Such systems consist of many processing elements operating in isolation, drawing data only from their own small, fast local memory devices. Data and code move between levels in the hierarchy as asynchronous block transfers explicitly orchestrated by the software. Programmers must build into the software the directives to move data between nodes at adjacent levels of the memory hierarchy. Explicit management of the memory hierarchy gives the programmer direct control over locality, allowing the programmer to improve performance by writing locality-aware programs.  
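The explicit, software-orchestrated block transfers described above can be sketched in plain C++. This is a hedged illustration, not Sequoia code: the two-level hierarchy is simulated with a large "global" array standing in for slow, distant memory and a small fixed-size buffer standing in for a fast local store, with an assumed capacity of 256 words.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed capacity of the small, fast local memory (hypothetical value).
constexpr std::size_t LOCAL_WORDS = 256;

// Process a large array block by block: explicitly stage each block into
// the fast level, compute on it there, then write it back. This mirrors
// the software-managed block transfers the text describes.
void scale_in_place(std::vector<double>& global_mem, double factor) {
    double local_buf[LOCAL_WORDS];
    for (std::size_t base = 0; base < global_mem.size(); base += LOCAL_WORDS) {
        std::size_t n = std::min(LOCAL_WORDS, global_mem.size() - base);
        std::copy_n(global_mem.begin() + base, n, local_buf);       // transfer in
        for (std::size_t i = 0; i < n; ++i) local_buf[i] *= factor; // compute locally
        std::copy_n(local_buf, n, global_mem.begin() + base);       // transfer out
    }
}
```

On a real machine the two copies would be asynchronous DMA or message transfers that can overlap with computation; here they are ordinary copies so the staging pattern itself is visible.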

The Sequoia language places data placement and movement explicitly under the control of the programmer. Machine architecture is represented in the language as abstracted memory hierarchy trees. Self-contained computations called tasks are used as the basic units of computation. Tasks provide for the expression of explicit communication and locality, isolation and parallelism, algorithmic variants, and parameterization. These properties allow Sequoia programs to be portable across machines without sacrificing the ability to tune for performance. Sequoia programmers work with an abstract memory hierarchy, which does not depend on the specific memory sizes, number of compute nodes, or depth of a particular memory hierarchy. This allows a programmer a high degree of control over both the data and the parallel computation without tying a program to a particular machine architecture.

The portions of the program that deal with the higher-level code common to all machines are kept separate from the parts that handle machine-specific mapping and optimization. Programmers can control all details of mapping an algorithm to a specific machine, including defining and selecting values for the tunable parameters. The Stanford team has designed and implemented a Sequoia “autotuner” that automatically searches the space of tunable parameters for the highest-performance combinations, relieving the programmer of specifying all but the most critical tunables. The autotuner has performed as well as or better than human programmers on all programs tested to date.
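The core of an autotuner-style search can be sketched as follows. This is a minimal illustration, not the Stanford implementation: it evaluates each candidate value of one tunable (a block size) and keeps the cheapest, using a made-up analytic cost model where a real tuner would time actual runs of the program.

```cpp
#include <limits>
#include <vector>

// Synthetic stand-in for a measured run time: per-block overhead shrinks
// as blocks grow, while oversized blocks pay a locality penalty.
// The constants are arbitrary, chosen only to give the curve a minimum.
double modeled_cost(int block) {
    return 4096.0 / block + 0.01 * block;
}

// Exhaustively search the candidate values of one tunable parameter and
// return the one with the lowest cost.
int autotune_block_size(const std::vector<int>& candidates) {
    int best = candidates.front();
    double best_cost = std::numeric_limits<double>::infinity();
    for (int b : candidates) {
        double c = modeled_cost(b);  // a real tuner would measure here
        if (c < best_cost) { best_cost = c; best = b; }
    }
    return best;
}
```

A production tuner would search a multi-dimensional space of tunables and could prune it heuristically, but the select-by-measured-cost loop is the essential mechanism.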

Sequoia syntax is an extension of the C++ programming language, with language constructs that make it easier to develop a parallel program that is “aware” of the memory hierarchy configuration in the machine on which it is running. Computations are localized to specific memory locations, and the language mechanisms describe communications among these locations.

A complete Sequoia programming system has been implemented that efficiently runs Sequoia programs on GPUs and distributed-memory clusters. The first version of this system has been released, along with its documentation. Development work continues, and new capabilities and improved performance will be featured in subsequent releases.

Read more about this project...