In Closing
We are seeing a significant evolution in the HPC industry, comparable to the one we witnessed in the 1990s when the “attack of the killer micros” moved us from simple shared-memory vector processors to distributed-memory systems. At that time, application programmers faced a tremendous challenge: introducing some form of message passing to develop programs for non-shared-memory systems. Many accepted the challenge, and today applications are achieving performance gains that were inconceivable on shared-memory vector systems. With the advent of new architectures pursuing the Exascale target, there are new challenges that will require a significant commitment from application programmers to port and optimize their applications for the new systems.
With lower-performance multi-core processors, algorithms must be scaled to systems with millions of cores, and significant work will be necessary to address memory and network latency and bandwidth concerns.
With the advent of viable accelerators, we are essentially seeing a “back to the future” situation in which algorithms must be developed that can utilize large parallel vector systems with memory-access constraints.
One lesson we can derive from HPC history is that there is no “silver bullet”: no software will auto-magically generate efficient code. While software systems will be available to help application developers restructure their programs, the major work will be performed by the programmer. From history we can also revisit parallelization and vectorization techniques that were pioneered on machines such as the CDC STAR, Cray-1, Thinking Machines CM, Intel Paragon, and Cray T3E.
Unfortunately, there is a new generation of programmers in the HPC community who are probably unaware of the large repository of excellent algorithm research conducted on those early machines. This author often used research from the Illiac IV era to develop parallel vector code for the Cray vector systems. The new accelerators will benefit significantly from the work performed for the SIMD Thinking Machines CM5, Cyber 205, NEC SX, Fujitsu VPP, and Cray vector systems. They will also benefit from the work performed on the MIMD shared-memory systems from Cray, NEC, and Fujitsu.