An Analysis of the CPU2K Benchmarks on the Intel Itanium 2 Processor

Wednesday, March 5, 2003 - 17:30
TH 331
Allan D. Knies, Intel Corporation
The Itanium 2 processor is the second processor in the Itanium Processor Family and provides approximately twice the performance of the original Itanium processor. While it is not unusual for a subsequent-generation processor to deliver such a large improvement, it is unusual when both chips are produced on the same process generation and have approximately the same die size. HP has reported that its Itanium 2 server achieves a score of 810 on SPECint_base2000, higher than any 0.18-micron processor on the SPEC website (as of July 9). To help explain how the processor achieves its excellent performance (810 int/1356 FP), we analyze how different compilers use the Itanium architecture features and specific characteristics of the Itanium 2 processor microarchitecture.
In the first part of the talk, we will show how the Intel and HP compilers make use of Itanium architecture features to optimize application performance. We include detailed data regarding instruction set mixes, branch prediction and predication, software pipelining, control and data speculation, and the register stack engine. The analysis provides data from both the Intel and HP Itanium-based compilers and shows that both compilers find instruction-level parallelism of nearly 3 instructions per clock. The results show that predication, speculation, and the register stack combined provide substantial benefits. The results also show that independently developed compiler technology achieves good results and that substantial value can be added based on OS policies and implementation.
In the second part of the talk, we examine the detailed behavior of the Itanium 2 processor's execution resources, fetch bandwidth, and cache hierarchy to explain some of the benefits and trade-offs made during the Itanium 2 development. This analysis provides detailed breakdowns to show where time is being spent, what the limiting factors in performance are, and how the microarchitecture has been optimized for performance across a wide variety of applications (integer, floating-point, security, commercial). All of the data was gathered on real hardware using the Itanium 2's performance monitoring hardware under HP-UX and early versions of Microsoft's 64-bit OS. [Note: this presentation was originally co-authored and presented with James McCormick at HP]

Allan Knies is a Senior Computer Architect with Intel Corporation. He joined the IA-64 architecture team in 1995 to work on the first-generation IA-64 processor, specializing in architecture, compiler technology, and performance analysis.
He and his team are currently responsible for the definition and evolution of the IA-64 application architecture, as well as for investigating related issues in compiler technology and performance evaluation of future IA-64 microprocessors.
He has a B.S. in Mathematics and Computer Science from Ohio University, an M.S. in Computer Science from Purdue University, and a Ph.D. from the School of Electrical and Computer Engineering at Purdue. He is a member of Phi Beta Kappa and Eta Kappa Nu.