The Itanium 2 processor is the second processor in the Itanium
Processor Family and provides approximately twice the performance
of the original Itanium processor. A gain of this size is not
unusual for a next-generation processor, but it is
unusual when both chips are produced
on the same process generation and have approximately the
same die size. HP has reported that their Itanium 2 server
achieves a score of 810 on SPECint_base2000, higher than
any 0.18-micron processor listed on the SPEC website (as of July
9). To help explain how it achieves its excellent performance
(810 int/1356 FP), we analyze how different compilers use
the Itanium architecture features and specific characteristics
of the Itanium 2 processor microarchitecture.

In the first part of the talk, we show how the Intel and HP
compilers make use of Itanium architecture features to optimize
application performance. We include detailed data regarding
instruction set mixes, branch prediction and predication,
software pipelining, control and data speculation, and the
register stack engine. The analysis provides data from both
the Intel and HP Itanium-based compilers and shows that
both compilers find instruction-level parallelism of nearly
3 instructions per clock. The results show that predication,
speculation, and the register stack combined provide substantial
benefits. The results also show that independently developed
compiler technology achieves good results and that substantial
value can be added based on OS policies and implementation.
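As a rough illustration of the instructions-per-clock figure above, the sketch below derives IPC from performance-counter totals. The function names, parameter names, and sample counts are all hypothetical and are not data from the talk. The abstract does not say whether its figure counts no-ops and predicated-off instructions, so both a raw and a "useful" variant are shown.

```python
def ipc(instructions_retired: int, cycles: int) -> float:
    """Raw instructions per clock: every retired instruction counts."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


def useful_ipc(instructions_retired: int, nops_retired: int,
               predicated_off: int, cycles: int) -> float:
    """IPC after discarding no-ops and instructions whose qualifying
    predicate was false (they retire but do no useful work)."""
    useful = instructions_retired - nops_retired - predicated_off
    if useful < 0:
        raise ValueError("sub-counts exceed total retired instructions")
    return ipc(useful, cycles)


if __name__ == "__main__":
    # Hypothetical counter totals, chosen only to make the arithmetic clear.
    sample = {
        "instructions_retired": 3_000_000,
        "nops_retired": 600_000,
        "predicated_off": 300_000,
        "cycles": 1_000_000,
    }
    print("raw IPC:   ", ipc(sample["instructions_retired"], sample["cycles"]))
    print("useful IPC:", useful_ipc(**sample))
```

Reporting both values side by side makes it easier to compare compilers that pad bundles or use predication to different degrees.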

In the second part of the talk, we examine the detailed behavior
of the Itanium 2 processor's execution resources, fetch
bandwidth, and cache hierarchy to explain some of the benefits
and trade-offs made during the Itanium 2 development. This
analysis provides detailed breakdowns to show where time
is being spent, what the limiting factors in performance are,
and how the microarchitecture has been optimized for performance
across a wide variety of applications (integer, floating-point,
security, commercial).

All of the data was gathered on real hardware using the Itanium
2's performance monitoring hardware under HP-UX and early
versions of Microsoft's 64-bit OS. [Note: this presentation
was originally co-authored and presented with James McCormick
at HP]