David Ofelt, Juniper Networks
Performance estimation of computer systems matters to many people in the computer industry. Computer architects need to study future machines, compiler writers need to evaluate compiler output before a machine exists, and developers need insight into a machine's performance in order to tune their code. Performance estimation techniques range from profile-based approaches to full machine simulation. Detailed simulation is the most common method, but it can suffer from long run times when large applications are simulated with detailed processor models. This paper discusses a profile-based performance estimation technique that uses a lightweight instrumentation phase, whose run time is on the order of the number of dynamic instructions, followed by an analysis phase whose run time is roughly on the order of the number of static instructions. The technique accurately predicts the performance of a detailed out-of-order issue processor model while scheduling far fewer instructions than full simulation does. The difference between the predicted execution time and the time obtained from full simulation is only a few percent. The paper illustrates how this approach improves on earlier profile-based analysis methods, especially for more advanced processor pipelines, and how future processor trends will require new approaches.
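The two-phase structure described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy in-order, single-issue cost model, and the latency table, block names, and function names (`profile`, `block_cost`, `estimate_cycles`) are hypothetical. It shows only why the analysis phase scales with static rather than dynamic instruction count: each static basic block is costed once and then weighted by its execution count.

```python
from collections import Counter

# Hypothetical per-opcode latencies in cycles; illustrative values only.
LATENCY = {"add": 1, "mul": 3, "load": 2, "branch": 1}


def profile(trace):
    """Instrumentation phase: one pass over the dynamic trace of executed
    basic blocks, counting how often each block runs."""
    return Counter(trace)


def block_cost(instrs):
    """Toy static cost of one basic block: the sum of its opcode latencies.
    Assumes in-order, single-issue execution with no overlap."""
    return sum(LATENCY[op] for op in instrs)


def estimate_cycles(blocks, counts):
    """Analysis phase: cost each static block once, weight by its count."""
    return sum(block_cost(blocks[name]) * n for name, n in counts.items())


# A hypothetical program: three static blocks, one of them a hot loop body.
blocks = {
    "entry": ["load", "add"],
    "loop": ["load", "mul", "add", "branch"],
    "exit": ["add"],
}
trace = ["entry"] + ["loop"] * 100 + ["exit"]

counts = profile(trace)
estimated = estimate_cycles(blocks, counts)

# Reference: "simulate" by costing every dynamic block execution in turn.
simulated = sum(block_cost(blocks[name]) for name in trace)
```

In this toy model the estimate matches the per-execution total exactly while costing only three blocks instead of 102 block executions. A real out-of-order processor overlaps instructions within and across blocks, so a per-block cost in isolation is no longer exact; the sketch does not attempt to model that.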