Efficient Performance Estimation for Advanced Microprocessors
Performance estimation of computer systems is an important topic to a large number of people in the computer industry. Computer architects need to study future machines, compiler writers need to evaluate compiler output before a machine exists, and developers need insight into a machine's performance in order to tune their code. Performance estimation techniques range from profile-based approaches to full machine simulation. Detailed simulation is the most common method for estimating performance; it suffers, however, from potentially long run times when simulating large applications using detailed processor models. This paper discusses a profile-based performance estimation technique that uses a lightweight instrumentation phase running in time proportional to the number of dynamic instructions, followed by an analysis phase running in time roughly proportional to the number of static instructions. This technique accurately predicts the performance of a detailed out-of-order issue processor model while scheduling far fewer instructions than full simulation does; the difference between the predicted execution time and the time obtained from full simulation is only a few percent. The paper illustrates how this approach improves on earlier profile-based analysis methods, especially for more advanced processor pipelines, and shows how future processor trends will require new approaches.
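The two-phase structure described above can be sketched in miniature. This is a hypothetical illustration, not the paper's actual method: the instrumentation phase here simply counts basic-block executions (work proportional to the dynamic instruction stream), and the analysis phase visits each static block once, weighting an assumed per-block cycle cost by its profile count. All names and costs are invented for the example.

```python
def instrument(trace):
    """Profile phase: count executions of each basic block.
    Work is proportional to the number of dynamic block executions."""
    counts = {}
    for block_id in trace:  # one counter bump per dynamic execution
        counts[block_id] = counts.get(block_id, 0) + 1
    return counts

def analyze(block_costs, counts):
    """Analysis phase: estimate total cycles from per-block costs.
    Each static block is examined once, so work is proportional to
    the number of static blocks, not the dynamic instruction count."""
    return sum(block_costs[b] * counts.get(b, 0) for b in block_costs)

# Toy program: three static blocks, with a loop body executed 1000 times.
trace = ["entry"] + ["loop"] * 1000 + ["exit"]
block_costs = {"entry": 5, "loop": 3, "exit": 4}  # assumed cycles per block

counts = instrument(trace)
estimated_cycles = analyze(block_costs, counts)  # 5 + 3*1000 + 4 = 3009
```

A real implementation would of course need a per-block cost model that captures out-of-order issue effects rather than a fixed cycle count, which is where the difficulty discussed in the paper lies.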
Dave did his undergraduate work in EE and Physics at the University of Minnesota and received his Ph.D. in EE from Stanford. While at Stanford, he helped finish the DASH machine by doing a lot of the grungy bits and then helped architect, design, and build the FLASH multiprocessor. He balanced his academic pursuits by consulting at several companies, including SGI and Transmeta. After finally leaving Stanford, he went to work for Layer5, subsequently purchased by Juniper Networks, where he now works full time.