We present metrics for computing differences between individual
benchmarks in a benchmark suite. The minimum difference
is 0 (for identical benchmarks) and the maximum difference
is 1 (or 100%, for the most different benchmarks). Individual
benchmarks can be geometrically interpreted as points in
a "program space". Redundant benchmarks are visualized as
clusters of close points.
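As a minimal sketch of a difference metric with these bounds, assume each benchmark is described by a non-negative feature vector (for example, normalized run times on a fixed set of machines). The vector representation, the function names, and the use of one minus cosine similarity are illustrative assumptions, not the metrics defined in this work; the sketch only shows a measure that is 0 for identical benchmarks and at most 1 for the most different ones.

```python
import math


def difference(a, b):
    """Illustrative difference in [0, 1] between two benchmarks.

    Assumption: each benchmark is a non-negative, non-zero feature vector.
    For such vectors cosine similarity lies in [0, 1], so 1 - cosine does too.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if any(x < 0 for x in a) or any(x < 0 for x in b):
        raise ValueError("feature vectors must be non-negative")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against floating-point rounding outside [0, 1].
    cosine = max(0.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - cosine


def difference_matrix(vectors):
    """Symmetric matrix of pairwise differences between benchmarks."""
    n = len(vectors)
    return [[difference(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

Under this assumed measure, benchmarks whose feature vectors point in the same direction have a difference near 0 and appear as close points in the program space, while benchmarks with no features in common have a difference of 1.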
Using quantitative differences between individual benchmarks and
cluster analysis techniques, it is possible to develop efficient
and practical quantitative methods for evaluation and design
of benchmark suites. The main design goals are to eliminate
excessive redundancy and to achieve a desired distribution
of benchmarks in the program space. In particular, we propose
the design of universal benchmark suites which have the
following properties: (1) uniform and low redundancy between
component workloads, (2) maximum size, and (3) uniform distribution
of workloads within the program space.
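The redundancy-elimination step can be sketched as follows, assuming a precomputed symmetric matrix of pairwise differences in [0, 1] and a hypothetical redundancy threshold. The single-linkage grouping used here is a simple stand-in for whichever cluster analysis technique is actually applied, and the function names are illustrative.

```python
def redundancy_clusters(names, diff, threshold):
    """Group benchmarks whose pairwise difference is below `threshold`.

    Assumption: `diff` is a symmetric matrix of differences in [0, 1].
    Benchmarks are linked transitively (single linkage) with union-find.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if diff[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the lower index as root so ordering is stable.
                    parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


def select_representatives(names, diff, threshold):
    """Keep one benchmark (the first listed) from each redundancy cluster."""
    return [cluster[0] for cluster in redundancy_clusters(names, diff, threshold)]


if __name__ == "__main__":
    names = ["a", "b", "c", "d"]  # hypothetical benchmark names
    diff = [
        [0.00, 0.05, 0.60, 0.70],
        [0.05, 0.00, 0.55, 0.65],
        [0.60, 0.55, 0.00, 0.10],
        [0.70, 0.65, 0.10, 0.00],
    ]
    print(redundancy_clusters(names, diff, 0.2))
    print(select_representatives(names, diff, 0.2))
```

With the made-up matrix above and a threshold of 0.2, benchmarks a and b fall into one cluster and c and d into another, so one benchmark from each pair could be dropped as redundant. The choice of threshold and of which cluster member to keep are design decisions this sketch does not address.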
The proposed approach to standard performance evaluation has
the following main advantages:
(1) For each benchmark suite it is possible to provide a scientific
proof of the validity of selecting component benchmarks.
(2) The number of component benchmarks can be minimized.
(3) Standard performance indicators can be customized and
made workload-sensitive.
(4) The process of updating benchmark suites can be strictly
controlled and made less frequent, faster, and less expensive.
These features can increase the credibility and versatility of
standard industrial benchmarks and significantly reduce
the cost of benchmarking.