SPEC CPU95

Summary

This document provides an overview of SPEC CPU95: what it is and why it exists. By providing this background, SPEC hopes to help users set their expectations and usage appropriately and get the most efficient and beneficial use out of this benchmark product.

Overall, SPEC designed SPEC CPU95 to provide a comparative measure of compute intensive performance across the widest possible range of hardware. This resulted in source code benchmarks developed from real user applications. These benchmarks depend on the processor, memory and compiler of the tested system.

SPEC BACKGROUND

What is SPEC?

SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organisation composed of computer vendors, systems integrators, universities, research organisations, publishers and consultants whose goal is to establish, maintain and endorse a standardised set of relevant benchmarks for computer systems. And while no one set of tests can fully characterise overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.

What is a benchmark?

The definition from Webster's Dictionary states that a benchmark is "A standard of measurement or evaluation." A computer benchmark is typically a computer program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed. Computer benchmark metrics usually measure speed (how fast was the workload completed) or throughput (how many workloads per unit time were measured). Running the same computer benchmark on multiple computers allows a comparison to be made.
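
To make the terms workload and metric concrete, here is a minimal Python sketch of a toy benchmark. It is only an illustration and has nothing to do with the SPEC tools; the workload (a sum of squares) and the names workload and run_benchmark are invented for this example. It reports a speed figure and a throughput figure for the same fixed workload. Because the runs happen one after another, the throughput here is simply the inverse of the speed figure.

```python
import time


def workload(n=200_000):
    """A fixed, strictly defined set of operations: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_benchmark(task, repetitions=3):
    """Run the workload several times and return two simple metrics."""
    start = time.perf_counter()
    for _ in range(repetitions):
        task()
    elapsed = time.perf_counter() - start
    return {
        "seconds_per_run": elapsed / repetitions,  # speed: how fast one workload completes
        "runs_per_second": repetitions / elapsed,  # throughput: workloads per unit time
    }


result = run_benchmark(workload)
print(result)
```

Running the same script on two computers and comparing the printed numbers is the simplest form of the comparison described above.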

Why use a benchmark?

Typically, the best comparison test for systems is your own application with your own workload. Unfortunately, it is often very difficult to get a wide base of reliable, repeatable and comparable measurements for comparisons of different systems on your own application with your own workload. This might be due to time, money or other constraints. What options are viable in this case?

At this point, you can consider using standardised benchmarks as a reference point. Ideally, a standardised benchmark will be portable and may already have been run on the platforms that you are interested in. However, before you consider the results, you need to be sure that you understand the correlation between your application/computing needs and what the benchmark is measuring. Are the workloads similar, and do they have the same characteristics? Based on your answers to these questions, you can begin to see how the benchmark may approximate your reality.

What does SPEC CPU95 measure?

SPEC CPU95 focuses on compute intensive performance which means these benchmarks emphasise the performance of the computer's processor, the memory architecture and the compiler. It is important to remember the contribution of the latter two components; performance is more than just the processor.

Also, SPEC CPU95 is made up of two subcomponents that focus on two different types of compute intensive performance:

CINT95 for measuring and comparing compute-intensive integer performance, and
CFP95 for measuring and comparing compute-intensive floating point performance.

Note that SPEC CPU95 does not stress other computer components such as I/O (disk drives), networking, operating system or graphics. It might be possible to configure a system in such a way that one or more of these components impact the performance of CINT95 and CFP95, but that is not the intent of the suites.

Why use SPEC CPU95?

As mentioned above, SPEC CPU95 provides a comparative measure of integer and/or floating point compute intensive performance. If this matches the type of workloads you are interested in, SPEC CPU95 provides a good reference point.

Other advantages to using SPEC CPU95:

Benchmark programs are developed from actual end-user applications as opposed to being synthetic benchmarks.
Supported by the industry.
SPEC CPU95 is highly portable.
A wide range of results are available (contact SPEC for information on obtaining results).
The benchmarks are required to be run and reported according to a set of rules to ensure comparability and repeatability.

Note: It is not intended that the SPEC benchmark suites be used as a replacement for the benchmarking of actual customer applications to determine vendor or product selection.

What exactly makes up the SPEC CPU95 suites?

CINT95 and CFP95 are based on compute-intensive applications provided as source code. CINT95 contains eight applications written in C that are used as benchmarks:

Name	Ref Time	Remarks
099.go	4600	Artificial intelligence; plays the game of "Go"
124.m88ksim	1900	Motorola 88K chip simulator; runs test program
126.gcc	1700	New version of GCC; builds SPARC code
129.compress	1800	Compresses and decompresses file in memory
130.li	1900	LISP interpreter
132.ijpeg	2400	Graphic compression and decompression
134.perl	1900	Manipulates strings (anagrams) and prime numbers in Perl
147.vortex	2700	A database program

CFP95 contains 10 applications written in FORTRAN that are used as benchmarks:

Name	Ref Time	Remarks
101.tomcatv	3700	A mesh-generation program
102.swim	8600	Shallow water model with 1024 x 1024 grid
103.su2cor	1400	Quantum physics; Monte Carlo simulation
104.hydro2d	2400	Astrophysics; Hydrodynamical Navier Stokes equations
107.mgrid	2500	Multi-grid solver in 3D potential field
110.applu	2200	Parabolic/elliptic partial differential equations
125.turb3d	4100	Simulates isotropic, homogeneous turbulence in a cube
141.apsi	2100	Solves problems of temperature, wind, velocity and distribution of pollutants
145.fpppp	9600	Quantum chemistry
146.wave5	3000	Plasma physics; Electromagnetic particle simulation

Some of the benchmark names sound familiar; are these comparable to other programs?

Many of the SPEC benchmarks have been derived from publicly available application programs and all have been developed to be portable to as many current and future hardware platforms as possible. Hardware dependencies have been minimised to avoid unfairly favouring one hardware platform over another. For this reason, the application programs in this distribution should not be used to assess the probable performance of commercially available, tuned versions of the same application. The individual benchmarks in this suite may be similar, but NOT identical to benchmarks or programs with the same name which are available from sources other than SPEC; therefore, it is not valid to compare SPEC CPU95 benchmark results with anything other than other SPEC CPU95 benchmark results. (Note: This also means that it is not valid to compare SPEC CPU95 results to older SPEC CPU benchmarks; these benchmarks have been changed and should be considered different and not comparable.)

SPEC METRICS

What metrics can be measured?

The CINT95 and CFP95 suites can be used to measure and calculate the following metrics:

CINT95 (for integer compute intensive performance comparisons):

SPECint95: The geometric mean of eight normalized ratios (one for each integer benchmark) when compiled with aggressive optimization for each benchmark.
SPECint_base95: The geometric mean of eight normalized ratios when compiled with conservative optimization for each benchmark.
SPECint_rate95: The geometric mean of eight normalized throughput ratios when compiled with aggressive optimization for each benchmark.
SPECint_rate_base95: The geometric mean of eight normalized throughput ratios when compiled with conservative optimization for each benchmark.

CFP95 (for floating point compute intensive performance comparisons):

SPECfp95: The geometric mean of ten normalized ratios (one for each floating point benchmark) when compiled with aggressive optimization for each benchmark.
SPECfp_base95: The geometric mean of ten normalized ratios when compiled with conservative optimization for each benchmark.
SPECfp_rate95: The geometric mean of ten normalized throughput ratios when compiled with aggressive optimization for each benchmark.
SPECfp_rate_base95: The geometric mean of ten normalized throughput ratios when compiled with conservative optimization for each benchmark.

The ratio for each of the benchmarks is calculated using a SPEC-determined reference time and the run time of the benchmark.
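
As a rough illustration of this calculation, the Python sketch below takes each ratio as the reference time divided by the measured run time and then forms the geometric mean of the eight CINT95 ratios. The reference times are copied from the CINT95 table in this document and are assumed to be in seconds. The run times are invented for a hypothetical system, and the function names are our own, not part of any SPEC tool.

```python
import math

# Reference times from the CINT95 table in this document (assumed to be seconds).
CINT95_REF_TIMES = {
    "099.go": 4600,
    "124.m88ksim": 1900,
    "126.gcc": 1700,
    "129.compress": 1800,
    "130.li": 1900,
    "132.ijpeg": 2400,
    "134.perl": 1900,
    "147.vortex": 2700,
}


def spec_ratio(ref_time, run_time):
    """Normalized ratio for one benchmark: reference time over run time."""
    return ref_time / run_time


def geometric_mean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def suite_metric(ref_times, run_times):
    """Geometric mean of the per-benchmark ratios."""
    ratios = [spec_ratio(ref_times[name], run_times[name]) for name in ref_times]
    return geometric_mean(ratios)


# Hypothetical system: every benchmark finishes in a quarter of its reference time.
hypothetical_run_times = {name: ref / 4 for name, ref in CINT95_REF_TIMES.items()}
metric = suite_metric(CINT95_REF_TIMES, hypothetical_run_times)
print(round(metric, 2))
```

The geometric mean is used rather than the arithmetic mean so that each benchmark's ratio carries equal relative weight; a system that is uniformly four times faster than the reference on every benchmark scores 4 regardless of how long each benchmark runs.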

What is the difference between a "base" metric and a "non-base" metric?

In order to provide comparisons across different computer hardware, SPEC had to provide the benchmarks as source code. Thus, in order to run the benchmarks, they must be compiled. There was agreement that the benchmarks should be compiled the way users compile programs. But how do users compile programs?

On one side, people might experiment with many different compilers and compiler flags to achieve the best performance. On the other side, people might just compile with the basic options suggested by the compiler vendor. SPEC recognises that it cannot exactly match how everyone uses compilers, but two reference points are possible:

The base metrics (e.g., SPECint_base95) are required for all reported results and have set guidelines for compilation (e.g., the same flags must be used in the same order for all benchmarks). This is the point closest to those who simply use the recommended compiler flags for compilation.
The non-base metrics (e.g., SPECint95) are optional and have less strict requirements (e.g., different compiler options may be used on each benchmark). This is the point closest to those who may experiment with different compiler options to get the best possible performance.

Note that the base metric rules are a subset of the non-base metric rules. For example, a legal base metric is also legal under the non-base rules but a legal non-base metric is NOT legal under the base rules.
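
The one base guideline quoted above (the same flags, in the same order, for all benchmarks) can be sketched as a simple check in Python. This is only an illustration: the actual Run and Reporting Rules contain further requirements that are not modelled here, the function name is invented, and the flag strings are placeholders rather than recommended settings.

```python
def uses_same_flags_everywhere(flags_by_benchmark):
    """True if every benchmark is compiled with an identical, identically ordered flag list."""
    flag_lists = list(flags_by_benchmark.values())
    return all(flags == flag_lists[0] for flags in flag_lists)


# Placeholder configurations for illustration only.
base_style = {
    "099.go": ["-O2"],
    "126.gcc": ["-O2"],
    "130.li": ["-O2"],
}
tuned_style = {
    "099.go": ["-O3", "-funroll-loops"],
    "126.gcc": ["-O2"],
    "130.li": ["-O3"],
}

print(uses_same_flags_everywhere(base_style))   # one flag list shared by all benchmarks
print(uses_same_flags_everywhere(tuned_style))  # per-benchmark tuning
```

A configuration that passes this check could still be reported under the non-base rules, whereas the per-benchmark tuning in the second configuration would only be acceptable for a non-base metric.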

A full description of the distinctions and required guidelines can be found in the SPEC CPU95 Run and Reporting Rules available with SPEC CPU95.

What is the difference between a "rate" and a "non-rate" metric?

There are several different ways to measure computer performance. One way is to measure how fast the computer completes a single task; this is a speed measure. Another way is to measure how many tasks a computer can accomplish in a certain amount of time; this is called a throughput, capacity or rate measure.

The SPEC speed metrics (e.g., SPECint95) are used for comparing the ability of a computer to complete single tasks.
The SPEC rate metrics (e.g., SPECint_rate95) measure the throughput or rate of a machine carrying out a number of tasks.
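
The distinction can be illustrated with a small Python sketch. It does not reproduce how SPEC normalizes its rate metrics; it only contrasts the time to finish one task with the number of tasks finished per hour. The two machines, their timings, and the assumption that concurrent copies do not slow each other down are all hypothetical.

```python
def seconds_per_task(task_seconds):
    """Speed view: how long a single task takes (lower is better)."""
    return task_seconds


def tasks_per_hour(task_seconds, concurrent_copies):
    """Throughput view: tasks finished per hour, assuming copies run independently."""
    return concurrent_copies * 3600 / task_seconds


# Hypothetical machine A: one processor, 100 seconds per task.
# Hypothetical machine B: four processors, 150 seconds per task on each.
speed_a = seconds_per_task(100)
speed_b = seconds_per_task(150)
rate_a = tasks_per_hour(100, concurrent_copies=1)
rate_b = tasks_per_hour(150, concurrent_copies=4)

print(speed_a, speed_b)  # A finishes a single task sooner
print(rate_a, rate_b)    # B finishes more tasks per hour
```

In this made-up case machine A wins on the speed measure while machine B wins on the throughput measure, which is why the two kinds of metric are reported separately.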

Which SPEC CPU95 metric should be used to compare performance?

It depends on your needs. SPEC provides the benchmarks and results as tools for you to use. You need to determine how you use a computer or what your performance requirements are and then choose the appropriate SPEC benchmark or metrics.

A single user running a compute-intensive integer program, for example, might only be interested in SPECint95 or SPECint_base95. On the other hand, a person who maintains a machine used by multiple scientists running floating point simulations might be more concerned with SPECfp_rate95 or SPECfp_rate_base95.