SPEC CPU95

Summary

This document provides an overview of SPEC CPU95: what it is and why it exists. By providing this background, SPEC hopes to help users set their expectations appropriately and get the most efficient and beneficial use out of this benchmark product.

SPEC designed SPEC CPU95 to provide a comparative measure of compute-intensive performance across the widest possible range of hardware. This resulted in source code benchmarks developed from real user applications. The results of these benchmarks depend on the processor, memory and compiler of the tested system.

 

SPEC BACKGROUND

What is SPEC?

SPEC is an acronym for the Standard Performance Evaluation Corporation. SPEC is a non-profit organisation composed of computer vendors, systems integrators, universities, research organisations, publishers and consultants whose goal is to establish, maintain and endorse a standardised set of relevant benchmarks for computer systems. While no one set of tests can fully characterise overall system performance, SPEC believes that the user community will benefit from an objective series of tests which can serve as a common reference point.

What is a benchmark?

Webster's Dictionary defines a benchmark as "A standard of measurement or evaluation." A computer benchmark is typically a computer program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed. Computer benchmark metrics usually measure speed (how fast the workload was completed) or throughput (how many workloads were completed per unit of time). Running the same computer benchmark on multiple computers allows a comparison to be made.
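As an illustration of the terms above, the following minimal Python sketch times a small, fixed workload and reports both a speed metric and a throughput metric. The function names, the workload and the repetition count are hypothetical and chosen only for this example; this is not how the SPEC tools are implemented.

```python
import time


def run_benchmark(workload, repetitions=3):
    """Run a fixed workload several times and report two metrics."""
    start = time.perf_counter()
    for _ in range(repetitions):
        workload()
    elapsed = time.perf_counter() - start
    return {
        # Speed: how long one run of the workload takes (lower is better).
        "seconds_per_run": elapsed / repetitions,
        # Throughput: how many runs complete per second (higher is better).
        "runs_per_second": repetitions / elapsed,
    }


def sum_of_squares():
    # Hypothetical workload: a strictly defined, repeatable set of operations.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(run_benchmark(sum_of_squares))
```

Running the same script on two machines gives two sets of numbers that can be compared directly, which is the basic idea behind any benchmark.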

Why use a benchmark?

Typically, the best comparison test for systems is your own application with your own workload. Unfortunately, it is often very difficult to get a wide base of reliable, repeatable and comparable measurements for comparisons of different systems on your own application with your own workload. This might be due to time, money or other constraints. What options are viable in this case?

At this point, you can consider using standardised benchmarks as a reference point. Ideally, a standardised benchmark will be portable and may already run on the platforms that you are interested in. However, before you consider the results, you need to be sure that you understand the correlation between your application or computing needs and what the benchmark is measuring. Are the workloads similar, and do they have the same characteristics? Based on your answers to these questions, you can begin to see how closely the benchmark may approximate your reality.

What does SPEC CPU95 measure?

SPEC CPU95 focuses on compute-intensive performance, which means these benchmarks emphasise the performance of the computer's processor, the memory architecture and the compiler. It is important to remember the contribution of the latter two components; performance is more than just the processor.

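SPEC CPU95 is made up of two subcomponents that focus on two different types of compute-intensive performance:

CINT95, for measuring and comparing compute-intensive integer performance.

CFP95, for measuring and comparing compute-intensive floating point performance.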

Note that SPEC CPU95 does not stress other computer components such as I/O (disk drives), networking, operating system or graphics. It might be possible to configure a system in such a way that one or more of these components impact the performance of CINT95 and CFP95, but that is not the intent of the suites.

Why use SPEC CPU95?

As mentioned above, SPEC CPU95 provides a comparative measure of integer and/or floating point compute-intensive performance. If this matches the type of workload you are interested in, SPEC CPU95 provides a good reference point.

Other advantages to using SPEC CPU95:

Benchmark programs are developed from actual end-user applications, as opposed to being synthetic benchmarks.

Note: It is not intended that the SPEC benchmark suites be used as a replacement for the benchmarking of actual customer applications to determine vendor or product selection.

 

What exactly makes up the SPEC CPU95 suites?

CINT95 and CFP95 are based on compute-intensive applications provided as source code. CINT95 contains eight applications written in C that are used as benchmarks:

Name          Ref Time (sec)  Remarks
099.go        4600            Artificial intelligence; plays the game of "Go"
124.m88ksim   1900            Motorola 88K chip simulator; runs test program
126.gcc       1700            New version of GCC; builds SPARC code
129.compress  1800            Compresses and decompresses file in memory
130.li        1900            LISP interpreter
132.ijpeg     2400            Graphic compression and decompression
134.perl      1900            Manipulates strings (anagrams) and prime numbers in Perl
147.vortex    2700            A database program

CFP95 contains 10 applications written in FORTRAN that are used as benchmarks:

Name          Ref Time (sec)  Remarks
101.tomcatv   3700            A mesh-generation program
102.swim      8600            Shallow water model with 1024 x 1024 grid
103.su2cor    1400            Quantum physics; Monte Carlo simulation
104.hydro2d   2400            Astrophysics; Hydrodynamical Navier Stokes equations
107.mgrid     2500            Multi-grid solver in 3D potential field
110.applu     2200            Parabolic/elliptic partial differential equations
125.turb3d    4100            Simulates isotropic, homogeneous turbulence in a cube
141.apsi      2100            Solves problems of temperature, wind, velocity and distribution of pollutants
145.fpppp     9600            Quantum chemistry
146.wave5     3000            Plasma physics; Electromagnetic particle simulation

Some of the benchmark names sound familiar; are these comparable to other programs?

Many of the SPEC benchmarks have been derived from publicly available application programs and all have been developed to be portable to as many current and future hardware platforms as possible. Hardware dependencies have been minimised to avoid unfairly favouring one hardware platform over another. For this reason, the application programs in this distribution should not be used to assess the probable performance of commercially available, tuned versions of the same application. The individual benchmarks in this suite may be similar, but NOT identical to benchmarks or programs with the same name which are available from sources other than SPEC; therefore, it is not valid to compare SPEC CPU95 benchmark results with anything other than other SPEC CPU95 benchmark results. (Note: This also means that it is not valid to compare SPEC CPU95 results to older SPEC CPU benchmarks; these benchmarks have been changed and should be considered different and not comparable.)

 

SPEC METRICS

What metrics can be measured?

The CINT95 and CFP95 suites can be used to measure and calculate the following metrics:

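CINT95 (for integer compute-intensive performance comparisons):

SPECint95: the geometric mean of eight normalised ratios (one for each integer benchmark), compiled under the non-base rules.
SPECint_base95: the geometric mean of eight normalised ratios, compiled under the base rules.
SPECint_rate95: the geometric mean of eight normalised throughput ratios, compiled under the non-base rules.
SPECint_rate_base95: the geometric mean of eight normalised throughput ratios, compiled under the base rules.

CFP95 (for floating point compute-intensive performance comparisons):

SPECfp95: the geometric mean of ten normalised ratios (one for each floating point benchmark), compiled under the non-base rules.
SPECfp_base95: the geometric mean of ten normalised ratios, compiled under the base rules.
SPECfp_rate95: the geometric mean of ten normalised throughput ratios, compiled under the non-base rules.
SPECfp_rate_base95: the geometric mean of ten normalised throughput ratios, compiled under the base rules.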

The ratio for each of the benchmarks is calculated using a SPEC-determined reference time and the run time of the benchmark.
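The calculation can be sketched in a few lines of Python. The sketch assumes that each ratio is the reference time divided by the measured run time and that a suite's composite figure is the geometric mean of its ratios. The reference times are taken from the CINT95 table in this document; the run times are invented for illustration, and the function names are hypothetical rather than part of any SPEC tool.

```python
import math

# Reference times in seconds, copied from the CINT95 table in this document.
CINT95_REF_TIMES = {
    "099.go": 4600,
    "124.m88ksim": 1900,
    "126.gcc": 1700,
    "129.compress": 1800,
    "130.li": 1900,
    "132.ijpeg": 2400,
    "134.perl": 1900,
    "147.vortex": 2700,
}


def spec_ratio(ref_time, run_time):
    """Ratio for one benchmark: reference time over measured run time."""
    return ref_time / run_time


def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite(ref_times, run_times):
    """Composite figure for a suite: geometric mean of the per-benchmark ratios."""
    return geometric_mean(
        spec_ratio(ref_times[name], run_times[name]) for name in ref_times
    )


# Invented run times: a machine that finishes every benchmark in half
# its reference time, so every ratio is 2.
example_run_times = {name: ref / 2 for name, ref in CINT95_REF_TIMES.items()}
print(round(composite(CINT95_REF_TIMES, example_run_times), 2))  # 2.0
```

A ratio above 1 means the tested system completed the benchmark faster than the reference time. The geometric mean is used so that no single benchmark with a large ratio dominates the composite figure.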

What is the difference between a "base" metric and a "non-base" metric?

In order to provide comparisons across different computer hardware, SPEC had to provide the benchmarks as source code. Thus, in order to run the benchmarks, they must be compiled. There was agreement that the benchmarks should be compiled the way users compile programs. But how do users compile programs?

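On one side, people might experiment with many different compilers and compiler flags to achieve the best performance. On the other side, people might just compile with the basic options suggested by the compiler vendor. SPEC recognises that it cannot exactly match how everyone uses compilers, but two reference points are possible:

The base metrics (e.g. SPECint_base95), which follow stricter compilation guidelines and reflect the more conservative approach.

The non-base metrics (e.g. SPECint95), which follow less strict guidelines and allow more aggressive, per-benchmark optimisation.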

Note that the base metric rules are a subset of the non-base metric rules. For example, a legal base metric is also legal under the non-base rules but a legal non-base metric is NOT legal under the base rules.

A full description of the distinctions and required guidelines can be found in the SPEC CPU95 Run and Reporting Rules available with SPEC CPU95.

What is the difference between a "rate" and a "non-rate" metric?

There are several different ways to measure computer performance. One way is to measure how fast the computer completes a single task; this is a speed measure. Another way is to measure how many tasks a computer can accomplish in a certain amount of time; this is called a throughput, capacity or rate measure.
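The difference between the two views can be shown with a small Python sketch. The two machines and all of their numbers are hypothetical, and the throughput formula is a simplification for illustration only; it is not the formula SPEC uses to compute its rate metrics.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    seconds_per_task: float  # time to finish one task on one processor
    concurrent_tasks: int    # tasks it can run at once without slowing down

    def speed(self):
        """Speed view: elapsed seconds for a single task (lower is better)."""
        return self.seconds_per_task

    def throughput_per_hour(self):
        """Rate view: tasks completed per hour (higher is better)."""
        return self.concurrent_tasks * 3600 / self.seconds_per_task


# Hypothetical systems: A has one fast processor, B has four slower ones.
a = Machine("A", seconds_per_task=100.0, concurrent_tasks=1)
b = Machine("B", seconds_per_task=150.0, concurrent_tasks=4)

for m in (a, b):
    print(m.name, m.speed(), m.throughput_per_hour())
# A 100.0 36.0
# B 150.0 96.0
```

Machine A wins on speed because it finishes a single task sooner, while machine B wins on throughput because it completes more tasks per hour. Which machine is "faster" therefore depends on which measure matches the intended use.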

Which SPEC CPU95 metric should be used to compare performance?

It depends on your needs. SPEC provides the benchmarks and results as tools for you to use. You need to determine how you use a computer or what your performance requirements are and then choose the appropriate SPEC benchmark or metrics.

A single user running a compute-intensive integer program, for example, might only be interested in SPECint95 or SPECint_base95. On the other hand, a person who maintains a machine used by multiple scientists running floating point simulations might be more concerned with SPECfp_rate95 or SPECfp_rate_base95.