Now, I am happy

Several sources [here and here] have discussed the performance benefits of an optimized BLAS/LAPACK. Since I use R for some heavy-duty matrix calculations, I wanted to try it myself.

I downloaded the Intel MKL BLAS/LAPACK libraries and installed them. This went very smoothly.

I then tried to follow Michael’s suggestions to compile R 2.11.1 (his suggestions are actually for 2.10). It compiled fine, but I could not get R to use more than one core, no matter how hard I tried. To get a multithreaded version of R I had to disable compilation of R as a shared library. It is probably better not to compile R as a shared library anyway (see here for details).
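One way to see whether a build really uses more than one core is to time the same matrix operation with different thread counts. The sketch below is only an illustration: run_with_threads is a hypothetical helper name, and it assumes the BLAS honours the MKL_NUM_THREADS and OMP_NUM_THREADS environment variables, which MKL normally does.

```shell
# Run a command with a fixed number of BLAS/OpenMP threads.
# run_with_threads is a hypothetical helper, not part of R or MKL.
run_with_threads() {
    threads="$1"
    shift
    # env sets both variables only for the command that follows
    env MKL_NUM_THREADS="$threads" OMP_NUM_THREADS="$threads" "$@"
}

# Possible usage from the R build directory (assumes ./bin/R exists):
#   time run_with_threads 1 ./bin/R --vanilla --slave -e 'a <- matrix(rnorm(4e6), 2000); invisible(crossprod(a))'
#   time run_with_threads 4 ./bin/R --vanilla --slave -e 'a <- matrix(rnorm(4e6), 2000); invisible(crossprod(a))'
```

If the wall-clock time barely changes between the two runs, R is most likely still running single-threaded.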

This is my configuration (including compiler flags and commands):

$ gcc -v
gcc version 4.4.0 20090514 (Red Hat 4.4.0-6) (GCC)
export FFLAGS="-march=core2 -O3"
export CFLAGS="-march=core2 -O3"
export CXXFLAGS="-march=core2 -O3"
export FCFLAGS="-march=core2 -O3"
MKL_LIB_PATH=/opt/intel/mkl/10.2.5.035/lib/em64t
export LD_LIBRARY_PATH=$MKL_LIB_PATH
export LDFLAGS="-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1"
MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread"
./configure --with-blas="$MKL" --with-lapack
make
make check
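Before running ./configure it can save some time to confirm that every library named in the MKL link line is actually present. The following is a minimal sketch: check_mkl_libs is a hypothetical helper, and it assumes that all libraries sit in one directory, as the single -L in the link line suggests.

```shell
# check_mkl_libs DIR NAME...
# Reports every NAME for which neither libNAME.so nor libNAME.a exists in DIR.
# Returns non-zero if at least one library is missing.
# (Hypothetical helper; not part of MKL or the R build system.)
check_mkl_libs() {
    dir="$1"
    shift
    missing=0
    for name in "$@"; do
        if [ ! -e "$dir/lib$name.so" ] && [ ! -e "$dir/lib$name.a" ]; then
            echo "missing: lib$name in $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the names from the link line above:
#   check_mkl_libs "$MKL_LIB_PATH" mkl_gf_lp64 mkl_gnu_thread mkl_lapack mkl_core iomp5
```

A missing library found here is easier to diagnose than a failed BLAS test buried in the configure output.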

Benchmark

The simple R-benchmark-25.R test script is a quick-running survey of general R performance. This community-developed test consists of three sets of five small benchmarks each, labelled in the output as Matrix calculation, Matrix functions, and Programmation (program control). Each test is run three times, and each set is summarized by a trimmed geometric mean.

See Simon Urbanek’s notes about benchmarking and a reference benchmark.

Below are the results of my new R with MKL:


R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.517999999999953
2400x2400 normal distributed random matrix ^1000____ (sec): 0.68966666666673
Sorting of 7,000,000 random values__________________ (sec): 1.0783333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.605999999999995
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.260333333333392
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.600455102863175

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.541333333333303
Eigenvalues of a 640x640 random matrix______________ (sec): 0.892333333333378
Determinant of a 2500x2500 random matrix____________ (sec): 0.28133333333335
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.231333333333358
Inverse of a 1600x1600 random matrix________________ (sec): 0.332666666666645
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.370025581291834

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.01633333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.431666666666653
Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.68333333333336
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.709333333333348
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.58400000000006
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.749491068589986

Total time for all 15 tests_________________________ (sec): 9.85600000000015
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.55016494784761
--- End of test ---
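The summary lines can be checked by hand. Each "trimmed geom. mean" drops the fastest and slowest of the five tests and takes the geometric mean of the remaining three. The "Overall mean" matches the geometric mean of the three section means (about 0.550), not their arithmetic mean (which would be about 0.573), so the "sum ... /3" label appears to be misleading. A small sketch with awk, using rounded values from the output above (the function names are my own):

```shell
# trimmed_geomean T1 T2 ... : drop the smallest and the largest value,
# then print the geometric mean of the rest (3 decimals).
trimmed_geomean() {
    printf '%s\n' "$@" | sort -n | awk '
        { t[NR] = $1 }
        END {
            for (i = 2; i < NR; i++) s += log(t[i])
            printf "%.3f\n", exp(s / (NR - 2))
        }'
}

# geomean T1 T2 ... : plain geometric mean (3 decimals).
geomean() {
    printf '%s\n' "$@" | awk '{ s += log($1) } END { printf "%.3f\n", exp(s / NR) }'
}

trimmed_geomean 0.518 0.6897 1.0783 0.606 0.2603   # section I timings; prints 0.600
geomean 0.6005 0.3700 0.7495                        # the three section means; prints 0.550
```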
