Speed comparison of various data analysis software
Speed comparison of various number crunching packages (version 2)

Execution speed is an important criterion when choosing data analysis software. Since it can vary by a factor of 10 or more on the same computer, it can make the difference between a responsive package and one that seems to take hours to calculate!

This is the second version of our benchmark tests, derived from Stephan Steinhaus' benchmark v. 2. You can find a (quite outdated) test with our first version here. The tests in our first version were scaled in such a way that each of them ran in about 1 second on the test machine (a Celeron 500 MHz with 256 MB RAM under Windows 2000 Professional) with our reference software: Matlab 6.0 (R12). For this second version, we decided to change the reference to freely available software. This way, everybody can download it and use it as a reference on their own computer. We chose R version 1.6.2 with a standard, non-processor-optimized ATLAS (Rblas.dll) library as our new reference. All tests are scaled to run in 1 +/- 0.1 sec on our new test computer: a Pentium IV 1.6 GHz with 1 GB RAM under Windows XP Professional. Other changes from the original Steinhaus benchmark are the same as in version 1: (1) we kept only tests that run on all checked software, (2) we arranged them in two categories ("matrix calculation" versus "matrix functions"), (3) we added a "programming" category to evaluate how fast the software executes scripts, (4) we adapted or optimized tests for recent versions of the software, and (5) we considered only trimmed geometric means (worst and best results eliminated) inside each category and for the overall index. Note that Stephan Steinhaus' report also evaluates the "richness" of the packages (which functions are present, and which ones are absent). Here, we only compare software for speed!
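
As an illustration of point (5), here is a minimal Python sketch of a trimmed geometric mean. It assumes that "worst and best results eliminated" means dropping exactly one minimum and one maximum per category, even when values are tied; the function name is ours and is not part of any benchmark script. The sample values are the R 1.9.0 timings for tests I.A to I.E from the results table.

```python
import math

def trimmed_geometric_mean(times):
    """Geometric mean after dropping one best (smallest) and one worst (largest) value."""
    if len(times) < 3:
        raise ValueError("need at least three timings to trim best and worst")
    kept = sorted(times)[1:-1]  # assumption: drop a single minimum and a single maximum
    # Averaging the logs and exponentiating equals the n-th root of the product.
    return math.exp(sum(math.log(t) for t in kept) / len(kept))

# R 1.9.0 timings (seconds) for tests I.A to I.E, copied from the results table.
r_matrix_calc = [1.49, 0.43, 0.87, 0.26, 0.26]
print(round(trimmed_geometric_mean(r_matrix_calc), 2))  # 0.46, the score reported for R in category I
```

The same function applied to the S-PLUS 6.1 timings for category I gives 1.63, which also agrees with the table. A geometric mean is a natural choice here because every test is scaled to about 1 second on the reference, so each timing is effectively a ratio.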

We have compared:

R 1.9.0, the latest version of our reference software, a rich and powerful free 'S language dialect' (R benchmark 2.3 script; text file, 13 KB). Here, we use the Pentium IV-optimized ATLAS library (provided on CRAN), which gives slightly better results in some tests. We have not tested other optimized libraries, such as Goto's.
S-PLUS 6.1, the commercial equivalent of R (S-PLUS benchmark 2 script; text file, 10 KB)
Matlab 6.0 (R12), our previous reference (download Matlab benchmark 2 script and accompanying gcd2.m custom function; text file, 10 KB). Warning! This is not the latest version. At the time I write this (21 April
O-Matrix 5.6, an inexpensive but very fast package that can run most Matlab scripts (O-Matrix native mode benchmark 2 & O-Matrix Matlab mode benchmark 2 scripts; text files, 10 KB each)
Octave 2.1.42, a free "clone" of Matlab 4 (Octave benchmark 2 script; text file, 10 KB). The version used was compiled with an optimized ATLAS library for the Pentium IV.
Scilab 2.7, a very complete free package, "not unlike" Matlab (Scilab benchmark 2 script; text file, 10 KB)
Ox 3.30, a very efficient matrix package similar to Gauss and free for academic use (Ox benchmark 2 script; text file, 11 KB)

Tests are:

I. Matrix calculation: evaluates the ability to perform some common matrix computations.

I.A: creation, transposition, and reshaping of a 1500x1500 matrix. This test evaluates the ability to create and manipulate matrices.
I.B: creation of an 800x800 normally distributed random matrix and taking the 1000th power of all its elements. Evaluates the speed at which a random matrix is processed element by element.
I.C: sorting of 2,000,000 random values. Tests the speed of a sorting operation.
I.D: 700x700 cross-product matrix (b = a' * a). Evaluates matrix operations.
I.E: linear regression over a 600x600 matrix (b = a \ b'). Tests the execution speed of linear model evaluation.

II. Matrix functions: evaluates speed of some preprogrammed matrix functions.

II.A: fast Fourier transform over 800,000 values. Fourier transform is a commonly used method in signal processing.
II.B: eigenvalues of a 320x320 random matrix. Eigenvalues are used in multivariate analyses (PCA, ...).
II.C: determinant of a 650x650 random matrix. Calculation of the determinant of a matrix is a common, but unequally optimized, function in matrix calculation packages.
II.D: Cholesky decomposition of a 900x900 matrix. Another commonly preprogrammed function.
II.E: inverse of a 400x400 random matrix. A computationally intensive function for which various algorithms exist (with very different performance).

III. Programming: evaluates how efficiently scripts and custom functions run.

III.A: calculation of 750,000 Fibonacci numbers. This evaluates the speed of vector calculation.
III.B: creation of a 2250x2250 Hilbert matrix. Evaluates the performance of matrix calculation in scripts.
III.C: greatest common divisors of 70,000 pairs. Tests the performance of recursive functions.
III.D: creation of a 220x220 Toeplitz matrix. Checks the execution speed of loops.
III.E: Escoufier's method on a 37x37 random matrix. Tests various aspects of programming combined in a single test.

Note that tests III.A-E do not use the most optimized algorithm for each package, but they do exercise similar features in all of them. For instance, a matrix algorithm for test III.D is often much more efficient, as is a possibly preprogrammed toeplitz() function. Yet we keep the loop algorithm in all cases, in order to test the execution speed of loops in scripts!
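
To make the distinction concrete, here is a hedged Python sketch of what tests III.D and III.C exercise. It is not taken from any of the benchmark scripts: the entry formula abs(j - k) + 1 for the Toeplitz matrix is an assumption chosen for illustration, and all function names are hypothetical.

```python
def toeplitz_loops(n):
    """Build an n x n symmetric Toeplitz matrix with explicit nested loops (style of III.D)."""
    b = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            # Assumed entry formula: the value depends only on the distance to the diagonal.
            b[j][k] = abs(j - k) + 1
    return b

def toeplitz_comprehension(n):
    """Same matrix built without explicit index assignment, the kind of shortcut the tests avoid."""
    return [[abs(j - k) + 1 for k in range(n)] for j in range(n)]

def gcd_recursive(x, y):
    """Euclid's algorithm written recursively (style of III.C)."""
    return x if y == 0 else gcd_recursive(y, x % y)

print(toeplitz_loops(3))      # [[1, 2, 3], [2, 1, 2], [3, 2, 1]]
print(gcd_recursive(48, 18))  # 6
```

Both Toeplitz builders return the same matrix; the benchmark deliberately times the first style because it measures interpreter overhead per loop iteration rather than the quality of a built-in routine.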

Results

The tests were run three times on a Pentium IV 1.6 GHz computer with 1 GB of memory under Windows XP Professional, and the mean value was recorded. The following table presents the results:
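
The "three runs, record the mean" procedure can be sketched in Python as follows. This is only an illustration of the measurement protocol, not the timing code used by the benchmark scripts; mean_runtime and sort_workload are hypothetical names, and the workload is a scaled-down stand-in for test I.C.

```python
import random
import time

def mean_runtime(func, repeats=3):
    """Run func() `repeats` times and return the mean wall-clock time in seconds."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

def sort_workload(n=200_000):
    """Hypothetical stand-in for test I.C, with fewer values than the 2,000,000 of the benchmark."""
    values = [random.random() for _ in range(n)]
    values.sort()

print(f"mean of 3 runs: {mean_runtime(sort_workload):.3f} s")
```

Note that this sketch times the creation of the random values together with the sort; a stricter harness would generate the data outside the timed region.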

Test (sec)  R       S-PLUS  Matlab  O-Matrix 5.6  O-Matrix 5.6  Octave  Scilab  Ox
            1.9.0   6.1     6.0     Matlab mode   native        2.1.42  2.7     3.30
I. Matrix calculation
I.A         1.49    3.03    0.48    0.69          0.58          2.01    1.19    0.74
I.B         0.43    1.37    0.42    0.53          0.62          1.22    0.70    0.94
I.C         0.87    2.38    0.89    0.98          0.98          7.77    2.00    1.97
I.D         0.26    0.72    0.73    0.19          0.30          0.35    8.58    0.45
I.E         0.26    1.33    0.24    0.17          0.14          0.78    2.11    1.04
Score       0.46    1.63    0.53    0.41          0.48          1.24    1.71    0.90
II. Matrix functions
II.A        1.01    1.62    0.48    0.99          1.05          0.96    1.78    3.06
II.B        1.25    0.96    0.86    0.41          0.49          2.30    2.44    1.78
II.C        0.30    0.41    0.27    0.13          0.14          1.02    2.27    0.71
II.D        0.24    1.92    0.33    0.11          0.12          0.21    1.96    0.36
II.E        0.14    1.48    0.23    0.07          0.06          0.47    1.67    0.35
Score       0.42    1.35    0.35    0.18          0.20          0.77    2.00    0.77
III. Programming
III.A       0.83    1.68    2.11    0.31          1.84          2.06    0.72    0.69
III.B       1.33    1.14    0.84    0.51          0.64          0.73    0.91    0.79
III.C       0.56    0.71    0.91    0.14          0.17          0.42    1.52    0.72
III.D       0.67    6.62    0.38    0.10          0.10          4.39    1.45    0.05
III.E       0.89    15.10   1.92    0.60          0.56          3.08    3.97    0.31
Score       0.79    3.15    1.14    0.28          0.39          1.67    1.26    0.54
Total       10.52   40.47   11.12   5.93          7.83          27.76   33.27   13.97
Overall     0.53    1.71    0.60    0.27          0.34          1.17    1.63    0.72

Comments

The higher the result (in seconds), the slower the test executes. Lower values thus mean higher performance. Results below 0.50 (more than twice as fast as the reference) are in green; results above 2.00 (more than twice as slow as the reference) are in violet. We immediately see the progress made in R since version 1.6.2 (about 30% faster, but as much as four to seven times faster for some operations using the optimized libraries).
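
Because every test is scaled to take about 1 second with the reference (R 1.6.2), a timing can be read directly as a ratio to the reference. The following Python sketch shows that reading together with the two color thresholds; the function names and the "plain" label are ours, and the speedup treats the reference time as exactly 1 second, which is an approximation (the tests are scaled to 1 +/- 0.1 sec).

```python
def classify(seconds, fast=0.50, slow=2.00):
    """Return the highlight color used in the table for a timing in seconds."""
    if seconds < fast:
        return "green"   # more than twice as fast as the reference
    if seconds > slow:
        return "violet"  # more than twice as slow as the reference
    return "plain"       # our label for unhighlighted results

def speedup_vs_reference(seconds, reference_seconds=1.0):
    """Approximate speedup factor, assuming the reference takes about 1 second."""
    return reference_seconds / seconds

# R 1.9.0 on test II.E (0.14 s) and Scilab 2.7 on test I.D (8.58 s), from the results table.
print(classify(0.14), round(speedup_vs_reference(0.14), 1))  # green 7.1
print(classify(8.58))                                        # violet
```

The 7.1 factor for test II.E and the roughly 3.8 factor for test I.D (0.26 s) are consistent with the "four to seven times faster" remark above.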

S-PLUS is a well-recognized standard in statistics, and it is the commercial counterpart of R. As we see here, it is much slower than R under Windows (it takes about four times as long to complete all tests)! S-PLUS is well known for its versatility and for the ease of exploring statistical models in its environment. It excels in almost all fields of statistics. However, it reaches its limits when working with huge datasets. In this case, SAS (not evaluated here) is considered to be faster, and thus more efficient, especially for loop programming, where S-PLUS is desperately slow (test III.E)! However, S-PLUS offers alternatives: the For() function for optimized loops, and the apply() family of functions that "vectorize" loops. With medium-sized matrices, as in the current test, it is easily outperformed by almost all the other software evaluated here.

Since R offers features similar to those of S-PLUS, a larger number of additional libraries (more than 300!), and is totally free, it is clearly an excellent choice for statistical analyses. This benchmark also shows that it is quite good for "number crunching". Moreover, it runs on almost all platforms (Windows, Macintosh, Unix/Linux) and it does not have the "loop problem" of S-PLUS (yet it also provides apply() and the like to accelerate loops). However, it does not (yet) offer the same nice user interface with menus and dialog boxes (GUI) as S-PLUS 6.1 does (though many professionals do not care about that because they prefer to use scripts and the command line for finer control over their calculations). R gets better and better with successive releases. It is maintained and enriched by a very active community of developers. These are the reasons why we decided to promote it as the reference in our benchmark tests.

Matlab 6 is a commercial standard in pure matrix calculation. It is significantly poorer in statistical models than S-PLUS or R, but it offers a wide range of high-quality toolboxes for specific applications (although they increase the cost of this already very expensive software!). Concerning speed, it is about as fast as R 1.9.0. However, we did not test the latest version, 6.5.1, which seems to provide a substantial increase in speed. Being one of the fastest, the richest, and the most commonly used packages, and having one of the best user interfaces, Matlab 6 deserves its status as the leading product in matrix programming.

Matlab has several competitors that offer a similar matrix language for a lower price (O-Matrix, Octave, Scilab). Among them, only one also competes with Matlab 6.0 on performance: O-Matrix. Overall, O-Matrix is the fastest matrix computation package we have tested. It is much less expensive than Matlab, and it provides reasonable compatibility. However, O-Matrix does not offer the same range of specialized toolboxes and it runs only on Windows.

The two other "Matlab clones" (Octave & Scilab) are free open source software. Their performance is somewhat lower than that of Matlab 6.0 and compares better with Matlab 5.3 (see version 1 of the test). Octave aims to be fully compatible with the base version of Matlab 4.2. One should note that Octave runs under the cygwin emulation of Unix in Windows, and this probably has some negative impact on its raw performance. The Unix/Linux native version should run comparatively faster. Scilab offers many more functions than Octave, but it is not 100% compatible with the Matlab language, and it is the slowest package of this comparison if we set aside the "loop problem" of S-PLUS.

Ox stands a little apart. It is the only package that does not claim compatibility with one of the two standards previously cited: Matlab or S-PLUS. However, it is partly compatible with Gauss, another high-quality commercial matrix calculation package regarded as a standard in econometrics (not evaluated here, but you will find detailed tests in Stephan Steinhaus' report). It is one of the four packages (with R 1.9.0, Matlab and O-Matrix) that are faster than R 1.6.2, that is, our reference software and version for this benchmark. It is particularly good at executing scripts (tests III). As it is a lightweight console application that can easily run scripts in batch mode, Ox is an excellent choice for running matrix calculation scripts from within various kinds of applications. O-Matrix is even faster, but it is restricted to Windows systems.

Conclusions

Choosing data analysis software is a difficult task. "Matrix languages" (like all the software we evaluated here) are very flexible because they are programmable and they are able to work very efficiently with matrices (by definition!), which are widely used in data analysis. However, they differ from each other in terms of price, richness (the number of functions provided), usability (including the quality of their user interface, whether or not they are an established standard, the quality of their support, their availability on different platforms like Windows, Macintosh, Unix or Linux), and finally, in terms of raw performance. We evaluated the latter here by using a benchmark suite of 15 tests. Considering the results obtained with our benchmark (but beware of its limits: only a few features were tested, and solely on a Windows platform!), one can conclude:

R is one of the fastest open source data analysis packages. Since it is free and provides many additional packages for all kinds of statistics, we warmly recommend it.
S-PLUS is slower and much more expensive, but it still offers a better graphical user interface.
Matlab is about as fast as R, rich, and offers a well-designed user interface, but it is also expensive.
O-Matrix is the fastest matrix language we have tested on Windows.
Currently, no free "clone" of Matlab is as fast as Matlab 6.0 itself.
Octave is language-compatible with Matlab, but not a top performer on Windows.
Scilab is a free alternative to Matlab for "richness" more than for performance.
Ox is a very efficient matrix language, especially for batch processing of scripts.