The Java (not really) Faster than C++ Benchmark |
Last update: Tue Jun 27 00:28:45 CEST 2006
I‘ve kept the original number of repetitions in all tests.
Time was measured in the same way as in the original results, with one exception: the time used as benchmark result was elapsed real (wall clock) time used by the process. If we stick to the original method of measurement, the results will be incorrect and biased towards Java - because of threads used by the JVM. Wall clock time is better and more accurate if the machine operates under no load - as was the case here.
Java was compiled with standard settings, using Java compiler from Sun in the same version as the JVM that was used later to do the actual benchmark.
To run hash
, heapsort
and strcat
tests I had to increase the heap size. It‘s pretty much standard practice for enterprise software, so I do not see it as a drawback of Java.
When looking at C++ code, I‘ve noticed many performance problems, which probably may go unnoticed for people with strong Java background. In other words, to make this benchmark fair, I had to make some modifications to the original code. In doing so, I tried to make the code as close to the original as possible, even if my personal coding style is completely different.
For C++ compilation, I‘ve used the following options:
-O2 -fomit-frame-pointer -finline-functions -march=athlon-xp
I consider the first two options standard for compilation (all major Linux distributions, Linux kernel and many others use these two for compilation).
-finline-functions
is used to tell the compiler to guess which functions should be inlined (Java Server VM also does that, that‘s one of the reasons for which it performs better than Client VM).
Athlon-XP - well, that‘s my machine. I could use pentium instead (which could be considered more standard), but it would not change the overall result - that, is C++ being the clear winner. So, let‘s get down to business, shall we?
No changes to the code here. The clue to improving performance of C++ lies in Java results. Client VM does not finish the test - stack overflow ocurs. Server VM passes the test and the result is much better than C++. The easiest answer is that Server VM does not recurse as much as Client VM, because of inlining. In GCC you can enable automated inlining by adding -finline-functions
option. Since the recursion is very deep here, this program will benefit a lot. This option was used in all tests, not only here.
Note that it is possible to make this program even faster (up to twice as fast) if you enable more aggressive inlining in GCC. I decided against doing it, but in real life, if you know that your program does a lot of recursion it may be worth a try.
No changes here.
Both of these have the same problem. If you use map in C++, you use a TreeMap equivalent, not HashMap equivalent. TreeMaps are much worse (of course) when it comes to performing (huge amounts) of insertions on them. In C++, if you want hash maps, you have to use hash_map class (obsolete) or unordered_map (soon to be part of C++ standard, already available in GCC 4.x). I‘ve rewritten both of these to use unordered_map just to stay on the bleeding edge ;-) . Performance results are similar with both hash_map and unordered_map. If you look closely at C++ code you will notice that a benchmark called "Java prettier that C++" would make an interesting read.
No changes here.
No changes here.
OK. So what this test actually does is it calls two methods 1 000 000 000 times each. Java uses references to access objects, so I‘ve changed C++ code to use references too.
I also inverted the loop direction, since comparison against 0 is faster - normally I would leave it as it was, but since this was the only benchmark in which Java Server came close to C++, I had to do something ;-) . Oh, and inverting the loop direction has no effect whatsoever on Java, so I left the code as it was.
If you look at the results, you will notice that only the poor Java Client VM actually tried to invoke the methods - both Java Server and GCC inlined the methods, which resulted in a much better performance.
No changes here.
Before I explain the changes, a word of explanation about differences in memory management between Java and C++. (What follows is all very simplified). Java allocates new memory blocks on it‘s internal heap (which is allocated in huge chunks from the OS). In this way, in most of the cases it bypasses memory allocation mechanisms of the underlying OS and is very fast. If you allocate memory on heap in C++, each allocation request will be sent to the operating system, which is slow. That‘s why Java won in the original benchmark.
The thing is, in C++ you don‘t have to use the default allocation algorithm - you have other options. You can either change the allocation behaviour in C++ to allocate memory exactly in the way Java does (which will be very fast), or allocate memory on the stack (which is more or less similar to the way memory management in Java works, if you grossly oversimplify the problem).
If you look at the code which allocates code on stack, you will notice that it is much simpler than the original version. And if you look at the results you will notice that it‘s much faster, too. Actually, so fast, that it didn‘t even register on the graph ;-)
No changes here.
Rewritten. I think that original code was very unfair for C++:
I‘ve replaced vector with an array (like in Java), removed the iterators (obviously) and removed the final count() call. The code is now more or less equivalent to Java version.
I simplified the code, because it basically did what string implementation already does internally. I also removed the reserve() call from code - although keeping it makes C++ faster, I think it‘s cheating. If you actually reserve the string size (the way original code does it), you will get 20% performance increase. You cannot do this in Java. See, I am fair.
I added ios_base::sync_with_stdio() call to make sure that C++ does not keep the stream in sync with C functions all the time - no reason to do it. The code can be made faster by 50% if C routines (fgets) are used to access stdin instead of C++ streams. I decided not to do it to keep the code as close as possible to the original version.
Only sync_with_stdio was added. It does improve the results a bit, but C++ was already fast anyway.
Every test ended with C++ performing better than an equivalent Java program.
The most important conclusion is obvious. (For this set of benchmarks,) C++ is clearly the winner.
Second conclusion - don‘t use Client VM in Java.
Unfortunately, there‘s also a third conclusion. It seems that it‘s much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster...
If you compare 1.6.0 Beta 1 and 1.6.0 Beta 2, performance improvement is easily noticeable. Beta 2 completed the tests nearly 30 seconds faster than Beta 1. It‘s still slower than 1.5.0, but it‘s getting there.
Please note that I had to increase the stack size in Ackermann test for Java 1.6.0 - otherwise neither of the JVM versions would finish the test. This further supports the theory about limited inlining.
Przemyslaw Bruski, SCJP - mail me at [pbruskispam "at" op "dot" pl]
联系客服