Quant development: HPCC Systems from LexisNexis Breaks World Record on Terasort Benchmark
HPCC Systems 4-node cluster sorts 100 gigabytes in 98 seconds and is 25% faster than a 20-node Hadoop cluster
That’s hardly surprising for people who have a sense of how computers actually work, but thanks for posting, because there is a mass of wannabe specialists who need to be reminded that doodling in Java is certainly not the future of IT…
But 100GB is not a terabyte!!!! You can’t claim you broke a record in the Terasort unless you sort a terabyte. Sorry Jakub, it’s not surprising that HPCC and C programmers have big data envy. Us Java guys have Hadoop, Cassandra, HBase, ZooKeeper; what have you got? ::crickets:: I’ll be doodling with my multi-TB datasets while you C guys keep toying with your 100GB ones!
Unlike Java guys with C, I can actually write whatever I want in Java :))
Just to show you how lame this benchmark is “comprised of one (1) Dell PowerEdge C6100 2U server with Intel® Xeon® processors E5675 series, 48GB of memory, and 6 x 146GB SAS HDD’s. The Dell C6100 houses four nodes inside the 2U enclosure”
HPCC used 192GB of memory to sort 100GB of data in 98 seconds.
Suppose I took a machine with memcached or MySQL on it and put 192 GB of RAM in it. Let’s see: I can insert into memcached or MySQL, the data is sorted in memory, and then I can just spill that data to disk.
Hadoop and TeraSort were not really designed to showcase how fast in-memory workloads can be.
A hardware vendor (to remain unnamed here, but you can Google for their press release) claimed on October 17, 2011 that they “Establishes New World Record Apache Hadoop Benchmark” (sic). They used 20 nodes of their own make, with 48GB of RAM in each, and sorted 100GB of data in 130 seconds using Hadoop.
We just stepped up to the challenge, and following the instructions in the Sort Benchmark site, we repeated the exact same test using HPCC (flushed disk caches every time, used the correct data generation, etc.) but with only 4 nodes (one Dell c6100 chassis with 4 servers in it) and 48GB RAM in each node (same as the previous “leader”). Our benchmark took just 98 seconds.
We proved that, in 2 rack units, with only 4 nodes, the HPCC Thor platform was 25% faster than Hadoop, on 20 servers, using 40 rack units.
It is worth mentioning that this sort program in ECL is comprised of only 3 lines of code (versus the 700+ lines in the Hadoop examples repository, just for terasort and the serialization and de-serialization in this test).
An interesting fact is that both Hadoop and HPCC are open source, so at an equivalent price point (free), a platform that runs 25% faster, takes 1/20 of the datacenter space and uses 1/5 of the nodes (1/5 of cooling and power) has a significantly better ROI.
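As a rough check on the figures quoted above (130 seconds on 20 nodes in 40 rack units versus 98 seconds on 4 nodes in 2 rack units, both for 100GB), the comparison can be worked through in a few lines of Python. This is only a sketch: the function and variable names are illustrative and not part of either platform.

```python
# Figures as quoted in this thread; all names here are illustrative.
DATA_GB = 100
HADOOP = {"seconds": 130, "nodes": 20, "rack_units": 40}
HPCC = {"seconds": 98, "nodes": 4, "rack_units": 2}


def time_reduction(baseline_s, candidate_s):
    """Fraction of elapsed time saved relative to the baseline run."""
    return (baseline_s - candidate_s) / baseline_s


def throughput_gain(baseline_s, candidate_s):
    """Fractional increase in data sorted per second for the same input size."""
    return baseline_s / candidate_s - 1


def gb_per_second_per_node(system):
    """Sort throughput divided evenly across the nodes of a cluster."""
    return DATA_GB / system["seconds"] / system["nodes"]


if __name__ == "__main__":
    saved = time_reduction(HADOOP["seconds"], HPCC["seconds"])
    gain = throughput_gain(HADOOP["seconds"], HPCC["seconds"])
    per_node = gb_per_second_per_node(HPCC) / gb_per_second_per_node(HADOOP)
    print(f"elapsed time saved:      {saved:.1%}")
    print(f"throughput gain:         {gain:.1%}")
    print(f"per-node throughput:     {per_node:.2f}x")
    print(f"node ratio:              {HPCC['nodes'] / HADOOP['nodes']:.2f}")
    print(f"rack-unit ratio:         {HPCC['rack_units'] / HADOOP['rack_units']:.2f}")
```

Under these numbers the "25% faster" figure matches the reduction in elapsed time (about 24.6%); measured as throughput the gain is closer to 33%, and per node the gap is several times larger. None of this accounts for the difference in memory per gigabyte sorted, which other commenters raise.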
Please let me know if you want to know more about this benchmark (or HPCC in general).
Thanks for the explanation; HPCC products seem very interesting. FPGA-based reconfigurable high-performance systems are still too expensive, so I think this is the way for the mainstream today. Not Hadoop and other monsters.
Now that you mention it, we have done some preliminary work to support CUDA (GPU’s) and are also actively engaged with an academic rooted startup, which is doing very cool HPCC integration with some sort of distributed FPGA/ASIC setup. Unfortunately, I can’t say more about the latter, due to a current NDA.
We are also exploring offloading some processing into Mellanox HCA’s, for those willing to use Infiniband instead of Ethernet interconnects.
Then you might be interested in our reconfigurable computing system; contact me through private message if you want to know more about it. I cannot yet disclose it fully in public.
I have no doubts that HPCC can be faster than Hadoop on many workloads, but this PR sucks. Have you tried Petasort with HPCC? These numbers can be more interesting for BigData community.
Unfortunately we don’t have a system with at least 2-3 petabytes (one petabyte of input, one petabyte of output and some temporary storage) lying around that we could use for that test, but we’ll probably get around to testing at least one and ten terabytes (maybe even a larger size) early next year.
On a related topic, sometimes it’s hard to justify detouring resources to just run benchmarks, but we understand that some people take decisions based on them, so we’ll do a bit more in this area too.
I am from the military and intelligence industry. I cannot stop laughing at those stupid wannabe big systems. Hadoop is something like 30 years obsolete, slow and unusable for real-world calculations. It’s yet another scam to pump money out of customers. Some nonsense crap developed by people who know nothing about real computing and who would need a cluster even to run an application of the complexity of Total Commander, if they had written it.
I will support any way out of this madness. I am seriously pissed off by this; we should start building true products, not more scammy crap to screw customers. That may work for a limited time, but certainly not forever, and not even for a considerably long time.
Just because some other vendor also claimed to beat the terasort benchmark by sorting less than a terabyte does not make it any better for you guys to do the same.
I remember when Hadoop first started gaining traction: Greenplum, Teradata and Aster Data all spent a ton of energy producing white papers and marketing BS saying that Hadoop was a “step backwards”. This year at Hadoop World I had a large sh*t-eating grin on my face to see that some of these guys, along with NetApp, were gold and platinum sponsors of a Hadoop event. I anticipate that in 1 year HPCC will be in the same boat: a vendor table at Hadoop World.
What you could do is tone down the rhetoric and focus on better Hadoop integration. Stop calling yourselves “Hadoop killers” and accept the reality that all these other vendors were not able to “kill Hadoop”, so you have like no shot.
Just looked at the rules at the http://sortbenchmark.org/ site. I am struggling to see how the above discourse correlates with the rules or takes advantage of the advice in the site’s FAQ. The site says that a terabyte sort is now deprecated, as it is essentially the same as a minute sort, which is defined here: http://research.microsoft.com/en-us/um/people/gray/alphasort.doc. The Daytona record is 500GB and Indy 1353GB.
The FAQ states:
Can we use assembler?
Yes, but it is a bit surprising if you need to. This is a test of the OS IO system not of the compilers.
C/Java/VM debates were old decades ago. For this class of problem, I/O and concurrency strategy are the key factors for system performance – not programming language choice. Further, the last time I did string comparison benchmarks using both C++ and Java with large datasets on Solaris, Java (to my surprise) consistently outperformed the C++ implementation. Critical loops in most HPC Java implementations get written or analyzed in assembler anyway – but this is not new.
Still pissed off because you got your moderator status removed for breaking the rules :))
Yes, those debates are as old as the languages themselves, yet if you look around, many people have no idea what those languages were designed for, nor do they take it into account when deciding what to build in which language. I generally have nothing against Java; in fact, like Smalltalk, I find it to be a great idea. But I think that if you had told anybody who was designing e.g. Java at that time that it would one day become a major language for building server applications, they’d have laughed like hell, especially if you had tried to make them believe it would be on an x86*whatever* platform; yet it’s very much reality these days. I think Java is great for web applications, and for small and easy applications which need to be cheap and in a usable state quite fast. Great, no objections. But when you are using Java to solve very complex problems on a large cluster… I think you would agree it’s just a generally silly idea.
MapReduce … well, yet another thing old as hell and, for many problems, a very good thing. But building some nasty, big, “ultimate” framework and calling it something MapReduce? Why? It’s just a simple method you could implement with quite little effort and for the purpose at hand, rather than bending the purpose to fit the framework you have; that’s kinda twisted, don’t you think?
Although I acknowledge the contribution of the STL towards advancing and promoting C/C++, I think that using it today, especially since its author and keeper (SGI) is no more and nobody seems to take care of it, is quite, well… obsolete. I am sorry, but someone who is not able to implement most of the things found in the STL on his own should look for another job. Or is it just our (programmers’) laziness that makes us still stick with things like the STL?
Btw, it’s working quite well for me. We are rebuilding java/.net/whatever beasts all the time, and even though it’s frustrating work (like “how the hell could someone spend even a penny on some **** like that”), the results are quite satisfying … our average result is 10+ times faster, more efficient applications with generally much better performance and robustness and much, much less downtime than their predecessors. In many cases, the price we ask for those applications is paid for just by lower electricity costs, due to using far fewer machines and usually running their processors at a lower average frequency if they have frequency stepping. We commonly lower latencies from seconds to milliseconds or even less. So yes, I would say it goes pretty well, although our work is pretty much cleaning up the mess after wannabe programmers who actually got paid more for the work (in absolute numbers; in relative terms we got more, even though the resulting price is usually much lower). And it’s also very good to see the results … satisfied customers.
I don’t agree with your positions in the slightest.
I remember fondly watching my colleague bash out the first iterations of what would later become the Jetty Web Server. He showed my peers the way out of the Smalltalk / C++ wars at the time (’95-97). Server applications in Java were exactly what we considered then. Now I don’t know anyone credible whom would seriously contemplate developing a major system without a VM.
As I actually do solve complex problems on a large cluster using Java, your position is flawed. Having built a multi-threaded MR framework myself, I wouldn’t class its implementation as trivial. Whilst there is a place for micro-optimisation (we are a sponsor of the libturbo-jpeg project, now used in Chrome), it is the 0.5% case. For the kinds of Big Data applications we work with, the micro-performance data simply doesn’t support use of anything other than a VM. For the large systems development ($10M+) inherent in Big Data systems, your recommendations border on malpractice due to the money and other resources wasted on programming with inappropriate technologies.
As moderating forums is a lot of work, I can understand why Mr Clark might be upset with your behaviour. It appears way out of line.
You are making comments on this forum as if you are a technical expert. Where are your peer-reviewed papers, open source projects, data, industry references etc to support your posts?
If you can’t credibly support your arguments, may I suggest you rethink wasting people’s time and be respectful of people who know more than you do and are spending their time to support the community.
Well, regarding complex applications and VMs … you couldn’t be more wrong. The world’s most complex applications, or any other programs (e.g. operating systems) for that matter, are NOT VM-based and probably never will be.
I might accept Java (like any high-level language) as a way to help people who are not interested in programming but in solving problems fast; for that it’s suitable. But I will never agree that Java (or any other high-level language) is a suitable language for building systems for long-term perpetual usage or for solving complex problems on very large data sets. The simple reason is performance vs. programming costs. It’s actually not so much about Java itself but more about the overly complex “ecosystem” of frameworks and libraries built around it, which usually have every imaginable property except simplicity, resulting in “applications” with the complexity of a “Hello, World!” program and the size and resource usage of an average word processor.
Yes, well, I might “arrange” with my close friends and people I know to give each other references on LinkedIn, but then I’d be wasting time, and not only my own. Where are your peer-reviewed papers (which are important for academics, not professionals)? Apart from what you have written on your profile (as I did), where are your industry references? And expecting someone to show data which is ALWAYS confidential and usually subject to NDA, and that includes performance results of client systems? So after all, the credibility of both of us is based on what we say and who supports what, here.
I have experience in both military/defense and commercial work. My references include the biggest companies in our country and in the world. I have built several large data processing systems which are currently helping our customers to achieve their goals. My work is and was mostly about data acquisition, transformation systems (ETL) and complex data analysis systems, usually built for large corporations, so processing gigabytes to terabytes of data. It’s actually funny that you are trying to attack someone’s credibility while you don’t even have a premium profile, so you have no better way than I do to prove what is written on it. And you know what, let’s just stick with knowledge- and experience-based TECHNICAL arguments, not personal invective.
many people complained about him previously, his behavior was inappropriate before and if he didn’t have moderator status removed it would
Btw, could you explain why you think that a complex project should be on a VM (either Java or any other)?
always points out this group is “big data low latency” and a benchmark about sorting 100GB in 96 sec could only be “big data low latency” in the early 90’s. Jakub, you’re wrong: VM languages are everywhere in high performance systems; think about Erlang. You also have to admit that the NSA has open sourced Accumulo http://www.theregister.co.uk/2011/09/06/nsa_to_open_source_google_bigtable_like_database/print.html so the argument that defense/government cannot use a VM language seems defeated.
this comment “Now I don’t know anyone credible whom would seriously contemplate developing a major system without a VM” is an assertion too far. At least for the Java VM, in some common contexts it is almost all downside and would be a poor technical choice. I have used a mix of Java and C/C++ for parallel systems and there is a distinct demarcation of when you might want to use one or the other.
For example, analytical database engines are currently mostly written in C++. It is not that database engines could not be written in Java, obviously many are. But you give up enough absolute performance and (for real-time apps) performance variance that the JVM is not something that you would want to contemplate using lightly. You see similar tradeoffs in HPC and for similar reasons.
What these have in common is that they are all memory I/O intensive. The JVM handles that workload poorly and in these cases a well-engineered C++ system offers 2-10x the performance in systems where we could reasonably do a comparison between Java and C++ implementations. That is no small gap. With a lot of code effort and engineers that know the internals of the JVM you can usually get that down to half the performance of C++, a level where you can throw hardware at the difference, but the level of code effort required to get there makes the project more complicated than if it was done in C++ in the first place and it still runs significantly slower.
If what you are doing is CPU or disk intensive then by all means a VM won’t hurt you much and offers benefits. But if the application is memory I/O intensive then it requires unnatural and ugly design of the Java code to come close to a C++ implementation, performance-wise.
Note also that CPUs are becoming fast enough at many operations that some formerly CPU-bound codes are now memory-bound codes on current processors. This has been creating a resurgence of C/C++ usage for apps where Java previously produced good relative results. As always, pick the right tool for the job.
To all adepts of holy wars
Yes, the OS is written in C/C++, database engines in C++; that is why any large software system written in Java, Scala, PHP, Erlang, Python, Ruby etc. makes use of the OS and relational DBs (usually).
I didn’t say that governments are not using Java. I said that high performance systems for defense are not written in Java, and they aren’t. You are confusing them with early research applications that help analyze large sets of data without considering true performance. Yes, I even admit Java and other high-level languages are very handy for this purpose (if you don’t spoil them by cobbling as many “frameworks” together as possible, and actually do your work and write the program).
But NO, I have NEVER EVER seen a radar data acquisition system made in Java and I don’t expect I ever will. BUT! I very much believe that the vast majority of the algorithms used in that radar real-time analysis engine were first tested at some university, on their cluster, written e.g. in JAVA; that’s more than possible, it’s actually very probable. But as J. Andrew Rogers stated, even if those systems could theoretically be written e.g. in Java, they would NEVER EVER meet the performance and time determinism these systems need. And that’s all I am saying. I don’t object to the languages themselves; I just acknowledge and know what these languages are meant for, and yes, I don’t like it when someone uses a naturally good language for a completely twisted, crazy purpose it was never meant for. And even if it is possible, it doesn’t mean it should be done. It’s a question of simple rationality, aka “I will never use an excavator to try to beat a Ferrari on a speed mile” and “I would never use a Ferrari Enzo to dig a hole”. That is my point.
In a discussion of applied technology, defining the scope of discussion — and choosing a part of that scope to focus on — is part of the exercise.
With a topic that seeks to maximize bandwidth, or speed X volume, whether one focuses on custom implementations utilizing dedicated silicon hardware, or one looks at VM/MPP uses of cloud resources on demand, criteria for evaluating or assessing the comparative accomplishments or capabilities of platforms or architectures are not exactly black and white, so there is much room for discussion.
However, when someone who might champion a particular part of the solution space feels compelled to disparage the rest of the space as part of advocating their own superiority, the approach is ironically self-destructive. When MSDS behavior is directed at persons known only on forums, it is also possible that a professional and collegial discussion is not the objective of the interaction.
If you do wish to stir things up, there are probably forums or groups on topics that are more within the center of your scope constraints where you will likely get a more informed and amicable repartee in return. If you wish to engage in a discussion, by all means feel free to describe what you do, but please contain the Dr. Strangelove reflex when it comes to laughing at, mocking, and sneering at the work of others; it is a ‘channel changer’.
As always, you are right. I really do this 🙂 At first it is always just “promotion” of healthy competition, although it is sometimes beyond healthy. Well but looking back most of my mocking. But you should also admit that mostly I have explained general things and some people identified with them. I am sometimes too rational and simply don’t get this kind of personalization. But yes, in the end I just explode and switch to my “combat” mode, because lately my work is a lot about “fighting” with people who stand behind nonsense technologies and implementations, mostly because they want to keep their jobs after they messed up a lot… So yes, I am sometimes peevish if someone tries to challenge what I need to prove to many people almost every day, again and again, and what I now consider to be basic knowledge. I also have a tendency to be too simplistic when talking about it, so some people may see it as a general attack on e.g. a specific language, while it’s usually about methods, frameworks etc. used in a bad way.
I think we are perhaps in agreement, except our data on implications of memory bandwidth constraints are perhaps different (maybe because of use case). I am not arguing that there should be no assembler or C/C++. I referenced the libturbo-jpeg project as an open source example of where we optimised a well-known C library (libjpeg) by 4-5x and yet have it work when called from Java. I also use this example because using a mix of languages and skills allowed us to identify algorithmic improvements and constraints not perceivable from a single viewpoint.
RE: memory bandwidth. We are currently running multi-threaded map-reduce jobs on 24-core servers and these are tapping out at 7-8 cores due to memory bandwidth constraints. With this type of job (index search of 10M+ records), even if we moved the computation to C, I doubt there would be any net improvement in performance, since we would still be memory bound.
Our C++ testing supports this view.
What we are worrying about is bumping up into Amdahl’s law as we parallelize work up along with other system wide optimisations.
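For readers who want the Amdahl’s law concern in concrete terms, here is a minimal sketch. The parallel fraction of 0.95 is a purely hypothetical value chosen for illustration; no such measurement is given in this thread, and the formula does not model the memory-bandwidth ceiling described above, only the effect of a serial portion of the work.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the work scales with workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction):
    """Limit of the speedup as the number of workers grows without bound."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    assumed_fraction = 0.95  # hypothetical value, not measured in the thread
    for cores in (1, 8, 24):
        print(cores, "cores ->", round(amdahl_speedup(assumed_fraction, cores), 2))
    print("ceiling ->", round(speedup_ceiling(assumed_fraction), 2))
```

With that assumed fraction, 24 cores give a little over 11x rather than 24x, and no number of cores gets past 20x, which is why shrinking the serial portion matters more than adding hardware once the curve flattens.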
My experience with dealing with Big data / Low latency problems is that multi-layer complexities exist that require data and a range of perspectives to address. My interest in open forums is to gain perspective as there is more we don’t know than do.
As I am sure you know, libturbojpeg is written in C with critical parts in assembly using SSE2. That is what I am talking about. Well, here is the thing … What I want is to disprove the myth that developing in C is actually more expensive. In my experience it isn’t. It’s partly because almost everyone sees the C language as “printf()” and “scanf()” and some standard logic… But that was already untrue about 5 years after C was publicly shown. How big is the difference between calling a function in C and in Java?
Memory bandwidth is the never-ending pain of the x86 platform; you probably won’t solve it on this hardware, although I bet you could get some improvement by using tailored memory management. That is actually one of our products (we have multiple memory management systems, each pre-built for a type of usage; you could contact me directly if you are interested in details). You could call it from any language which can call C routines (of course C++, Java, Perl and others). And that is it! Rationality in the usage of resources … you want to change logic rapidly, are still developing it, or just have to rebuild frequently because you are in a research stage or for any other reason? Then I fully get why you are using Java … but you still need performance, which means high-performance memory management written in C/ASM. So you wouldn’t gain performance by completely rewriting your applications, but rather by substituting “tuned” parts for the critical ones. I think the VM’s parallelization could also play a part in this problem. But there are ways to overcome some weaknesses of x86 and of Java VMs, which especially on x86 have problems with memory I/O and thread-level parallelization.
> What I want is to disprove the myth that developing in C is actually more expensive.
This is a big claim and one that is very hard to prove one way or the other. In the 90’s I spent a lot of time doing OO software metrics and even making comparisons between C++ / Java / Smalltalk. We did things such as use PSP to bring our software process under statistical process control and the like to aid comparison. Even doing this with expert programmers failed to give definitive results. Below is a link to a paper where one such attempt was written up.
The amount of work we spent extending and optimising libturbojpeg for our use, versus the amount of function it represents compared to the rest of our code base, doesn’t anecdotally support your view, however. I could probably back-solve the detailed metrics, but it is doubtful your view would hold up. We are pushing a 4M LOC base, of which optimised portions are sub 100KLOC.
I am of the horses for courses viewpoint.
I have teams who have deployed high performance systems for the military and intelligence. This is also public record if you look at who I have worked for in the last 25 years.
Jakub, if you stopped the bs here, you might actually learn something. It might surprise you to find out, given the average experience level of the folks commenting here, that we can write C code too.
But the language used doesn’t really have anything to do with big data and low latency. The architecture is actually far more important.
I am afraid you have confused the how with the what and it has severely clouded your vision.
Ask yourself one question if you disagree with me. Ask yourself, “has anyone here actually learned anything constructive?”
Well, one thing is the critical component on which the actual (customer view) preference depends. And yes, if you needed to optimize libturbojpeg at the level of ASM/SSEx code, that could never have been as effective as working in higher-level languages, but such parts should be written to a quality where you never have to touch them again (“never” probably means until the next bigger change in e.g. CPU architecture).
To your document you posted. Was it C++ w/ STL used?
If you use whatever architecture you want but still implement it on a VM which has severe limitations regarding I/O, memory usage and similar, you can come up with whatever architecture you want and it will still be slow and suck the same way the VM does. Even if you have a very good VM, you still have one more layer which will use your resources. Yes, architecture IS important. But when you limit it by the underlying thing (no matter if it’s a VM, a framework or just a library), you still keep the problems of that thing no matter what fancy architecture you come up with. It’s like trying to build a skyscraper on the foundation for a 5+1 suburban home.
I know most of the people here were or are using C/C++. But most of them used a combination of C++ and the STL without any further special libraries, e.g. for memory handling. Well, if I had ended up with that, I would probably be using Java too. What I have been trying to say all along is that you cannot build an “Atlantis flying city” from wood.
About the test: I think, especially because it was a response to a sort of challenge, this test is more than adequate, but Flavio, if you have a chance to run some tests with another benchmark, please do and publish them.
And yes, the test was more than a success, because it ran better with far fewer resources; it is as simple as that.
We all get the language deal. Ok? It might surprise you to find out that most big data, low latency projects use many of them. Also, if you really know what you are doing with Java, you can get the same results as with other languages. But again, different languages have different use cases.
In my response above, those very high speed systems for intelligence were written in Java.
All you are continuing to do is fly in the face of facts and public evidence. Why don’t you impress us now with architectures for processing more data than you can process on one machine in near real time? That’s what this forum is actually about.
There is a difference: that is a high-performance routine built in C/ASM and called from Java. Again, no real-time or low-latency system uses Java. I will not change this statement before I see one…
You won’t see a high performance system in Java with your attitude. People who have implemented them will just turn around without talking to you since there is nothing that they are going to learn from somebody with an obviously closed mind.
google “LMAX Disruptor” – Java concurrency framework which is open sourced and used by LMAX in their high volume low latency trading system written entirely in Java.
To my best knowledge, there is nothing in the open source (at least) comparable in other languages (Asm, C, C++).
I know how to write low (and very low) latency applications in Java (I am talking about microsecond latencies). I do this for a living and for fun. If latency is not low enough, Java has a lot of open source JNI libs which allow the programmer to get the maximum from the underlying HW/SW.
It is much harder to write a low latency app in Java than a regular app, of course, but the same is true for C and C++ as well.
1. Zero GC path. Avoid creating objects in critical paths of your application.
2. Tune-profile everything as much as possible.
3. Tune OS for low latencies.
4. Use CPU affinity.
If you need RT – then patch OS and buy Java RT – you will get your RT.
A couple of notes on the Java vs. Asm/C/C++ holy war. I am absolutely sure that we will never see Hadoop written entirely in C/C++. Even the MapR implementation is 99% Java and 1% native code in the DFS layer. The reason is pretty obvious: it’s not feasible (unless you have unlimited everything: time, money etc).
Honestly – below are our performance stats for 1000 concurrent connections making a 100 byte request to our Java Web / App server / DB. As you can see, the Java implementation provides comparable performance to multiple C implementations, except that it also provides app server and key-value DB functionality.
Apart from the extreme cost, it turns out it doesn’t really make sense to optimise much further, as you end up well exceeding the capability of the wide area network, and within a LAN you quickly saturate a 1 GB network with real-life workloads.
The PSP work was done using an ancient C++ library (OTC).
[pah@localhost ~]$ ab -k -c 1000 -n 100000 http://10.0.0.201:8033/simpledb/simple/key-1
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.0.0.201 (be patient)
Completed 10000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: iServ/2.1
Server Hostname: 10.0.0.201
Server Port: 8033
Document Path: /simpledb/simple/key-1
Document Length: 100 bytes
Concurrency Level: 1000
Time taken for tests: 6.690 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 100000
Total transferred: 29200000 bytes
HTML transferred: 10000000 bytes
Requests per second: 14948.61 [#/sec] (mean)
Time per request: 66.896 [ms] (mean)
Time per request: 0.067 [ms] (mean, across all concurrent requests)
Transfer rate: 4262.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 12 190.6 0 3001
Processing: 1 39 17.4 40 137
Waiting: 1 39 17.4 40 137
Total: 1 51 192.7 40 3089
Percentage of the requests served within a certain time (ms)
100% 3089 (longest request)
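The derived figures in the ApacheBench report above can be recomputed from the raw counts, which makes it easier to see what each line means. The sketch below uses the formulas ab is documented to apply; the constant and function names are my own, and the small differences from the printed values presumably come from the elapsed time being shown rounded to three decimals.

```python
# Raw figures copied from the ApacheBench report above.
REQUESTS = 100_000
CONCURRENCY = 1_000
ELAPSED_S = 6.690
TOTAL_BYTES = 29_200_000   # headers plus bodies
BODY_BYTES = 10_000_000    # "HTML transferred"


def requests_per_second(requests, elapsed_s):
    """Completed requests divided by wall-clock time."""
    return requests / elapsed_s


def mean_ms_per_request(requests, elapsed_s, concurrency=1):
    """ab's 'Time per request'; concurrency=1 gives the 'across all' variant."""
    return concurrency * elapsed_s * 1000 / requests


def transfer_kbytes_per_second(total_bytes, elapsed_s):
    """ab's 'Transfer rate', in units of 1024 bytes per second."""
    return total_bytes / 1024 / elapsed_s


def megabits_per_second(total_bytes, elapsed_s):
    """The same traffic expressed in network units (10**6 bits per second)."""
    return total_bytes * 8 / 1_000_000 / elapsed_s


if __name__ == "__main__":
    print(f"requests/s:          {requests_per_second(REQUESTS, ELAPSED_S):.2f}")
    print(f"ms/request (mean):   {mean_ms_per_request(REQUESTS, ELAPSED_S, CONCURRENCY):.3f}")
    print(f"ms/request (all):    {mean_ms_per_request(REQUESTS, ELAPSED_S):.4f}")
    print(f"KB/s received:       {transfer_kbytes_per_second(TOTAL_BYTES, ELAPSED_S):.2f}")
    print(f"Mbit/s received:     {megabits_per_second(TOTAL_BYTES, ELAPSED_S):.1f}")
    print(f"header bytes/reply:  {(TOTAL_BYTES - BODY_BYTES) // REQUESTS}")
```

For this run the received traffic works out to roughly 35 Mbit/s, with 192 bytes of headers accompanying each 100-byte body.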
Not that more fuel needs to be added to this discussion, but Falcon, the Chicago Mercantile Exchange’s Clearing Engine, is a Java based system. This is the same matching engine which clears all Globex trades. If that is not a real system, I am not clear what is. Falcon supplanted Eagle which was C based. My understanding is that one of the most critical aspects of the design is 1st point.
You know I kinda thought zettaset now holds the record…