Best Alternatives to Hadoop??? Please help for quant analytics

(Last Updated On: March 5, 2012)




Check out HPCC http://hpccsystems.com/download


More detailed information on the comparison is here: http://hpccsystems.com/Why-HPCC/HPCC-vs-Hadoop/Superior-to-Hadoop



Are you looking for Hadoop alternatives, MapReduce alternatives, or alternative big data analytics solutions?
Sure, there are alternatives. There are various Hadoop distributions: Cloudera, MapR, Brisk, EMC Hadoop Appliance, Oracle Big Data Appliance. There are various MPP databases: Aster, Oracle Exadata, EMC Greenplum, HP Vertica, MS SQL 2012 and many more. And there are many other solutions: RainStor, Dryad, etc.
A whole list of open source ones is here: http://www.quora.com/What-are-some-promising-open-source-alternatives-to-Hadoop-MapReduce-for-map-reduce



it really depends on what you are trying to do. Try to match the compute problem to the best




If the issue is large data analysis and/ or visual representation – take a look.

Not many. In addition to the above look at Apache Cassandra and MongoDB…



Maybe a bit more context to the question would help.
For example, if it’s just a normal data question, then DB2 with its massively parallel architecture may be best suited. If it’s ETML, then maybe Information Server’s DataStage with its parallel framework might serve you better. You can call a program from inside DataStage too. I help one airline with a 136-node application written in the parallel Orchestrate framework.



For data processing you can also try GridGain (http://www.gridgain.com), JPPF (http://www.jppf.org) and Spark (http://spark-project.org), which are distributed processing frameworks that go beyond MapReduce. On the other hand, be aware that not all of these frameworks may benefit automatically from data locality awareness (i.e., being able to move code to where the data is).

Of course, this would be for easily parallelizable tasks. If that is not the case, maybe you should try instead MPI solutions.
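To illustrate what "easily parallelizable" means here, below is a minimal sketch of the map/reduce pattern on a quant-style task: the mean and variance of a return series split into chunks. It is only an illustration in plain Python, not the API of any framework named above. The function names are made up for this example, and a thread pool stands in for what would be a cluster of workers in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(returns):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(returns), sum(returns), sum(r * r for r in returns))


def combine(a, b):
    """Reduce step: merge two partial summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(chunks, workers=4):
    """Population mean and variance over all chunks taken together."""
    # Each chunk is summarized independently, so the map step can run
    # on any number of workers without coordination.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, s, ss = reduce(combine, partials, (0, 0.0, 0.0))
    if n == 0:
        raise ValueError("no data")
    mean = s / n
    return mean, ss / n - mean * mean


if __name__ == "__main__":
    print(mean_and_variance([[1.0, 2.0], [3.0, 4.0]]))
```

The task fits because each chunk's summary depends only on that chunk and the partial results merge with a simple associative operation. Problems where workers must exchange data during the computation are the ones better served by MPI-style solutions.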

Hope it helps!


I really don’t see why Hadoop is used by anyone. Don’t be fooled by the argument that “Google or Facebook is using it, so it must be great for us to use.” On our web site, http://www.VelocityDB.com, we have a comparison with Hadoop, where Hadoop takes 13,140 seconds to find all possible triangles given 86,220,856 edges (4,846,609 nodes), while VelocityDB completes the same job in 42 seconds. C# has Parallel.ForEach, which is simple to use compared to MapReduce. See: http://www.velocitydb.com/Compare.aspx#triangles
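For readers unfamiliar with the triangle benchmark mentioned above, here is a minimal single-machine sketch of counting triangles in an undirected graph given as an edge list. It is an assumption-laden illustration only: it is neither the VelocityDB nor the Hadoop implementation from the linked comparison, it does no parallel work, and the function name is made up for this example.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # A triangle u < v < w exists for every common neighbor w
                # larger than v; the ordering counts each triangle once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


if __name__ == "__main__":
    print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

Node identifiers only need to be comparable with each other. The work per edge is a set intersection, which is why the job spreads naturally over edges in either a parallel loop or a MapReduce job.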



Again as suggested above, it depends on what you want to use it for. I know of at least 3 companies using it:
* eBay – http://www.ebaytechblog.com/2010/10/29/hadoop-the-power-of-the-elephant/
* PointInside – Large scale data analytics
* Amazon.com – Data analytics on the cloud (AWS)
* An online music subscription company – In their music recommendations engine

If you look at some of the Hadoop job openings that are out there, they will also give you some of the use cases for which it is being deployed. Clearly there is MASSIVE momentum behind Hadoop, since it runs on commodity hardware and it is scalable, with many successful implementations.

All the best.



Totally agree, it always depends on what you want to use it for.
If you are looking for “near real-time” response times, i.e. sub-second response times on large data as well as continuous import of new data and very high query throughput, then you might want to take a look at our product called ParStream.

It is not MapReduce like Hadoop, which is great for ETL of unstructured data.
It is built from scratch in C++ and uses a novel high-performance compressed index which is columnar and bitmap based, i.e. flexible for querying and extremely fast because it can be processed massively in parallel. The secret sauce: the index can be analyzed in compressed format, i.e. no decompression is required. That makes it really fast.
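As a rough illustration of what a bitmap index over a column looks like, here is a minimal sketch in Python. It is not ParStream's implementation: the bitmaps here are plain uncompressed integers, and the function names are hypothetical. It only shows why such an index is flexible for querying, since filters on different columns combine with bitwise AND and OR.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    country = build_bitmap_index(["US", "DE", "US", "FR"])
    side = build_bitmap_index(["buy", "buy", "sell", "buy"])
    # Rows where country is US AND side is buy.
    print(rows_of(country["US"] & side["buy"]))
    # Rows where country is US OR FR.
    print(rows_of(country["US"] | country["FR"]))
```

A production index would store the bitmaps in a compressed encoding and evaluate the same AND/OR operations on that encoding directly, which is the property the post describes.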

Thanks for asking and happy to get in contact.



Every approach to data processing has advantages and disadvantages. You need to tailor your data processing solution to the problem at hand for best results. Hadoop allows you to avoid pre-processing costs (data cleansing, structuring, loading) and process the data where it resides. In addition, unlike most database systems, you don’t have to think very hard about scaling up. Scaling up with a database system is a very hard problem. The reason Hadoop is getting so much press is that it very neatly solves the scale-up problem, and this is no small feat. But I agree it is not a replacement for every database problem. For anyone else promising scale-up, I would examine their claims closely: are there large-scale systems in production that actually show it works?



If real-time analytics is your goal, then take a look at SQLstream. SQLstream is not a database, though it allows real-time event processing: it performs ETL on “Big Data” from an unlimited number of diverse sources, runs the data against standing queries written in standard SQL, enhances the data, and delivers BI, alerts and analysis to any device in any format, all on the fly.
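To make the idea of a standing query concrete: instead of storing data and querying it later, the query stays in place and emits an updated result as each event arrives. The sketch below shows that pattern as a sliding-window average in plain Python. It is not SQLstream code or SQL, and the function name is made up for this example.

```python
from collections import deque


def moving_average(events, window):
    """Yield the mean of the most recent `window` values after each new event."""
    recent = deque()
    total = 0.0
    for value in events:
        recent.append(value)
        total += value
        # Drop the oldest value once the window is full.
        if len(recent) > window:
            total -= recent.popleft()
        yield total / len(recent)


if __name__ == "__main__":
    ticks = [1, 2, 3, 4]
    for avg in moving_average(ticks, window=2):
        print(avg)
```

Because `events` can be any iterator, the same function works on an unbounded feed, and only the last `window` values are ever held in memory.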

If you need to persist the data to storage, it arrives fully cooked with little to no latency, and you can push the stored data through with the real-time data from the sources to get historic comparatives and predictive analytics on the fly, in real time as well.

Additionally, SQLstream is “stand-alone” for seamless integration with current systems in place and has unlimited scalability.



As several others have stated, it depends on what you are trying to do. As Mats mentioned above, for many advanced analytics types of processing, trying to do this in Hadoop will take longer to program and will not perform nearly as fast as a platform designed for it. At ParAccel we’ve designed a columnar database from the ground up to do analytics extremely fast. We have baked in over 500 advanced analytics functions and also have a way to integrate with Hadoop, where we can run MapReduce jobs from within SQL and bring the result set back in parallel, giving you the best of both worlds where you use the right tool for the job at hand. That’s why Amazon has invested in ParAccel and MicroStrategy runs their Wisdom app on ParAccel.

Check it out: http://www.paraccel.com/blog/tag/paraccel/



Decooda has developed a platform for its social media solutions that supports structured and unstructured data in both streaming and batch modes. It’s also deployable in the cloud and as an appliance. We will be launching it publicly soon as a separate product, but it is available for qualified clients now. A couple of high points: threadless, shared nothing, pure linear scalability, database agnostic, flexible, simple, fault-tolerant, smaller commodity hardware footprint…much more. To be clear, we are jumping in front of the Hadoop train; we refuse to perpetuate the big lie that Hadoop is “the” answer to big data requirements.


Unlike Hadoop, which is geared more toward batch-oriented processing, MarkLogic is an enterprise-class real-time database for data at the scale that Hadoop can process. If you’re looking for a transactional data store, lots of enterprise features, and the ability to do structured and textual data querying, this might be worth a look.

Website: http://www.marklogic.com



Not sure who is saying that Hadoop is “the” answer. It is a good one, and getting even better, when the problem you are trying to solve matches its paradigm. But it is certainly not the only one.



You are correct, and I didn’t intend to dis other market participants. Of course, there are options. However, if you look at the primary “brand” that big data discussions revolve around, generally speaking, it’s Hadoop. Most certainly there is a significant amount of disinformation about the basics, like what Hadoop is: is it a platform, a database, MapReduce, etc… If you look at the investment community it’s even more absolute, it’s pretty much…Hadoop, Hadoop, Hadoop…

I’d like to chat with you when it might be convenient for you…



One other comment. We love GPFS…for all sorts of reasons. Trying to strike up a conversation with Satish Gupta on the proliferation of GPFS…

