
Surprise! I thought Matlab Coder toolbox C++ Hadoop Pipes would be a good option vs Java, Hadoop, and R

(Last Updated On: May 8, 2012)


Another option is to drop R and replace it with C++ code generated from Matlab using the Coder toolbox, then run that code through Hadoop Pipes. This is just an idea; whether it works depends on the kind of C++ code Matlab generates. It does eliminate the need to learn a language like R, which could be a huge battle given some undocumented packages. That seems to be the big gripe about R.

Here are some tutorial links on C++ and Hadoop Pipes:

http://developer.yahoo.com/hadoop/tutorial/module4.html

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html

But this led me to this link:

http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_–_Running_C%2B%2B_Programs_on_Hadoop

The wordcount program in native Java, in Python streaming mode and in C++ pipes mode is run on 6 books from the Gutenberg project:

Damn! C++ Hadoop Pipes is more than twice as slow as Java here (about 2.4 times the run time). I can never win.

Method    Real time        Ratio to Java
Java      2 min 15.7 s     1.0
C++       5 min 26 s       0.416
Python    12 min 46.5 s    0.177
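
To sanity check the last column, here is a quick sketch in Python that recomputes it from the timings in the table. I am assuming "Ratio to Java" means the Java time divided by each method's time, since that reproduces the published numbers. The function names are my own and not from the tutorial.

```python
# Timings copied from the table above, converted to seconds.
timings = {
    "Java": 2 * 60 + 15.7,
    "C++": 5 * 60 + 26,
    "Python": 12 * 60 + 46.5,
}


def ratio_to_java(times):
    """Java time divided by each method's time (assumed meaning of the column)."""
    java = times["Java"]
    return {name: round(java / secs, 3) for name, secs in times.items()}


def slowdown_vs_java(times):
    """Each method's time divided by the Java time (how many times slower)."""
    java = times["Java"]
    return {name: round(secs / java, 2) for name, secs in times.items()}


ratios = ratio_to_java(timings)
slowdowns = slowdown_vs_java(timings)
for name in timings:
    print(f"{name}: ratio {ratios[name]}, {slowdowns[name]}x the Java time")
```

The inverse of the ratio is the easier number to read: C++ Pipes took about 2.4 times as long as Java on this job, and Python streaming about 5.65 times as long.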

 
