Tag Archives: Hadoop

Intro to R statistical programming and R with Hadoop

Intro to R statistical programming
This is an introduction by Ram of Dawn Analytics. It was presented to a Meetup group at meetup.com/
File can be found at
Free intro to R statistical programming, with a complete download of the Microsoft PowerPoint slides and accompanying R script files #rstats #free


How to parallelize with R and Hadoop tonight! Complete ARIMA source code strategy walkthrough at the online Meetup on Oct 23!

Hi there

Join Ram Venkat tonight at 7 PM Eastern Daylight Time to learn how he uses Hadoop and R for his parallel processing with Python. This is on tonight via my GoToMeeting online virtual meeting. Login details:

1.  Please join my meeting, Monday, October 15, 2012 at 7:00 PM Eastern Daylight Time.

2.  Use your microphone and speakers (VoIP) – a headset is recommended.  Or, call in using your telephone.

Dial +1 (647) 497-9373
Access Code: 275-963-877
Audio PIN: Shown after joining the meeting

Meeting ID: 275-963-877

Also, another Meetup is slated for North York, Ont. on Monday 10/22 at 7 PM EST.


Lastly, another Premium Membership Meetup is slated for Tues 10/23 on a complete walkthrough of my ARIMA modelling R script. It includes fast data capture as well as a function for automatic best fit.

–> Join now to get access to this Oct 23 event! <–

Got a question? Let me know.
Thanks, Bryan


Online Meetup for Hadoop and R for Parallel Processing


Hadoop has become so popular as a parallel processing infrastructure that it is synonymous with ‘big data’ today. This presentation introduces Hadoop and shows how to execute R routines with it.

Ram Venkat is an analytics consultant in the Toronto area with a focus on R, Python, Hadoop and MongoDB technologies. His statistical interests include Customer Analytics, with emphasis on Opinions, Trends, Associations and Clustering.



For R connectivity, using NOSQL options for clustering and parallelization using Redis, Cassandra, Couch, MongoDB, MYSQL, Hadoop with HBase


I have completed my R source code walkthroughs of 14 popular forecasting models for my membership. Now I am focusing on my cluster to speed up the simulations of the algorithms. As a result, it comes down to how R talks to the popular NoSQL options out there. I seem to have narrowed it down to MongoDB and Redis. There are really no decent R client code examples for Hadoop, Couch, or Cassandra. Here are some links that are making me lean towards Redis.

Comparing MongoDB and Redis, Part 1





Plus, the client coding examples for Redis are much more helpful.

Update: It looks like I am going with MongoDB, as I have three 32-bit Macs. Mongo has a 2 GB limit on 32-bit systems, but at least the machines can be used. MySQL does not support older versions of OS X, and Redis is really Linux only. Too bad on the Redis side, because it looked awesome!


Rhadoop with R and Hadoop sort of works


Finally, RHadoop is running with R and Hadoop, with the rmr Map and Reduce steps bridged, thanks to this tutorial.

These links made it happen. They came from someone who commented on my last post, which is what started this whole journey. Thanks to them.

https://github.com/jeffreybreen/tutorial-201203-big-data

https://github.com/jeffreybreen/tutorial-201203-big-data/blob/master/README


Video: http://www.youtube.com/watch?v=uCgrUU02__Q
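
For anyone new to the idea, here is a rough sketch of what the Map and Reduce steps do. It is plain Python running in a single process, not the rmr API and not Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["R talks to Hadoop", "Hadoop runs R"]  # hypothetical input
    print(word_count(sample))
```

On a real cluster, Hadoop runs many map and reduce calls in parallel on different machines and takes care of the grouping step itself. That is where the speed-up comes from.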