For R connectivity, using NoSQL options for clustering and parallelization: Redis, Cassandra, CouchDB, MongoDB, MySQL, Hadoop with HBase

I have completed my R source code walkthroughs of 14 popular forecasting models for my membership. Now I am focusing on my cluster to speed up the simulations of the algos. As a result, it all comes down to how R talks to the popular NoSQL options out there. I seem to have narrowed it down to MongoDB and Redis. There are really no decent R client code examples for Hadoop, CouchDB, or Cassandra. Here are some links that are making me lean towards Redis.
http://stackoverflow.com/questions/10696463/mongodb-with-redis

Comparing MongoDB and Redis, Part 1

http://openmymind.net/2011/5/8/Practical-NoSQL-Solving-a-Real-Problem-w-Mongo-Red/

http://www.quora.com/What-are-the-advantages-and-disadvantages-of-using-MongoDB-vs-CouchDB-vs-Cassandra-vs-Redis

http://java.dzone.com/articles/should-i-use-mongodb-couchdb

http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb

Plus, the client code examples for Redis are much more helpful.

Update: It looks like I am going with MongoDB, as I have three 32-bit Macs. MongoDB is limited to about 2 GB of data on 32-bit builds, but at least the machines can be used. MySQL does not support older versions of OS X, and Redis is, as far as I can tell, really Linux only. Too bad on the Redis side, because it looked awesome!

 

Which RDBMS or NoSQL database do you use for R? MySQL, Cassandra, HBase, MongoDB, Oracle, PostgreSQL, CouchDB, SQLite?

This R survey is kind of important. It will show a few things:

  1. Which database most R users use, regardless of whether it is commercial, open source, or NoSQL.
  2. Which database is best for R in terms of scalability and speed, depending on the requirements. These include multiple concurrent writes of market tick data from a C++ or Java application, and access by various R algorithms for analytics purposes.
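
To make the second point a bit more concrete, here is a minimal sketch of the "write ticks, then read them back for analytics" pattern. It is only an illustration under my own assumptions: it uses Python with the built-in sqlite3 module and an in-memory database instead of a real C++/Java feed and an R reader, and the table name `ticks`, the symbols, and the helper functions are all hypothetical.

```python
import sqlite3


def open_tick_store(path=":memory:"):
    """Open (or create) a tick store. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks ("
        "symbol TEXT NOT NULL, ts REAL NOT NULL, price REAL NOT NULL)"
    )
    # Index on (symbol, ts) so per-symbol reads do not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_ticks_symbol_ts ON ticks (symbol, ts)"
    )
    return conn


def write_ticks(conn, ticks):
    """Insert a batch of (symbol, timestamp, price) rows in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO ticks (symbol, ts, price) VALUES (?, ?, ?)", ticks
        )


def mean_price(conn, symbol):
    """Average price for one symbol, or None if there are no ticks for it."""
    row = conn.execute(
        "SELECT AVG(price) FROM ticks WHERE symbol = ?", (symbol,)
    ).fetchone()
    return row[0]


conn = open_tick_store()
write_ticks(conn, [("ABC", 1.0, 10.0), ("ABC", 2.0, 12.0), ("XYZ", 1.0, 5.0)])
print(mean_price(conn, "ABC"))  # 11.0
```

Whatever database wins the poll, it would have to handle the same two roles: a fast batch writer on one side and analytics queries on the other.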

Go here for the poll.

Here are some reasonable options with reasons:

MySQL

I would assume this to be the number one choice since it is open source (or at least so they say). It also covers sharding and other scalability needs through clustering. Is this something that people are using for their trading platform requirements? This includes using MySQL as a tick data repository.

PostgreSQL

Is anyone actually using this open source database for their R needs?

Oracle

This is easily the most popular commercial RDBMS for both Linux/Unix and Windows. Since Oracle has brought R into its ecosystem with a connector, I wonder if people are actually using it.

SQL Server/DB2/Sybase

I am unsure if there are any R package connectors to any of these databases. I was just curious as I am really not interested in these as a real option.

Cassandra

There seems to be no R package support for this. I once posted something about it on R-Bloggers.com and it lit up the site, which made me wonder if it is actually more popular than people think. It seems to meet the needs of both quick write and quick read access.

Redis

Now the doRedis R package looked really hot. It even showcased how to use it with a potential financial analytics system. I even saw Java sharding examples, which left me excited about the capabilities of this database.

MongoDB

Strangely, this seems to be the most popular of all. I also found various R packages which seem to support it as well.

HBase which is part of Hadoop

Eh. No real support, even from Revolution Analytics, whose R package install guides are lacking. I gave up pretty quickly on these R packages.

All other database options seem fine, but the ones listed above seem the most viable for any R user as a repository for scaling and clustering.

Go here for the poll.

http://quantlabs.net/surveys/2012/06/19/what-rdms-or-nosql-database-should-a-r-user-focus-on/