Tag Archives: BigData

Open source FREE project demos on GitHub: Matlab Coder to C++, R Hadoop file for big data #cpp #matlab #rstats #hadoop #bigdata #github

https://github.com/quantlabs

 

—

https://github.com/quantlabs/r-hadoop-for-big-data

Download free associated R open source script files for big data analysis with Hadoop and R.
These are the R script source files from Ram Venkat, from a past Meetup we held at
http://www.meetup.com/R-Matlab-Users/events/85160532/
There is also a long video and a PowerPoint presentation slide PDF with the R files at:
https://quantlabs.net/blog/2012/11/how-to-use-hadoop-and-r-for-big-data-parallel-processing-free-download-pdf/
Download the source files from
https://quantlabs.net/blog/2012/11/download-free-associated-r-open-source-script-files-for-big-data-analysis-with-hadoop-and-r-rstats-hadoop/
Bryan at QuantLabs.net

—

Open Source Code Demo of MATLAB Coder converting Hello World M script to C++ file

This is a simple demo of this powerful toolbox, showing the conversion of a MATLAB M script file to a C++ file for your trading or target platform.

YouTube video and ZIP package download at https://quantlabs.net/blog/2012/11/rading-or-hft-open-source-code-demo-of-matlab-coder-toolbox-converting-hello-world-m-script-to-c-file-free-opensource/


Open source FREE project demos on GitHub: C# Matlab NE Builder, R Hadoop file for big data #csharp #matlab #rstats #hadoop #bigdata #github

https://github.com/quantlabs

https://github.com/quantlabs/matlab_test_to_csharp


 


MongoDB and Hadoop, and BigData for quant development

Does anyone have much experience with MongoDB and interfacing its services with Hadoop?

We have just started a gradual upgrade of some of our computation-intensive software, with the intention of incorporating MongoDB and Hadoop services into the deployment, hence my curiosity.

Any suggestions would therefore be informative and instructive.

 

Some folks have reported success running MongoDB on top of MapR’s Hadoop distribution. Basically, they took advantage of MapR’s replication and snapshots instead of using MongoDB’s replication, since MapR’s re-synchronization after a node failure is very good. They continued to use MongoDB’s sharding.
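For the interfacing itself, the usual bridge is the open-source mongo-hadoop connector, which lets a MapReduce job read documents directly out of a MongoDB collection and write results back. Below is a minimal Java driver sketch of that wiring, not from the thread above: the MongoInputFormat / MongoOutputFormat classes and the mongo.input.uri / mongo.output.uri properties come from the connector, while the host, database, and collection names are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoHadoopDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read input documents straight from a MongoDB collection
        // (hypothetical host/database/collection names)
        conf.set("mongo.input.uri", "mongodb://localhost:27017/quant.ticks");
        // Write job output back into another collection
        conf.set("mongo.output.uri", "mongodb://localhost:27017/quant.results");

        Job job = Job.getInstance(conf, "mongo-hadoop-demo");
        job.setJarByClass(MongoHadoopDemo.class);
        // The connector feeds the mapper (Object id, BSONObject document) pairs
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        // A real job would set Mapper/Reducer classes here; they are
        // omitted since this sketch only shows the connector wiring.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that this job-level wiring is independent of the storage-level choices discussed above; whether replication is handled by Mongo or by MapR, the connector only cares about the collection URIs.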

 


BigData: How data is distributed in Hadoop HDFS (Hadoop Distributed File System)
The Apache Hadoop framework uses Google’s MapReduce model and Google File System concepts. In Hadoop, data is split into chunks and distributed across all the nodes in a cluster. This concept is inherited from the Google File System; in Hadoop it is called HDFS (Hadoop Distributed File System). While loading data into HDFS, Hadoop starts distributing it across the nodes based on a few parameters. Here we will look at two important parameters to consider for better performance.

1. Chunk size (dfs.block.size, in bytes) – 64 MB, 128 MB, 256 MB, or 512 MB. It is preferable to choose the size based on the amount of input data to be processed and the power of each node.

2. Replication factor (dfs.replication) – 3 by default, meaning each chunk of data will be available on 3 nodes, i.e. 3 copies around the cluster. If there is a high chance of node failure, it is better to increase the replication factor. Replication is needed because if a node in the cluster fails, the data on that node could not otherwise be processed, so you would not get a complete result.

For example, to process 1 TB of data on a 1,000-node cluster: 1 TB (1,024 GB) × 3 (the replication factor) = 3,072 GB of data spread across the 1,000 nodes. We can specify the chunk size based on each node’s capability: if a node has more than 2 GB of memory (RAM), we can specify a 512 MB chunk size. A node’s TaskTracker processes one chunk at a time; with a dual-core processor, one node can process 2 chunks at the same time. So choose the chunk size based on the memory available on each node. It is also recommended not to use the NameNode (master) as a DataNode too, or that single node will be overloaded with the work of both the TaskTracker and the JobTracker.
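To make these two parameters concrete, here is a minimal Java sketch (not from the original post) showing how they can be set programmatically: cluster-wide defaults on the Hadoop Configuration object, plus HDFS’s per-file override through FileSystem.create. The HDFS path, class name, and exact sizes are hypothetical placeholders; in production these defaults normally live in hdfs-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsChunkDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide defaults (normally set in hdfs-site.xml)
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB chunks
        conf.setInt("dfs.replication", 3);                  // 3 copies of each chunk

        FileSystem fs = FileSystem.get(conf);

        // Per-file override: write one file with 512 MB chunks, replication 3
        FSDataOutputStream out = fs.create(
                new Path("/data/input/ticks.csv"), // hypothetical HDFS path
                true,                              // overwrite if it exists
                4096,                              // I/O buffer size in bytes
                (short) 3,                         // replication factor
                512L * 1024 * 1024);               // chunk (block) size in bytes
        out.writeBytes("hello hdfs\n");
        out.close();
        fs.close();
    }
}

The NameNode decides where each of the 3 copies of a chunk lands, which is why the per-node totals come out uneven, as discussed next.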

Will the data be distributed equally across the Hadoop cluster’s nodes?

No, it is not distributed evenly, like 3 GB on each node: one node may hold 8 GB of data, another 5 GB, another 1 GB, and so on. But a node always holds complete chunks; a chunk will not be split half here and half there.

In upcoming posts we will look at more Hadoop parameters for improving cluster performance.
http://cloud-computation.blogspot.com/2011/07/bigdata-how-data-distributed-in-hadoop.html
