Lowest-latency data ingestion into Hadoop for quant analytics?
I have a customer looking for low-latency data ingestion into Hadoop. The customer wants to ingest 1 million records per second. Can someone guide me on which tools or technologies can be used for this kind of data ingestion into Hadoop?
There are a number of solutions for loading data into HDFS: Flume, Scribe, and Chukwa. Some teams load data into HBase as fast storage. If the data comes from a relational database, there is Sqoop.
As usual, the devil is in the details. What is the size of a record? What are the latency requirements (milliseconds, seconds, minutes)? How many data sources are there? Is it a continuous data stream or a batch load?
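To see why record size and latency matter so much, here is a rough back-of-envelope sketch. The 200-byte record size and the 10,000-record batch size are hypothetical placeholders, not numbers from the question; plug in the customer's real values. It computes the raw payload rate and, for a writer that buffers records into batches, how long the first record in a batch waits before the batch is full. It ignores serialization overhead and HDFS replication (the default replication factor of 3 would roughly triple the bytes actually written to disk).

```python
# Back-of-envelope sizing for a Hadoop ingestion pipeline.
# The record size and batch size used below are hypothetical placeholders.

SECONDS_PER_DAY = 86_400


def bytes_per_second(records_per_sec: int, bytes_per_record: int) -> int:
    """Raw payload rate, ignoring serialization and replication overhead."""
    return records_per_sec * bytes_per_record


def bytes_per_day(records_per_sec: int, bytes_per_record: int) -> int:
    """Raw payload volume accumulated over one day of continuous ingestion."""
    return bytes_per_second(records_per_sec, bytes_per_record) * SECONDS_PER_DAY


def batch_fill_delay_ms(batch_size: int, records_per_sec: int) -> float:
    """Time the first record in a batch waits until the batch is full."""
    return batch_size / records_per_sec * 1000


if __name__ == "__main__":
    rate = 1_000_000    # target rate from the question
    record_size = 200   # hypothetical: 200 bytes per record
    batch = 10_000      # hypothetical writer batch size
    print(f"{bytes_per_second(rate, record_size) / 1e6:.0f} MB/s")
    print(f"{bytes_per_day(rate, record_size) / 1e12:.2f} TB/day")
    print(f"{batch_fill_delay_ms(batch, rate):.1f} ms per batch")
```

With these assumed numbers that works out to about 200 MB/s, roughly 17.28 TB per day, and about 10 ms of buffering per batch. Larger batches raise throughput but add latency, which is why the latency requirement drives the choice of tool.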