What is Big Data, Low Latency for quant development?

(Last Updated On: December 7, 2011)

I thought I would throw out my definition just to get things started up here again in light of some recent ramblings…

I define Big Data this way:

1) You’ve got Big Data when you can’t analyze it in time to make a meaningful decision – i.e., it’s no good figuring out the terrorists are going to strike after the fact.

and

2) You’ve got Big Data when you have to take into account the physicality of the data – i.e., this is often referred to as gravity, moving code to data, etc. We see this in Hadoop and other scatter/gather, map/reduce, etc. algorithms.
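To make the scatter/gather idea concrete, here is a minimal Java sketch (no Hadoop dependency, and the trade-tag data is invented for illustration): each partition stands in for a data block living on a separate node, the "map" step counts locally as if the code had been shipped to the data, and the "gather" step merges the partial results.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Minimal scatter/gather sketch: each partition is "mapped" locally
// (as if the code had been shipped to the node holding that data),
// then the partial results are "reduced" into one answer.
public class ScatterGatherSketch {
    public static void main(String[] args) throws Exception {
        // Pretend these are data blocks living on different nodes.
        List<List<String>> partitions = List.of(
            List.of("buy", "sell", "buy"),
            List.of("sell", "sell", "hold"),
            List.of("buy", "hold", "buy"));

        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());

        // Scatter: map each partition to a partial count in parallel.
        List<Future<Map<String, Long>>> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add(pool.submit(() ->
                partition.stream().collect(
                    Collectors.groupingBy(w -> w, Collectors.counting()))));
        }

        // Gather/reduce: merge the partial counts into a single result.
        Map<String, Long> totals = new HashMap<>();
        for (Future<Map<String, Long>> f : partials) {
            f.get().forEach((k, v) -> totals.merge(k, v, Long::sum));
        }

        pool.shutdown();
        System.out.println(totals); // e.g. {buy=4, sell=3, hold=2}
    }
}
```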

I define Low Latency as event driven – and just as acceptable quality is defined by the use case, so is acceptable latency. Because, as in point 1 above, you’ve got to be able to make the decision while it still has value.

Some decisions have more value when you take more latency out of them, some don’t – it’s relative.
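One small sketch of what "make the decision while it still has value" can look like in code, assuming a hypothetical event type and a made-up 5 ms budget: every event carries its creation time, and the handler simply skips anything that is already stale for its use case.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: decisions lose value with latency, so the handler enforces a
// per-use-case latency budget and skips events that are already stale.
public class LatencyBudgetHandler {
    // Hypothetical budget; each use case would choose its own.
    private static final Duration BUDGET = Duration.ofMillis(5);

    record MarketEvent(String symbol, double price, Instant createdAt) {}

    void onEvent(MarketEvent event) {
        Duration age = Duration.between(event.createdAt(), Instant.now());
        if (age.compareTo(BUDGET) > 0) {
            // Too old: acting on it no longer has (enough) value.
            return;
        }
        decide(event);
    }

    private void decide(MarketEvent event) {
        // Placeholder for the actual decision logic.
        System.out.printf("act on %s @ %.2f (within budget)%n",
                event.symbol(), event.price());
    }
}
```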

But central to all of this is the ability to scale – billions of small messages, millions of big messages, millions of users, etc.

What are your definitions of Big Data, Low Latency?

 

==

May I suggest the following variants:

1) You’ve got Big Data when data is generated faster than it can be processed using commodity solutions. (I don’t think you have Big Data if you can solve your problem with MySQL on a commodity server.) Further, there are computations other than just analysis, e.g. transaction processing.

Whilst Low Latency is event driven, it sometimes relates directly to both direct and indirect use cases upon the Big Data. E.g. in search you need both to web crawl fast enough and to provide a low-latency search interface.

Due to scale, Big Data – Low Latency systems typically need to more strongly consider:
a) Economics – otherwise you won’t be able to afford “Big”.

b) Non-Functional Constraints. Privacy, regulation, availability, and transactionality become harder with Big Data. (E.g. Big Data may often imply multi-jurisdiction, which then kicks off issues of network topology, where data is stored, privacy laws for different states, etc.)

c) Scale “Break Points”. As a solution gets bigger, approaches that worked OK at one scale stop working. For example, if a (J)VM is being used to process data in memory to meet latency requirements, then at some level of scale “GC pause” becomes a concern. Invariably these break points have implications for latency, e.g. when you can’t process everything in memory you may get network and I/O issues arising. (One common mitigation is sketched after this list.)

d) Mechanical Sympathy – designing systems around how hardware actually works. The blogs from the authors of the Disruptor concurrency framework are illustrative of the sort of care one must take to process “Big Data, Low Latency”.
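To make (c) and (d) more concrete, here is a minimal, single-writer Java sketch of one common tactic: preallocate a power-of-two ring of mutable event objects so the hot path neither allocates (no garbage in the steady state, so no GC pauses from it) nor chases pointers across the heap (a contiguous array is cache-friendly). This is only the flavour of the approach the Disruptor takes, not its actual API, and it omits consumer coordination entirely.

```java
// Sketch of GC-avoidance and mechanical sympathy: all event objects are
// allocated once up front; the hot path only mutates them in place, so the
// steady state creates no garbage and walks a contiguous array.
public class PreallocatedRingBuffer {
    public static final class Slot {
        long sequence;
        double price;   // fields are overwritten, never reallocated
    }

    private final Slot[] slots;
    private final int mask;      // power-of-two size -> cheap index masking
    private long nextSequence = 0;

    public PreallocatedRingBuffer(int sizePowerOfTwo) {
        slots = new Slot[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Slot();   // one-time allocation, reused forever
        }
    }

    // Claim the next slot and overwrite it in place (single-writer sketch;
    // a real implementation also needs consumer coordination before reuse).
    public void publish(double price) {
        Slot slot = slots[(int) (nextSequence & mask)];
        slot.sequence = nextSequence;
        slot.price = price;
        nextSequence++;
    }
}
```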

 

==

What is a real world example of a Big Data, Low Latency problem?

 

==

I’ll list a few here, and we can expand on them as we go.

1) looking for irregular trading practices
2) predicting power grid failures
3) identifying potential terrorist activities
4) behavioral re-targeting (internet advertising)
5) semantic search

For all of these, think event-driven and at scale. You have to receive, store, and process the data all in enough time for the recommended next best action to have as much value as possible. Increasing the value of the result is directly linked to low latency.
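As a toy illustration of the event-driven, incremental shape these problems tend to take (the window size, threshold and crude "irregular activity" heuristic are all invented), here is a Java sketch where each incoming message updates a rolling average in O(1), and anything that deviates sharply triggers the next best action immediately rather than in a later batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy event-driven sketch: maintain a rolling average incrementally so each
// incoming message is handled in O(1), and flag values that deviate sharply
// from it (a crude stand-in for an "irregular activity" signal).
public class SlidingWindowMonitor {
    private static final int WINDOW = 1_000;       // illustrative window size
    private static final double THRESHOLD = 3.0;   // illustrative deviation factor

    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    public void onEvent(double value) {
        if (window.size() == WINDOW) {
            sum -= window.removeFirst();           // drop the oldest sample
        }
        window.addLast(value);
        sum += value;

        double mean = sum / window.size();
        if (window.size() == WINDOW
                && Math.abs(value) > THRESHOLD * Math.abs(mean)) {
            act(value, mean);                      // the "next best action"
        }
    }

    private void act(double value, double mean) {
        System.out.printf("flag: value %.2f vs rolling mean %.2f%n", value, mean);
    }
}
```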

(I’m sure others have other use cases – feel free to add please)

 

==

Ok, now I see what you are driving at. This looks like what we used to call real-time decision support. I am pretty sure SGI are doing some of 2, 3 and 4. I will see if I can find out for sure.

 

==

6. Mass computer vision.
7. Retail financial transaction processing (incl. trading a la LMAX, gambling, tolling, etc.)
8. M2M sensor monitoring
9. Large scale GPS tracking

We have found many of these problems are not well suited to traditional supercomputing approaches. Earlier this year we did a performance comparison between our infrastructure and a current supercomputer Roland may know about. We were 100x faster and 7000x cheaper, primarily due to our higher memory bandwidth.

 

 
