Quant development: which is the best language to code in (C, C++ or Java) for an app where speed and low latency are important?
FPGA stands for Field Programmable Gate Array. You can use the FPGA gates to create serial or parallel logic, so there is nothing inherently parallel about FPGA design. Perhaps you’re thinking of CUDA? Many large FPGAs have sufficient capacity to embed an ARM or PPC CPU as powerful as a desktop CPU and still have capacity to spare.
As a sidebar, the important thing about the FPGA approach is that it gives you fine control over what is done in software versus hardware. To use the FPGA to its fullest potential you need firmware engineers who are comfortable hopping back and forth between FPGA (VHDL or Verilog) and traditional software development. As you can imagine, these individuals are hard to find, especially given the financial problem domain knowledge requirements.
I am relatively new to finance, but when I worked in telecom we did a lot with FPGAs. I never got to design the VHDL because I was a software guy and the engineering org was heavily “siloed” (separate hardware and software managerial reporting structures). The silo approach makes sense given the labor market realities and the complexities of mastering either FPGA or software design. But it lengthens the design cycle, since now you spend time negotiating with the hardware guys to get the important functions embedded. I have an EE background so I was pretty good at explaining to them what I needed. When you have a system with an optimized balance of hardware and software it’s a thing of beauty. Much better than spending CPU cycles to cover up the flaws in the hardware design!
For example, in a high frequency trading application you could embed the FIX packet processing in the FPGA and have the FPGA generate a PCI interrupt once certain interesting packets arrive or, better yet, DMA the interesting packets into a circular buffer in host memory. These are the sorts of tricks we did with FPGAs in telecom.
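To make the circular buffer idea concrete, here is a rough software-only sketch in Python. Nothing here is real FPGA or driver code: `RingBuffer`, `fpga_filter`, and the `INTERESTING` set are hypothetical names, and the "FPGA" is just a function that filters packets by FIX message type before writing them into a fixed-size ring that the host application drains.

```python
from collections import namedtuple

Packet = namedtuple("Packet", ["msg_type", "payload"])

class RingBuffer:
    """Fixed-size single-producer/single-consumer ring.

    Stands in for a DMA region in host memory (assumption: one writer, one reader).
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next slot the producer (the "FPGA") writes
        self.tail = 0  # next slot the consumer (host app) reads

    def push(self, item):
        next_head = (self.head + 1) % self.capacity
        if next_head == self.tail:
            return False  # full: refuse rather than overwrite unread data
        self.slots[self.head] = item
        self.head = next_head
        return True

    def pop(self):
        if self.tail == self.head:
            return None  # empty
        item = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.capacity
        return item

# Hypothetical filter: FIX MsgType 8 (ExecutionReport) and 9 (OrderCancelReject).
INTERESTING = {"8", "9"}

def fpga_filter(packets, ring):
    """Simulate hardware-side filtering: only interesting packets reach host memory."""
    delivered = 0
    for p in packets:
        if p.msg_type in INTERESTING and ring.push(p):
            delivered += 1
    return delivered
```

One slot is deliberately left unused so that "full" and "empty" can be told apart without a separate counter, which is a common choice when producer and consumer must not share a write location.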
The problem is there seems to be a lot of hype about FPGA. Actually not just FPGA but competing systems in the low latency space. Various systems are quoting figures of 1/3/5 microseconds. Comparing these is not straightforward, as some systems are stat arb and some OMS. Even within a category the functionality will be different, so comparisons need to factor this in.
I saw an independent study on the web which implemented a solution in software and on an FPGA. The serial solution ran four times faster in software than on the FPGA. For the parallel solution, when the concurrent threads exceeded the number of available cores, the FPGA solution was faster.
For an investment bank, a low latency DMA (direct market access)/program trading solution needs to trade across multiple exchanges. In Europe there are large differences in behaviour across exchanges. The trading system needs to normalise these differences as well as support custom transformations, franchise protection checks, risk checks (client position limits, restricted stocks etc.), and exchange checks (e.g. price tolerance, tick scale etc.). Normalisation requires full order management and correct handling of trade busts/cancels as well as amends and IOCs. This is a complex problem: do you (can you?) put all this in an FPGA card? Do you use some hybrid solution with an FPGA NIC? With old NICs and app-to-wire TCP times greater than 15ms I see the potential. But with O/S and BIOS tuning and the latest NICs like Solarflare with app-to-wire times around 2 microseconds (and Mellanox quoting 1 usec), what's the gain?
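To give a feel for the kind of checks listed above, here is a minimal Python sketch of a pre-trade validation step. It is an illustration only: the `Order` type, the `check_order` function, and all of its parameters are made up for this example, and a real system would carry per-exchange rules and far more state.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Order:
    client: str
    symbol: str
    side: str       # "BUY" or "SELL"
    qty: int
    price: Decimal  # Decimal avoids float rounding in the tick check

def check_order(order, *, last_price, tick_size, tolerance, position, limit, restricted):
    """Return a list of rejection reasons; an empty list means the order passes.

    All parameters are hypothetical inputs for this sketch:
      tolerance - max allowed fractional deviation from last_price (e.g. 0.05 = 5%)
      position  - client's current signed position in this symbol
      limit     - max absolute position the client may hold
    """
    reasons = []
    if order.symbol in restricted:
        reasons.append("restricted stock")
    if order.price % tick_size != 0:
        reasons.append("price not on tick")
    if abs(order.price - last_price) > tolerance * last_price:
        reasons.append("price outside tolerance")
    signed_qty = order.qty if order.side == "BUY" else -order.qty
    if abs(position + signed_qty) > limit:
        reasons.append("position limit breached")
    return reasons
```

Each check on its own is trivial. The difficulty raised above is that the rules differ per exchange and change often, which is what makes it hard to freeze them into hardware.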
An object-oriented system using Java may have 70000 lines of app code with 30000 lines of unit test code. While the unit tests don't make the code faster, they do provide confidence and help facilitate fast development and bug fixing. Given the complexity of the problem and the requirements for fast code changes, I believe an OO solution is best. A boundary system with multiple input and output sessions will encode/decode FIX/exchange binary messages into real objects to represent the event (e.g. NewOrderSingle). How would you use an FPGA NIC to do the event decoding/encoding for a hybrid system?
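As an illustration of what "decoding into real objects" means, here is a deliberately simplified Python sketch that turns a FIX tag=value string into a `NewOrderSingle` object. The class layout and the `decode` function are assumptions made for this example; it skips header, body length and checksum validation, and handles only a handful of standard tags (35 MsgType, 11 ClOrdID, 55 Symbol, 54 Side, 38 OrderQty, 44 Price).

```python
from dataclasses import dataclass
from decimal import Decimal

SOH = "\x01"  # FIX field delimiter

@dataclass
class NewOrderSingle:
    cl_ord_id: str
    symbol: str
    side: str
    qty: int
    price: Decimal

def decode(raw):
    """Parse a FIX tag=value message into a NewOrderSingle (sketch, no validation)."""
    fields = dict(pair.split("=", 1) for pair in raw.strip(SOH).split(SOH))
    if fields.get("35") != "D":  # MsgType D = NewOrderSingle
        raise ValueError("not a NewOrderSingle")
    return NewOrderSingle(
        cl_ord_id=fields["11"],
        symbol=fields["55"],
        side={"1": "BUY", "2": "SELL"}[fields["54"]],
        qty=int(fields["38"]),
        price=Decimal(fields["44"]),
    )
```

The open question in a hybrid design is where this boundary sits: the card could hand over raw bytes, pre-split fields, or a fixed binary layout that the host maps onto objects.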
I would expect FPGA to have lower outliers than a CPU-based solution. But the key figures are the 90th and 95th percentiles. If you have the fastest system at the 95th percentile then, all other things being equal, you have the best chance of hitting the order book and successfully trading 95% of the time.
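For anyone wanting to compare systems this way, a small Python sketch of the calculation follows. It uses the nearest-rank definition of a percentile (other definitions interpolate and give slightly different numbers); the `percentile` helper and the sample latencies are made up for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up latencies in microseconds for two hypothetical systems.
system_a = [2] * 94 + [50] * 6   # usually fast, occasional slow spikes
system_b = [3] * 100             # slightly slower but steady

for name, data in (("A", system_a), ("B", system_b)):
    print(name, "p90:", percentile(data, 90), "p95:", percentile(data, 95))
```

In this made-up data system A wins at the 90th percentile but loses at the 95th, which is why quoting a single minimum or average figure says little about how a system behaves when it matters.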
It's clear some people are building FPGA and FPGA hybrid systems. All I can say is good luck. It will be interesting to see how this evolves.
The original question for the thread is which is best for building a low latency system: C, C++, Java (or FPGA). As already stated by other feedback, all three languages can be used to build a low latency system. Which is best to use is clearly debatable!
Yes, clearly it is a very complex system, and the conversation thread has strayed very far from which language is the best. But, as was suggested before, that topic's been beaten to death, and I am enjoying this conversation, so:
I think the key to using FPGAs effectively is to understand which parts of the application are invariant and which require multiple modalities because of exchange or market rules etc. Depending on complexity, these variations can be accommodated as modes within the FPGA, or you can use separate FPGA program binaries if you can dedicate an FPGA resource to each market/exchange. What you want to avoid is a lot of hardware/software interaction, because then you lose the speed that the hardware was supposed to deliver. Another principle of optimization is to identify the “hot” code and prioritize this for embedding into the FPGA. I look at it as more of a tuning process than a once-and-done classic waterfall design. That’s why ideally you would have firmware engineers who could work on both sides, so functionality could shift from software to hardware in a seamless, frictionless way. In this way the hardware/software mix is optimized based on meeting requirements and technical analysis rather than a given team’s technical capabilities, which unfortunately is the de facto method that usually determines the mix in a real world project.
I agree with many of your points and disagree with others. I believe the latency numbers for software you mention are minimums (not even averages). STAC and Solarflare published latency tests here http://www.stacresearch.com/solarflare (free registration required). Under higher loads the latencies are much higher and maximums go into the milliseconds. I have no problem with ignoring the outliers outside of 99% or more of the tests (personally I think this is waiting for a Black Swan to hit), but what were the conditions under which the outliers occurred? Heavy trade volume is when things matter most, and software does not fare well when compared to hardware under those conditions.
I am curious to see the paper that shows the FPGA being slower than the CPU. The latest Xilinx Journal has a case study where Maxeler and JP Morgan used FPGAs to accelerate financial analytics. FPGAs do not have the overhead of processing instructions that CPUs do (there is a Microsoft Research paper for this posted in our LinkedIn group). We also have some papers which demonstrate where FPGAs can be used very well in financial applications. I do agree that a hybrid approach (FPGA, GPU, CPU) is best.
Comparing an FPGA-based NIC to a standard high performance NIC is comparing apples and oranges. The FPGA will have done a lot of processing before it hands off the data to the OS. I think this is a paradigm shift that will take time to adjust to.
From a LinkedIn group discussion