Tag Archives: speed

Is speed the problem for high frequency trading?


Over the last two centuries, technological advantages have allowed some traders to be faster than others. In the paper linked below we argue that, contrary to popular perception, speed is not the defining characteristic that sets High Frequency Trading (HFT) apart. HFT is the natural evolution of a new trading paradigm characterized by strategic decisions made in a volume-clock metric. Even if the speed advantage disappears, HFT will evolve to continue exploiting Low Frequency Trading's (LFT) structural weaknesses. However, LFT practitioners are not defenseless against HFT players, and we offer options that can help them survive and adapt to this new environment.

The Volume Clock: Insights into the High Frequency Paradigm – papers.ssrn.com
Superior paper! Worth the read.
Check the "pack-hunter" scenario on page 7. One added thought: the packs don't need to coordinate… cf. A. Krause, A. Guerdjikova, and others on CBDT, i.e. herding without following the herd.


NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!

How important is microprocessor speed in the HFT ecosystem?


It's not the speed of the individual systems that matters, but how quickly the aggregate data models come together. This is a simplistic view of a very complex set of problems, with latency factors at each layer as well as in each delivery channel and application.

Simultaneity, and the ability to react ahead of market workflows through fast telemetry, opens up a number of possibilities as well as new infrastructure-control issues around preventing infrastructure-based frauds. So yes, to some extent the speed of the HFT system is key, as is the latency of the channels over which it talks to the exchanges it deals with.

Our solution, by the way, is to provide NIST UTC services everywhere the trading platforms are operating. This makes compliance with OATS and basic risk processing in secured computing much easier to document from a forensic standpoint.

HFT will move to GPU- and/or FPGA-based computers/applications soon. CPUs cannot keep up.
—-
Agreed – that's why I already have NTP running as a standalone process on the GPU. CUDA GPUDirect rocks!

It all depends. If your network connection is not very fast, it makes no sense to invest in sophisticated solutions such as FPGAs. It is definitely better to invest your time and money in optimizing algorithms in C/C++, and possibly in a fast CPU.

A GPU is not a particularly good option in HFT. With a GPU you gain high throughput and the possibility of parallel computing, but if you want ultra-low latency, a better choice is definitely an FPGA.

do you have any links/papers that compare the benefits/drawbacks of using FPGAs vs GPUs?
—-
No, and I don't have time to look for them… or to argue that I'm right. You can just compare the speed of a single CPU core vs. a single core on the GPU. Next, consider that communication with the GPU (and its threads) consumes a lot of time, and even if the GPU is integrated with the CPU it must be managed by the chipset (like the C204 or C206 for the new Intel CPUs). Be aware that your software will run on an OS, which means access to the CPU/GPU goes through syscalls. Also remember that the OS allocates CPU time via its scheduler, so you can speed up your code if you isolate it from the OS scheduler. The FPGA has no OS and all your code is directly in hardware… so this is the fastest solution… but as I wrote at the beginning, I don't have time to explain everything…
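On isolating code from the OS scheduler: on Linux this is typically done by pinning a hot thread to a core that has been reserved with the `isolcpus=` kernel boot parameter. A minimal, Linux-specific sketch (illustrative only, not anyone's production setup):

```cpp
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to a single CPU core so the scheduler
// cannot migrate it. Combined with booting the kernel with
// isolcpus=<cpu>, this keeps all other tasks off that core.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) == 0;
}
```

The pinned core then runs the hot path with far fewer scheduler-induced latency outliers; the syscall cost the poster mentions still applies to any kernel I/O the thread performs.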

A GPU is good for HPC… and HPC != HFT 🙂

http://www.infoq.com/presentations/LMAX

Pay attention to what they say about CPU threads.

FPGAs are the fastest right now, but multi-threading in GPUs keeps improving. I think the eventual winner will still be the GPU. If the logic changes, an FPGA still has to be reconfigured. People are already selling FPGA SDKs fairly cheaply.



High-Frequency Trading Leaders Forum 2011, How Speed Traders Leverage Cutting-Edge Strategies in the Post-Flash Crash World, May 23-25, New York City

For more information about this style of trading, you might be interested in attending this May's High-Frequency Trading Leaders Forum 2011, "How Speed Traders Leverage Cutting-Edge Strategies in the Post-Flash Crash World", where a who's who of the industry will be speaking: Adam Afshar, Hyde Park Global Investments; Alexey Goz, Algo Engineering; Andrew Kumiega, IIT Institute of Technology; Christopher Willox, Fenimore Asset Management; Edgar Perez, The Speed Traders; James Leman, Westwater Corp.; Jitesh Thakkar, Edge Financial Technologies; John Netto, M3 Capital; Jonathan Kinlay, Systematic Strategies; Mike Bellafiore, SMB Capital; Milind Sharma, QuantZ Capital Management; Professor Petter Kolm, New York University; Will Mechem, Pan Alpha Trading; and William Kenney, Knox Capital; among others.

High-Frequency Trading Leaders Forum 2011 hftleadersforum.com




Best language to code (C, C++ or Java) for an app where speed and low latency are important




The problem is there seems to be a lot of hype about FPGAs. Actually, not just FPGAs but competing systems across the low-latency space. Various systems quote figures of 1/3/5 microseconds. Comparing these is not straightforward, as some systems are stat-arb and some are OMSs. Even within one category the functionality will differ, so comparisons need to factor this in.

I saw an independent study on the web which implemented a solution both in software and on an FPGA. The serial solution ran four times faster in software than on the FPGA. For the parallel solution, once the concurrent threads exceeded the number of available cores, the FPGA solution was faster.

For an investment bank, a low-latency DMA/program trading solution needs to trade across multiple exchanges. In Europe there are large differences in behaviour across exchanges. The trading system needs to normalise these differences as well as support custom transformations, franchise-protection checks, risk checks (client position limits, restricted stocks, etc.), and exchange checks (e.g. price tolerance, tick scale). Normalisation requires full order management and correct handling of trade busts/cancels as well as amends and IOCs. This is a complex problem: do you (can you?) put all this in an FPGA card? Do you use some hybrid solution with an FPGA NIC? With old NICs and app-to-wire TCP times greater than 15ms I see the potential. But with OS and BIOS tuning and the latest NICs like Solarflare, with app-to-wire times around 2 µs (and Mellanox quoting 1 µs), what's the gain?

An object-oriented system using Java may have 70,000 lines of app code with 30,000 lines of unit-test code. While the unit tests don't make the code faster, they do provide confidence and help facilitate fast development and bug fixing. Given the complexity of the problem and the requirement for fast code changes, I believe an OO solution is best. A boundary system with multiple input and output sessions will encode/decode FIX/exchange binary messages into real objects representing the event (e.g. NewOrderSingle). How would you use an FPGA NIC to do the event decoding/encoding in a hybrid system?
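To make the decode step concrete, here is a minimal sketch (in C++, not any particular production system) that splits a tag=value FIX stream on the SOH (0x01) delimiter into a field map; a real boundary system would go further and build typed event objects such as NewOrderSingle:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Minimal FIX tag=value decoder: split on the SOH (0x01) delimiter
// and return a tag -> value map. Validation, checksum handling and
// typed event construction are deliberately omitted.
std::map<int, std::string> decode_fix(const std::string& msg) {
    std::map<int, std::string> fields;
    std::size_t pos = 0;
    while (pos < msg.size()) {
        std::size_t eq = msg.find('=', pos);
        if (eq == std::string::npos) break;
        std::size_t soh = msg.find('\x01', eq);
        if (soh == std::string::npos) soh = msg.size();
        fields[std::stoi(msg.substr(pos, eq - pos))] =
            msg.substr(eq + 1, soh - eq - 1);
        pos = soh + 1;
    }
    return fields;
}
```

In the hybrid designs discussed here, exactly this kind of loop is a candidate to push into the NIC's FPGA, handing the host pre-parsed fields instead of raw bytes.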

I would expect an FPGA to have smaller outliers than a CPU-based solution. But the key figures are the 90th and 95th percentiles. If you have the fastest system at the 95th percentile then, all other things being equal, you have the best chance of hitting the order book and successfully trading 95% of the time.
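For concreteness, the percentile figures discussed in this thread can be computed from captured latency samples with the simple nearest-rank method; a small sketch:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of a set of latency samples (e.g. in
// microseconds). p is in (0, 100]. The vector is taken by value
// so the caller's sample order is preserved while we sort a copy.
double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[rank > 0 ? rank - 1 : 0];
}
```

Comparing two systems at the same percentile (90th, 95th, 99th) on the same workload is what makes the headline numbers in this thread meaningful at all.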

It's clear some people are building FPGA and FPGA-hybrid systems. All I can say is good luck. It will be interesting to see how this evolves.

The original question of the thread is which is best for building a low-latency system: C, C++ or Java (or FPGA). As already stated in other feedback, all three can be used to build a low-latency system. Which is best is clearly debatable.

Yes, clearly it is a very complex system, and the conversation thread has strayed very far from which language is best… but, as was suggested before, that topic has been beaten to death, and I am enjoying this conversation, so:

I think the key to using FPGAs effectively is to understand which parts of the application are invariant and which require multiple modalities because of exchange or market rules. Depending on complexity, these variations can be accommodated as modes within the FPGA, or you can use separate FPGA program binaries if you can dedicate an FPGA resource to each market/exchange. What you want to avoid is a lot of hardware/software interaction, because then you lose the speed the hardware was supposed to deliver. Another principle of optimization is to identify the "hot" code and prioritize it for embedding in the FPGA. I look at it as more of a tuning process than a once-and-done classic waterfall design. That's why you would ideally have firmware engineers who can work on both sides, so functionality can shift from software to hardware in a seamless, frictionless way. That way the hardware/software mix is optimized based on requirements and technical analysis rather than on a given team's technical capabilities, which unfortunately is the de facto method that usually determines the mix in a real-world project.


I agree with many of your points and disagree with others. I believe the latency numbers for software you mention are minimums (not even averages). STAC and Solarflare published latency tests here: http://www.stacresearch.com/solarflare (free registration required). Under higher loads the latencies are much higher, and the maximums go into the milliseconds. I have no problem with ignoring the outliers beyond the 99th percentile (personally I think this is waiting for a Black Swan to hit), but what were the conditions under which the outliers occurred? Heavy trade volume is when things matter most, and software does not fare well compared to hardware under those conditions.

I am curious to see the paper that shows the FPGA being slower than the CPU. The latest Xilinx Journal has a case study where Maxeler and JP Morgan used FPGAs to accelerate financial analytics. FPGAs do not have the instruction-processing overhead that CPUs do (there is a Microsoft Research paper on this posted in our LinkedIn group). We also have some papers which demonstrate where FPGAs can be used very well in financial applications. I do agree that a hybrid approach (FPGA, GPU, CPU) is best.

Comparing an FPGA-based NIC to a standard high-performance NIC is comparing apples and oranges. The FPGA will have done a lot of processing before it hands the data off to the OS. I think this is a paradigm shift that will take time to adjust to.

This has been a very useful discussion. I would like to keep it going, since FPGAs are a hot topic in the industry today. As with any technology there are always pros and cons, and I think it is important to weigh them and then decide what suits best based on the business model being pursued. Any idea of any future technology in the R&D stages that could beat out the software and FPGA models?

The latency numbers I mentioned were taken using a 10GE Corvil device with taps on the NICs.

At 125,000 order events per second on a single socket using TCP_NODELAY, the trading system's 90th-percentile internal latency is 6 µs with a mean of 5 µs. The app-to-wire time is 2.5 µs (it should be 2 µs with a single-port 10GE Solarflare). At the 99th percentile the app latency increases to 15 µs, with an app-to-wire time of 3 µs. At 1,000 order events per second the 99th percentile is 8 µs with an app-to-wire time of 2.5 µs. Note this is a full DMA FE+OM+MA trading system.

Of course, to get these timings you have to tune the BIOS, the OS (which was Red Hat 5.5) and the JVM.

Regarding the FPGA report, I will post a link when I can dig it out.

With all this talk of FPGA design, I am wondering if anyone is writing custom VHDL, or are most financial FPGA applications relying on vendor-supplied binaries with canned off-the-shelf functionality? Or are people mixing vendor-supplied IP for vanilla functionality with their proprietary "secret sauce"?

Having the design ability in house makes a big difference with FPGA designs. Modern FPGA toolchains have a place-and-route step which is actually non-deterministic, so once you finalize the VHDL "circuit" the hardware designer has to run the route and verify timing. You could end up with technically correct VHDL that produces a binary that fails on timing.

I would imagine that a financial firm that was overly reliant on external expertise would have a hard time working with FPGAs because of issues like these. But I would be curious to hear any experiences that people working on financial FPGA applications care to relate.

Another technology to be aware of is CUDA – NVIDIA's multicore GPU platform and a potential challenger to Intel's CPU hegemony.

A major issue is cumulative risk checking (position-based checking), e.g. short sell vs. loan availability, or long sell vs. settled position. As soon as you have multiple accounts trading large numbers of securities, and you add different types of booking (cash, CFD, swap-settled, etc.), the business logic becomes very complex and can't really be offloaded. Not to mention margin risk checks for derivatives trading.

A lot of solutions address this by doing minimal, stateless, compliance-required per-order checking in-band (restricted list, price range, etc.), while doing the complex cumulative risk checks in a separate system that simply monitors the flow without delaying orders. But a lot of brokers won't accept the risk of being momentarily exposed and having to chase cancels. Also, in Asia you have to load-balance across multiple market connections for all markets, so simple pass-through systems won't work.
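As a toy illustration of why these cumulative checks are stateful (and hence hard to offload to an FPGA), here is a sketch of a position-based sell check; the account fields and symbol are purely illustrative, not any broker's actual model:

```cpp
#include <string>
#include <unordered_map>

// Illustrative per-account state: a sell check must consult it, so
// the check cannot be a stateless per-order filter.
struct Account {
    std::unordered_map<std::string, long> settled;   // settled position per symbol
    std::unordered_map<std::string, long> borrowed;  // stock borrow / loan availability
};

// A short sell is only allowed up to the borrow on hand; a long
// sell only up to the settled position.
bool allow_sell(const Account& a, const std::string& sym,
                long qty, bool short_sell) {
    const auto& book = short_sell ? a.borrowed : a.settled;
    auto it = book.find(sym);
    long available = (it == book.end()) ? 0 : it->second;
    return qty <= available;
}
```

Multiply this by many accounts, booking types and margin rules and the poster's point follows: the state lives naturally in software, which is why many designs keep only stateless filters in-band.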

There are a few very sophisticated clients that load-balance themselves and want essentially naked access to the market. But the vast majority of electronic trading clients need the gateway to be more intelligent than a simple FIFO queue with some stateless filtering. All these headline latency claims are basically meaningless given the diversity of solutions in the space and the lack of transparency about what is being measured, at what rate, in what environment, and against what market.

We've looked at FPGAs, but our judgment is that iWARP gets you most of what you would get from FPGA acceleration. The core application will still need to be done in software with an OS, especially if you add replication and failover to the requirements.

On the original topic, I think Java is doable but requires a lot of work; unless you need to express your business logic in Java, you're probably better off with C++.

There's also the issue of FIX. The most demanding clients hate it and want to use the market-native protocol. But you typically have to enrich that with account/strategy info, plus any pass-back parameters (e.g. parent order IDs). So you end up with, e.g., OUCH+ or Arrowhead+, which isn't really any kind of standard, so cue broker lock-in. If you do minimal message transformation, minimal compliance risk checks, and pass messages through essentially unchanged, you can get down to single-digit microseconds, but that's a very niche solution attractive to a very small number of players. And the more generic solutions are not far behind in latency anyway, while having much richer functionality.


From a LinkedIn group discussion.



Quant development: which is the best language to code (C, C++ or JAva) for an app where speed and low latency are important


FPGA stands for Field-Programmable Gate Array. You can use the FPGA's gates to create serial or parallel logic, so there is nothing inherently parallel about FPGA design – perhaps you're thinking of CUDA? Many large FPGAs have sufficient capacity to embed an ARM or PPC CPU as powerful as a desktop CPU and still have capacity to spare.

As a sidebar, the important thing about the FPGA approach is that it gives you fine control over what is done in software versus hardware. To use the FPGA to its fullest potential you need firmware engineers who are comfortable hopping back and forth between FPGA development (VHDL or Verilog) and traditional software development. As you can imagine, these individuals are hard to find, especially given the financial problem-domain knowledge requirements.

I am relatively new to finance, but when I worked in telecom we did a lot with FPGAs. I never got to design the VHDL because I was a software guy and the engineering org was heavily "siloed" (separate hardware and software managerial reporting structures). The silo approach makes sense given labor-market realities and the complexity of mastering either FPGA or software design, but it lengthens the design cycle, since you now spend time negotiating with the hardware guys to get the important functions embedded. I have an EE background, so I was pretty good at explaining to them what I needed. When you have a system with an optimized balance of hardware and software, it's a thing of beauty – much better than spending CPU cycles to cover up flaws in the hardware design!

For example, in a high-frequency trading application you could embed the FIX packet processing in the FPGA and have the FPGA raise a PCI interrupt once certain interesting packets arrive – or, better yet, DMA the interesting packets into a circular buffer in host memory. These are the sorts of tricks we did with FPGAs in telecom.
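The circular-buffer hand-off described above can be sketched on the host side as a single-producer/single-consumer ring; here the "producer" stands in for the DMA engine, and a real driver would of course manage DMA descriptors and device-appropriate memory barriers rather than this simplified model:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer ring buffer: the device-side
// producer advances head as it deposits packets; the application
// consumes from tail. No locks; acquire/release pairs order the
// slot writes with the index updates.
template <std::size_t N>
struct PacketRing {
    std::array<std::uint64_t, N> slots{};
    std::atomic<std::size_t> head{0};  // written by producer only
    std::atomic<std::size_t> tail{0};  // written by consumer only

    bool push(std::uint64_t pkt) {
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N) return false;  // full
        slots[h % N] = pkt;
        head.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(std::uint64_t& pkt) {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;  // empty
        pkt = slots[t % N];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

The appeal of the DMA approach is exactly this shape: the application polls `pop` on a pinned core and never takes an interrupt on the hot path.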


From a LinkedIn group discussion.


Automation and speed are the key to generating millions with your trading system

Here is a conversation on reddit.com which pretty well inspired me to start thinking about blending my knowledge of technology and financial investment: the story of an indie software engineer who launched his own business and is now clearing millions. Source: http://www.reddit.com/r/IAmA/comments/9s9d7/iama_100_automated_independent_retail_trader_i/

locktight:

If you’re constantly adjusting the strategies, how can this be considered automated?
Do you recognize when it’s time to change or is this automated too?

mejalx:

Close to everything is automated, but two of my strategies rely on manual parameter fine tuning throughout the day. The rest are completely self sufficient. One thing I haven’t been able to properly implement is dynamic capital allocation to the various systems. It’s something I also do manually.

locktight:

Would your system written in Machine Code be an improvement to performance? Are quants currently operating in machine code?

thank you

mejalx:

I personally haven’t touched Assembly. I usually try to get away with the least amount of development time as possible, which means I’ll start with a high level language like python and then drop to C++ if need be. I’ve spent significant time optimizing my C++ code. I know that some firms will write their own machine code on specialized hardware.

locktight:

Do you hold positions overnight?

mejalx:

Yes on one strategy.

cartooncorpse:

It's amazing how many people in this thread are ravenous to become unproductive, to go for the 'easy' money. Just the sort that will get taken by 'people' such as the author of this thread.

Bottom of Form

mejalx [S] 6 points7 points8 points 6 months ago[+] (1 child)

mejalx [S] 6 points7 points8 points 6 months ago[-]

Top of Form

It’s anything but easy money.

