Bye-bye GPGPU and GPU? The Nvidia CUDA vs FPGA debate for real-time data HFT systems
I always thought they were the same, but nope! FPGA is for real-time data systems. Where have I heard this before? Also, it seems I need to thank this person at the end for adding a lot of intelligence and experience to the debate. As you know, I am a newbie at all this.
Thankfully I have not made a huge investment of time and dollars into CUDA.
This GPU vs CUDA debate confused at least one person. From the comments:
” FPGAs are great for real-time systems, where even 1 ms of delay might be too long. This does not apply in your case;
” FPGAs can be very fast, especially for well-defined digital signal processing uses (e.g. radar data), but the good ones are much more expensive and specialised than even professional GPGPUs;
Of course the FPGA also has some drawbacks. IO can be one: we had an application here where we needed 70 GB/s, which was no problem for a GPU, but to get that amount of data into an FPGA with a conventional design you need more pins than are available. Another drawback is time and money: an FPGA is much more expensive than the best GPU, and the development times are very high.
More comments from the previous post (this person sounds like the most knowledgeable I have found with regard to financial HFT use):
It still has some serious issues. The best way to bypass them is to cheat a little and use OpenGL and CUDA at the same time. An FBO (frame buffer object) backed by OpenGL textures has faster access to the device. In practice this is also the major reason that graphics applications mixing CUDA with OpenGL run better than the equivalent CUDA-with-DirectX setups (see the fluids demo in the NVIDIA project samples), quite apart from DirectX being as heavy as it is.
Even though CUDA offers some beautiful features, I would highly recommend not making your code too sensitive to the version of your CUDA SDK.
Feel free to throw any questions my way when you start running CUDA.
You are welcome. FPGA is much better. At the previous fund where I worked (a market maker) we did some work on FPGAs for price-impact analysis on 20 levels of the order book, simply because it is faster to access a hash array held in a few registers.
Again, it depends on the problem and how you approach it. For example, solving a typical backpropagation neural network (100k+ patterns) with the gradient descent method is quite annoying: there are too many sync barriers, and the network is bound to get stuck in some local minimum, so you have to use some adaptive learning rate method or pruning technique to get to an optimal solution. The annoying part comes from using CUDA as a multithreading model. In practice, though, CUDA threads are so lightweight that you can use them in a multi-process fashion instead of multithreading, so rather than solving the NN with gradient descent you could solve it with adaptive differential evolution, since then you would only need a single sync barrier per iteration. In a similar fashion you could solve a maximum likelihood estimation.
Hardware-wise, at work I have 2× 590s and at home I have 3× 480s.
The only reason I might move to Tesla in the future is simply that they have more RAM; otherwise their specs are almost the same as the GeForce ones.
Compiler-wise I am using LLVM; it is more standardized than GCC. I still have many models running on .NET (primarily genetic programming and multi-expression programming models), simply because it is faster than C++.
Facebook account and Twitter. Don't worry, I don't post stupid cat videos or what I eat!