
Bye bye GPGPU and GPU? The Nvidia CUDA vs FPGA debate for real-time data HFT systems


I always thought they were the same, but nope! FPGAs are for real-time data systems. Where have I heard this before? Also, it seems I need to thank the person quoted at the end for adding a lot of intelligence and experience to the debate. As you know, I am a newbie at all this.

Thankfully I have not made a huge investment of time and dollars into CUDA.

At least one person is confused by this CUDA vs FPGA debate:
http://stackoverflow.com/questions/317731/cuda-vs-fpga
"FPGAs are great for realtime systems, where even 1 ms of delay might be too long. This does not apply in your case."
"FPGAs can be very fast, especially for well-defined digital signal processing usages (e.g. radar data), but the good ones are much more expensive and specialised than even professional GPGPUs."

Of course the FPGA also has some drawbacks. IO can be one (we had an application here where we needed 70 GB/s; no problem for a GPU, but to get that amount of data into an FPGA with a conventional design you need more pins than are available). Another drawback is time and money: an FPGA is much more expensive than the best GPU, and the development times are very high.
More comments from the previous post (this person sounds like the most knowledgeable I have found with regard to financial HFT use):
It still has some serious issues. The best way to bypass them is by cheating a little and using OpenGL and CUDA at the same time. Frame buffer objects (FBOs) backed by OpenGL textures give faster access to the device. In practice this is the major reason that graphics applications combining CUDA with OpenGL run better than the equivalent CUDA-with-DirectX setup (see the fluids demo among the Nvidia project samples), quite apart from DirectX being as heavy as it is.

Even though CUDA offers some beautiful features, I would highly recommend not making your code too sensitive to the version of your CUDA SDK.

Feel free to throw any questions my way when you start running CUDA.

You are welcome. FPGA is much better. At the previous fund where I worked (a market maker) we did some work on FPGAs for price impact analysis on 20 levels of the order book, simply because it is faster to access a hash array held in a few registers.

Again, it depends on the problem and how you approach it. For example, solving a typical backpropagation neural network (100k+ patterns) with the gradient descent method is quite annoying: too many sync barriers, and the network is liable to get stuck in some local minimum, so you will have to use some adaptive learning rate method or pruning technique in order to get to an optimal solution. The annoying part comes from using CUDA as a multithreading model. But in practice CUDA threads are so lightweight that you can use them in a multi-process fashion instead of multi-threading, so rather than solving the NN with gradient descent you could solve it with adaptive differential evolution, since you would only need a single sync barrier per iteration. In a similar fashion you could solve a maximum likelihood estimation.

Hardware-wise, at work I have 2x 590s and at home I have 3x 480s.
The only reason I might move to Tesla in the future is simply that they have more RAM; otherwise their specs are almost the same as the GeForce cards.

Compiler-wise I am using LLVM; it is more standardized than GCC. I still have many models running on .NET though (primarily genetic programming and multi-expression programming models), simply because it is faster than C++.

NOTE: I now post my TRADING ALERTS to my personal FACEBOOK ACCOUNT and TWITTER. Don't worry, I don't post stupid cat videos or what I eat!

GPGPU/OpenCL/CUDA and Hadoop for HFT High Frequency Trading


After reading http://hgpu.org/?p=7413 (and having been interested in Hadoop for quite some time), I got curious whether more efforts have been made or are under development. It is clear that workers can be sped up a lot with OpenCL and similar techniques, increasing the speed of a cluster. Do note that I am an OpenCL specialist, so I am somewhat biased.

Do you guys know of any project where Big Data has been combined with OpenCL, CUDA, Aparapi, etc?

 

–I attended the event that @Andrew put together in NY.

@Andrew presented an overview regarding the state of play (citing some of the papers and observations made above) and also talked a little about Aparapi (which obviously made me happy 🙂). There was some discussion about the use of Thrust (for those prepared to make the Java->JNI->Thrust->JNI->Java round trip) and of course the option of using JOCL/JCUDA for those who want to write their host code in Java but still use CUDA/OpenCL.

There was also a session from Jack Papas (one of the TidePowerd inventors) discussing his work. The third session was from Tim Childs discussing Map-Reduce + GPU from a database perspective.

I also note that Andrew is presenting a session at AFDS (www.amd.com/afds) entitled ‘CC-4344 – Hadoop and GPU Compute’. AFDS should be fun this year; we have two Aparapi sessions and one hands-on lab.

 


What do you all think of a CPU, GPGPU, FPGA hybrid computing platform? Anybody working on something like this?

==Have worked with custom IC design before
==Mixed processor technology (uP/FPGA/GPU) systems have the potential for delivering extraordinary performance given their modest SWaP (size, weight, and power) footprint. The economics of these systems is also compelling: multi-TFLOPS in a $10,000-$15,000 package is hard to ignore.
But this combination of high performance and small SWaP comes at a price; they are notoriously difficult to program, requiring multiple technical specialists on the application development team. Still, new products such as our hprcARCHITECT make using them much easier than traditional development approaches.
I put together a system for delivery to the US Air Force that combined a motherboard with two Intel i7 uPs, four Xilinx FPGAs, and two Nvidia Tesla GPUs. To say the very least, it screamed. It also demonstrated that there is no performance difference between Windows and Linux for high-performance applications. But Windows code is significantly easier/faster to get into production, giving it an edge in total time to first solution.
==I have worked on an FPGA board + PC implementation before. One can easily speed up overall system performance by moving some lower-level processing into the FPGA chip. The capability of embedding a CPU inside the FPGA and having some control program running in the FPGA also provides further flexibility. However, debugging this kind of system is very troublesome, and a hybrid system is much more likely to have bugs than a regular software environment.
==I agree with James; hybrid systems will, initially, be buggy. But this can be dealt with by choosing the right development tools and working from a good specification to begin with. Hybrid systems are no place for "hackers"; they take solid engineering and well-designed tools.
More development projects are buggy due to a poorly engineered software specification (including the …
==Funny that: we filed a patent on this in December of last year…
