Tag Archives: AMP

CUDA, MS C++ AMP, Windows 8, Server 2012 updates, Ernie Chan video posted, new membership rate increases set for Jan 14, 2013

Hi there

I have been playing with the new Windows 8 and Server 2012. I also posted Ernie Chan’s video. I have been busy, but I am still looking for my model development breakthrough. I may have found it, and I will reveal it within 24 hours.

If this proves itself, expect the 50% QuantLabs.net Premium price increase to take effect on Jan 14, 2013.
1. A 50% increase is coming to my Quant Membership for HFT and trading model development! Join now for the affordable rates!
2. Trying out the new Microsoft Windows Server 2012 for potential HFT deployment, as Linux is too complicated and has no standards
Decent, but not quite prime time for HFT yet; it will be at some point!
3. For HFT: Trying out the Microsoft Windows 8 trial for AMP capabilities for the GPU debug preview within Visual Studio 2012 C++
Stick with CUDA, as AMP has been shown to be slow
4. How to use Microsoft AMP for Windows concurrency with Fast Math using Visual Studio 2012 and C++
5. Back to GPU CUDA again and Visual Studio 2010 as C++ AMP is slow and other 2012 problems
6. YouTube video of Ernie Chan’s online presentation on the Pitfalls of Backtesting at the Trading Meetup
So get in on the action now, which will give you access to my private webinar for QuantLabs.net members on Dec 18!
Here are those member benefits
Got questions or comments? Let me know.
Thanks
Bryan

HOW DO YOU START A PROFITABLE TRADING BUSINESS? Read more NOW >>>

NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!

Back to GPU CUDA again and Visual Studio 2010 as C++ AMP is slow and other 2012 problems

Why do I do this to myself? You get all excited about the latest technology. It seems CUDA rules supreme, given the AMP performance problems I posted about. That sucks, as AMP looks quite easy to implement compared to CUDA. I even tried to build the CUDA samples in Visual Studio 2012, but that failed during the project conversion for some reason. So I am back to what I had before: the CUDA samples within Visual Studio 2010. It looks like I will not be using Server 2012 or Windows 8 any time soon. Sorry folks, but these newer releases may not be ready for prime time, despite what Microsoft claims.

Save time as I build this HFT platform by joining my free newsletter.

How to use Microsoft AMP for Windows concurrency with Fast Math using Visual Studio 2012 and C++

UPDATE:  C++ AMP is slowest compared to CUDA http://codinggorilla.domemtech.com/?p=1135

Here is a decent description of AMP:

http://msdn.microsoft.com/en-us/library/vstudio/hh265136.aspx

Here are the supported Fast Math functions for AMP:

http://msdn.microsoft.com/en-us/library/vstudio/hh553048.aspx

Note that you need Windows Server 2012 or Windows 8 to use the GPU debugging preview in Visual Studio 2012.

For HFT: Trying out Microsoft Windows 8 trial for AMP capabilities for GPU debug preview within Visual Studio 2012 C++

UPDATE:  C++ AMP is slowest compared to CUDA http://codinggorilla.domemtech.com/?p=1135

Windows 8 (or Windows Server 2012) is required to take advantage of this feature, so I might as well try it. There is also a decently priced upgrade if this thing works, but no way in hell will I upgrade my primary system to an immature Windows 8.

Let’s see what happens.

Ok. This thing is installed, and it kind of looks cool, but now what? How do I get to the classic desktop? I tried the lower left corner, but nothing happens when switching between Metro and what I am used to. I could not open any of these side windows. It is pretty crappy if I cannot use it. I think it might be wise to go with Windows Server 2012 instead, as it has some stuff I am used to. Microsoft has a lot of work ahead to get people to really use this thing. I am sure it is a matter of time, but time is what I don’t have. At this point, Windows 8 fails from my point of view. Windows Server 2012 has some hope, though, especially for developers.

For quant and HFT: Youtube video demo of Visual Studio 2012 with C++ running AMP for Accelerated Massive Parallelism

https://quantlabs.net/blog/2012/12/interesting-notes-on-microsofts-amp-aka-accelerated-massive-parallelism-for-visual-c-visual-studio-2012-with-debugging-on-a-gpu/

This is pretty killer! Join my free newsletter as I build out this new custom HFT platform!

Interesting notes on Microsoft’s AMP aka Accelerated Massive Parallelism for Visual C++ Visual Studio 2012 with debugging on a GPU!

Start with this easy example:

http://blogs.msdn.com/b/nativeconcurrency/archive/2012/10/01/string-search-sample-with-c-amp.aspx

The moving average demo that started this (also note the new lambda feature in Visual C++):

http://www.drdobbs.com/windows/microsofts-c-amp-unveiled/231600761

Debugging a GPU in this slideshow:

http://www.microsoft.com/en-us/download/details.aspx?id=28114

Some other examples:

http://www.developerfusion.com/article/132336/massive-data-parallelism-on-the-gpu-with-microsofts-c-amp-accelerated-massive-parallelism/

Much easier development and more advanced than Nvidia’s CUDA!

Whoa! This is pretty killer! Join my free newsletter as I build out this new custom HFT platform!

Shared memory problem in a software AMP configuration

Greetings All,
I believe this is not the right forum for this issue, but I am posting it because the problem is interesting to share (I think it will be a good read for the geeks) and because I have just run out of ideas. I am having an issue reading shared memory in a software AMP (asymmetric multiprocessing) configuration. Here is my system:
It is an embedded PowerPC board with a dual-core P1022 processor (e500 cores, P1022RDK, 512 MB shared memory). I am running the system in a soft AMP configuration: one Linux kernel on core 0 and another Linux kernel on core 1. All the hardware is perfectly partitioned between the two cores. I have allocated the first part of memory to the core 0 kernel and the rest to the core 1 kernel, and a small portion of memory is shared between both cores for IPC.
Now my problem: when I write a simple string from core 1 (using a simple C program that mmaps the shared memory into user space) and read it on core 0, core 0 gets what it is supposed to get over shared memory. But when I write from core 0 and read from core 1, core 1 does not get what it should, just empty or garbage memory.
In this situation I have verified some things that make the problem more interesting:
1. With core 0 running Linux and core 1 running a bare-metal application, I used a probe to verify that core 0 writes to the correct shared memory area: I can see my string in shared memory from core 1 (probe-debugging core 1). So the problem is at the reading end, on core 1.
2. My same reader/writer application works perfectly on another dev kit with the same processor and cores (P1022DS, 2 GB total memory) in the same AMP configuration.
3. The ‘M’ bit of the page attributes is set on both ends (to ensure memory coherency).
4. I also issued a sync instruction right after the write on core 0, to make sure the processor writes to memory when it should, but #1 confirms this as well, since the data has reached shared memory.
5. I think this confirms that the problem lies in the core 1 kernel, but after looking into many things I am clueless about where (that is, in which area of the kernel) the problem could be.
Please share any ideas that come to mind.

5 comments

Vijay Anand: You could use a JTAG debugger (EJTAG for MIPS; I don’t know the equivalent for PowerPC) to check whether the core 0 write is visible.

Farrukh Arshad: Yes, I have verified with the CodeWarrior USB TAP that the core 0 write is visible in shared memory from core 1, but in that case I am running a bare-metal application on core 1, not the Linux kernel. So far I have been unable to debug the core 1 Linux kernel with the probe, as booting both kernels in this AMP configuration does not seem to be supported by the probe initialization or by the CodeWarrior IDE.

Vijay Anand: If I understand correctly, when core 1 is loaded with the bare-metal app, core 1 can see core 0’s updates to shared memory, but when core 1 is loaded with Linux (instead of the bare-metal app), it cannot. If this is correct, there could be a problem with the shared memory address being referenced on the Linux side. The mmap may not be pointing to the correct physical address offset.

Farrukh Arshad: Yes, you got it right, and that is my conclusion as well so far.

Pradeep kumar Nallimelli: What happens the other way around, i.e. core 1 writes and core 0 reads? If that does not work, it is a data point that the shared memory is not actually shared, at least at the addresses being used (the references are not correct). I assume that by shared memory you mean the bootmem allocated up front by the bootloader before Linux comes up. How does the addressing work here (something like XKPHYS in MIPS for global memory)? Make sure the virt_to_phys and phys_to_virt mappings happen in the right places.