Tag Archives: Open MPI

How to build a Beowulf cluster aka HPC for Open MPI C++ apps like HFT, with R via the Rcpp and RInside packages, on Debian Linux #rstats #linux

Install Debian Etch from the link below and follow the first-CD install instructions:

http://www.debian.org/CD/http-ftp/

Install Debian virtual machines in VirtualBox as usual:

http://www.wired.com/geekdad/2012/02/debian-linux-on-oracle-virtualbox/

Go with all the default options. You can always clone a second client VM later to create a slave.

Set up the hostname and configure the network (choose a static address, not DHCP):

http://www.debian-administration.org/articles/254

Note that you could lose network connectivity, so add an additional network adapter set to Bridged Adapter for each VM in the VirtualBox settings. See the next step.

Ensure each VM also has a network adapter set to VirtualBox Host-Only:

http://christophermaier.name/blog/2010/09/01/host-only-networking-with-virtualbox

Set up NFS on both virtual machines, one as server and one as client:

http://www.howtoforge.com/nfs-server-and-client-debian-etch

Note: when editing the exports file (/etc/exports) on the server:

/home           192.168.0.101(rw,sync,no_root_squash)

/var/nfs        192.168.0.101(rw,sync)

Note that 101 is the client! Also, when I tried to auto-mount during boot on the client, it would hang. So I created a shell script to do the mounts manually after boot; I called it /etc/mountmaster.sh.
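To make the "101 is the client" point concrete: in each exports line the first field is the directory the server shares, and the host in front of the parentheses is the client allowed to mount it. Below is a small C++ sketch that splits such a line into those fields; ExportEntry and parseExportLine are hypothetical names for illustration, and it assumes one client per line, as in the two lines above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of /etc/exports: "<path> <client>(<opt>,<opt>,...)".
// Assumption: a single client per line, as in the two example lines above.
struct ExportEntry {
    std::string path;                  // directory on the NFS *server*
    std::string client;                // host allowed to mount it (the client!)
    std::vector<std::string> options;  // e.g. rw, sync, no_root_squash
};

// Hypothetical helper for illustration; not part of any NFS tool.
ExportEntry parseExportLine(const std::string& line) {
    ExportEntry e;
    std::istringstream in(line);
    std::string spec;
    in >> e.path >> spec;  // whitespace-separated fields
    const auto open = spec.find('(');
    const auto close = spec.rfind(')');
    assert(open != std::string::npos && close != std::string::npos && open < close);
    e.client = spec.substr(0, open);
    // Options sit between the parentheses, separated by commas.
    std::istringstream opts(spec.substr(open + 1, close - open - 1));
    for (std::string opt; std::getline(opts, opt, ',');) {
        e.options.push_back(opt);
    }
    return e;
}
```

For example, parsing the /home line yields path /home, client 192.168.0.101 and the options rw, sync and no_root_squash.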

If you need to add users, see:

http://www.debian-administration.org/articles/2

You should then be able to proceed with setting up your Beowulf cluster:

http://www.debian-administration.org/articles/2

To understand and configure SSH connections with no password, use:

SSH your Debian servers without password

This could also be useful: http://wiki.debian.org/SSH, but note that it uses RSA, not DSA.

If confused, use this next one, but remember that the SSH server is the Open MPI slave, while the client is the Open MPI master, since it logs into the Open MPI slave.

http://www.howtoforge.com/set-up-ssh-with-public-key-authentication-debian-etch

HOW DO YOU START A PROFITABLE TRADING BUSINESS? Read more NOW >>>

NOTE I now post my TRADING ALERTS into my personal FACEBOOK ACCOUNT and TWITTER. Don't worry as I don't post stupid cat videos or what I eat!

How to set up a C++ Beowulf cluster for Open MPI for RInside with VirtualBox and Ubuntu Linux #linux

It looks like implementing the RInside C++ call in FastFlow will not be fun. As a result, I am revisiting Open MPI, since RInside's sample code already includes an example that uses it. I now focus on how to build a virtual home-network Beowulf cluster with VirtualBox:

1. Set up each virtual machine: Ubuntu Server for the slave nodes and Ubuntu Desktop for your master node.

2. Give each one a static IP in VirtualBox:

http://www.coding4streetcred.com/blog/post/VirtualBox-Configuring-Static-IPs-for-VMs.aspx

3. Do the rest here:

http://byobu.info/wiki/Building_a_simple_Beowulf_Like_Cluster_with_Ubuntu
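Once the nodes can reach each other over SSH and share /home over NFS, Open MPI's mpirun needs to know which machines to start processes on. One way is a hostfile with one "host slots=N" line per node, passed with mpirun --hostfile. The sketch below only builds that text in C++; the host names, slot counts and the makeHostfile helper are hypothetical, and the linked guide may use its own file name or layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A cluster node as seen by mpirun: hostname (or static IP) plus CPU slots.
struct Node {
    std::string host;
    int slots;
};

// Builds the text of an Open MPI hostfile, one "host slots=N" line per node.
// You would save it (e.g. as ~/hostfile, an assumed path) and run
// `mpirun --hostfile ~/hostfile ...`. Hypothetical helper for illustration.
std::string makeHostfile(const std::vector<Node>& nodes) {
    std::ostringstream out;
    for (const Node& n : nodes) {
        assert(n.slots > 0);  // each node must offer at least one process slot
        out << n.host << " slots=" << n.slots << '\n';
    }
    return out.str();
}
```

Listing the master as well lets it run ranks too; leave it out if it should only launch jobs.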

Find out more about what I do

HFT C++ latency kiss of death is multithreading blocking in Open MPI, while FastFlow uses non-blocking

For those struggling to understand what blocking really means when it comes to multithreading, you would want to read this:

Processor 0 first prints its own greeting, then polls successive processors, waiting to receive a message from each one. Only when the message is received does processor 0 move on. Using the MPI_Send and MPI_Recv commands blocks program execution. This blocking is illustrated graphically by inserting a long loop in the code, causing one of the processors to take a long time to complete its tasks. The cost of this structure is added syntax.

http://hamilton.nuigalway.ie/teaching/AOS/NINE/mpi-first-examples.html
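The blocking behaviour in that quote can be mimicked without an MPI installation. In this sketch, threads stand in for MPI ranks and Mailbox is a hypothetical one-slot channel whose recv() blocks the way MPI_Recv does; in a real Open MPI program the calls would be MPI_Send/MPI_Recv. For reference, MPI also defines non-blocking MPI_Isend/MPI_Irecv.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A one-slot mailbox with a blocking receive, standing in for MPI_Recv
// from one specific sender (threads play the role of MPI ranks here).
class Mailbox {
public:
    void send(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            msg_ = std::move(msg);
        }
        cv_.notify_one();
    }
    std::string recv() {  // blocks until send() has been called
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return msg_.has_value(); });
        return *msg_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::optional<std::string> msg_;
};

// "Rank 0" records its own greeting, then receives from ranks 1..n-1 in order.
// slowRank sleeps before sending, so rank 0 stalls on it even if later ranks are done.
std::vector<std::string> greetingRound(int n, int slowRank) {
    assert(n >= 1);
    std::vector<Mailbox> boxes(n);  // boxes[r] carries rank r's message to rank 0
    std::vector<std::thread> ranks;
    for (int r = 1; r < n; ++r) {
        ranks.emplace_back([&boxes, r, slowRank] {
            if (r == slowRank) std::this_thread::sleep_for(std::chrono::milliseconds(50));
            boxes[r].send("greetings from rank " + std::to_string(r));
        });
    }
    std::vector<std::string> log{"greetings from rank 0"};
    for (int r = 1; r < n; ++r) log.push_back(boxes[r].recv());  // blocking, in rank order
    for (auto& t : ranks) t.join();
    return log;
}
```

Calling greetingRound(4, 1) returns the greetings in rank order 0, 1, 2, 3: rank 0 sits in recv() for rank 1 during the whole delay, even though ranks 2 and 3 may already have sent.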

This is what happens when you use something like Open MPI, and it adds latency when it comes to high-frequency trading. This is the reason I like FastFlow, since it gets around this blocking issue.
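By contrast, the non-blocking style rests on lock-free single-producer/single-consumer (SPSC) queues, which FastFlow is built on. Below is a generic C++ sketch of such a bounded queue using std::atomic; SpscQueue and transferSum are hypothetical names, and this is not FastFlow's own API. push and pop return false instead of waiting, so the caller decides whether to spin, yield or do other work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Bounded single-producer/single-consumer ring buffer (generic sketch).
// push/pop never block: they return false when the queue is full/empty.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) { assert(capacity > 0); }

    bool push(const T& v) {  // call from the producer thread only
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t next = (h + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    bool pop(T& out) {  // call from the consumer thread only
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[t];
        tail_.store((t + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::vector<T> buf_;                // one slot stays empty to tell full from empty
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};

// A producer thread sends 0..n-1 and the calling thread sums them.
// Neither side blocks in the queue; each yields when it cannot make progress.
long long transferSum(int n, std::size_t capacity) {
    SpscQueue<int> q(capacity);
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!q.push(i)) std::this_thread::yield();
    });
    long long sum = 0;
    for (int received = 0; received < n;) {
        int v;
        if (q.pop(v)) { sum += v; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

A latency-sensitive consumer would typically busy-poll instead of yielding; the yield just keeps the demo friendly on a small VM.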

Find out more about what I can do with these techniques in my future HFT platform

Reasons to use FastFlow for C++ HFT multithreading over TBB, Cilk, Open MPI and CUDA compared to a high-end server

Most of these references came from http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

Get more development news on this HFT platform as I build this thing

Go to the end for them.

The most useful tutorial is the last link in this post!

http://calvados.di.unipi.it/storage/paper_files/2012_distr_ff_cgsymph.pdf
Despite being very efficient on some classes of applications, OpenMP and MPI share a common set of problems: poor separation of concerns among application and system aspects, a rather low level of abstraction presented to the application programmers and poor support for really fine-grained applications are all considerations hindering easy use of MPI and OpenMP. Actually, it is not even clear yet if the mixed MPI/OpenMP programming model always offers the most effective mechanisms for programming clusters of SMP systems [4].

A ff_dnode cannot have both external input and output channels at the same time since the minimal pure FastFlow application is composed of at least 2 nodes (a pipeline of two sequential nodes or a farm with an Emitter node and a sequential worker node).

In FastFlow, we used ZeroMQ as the external transport for the ff_dnode concurrent entity.

Note that Java has a garbage collector, but:
In general, lock-free dynamic concurrent data structures that use CAS operations should be supported by safe memory reclamation techniques in programming environments without automatic garbage collection.
http://calvados.di.unipi.it/storage/paper_files/2012_spsc_europar.pdf
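C++ has no garbage collector, which is exactly the case the quote warns about. A minimal sketch, assuming a Treiber-style stack built on compare_exchange_weak: popped nodes are parked on a retired list rather than deleted, because another thread may still be reading them. This deferral is only a crude placeholder for a real reclamation scheme such as hazard pointers or epoch-based reclamation, and LockFreeStack is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

// Lock-free (Treiber) stack using CAS. Popped nodes are NOT deleted right away:
// another thread may still be reading them, so they go to a retired list that is
// only freed in the destructor, when no thread can touch the stack anymore.
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
public:
    ~LockFreeStack() {
        for (Node* n = head_.load(); n != nullptr;) { Node* next = n->next; delete n; n = next; }
        for (Node* n : retired_) delete n;
    }
    void push(const T& v) {
        Node* n = new Node{v, head_.load(std::memory_order_relaxed)};
        // Retry until head_ still equals n->next; on failure n->next is refreshed.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {}
    }
    bool pop(T& out) {
        Node* n = head_.load(std::memory_order_acquire);
        // n->next may be read while another thread pops n: that is why n can't be freed yet.
        while (n && !head_.compare_exchange_weak(n, n->next,
                                                 std::memory_order_acquire,
                                                 std::memory_order_acquire)) {}
        if (!n) return false;  // stack was empty
        out = n->value;
        std::lock_guard<std::mutex> lock(retiredMutex_);  // the retired list itself is not lock-free
        retired_.push_back(n);
        return true;
    }
private:
    std::atomic<Node*> head_{nullptr};
    std::mutex retiredMutex_;
    std::vector<Node*> retired_;
};
```

Because retired nodes are never freed while the stack is alive, their addresses cannot be reused, which also sidesteps the ABA problem; the price is memory that grows until the stack is destroyed, which is what proper reclamation schemes avoid.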
http://calvados.di.unipi.it/storage/paper_files/2011_fastflow_acc_europar.pdf
FastFlow is a C++ parallel programming framework aimed at simplifying the development of efficient applications for multi-core platforms. The key vision of FastFlow is that ease-of-development and runtime efficiency can both be achieved by raising the abstraction level of the design phase, thus providing developers with a suitable set of parallel programming patterns that can be efficiently compiled onto the target platforms.
….
The word accelerator is often used in the context of hardware accelerators. Usually accelerators feature a different architecture with respect to standard CPUs and thus, in order to ease exploitation of their computational power, specific libraries are developed. In the case of GPGPUs those (low-level) libraries include Brook [13], NVidia CUDA, and OpenCL. At a higher-level, Offload [14] enables offloading of parts of a C++ application, which are wrapped in offload blocks, onto hardware accelerators for asynchronous execution; OMPSs [15] enables the offloading of OpenCL and CUDA kernels as an OpenMP extension [16]. FastFlow, in contrast with these frameworks, does not target specific (hardware) accelerators but realizes a ****virtual accelerator**** running on the main CPUs and thus does not require the development of specific code.
—-
**** pg 36 on highly IMPRESSIVE performance with Tesla C2050
http://calvados.di.unipi.it/storage/talks/2012_IPTA_Aldinucci.pdf
—-
http://luca.ntop.org/parco.pdf
The shift from 1 Gbit to 10 Gbit networks has pushed hardware manufacturers to find
new solutions for exploiting multicore architectures. The first goal is to use all the available cores for improving packet receive/transmission. For this reason, modern network
adapters feature multi-queue RX/TX, so that a physical network adapter is logically partitioned into several logical adapters each sharing the same MAC address and Ethernet port. Incoming packets are decoded in hardware, and a hash value based on various
packet fields such as IP address, protocol and port is computed. Based on the hash value,
the network adapter places the packets in a specific queue. This way the kernel can simultaneously poll and transmit packets from each queue, thus maximizing the overall
performance. Unfortunately the operating systems are not mature enough to exploit this
feature, as they do not expose queues to user-space applications thus limiting them to
viewing the network adapter as a single entity. The outcome is that fetching packets in
parallel from the network adapter is not possible unless applications can directly access
the various queues. PF_RING [8] is a packet processing framework that we have developed that implements various mechanisms for enhancing packet processing and that also
allows applications to natively access adapters’ queues. Recently PF_RING has been enhanced with support of 10 Gbit DNA (Direct NIC Access) that allows applications to receive/transmit packets while completely bypassing the kernel, as they can access directly
the queues that have been previously mapped into user space memory. This means that
the cost of receiving/transmitting a packet is basically the cost of a memory read/write,
thus making 10 Gbit wire-rate packet RX/TX now manageable using commodity network adapters.
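The hash-to-queue step the passage describes can be sketched in C++. FlowKey, flowHash and rxQueueFor are hypothetical names, and FNV-1a is only a software stand-in: multi-queue NICs usually compute a Toeplitz (RSS) hash in hardware. What matters is that every packet of the same flow maps to the same queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The fields the NIC hashes on, per the passage above: IPs, protocol, ports.
struct FlowKey {
    std::uint32_t srcIp, dstIp;
    std::uint16_t srcPort, dstPort;
    std::uint8_t protocol;  // e.g. 6 = TCP, 17 = UDP
};

// FNV-1a over the key fields, byte by byte (software stand-in for the NIC hash).
std::uint32_t flowHash(const FlowKey& k) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    auto mix = [&h](std::uint32_t value, int bytes) {
        for (int i = 0; i < bytes; ++i) {
            h ^= (value >> (8 * i)) & 0xFFu;
            h *= 16777619u;  // FNV prime; unsigned overflow wraps
        }
    };
    mix(k.srcIp, 4); mix(k.dstIp, 4);
    mix(k.srcPort, 2); mix(k.dstPort, 2);
    mix(k.protocol, 1);
    return h;
}

// Every packet of a flow lands in the same queue, so each queue (and the core
// polling it) sees whole flows and the queues can be drained in parallel.
std::size_t rxQueueFor(const FlowKey& k, std::size_t numQueues) {
    assert(numQueues > 0);
    return flowHash(k) % numQueues;
}
```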

Important tutorials for FastFlow
http://calvados.di.unipi.it/storage/paper_files/TR-12-04.pdf
http://calvados.di.unipi.it/storage/paper_files/2011_FF_tutorial-draft.pdf

My idea for a demo video of Open MPI with GPU CUDA on 3000 cores for under $1000? Redis high-speed in-memory database. R rules!!

Hi there

5. All the major components are identified for this potentially potent open-source-based HFT system. The next big struggle is bringing in the highly expensive strategy and modelling algorithms like Bayesian methods or Markov chain Monte Carlo. If you are new to this, watch this video
So you can see I am sending out lots of free research and external presentations that are worth thousands for each person that visits. I am very close to a point where things will go proprietary with my future operation. This will mean the QuantLabs.Net Premium Membership rate will be going up quite a bit. Hey…I may even make it exclusive by invite only for those on my well connected team. Or I may even close it off sometime if this turns into something profitable. In short, join this membership while you can as well as while it is affordable.
Hope this helps you out
Thanks, Bryan

Ultimate secret HFT revealed! YouTube demo of C++ Open MPI possible with NVIDIA CUDA GPU boards of 3072 cores priced $999 #hft

Who would have thought I would reach this point? The video refers to the following link:

https://quantlabs.net/blog/2012/11/holy-hft-c-open-mpi-possible-with-nvidia-cuda-boards-of-3072-cores-priced-999-cuda-openmpi-hft/

So? Get more dirty little details on building this stuff through my FREE, frequently sent newsletter!

I hope to get to open-source Open MPI demos today for my upcoming C++ high-frequency trading aka HFT platform #opensource #hft #cpp

I want to do some more YouTube videos of this thing. It looks popular, but one note I can make is that there are comparisons between this and Hadoop. Do you think this is the way to develop an HFT platform?

See my developments through my frequently sent FREE newsletter

YouTube video on the Open MPI open-source project vs. MPI for C++ and Linux: the secret sauce for proprietary HFT hedge funds

Here are some tutorials; the video is at the end.

Join my march to HFT platform development

OpenMP tutorial:

http://bisqwit.iki.fi/story/howto/openmp/

Decent comparison:

http://mc.stanford.edu/cgi-bin/images/7/78/Hybrid_MPI_openMP.pdf

Hybrid programming (MPI+OpenMP) is often slower than pure MPI. Why? (pg 5)

–> USE OPEN-MPI.ORG <-

http://stackoverflow.com/questions/2427399/mpich-vs-openmpi

Standard MPI tutorial:

https://computing.llnl.gov/tutorials/mpi/

http://www.lam-mpi.org/tutorials/ <– Includes LAM for clustering machines within Linux and MPI

LAM/MPI is abandoned

http://stackoverflow.com/questions/8770005/differences-between-lam-mpi-and-openmpi

OPEN MPI

http://hpcprogrammer.com/mpi-vs-openmp

However, if you are going to launch a single job across multiple nodes,
MPI is the de facto standard for parallelizing on clusters…

Conversely, with MPI the entire code is launched on each node and you control what each code executes based on its node number in the MPI universe, along with an algorithm that distributes work, e.g., a master/slave model.

http://lists.apple.com/archives/mt-smp/2004/Mar/msg00002.html
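As a sketch of "control what each code executes based on its node number", here is a static block split in C++: given n work items, a rank and the number of ranks, it returns the half-open index range that rank owns. In a real program rank and size would come from MPI_Comm_rank and MPI_Comm_size; blockRange is a hypothetical helper, and a master/slave scheme that hands out work dynamically is the other option the quote mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Every MPI process runs the same code; only its rank differs. Rank r of `size`
// ranks handles the half-open range [first, second) of n work items, with the
// first n % size ranks taking one extra item so the load stays balanced.
std::pair<std::size_t, std::size_t> blockRange(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t base = n / p, extra = n % p;
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t end = begin + base + (r < extra ? 1 : 0);
    return {begin, end};
}
```

With 10 items and 3 ranks, the ranges are [0, 4), [4, 7) and [7, 10).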

MPI is based on local memory and message passing, and is intended for problems where moving data around is a key part of the domain. High-performance computing is very much about taking the dataset for a problem, and splitting it up among a host of compute resources. And that is pretty hard work in a message-passing system as data has to be explicitly distributed with balancing in mind. Essentially, MPI can be viewed as a grudging admittance that shared memory does no

http://stackoverflow.com/questions/185444/why-is-mpi-considered-harder-than-shared-memory-and-erlang-considered-easier-wh

**MPI seems best for our HFT needs**

To be really brief, MPI is not a shared memory model and is targeted to very highly parallelized systems. OpenMP is a shared memory model (as simple pthreads) and one of its advantages is that the parallelization process is easier with respect to MPI. So it’s harder to convert a serial program into a MPI parallelized version, but if you’d plan to run the program on thousands of nodes, you’ll probably have better performance with MPI.

http://askubuntu.com/questions/145119/what-is-the-difference-mpi-vs-openmp
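For contrast, this is what the "easier" shared-memory route looks like with OpenMP: one pragma parallelizes the serial loop because every thread sees the same vector. A sketch assuming GCC or Clang with -fopenmp; sumOfSquares is a hypothetical example function. An MPI version would instead have each rank sum only its own block and combine the partial sums, for example with MPI_Reduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shared-memory parallelism with OpenMP: all threads read the same `x`, so
// parallelizing the serial loop takes one pragma and no explicit data movement.
// Compile with -fopenmp to enable threading; without it the pragma is ignored
// and the loop simply runs serially with the same result.
double sumOfSquares(const std::vector<double>& x) {
    double sum = 0.0;
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const double v = x[static_cast<std::size_t>(i)];
        sum += v * v;  // each thread accumulates a private copy, combined at the end
    }
    return sum;
}
```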

YouTube video on how to install Rcpp and RInside on Ubuntu Linux with Open MPI for your C++ HFT platform with R quant analytics

Some bonus links:

http://quantlabs.net/r-blog/2012/10/how-to-install-r-with-rcpp-rinside-for-c-hft-with-multithreading-capabilities-for-parallelizing-with-open-mpi-in-ubuntu-linux/

http://quantlabs.net/r-blog/2012/10/how-to-upgrade-to-the-latest-r-package-in-your-ubuntu-linux-environment/

Wow. This is easily the most important demo yet for my upcoming HFT platform!

Join my free, frequent newsletter to get further updates on this.
