Reasons to use FastFlow for C++ HFT multithreading over TBB, Cilk, and Open MPI, and CUDA compared to a high-end server

(Last Updated On: November 12, 2012)


Most of these references came from http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

Get more development news on this HFT platform as I build it.

Go to the end of this post for them.

The most useful tutorial is the last link in this post!

Click to access 2012_distr_ff_cgsymph.pdf

Despite being very efficient on some classes of applications, OpenMP and MPI share a common set of
problems: poor separation of concerns among application and system aspects,
a rather low level of abstraction presented to the application programmers and
poor support for really fine grained applications are all considerations hindering
easy use of MPI and OpenMP. Actually, it is not even clear yet if the mixed
MPI/OpenMP programming model always offers the most effective mechanisms
for programming clusters of SMP systems [4].

A ff_dnode cannot have
both external input and output channels at the same time since the minimal pure
FastFlow application is composed of at least 2 nodes (a pipeline of two sequential
nodes or a farm with an Emitter node and a sequential worker node).
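The "pipeline of two sequential nodes" shape mentioned in the excerpt can be illustrated without FastFlow at all. The following is only a rough sketch in standard C++: `Channel` and `run_two_stage_pipeline` are hypothetical names invented here, the channel uses a mutex and a condition variable rather than FastFlow's lock-free queues, and nothing in it is the FastFlow or ff_dnode API.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical blocking channel between two stages (not a FastFlow type).
class Channel {
public:
    void send(int v) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(v);
        }
        cv_.notify_one();
    }

    // Signals that no more items will be sent.
    void close() {
        {
            std::lock_guard<std::mutex> lk(m_);
            closed_ = true;
        }
        cv_.notify_one();
    }

    // Blocks until an item arrives; returns false once closed and drained.
    bool recv(int& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
    bool closed_ = false;
};

// Stage 1 emits each input; stage 2 squares it. Results keep input order
// because there is exactly one producer and one consumer.
std::vector<int> run_two_stage_pipeline(const std::vector<int>& inputs) {
    Channel ch;
    std::vector<int> results;
    std::thread stage1([&] {
        for (int v : inputs) ch.send(v);
        ch.close();
    });
    std::thread stage2([&] {
        int v = 0;
        while (ch.recv(v)) results.push_back(v * v);
    });
    stage1.join();
    stage2.join();
    return results;
}
```

Calling `run_two_stage_pipeline({1, 2, 3})` runs both stages concurrently and collects the squared values. In a distributed setting, the in-process channel would be replaced by an external transport at the edge of one node.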

In FastFlow, we used ZeroMQ as the external transport for the ff_dnode
concurrent entity.

Note that Java has a garbage collector, but:
In general, lock-free
dynamic concurrent data structures that use CAS operations should be supported
by safe memory reclamation techniques in programming environments
without automatic garbage collection.
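One common way to sidestep the reclamation problem in C++ is to avoid dynamic nodes altogether. Below is a minimal sketch of a bounded single-producer/single-consumer ring buffer: the storage is allocated once, so nothing is freed while the queue is in use, and with exactly one producer and one consumer no CAS is needed. `SpscRing` is a hypothetical name for illustration; this is not FastFlow's queue implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical bounded SPSC ring buffer (illustrative sketch only).
// Safe only with exactly one producer thread and one consumer thread.
template <typename T>
class SpscRing {
public:
    // One slot is kept empty to tell "full" apart from "empty".
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    // Producer side: returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        // Release: publish the written slot to the consumer.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[head];
        // Release: hand the slot back to the producer.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to read
    std::atomic<std::size_t> tail_;  // next slot to write
};
```

The trade-off is a fixed capacity: the producer has to handle `push` returning false, for example by spinning or dropping the item.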

Click to access 2012_spsc_europar.pdf

Click to access 2011_fastflow_acc_europar.pdf

FastFlow is a C++ parallel programming framework aimed at simplifying the development of efficient
applications for multi-core platforms. The key vision of FastFlow is that ease-of-development and runtime
efficiency can both be achieved by raising the abstraction level of the design phase, thus providing
developers with a suitable set of parallel programming patterns that can be efficiently compiled onto the target

The word accelerator is often used in the context of hardware accelerators. Usually accelerators feature
a different architecture with respect to standard CPUs and thus, in order to ease exploitation of their
computational power, specific libraries are developed. In the case of GPGPUs those (low-level) libraries
include Brook [13], NVidia CUDA, and OpenCL. At a higher-level, Offload [14] enables offloading of parts
of a C++ application, which are wrapped in offload blocks, onto hardware accelerators for asynchronous
execution; OMPSs [15] enables the offloading of OpenCL and CUDA kernels as an OpenMP extension [16].
FastFlow, in contrast with these frameworks, does not target specific (hardware) accelerators but realizes a
****virtual accelerator**** running on the main CPUs and thus does not require the development of specific code.
**** See pg 36 on the highly IMPRESSIVE performance with the Tesla C2050
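The general idea of offloading work for asynchronous execution on spare CPU cores, rather than on a GPU, can be sketched with nothing but the standard library. This is only an illustration: `offload_sum` is a hypothetical helper, and `std::async` stands in for the accelerator here; it is not how FastFlow's accelerator interface is written.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: hands a summation to another CPU thread and
// waits for the result. Plain C++ code, no device-specific kernel.
long long offload_sum(const std::vector<int>& data) {
    // std::launch::async forces the task to run on a separate thread.
    std::future<long long> pending = std::async(std::launch::async, [&data] {
        return std::accumulate(data.begin(), data.end(), 0LL);
    });

    // The caller is free to do unrelated work here while the task runs.

    return pending.get();  // blocks until the offloaded task finishes
}
```

The point of the sketch is that the offloaded code is ordinary C++, so no separate kernel has to be written for a special device.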

Click to access 2012_IPTA_Aldinucci.pdf


Click to access parco.pdf

The shift from 1 Gbit to 10 Gbit networks has pushed hardware manufacturers to find
new solutions for exploiting multicore architectures. The first goal is to use all the available cores for improving packet receive/transmission. For this reason, modern network
adapters feature multi-queue RX/TX, so that a physical network adapter is logically partitioned into several logical adapters each sharing the same MAC address and Ethernet port. Incoming packets are decoded in hardware, and a hash value based on various
packet fields such as IP address, protocol and port is computed. Based on the hash value,
the network adapter places the packets in a specific queue. This way the kernel can simultaneously poll and transmit packets from each queue, thus maximizing the overall
performance. Unfortunately the operating systems are not mature enough to exploit this
feature, as they do not expose queues to user-space applications thus limiting them to
viewing the network adapter as a single entity. The outcome is that fetching packets in
parallel from the network adapter is not possible unless applications can directly access
the various queues. PF_RING [8] is a packet processing framework that we have developed that implements various mechanisms for enhancing packet processing and that also
allows applications to natively access adapters’ queues. Recently PF_RING has been enhanced with support of 10 Gbit DNA (Direct NIC Access) that allows applications to receive/transmit packets while completely bypassing the kernel, as they can access directly
the queues that have been previously mapped into user space memory. This means that
the cost of receiving/transmitting a packet is basically the cost of a memory read/write,
thus making 10 Gbit wire-rate packet RX/TX now manageable using commodity network adapters.
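The hash-then-queue step described above can be sketched as follows. Everything in this block is an assumption made for illustration: `FlowKey`, `flow_hash`, and `select_queue` are hypothetical names, FNV-1a is used as a simple software stand-in for whatever hash the adapter computes in hardware, and none of this is the PF_RING API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flow key built from the packet fields the excerpt mentions.
struct FlowKey {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint8_t protocol;
};

// Folds the low `bytes` bytes of `value` into an FNV-1a hash state.
inline std::uint32_t fnv1a_mix(std::uint32_t h, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        h ^= (value >> (8 * i)) & 0xFFu;
        h *= 16777619u;  // FNV-1a 32-bit prime
    }
    return h;
}

// Software stand-in for the adapter's hardware hash (an assumption).
inline std::uint32_t flow_hash(const FlowKey& k) {
    std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
    h = fnv1a_mix(h, k.src_ip, 4);
    h = fnv1a_mix(h, k.dst_ip, 4);
    h = fnv1a_mix(h, k.src_port, 2);
    h = fnv1a_mix(h, k.dst_port, 2);
    h = fnv1a_mix(h, k.protocol, 1);
    return h;
}

// Maps a flow to one of num_queues RX queues; num_queues must be > 0.
inline std::size_t select_queue(const FlowKey& k, std::size_t num_queues) {
    return flow_hash(k) % num_queues;
}
```

Because the queue index depends only on the flow fields, every packet of a given flow lands in the same queue, so one thread per queue can process its flows without sharing per-flow state with other threads.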

Important tutorials for FastFlow

Click to access TR-12-04.pdf

Click to access 2011_FF_tutorial-draft.pdf

