MOTHER OF ALL GAWD! Fastflow C++ is blazingly fast!
I easily built this multithreading library, which describes itself as a pattern-based multi/many-core parallel programming framework.
Just download the various PDFs to read from: http://sourceforge.net/projects/mc-fastflow/files/?source=navbar
This is mind-bending in terms of applying it to a high-speed trading system. It is fast even on a dinky single-processor virtual machine! It supports CUDA too!
Where does Redis fit in all this?
Some of the highlights from http://calvados.di.unipi.it/dokuwiki/doku.php/ffnamespace:architecture
FastFlow generates at compile time a specific streaming network based on core patterns for each pattern. In the case of parallel_for, this network is a parametric master-worker with an active or passive (in-memory) task scheduler (more details in the PDP2014 paper).
Notably, FastFlow patterns are C++ class templates and can be extended by end users according to the Object-Oriented methodology.
In general, no further synchronisation primitives are needed (e.g. locks, semaphores), even though their usage is not forbidden (they are simply useless and a source of additional overhead). Overall, at this level, FastFlow building blocks make it possible to realise arbitrary streaming networks over lock-less channels.
In summary, the FastFlow building blocks layer realizes the two basic features:
- parallelism exploitation, i.e. the creation, destruction and life cycle control of different flows of controls, and
- asynchronous communication channels, supporting the synchronization of different flows of control.
FastFlow SPSC queues can be used directly to write parallel programs: a C++ program spawns a set of ff_nodes and orchestrates them in pairs, each pair sharing (at least) an SPSC queue descriptor. Each thread in a pair has a fixed role in using the queue, either producer or consumer. The bulk of created threads can then start and eventually synchronise using SPSC queues, for example in pipeline fashion.
…FastFlow supports multiprocessors exploiting any memory consistency, including very weak consistency models. FastFlow implementation is lock-free, and for several memory consistency models is also memory fence-free (e.g., sequential consistency, total store ordering, and the x86 model). On other models (e.g., Itanium and Power4, 5, and 6), a store fence before an enqueue is needed [GMV08]….
Distributed platforms built on top of TCP/IP and InfiniBand/OFED protocols are also supported. FPGA support is planned but not yet fully developed.
Also from http://calvados.di.unipi.it/dokuwiki/doku.php/ffnamespace:faq
Can Fastflow ease the use of hardware accelerators (GPUs, etc.)?
Yes, theoretically. In designing Fastflow we envisage it also as a means of easing the high-level programming of hardware accelerators, which is currently almost a nightmare. To do that, we need to extend the generation strategy from the high-level to the low-level layer in the Fastflow stack by considering an extended low-level layer that includes accelerator instructions (or an accelerator access API). Extending that generation strategy while maintaining high performance is not trivial. The FastFlow accelerator is a step in this direction. Using the self-offloading technique, a FastFlow program can be almost automatically transformed into a programmable software accelerator, i.e. a device running on idle CPU cores that can be used as if it were a programmable hardware accelerator. The function to be offloaded on the FastFlow accelerator can be easily derived from pre-existing sequential code.
Something trading-related: http://softwaretrading.co.uk/2012/03/25/weekend-linkfest-101-algorithmic-trading-blogs-parallelism-and-trend-trading/
Check out my new Fastflow video playlist
Follow me on FACEBOOK and TWITTER. Don't worry, I don't post stupid cat videos or what I eat!