HFT architecture considerations with FastFlow in C++

(Last Updated On: November 21, 2015)

Watch the video to follow my reasoning; these typed notes may contain errors

NOTE: In my video, I address FPGA/GPU-like performance through a software accelerator. I also address the secret-sauce tricks behind Goldman Sachs' system-wide secDB high-performance risk management.

My notes from Fastflow tutorial with Redis

Download the package from http://sourceforge.net/projects/mc-fastflow/

This is from the fftutorial.pdf

P14 for node management

Figure 3.3 shows farms with feedback (collection). Note: W is worker, E is emitter and C is collector

Input stream, pg 20 (hello_farm2.cpp), i.e. stage 1 for input from Redis, stage 2 for the algo, stage 3 for the trading decision
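To make the three-stage idea concrete, here is a minimal sketch in standard C++ only. It does not use the FastFlow API. The Channel class, run_pipeline, the price-change signal and the BUY/SELL/HOLD rule are all hypothetical, and the Redis reader is replaced by a plain vector of prices.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking channel that links two pipeline stages.
template <typename T>
class Channel {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Marks end-of-stream; consumers drain what is left, then stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item arrives; returns false once closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Runs the three stages on three threads and returns one decision per price.
std::vector<std::string> run_pipeline(const std::vector<double>& prices) {
    Channel<double> ticks;    // stage 1 -> stage 2
    Channel<double> signals;  // stage 2 -> stage 3
    std::vector<std::string> decisions;

    // Stage 1: input (assumption: stands in for a Redis reader).
    std::thread input([&] {
        for (double p : prices) ticks.push(p);
        ticks.close();
    });
    // Stage 2: algo (toy signal: change versus the previous price).
    std::thread algo([&] {
        double price = 0.0, previous = 0.0;
        bool first = true;
        while (ticks.pop(price)) {
            signals.push(first ? 0.0 : price - previous);
            previous = price;
            first = false;
        }
        signals.close();
    });
    // Stage 3: trading decision (toy rule, for illustration only).
    std::thread decide([&] {
        double s = 0.0;
        while (signals.pop(s))
            decisions.push_back(s > 0 ? "BUY" : (s < 0 ? "SELL" : "HOLD"));
    });

    input.join();
    algo.join();
    decide.join();
    return decisions;
}
```

Calling run_pipeline({100.0, 101.0, 101.0, 99.0}) gives one decision per tick, in arrival order. A mutex-based channel like this is far slower than a lock-free queue, so treat it as a picture of the stage layout and not as a latency-ready design.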

Or pg 21 with emitter and collector defined

No collector: the workers consolidate the results in main memory or send them to the next stage (in case the farm is in a pipeline stage), provided that the next stage is defined as ff_minode (i.e. multi-input node).  Pg 22 hello_farm4.cpp
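The collector-less farm can be pictured with a small sketch in standard C++ (again not the FastFlow API). The function name farm, the atomic counter standing in for the emitter, and the on-demand scheduling are my own assumptions. Each worker writes its result straight into a shared vector in main memory, so no collector thread is needed.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical farm: the emitter role is an atomic counter that hands out
// task indices on demand; workers store results directly in main memory.
std::vector<long> farm(const std::vector<long>& tasks,
                       unsigned nworkers,
                       const std::function<long(long)>& work) {
    if (nworkers == 0) nworkers = 1;
    std::vector<long> results(tasks.size());
    std::atomic<std::size_t> next{0};  // plays the emitter (E)

    std::vector<std::thread> workers;  // the workers (W)
    for (unsigned w = 0; w < nworkers; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= tasks.size()) break;   // end of stream
                results[i] = work(tasks[i]);    // each slot has one writer
            }
        });
    }
    for (auto& t : workers) t.join();
    return results;  // no collector (C): results are already in place
}
```

Because every task owns a distinct slot of the result vector, no lock is needed on the output side and the input order is preserved.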


3.5 Feedback channels p 27


3.6 Mixing farms pipelines and feedbacks

FastFlow pipeline, task-farm skeletons and the feedback pattern modifier can be nested and combined in many different ways. Figure 3.4 sketches some of the possible combinations that can be realised in an easy way.


**** 3.7 Software accelerators like an FPGA

Using FastFlow accelerator mode is not that different from using FastFlow to write an application only using skeletons (see Fig. 3.5). The skeletons must be started as a software accelerator, and tasks have to be offloaded from the main program. A simple program using the FastFlow accelerator mode is shown on pg 30: accelerator.cpp
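As a rough picture of the accelerator idea (start a background skeleton, offload tasks from the main program, collect results later), here is a sketch in standard C++. The Accelerator class and its offload and wait methods are hypothetical names of mine, not the FastFlow interface, and a single worker thread stands in for the whole skeleton.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical software accelerator: one background thread runs a kernel
// on every task that the main program offloads.
class Accelerator {
public:
    explicit Accelerator(std::function<int(int)> kernel)
        : kernel_(std::move(kernel)), worker_([this] { run(); }) {}
    ~Accelerator() { wait(); }

    // Called from the main program: enqueue a task and return at once.
    void offload(int task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(task);
        }
        ready_.notify_one();
    }

    // Signals end-of-stream, waits for the worker, and returns the
    // results in offload order.
    std::vector<int> wait() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
        return results_;
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // end-of-stream and fully drained
            const int task = tasks_.front();
            tasks_.pop();
            lock.unlock();               // compute outside the lock
            results_.push_back(kernel_(task));
        }
    }

    std::function<int(int)> kernel_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> tasks_;
    std::vector<int> results_;  // touched only by the worker until join
    bool done_ = false;
    std::thread worker_;        // declared last so it starts after the rest
};
```

The main program keeps running its own code between offload calls and only blocks in wait(), which is the property that makes the skeleton behave like an attached device.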

Could use img_farm+pipe.cpp or img_pipe+farm.cpp from figure 3.6


On pg 40:

The next step is to reduce the number of resources used. For example the farm Emitter can be used to read files from the disk, whereas the farm Collector for writing files to the disk. Furthermore, the blur and emboss filters may be computed sequentially using a single worker. This is the so-called “normal form” obtained by optimising the resource usage. img_farm.cpp
*** For fastest processing, focus on patterns that are stateless, as the map section on pg 44 explains

ParallelFor may be more powerful than how Matlab does it, with more options

Pg 48 with ParallelForReduce shows how to use math routines like summing an array
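The parallel-for-plus-reduce idea can be sketched in standard C++ as follows. This is not FastFlow's ParallelForReduce. The function parallel_sum and its static chunking are assumptions of mine, meant only to show the two phases: independent partial sums, then a final reduction.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical parallel sum: each thread reduces its own chunk into a
// private slot, then the partial results are reduced sequentially.
long parallel_sum(const std::vector<long>& data, unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::vector<long> partial(nthreads, 0);
    const std::size_t chunk = (data.size() + nthreads - 1) / nthreads;

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, t, begin, end] {
            // Map phase: stateless work on a disjoint slice.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }
    for (auto& th : pool) th.join();

    // Reduce phase: combine the per-thread results.
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

The loop body touches no shared state, which is the same stateless property the map pattern relies on; only the final combine step is sequential.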

ff_Map for FPGA-like processing

pg 52 Why use the ff_Map instead of using directly a ParallelFor in a sequential


Pg 52 uses matrix multiplication matmul.cpp

P 43 mandel.cpp has image processing

P 57 sobel.cpp uses image


******* P60 ff_mdf uses graph instructions (just FYI: Goldman Sachs' system-wide enterprise secDB works the same way (hm...) as in figure 5.1) → creating graph tasks on p61 hello_mdf.cpp
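The core of a macro-data-flow style executor is that a task fires only once all the tasks it depends on have finished. Here is a minimal sketch in standard C++. GraphTask and run_graph are hypothetical names, this is not the ff_mdf interface, and ready tasks are run one at a time on the calling thread, where a real runtime would hand them to parallel workers.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical graph instruction: a body plus the tasks it depends on.
struct GraphTask {
    std::function<void()> body;
    std::vector<std::size_t> deps;  // indices into the task list (assumed valid)
};

// Fires each task once all of its dependencies have run. Returns the
// execution order; it is shorter than the task list if there is a cycle.
std::vector<std::size_t> run_graph(const std::vector<GraphTask>& tasks) {
    std::vector<std::size_t> pending(tasks.size());
    std::vector<std::vector<std::size_t>> dependents(tasks.size());
    std::queue<std::size_t> ready;

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        pending[i] = tasks[i].deps.size();
        for (std::size_t d : tasks[i].deps) dependents[d].push_back(i);
        if (pending[i] == 0) ready.push(i);  // no inputs: can fire now
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        tasks[i].body();
        order.push_back(i);
        // Completing task i may make its dependents ready.
        for (std::size_t j : dependents[i])
            if (--pending[j] == 0) ready.push(j);
    }
    return order;
}
```

For example, a task computing (a + b) * (a - b) can be listed first with dependencies on the sum and difference tasks; run_graph still runs it last because its pending count only reaches zero after both inputs exist.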

P63 block-based matrix multiplication could be used for complex matrices with linear algebra techniques (????)
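For reference, block-based multiplication splits the matrices into tiles and multiplies tile by tile. The sketch below is a plain sequential version in standard C++, not the tutorial's code. The function block_matmul, the row-major layout and the block size parameter bs are my assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical blocked product C = A * B for n x n row-major matrices,
// processed in tiles of at most bs x bs elements.
std::vector<double> block_matmul(const std::vector<double>& A,
                                 const std::vector<double>& B,
                                 std::size_t n, std::size_t bs) {
    if (bs == 0) bs = 1;
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                // Multiply tile A(ii,kk) by tile B(kk,jj) into tile C(ii,jj).
                const std::size_t iend = std::min(ii + bs, n);
                const std::size_t kend = std::min(kk + bs, n);
                const std::size_t jend = std::min(jj + bs, n);
                for (std::size_t i = ii; i < iend; ++i)
                    for (std::size_t k = kk; k < kend; ++k) {
                        const double aik = A[i * n + k];
                        for (std::size_t j = jj; j < jend; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
            }
    return C;
}
```

Each tile product is a natural unit of work for a task graph: products that feed different C tiles are independent, while those that accumulate into the same C tile must be ordered or combined.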


