How important is file access in supercomputing for HFT?
I am about to develop a new supercomputing architecture and would like to know how important file load and store operations are. E.g., is file I/O more or less unimportant because the datasets are small (short loading time) relative to calculation time? Or would direct access to the hard drives be a great benefit?
What storage capacity is needed if we go towards exaflop computers?
Thanks a lot for your inputs …
HPC isn’t one area, with one set of characteristics. Some areas need very little IO. Some need lots, in large gulps. Others need lots, in a jillion smaller gulps. Storage capacity, likewise. Flexibility is key, but if it costs, beware; much HPC is very cost-sensitive (some other parts aren’t).
Could extremely fast access be a unique selling point for a machine? Which parts are cost-sensitive?
It could be. But you need to do some market analysis, find out what kinds of applications need it, what kind of people buy it, etc. Finance, structural engineering, astrophysics — just to pick 3 — all have different cost sensitivities and requirements. E.g., structural engineers *must* for legal reasons use specific commercial packages, but make money so they aren’t quite as cost sensitive as, e.g., academics. Astrophysicists’ grad students roll their own, and are very cost sensitive.
Hmmm, HPC systems are in cost segments normal students cannot afford; academics use the infrastructure of their university, and structural engineers cannot – in most cases – use their commercial packages, as those are not HPC-aware. So maybe this is a misunderstanding? I thought of HPC systems as clusters of very powerful computers used to process huge amounts of data or very complex calculations.
Workstations with e.g. graphics cards are very powerful too, but they are not the focus here.
Re students, the point was that the code is written and rewritten from scratch. No, students don’t buy them. The guys who buy them are the ones sensitive to cost.
Re structural, think crash simulation – repeated from many angles. Bridges. Tall buildings. Wind loads from many angles. Etc. Not done on laptops.
Like I said – market study. Also known as understanding who uses it, for what.
Maybe someone has additional info ?
Speed and load balance are key. Low-latency parallel file access is important if file I/O is a significant fraction of an application’s time. Also, there is the question of synchronization. If you are processing a lot of independent transactions, there is no synchronization, and I/O can occur willy-nilly; if synchronization is required, then you have a large number of file transactions which must be resolved together in a short time. So, between data-intensive versus compute-intensive, independent versus synchronized, you have a scattered set of solutions needed in application space. The ideal parallel I/O solution needs to be tunable to these various characteristics, and intuitive enough so that the user doesn’t need to become an expert to do so.
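To make the independent-versus-synchronized distinction concrete, here is a minimal sketch in Python (the worker names and file layout are purely illustrative, not any real HPC API). Independent transactions let each worker do I/O whenever it is ready; synchronized workloads force all workers to a barrier first, so the file transactions arrive as one burst that must be resolved together.

```python
import os
import tempfile
import threading

# Sketch: independent vs. synchronized parallel I/O (all names are illustrative).
NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
outdir = tempfile.mkdtemp()

def independent_worker(rank: int) -> None:
    # I/O happens "willy-nilly": no coordination between workers.
    with open(os.path.join(outdir, f"indep_{rank}.dat"), "wb") as f:
        f.write(bytes([rank]) * 1024)

def synchronized_worker(rank: int) -> None:
    data = bytes([rank]) * 1024  # compute phase
    barrier.wait()               # all workers must arrive before any I/O starts
    with open(os.path.join(outdir, f"sync_{rank}.dat"), "wb") as f:
        f.write(data)            # burst of file transactions resolved together

for target in (independent_worker, synchronized_worker):
    threads = [threading.Thread(target=target, args=(r,)) for r in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(len(os.listdir(outdir)))  # → 8 (4 independent + 4 synchronized files)
```

In real HPC codes the synchronized pattern corresponds to things like collective MPI-IO writes at the end of a timestep, which is exactly where a parallel file system feels the pressure.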
A small footnote / extension on what was said.
Your questions are really too broad to address in this forum, and suggest that you would almost certainly benefit from some in-depth review of HPC use scenarios for a better understanding (i.e., traditional hardware/software architectures; traditional use models; traditional code paradigms – what is the difference between an MPI scaled-out workload vs. batch processing / queue management of single-threaded jobs, for example).
Asking "how important is file access in supercomputing" is about as difficult to answer as "how important is file access in computing". I.e., many people use '(super)computers' in many different ways. One use scenario has critical requirements which are entirely different from another's.
Some HPC scenarios are purely CPU-bound with microscopic I/O requirements. Others are more 'blended' (i.e., both CPU and I/O are important, and one constrains the other in terms of performance scalability). Sometimes you really care about a low-latency interconnect between compute nodes; other times you couldn't care less and are better off spending $ on other aspects of a deployment. Similarly, certain algorithms or use scenarios can benefit significantly from GPGPU-type 'coprocessors' to further enhance processing capacity, but these are by no means a 'magic bullet' that works for all workloads. "Goodness of fit" for a given workload will typically drive the design of a function-targeted HPC environment that is optimized to the various aspects of a given workflow / algorithm / use scenario.
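A first step in judging where a workload sits on that CPU-bound / I/O-bound spectrum is simply measuring what fraction of wall time goes to each phase. The toy workload below is made up for illustration (real profiling would use proper tools), but it shows the shape of the measurement:

```python
import os
import tempfile
import time

# Sketch: estimate what fraction of a toy job's time is I/O vs. compute.
# The workload here is invented; real HPC profiling uses dedicated tools.

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def compute_phase():
    # Stand-in for a CPU-bound kernel.
    s = 0
    for i in range(100_000):
        s += i * i
    return s

payload = os.urandom(1 << 20)  # 1 MiB of dummy data

def io_phase():
    with tempfile.NamedTemporaryFile() as f:
        f.write(payload)
        f.flush()

t_cpu = timed(compute_phase)
t_io = timed(io_phase)
io_fraction = t_io / (t_cpu + t_io)
print(f"I/O fraction of total time: {io_fraction:.1%}")
```

If that fraction is tiny, money spent on a faster storage subsystem is wasted; if it dominates, a faster interconnect or more cores won't help.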
At the end of the day, HPC (at least in some use scenarios 🙂 is traditionally a way to (sort of) provide a compute platform with the usability features of an (unattainably) large SMP system (i.e., large 'shared memory', 'lots / unlimited CPU cores', and lots of storage to feed I/O to all those CPUs) – but one which can be scaled beyond the constraints of SMP server platforms (8-socket? N-socket, where N < ~20?) and which normally has a better price curve than buying big-iron SMP server hardware.
Ultimately: to develop a 'new supercomputing architecture', as your post suggests you are setting out to do, is an ambitious goal – and to do so effectively will require significant guidance and orientation taken from a very well-grounded understanding of current standard practices of HPC implementation.
Hope this helps clarify a tiny bit.
Try to find information about the San Diego "Gordon" supercomputer. That one has flash memory for the disk drives, so clearly they thought file access was important. I work at a center where the clusters have the Lustre shared file system. That takes a lot of tuning, and sometimes users misbehave, so again, file access can be very important. Having seen some of the ways in which users can misbehave, I can tell you that you can't design an architecture that magically solves all those problems. You need to set up something that has enough capacity, and then keep tuning it.
The key is to recognize that latency and bandwidth are separate parameters. Many HPC applications require high bandwidth but not low latency, and few require low latency and high bandwidth. And very few require sustained amounts of both. It's why the old Crays got along fine with relatively small SSDs that were high-bandwidth and low-latency. Few applications required a large amount of both.
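The latency/bandwidth split can be seen even on a laptop. The sketch below (sizes deliberately small so it runs quickly; a real measurement would use much larger files and defeat the page cache) reads the same number of bytes two ways: many small scattered reads, which pay the per-operation latency over and over, versus one large sequential read, which is limited mostly by bandwidth.

```python
import os
import tempfile
import time

# Sketch: latency-bound vs. bandwidth-bound access to the same file.
SIZE = 4 << 20   # 4 MiB file
CHUNK = 4096     # small-read size

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(SIZE))

with open(path, "rb") as f:
    # Latency-bound pattern: many small seek+read operations.
    t0 = time.perf_counter()
    for off in range(0, SIZE, CHUNK):
        f.seek(off)
        f.read(CHUNK)
    t_small = time.perf_counter() - t0

    # Bandwidth-bound pattern: one big sequential read of the same bytes.
    f.seek(0)
    t0 = time.perf_counter()
    f.read(SIZE)
    t_big = time.perf_counter() - t0

os.remove(path)
print(f"{SIZE // CHUNK} small reads: {t_small:.4f}s, one big read: {t_big:.4f}s")
```

Both patterns move the same 4 MiB, but the per-operation cost dominates the first and raw throughput dominates the second – which is why a storage design must be tuned to the access pattern, not just a headline GB/s number.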
Note that the sizes are radically different now – a Cray SSD of 1 GB was huge when memory was 8 MW, but in those days a Cray memory was a handful of CPU cycles in latency. These days 1 TB of SSD is huge, but even memory is hundreds or thousands of CPU cycles in latency. I'd gladly take the balance of a Cray X-MP or Y-MP if I could get anywhere close to the cycle time of a modern CPU.
1.) If one gets very close to the bare hard drive(s), would this be sufficient for nearly all applications?
2.) What about caching data near the processing elements – is this meaningful?
3.) Talking about safety: is a RAID combination needed (RAID 1, RAID 5, RAID 6)?
4.) What is FAST to you? What orders of time are we talking about?
What I didn’t get is the synchronization thing. What does this mean?