supercomputer user when faced with migrating work to a new HPC system as your present one nears decommissioning?

(Last Updated On: April 12, 2012)

What 2 or 3 questions would be top of mind for you as a supercomputer user (researcher, maybe) when faced with migrating work to a new HPC system as your present one nears decommissioning? Assume the specs are readily available, you know the same models will be there, etc.
==L S • 1) How long is the overlap between the two systems, so the issues can be worked out? 2) Who is the vendor of the new system, and what compilers will be used? 3) How scalable is the new machine, including the I/O system?
==I mostly agree with LS. My first questions would be about porting, and I’d worry about performance only afterwards. The compilers would be my first concern. It’s surprisingly common to have non-standard-conforming code in a large, complex system. But it’s typically not difficult to get early access to a test system that has the compilers of the production system. (The standard conformance of the compiler may be a concern too, if you’re coding with the latest and greatest language features, e.g. Fortran 2008 or C++11.)
Libraries are a similar concern. If you use vendor-specific libraries, you could be in for a lot of work on a new machine. MPI implementations vary too, and that could affect you. Changing software may also trigger bugs in your own code that weren’t a problem before (or that you didn’t know were there). Be prepared to run whatever test cases you have, and possibly spend some time debugging.
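As a rough illustration of the test-case advice above, here is a minimal Python sketch that compares numeric results from a run on the old system against the same run on the new one, within a tolerance. The function name `compare_results`, the sample values, and the tolerance are all assumptions made for illustration; a different compiler or math library can legitimately change the last few digits, so the right tolerance depends on your application.

```python
import math


def compare_results(old, new, rel_tol=1e-9, abs_tol=0.0):
    """Return (index, old_value, new_value) for every pair that differs
    by more than the given tolerance."""
    if len(old) != len(new):
        raise ValueError(
            "result lengths differ: %d vs %d" % (len(old), len(new))
        )
    mismatches = []
    for i, (a, b) in enumerate(zip(old, new)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, a, b))
    return mismatches


# Hypothetical outputs of the same test case on both systems.
old_run = [1.0, 2.5, 3.141592653589793]
new_run = [1.0, 2.5000000001, 3.2]

for index, old_value, new_value in compare_results(old_run, new_run, rel_tol=1e-6):
    print("value %d differs: old=%r new=%r" % (index, old_value, new_value))
```

An empty result means the new system reproduces the old numbers to within the chosen tolerance; anything reported is a candidate for the debugging time mentioned above.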
Build tools are another concern. (I try to use GNU make as much as possible for this reason.) Another question to ask is how filesystem data will be migrated, especially if you already have some kind of archive/migration system. And don’t forget to consider how the new job scheduling system will affect your workflow. If you normally submit hundreds or thousands of jobs a week, you’ve probably developed a pretty sophisticated workflow, and the new system may force you to change it.
Once you’re able to get real work done on the new machine, then you can start thinking about performance. Hopefully the new machine will be faster to start with, and will be even faster once you retune your codes. There’s little worse than getting a new machine and having to work just to get back to where you were.

==Thanks for the feedback. My task is to let our users know how the current system and the new system will differ so they can prepare for the changeover. This will be very helpful.

==In general, one needs to know the differences in: 1) CPU model (Intel vs. AMD), core count, memory size; 2) interconnect (GbE vs. IB, 10GbE, or other); 3) parallel file system; 4) compiler; 5) workload management system (SGE, Torque/Maui/Moab, Slurm); 6) OS (Red Hat/SUSE/Debian/Windows).
==Here is my take as an HPC end user:
* Application benchmark (how much is the speed-up?)
* System reliability
* Support for ease of maintenance
==As an HPC user, who is also an HPC software developer and HPC cluster hardware architect, these are very interesting answers. My first reaction was: “Yes, but specifically why?” What was (were) the reason(s) behind your answer?
There are all sorts of metrics and implications associated with the answers given (contention, bandwidth, throughput). My imagination is running wild.
==As a user, I’d like to know what the unique features are of the new HPC system that I’m being asked to run my code(s) on. Yes, I will ask for access to a test system, but I want to understand those unique features by talking to the architects, who can give me some examples. I don’t care if the features are interconnect, core based, etc. I just really want to know the ROI for my investment of time/energy in putting my code(s) there.
==Yes, there are so many metrics for an HPC implementation. But for the end users, all of them come to the bottom line of ROI & TCO:
1) The speed-up would result in productivity gains, shorter turnaround time, and reduced time to market. Is that what the ROI is about? 2) System reliability – I want the system to run trouble-free 24×7 throughout its lifecycle. TCO? 3) Support for ease of maintenance – Whenever there is a hardware/software failure, I hope I have someone to call for help, with minimum impact on my jobs. Again, it’s TCO.
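For the speed-up in point 1, one simple way to put a number on it is to time the same application benchmark on both systems and take the ratio. The sketch below is only an illustration: the function name `speedup`, the case names, and the timings are made up, not measurements from any real machine.

```python
def speedup(old_seconds, new_seconds):
    """Speed-up of the new system over the old one for the same job:
    old wall-clock time divided by new wall-clock time."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("wall-clock times must be positive")
    return old_seconds / new_seconds


# Hypothetical wall-clock times in seconds: (old system, new system).
timings = {
    "case_a": (1200.0, 400.0),
    "case_b": (300.0, 250.0),
}

for name, (old_s, new_s) in timings.items():
    print("%s: %.2fx" % (name, speedup(old_s, new_s)))
```

A ratio above 1 means the job finishes sooner on the new system; a ratio below 1 means you are working just to get back to where you were.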
