The International Symposium on Dependable Systems and Networks has named Dr Bo Fang from the University of British Columbia, Canada, winner of the 2020 William C Carter Award for the best PhD Dissertation in Dependability. The Symposium, which was held virtually earlier this month, was jointly sponsored by IFIP and IEEE.
Established in 1997, the WC Carter Award seeks to encourage the emergence and development of future leaders who conduct research of excellent quality.
Dr Fang’s dissertation, titled “Approaches for Building Error Resilient Applications”, addressed the problem of transient hardware faults in high performance computing (HPC) systems. Starting from the observation that most transient hardware faults have no significant effect at the software layer, the thesis proposed an error propagation model and a crash model to identify the faults that really matter, particularly those that may cause silent data corruption or crashes. Recovery actions can then be triggered selectively, only for those faults.
He subsequently proposed the innovative idea of applying a roll-forward recovery scheme within standard checkpoint/restart systems, allowing confidence in results to be traded for gains in both performance and energy efficiency.
Fang’s work is already influencing the design and implementation of HPC systems at two US national laboratories: Pacific Northwest National Laboratory (PNNL) and Los Alamos National Laboratory (LANL). He is currently affiliated with PNNL. The winner was chosen by a panel of experts, and the award was presented during a conference function.