Condition numbers of Gaussian random matrices Z Chen, JJ Dongarra SIAM Journal on Matrix Analysis and Applications 27 (3), 603-620, 2005 | 180 | 2005 |

Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods Z Chen ACM SIGPLAN Notices 48 (8), 167-176, 2013 | 144 | 2013 |

Algorithm-based fault tolerance for fail-stop failures Z Chen, J Dongarra IEEE Transactions on Parallel and Distributed Systems 19 (12), 1628-1641, 2008 | 132 | 2008 |

Fault tolerant high performance computing by a coding approach Z Chen, GE Fagg, E Gabriel, J Langou, T Angskun, G Bosilca, J Dongarra Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of …, 2005 | 124 | 2005 |

High performance Linpack benchmark: a fault tolerant implementation without checkpointing T Davies, C Karlsson, H Liu, C Ding, Z Chen Proceedings of the international conference on Supercomputing, 162-171, 2011 | 110 | 2011 |

Algorithm-based recovery for iterative methods without checkpointing Z Chen Proceedings of the 20th international symposium on High performance …, 2011 | 107 | 2011 |

Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources Z Chen, J Dongarra Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006 | 98 | 2006 |

Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization D Tao, S Di, Z Chen, F Cappello 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 83 | 2017 |

Extending the MPI specification for process fault tolerance on high performance computing systems GE Fagg, E Gabriel, G Bosilca, T Angskun, Z Chen, J Pjesivac-Grbovic, ... Proceedings of the International Supercomputer Conference (ICS) 12, 2004 | 83 | 2004 |

Self-adapting software for numerical linear algebra and LAPACK for clusters Z Chen, J Dongarra, P Luszczek, K Roche Parallel Computing 29 (11-12), 1723-1743, 2003 | 81 | 2003 |

Process fault tolerance: Semantics, design and applications for high performance computing GE Fagg, E Gabriel, Z Chen, T Angskun, G Bosilca, J Pjesivac-Grbovic, ... The International Journal of High Performance Computing Applications 19 (4 …, 2005 | 68 | 2005 |

Recovery patterns for iterative methods in a parallel unstable environment G Bosilca, Z Chen, J Dongarra, J Langou | 65* | 2007 |

Correcting soft errors online in LU factorization T Davies, Z Chen Proceedings of the 22nd international symposium on High-performance parallel …, 2013 | 62 | 2013 |

Matrix multiplication on GPUs with on-line fault tolerance C Ding, C Karlsson, H Liu, T Davies, Z Chen 2011 IEEE Ninth International Symposium on Parallel and Distributed …, 2011 | 55 | 2011 |

Self-adapting numerical software (SANS) effort J Dongarra, G Bosilca, Z Chen, V Eijkhout, GE Fagg, E Fuentes, J Langou, ... IBM Journal of Research and Development 50 (2.3), 223-238, 2006 | 53 | 2006 |

Highly scalable self-healing algorithms for high performance scientific computing Z Chen, J Dongarra IEEE Transactions on Computers 58 (11), 1512-1524, 2009 | 46 | 2009 |

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach D Li, Z Chen, P Wu, JS Vetter SC'13: Proceedings of the International Conference on High Performance …, 2013 | 45 | 2013 |

Algorithmic Cholesky factorization fault recovery D Hakkarinen, Z Chen 2010 IEEE International Symposium on Parallel & Distributed Processing …, 2010 | 41 | 2010 |

FT-ScaLAPACK: Correcting soft errors on-line for ScaLAPACK Cholesky, QR, and LU factorization routines P Wu, Z Chen Proceedings of the 23rd international symposium on High-performance parallel …, 2014 | 38 | 2014 |

Numerically stable real number codes based on random matrices Z Chen, J Dongarra International Conference on Computational Science, 115-122, 2005 | 37 | 2005 |