rCUDA: Reducing the number of GPU-based accelerators in high performance clusters J Duato, AJ Peña, F Silla, R Mayo, ES Quintana-Ortí 2010 International Conference on High Performance Computing & Simulation …, 2010 | 255 | 2010 |

An extension of the StarSs programming model for platforms with multiple GPUs E Ayguadé, RM Badia, FD Igual, J Labarta, R Mayo, ES Quintana-Ortí European Conference on Parallel Processing, 851-862, 2009 | 198 | 2009 |

The science of deriving dense linear algebra algorithms P Bientinesi, JA Gunnels, ME Myers, ES Quintana-Ortí, RA van de Geijn ACM Transactions on Mathematical Software (TOMS) 31 (1), 1-26, 2005 | 196 | 2005 |

Solving stable generalized Lyapunov equations with the matrix sign function P Benner, ES Quintana-Ortí Numerical Algorithms 20 (1), 75-100, 1999 | 188 | 1999 |

SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures E Chan, ES Quintana-Ortí, G Quintana-Ortí, R van de Geijn Proceedings of the nineteenth annual ACM symposium on Parallel algorithms …, 2007 | 161 | 2007 |

Programming matrix algorithms-by-blocks for thread-level parallelism G Quintana-Ortí, ES Quintana-Ortí, RA van de Geijn, FG Van Zee, E Chan ACM Transactions on Mathematical Software (TOMS) 36 (3), 14, 2009 | 157 | 2009 |

Solving dense linear systems on platforms with multiple hardware accelerators G Quintana-Ortí, FD Igual, ES Quintana-Ortí, RA van de Geijn ACM SIGPLAN Notices 44 (4), 121-130, 2009 | 143 | 2009 |

Representing linear algebra algorithms in code: the FLAME application program interfaces P Bientinesi, ES Quintana-Ortí, RA van de Geijn ACM Transactions on Mathematical Software (TOMS) 31 (1), 27-59, 2005 | 113 | 2005 |

Evaluation and tuning of the level 3 CUBLAS for graphics processors S Barrachina, M Castillo, FD Igual, R Mayo, ES Quintana-Ortí 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-8, 2008 | 111 | 2008 |

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks E Chan, FG Van Zee, P Bientinesi, ES Quintana-Ortí, G Quintana-Ortí, ... Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of …, 2008 | 111 | 2008 |

Solving dense linear systems on graphics processors S Barrachina, M Castillo, FD Igual, R Mayo, ES Quintana-Ortí European Conference on Parallel Processing, 739-748, 2008 | 104 | 2008 |

Introducing: The libflame library for dense matrix computations F Van Zee, E Chan, R van de Geijn, E Quintana, G Quintana-Ortí Computing in Science & Engineering, 2009 | 101 | 2009 |

A complete and efficient CUDA-sharing solution for HPC clusters AJ Peña, C Reaño, F Silla, R Mayo, ES Quintana-Ortí, J Duato Parallel Computing 40 (10), 574-588, 2014 | 100 | 2014 |

A note on parallel matrix inversion ES Quintana, G Quintana, X Sun, R van de Geijn SIAM Journal on Scientific Computing 22 (5), 1762-1771, 2001 | 97 | 2001 |

A proposal to extend the OpenMP tasking model for heterogeneous architectures E Ayguadé, RM Badia, D Cabrera, A Duran, M Gonzalez, F Igual, ... International Workshop on OpenMP, 154-167, 2009 | 95 | 2009 |

Extending OpenMP to survive the heterogeneous multi-core era E Ayguadé, RM Badia, P Bellens, D Cabrera, A Duran, R Ferrer, ... International Journal of Parallel Programming 38 (5-6), 440-459, 2010 | 90 | 2010 |

Parallelizing dense and banded linear algebra libraries using SMPSs RM Badia, JR Herrero, J Labarta, JM Pérez, ES Quintana-Ortí, ... Concurrency and Computation: Practice and Experience 21 (18), 2438-2456, 2009 | 88 | 2009 |

Enabling CUDA acceleration within virtual machines using rCUDA J Duato, AJ Peña, F Silla, JC Fernandez, R Mayo, ES Quintana-Ortí 2011 18th International Conference on High Performance Computing, 1-10, 2011 | 83 | 2011 |

Balanced truncation model reduction of large-scale dense systems on parallel computers P Benner, ES Quintana-Ortí, G Quintana-Ortí Mathematical and Computer Modelling of Dynamical Systems 6 (4), 383-405, 2000 | 83 | 2000 |

Tools for power-energy modelling and analysis of parallel scientific applications P Alonso, RM Badia, J Labarta, M Barreda, MF Dolz, R Mayo, ... 2012 41st International Conference on Parallel Processing, 420-429, 2012 | 78 | 2012 |