Scaling language models: Methods, analysis & insights from training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 768 | 2021 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 548 | 2023 |
Scaling local self-attention for parameter efficient visual backbones A Vaswani, P Ramachandran, A Srinivas, N Parmar, B Hechtman, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 388 | 2021 |
Mesh-TensorFlow: Deep learning for supercomputers N Shazeer, Y Cheng, N Parmar, D Tran, A Vaswani, P Koanantakool, ... Advances in Neural Information Processing Systems 31, 2018 | 363 | 2018 |
Heterogeneous-race-free memory models DR Hower, BA Hechtman, BM Beckmann, BR Gaster, MD Hill, ... Proceedings of the 19th international conference on Architectural support …, 2014 | 127 | 2014 |
GSPMD: general and scalable parallelization for ML computation graphs Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ... arXiv preprint arXiv:2105.04663, 2021 | 88 | 2021 |
QuickRelease: A throughput-oriented approach to release consistency on GPUs BA Hechtman, S Che, DR Hower, Y Tian, BM Beckmann, MD Hill, ... 2014 IEEE 20th International Symposium on High Performance Computer …, 2014 | 86 | 2014 |
Scale MLPerf-0.6 models on Google TPU-v3 pods S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ... arXiv preprint arXiv:1909.09756, 2019 | 36 | 2019 |
Large-scale discrete Fourier transform on TPUs T Lu, YF Chen, B Hechtman, T Wang, J Anderson IEEE Access 9, 93422-93432, 2021 | 35 | 2021 |
Unified scaling laws for routed language models A Clark, D de Las Casas, A Guy, A Mensch, M Paganini, J Hoffmann, ... International Conference on Machine Learning, 4057-4086, 2022 | 34 | 2022 |
Evaluating cache coherent shared virtual memory for heterogeneous multicore chips BA Hechtman, DJ Sorin 2013 IEEE International Symposium on Performance Analysis of Systems and …, 2013 | 31 | 2013 |
Overlap communication with dependent computation via decomposition in large deep learning models S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ... Proceedings of the 28th ACM International Conference on Architectural …, 2022 | 26 | 2022 |
Automatic cross-replica sharding of weight update in data-parallel training Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang arXiv preprint arXiv:2004.13336, 2020 | 26 | 2020 |
A flexible approach to autotuning multi-pass machine learning compilers PM Phothilimthana, A Sabne, N Sarda, KS Murthy, Y Zhou, ... 2021 30th International Conference on Parallel Architectures and Compilation …, 2021 | 24 | 2021 |
TPU-KNN: K nearest neighbor search at peak FLOP/s F Chern, B Hechtman, A Davis, R Guo, D Majnemer, S Kumar Advances in Neural Information Processing Systems 35, 15489-15501, 2022 | 17 | 2022 |
Exploring the limits of concurrency in ML training on Google TPUs S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing Proceedings of Machine Learning and Systems 3, 81-92, 2021 | 16 | 2021 |
Method for memory consistency among heterogeneous computer components DR Hower, MD Hill, D Wood, SK Reinhardt, BR Gaster, BA Hechtman, ... US Patent 9,361,118, 2016 | 9 | 2016 |
Hierarchical write-combining cache coherence BA Hechtman, BM Beckmann US Patent 9,396,112, 2016 | 8 | 2016 |
General padding support for convolution on systolic arrays DA Majnemer, BA Hechtman, BH Roune US Patent 11,449,739, 2022 | 6 | 2022 |
Data remapping for heterogeneous processor S Che, B Beckmann, B Hechtman US Patent App. 14/055,221, 2015 | 5 | 2015 |