BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1166 | 2023 |
Pythia: A suite for analyzing large language models across training and scaling S Biderman, H Schoelkopf, QG Anthony, H Bradley, K O’Brien, E Hallahan, ... International Conference on Machine Learning, 2397-2430, 2023 | 431 | 2023 |
Crosslingual generalization through multitask finetuning N Muennighoff, T Wang, L Sutawika, A Roberts, S Biderman, TL Scao, ... arXiv preprint arXiv:2211.01786, 2022 | 376 | 2022 |
StarCoder: may the source be with you! R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ... arXiv preprint arXiv:2305.06161, 2023 | 297 | 2023 |
SantaCoder: don't reach for the stars! LB Allal, R Li, D Kocetkov, C Mou, C Akiki, CM Ferrandis, N Muennighoff, ... arXiv preprint arXiv:2301.03988, 2023 | 134* | 2023 |
Llemma: An open language model for mathematics Z Azerbayev, H Schoelkopf, K Paster, MD Santos, S McAleer, AQ Jiang, ... arXiv preprint arXiv:2310.10631, 2023 | 71 | 2023 |
Emergent and predictable memorization in large language models S Biderman, U Prashanth, L Sutawika, H Schoelkopf, Q Anthony, ... Advances in Neural Information Processing Systems 36, 2024 | 54 | 2024 |
FOLIO: Natural language reasoning with first-order logic S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell, L Benson, L Sun, E Zubova, ... arXiv preprint arXiv:2209.00840, 2022 | 46 | 2022 |
BLOOM+1: Adding language support to BLOOM for zero-shot prompting ZX Yong, H Schoelkopf, N Muennighoff, AF Aji, DI Adelani, K Almubarak, ... arXiv preprint arXiv:2212.09535, 2022 | 32 | 2022 |
A framework for few-shot language model evaluation L Gao, J Tow, B Abbasi, S Biderman, S Black, A DiPofi, C Foster, ... https://zenodo.org/records/10256836, 2023 | 28 | 2023 |
ProofNet: Autoformalizing and formally proving undergraduate-level mathematics Z Azerbayev, B Piotrowski, H Schoelkopf, EW Ayers, D Radev, J Avigad arXiv preprint arXiv:2302.12433, 2023 | 23 | 2023 |
GAIA Search: Hugging Face and Pyserini interoperability for NLP training data exploration A Piktus, O Ogundepo, C Akiki, A Oladipo, X Zhang, H Schoelkopf, ... arXiv preprint arXiv:2306.01481, 2023 | 5 | 2023 |
Explicit Knowledge Transfer for Weakly-Supervised Code Generation Z Azerbayev, A Ni, H Schoelkopf, D Radev arXiv preprint arXiv:2211.16740, 2022 | 3 | 2022 |
Social Choice for AI Alignment: Dealing with Diverse Human Feedback V Conitzer, R Freedman, J Heitzig, WH Holliday, BM Jacobs, N Lambert, ... arXiv preprint arXiv:2404.10271, 2024 | 2 | 2024 |
Suppressing Pink Elephants with Direct Principle Feedback L Castricato, N Lile, S Anand, H Schoelkopf, S Verma, S Biderman arXiv preprint arXiv:2402.07896, 2024 | 2 | 2024 |
Transformer Math 101 Q Anthony, S Biderman, H Schoelkopf https://blog.eleuther.ai/transformer-math/, 2023 | | 2023 |
Attributing Mode Collapse in the fine-tuning of Large Language Models L O'Mahony, L Grinsztajn, H Schoelkopf, S Biderman ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2024 | | 2024 |