Training a helpful and harmless assistant with reinforcement learning from human feedback. Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, et al. arXiv preprint arXiv:2204.05862, 2022. Cited by 708.
Constitutional AI: Harmlessness from AI feedback. Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, et al. arXiv preprint arXiv:2212.08073, 2022. Cited by 600.
Language models (mostly) know what they know. S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, et al. arXiv preprint arXiv:2207.05221, 2022. Cited by 229.
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, et al. arXiv preprint arXiv:2209.07858, 2022. Cited by 220.
Predictability and surprise in large generative models. D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, et al. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022. Cited by 173.
Toy models of superposition. N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, et al. arXiv preprint arXiv:2209.10652, 2022. Cited by 145.
Discovering language model behaviors with model-written evaluations. E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, et al. arXiv preprint arXiv:2212.09251, 2022. Cited by 130.
The capacity for moral self-correction in large language models. D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, et al. arXiv preprint arXiv:2302.07459, 2023. Cited by 96.
Towards monosemanticity: Decomposing language models with dictionary learning. T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, N Turner, et al. Transformer Circuits Thread, 2, 2023. Cited by 59.
All-fermion electrodynamics and fermion number anomaly inflow. S Kravec, J McGreevy, B Swingle. Physical Review D 92 (8), 085024, 2015. Cited by 48.
Measuring progress on scalable oversight for large language models. SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, et al. arXiv preprint arXiv:2211.03540, 2022. Cited by 42.
Towards understanding sycophancy in language models. M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, et al. arXiv preprint arXiv:2310.13548, 2023. Cited by 41.
Nonrelativistic conformal field theories in the large charge sector. SM Kravec, S Pal. Journal of High Energy Physics 2019 (2), 1-24, 2019. Cited by 39.
Gauge theory generalization of the fermion doubling theorem. SM Kravec, J McGreevy. Physical Review Letters 111 (16), 161603, 2013. Cited by 36.
Localization of a Hole on an Adenine-Thymine Radical Cation in B-Form DNA in Water. SM Kravec, CD Kinz-Thompson, EM Conwell. The Journal of Physical Chemistry B 115 (19), 6166-6171, 2011. Cited by 25.
The spinful large charge sector of non-relativistic CFTs: from phonons to vortex crystals. SM Kravec, S Pal. Journal of High Energy Physics 2019 (5), 1-22, 2019. Cited by 24.
Evaluating and mitigating discrimination in language model decisions. A Tamkin, A Askell, L Lovitt, E Durmus, N Joseph, S Kravec, K Nguyen, et al. arXiv preprint arXiv:2312.03689, 2023. Cited by 11.
Sleeper agents: Training deceptive LLMs that persist through safety training. E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, et al. arXiv preprint arXiv:2401.05566, 2024. Cited by 9.
Specific versus general principles for Constitutional AI. S Kundu, Y Bai, S Kadavath, A Askell, A Callahan, A Chen, A Goldie, et al. arXiv preprint arXiv:2310.13798, 2023. Cited by 9.