Neel Nanda

Cited by

	Total	Since 2019
Citations	2018	2017
h-index	13	13
i10-index	15	15

Citations per year: 2022: 108, 2023: 1067, 2024: 832

Public access

1 article

0 articles

available

not available

Based on funding mandates

Co-authors

Lawrence Chan, PhD Student, UC Berkeley. Verified email at berkeley.edu
Catherine Olsson, Anthropic. Verified email at mit.edu
Tom Lieberum, Google DeepMind. Verified email at deepmind.com
Christopher Olah, Anthropic. Verified email at google.com
Bilal Chughtai, Independent. Verified email at cam.ac.uk

Neel Nanda

Research Engineer, Google DeepMind

Verified email at deepmind.com - Homepage

AI, ML, AI Alignment, Interpretability, Mechanistic Interpretability


Title	Cited by	Year
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ... arXiv preprint arXiv:2204.05862, 2022	743	2022
In-context learning and induction heads C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ... Transformer Circuits Thread, 2022	326*	2022
A Mathematical Framework for Transformer Circuits N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ... Transformer Circuits Thread, 2021	295*	2021
Progress Measures For Grokking Via Mechanistic Interpretability N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt ICLR 2023 Spotlight, 2023	176*	2023
Predictability and surprise in large generative models D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ... Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022	176	2022
Finding Neurons in a Haystack: Case Studies with Sparse Probing W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas Transactions on Machine Learning Research, 2023	52	2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations B Chughtai, L Chan, N Nanda ICML 2023, 2023	46	2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models N Nanda, A Lee, M Wattenberg BlackboxNLP at EMNLP 2023, Honourable Mention for Best Paper, 2023	43*	2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla T Lieberum, M Rahtz, J Kramár, N Nanda, G Irving, R Shah, V Mikulik arXiv preprint arXiv:2307.09458, 2023	29	2023
TransformerLens: A Library for Mechanistic Interpretability of Language Models N Nanda, J Bloom https://github.com/neelnanda-io/TransformerLens, 2022	22*	2022
Softmax Linear Units N Elhage, T Hume, C Olsson, N Nanda, T Henighan, S Johnston, ... Transformer Circuits Thread, 2022	16	2022
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods F Zhang, N Nanda ICLR 2024, 2023	15	2023
Attribution Patching: Activation Patching At Industrial Scale N Nanda https://www.neelnanda.io/mechanistic-interpretability/attribution-patching, 2023	15*	2023
Linear Representations of Sentiment in Large Language Models C Tigges, OJ Hollinsworth, A Geiger, N Nanda NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023	13	2023
Copy Suppression: Comprehensively Understanding an Attention Head C McDougall, A Conmy, C Rushing, T McGrath, N Nanda NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023	10	2023
A Comprehensive Mechanistic Interpretability Explainer & Glossary N Nanda https://neelnanda.io/glossary, 2023	6*	2023
Universal Neurons in GPT2 Language Models W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ... arXiv preprint arXiv:2401.12181, 2024	5	2024
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching A Makelov, G Lange, N Nanda ICLR 2024, 2023	5*	2023
Neuroscope N Nanda https://neuroscope.io, 2022	5*	2022
Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level N Nanda, S Rajamanoharan, J Kramár, R Shah Alignment Forum, https://www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact …, 2023	4*	2023
