Yao Fu | Website | Blog

University of Edinburgh | [email protected]

with Hao Peng and Tushar Khot

work done at Allen Institute for AI

Thanks to Junxian He @SJTU, Pan Lu @UCLA, Ruibo Liu @Dartmouth for insightful initial discussions and suggestions.

Thanks to Raj Ammanabrolu @AI2, Peter Liu @Google Brain, Brendan Dolan-Gavitt @NYU, Denny Zhou @Google Brain, Aman Madaan @CMU for discussions and suggestions after release, which greatly improved this article's comprehensiveness.

Started writing on Thu Dec 08, 2022; released on Dec 11, 2022; last edited on May 16, 2023.

Other versions: [pdf] [arXiv] [中文] [bib]

Discuss on Twitter with the author

TL;DR

Recently, the field has been greatly impressed and inspired by OpenAI's ChatGPT. It is undoubtedly clever, capable, and very fun to talk to. Its multi-faceted abilities significantly exceed what many NLP researchers and practitioners expected based on their impression of the (not-that-strong) original GPT-3. The natural question is how ChatGPT got there, and where these fantastic abilities come from. In this post, we try to dissect these emergent abilities and trace them to their sources, hoping to give a comprehensive roadmap of how the GPT-3.5 model family, along with related large language models, evolved to their current forms.

We hope this post can promote the transparency of large language models and serve as a roadmap for the community's ongoing efforts to reproduce GPT-3.5.

To readers:

Table of Contents