Article

PaLM: Scaling Language Modeling with Pathways

Journal

Publisher

MICROTOME PUBL

Keywords

Large language models; Few-shot learning; Natural language processing; Scalable deep learning


Large language models have shown remarkable performance on a variety of natural language tasks through few-shot learning. This article introduces PaLM, a 540 billion parameter Transformer language model trained using Pathways, a new ML system. The study demonstrates the continued benefits of scaling, reporting state-of-the-art few-shot results on language understanding and generation benchmarks. PaLM achieves breakthrough performance on multi-step reasoning tasks and outperforms average human performance on the BIG-bench benchmark. The article also discusses the model's capabilities in multilingual tasks and source code generation, and addresses ethical considerations and potential mitigation strategies.

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540 billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
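As a point of reference for the few-shot setup described in the abstract, the following is a minimal sketch of how a few-shot prompt with worked, multi-step exemplars is typically assembled. The exemplar text, the `build_prompt` helper, and the `generate` stub are illustrative assumptions, not code from the paper or from the Pathways system.

```python
# Minimal sketch of few-shot prompting with chain-of-thought style exemplars.
# All names and exemplars below are hypothetical, for illustration only.

FEW_SHOT_EXEMPLARS = [
    # Each exemplar pairs a question with a worked, multi-step answer,
    # so the model can imitate both the format and the reasoning.
    ("Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
     "How many balls does he have now?",
     "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
     "5 + 6 = 11. The answer is 11."),
]

def build_prompt(question: str) -> str:
    """Concatenate exemplars and the new question into a single prompt string."""
    parts = [f"{q}\n{a}" for q, a in FEW_SHOT_EXEMPLARS]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def generate(prompt: str) -> str:
    """Placeholder for a call to a large language model (hypothetical API)."""
    raise NotImplementedError("Replace with a real model call.")

if __name__ == "__main__":
    print(build_prompt("A baker has 3 trays of 12 cookies and sells 10. "
                       "How many cookies are left?"))
```

The key point is that no gradient updates are needed: task adaptation happens entirely through the handful of exemplars placed in the prompt.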

