LLM Collection
This section collects and summarizes notable and foundational LLMs.
Models
Model | Release Date | Size (B) | Checkpoints | Description |
---|---|---|---|---|
Falcon LLM | May 2023 | 7, 40 | Falcon-7B, Falcon-40B | A foundational LLM from TII with up to 40 billion parameters, trained on one trillion tokens. |
PaLM 2 | May 2023 | – | – | A language model with better multilingual and reasoning capabilities that is more compute-efficient than its predecessor, PaLM. |
Med-PaLM 2 | May 2023 | – | – | Towards Expert-Level Medical Question Answering with Large Language Models |
Gorilla | May 2023 | 7 | Gorilla | Gorilla: Large Language Model Connected with Massive APIs |
RedPajama-INCITE | May 2023 | 3, 7 | RedPajama-INCITE | A family of models including base, instruction-tuned, and chat models. |
LIMA | May 2023 | 65 | – | A 65B-parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. |
Replit Code | May 2023 | 3 | Replit Code | The replit-code-v1-3b model is a 2.7B LLM trained on 20 programming languages from the Stack Dedup v1.2 dataset. |
h2oGPT | May 2023 | 12 | h2oGPT | A large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities. |
CodeGen2 | May 2023 | 1, 3, 7, 16 | CodeGen2 | Code models for program synthesis. |
CodeT5 and CodeT5+ | May 2023 | 16 | CodeT5 | CodeT5 and CodeT5+ models for code understanding and generation from Salesforce Research. |
StarCoder | May 2023 | 15 | StarCoder | StarCoder: A State-of-the-Art LLM for Code |
MPT-7B | May 2023 | 7 | MPT-7B | A GPT-style model and the first in the MosaicML Foundation Series of models. |
DLite | May 2023 | 0.124 – 1.5 | DLite-v2-1.5B | Lightweight instruction-following models that exhibit ChatGPT-like interactivity. |
Dolly | April 2023 | 3, 7, 12 | Dolly | An instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use. |
StableLM | April 2023 | 3, 7 | StableLM-Alpha | Stability AI's StableLM series of language models. |
Pythia | April 2023 | 0.070 – 12 | Pythia | A suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. |
Open Assistant (Pythia Family) | March 2023 | 12 | Open Assistant | A chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so. |
Cerebras-GPT | March 2023 | 0.111 – 13 | Cerebras-GPT | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
BloombergGPT | March 2023 | 50 | – | BloombergGPT: A Large Language Model for Finance |
PanGu-Σ | March 2023 | 1085 | – | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
GPT-4 | March 2023 | – | – | GPT-4 Technical Report |
LLaMA | Feb 2023 | 7, 13, 33, 65 | LLaMA | LLaMA: Open and Efficient Foundation Language Models |
ChatGPT | Nov 2022 | – | – | A model that interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. |
Galactica | Nov 2022 | 0.125 – 120 | Galactica | Galactica: A Large Language Model for Science |
mT0 | Nov 2022 | 13 | mT0-xxl | Crosslingual Generalization through Multitask Finetuning |
BLOOM | Nov 2022 | 176 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
U-PaLM | Oct 2022 | 540 | – | Transcending Scaling Laws with 0.1% Extra Compute |
UL2 | Oct 2022 | 20 | UL2, Flan-UL2 | UL2: Unifying Language Learning Paradigms |
Sparrow | Sep 2022 | 70 | – | Improving alignment of dialogue agents via targeted human judgements |
Flan-T5 | Oct 2022 | 11 | Flan-T5-xxl | Scaling Instruction-Finetuned Language Models |
AlexaTM | Aug 2022 | 20 | – | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
GLM-130B | Oct 2022 | 130 | GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model |
OPT-IML | Dec 2022 | 30, 175 | OPT-IML | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
OPT | May 2022 | 175 | OPT-13B, OPT-66B | OPT: Open Pre-trained Transformer Language Models |
PaLM | April 2022 | 540 | – | PaLM: Scaling Language Modeling with Pathways |
Tk-Instruct | April 2022 | 11 | Tk-Instruct-11B | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
GPT-NeoX-20B | April 2022 | 20 | GPT-NeoX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
Chinchilla | Mar 2022 | 70 | – | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
InstructGPT | Mar 2022 | 175 | – | Training language models to follow instructions with human feedback |
CodeGen | Mar 2022 | 0.350 – 16 | CodeGen | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
AlphaCode | Feb 2022 | 41 | – | Competition-Level Code Generation with AlphaCode |
MT-NLG | Jan 2022 | 530 | – | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
LaMDA | Jan 2022 | 137 | – | LaMDA: Language Models for Dialog Applications |
GLaM | Dec 2021 | 1200 | – | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
Gopher | Dec 2021 | 280 | – | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
WebGPT | Dec 2021 | 175 | – | WebGPT: Browser-assisted question-answering with human feedback |
Yuan 1.0 | Oct 2021 | 245 | – | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
T0 | Oct 2021 | 11 | T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
FLAN | Sep 2021 | 137 | – | Finetuned Language Models Are Zero-Shot Learners |
HyperCLOVA | Sep 2021 | 82 | – | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
ERNIE 3.0 Titan | July 2021 | 10 | – | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
Jurassic-1 | Aug 2021 | 178 | – | Jurassic-1: Technical Details and Evaluation |
ERNIE 3.0 | July 2021 | 10 | – | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
Codex | July 2021 | 12 | – | Evaluating Large Language Models Trained on Code |
GPT-J-6B | June 2021 | 6 | GPT-J-6B | A 6-billion-parameter autoregressive text generation model trained on The Pile. |
CPM-2 | Jun 2021 | 198 | CPM | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
PanGu-α | April 2021 | 13 | PanGu-α | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
mT5 | Oct 2020 | 13 | mT5 | mT5: A massively multilingual pre-trained text-to-text transformer |
BART | Jul 2020 | – | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
GShard | Jun 2020 | 600 | – | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
GPT-3 | May 2020 | 175 | – | Language Models are Few-Shot Learners |
CTRL | Sep 2019 | 1.63 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation |
ALBERT | Sep 2019 | 0.235 | ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
XLNet | Jun 2019 | – | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
T5 | Oct 2019 | 0.06 – 11 | Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
GPT-2 | Nov 2019 | 1.5 | GPT-2 | Language Models are Unsupervised Multitask Learners |
RoBERTa | July 2019 | 0.125 – 0.355 | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
BERT | Oct 2018 | – | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
GPT | June 2018 | – | GPT | Improving Language Understanding by Generative Pre-Training |
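Many of the models above, from GPT-3 onward, are typically used via few-shot prompting, as described in "Language Models are Few-Shot Learners". A minimal sketch of assembling such a prompt is shown below; the `Input:`/`Output:` labels are an illustrative convention, not a format required by any particular model.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a plain-text few-shot prompt: a task description,
    a handful of worked (input, output) examples, and the new
    input whose completion the model should produce."""
    lines = [task, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    # End with an open "Output:" so a completion-style model
    # continues the established pattern.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

# Build a 2-shot sentiment-classification prompt.
prompt = build_few_shot_prompt(
    "Classify the sentiment of the text as positive or negative.",
    [("I loved this movie!", "positive"),
     ("The service was terrible.", "negative")],
    "What a wonderful day.",
)
print(prompt)
```

The resulting string would be sent as-is to a completion-style model; chat-oriented models such as ChatGPT typically take the same examples as alternating conversation turns instead.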