arxiv:2410.17891

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Published on Oct 23 · Submitted by kiaia on Oct 24
Abstract

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for generative text modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale than their AR counterparts and lack a fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions: https://github.com/HKUNLP/DiffuLLaMA.
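
To make the adaptation recipe concrete, here is a minimal sketch of the kind of absorbing-state masked-diffusion objective such continual pre-training optimizes. It is illustrative only, not the authors' released training code: the `model` is assumed to be an HF-style network whose causal attention mask has already been replaced by full (bidirectional) attention, and `mask_token_id` and the 1/t weighting are standard choices in this family of objectives rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, input_ids, mask_token_id):
    """One step of a generic absorbing-state masked-diffusion LM loss.

    Assumptions (not the paper's released code): `model` is an HF-style
    network returning `.logits` of shape (batch, seq, vocab) and attends
    bidirectionally (its causal mask has been removed during adaptation).
    """
    batch, seq_len = input_ids.shape
    device = input_ids.device

    # Sample one corruption level t ~ U(0, 1] per sequence.
    t = torch.rand(batch, 1, device=device).clamp(min=1e-3)

    # Corrupt: replace each token with [MASK] independently with prob. t.
    is_masked = torch.rand(batch, seq_len, device=device) < t
    noisy_ids = torch.where(
        is_masked, torch.full_like(input_ids, mask_token_id), input_ids
    )

    # Denoise: predict the original token at every position.
    logits = model(noisy_ids).logits
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        input_ids.reshape(-1),
        reduction="none",
    ).reshape(batch, seq_len)

    # Score masked positions only, weighted by 1/t (the usual
    # variational weighting for absorbing-state discrete diffusion).
    weighted = (nll * is_masked) / t
    return weighted.sum() / is_masked.sum().clamp(min=1)
```

Because the network only ever learns to fill in masked positions given visible context on both sides, infilling ("filling in the middle") falls out of the objective for free, without any prompt re-ordering.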

Community

Paper author · Paper submitter

🔥 By adapting autoregressive models like GPT2 and LLaMA, we've crafted powerful text diffusion models with a fresh training approach, using under 200B tokens.

📊 Our models shine on language modeling, reasoning, and commonsense benchmarks, showcasing diffusion's potential at scale! ✨

Check out our paper for more insights: https://arxiv.org/abs/2410.17891
Explore our code: https://github.com/HKUNLP/DiffuLLaMA
Discover the models: https://huggingface.co/diffusionfamily
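
If you want to try the released checkpoints, a loading sketch follows; the model ID `diffusionfamily/diffullama` and the `trust_remote_code` flag are assumptions inferred from the organization linked above, so check the model cards for the authoritative usage and sampling instructions.

```python
# Hypothetical usage sketch: the model ID and flags below are assumptions
# inferred from the linked Hugging Face organization; consult the model
# card for authoritative loading and sampling instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "diffusionfamily/diffullama"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```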

Models citing this paper: 5

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 4