guowenxiang committed: Update README.md

README.md CHANGED

PyTorch Implementation of [Lumina-t2x](https://arxiv.org/abs/2405.05945)

We will soon provide our implementation and pretrained models as open source in this repository.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2305.18474)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3)
[![GitHub Stars](https://img.shields.io/github/stars/Text-to-Audio/Make-An-Audio-3?style=social)](https://github.com/Text-to-Audio/Make-An-Audio-3)

## Use pretrained model
We provide our implementation and pretrained models as open source in this repository.

Visit our [demo page](https://make-an-audio-2.github.io/) for audio samples.

## News
- June, 2024: **[Make-An-Audio-3 (Lumina-Next)](https://arxiv.org/abs/2405.05945)** released on [GitHub](https://github.com/Text-to-Audio/Make-An-Audio-3).

[//]: # (- May, 2024: **[Make-An-Audio-2](https://arxiv.org/abs/2207.06389)** released in [Github](https://github.com/bytedance/Make-An-Audio-2).)
[//]: # (- August, 2023: **[Make-An-Audio](https://arxiv.org/abs/2301.12661) (ICML 2022)** released in [Github](https://github.com/Text-to-Audio/Make-An-Audio).)

## Install dependencies

Note: You may want to adjust the CUDA version [according to your driver version](https://docs.nvidia.com/deploy/cuda-compatibility/#default-to-minor-version).

```bash
conda create -n Make_An_Audio_3 -y
conda activate Make_An_Audio_3
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Optionally, install [NVIDIA Apex](https://github.com/nvidia/apex).
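
To confirm the environment is set up correctly, a quick check along these lines can help (a minimal sketch; it only assumes the packages installed above):

```python
# Quick sanity check for the Make_An_Audio_3 environment.
import torch

print("torch:", torch.__version__)                   # expect 2.1.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)             # expect 12.1

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn not importable; re-run `pip install flash-attn --no-build-isolation`")
```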

## Quick Start
### Pretrained Models
Simply download the 500M weights from [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3/tree/main/useful_ckpts).

| Model     | Pretraining Data | Path                                                                                     |
|-----------|------------------|------------------------------------------------------------------------------------------|
| M (160M)  | AudioCaption     | [Here](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3/tree/main/useful_ckpts) |
| L (520M)  | AudioCaption     | TBD                                                                                      |
| XL (750M) | AudioCaption     | TBD                                                                                      |
| 3B        | AudioCaption     | TBD                                                                                      |
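
If you prefer to fetch the checkpoints programmatically, a sketch like the one below works with `huggingface_hub`; the repo id and the `useful_ckpts/` folder are taken from the links above, and `repo_type="space"` is an assumption based on the checkpoints being hosted in the demo Space:

```python
# Hypothetical download helper; repo id and folder layout mirror the table above.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="AIGC-Audio/Make-An-Audio-3",
    repo_type="space",                    # the files live in the demo Space
    allow_patterns=["useful_ckpts/*"],    # fetch only the checkpoint folder
    local_dir="./",                       # places useful_ckpts/ where the commands below expect it
)
print("checkpoints downloaded under:", ckpt_dir)
```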

### Generate audio/music from text
```
python3 scripts/txt2audio_for_2cap_flow.py
```

### Generate audio/music from video
```
python3 scripts/video2audio_flow.py \
--outdir output_dir -r checkpoints_last.ckpt -b configs/video2audio-cfm1-cfg-LargeDiT1-moe.yaml --scale 3.0 \
--vocoder-ckpt useful_ckpts/bigvnat --test-dataset vggsound
```
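
The generation scripts write their samples to the directory passed as `--outdir` (`output_dir` above); a small sketch like the one below can list what was produced (it assumes the outputs are standard `.wav` files, which may differ depending on your configuration):

```python
# List generated audio files and their durations (assumes .wav outputs in output_dir).
from pathlib import Path

import soundfile as sf

for wav_path in sorted(Path("output_dir").rglob("*.wav")):
    audio, sr = sf.read(str(wav_path))
    print(f"{wav_path}: {audio.shape[0] / sr:.2f} s at {sr} Hz")
```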

## Evaluation
Please refer to [Make-An-Audio](https://github.com/Text-to-Audio/Make-An-Audio?tab=readme-ov-file#evaluation).

## Acknowledgements