guowenxiang committed on
Commit 2009402 · verified · 1 Parent(s): f9309e7

Update README.md

Files changed (1)
  1. README.md +32 -8
README.md CHANGED
@@ -5,21 +5,45 @@ PyTorch Implementation of [Lumina-t2x](https://arxiv.org/abs/2405.05945)
5
  We will release our implementation and pretrained models as open source in this repository soon.
6
 
7
  [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2305.18474)
8
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Lumina-Audio)
9
  [![GitHub Stars](https://img.shields.io/github/stars/Text-to-Audio/Make-An-Audio-3?style=social)](https://github.com/Text-to-Audio/Make-An-Audio-3)
10
 
11
  ## Use pretrained model
12
  We provide our implementation and pretrained models as open source in this repository.
13
 
14
  Visit our [demo page](https://make-an-audio-2.github.io/) for audio samples.
15
  ## Quick Start
16
  ### Pretrained Models
17
- Simply download the weights from [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/Alpha-VLLM/Lumina-T2Music).
18
- - Text Encoder: [FLAN-T5-Large](https://huggingface.co/google/flan-t5-large)
19
- - VAE: Make-An-Audio 2, finetuned from [Make an Audio](https://github.com/Text-to-Audio/Make-An-Audio)
20
- - Decoder: [Vocoder](https://github.com/NVIDIA/BigVGAN)
21
- - `Music` Checkpoints: [huggingface](https://huggingface.co/Alpha-VLLM/Lumina-T2Music), `Audio` Checkpoints: [huggingface]()
22
 
23
  ### Generate audio/music from text
24
  ```
25
  python3 scripts/txt2audio_for_2cap_flow.py
@@ -38,7 +62,7 @@ python3 scripts/txt2audio_for_2cap_flow.py
38
  ### Generate audio/music from video
39
  ```
40
  python3 scripts/video2audio_flow.py
41
- --outdir output_dir -r checkpoints_last.ckpt -b configs/txt2audio-cfm1-cfg-LargeDiT3.yaml --scale 3.0
42
  --vocoder-ckpt useful_ckpts/bigvnat --test-dataset vggsound
43
  ```
44
 
@@ -86,7 +110,7 @@ python main.py --base configs/research/text2audio/text2audio-ConcatDiT-ae1dnat_S
86
  ```
87
 
88
  ## Evaluation
89
- Please refer to [Make-An-Audio](https://github.com/Text-to-Audio/Make-An-Audio?tab=readme-ov-file#evaluation)
90
 
91
 
92
  ## Acknowledgements
 
5
  We will release our implementation and pretrained models as open source in this repository soon.
6
 
7
  [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2305.18474)
8
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3)
9
  [![GitHub Stars](https://img.shields.io/github/stars/Text-to-Audio/Make-An-Audio-3?style=social)](https://github.com/Text-to-Audio/Make-An-Audio-3)
10
 
11
  ## Use pretrained model
12
  We provide our implementation and pretrained models as open source in this repository.
13
 
14
  Visit our [demo page](https://make-an-audio-2.github.io/) for audio samples.
15
+
16
+ ## News
17
+ - June 2024: **[Make-An-Audio-3 (Lumina-Next)](https://arxiv.org/abs/2405.05945)** released on [GitHub](https://github.com/Text-to-Audio/Make-An-Audio-3).
18
+
19
+ [//]: # (- May, 2024: **[Make-An-Audio-2]&#40;https://arxiv.org/abs/2207.06389&#41;** released in [Github]&#40;https://github.com/bytedance/Make-An-Audio-2&#41;.)
20
+ [//]: # (- August, 2023: **[Make-An-Audio]&#40;https://arxiv.org/abs/2301.12661&#41; &#40;ICML 2022&#41;** released in [Github]&#40;https://github.com/Text-to-Audio/Make-An-Audio&#41;. )
21
+
22
+ ## Install dependencies
23
+
24
+ Note: You may want to adjust the CUDA version [according to your driver version](https://docs.nvidia.com/deploy/cuda-compatibility/#default-to-minor-version).
25
+
26
+ ```bash
27
+ conda create -n Make_An_Audio_3 -y
28
+ conda activate Make_An_Audio_3
29
+ conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
30
+ pip install -r requirements.txt
31
+ pip install flash-attn --no-build-isolation
32
+ # Optional: install NVIDIA apex (https://github.com/nvidia/apex)
33
+ ```
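
After installing, it can help to confirm that the environment matches the versions pinned in the conda command above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes the conda `pytorch` package is visible to Python under the distribution name `torch`.

```python
from importlib import metadata

# Pins copied from the conda install command in the README.
PINNED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def installed_versions(packages):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pinned=PINNED):
    """List human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, want in pinned.items():
        have = installed.get(name)
        # Local build tags such as "+cu121" are ignored when comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            problems.append(
                f"{name}: expected {want}, found {have or 'not installed'}"
            )
    return problems


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINNED))
    for line in report or ["all pinned packages match"]:
        print(line)
```

If you adjusted the CUDA or PyTorch version to suit your driver, update `PINNED` accordingly before running the check.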
34
+
35
  ## Quick Start
36
  ### Pretrained Models
37
 
38
+ Simply download the 500M weights from [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3/tree/main/useful_ckpts)
39
+
40
+ | Model     | Pretraining Data   | Path
41
+ |-----------|--------------------|--------------------------------------------------------------------------------
42
+ | M (160M) | AudioCaption |[Here](https://huggingface.co/spaces/AIGC-Audio/Make-An-Audio-3/tree/main/useful_ckpts)
43
+ | L (520M) | AudioCaption |[TBD]
44
+ | XL (750M) | AudioCaption |[TBD]
45
+ | 3B | AudioCaption |[TBD]
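
The released checkpoint can also be fetched programmatically instead of through the browser. The sketch below is an assumption rather than the authors' documented workflow: it relies on the `huggingface_hub` package (which the README does not list) and its `snapshot_download` function, and the helper names `tree_url` and `download_checkpoints` are hypothetical. The repository ID and folder come from the link in the table above.

```python
REPO_ID = "AIGC-Audio/Make-An-Audio-3"  # Hugging Face Space linked in the README
SUBDIR = "useful_ckpts"                 # folder that holds the checkpoints


def tree_url(repo_id=REPO_ID, subdir=SUBDIR):
    """Build the browser URL of the checkpoint folder."""
    return f"https://huggingface.co/spaces/{repo_id}/tree/main/{subdir}"


def download_checkpoints(local_dir="."):
    """Download only the checkpoint folder of the Space into local_dir."""
    # Imported lazily so the rest of this sketch runs without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="space",
        allow_patterns=[f"{SUBDIR}/*"],
        local_dir=local_dir,
    )


print(tree_url())
```

Calling `download_checkpoints()` from the repository root should place the files under `useful_ckpts/`, which is the location the generation commands refer to.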
46
+
47
  ### Generate audio/music from text
48
  ```
49
  python3 scripts/txt2audio_for_2cap_flow.py
 
62
  ### Generate audio/music from video
63
  ```
64
  python3 scripts/video2audio_flow.py
65
+ --outdir output_dir -r checkpoints_last.ckpt -b configs/video2audio-cfm1-cfg-LargeDiT1-moe.yaml --scale 3.0
66
  --vocoder-ckpt useful_ckpts/bigvnat --test-dataset vggsound
67
  ```
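
The README prints the video-to-audio arguments across several lines, but they belong to a single invocation. As a hedged illustration, the sketch below assembles them into one argument list; the function name `build_video2audio_cmd` is hypothetical, while the script path, flags, and default values are the ones shown in the command above.

```python
import shlex


def build_video2audio_cmd(
    outdir,
    ckpt,
    config,
    scale=3.0,
    vocoder_ckpt="useful_ckpts/bigvnat",
    test_dataset="vggsound",
):
    """Return the video-to-audio command as an argument list."""
    return [
        "python3", "scripts/video2audio_flow.py",
        "--outdir", outdir,
        "-r", ckpt,          # model checkpoint
        "-b", config,        # model config file
        "--scale", str(scale),
        "--vocoder-ckpt", vocoder_ckpt,
        "--test-dataset", test_dataset,
    ]


cmd = build_video2audio_cmd(
    "output_dir",
    "checkpoints_last.ckpt",
    "configs/video2audio-cfm1-cfg-LargeDiT1-moe.yaml",
)
# Print a copy-pasteable single-line command.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the checkpoints are in place.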
68
 
 
110
  ```
111
 
112
  ## Evaluation
113
+ Please refer to [Make-An-Audio](https://github.com/Text-to-Audio/Make-An-Audio?tab=readme-ov-file#evaluation).
114
 
115
 
116
  ## Acknowledgements