Commit 6b56fcd
Parent(s): 9264590
Update README.md
README.md CHANGED
@@ -1,3 +1,34 @@
 ---
 license: mit
+datasets:
+- M2UGen/MUCaps
+- M2UGen/MUEdit
+- M2UGen/MUImage
+- M2UGen/MUVideo
 ---
+# M<sup>2</sup>UGen Model with MusicGen-medium
+
+The M<sup>2</sup>UGen model is a Music Understanding and Generation model capable of Music Question Answering, Music Generation
+from text, images, videos and audio, and Music Editing. The model uses encoders such as MERT for music understanding, ViT for image understanding
+and ViViT for video understanding, with the MusicGen/AudioLDM2 model as the music generation model (music decoder), coupled with adapters and the LLaMA 2 model
+to support these multiple capabilities.
+
+M<sup>2</sup>UGen was published in [M<sup>2</sup>UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models](https://arxiv.org/abs/2311.11255) by *Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun and Ying Shan*.
+
+The code repository for the model is available at [crypto-code/M2UGen](https://github.com/crypto-code/M2UGen). Clone the repository, download the checkpoint and run the following for a model demo:
+```bash
+python gradio_app.py --model ./ckpts/M2UGen-MusicGen-medium/checkpoint.pth --llama_dir ./ckpts/LLaMA-2 --music_decoder musicgen --music_decoder_path facebook/musicgen-medium
+```
+
+## Citation
+
+If you find this model useful, please consider citing:
+
+```bibtex
+@article{hussain2023m,
+title={{M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models}},
+author={Hussain, Atin Sakkeer and Liu, Shansong and Sun, Chenshuo and Shan, Ying},
+journal={arXiv preprint arXiv:2311.11255},
+year={2023}
+}
+```
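The demo command added in this README takes two local paths (`--model` and `--llama_dir`) that must be downloaded first. The sketch below is one possible pre-flight helper, not part of the M2UGen repository: it assembles the same command from its four flags and reports which local paths are still missing. The function names `build_demo_command` and `missing_local_paths` are hypothetical. It also assumes that `facebook/musicgen-medium` is a model identifier resolved by the script, not a local directory, so it is not checked on disk.

```python
import shlex
from pathlib import Path


def build_demo_command(
    model_ckpt="./ckpts/M2UGen-MusicGen-medium/checkpoint.pth",
    llama_dir="./ckpts/LLaMA-2",
    music_decoder="musicgen",
    music_decoder_path="facebook/musicgen-medium",
):
    """Return the README's demo command as an argument list.

    Only the flags shown in the README are used; the defaults are the
    README's own values.
    """
    return [
        "python", "gradio_app.py",
        "--model", model_ckpt,
        "--llama_dir", llama_dir,
        "--music_decoder", music_decoder,
        "--music_decoder_path", music_decoder_path,
    ]


def missing_local_paths(model_ckpt, llama_dir):
    """Return the local paths (checkpoint, LLaMA directory) that do not exist."""
    return [p for p in (model_ckpt, llama_dir) if not Path(p).exists()]


if __name__ == "__main__":
    cmd = build_demo_command()
    # cmd[3] is the --model value and cmd[5] is the --llama_dir value.
    missing = missing_local_paths(cmd[3], cmd[5])
    if missing:
        print("Download these before launching:", ", ".join(missing))
    else:
        print("Run from the repository root:", shlex.join(cmd))
```

Returning an argument list instead of a single string means the command can be passed directly to `subprocess.run` without shell quoting concerns, while `shlex.join` gives a copy-pasteable form for display.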