M2UGen/M2UGen-MusicGen-medium

crypto-code committed on Jan 2, 2024

Commit 6b56fcd · 1 Parent(s): 9264590

Update README.md

README.md +31 -0

@@ -1,3 +1,34 @@
 ---
 license: mit
+datasets:
+- M2UGen/MUCaps
+- M2UGen/MUEdit
+- M2UGen/MUImage
+- M2UGen/MUVideo
 ---
+# M<sup>2</sup>UGen Model with MusicGen-medium
+The M<sup>2</sup>UGen model is a Music Understanding and Generation model capable of Music Question Answering, Music Generation
+from text, images, videos and audio, and Music Editing. The model uses MERT as its music encoder, ViT as its image encoder
+and ViViT as its video encoder, with MusicGen or AudioLDM2 as the music generation model (music decoder). These components are coupled with adapters
+and the LLaMA 2 model so that a single model supports all of these tasks.
+M<sup>2</sup>UGen was published in [M<sup>2</sup>UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models](https://arxiv.org/abs/2311.11255) by *Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun and Ying Shan*.
+The code for the model is available at [crypto-code/M2UGen](https://github.com/crypto-code/M2UGen). To run a demo of the model, clone the repository, download the checkpoint and run:
+```bash
+python gradio_app.py --model ./ckpts/M2UGen-MusicGen-medium/checkpoint.pth --llama_dir ./ckpts/LLaMA-2 --music_decoder musicgen --music_decoder_path facebook/musicgen-medium
+```
+## Citation
+If you find this model useful, please consider citing:
+```bibtex
+@article{hussain2023m,
+  title={{M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models}},
+  author={Hussain, Atin Sakkeer and Liu, Shansong and Sun, Chenshuo and Shan, Ying},
+  journal={arXiv preprint arXiv:2311.11255},
+  year={2023}
+}
+```
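
The demo command added in this README assumes that the checkpoint and the LLaMA 2 weights already sit under `./ckpts`. As a minimal sketch, the same invocation could be wrapped in a small script that checks those paths first and fails with a readable message. The names `CKPT_DIR`, `LLAMA_DIR`, `PYTHON`, `check_paths` and `run_demo` are illustrative assumptions and are not part of the M2UGen repository; only the `gradio_app.py` flags and default paths are taken from the README.

```bash
#!/usr/bin/env bash
# Sketch only: CKPT_DIR, LLAMA_DIR, PYTHON, check_paths and run_demo are
# illustrative names, not part of the M2UGen repository.
CKPT_DIR="${CKPT_DIR:-./ckpts/M2UGen-MusicGen-medium}"
LLAMA_DIR="${LLAMA_DIR:-./ckpts/LLaMA-2}"
PYTHON="${PYTHON:-python}"

# Return non-zero with a message if the checkpoint file or the
# LLaMA 2 directory named in the README command is missing.
check_paths() {
  if [ ! -f "$CKPT_DIR/checkpoint.pth" ]; then
    echo "missing checkpoint: $CKPT_DIR/checkpoint.pth" >&2
    return 1
  fi
  if [ ! -d "$LLAMA_DIR" ]; then
    echo "missing LLaMA 2 directory: $LLAMA_DIR" >&2
    return 1
  fi
}

# Launch the demo with the same flags as the README command.
run_demo() {
  check_paths || return 1
  "$PYTHON" gradio_app.py \
    --model "$CKPT_DIR/checkpoint.pth" \
    --llama_dir "$LLAMA_DIR" \
    --music_decoder musicgen \
    --music_decoder_path facebook/musicgen-medium
}
```

After sourcing the script from the root of the cloned repository, calling `run_demo` starts the demo. Setting `CKPT_DIR` or `LLAMA_DIR` beforehand points it at weights stored elsewhere.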