license: mit
M4-Audio-LongVA-7B-Qwen2
Enhancing Interactive Capabilities in VideoLLM
M4-Audio-7B extends LongVA-7B through further training on the M4-IT dataset, which comprises 9,963 visual-audio instruction-tuning instances. The training used the existing pipeline without any special modifications.
Usage
For more information about the interaction inference pipeline, please visit the M4 GitHub repository.