ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4 Reinforcement Learning • 8B • Updated Mar 26 • 3.24k • 217
What if we segment the audio first and then transcribe? It's some extra compute to throw in, but IMO it would give better results!
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 18 • 10