Pretrained from scratch with a 4096-token context length on 90B tokens of Malaysian text: https://huggingface.co/papers/2401.14680

Mesolitica (company)

AI & ML interests: We develop multimodal artificial intelligence for Southeast Asia.
Collections: 24

Models: 256

mesolitica/Malaysian-orpheus-3b-0.1-ft (Text Generation)

mesolitica/Malaysian-F5-TTS-v2

mesolitica/malaysian-whisper-large-v3-turbo-v3

mesolitica/Malaysian-Llama-3.1-8B-Instruct-Marlin

mesolitica/Malaysian-Llama-3.2-1B-Instruct-v2

mesolitica/Malaysian-Llama-3.2-3B-Instruct-v2

mesolitica/Malaysian-Qwen2.5-1.5B-Instruct

mesolitica/Malaysian-Llama-3.1-8B-Instruct

mesolitica/malaysian-whisper-small-v3

mesolitica/malaysian-parler-tts-mini-v1 (Text2Text Generation)

Datasets: 213
mesolitica/Malaysian-Voice-Conversion
mesolitica/Malaysian-Emilia
mesolitica/Malaysian-Emilia-annotated
mesolitica/TTS-Combined
mesolitica/TTS
mesolitica/Malaysian-STT-Whisper-Stage2
mesolitica/pseudolabel-mandarin-large-v3-timestamp
mesolitica/Malaysian-STT-Whisper
mesolitica/pseudolabel-science-large-v3-timestamp
mesolitica/pseudolabel-tamil-large-v3-timestamp