139 269

NG

SirRa1zel

AI & ML interests

Text-to-Speech, Translation, Object Detection

Recent Activity

liked a model 1 day ago

LiquidAI/LFM2-350M-ENJP-MT

liked a model 8 days ago

nasa-ibm-ai4science/Surya-1.0

liked a model 13 days ago

microsoft/VibeVoice-1.5B

Organizations

None yet

upvoted 2 collections 3 months ago

Common Pile v0.1 Filtered Data

An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 17

Stable Diffusion 3.5

6 items • Updated Jan 9 • 176

upvoted a paper 3 months ago

Efficient Part-level 3D Object Generation via Dual Volume Packing

Paper • 2506.09980 • Published Jun 11 • 8

upvoted 2 collections 4 months ago

LLaMA-Omni

13 items • Updated May 17 • 16

Voila

Voila: Voice-Language Foundation Models. https://voila.maitrix.org • 7 items • Updated May 6 • 23

upvoted 2 papers 4 months ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 86

PixelHacker: Image Inpainting with Structural and Semantic Consistency

Paper • 2504.20438 • Published Apr 29 • 44

upvoted a collection 5 months ago

Orpheus Multilingual Research Release

Beta Release of multilingual models. • 12 items • Updated Apr 10 • 100

upvoted a paper 5 months ago

TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

Paper • 2503.23461 • Published Mar 30 • 95

upvoted 4 papers 6 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13 • 9

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26 • 49

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25 • 38

MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Paper • 2502.15872 • Published Feb 21 • 5

upvoted an article 7 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.29k

upvoted a collection 7 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 535

upvoted 4 papers 8 months ago

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published Jan 22 • 72

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Paper • 2501.10045 • Published Jan 17 • 9

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Paper • 2501.09756 • Published Jan 16 • 19

upvoted a collection 8 months ago

OuteTTS 0.3

4 items • Updated Apr 7 • 17