
Literate Goggles

literate-goggles

AI & ML interests

None yet


Organizations

None yet

literate-goggles's activity

upvoted a paper about 17 hours ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published 1 day ago • 151

upvoted 2 papers 1 day ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 83

Audio-Aware Large Language Models as Judges for Speaking Styles

Paper • 2506.05984 • Published 5 days ago • 14

upvoted an article 4 days ago

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Article • 5 days ago • 34

upvoted an article 6 days ago

KV Cache from scratch in nanoVLM

Article • and 4 others • 7 days ago • 63

upvoted a paper 12 days ago

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published 12 days ago • 34

upvoted a paper 15 days ago

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

Paper • 2505.17589 • Published 19 days ago • 3

upvoted a paper 19 days ago

Scaling Diffusion Transformers Efficiently via μP

Paper • 2505.15270 • Published 21 days ago • 32

upvoted a paper 20 days ago

Continuous Speech Tokenizer in Text To Speech

Paper • 2410.17081 • Published Oct 22, 2024 • 1

upvoted a paper 21 days ago

Latent Flow Transformer

Paper • 2505.14513 • Published 21 days ago • 27

upvoted a paper 26 days ago

End-to-End Vision Tokenizer Tuning

Paper • 2505.10562 • Published 26 days ago • 21

upvoted a paper about 1 month ago

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 176

upvoted 3 papers about 2 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 128

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Paper • 2504.09454 • Published Apr 13 • 12

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published Apr 11 • 47

upvoted a paper 2 months ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 75

upvoted 2 articles 2 months ago

Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC

Article • Apr 9 • 26

The NLP Course is becoming the LLM Course!

Article • and 9 others • Apr 3 • 97

upvoted a paper 2 months ago

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Paper • 2504.00999 • Published Apr 1 • 92

upvoted a paper 3 months ago

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25