In a Training Loop 🔄

Lj V. Miranda PRO

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI

Recent Activity

liked a model about 11 hours ago

Infomaniak-AI/vllm-translategemma-27b-it

liked a model about 12 hours ago

google/translategemma-27b-it

liked a dataset about 15 hours ago

PrimeIntellect/SYNTHETIC-2-SFT-verified

upvoted a paper 5 days ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published 18 days ago • 99

upvoted 2 articles 5 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025 • 95

Article

An Analysis of Multilingual Models on Hugging Face

Sep 18, 2025 • 4

upvoted an article 6 months ago

Article

🇵🇭 FilBench - Can LLMs Understand and Generate Filipino?

Aug 12, 2025 • 23

upvoted a collection 8 months ago

Reward Bench 2

Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated Dec 23, 2025 • 16

upvoted a paper 9 months ago

R3: Robust Rubric-Agnostic Reward Models

Paper • 2505.13388 • Published May 19, 2025 • 11

upvoted 2 papers 10 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 98

The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

Paper • 2504.15521 • Published Apr 22, 2025 • 64

upvoted a collection 11 months ago

SEA-VL: Multicultural VL Dataset for Southeast Asia

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated Apr 12, 2025 • 20

upvoted a paper 11 months ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10, 2025 • 101

upvoted 3 papers about 1 year ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 10

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 377

upvoted 3 collections about 1 year ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 8 items • Updated Jul 31, 2025 • 29

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8

OLMo 2

Artifacts for the OLMo 2 release. • 35 items • Updated Dec 23, 2025 • 152

upvoted a paper about 1 year ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 67

upvoted a collection about 1 year ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated Dec 23, 2025 • 95

upvoted a paper over 1 year ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

upvoted a collection over 1 year ago

Multilingual RewardBench (M-RewardBench) [ACL 2025 Main]

Multilingual Reward Model Evaluation Dataset and Results • 3 items • Updated May 15, 2025 • 4