Victor Gallego

vicgalle

https://github.com/vicgalle

AI & ML interests

Preference fine-tuning, alignment & synthetic data. Building LLMs in general!

Recent Activity

liked a model 5 days ago

ds4sd/SmolDocling-256M-preview

liked a dataset 6 days ago

Mielikki/Erebus-87k

liked a dataset 6 days ago

nvidia/Llama-Nemotron-Post-Training-Dataset-v1

vicgalle's activity

upvoted a collection 12 days ago

DeepHermes

Preview models of hybrid reasoner Hermes series • 6 items • Updated 12 days ago • 27

upvoted a collection about 1 month ago

DPO

Various useful datasets with preference optimization • 16 items • Updated Jan 23 • 5

upvoted 2 papers about 1 month ago

MetaSC: Test-Time Safety Specification Optimization for Language Models

Paper • 2502.07985 • Published Feb 11 • 3

Agency Is Frame-Dependent

Paper • 2502.04403 • Published Feb 6 • 22

upvoted a collection about 2 months ago

Toxic Commons

Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31, 2024 • 6

upvoted a collection 2 months ago

Cosmos

The collection of Cosmos models • 31 items • Updated 7 days ago • 275

upvoted a collection 5 months ago

steiner-preview

Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20, 2024 • 32

upvoted a paper 5 months ago

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Paper • 2410.13334 • Published Oct 17, 2024 • 13

upvoted a paper 6 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 174

upvoted a collection 6 months ago

Llama 3.2 Re-upload

10 items • Updated Sep 25, 2024 • 11

upvoted 2 papers 6 months ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 139

Hermes 3 Technical Report

Paper • 2408.11857 • Published Aug 15, 2024 • 48

upvoted a collection 7 months ago

Hermes 3

The Hermes 3 Series of Models • 12 items • Updated Feb 13 • 112

upvoted a paper 8 months ago

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Paper • 2408.03837 • Published Aug 7, 2024 • 18

upvoted a collection 8 months ago

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 654

upvoted an article 8 months ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16, 2024 • 344