amang1802's Collections

PPO experiments

updated Jan 23

Using PPO with simpler reward functions

  • amang1802/summary_train

    Viewer • Updated Nov 21, 2024 • 1.28k • 26

  • amang1802/summary_train_med

    Viewer • Updated 6 days ago • 18.4k • 104

  • amang1802/Llama3.2-1B-summary-length-1024-1ep

    Text Generation • Updated Nov 21, 2024 • 43

  • amang1802/Llama3.2-1B-summary-length-exp2

    Text Generation • Updated Nov 21, 2024 • 20

  • amang1802/Llama3.2-1B-summary-length-exp3

    Text Generation • Updated Nov 21, 2024 • 17

  • amang1802/Llama3.2-1B-summary-length-exp4

    Text Generation • Updated Nov 21, 2024 • 16

  • amang1802/Llama3.2-1B-summary-length-exp6

    Text Generation • Updated Nov 25, 2024 • 19

  • amang1802/Llama3.2-1B-summary-length-exp7

    Text Generation • Updated Nov 25, 2024 • 18