Hugging Face's logo Hugging Face
  • Models
  • Datasets
  • Spaces
  • Docs
  • Enterprise
  • Pricing

  • Log In
  • Sign Up
CoffeeBliss's picture
2 2 9

CoffeeBliss

CoffeeBliss
Sunny500's profile picture Gargaz's profile picture bird-of-paradise's profile picture
·

AI & ML interests

None yet

Recent Activity

new activity 10 days ago
XiaomiMiMo/MiMo-7B-RL-0530:Please add model knowledge other than code/math
liked a model 10 days ago
XiaomiMiMo/MiMo-7B-RL-0530
replied to lewtun's post 5 months ago
This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models: * Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting. * Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases) * Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1 * Use the resulting data for SFT & RL * Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
View all activity

Organizations

None yet

models 0

None public yet

datasets 0

None public yet
Company
TOS Privacy About Jobs
Website
Models Datasets Spaces Pricing Docs