wujing

wjmcat
·
http://wj-Mcat.github.io
  • wj_Mcat
  • wj-Mcat

AI & ML interests

Student who wants to search for NLP models

Recent Activity

liked a model 1 day ago
baidu/ERNIE-4.5-21B-A3B-Thinking
new activity 16 days ago
deepseek-ai/DeepSeek-V3.1-Base: the pass@1 of deepseek-v3.1-base in lcb benchmark
reacted to yushun0410's post with 🚀 about 1 year ago
Hi Huggingfacers! Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Feel free to try it out! Try switching to Adam-mini with the same hyperparams of AdamW, it would work with only half memory. Hope Adam-mini can help save time, cost, and energy in your tasks! Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793 Code: https://github.com/zyushun/Adam-mini

Organizations

PaddleCI, PaddlePaddle

wjmcat's Spaces (1)

Runtime error

Text Similarity

🌖

Aug 23, 2022