
wujing

wjmcat
·
http://wj-Mcat.github.io
  • wj_Mcat
  • wj-Mcat

AI & ML interests

A student who wants to search for NLP models.

Recent Activity

liked a model 3 days ago
baidu/ERNIE-4.5-21B-A3B-Thinking
new activity 18 days ago
deepseek-ai/DeepSeek-V3.1-Base: the pass@1 of deepseek-v3.1-base in lcb benchmark
reacted to yushun0410's post with 🚀 about 1 year ago
Hi Huggingfacers! Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Feel free to try it out! Try switching to Adam-mini with the same hyperparams of AdamW, it would work with only half memory. Hope Adam-mini can help save time, cost, and energy in your tasks!
Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793
Code: https://github.com/zyushun/Adam-mini

Organizations

PaddleCI, PaddlePaddle

wjmcat's models (3)

wjmcat/opt-350m-paddle

Updated Sep 1, 2022

wjmcat/opt-1.3b-paddle

Updated May 15, 2022

wjmcat/opt-125m-paddle

Updated May 15, 2022