wujing

wjmcat

http://wj-Mcat.github.io

AI & ML interests

Student who wants to search for NLP models

Recent Activity

liked a model 3 days ago

baidu/ERNIE-4.5-21B-A3B-Thinking

new activity 18 days ago

deepseek-ai/DeepSeek-V3.1-Base: the pass@1 of deepseek-v3.1-base in lcb benchmark

reacted to yushun0410's post with 🚀 about 1 year ago

Hi Huggingfacers! Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed in Transformers. Feel free to try it out! Switch to Adam-mini with the same hyperparameters as AdamW, and it should work with only half the memory. Hope Adam-mini can help save time, cost, and energy in your tasks! Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793 Code: https://github.com/zyushun/Adam-mini

wjmcat's models 3

wjmcat/opt-350m-paddle

Updated Sep 1, 2022

wjmcat/opt-1.3b-paddle

Updated May 15, 2022

wjmcat/opt-125m-paddle

Updated May 15, 2022