Totototo's activity


A few days later, I'll answer my own question based on what I could see in the code. Feel free to add to this if I missed anything.

From what I see in the current code of SFTTrainer:

  • the current packing option of SFTTrainer (packing=True) does not handle attention masks;
  • nor does it handle positional encoding.

See:

  • https://github.com/huggingface/trl/blob/64aa06499b2e71537a8e701fad076873b0f3603f/trl/trainer/sft_trainer.py#L351: preparation of the dataset with the packing option
  • https://github.com/huggingface/trl/blob/64aa06499b2e71537a8e701fad076873b0f3603f/trl/trainer/sft_trainer.py#L663: the actual packing, done with the pack_dataset function on input_ids only
  • https://github.com/huggingface/trl/blob/e0dd5250217305f7f8c2f4a153a6939a2f16e2bf/trl/data_utils.py#L475: the pack_dataset function itself
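As a rough illustration (a simplified sketch, not TRL's actual implementation), packing on input_ids only amounts to concatenating the tokenized examples and cutting the stream into fixed-length blocks; the attention mask is all ones and position ids run continuously across document boundaries:

```python
from itertools import chain

eos_id = 2        # hypothetical EOS token id
block_size = 8    # hypothetical packed sequence length

# Hypothetical tokenized examples, each already ending with EOS.
examples = [[5, 6, 7, eos_id], [11, 12, eos_id], [21, 22, 23, 24, eos_id]]

# Concatenate everything, then cut the stream into fixed-length blocks.
stream = list(chain.from_iterable(examples))
blocks = [stream[i:i + block_size] for i in range(0, len(stream), block_size)]

for block in blocks:
    print({
        "input_ids": block,
        "attention_mask": [1] * len(block),  # all ones: no per-document masking
        # position ids are implicitly 0..len(block)-1 and do not restart at EOS
    })
```

A document-aware approach would instead build a block-diagonal attention mask and restart the position ids after each EOS; with plain packing, the EOS tokens are the model's only hint that the documents are unrelated.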

From what I understand, the "do not attend to tokens outside the current sentence" behaviour is inferred only from the EOS tokens separating the sentences from one another. In that respect, SFTTrainer follows the approach chosen by the authors of the GPT-3 paper ("Language Models are Few-Shot Learners"). See the following extract:

[Screenshot: extract from the GPT-3 paper ("Language Models are Few-Shot Learners") on packing multiple documents into a sequence, delimited by an end-of-text token.]


Thanks @sirluk for the great article!
One thing is still unclear to me:

  • SFTTrainer from TRL has a "packing" option. Does it handle both the masked attention and the position ids that you mention above? (See the sketch after this list for how the option is enabled.)
  • Are the concerns raised in the comment above by @shantanuagarwal still relevant if one uses SFTTrainer?
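For reference, here is a minimal sketch of how the packing option is enabled (model and dataset names are placeholders, and argument names may differ between TRL versions):

```python
# Minimal sketch: enabling packing in TRL's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

config = SFTConfig(
    output_dir="sft-packed",
    packing=True,  # concatenate examples into fixed-length blocks before training
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # any causal LM checkpoint
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```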

Great article: the subject is clearly more than hot, the overall method looks good, and the code is neat. Thanks for sharing!

But there are two things I would have improved:

  • The test set seems too small: Llama 3.1 8B drops 8% on the test set compared to the whole dataset, so there is too much variability in the test, and it therefore seems to me that the claimed 7% gain with the fine-tuned small model does not hold. (A rough order of magnitude is sketched after this list.)
  • The cost comparison lacks the inference times of the small model and the LLMs. So, unless I'm mistaken, saying directly that it is 80 times cheaper because each inference is 80 times cheaper does not hold: the inference time of a small model on a small-capacity endpoint may be longer than that of a large model on a large-capacity endpoint, and the overall gain would then be less than 80x.
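To give a rough order of magnitude for both points, here is a back-of-the-envelope sketch (all numbers are hypothetical, not taken from the article):

```python
import math

# --- Point 1: sampling error on a small test set (hypothetical numbers) ---
n_test = 100      # hypothetical test-set size
accuracy = 0.80   # hypothetical measured accuracy
stderr = math.sqrt(accuracy * (1 - accuracy) / n_test)
print(f"95% CI half-width: ±{1.96 * stderr:.1%}")  # ≈ ±7.8 points here,
# i.e. the same order as a reported 7-8% difference, so it could be noise.

# --- Point 2: effective cost per request (hypothetical numbers) ---
# "80x cheaper per hour" only becomes "80x cheaper per request"
# if both endpoints take the same time per request.
price_small, price_large = 1.0, 80.0      # $/hour, hypothetical
latency_small, latency_large = 2.0, 0.5   # seconds/request, hypothetical
cost_small = price_small * latency_small / 3600
cost_large = price_large * latency_large / 3600
print(f"effective gain: {cost_large / cost_small:.0f}x")  # 20x here, not 80x
```

With those hypothetical figures, the sampling error alone is about ±8 points, and the effective cost advantage drops from 80x to 20x once latency is included.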

With such a great job done, it's a pity that these two points slightly blur the conclusions; maybe they would be easy to address?