Sequence packing logic
#3
by orrzohar · opened
Could you give some details about the sequence packing logic? Besides packing sequences, did you do anything to prevent attention between different sequences?
Either way, did you ablate this design choice?
We ensure that packing has no effect on the computed results by using 1) a cross-sample attention mask and 2) position id resetting. So our packing strategy should not change the results compared to no packing.
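
For illustration, here is a minimal sketch (not from this repo; names and shapes are assumptions) of what those two pieces typically look like: a block-diagonal, causal attention mask so tokens can only attend within their own sample, and position ids that restart at 0 for each packed sequence.

```python
# Minimal sketch: block-diagonal cross-sample attention mask + per-sequence
# position ids for packed sequences. Function name and layout are illustrative.
import torch

def pack_mask_and_positions(seq_lengths, device="cpu"):
    total_len = sum(seq_lengths)

    # Cross-sample attention mask: allowed only where query and key belong to
    # the same original sequence (block-diagonal), combined with causality.
    sample_ids = torch.repeat_interleave(
        torch.arange(len(seq_lengths), device=device),
        torch.tensor(seq_lengths, device=device),
    )
    same_sample = sample_ids[:, None] == sample_ids[None, :]
    causal = torch.tril(
        torch.ones(total_len, total_len, dtype=torch.bool, device=device)
    )
    attention_mask = same_sample & causal  # [total_len, total_len]

    # Position id resetting: positions restart at 0 for every packed sequence.
    position_ids = torch.cat(
        [torch.arange(n, device=device) for n in seq_lengths]
    )
    return attention_mask, position_ids

# Example: two sequences of lengths 3 and 2 packed into one row of 5 tokens.
mask, pos = pack_mask_and_positions([3, 2])
print(pos)          # tensor([0, 1, 2, 0, 1])
print(mask.int())   # block-diagonal causal mask
```

With both in place, each packed sample sees exactly the same attention pattern and positional encoding it would have seen unpacked, which is why packing should be results-neutral.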