More training details?
#15 opened by elepedus
Hi,
Thanks so much for the model and the very well-written & approachable technical report. It's great to see continuing work on BitNet!
I wonder if you could share any more details about the training, especially regarding cost / resource utilisation and how it compares to un-quantised training runs? Naively, I would expect meaningful efficiency gains at training time as well, but it would be great to get some concrete numbers.
Thanks in advance,
Ed