Trim the text encoder weights
#2
by ttj - opened
The encoder config says 24 layers, but if I understand correctly you can set it to 17 and trim the later layers.
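For what it's worth, a rough (untested) sketch of what I mean, assuming the pipeline loads a standard Hugging Face `T5EncoderModel`; the checkpoint name and output path are just placeholders:

```python
from transformers import T5EncoderModel

# Load the full 24-layer encoder (placeholder checkpoint name).
encoder = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl")

# Keep only the first 17 transformer blocks and update the config
# so the saved checkpoint reports the trimmed depth.
encoder.encoder.block = encoder.encoder.block[:17]
encoder.config.num_layers = 17

# Save the smaller checkpoint (placeholder path).
encoder.save_pretrained("t5-xxl-trimmed-17")
```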
You are totally right. This could be done to save some memory and compute... but the highest cost here is the generation itself: the T5 inference runs only once, at the beginning, whereas the DiT model runs once per sampling step.