Attention is computed only within a local window, and the window is shifted between attention layers to create cross-window connections, which helps the model capture relationships beyond a single window. Because the Swin Transformer produces hierarchical feature maps, it is a good candidate for dense prediction tasks like segmentation and detection.
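
The hierarchical structure can be seen directly in the model's intermediate outputs. The sketch below loads a Swin backbone with the Transformers library and prints the shape of each stage's hidden states; the `microsoft/swin-tiny-patch4-window7-224` checkpoint and the sample image URL are assumptions for illustration, and any Swin checkpoint should behave analogously.

```python
# Minimal sketch: inspect the hierarchical feature maps of a Swin Transformer.
# The checkpoint name and image URL are assumptions for illustration.
from PIL import Image
import requests
import torch
from transformers import AutoImageProcessor, SwinModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample image
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = SwinModel.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Each successive stage reduces the spatial resolution (sequence length) while
# increasing the channel dimension, which is what makes Swin usable as a
# backbone for dense prediction tasks such as segmentation and detection.
for i, hidden_state in enumerate(outputs.hidden_states):
    print(f"stage {i}: {tuple(hidden_state.shape)}")
```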