Spaces: nanotron/ultrascale-playbook

How to understand the graph "Tensor parallelism with column linear + row Linear"

#109

by Yihel - opened Apr 26

Discussion

Yihel

Apr 26

In the “Tensor parallelism with column linear + row Linear” diagram, are Y2_0 and Y2_1 calculated incorrectly?

Shouldn't Y1_0 @ W2_0 be the following?

Y2_0 = [[ 200,  600],
        [ 800, 2400],
        [1400, 4200],
        [2000, 6000]]

Makrrr

May 8

Yeah, I think they probably just copied and pasted the values from the previous section.

I think they are trying to say that in a transformer block, the FFN will have 1 hidden layer, which means 2 matrix multiplications. So it should be X * W_1 * W_2.

X * W_1 = Y_1 --> use column parallel (but without the all-gather, so each GPU keeps its own slice of Y_1)

Y_1 * W_2 = Y_2 --> use row parallel (now the partial results are summed across GPUs, i.e. an all-reduce)
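
To make the two steps concrete, here is a minimal single-process sketch in plain Python. It is not the playbook's code and it does not use the numbers from the diagram: the matrices `X`, `W1`, `W2` and the helper names (`split_columns`, `split_rows`, `all_reduce_sum`, `tp_ffn`) are made up for illustration, the "GPUs" are just list entries, and the activation function is left out. It only checks that splitting W_1 by columns and W_2 by rows, then summing the per-shard partial results, gives the same Y_2 as the unsharded X * W_1 * W_2.

```python
def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(m, parts):
    """Split a matrix into `parts` equal column blocks (column-parallel weight)."""
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(parts)]


def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks (row-parallel weight)."""
    height = len(m) // parts
    return [m[p * height:(p + 1) * height] for p in range(parts)]


def all_reduce_sum(partials):
    """Stand-in for an all-reduce: elementwise sum of every shard's partial output."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


def tp_ffn(x, w1, w2, world_size=2):
    """Column-parallel W1 followed by row-parallel W2 (no activation, for clarity)."""
    w1_shards = split_columns(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    # Step 1: each shard computes its own column slice of Y1; nothing is exchanged.
    y1_shards = [matmul(x, w1_s) for w1_s in w1_shards]
    # Step 2: each shard multiplies its Y1 slice by its row block of W2.
    # Every partial already has the full shape of Y2 but is only part of the sum.
    y2_partials = [matmul(y1_s, w2_s) for y1_s, w2_s in zip(y1_shards, w2_shards)]
    # Step 3: one reduction at the very end yields the full Y2.
    return all_reduce_sum(y2_partials)


# Illustrative values only (not the ones in the playbook diagram).
X = [[1, 2], [3, 4]]
W1 = [[1, 0, 2, 1], [0, 1, 1, 3]]
W2 = [[1, 2], [0, 1], [3, 0], [1, 1]]

reference = matmul(matmul(X, W1), W2)
parallel = tp_ffn(X, W1, W2)
assert parallel == reference
```

The reason no communication is needed between the two matmuls is that the column slice of Y_1 held by each shard lines up exactly with the row block of W_2 on that same shard. An elementwise activation could also be applied to each Y_1 slice locally before step 2, since it does not mix columns.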
