How to understand the graph "Tensor parallelism with column linear + row Linear"

#109
Opened by Yihel

In the “Tensor parallelism with column linear + row Linear” diagram, are Y2_0 and Y2_1 calculated incorrectly?

Shouldn't Y1_0 @ W2_0 be

Y2_0 = [[ 200,  600],
        [ 800, 2400],
        [1400, 4200],
        [2000, 6000]]
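To see why each device's product is a partial sum and not a slice of the final output, here is a small NumPy sketch of the row-parallel step. The matrices are arbitrary made-up values, not the ones in the diagram. `Y1_0`/`Y1_1` are assumed to be the column halves of `Y1`, and `W2_0`/`W2_1` the matching row halves of `W2`.

```python
import numpy as np

# Hypothetical inputs (NOT the diagram's values): Y1 is 4x4, W2 is 4x2.
Y1 = np.arange(1, 17).reshape(4, 4)
W2 = np.arange(1, 9).reshape(4, 2)

# Row-parallel split: Y1 by columns, W2 by rows, so inner dims match per device.
Y1_0, Y1_1 = np.hsplit(Y1, 2)   # each 4x2
W2_0, W2_1 = np.vsplit(W2, 2)   # each 2x2

# Each device multiplies its own shards; the result has the FULL output shape.
Y2_0 = Y1_0 @ W2_0
Y2_1 = Y1_1 @ W2_1

# Neither partial equals Y1 @ W2 on its own; their elementwise sum does.
Y2 = Y2_0 + Y2_1
print(Y2_0.shape, Y2_1.shape)        # (4, 2) (4, 2)
print(np.array_equal(Y2, Y1 @ W2))   # True
```

So when checking the diagram, Y2_0 should be compared against Y1_0 @ W2_0 computed from the shards, not against the full Y2.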

Yeah, I think they probably just copied and pasted the values from the previous section.

I think they are trying to say that in a transformer block, the FFN has one hidden layer, which means two matrix multiplications, so the computation is X * W_1 * W_2.
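As a reference point, here is a minimal unsharded sketch of those two matrix multiplications. The sizes are made up, and the ReLU between the two products is an assumption standing in for whatever elementwise activation the model actually uses (GeLU is common). Because it is elementwise, it does not affect how the matmuls can be split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 tokens, model dim 2, hidden (intermediate) dim 8.
tokens, d_model, d_hidden = 4, 2, 8
X = rng.standard_normal((tokens, d_model))
W_1 = rng.standard_normal((d_model, d_hidden))  # first matmul: expand to hidden dim
W_2 = rng.standard_normal((d_hidden, d_model))  # second matmul: project back

def ffn(x, w1, w2):
    """Two matmuls with an (assumed) elementwise ReLU in between."""
    y1 = np.maximum(x @ w1, 0.0)   # Y_1: (tokens, d_hidden)
    return y1 @ w2                 # Y_2: (tokens, d_model)

Y_2 = ffn(X, W_1, W_2)
print(Y_2.shape)  # (4, 2)
```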

X * W_1 = Y_1 -> use column parallelism (but without the all-gather)

Y_1 * W_2 = Y_2 -> use row parallelism (now with an all-reduce to sum the partial results)
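The column-then-row scheme can be simulated on one machine with NumPy, treating each list entry as one device. This is a sketch under assumptions: `tp` = 2 devices, made-up shapes, ReLU as the elementwise activation, and a plain Python `sum` standing in for the all-reduce (a real setup would use a collective such as `torch.distributed.all_reduce`). There is no all-gather between the two layers: each device keeps its column slice of `Y_1` and feeds it straight into its row slice of `W_2`.

```python
import numpy as np

rng = np.random.default_rng(0)

tp = 2                              # hypothetical tensor-parallel degree
tokens, d_model, d_hidden = 4, 2, 8
X = rng.standard_normal((tokens, d_model))
W_1 = rng.standard_normal((d_model, d_hidden))
W_2 = rng.standard_normal((d_hidden, d_model))

# Column parallel: split W_1 by columns; every device sees the full X.
W_1_shards = np.split(W_1, tp, axis=1)      # each (d_model, d_hidden/tp)
# Row parallel: split W_2 by rows so each shard matches a slice of Y_1.
W_2_shards = np.split(W_2, tp, axis=0)      # each (d_hidden/tp, d_model)

partials = []
for w1, w2 in zip(W_1_shards, W_2_shards):
    y1_local = np.maximum(X @ w1, 0.0)      # local slice of Y_1, no all-gather
    partials.append(y1_local @ w2)          # full-shape partial sum of Y_2

# Stand-in for the all-reduce: summing the partials yields the full Y_2.
Y_2_tp = sum(partials)

Y_2_ref = np.maximum(X @ W_1, 0.0) @ W_2    # unsharded reference
print(np.allclose(Y_2_tp, Y_2_ref))         # True
```

Skipping the all-gather works because the activation is elementwise, so it can be applied to each column slice of Y_1 independently; the only communication left is the single sum at the end.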
