Joo

minair

Hi, I have a question about the last part of section 2.1.

The example used there is a chess game, in which two players take turns making moves, and our objective is to derive the best strategy for player 1. So, doesn't this setup assume that we don't know the policy of player 2 (the opponent)?

For example, we know the probability (policy) that player 1 takes action a_0 in state s_0, because player 1 is on our side. The state then changes to s_1 according to some predetermined transition probability function. However, we don't know the probability (policy) that player 2 takes action a_1 in s_1, because player 2 is not on our side.
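To make my understanding concrete, here is a minimal Python sketch of one round as I picture it. All states, actions, and probabilities are made-up placeholders (not from section 2.1), and the opponent's hidden probabilities exist only so the code runs; from player 1's side, `opponent_move` is meant to be a black box.

```python
import random

# Toy illustration only: every state, action, and probability here is hypothetical.

# Player 1 (our side): the policy is known as explicit probabilities.
PI_1 = {"s0": {"a0": 0.6, "a0_alt": 0.4}}

# Predetermined transition function: (state, action) -> {next_state: probability}.
TRANSITIONS = {
    ("s0", "a0"): {"s1": 1.0},
    ("s0", "a0_alt"): {"s1_alt": 1.0},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def opponent_move(state, rng):
    """Player 2 (the opponent). From our side this is a black box: we can
    observe the move it returns, but we are never given these probabilities.
    The hidden values below are made up and ignore the state for simplicity."""
    hidden_pi_2 = {"a1": 0.5, "a1_alt": 0.5}
    return sample(hidden_pi_2, rng)


def play_one_round(rng):
    s0 = "s0"
    a0 = sample(PI_1[s0], rng)               # known: drawn from our policy PI_1
    s1 = sample(TRANSITIONS[(s0, a0)], rng)  # known: predetermined dynamics
    a1 = opponent_move(s1, rng)              # unknown policy: only the move is seen
    return s0, a0, s1, a1


if __name__ == "__main__":
    rng = random.Random(0)
    print(play_one_round(rng))
```

Each call yields one (s_0, a_0, s_1, a_1) sample: we know the distributions behind a_0 and s_1, but for a_1 we only ever see the sampled move, which is exactly the part I'm unsure how to handle.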

I'm wondering how we can know the opponent's policy.

I'd be very grateful if you could explain this, or let me know whether I'm misunderstanding something.

Thank you.

image.png