MIXER as Reinforcement Learning


1. Our generative model can be viewed as an agent, which interacts with the external environment (the words and the context vector it sees as input at every time step).

2. The parameters of this agent define a policy, whose execution results in the agent picking an action.

3. In the sequence generation setting, an action refers to predicting the next word in the sequence at each time step.

4. After taking an action, the agent updates its internal state (the hidden units of the RNN).

5. Once the agent has reached the end of the sequence, it observes a reward.
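
The five points above can be sketched as a single sampling episode. This is a minimal illustration, not the MIXER implementation: the toy vocabulary, the hidden size, the weight names (`W_out`, `W_hh`, `W_xh`, `W_ch`), the helper `run_episode`, and the overlap-based `toy_reward` are all assumptions made for the example, and the context vector is assumed to have the same length as the hidden state. The log-probabilities of the sampled words are recorded only because a policy-gradient method would typically need them.

```python
import math
import random

VOCAB = ["<eos>", "a", "b", "c"]  # toy vocabulary (assumption)
HIDDEN = 4                        # toy hidden-state size (assumption)


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(rng):
    """Randomly initialise the agent's parameters, which define the policy."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_out": mat(len(VOCAB), HIDDEN),  # hidden state -> word scores
        "W_hh": mat(HIDDEN, HIDDEN),       # hidden state -> hidden state
        "W_xh": mat(HIDDEN, len(VOCAB)),   # one-hot word -> hidden state
        "W_ch": mat(HIDDEN, HIDDEN),       # context vector -> hidden state
    }


def toy_reward(generated, reference):
    """Stand-in reward: fraction of reference words that were generated."""
    if not reference:
        return 0.0
    return sum(1 for word in reference if word in generated) / len(reference)


def run_episode(params, context, reference, rng, max_len=10):
    """Sample one sequence from the policy and score it at the end."""
    hidden = [0.0] * HIDDEN  # internal state of the agent
    words, log_probs = [], []
    for _ in range(max_len):
        # Policy: a distribution over next words given the current state.
        probs = softmax(matvec(params["W_out"], hidden))
        # Action: pick the next word by sampling from the policy.
        action = rng.choices(list(range(len(VOCAB))), weights=probs)[0]
        log_probs.append(math.log(probs[action]))
        if VOCAB[action] == "<eos>":
            break
        words.append(VOCAB[action])
        # State update: the chosen word and the context vector are the
        # environment inputs seen at this time step.
        one_hot = [1.0 if i == action else 0.0 for i in range(len(VOCAB))]
        pre = [
            h + x + c
            for h, x, c in zip(
                matvec(params["W_hh"], hidden),
                matvec(params["W_xh"], one_hot),
                matvec(params["W_ch"], context),
            )
        ]
        hidden = [math.tanh(p) for p in pre]
    # Reward: observed only once the sequence is complete.
    reward = toy_reward(words, reference)
    return words, log_probs, reward


rng = random.Random(0)
params = init_params(rng)
words, log_probs, reward = run_episode(params, [0.1] * HIDDEN, ["a", "b"], rng)
print(words, reward)
```

Note that the reward arrives only after the loop ends, so no individual word choice receives feedback on its own; any learning signal has to be spread back over the recorded actions of the whole episode.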

