Reward: A reward is a scalar feedback signal it indicates how well the agent is doing at step t. The agent’s sole objective is to maximize the total reward it receives over the long run. The reward signal is the primary basis for altering the policy. R(s) indicates the reward for simply being in the state S.
R(S,a) indicates the reward for being in a state S and taking an action a. R(S, a, S’) indicates the reward for being in a state S, taking an action a and ending up in a state S’.
Glasp is a social web highlighter that people can highlight and organize quotes and thoughts from the web, and access other like-minded people’s learning.