Fortunately, in reinforcement learning, a model has a very specific meaning: it describes the states an environment can be in, how the agent's actions move the environment from one state to another, and how those transitions lead to rewards.
Model-based RL entails constructing such a model. Model-free RL, conversely, forgoes this environmental information and concerns itself only with determining which action to take in a given state.
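To make the distinction concrete, here is a minimal sketch on a hypothetical three-state chain (states 0, 1, 2, with 2 terminal and a reward of 1 for reaching it). All names (`model`, `value_iteration`, `q_learning`) and parameter values are illustrative assumptions, not taken from the original text. Value iteration stands in for model-based RL because it plans by querying the transition and reward function directly. Tabular Q-learning stands in for model-free RL because it learns action values only from sampled experience. Here `model` also doubles as the "real" environment that the Q-learner interacts with, but the learner never inspects it beyond observing outcomes.

```python
import random

# Hypothetical 3-state chain MDP used only for illustration; state 2 is terminal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # discount factor (assumed value)


def model(state, action):
    """Deterministic transition and reward function: the environment 'model'."""
    if state == 2:
        return 2, 0.0
    next_state = min(state + 1, 2) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def value_iteration(n_iters=50):
    """Model-based: compute state values by querying the model, with no interaction."""
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        for s in STATES:
            if s == 2:
                continue  # terminal state keeps value 0
            V[s] = max(r + GAMMA * V[s2] for s2, r in (model(s, a) for a in ACTIONS))
    return V


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: learn action values purely from sampled transitions."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != 2:
            # Epsilon-greedy choice over the current action-value estimates.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            # The agent only observes the outcome of acting (treated as the real environment).
            s2, r = model(s, a)
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


V = value_iteration()
Q = q_learning()
print(V)
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (0, 1)})
```

Both approaches end up preferring "right" in states 0 and 1. The difference lies in how they get there: value iteration needs the model up front, while Q-learning never looks at it and instead relies on trial and error.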
Suppose that you, the urban navigator, are the agent; that the different locations at which you need to make a directional decision are the states; and that the directions you choose to take from these states are the actions. The rewards (the feedback based on the agent's actions) would most likely be positive whenever an action both got you closer to your destination and avoided the dangerous neighborhood, zero if you avoided the neighborhood but failed to get closer to your destination, and negative whenever you failed to avoid the neighborhood. The policy is whatever strategy you use to determine which action (direction) to take based on your current state (location).
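The navigator analogy maps directly onto code. The sketch below is a minimal illustration under several assumptions not found in the original: the city is a grid of intersections `(x, y)`, the actions are the four compass directions, "closer" means a smaller Manhattan distance to the destination, and the reward magnitudes (+1, 0, -1) are arbitrary choices that respect the signs described above. The `greedy_policy` is a hypothetical hand-written strategy (pick the direction with the best immediate reward). It shows what a policy *is*, not how an RL agent would learn one.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, int]  # an intersection where a directional decision is made

# Hypothetical action set: the directions available at every intersection.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "north": (0, 1),
    "south": (0, -1),
    "east": (1, 0),
    "west": (-1, 0),
}


def manhattan(a: State, b: State) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(state: State, action: str) -> State:
    """Move one block in the chosen direction."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def reward(state: State, action: str, destination: State, danger: Set[State]) -> float:
    next_state = step(state, action)
    if next_state in danger:
        return -1.0  # failed to avoid the dangerous neighborhood
    if manhattan(next_state, destination) < manhattan(state, destination):
        return 1.0   # avoided the neighborhood and got closer
    return 0.0       # avoided the neighborhood but made no progress


def greedy_policy(state: State, destination: State, danger: Set[State]) -> str:
    """One possible policy: choose the direction with the highest immediate reward."""
    return max(ACTIONS, key=lambda a: reward(state, a, destination, danger))


def rollout(start: State, destination: State, danger: Set[State], max_steps: int = 20) -> List[State]:
    """Follow the policy from start until reaching the destination (or giving up)."""
    path = [start]
    state = start
    while state != destination and len(path) <= max_steps:
        state = step(state, greedy_policy(state, destination, danger))
        path.append(state)
    return path


print(rollout((0, 0), (2, 2), {(1, 0)}))
```

With the destination at `(2, 2)` and a dangerous block at `(1, 0)`, the policy heads north first instead of stepping east into danger, and it prints `[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]`. A greedy one-step policy like this can get stuck behind larger obstacles, which is exactly the kind of weakness a learned policy is meant to overcome.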