Experience replay is a learn-from-experience technique in reinforcement learning used to train artificially intelligent agents at minimal cost.
In artificial intelligence, we want agents that interact with the world to be robust and highly performant. To accomplish this, we walk agents through different experiences and tell them how they performed, i.e., whether their behavior and actions for a given task were right or wrong. One way to train agents is to artificially generate experiences for them, which comes at a cost in time, effort, and money. The other way is experience replay, where we don't generate experiences artificially. With the experience replay technique, we let the agent go through authentic experiences, save those experiences in its memory, and learn from them. In short, experience replay comprises these two components:
Experience
Learning
Let's go through each of them below.
An experience is typically defined as a set of four parameters:
The current state
The action taken
The reward received
The next state reached
experience = (current_state, action, reward, next_state)
The agent interacts with the environment by observing its current state, taking an action, receiving a reward, and transitioning to a new state. The agent's goal is to maximize the total reward.
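As a rough sketch, this interaction loop might look like the following Python, where `env` (with its `reset` and `step` methods) and `choose_action` are hypothetical placeholders for an actual environment and policy, not any specific library's API:

```python
def run_episode(env, choose_action, max_steps=1000):
    """Interact with the environment for one episode, collecting experiences."""
    experiences = []
    current_state = env.reset()                      # observe the initial state
    total_reward = 0
    for _ in range(max_steps):
        action = choose_action(current_state)        # agent picks an action
        next_state, reward, done = env.step(action)  # environment responds
        experiences.append((current_state, action, reward, next_state))
        total_reward += reward     # the quantity the agent tries to maximize
        current_state = next_state                   # transition to the new state
        if done:
            break
    return experiences, total_reward
```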
The experiences gathered by the agent are stored in a fixed-capacity memory called the experience replay buffer. This buffer is essentially a data structure that keeps track of the agent's past experiences.
Note: An experience is also called a transition.
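One simple way to implement such a buffer is a fixed-length deque. The sketch below is illustrative; the class and method names are our own, not from any particular framework:

```python
import random
from collections import deque

class ReplayBuffer:
    """A fixed-capacity store of the agent's past experiences (transitions)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest experience once full
        self.buffer = deque(maxlen=capacity)

    def add(self, experience):
        # experience = (current_state, action, reward, next_state)
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniformly sample a random batch of stored experiences
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because `maxlen` evicts the oldest entries automatically, the agent always learns from a sliding window of its most recent history.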
In the learning process, instead of updating the agent's knowledge with only its most recent experience, the agent randomly samples a batch of experiences from the experience replay buffer. This random sampling breaks the correlation between consecutive experiences and helps decorrelate the data, which can stabilize and improve the learning process.
The randomly sampled batch of experiences is then used to update the agent's learning model. This process helps the agent learn from diverse experiences, including rare or infrequent events that might be important for optimal decision-making.
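Putting the pieces together, a single learning step might look like the sketch below; `agent.update` is a hypothetical stand-in for whatever model update the agent actually uses (e.g., a temporal-difference or gradient step):

```python
BATCH_SIZE = 32

def learning_step(buffer, agent):
    # Wait until the buffer holds enough experiences to form a batch
    if len(buffer) < BATCH_SIZE:
        return
    # A random batch breaks the correlation between consecutive experiences
    batch = buffer.sample(BATCH_SIZE)
    for current_state, action, reward, next_state in batch:
        # Hypothetical stand-in for the agent's actual model update
        agent.update(current_state, action, reward, next_state)
```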
Let's assume we have an artificially intelligent cook in a cafe. Customers come in and order something, and the AI cook prepares the ordered food based on its prior experience. After eating, the customers rate the food on a feedback form. The AI cook saves that experience in its memory and, if the feedback is negative, learns from it to cook better next time. Positive feedback is also an experience: it makes the AI cook more confident in its cooking.
Note: The AI cook aims to maximize positive feedback.