{"id":529172,"date":"2023-07-14T12:28:05","date_gmt":"2023-07-14T10:28:05","guid":{"rendered":"https:\/\/www.scribbr.nl\/?p=529172"},"modified":"2023-08-15T17:06:38","modified_gmt":"2023-08-15T15:06:38","slug":"reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/","title":{"rendered":"Easy Introduction to Reinforcement Learning"},"content":{"rendered":"<p><strong>Reinforcement learning (RL) <\/strong>is a branch of <a href=index-1587.html><strong>machine learning<\/strong><\/a> that focuses on training computers to make optimal decisions by interacting with their environment. Instead of being given explicit instructions, the computer learns through trial and error: by exploring the environment and receiving rewards or punishments for its actions.<\/p>\n<p>Together with <a href=index-1589.html><strong>supervised<\/strong> and <strong>unsupervised learning<\/strong><\/a>, reinforcement learning is one of three basic machine learning approaches. Reinforcement learning has a wide range of real-world applications, including robotics, game playing, and diagnosing rare diseases.<\/p>\n<p><!--more--><\/p>\n<h2 id=\"definition\">What is reinforcement learning?<\/h2>\n<p>Reinforcement learning (RL) is a way for computers to learn independently by making a series of decisions and learning from the outcomes. Through trial and error, computer programs determine the best actions within a certain context and optimize their performance.<\/p>\n<p>The computer receives positive or negative feedback based on its actions and gradually learns how to complete a task. 
In other words, RL is about learning the optimal behavior in an environment to obtain the maximum reward.<\/p>\n<p><img decoding=\"async\" src=index-4707.html alt=\"What is reinforcement learning?\" width=\"660\" class=\"mb-3 mt-3 aligncenter\"\/><\/p>\n<p>RL is an approach suitable for addressing problems involving a series of decisions that all affect one another.<\/p>\n<p>Training a computer to win at backgammon, for example, involves a whole sequence of good decisions, not just one. In games like this, there are several possible actions and scenarios, and a lot of uncertainty regarding how short-term actions pay off in the long run. RL can also help solve complex problems of control, such as walking robots or self-driving cars.<\/p>\n<p>Unlike the other two learning frameworks, which operate on the basis of an existing dataset, RL gathers data as it interacts with its environment. It allows a piece of software to find the optimal solution by exploring, interacting with, and ultimately learning from the environment.<\/p>\n<figure class=\"kb-textbox border-left yellow\"><figcaption>Note<\/figcaption><strong>Reinforcement learning from human feedback (RLHF)<\/strong> is a <a href=index-1587.html>machine learning<\/a> approach that combines reinforcement learning techniques (i.e., positive and negative feedback) with human guidance to improve the learning process. It is applied in natural language processing, for example to train chatbots such as ChatGPT.<\/p>\n<p>In RLHF, a pre-trained language model (e.g., a chatbot) is assessed by humans, who score the responses it generates. 
By incorporating human feedback, experts can direct the model to favor certain outputs over others\u2014for example, those that read more naturally or are more helpful.<\/figure>\n<h3 id=\"elements\">The elements of reinforcement learning<\/h3>\n<p>Reinforcement learning involves the following key elements:<\/p>\n<ul>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Environment <\/strong>is the context in which a computer program operates. This can be virtual, like a video game, or physical, like a house.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Agent <\/strong>refers to the learner or decision-maker (i.e., the computer program) within the environment. The agent explores the environment and interacts with it.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Action<\/strong> refers to moves taken by an agent within the environment.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>State<\/strong> is the current situation of the agent at a specific time. Each action leads to a change of state.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Reward <\/strong>can be <strong>positive<\/strong>, to reinforce good behavior, or <strong>negative<\/strong>, to deter undesirable behavior. It is feedback that the agent receives from the environment for a certain action.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Policy<\/strong> is how an agent behaves at a given time. It defines the mapping or path between different states or situations and actions, guiding the agent on what action to take next based on the current state.<\/li>\n<li aria-level=\"1\"><strong>Value<\/strong> represents how beneficial a particular state is in the long run and helps the agent assess the desirability of different states or actions. The value is determined based on the potential rewards or penalties associated with a state or action. 
By estimating the value of different states or actions, the agent can define a policy that prioritizes actions and states with higher expected long-term benefits.<\/li>\n<\/ul>\n<p>Additionally, <a href=index-4674.html#algorithm\"><strong>algorithms<\/strong><\/a> are an integral part of the RL process and come into play at various steps. They are used to design the learning agent\u2014i.e., its decision-making process, how it updates its policy, and how it learns from the feedback received.<\/p>\n<figure class=\"kb-textbox border-left blue\"><figcaption>Example of reinforcement learning<\/figcaption>Suppose we want to train a computer to play <em>Pac-Man<\/em>. Here are the key elements of reinforcement learning within this specific context:<\/p>\n<ul>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Environment <\/strong>is everything that constitutes the game\u2019s world: the maze, the enemies (ghosts), the dots, the bonus items (fruits), the power pellets, etc.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Agent <\/strong>refers to the <a href=index-4242.html>eponymous<\/a> character. The agent\u2019s goal is to \u201ceat\u201d all the dots in the maze, avoid the ghosts, and maximize its score.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Action<\/strong> refers to moves Pac-Man can make: moving up, down, right, or left.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>States<\/strong> can be different situations or scenarios, such as being in a corner of the maze with no dots nearby and several ghosts approaching. 
The positions of Pac-Man and the ghosts, the remaining dots, and whether ghosts are in blue mode (so that Pac-Man can chase them) all constitute states.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Reward <\/strong>can be positive\u2014e.g., receiving points for eating dots\u2014or negative\u2014e.g., losing a \u201clife\u201d if caught by a ghost.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Policy <\/strong>can be a rule like &#8220;if a ghost is nearby, move in the opposite direction&#8221; or &#8220;if a dot is nearby, move towards it.&#8221;<\/li>\n<li aria-level=\"1\"><strong>Value<\/strong> can be understood as the desirability of being in a particular state. The value of the state described above, where Pac-Man is cornered by the ghosts, would be low because Pac-Man is at risk of being caught and has little opportunity to increase the score by eating dots. A state where Pac-Man eats a power pellet and can chase the ghosts would have high value because it allows Pac-Man to collect many dots, eliminate the ghosts, and increase the score.<\/li>\n<\/ul>\n<\/figure>\n<h2 id=\"work\">How does reinforcement learning work?<\/h2>\n<p>At the heart of reinforcement learning lies the concept of reinforcing optimal behavior or action through a reward system. Engineers come up with a method of rewarding desired behaviors and punishing unwanted behaviors.<\/p>\n<p>They also employ various techniques to prevent short-term rewards from stalling the agent, delaying the achievement of the overall objective. This means defining rewards that align with the long-term objective so that the agent learns to prioritize actions that lead to the desired outcome.<\/p>\n<p>Reinforcement learning is an iterative cycle of exploration, feedback, and improvement. 
The process can be better understood through this workflow:<\/p>\n<ol>\n<li aria-level=\"1\"><strong>Define the problem<\/strong><\/li>\n<li aria-level=\"1\"><strong>Set up the environment<\/strong><\/li>\n<li aria-level=\"1\"><strong>Create an agent\u00a0<\/strong><\/li>\n<li aria-level=\"1\"><strong>Start learning<\/strong><\/li>\n<li aria-level=\"1\"><strong>Receive feedback<\/strong><\/li>\n<li aria-level=\"1\"><strong>Update the policy<\/strong><\/li>\n<li aria-level=\"1\"><strong>Refine<\/strong><\/li>\n<li aria-level=\"1\"><strong>Deploy\u00a0<\/strong><\/li>\n<\/ol>\n<p>As an example, let\u2019s apply the RL workflow to a robotic vacuum cleaner:<\/p>\n<ol>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Define the problem:<\/strong> We want to train a robotic vacuum cleaner to effectively clean a room by itself.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Set up the environment: <\/strong>We create a simulated environment that represents the room in which the robotic vacuum cleaner will operate, including the layout, furniture, and anything else in the room.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Create an agent: <\/strong>The robotic vacuum cleaner is the learner or agent. We equip it with the right technology to sense the environment and move around.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Start learning: <\/strong>At first, the robotic vacuum cleaner explores the room randomly, bumping into furniture or obstacles and cleaning parts of the room without a specific strategy. It gathers information about the room and how its actions affect cleanliness.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Receive feedback: <\/strong>After each action, the vacuum cleaner receives either a positive or a negative reward, shaping its decision-making process over time. 
For example, if it successfully avoids colliding with furniture, there is a positive reward, while a negative reward is given when the cleaner moves aimlessly or covers the same spot repeatedly.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Update the policy: <\/strong>Based on the received rewards, the robotic vacuum cleaner updates its decision-making strategy or policy to focus more on actions that lead to positive rather than negative rewards.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Refine: <\/strong>The robotic vacuum cleaner keeps exploring the room, taking actions, receiving feedback, and updating the policy. With each iteration, it improves its knowledge of which actions maximize cleaning efficiency and avoid obstacles. Gradually, it also adapts to different room layouts.<\/li>\n<li aria-level=\"1\"><strong>Deploy<\/strong>: Once the robotic vacuum has learned an effective policy, it applies it to clean the room autonomously.<\/li>\n<\/ol>\n<h2 id=\"other-methods\">Reinforcement learning compared to other methods<\/h2>\n<p>Reinforcement learning is a distinct approach to machine learning that significantly differs from the other two main approaches.<\/p>\n<h3 id=\"supervised-vs-reinforcement\">Supervised learning vs. reinforcement learning<\/h3>\n<p>In <a href=index-4708.html#what-is-supervised-learning\">supervised learning<\/a>, a human expert has labeled the dataset, which means that the correct answer is given. For example, the dataset could consist of images of different cars that an expert has labeled with the manufacturer of each car.<\/p>\n<p>The learning agent has a supervisor, who, like a teacher, provides the right answers. Through training with this labeled dataset, the agent receives feedback and learns how to classify new, unseen data (e.g., car photos) in the future.<\/p>\n<p>In reinforcement learning, data is not part of the input but is accumulated by interacting with the environment. 
Instead of telling the system in advance which actions are optimal to perform a task, reinforcement learning uses rewards and penalties. So the agent gets feedback once it takes an action.<\/p>\n<h3 id=\"unsupervised-vs-reinforcement\">Unsupervised learning vs. reinforcement learning<\/h3>\n<p><a href=index-4708.html#what-is-unsupervised-learning\">Unsupervised learning<\/a> deals with unlabeled data, and there is no feedback involved. The goal is to explore the dataset and find similarities, differences, or clusters in the input data without prior knowledge of the expected output.<\/p>\n<p>Reinforcement learning, on the other hand, involves exploration of the environment, not of a dataset, and the end goal is different: the agent tries to take the best possible action in a given situation to maximize the total reward. With no training dataset, the RL problem is solved by the agent\u2019s own actions with input from the environment.<\/p>\n<p>The following table shows the difference between supervised learning, unsupervised learning and reinforcement learning.<\/p>\n<table class=\"kb-table\">\n<thead>\n<tr>\n<th><\/th>\n<th>Supervised learning<\/th>\n<th>Unsupervised learning<\/th>\n<th>Reinforcement learning<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Input data<\/strong><\/td>\n<td><span style=\"font-weight: 400;\">Labeled: the \u201cright answer\u201d is included<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Unlabeled: no \u201cright answer\u201d specified<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Data are not part of the input, they are collected through trial and error<\/span><\/td>\n<\/tr>\n<tr>\n<td><strong>Problem to be solved<\/strong><\/td>\n<td><span style=\"font-weight: 400;\">Used to make a prediction (e.g., the future value of a stock) or a classification (e.g., correctly identifying spam emails)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Used to explore and discover patterns, structures, or relationships in large 
datasets (e.g., people who order product A also order product B)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Used to solve reward-based problems<\/span><\/p>\n<p><span style=\"font-weight: 400;\">(e.g., a video game)<\/span><\/td>\n<\/tr>\n<tr>\n<td><strong>Solution <\/strong><\/td>\n<td><span style=\"font-weight: 400;\">Maps input to output<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Finds similarities and differences in input data to group it into clusters<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Finds which states and actions would<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0maximize the total cumulative reward of the agent<\/span><\/td>\n<\/tr>\n<tr>\n<td><strong>General tasks <\/strong><\/td>\n<td><span style=\"font-weight: 400;\">Classification, regression<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Clustering, dimensionality reduction, association learning<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Exploration and exploitation<\/span><\/td>\n<\/tr>\n<tr>\n<td><strong>Examples <\/strong><\/td>\n<td><span style=\"font-weight: 400;\">Image classification, stock market prediction<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Customer segmentation, product recommendation<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Game playing, robotic vacuum cleaners\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td><strong>Supervision<\/strong><\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td><strong>Feedback<\/strong><\/td>\n<td>Y<span style=\"font-weight: 400;\">es. 
The correct answers (labels) are provided.<\/span><\/td>\n<td>No<\/td>\n<td><span style=\"font-weight: 400;\">Yes, through rewards and punishments (positive and negative rewards)<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2 id=\"benefits-challenges\">Reinforcement learning benefits and challenges<\/h2>\n<p>Reinforcement learning has several benefits as a training method:<\/p>\n<h3 id=\"benefits\">Benefits<\/h3>\n<ul class=\"kb-checklist\">\n<li class=\"pb-2\" aria-level=\"1\"><strong>Complexity.<\/strong> RL can be used to solve very complex problems involving high uncertainty, in some cases surpassing human performance. For example, AlphaGo, an AI program trained in part with reinforcement learning, was the first computer program to defeat a human world champion in the ancient Chinese game of Go and has been described as one of the strongest Go players in history.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Adaptability. <\/strong>RL can handle environments in which the outcomes of actions are not always predictable. This is handy for real-world applications where the environment may change over time or is uncertain.<\/li>\n<li aria-level=\"1\"><strong>Independent decision-making.<\/strong> With reinforcement learning, intelligent systems can make decisions on their own without human intervention. They can learn from their experiences and adapt their behavior to achieve specific goals.<\/li>\n<\/ul>\n<h3 id=\"challenges\">Challenges and limitations<\/h3>\n<p>However, reinforcement learning also comes with certain challenges and limitations:<\/p>\n<ul class=\"kb-errorlist\">\n<li class=\"pb-2\" aria-level=\"1\"><strong>Large data requirements.<\/strong> Reinforcement learning is \u201cdata-hungry\u201d: it typically requires even more data than supervised learning, as well as many interactions, to learn effectively. Getting enough training data is time- and resource-consuming. 
In some cases, testing RL systems like autonomous vehicles solely in a real-world environment can be dangerous.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Complexity of the real world.<\/strong> In real life, feedback might be delayed: for example, it may take months or years to know whether an investment decision paid off or not. Also, in an environment like a game world, the conditions under which the agent repeats its decision process don\u2019t change, which is far from the realities of life.<\/li>\n<li aria-level=\"1\"><strong>Difficulty of designing effective rewards.<\/strong> Data scientists may struggle to mathematically express a reward so that it mirrors how a certain action will help the agent get closer to a final goal. For example, if we want to teach a car to make a turn without hitting the curb, the reward function should take into account factors like the distance between the car and the curb and the start of the steering action. In other words, the closer the car gets to the curb, the lower the reward should be, to minimize the chance of collisions.<\/li>\n<\/ul>\n<h2 id=\"other\">Other interesting articles<\/h2>\n<p>If you want more tips on <a href=index-3668.html>using AI tools<\/a>, <a href=index-1154.html>understanding plagiarism<\/a>, and <a href=index-1775.html>citing sources<\/a>, make sure to check out some of our other articles with explanations, examples, and formats.<\/p>\n<div class=\"container\">\n<div class=\"row\">\n<div class=\"col-md\" style=\"background-color: #e8f2fc; padding: 1.27rem 1.56rem; line-height: 1.8; border-radius: .5rem; margin-right: 0.5rem; margin-top: 0.5rem;\">\n<p><strong><em class=\"fa fa-laptop\" aria-hidden=\"true\"><\/em>Using AI tools<\/strong><\/p>\n<ul style=\"padding: 5px;\">\n<li><a href=index-3670.html>Citing ChatGPT<\/a><\/li>\n<li><a href=index-3748.html>Best grammar checker<\/a><\/li>\n<li><a href=index-3749.html>Best paraphrasing tool<\/a><\/li>\n<li><a href=index-3672.html>ChatGPT in your 
studies<\/a><\/li>\n<li><a href=index-1585.html>Deep learning<\/a><\/li>\n<\/ul>\n<\/div>\n<div class=\"col-md\" style=\"background-color: #e8f2fc; padding: 1.27rem 1.56rem; line-height: 1.8; border-radius: .5rem; margin-right: 0.5rem; margin-top: 0.5rem;\">\n<p><strong><em class=\"fa fa-university\" aria-hidden=\"true\"><\/em> Plagiarism<\/strong><\/p>\n<ul style=\"padding: 5px;\">\n<li><a href=index-1837.html>Types of plagiarism<\/a><\/li>\n<li><a href=index-1838.html>Self-plagiarism<\/a><\/li>\n<li><a href=index-1792.html>Avoiding plagiarism<\/a><\/li>\n<li><a href=index-1840.html>Academic integrity<\/a><\/li>\n<li><a href=index-1835.html>Best plagiarism checker<\/a><\/li>\n<\/ul>\n<\/div>\n<div class=\"col-md\" style=\"background-color: #e8f2fc; padding: 1.27rem 1.56rem; line-height: 1.8; border-radius: .5rem; margin-right: 0.5rem; margin-top: 0.5rem;\">\n<p><strong><em class=\"fa fa-book\" aria-hidden=\"true\"><\/em> Citing sources<\/strong><\/p>\n<ul style=\"padding: 5px;\">\n<li><a href=index-1052.html>Citation styles<\/a><\/li>\n<li><a href=index-3579.html>In-text citation<\/a><\/li>\n<li><a href=index-3348.html>Footnotes<\/a><\/li>\n<li><a href=index-3707.html>Citation examples<\/a><\/li>\n<li><a href=index-1755.html>Annotated bibliography<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n    <h2>Frequently asked questions about reinforcement learning<\/h2>\n<dl class=\"faq-list faq-list--arrows w-100 my-5\" itemscope itemtype=\"https:\/\/schema.org\/FAQPage\">\n            <div itemscope itemprop=\"mainEntity\" itemtype=\"https:\/\/schema.org\/Question\">\n            <dt id=\"what-are-some-real-life-applications-of-reinforcement-learning\" class=\"faq-list__title\">\n                <a href=index-4709.html class=\"faq-list__link qa-faq-anchor\">\n                    <span itemprop=\"name\">\n                        What are some real-life applications of reinforcement learning?\n                    <\/span>\n                <\/a>\n            <\/dt>\n       
     <dd id=\"what-are-some-real-life-applications-of-reinforcement-learning-answer\" class=\"faq-list__answer qa-faq-answer kb-article-styles\" itemscope itemprop=\"acceptedAnswer\" itemtype=\"https:\/\/schema.org\/Answer\">\n                <div itemprop=\"text\">\n                                        <p>Some real-life applications of <a href=index-1588.html>reinforcement learning<\/a> include:<\/p>\n<ul>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Healthcare. <\/strong>Reinforcement learning can be used to create personalized treatment strategies, known as dynamic treatment regimes (DTRs), for patients with long-term illnesses. The input is a set of clinical observations and assessments of a patient. The outputs are the treatment options or drug dosages for every stage of the patient\u2019s journey.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Education. <\/strong>Reinforcement learning can be used to create personalized learning experiences for students. This includes tutoring systems that adapt to student needs, identify knowledge gaps, and suggest customized learning trajectories to enhance educational outcomes.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Natural language processing (NLP)<\/strong>. Text summarization, question answering, machine translation, and predictive text are all NLP applications that can use reinforcement learning.<\/li>\n<li aria-level=\"1\"><strong style=\"font-size: 1rem;\">Robotics. <\/strong><span style=\"font-size: 1rem;\">Deep learning and reinforcement learning can be used to train robots that have the ability to grasp various objects, even objects they have never encountered before. 
This can, for example, be used in the context of an assembly line.<\/span><\/li>\n<\/ul>\n\n                <\/div>\n            <\/dd>\n        <\/div>\n            <div itemscope itemprop=\"mainEntity\" itemtype=\"https:\/\/schema.org\/Question\">\n            <dt id=\"what-is-deep-reinforcement-learning\" class=\"faq-list__title\">\n                <a href=index-4710.html class=\"faq-list__link qa-faq-anchor\">\n                    <span itemprop=\"name\">\n                        What is deep reinforcement learning?\n                    <\/span>\n                <\/a>\n            <\/dt>\n            <dd id=\"what-is-deep-reinforcement-learning-answer\" class=\"faq-list__answer qa-faq-answer kb-article-styles\" itemscope itemprop=\"acceptedAnswer\" itemtype=\"https:\/\/schema.org\/Answer\">\n                <div itemprop=\"text\">\n                                        <p><strong>Deep reinforcement learning<\/strong> is the combination of <a href=index-1585.html><strong>deep learning<\/strong><\/a> and <a href=index-1588.html><strong>reinforcement learning<\/strong><\/a>.<\/p>\n<ul>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Deep learning<\/strong> is a collection of techniques using artificial neural networks that mimic the structure of the human brain. With deep learning, computers can recognize complex patterns in large amounts of data, extract insights, or make predictions, without being explicitly programmed to do so. 
The training can consist of <a href=index-4708.html#what-is-supervised-learning\">supervised learning<\/a>, <a href=index-4708.html#what-is-unsupervised-learning\">unsupervised learning<\/a>, or reinforcement learning.<\/li>\n<li class=\"pb-2\" aria-level=\"1\"><strong>Reinforcement learning (RL)<\/strong> is a learning mode in which a computer interacts with an environment, receives feedback and, based on that, adjusts its decision-making strategy.<\/li>\n<li aria-level=\"1\"><strong>Deep reinforcement learning<\/strong> is a specialized form of RL that utilizes deep neural networks to solve more complex problems. In deep reinforcement learning, we combine the pattern recognition strengths of deep learning and neural networks with the feedback-based learning of RL.<\/li>\n<\/ul>\n\n                <\/div>\n            <\/dd>\n        <\/div>\n            <div itemscope itemprop=\"mainEntity\" itemtype=\"https:\/\/schema.org\/Question\">\n            <dt id=\"what-is-the-exploration-vs-exploitation-trade-off-in-reinforcement-learning\" class=\"faq-list__title\">\n                <a href=index-4711.html class=\"faq-list__link qa-faq-anchor\">\n                    <span itemprop=\"name\">\n                        What is the exploration vs exploitation trade off in reinforcement learning?\n                    <\/span>\n                <\/a>\n            <\/dt>\n            <dd id=\"what-is-the-exploration-vs-exploitation-trade-off-in-reinforcement-learning-answer\" class=\"faq-list__answer qa-faq-answer kb-article-styles\" itemscope itemprop=\"acceptedAnswer\" itemtype=\"https:\/\/schema.org\/Answer\">\n                <div itemprop=\"text\">\n                                        <p>A key challenge that arises in <strong><a href=index-1588.html>reinforcement learning<\/a> (RL)<\/strong> is the trade-off between <strong>exploration and exploitation<\/strong>. 
This challenge is unique to RL and doesn\u2019t arise in <a href=index-1589.html>supervised or unsupervised learning<\/a>.<\/p>\n<p><strong>Exploration<\/strong> is any action that lets the agent discover new features about the environment, while <strong>exploitation<\/strong> is capitalizing on knowledge already gained. If the agent continues to exploit only past experiences, it is likely to get stuck in a suboptimal policy. On the other hand, if it continues to explore without exploiting, it might never find a good policy.<\/p>\n<p>An agent must find the right balance between the two so that it can discover the optimal policy that yields the maximum rewards.<\/p>\n\n                <\/div>\n            <\/dd>\n        <\/div>\n    <\/dl>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reinforcement learning (RL) is a branch of machine learning that focuses on training computers to make optimal decisions by interacting with their environment. Instead of being given explicit instructions, the computer learns through trial and error: by exploring the environment and receiving rewards or punishments for its actions. Together with supervised and unsupervised learning, reinforcement [&hellip;]<\/p>\n","protected":false},"author":157,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":""},"categories":[50399],"tags":[],"acf":[],"yoast_head":"<title>Easy Introduction to Reinforcement Learning<\/title>\n
<meta name=\"description\" content=\"Reinforcement learning is a machine learning method that trains computers to make independent decisions by interacting with the environment.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=index-1588.html \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Easy Introduction to Reinforcement Learning\" \/>\n<meta property=\"og:description\" content=\"Reinforcement learning (RL) is a branch of machine learning that focuses on training computers to make optimal decisions by interacting with their\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Scribbr\" \/>\n<meta property=\"article:published_time\" content=\"2023-07-14T10:28:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-15T15:06:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.scribbr.com\/wp-content\/uploads\/2023\/08\/the-general-framework-of-reinforcement-learning.webp\" \/>\n<meta name=\"author\" content=\"Kassiani Nikolopoulou\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kassiani Nikolopoulou\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\"},\"author\":{\"name\":\"Kassiani Nikolopoulou\",\"@id\":\"https:\/\/www.scribbr.com\/#\/schema\/person\/d6d5d7294039e1bd4527716983a6a4c5\"},\"headline\":\"Easy Introduction to Reinforcement Learning\",\"datePublished\":\"2023-07-14T10:28:05+00:00\",\"dateModified\":\"2023-08-15T15:06:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\"},\"wordCount\":2163,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.scribbr.com\/#organization\"},\"articleSection\":[\"Using AI tools\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\",\"url\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\",\"name\":\"Easy Introduction to Reinforcement Learning\",\"isPartOf\":{\"@id\":\"https:\/\/www.scribbr.com\/#website\"},\"datePublished\":\"2023-07-14T10:28:05+00:00\",\"dateModified\":\"2023-08-15T15:06:38+00:00\",\"description\":\"Reinforcement learning is a machine learning method that trains computers to make independent decisions by interacting with the 
environment.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.scribbr.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Easy Introduction to Reinforcement Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.scribbr.com\/#website\",\"url\":\"https:\/\/www.scribbr.com\/\",\"name\":\"Scribbr\",\"description\":\"The checkpoint for your thesis\",\"publisher\":{\"@id\":\"https:\/\/www.scribbr.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.scribbr.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.scribbr.com\/#organization\",\"name\":\"Scribbr\",\"url\":\"https:\/\/www.scribbr.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.scribbr.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.scribbr.com\/wp-content\/uploads\/2019\/08\/scribbr-logo.png\",\"contentUrl\":\"https:\/\/www.scribbr.com\/wp-content\/uploads\/2019\/08\/scribbr-logo.png\",\"width\":902,\"height\":212,\"caption\":\"Scribbr\"},\"image\":{\"@id\":\"https:\/\/www.scribbr.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.instagram.com\/scribbr_\",\"http:\/\/www.linkedin.com\/company\/scribbr\",\"https:\/\/www.youtube.com\/c\/Scribbr-us\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.scribbr.com\/#\/schema\/person\/d6d5d7294039e1bd4527716983a6a4c5\",\"name\":\"Kassiani 
Nikolopoulou\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.scribbr.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e94cdba0c3753e26a91ed7b282602703?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e94cdba0c3753e26a91ed7b282602703?s=96&d=mm&r=g\",\"caption\":\"Kassiani Nikolopoulou\"},\"description\":\"Kassiani has an academic background in Communication, Bioeconomy and Circular Economy. As a former journalist she enjoys turning complex scientific information into easily accessible articles to help students. She specializes in writing about research methods and research bias.\",\"url\":\"https:\/\/www.scribbr.com\/author\/kassianin\/\"}]}<\/script>","yoast_head_json":{"title":"Easy Introduction to Reinforcement Learning","description":"Reinforcement learning is a machine learning method that trains computers to make independent decisions by interacting with the environment.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/","og_locale":"en_US","og_type":"article","og_title":"Easy Introduction to Reinforcement Learning","og_description":"Reinforcement learning (RL) is a branch of machine learning that focuses on training computers to make optimal decisions by interacting with their","og_url":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/","og_site_name":"Scribbr","article_published_time":"2023-07-14T10:28:05+00:00","article_modified_time":"2023-08-15T15:06:38+00:00","og_image":[{"url":"https:\/\/www.scribbr.com\/wp-content\/uploads\/2023\/08\/the-general-framework-of-reinforcement-learning.webp"}],"author":"Kassiani Nikolopoulou","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kassiani Nikolopoulou","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#article","isPartOf":{"@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/"},"author":{"name":"Kassiani Nikolopoulou","@id":"https:\/\/www.scribbr.com\/#\/schema\/person\/d6d5d7294039e1bd4527716983a6a4c5"},"headline":"Easy Introduction to Reinforcement Learning","datePublished":"2023-07-14T10:28:05+00:00","dateModified":"2023-08-15T15:06:38+00:00","mainEntityOfPage":{"@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/"},"wordCount":2163,"commentCount":0,"publisher":{"@id":"https:\/\/www.scribbr.com\/#organization"},"articleSection":["Using AI tools"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/","url":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/","name":"Easy Introduction to Reinforcement Learning","isPartOf":{"@id":"https:\/\/www.scribbr.com\/#website"},"datePublished":"2023-07-14T10:28:05+00:00","dateModified":"2023-08-15T15:06:38+00:00","description":"Reinforcement learning is a machine learning method that trains computers to make independent decisions by interacting with the environment.","breadcrumb":{"@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.scribbr.com\/ai-tools\/reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.scribbr.com\/"},{"@type":"ListItem","position":2,"name":"Easy Introduction to Reinforcement 
Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.scribbr.com\/#website","url":"https:\/\/www.scribbr.com\/","name":"Scribbr","description":"The checkpoint for your thesis","publisher":{"@id":"https:\/\/www.scribbr.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.scribbr.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.scribbr.com\/#organization","name":"Scribbr","url":"https:\/\/www.scribbr.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.scribbr.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.scribbr.com\/wp-content\/uploads\/2019\/08\/scribbr-logo.png","contentUrl":"https:\/\/www.scribbr.com\/wp-content\/uploads\/2019\/08\/scribbr-logo.png","width":902,"height":212,"caption":"Scribbr"},"image":{"@id":"https:\/\/www.scribbr.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.instagram.com\/scribbr_","http:\/\/www.linkedin.com\/company\/scribbr","https:\/\/www.youtube.com\/c\/Scribbr-us\/"]},{"@type":"Person","@id":"https:\/\/www.scribbr.com\/#\/schema\/person\/d6d5d7294039e1bd4527716983a6a4c5","name":"Kassiani Nikolopoulou","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.scribbr.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e94cdba0c3753e26a91ed7b282602703?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e94cdba0c3753e26a91ed7b282602703?s=96&d=mm&r=g","caption":"Kassiani Nikolopoulou"},"description":"Kassiani has an academic background in Communication, Bioeconomy and Circular Economy. As a former journalist she enjoys turning complex scientific information into easily accessible articles to help students. 
She specializes in writing about research methods and research bias.","url":"https:\/\/www.scribbr.com\/author\/kassianin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/posts\/529172"}],"collection":[{"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/users\/157"}],"replies":[{"embeddable":true,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/comments?post=529172"}],"version-history":[{"count":5,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/posts\/529172\/revisions"}],"predecessor-version":[{"id":564703,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/posts\/529172\/revisions\/564703"}],"wp:attachment":[{"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/media?parent=529172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/categories?post=529172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.scribbr.com\/wp-json\/wp\/v2\/tags?post=529172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}