{"id":524324,"date":"2023-07-06T12:13:32","date_gmt":"2023-07-06T10:13:32","guid":{"rendered":"https:\/\/www.scribbr.nl\/?p=524324"},"modified":"2023-08-21T12:26:25","modified_gmt":"2023-08-21T10:26:25","slug":"supervised-vs-unsupervised-learning","status":"publish","type":"post","link":"https:\/\/www.scribbr.com\/ai-tools\/supervised-vs-unsupervised-learning\/","title":{"rendered":"Supervised vs. Unsupervised Learning: Key Differences"},"content":{"rendered":"
There are two main approaches to machine learning: supervised<\/strong> and unsupervised learning<\/strong>. The main difference between the two is whether the data used to train the computer is labeled. However, there are also more subtle differences.<\/p>\n Machine learning<\/a> is the process of training computers using large amounts of data so that they can learn how to independently complete tasks associated with human intelligence (e.g., translating, making recommendations).<\/strong><\/p>\n Two key aspects of machine learning are data<\/strong> and algorithms<\/strong>. Any type of information that can be used as an input by a computer (text, images, audio, etc.) is data. An algorithm is a set of instructions given to a computer so that it processes the data and learns from it. Training an algorithm on data produces the machine learning model<\/strong>.<\/p>\n <\/a><\/p>\n <\/p>\n Supervised learning<\/strong> involves a human \u201cteacher\u201d or \u201csupervisor.\u201d Their role is to provide the computer with labeled data, i.e., examples that pair each problem with its solution.<\/p>\n With supervised learning, a human expert would go through a database of images and label each one of them as either \u201ccat\u201d or \u201cdog.\u201d Then, the expert would feed this labeled dataset into the computer, and the computer would process the images one by one to learn by itself which characteristics indicate a cat and which ones indicate a dog (similar to how toddlers learn).<\/p>\n Once the training is done, the computer is able to recognize new images of cats and dogs.<\/figure>\n In supervised learning, the aim is to make sense of data within the context of a specific question or problem (such as \u201cidentify images of cats\u201d). 
When the computer is given lots of examples (in this case, images) along with the correct answers (i.e., whether it\u2019s a dog or a cat in the image), it learns to correctly identify new data.<\/p>\n Supervised machine learning is used for two types of problems or tasks: classification and regression.<\/p>\n Both classification and regression are used for prediction and work with labeled datasets. However, their difference lies in the nature of the output they aim to predict. For example, predicting whether an employee will get a raise is a classification problem, while predicting how much their salary will increase is a regression problem.<\/p>\n Classification<\/strong> is used to categorize input data into predefined classes or categories. By training with labeled data, the computer learns to recognize and differentiate various features or characteristics associated with each class.<\/p>\n For example, in image classification, the goal may be to identify objects in an image. Similarly, classification can be used to predict discrete outcomes, like determining whether it will rain on a given day.<\/p>\n Examples of classification problems include:<\/p>\n Regression<\/strong> is a type of supervised learning in which we predict a number instead of a category.<\/p>\n With regression, the predicted outcomes are real values, such as the expected price of a house (based on information like square footage or location). 
Regression can assist companies with sales predictions by considering variables such as weather, social media presence, or inbound tourists.<\/p>\n Examples of regression problems include:<\/p>\n In both regression and classification, the goal is to find specific relationships or patterns in the input data that allow the computer to effectively generate correct output data.<\/p>\n For example, if we want to predict customer satisfaction with a certain product, we can assign each satisfaction level a number from 1 to 10, or we can create specific categories, such as \u201cvery satisfied,\u201d \u201csatisfied,\u201d etc.<\/p>\n In other words, we could approach this either as a regression task (i.e., predicting the numerical satisfaction rating), or a classification task (i.e., assigning each customer to the appropriate satisfaction category based on their predicted satisfaction level).<\/p>\n The choice between the two depends on the nature of the data, the problem formulation, and the type of output or solution we want.<\/figure>\n Unsupervised learning<\/strong> is used when there is no labeled data or instructions for the computer to follow. 
Instead, the computer tries to identify the underlying structure or patterns in the data without any assistance.<\/p>\n A company can take raw customer data and apply an unsupervised learning algorithm to discover hidden patterns and similarities within the data.<\/p>\n The algorithm can group similar customers together based on shared characteristics, allowing for the identification of distinct segments that can inform future marketing campaigns (e.g., personalized recommendations).<\/figure>\n Unsupervised learning is valuable for exploratory analysis<\/strong>, where the goal is to automatically discover hidden patterns in data.<\/p>\n Unsupervised learning is used for three main tasks: clustering, association, and dimensionality reduction.<\/p>\n In each of these tasks, we want to discover the inherent structure of our data for which no predefined categories or labels exist.<\/p>\n Clustering<\/strong> is a machine learning technique for grouping unlabeled data based on their similarities or differences. Clustering helps us find patterns in the data even if we don\u2019t know what we are looking for.<\/p>\n Sorting customers into different segments, for example, is a clustering problem: it involves discovering inherent groups in the data. Clustering is like dividing a pile of books by genre or topic, without knowing anything about those books in advance. 
You go through the books one by one and, if they are similar, put them in the same group.<\/p>\n Examples of clustering problems include:<\/p>\n Association<\/strong> focuses on identifying co-occurrences or dependencies between items, without predefined labels or outcomes.<\/p>\n Association analysis is commonly used to find interesting associations or rules (e.g., in market basket analysis, where the goal is to identify frequently co-purchased items, such as goods commonly bought together in a grocery store).<\/p>\n The result or output of association analysis is usually in the form of \u201cif X, then Y,\u201d indicating that when product X appears (e.g., cappuccino), there is a high likelihood of Y also being present (e.g., a muffin).<\/p>\n Examples of association problems include:<\/p>\n Dimensionality reduction<\/strong> is a technique used in machine learning when a dataset has a large number of inputs or features. It helps by reducing the number of inputs or features while still keeping the important parts of the data. This makes the data easier to work with and understand. For example, it can be used to remove noise from images and improve their quality.<\/p>\n Examples of dimensionality reduction problems include:<\/p>\n The differences between supervised and unsupervised learning are summarized in the table below:<\/p>\nWhat is supervised learning?<\/h2>\n
Supervised machine learning methods<\/h2>\n
\n
Classification<\/h3>\n
\n
Regression<\/h3>\n
\n
What is unsupervised learning?<\/h2>\n
Unsupervised machine learning methods<\/h2>\n
\n
Clustering<\/h3>\n
\n
Association<\/h3>\n
\n
Dimensionality reduction<\/h3>\n
\n
Differences between supervised and unsupervised learning<\/h2>\n
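The contrast between the two approaches can also be sketched in a few lines of code. The example below is only a minimal illustration, not a method described in this article: the animal measurements are invented, the function names are hypothetical, and the two algorithms (a 1-nearest-neighbor classifier for the supervised case and k-means for the unsupervised case) are simply common choices. The supervised function needs the labels to make a prediction, while the unsupervised function sees only the raw points and has to find the groups on its own.

```python
# Invented toy data: (height_cm, weight_kg) for six animals.
points = [(25.0, 4.0), (23.0, 3.5), (27.0, 5.0),
          (55.0, 22.0), (60.0, 25.0), (58.0, 30.0)]
# Labels supplied by a human 'teacher'; only the supervised model uses them.
labels = ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']


def squared_distance(a, b):
    # Squared Euclidean distance between two points of equal length.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_nearest_neighbor(train_points, train_labels, query):
    # Supervised: return the label of the closest labeled training example.
    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


def k_means(data, initial_centroids, iterations=10):
    # Unsupervised: assign each point to a cluster index without any labels.
    centroids = list(initial_centroids)
    assignment = []
    for _ in range(iterations):
        # Step 1: attach every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # Step 2: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return assignment


# Supervised: the model answers with one of the labels it was taught.
print(predict_nearest_neighbor(points, labels, (26.0, 4.5)))

# Unsupervised: the model returns anonymous group numbers, not names.
print(k_means(points, [points[0], points[-1]]))
```

Note that the clustering output contains only group indices. Deciding that one group corresponds to cats and the other to dogs is an interpretation a person adds afterwards, which is the practical difference between the two approaches.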