Visualization of decision boundaries to make your classification reasonable and orderly


Updated: 2025-04-16


Author: Navoneel Chakrabarty

In data science, classification is a very common and important problem. Examples include diabetic retinopathy detection, sentiment analysis, digit recognition, and cancer-type prediction (malignant or benign). These problems are often solved with machine learning or deep learning. In applications such as diabetic retinopathy or glaucoma detection, texture analysis is often used rather than traditional image processing or deep learning, although according to the research paper linked below, deep learning has a strong advantage in dealing with diabetic retinopathy.

Supplement: texture analysis is the process of extracting texture feature parameters to obtain a quantitative or qualitative description of texture. Texture analysis methods fall into two categories: statistical methods and structural methods. Texture analysis is widely used on remote sensing images, X-ray photographs, and cell images. There is no unified mathematical model for texture. The concept originates from the surface properties of textiles and can describe the arrangement of the components of any material, such as vascular texture in medical X-ray photographs or lithologic texture in aerial (or aerospace) topographic photographs. In image processing, visual texture is usually understood as the repeated arrangement of some basic pattern (a tone primitive).

Now, back to the subject. The related research paper is:

Link to "Deep Learning methods for the Detection of Diabetic Retinopathy": https://ieeexplore.ieee.org/document/8596839

In a classification problem, the task is to predict one particular class out of several. Put another way, a specific instance (a data point, in terms of feature space geometry) has to be placed in a specific region (its class) and separated from the other regions (the other classes). This separation between regions can be visualized by a boundary known as the decision boundary. The decision boundary is visualized in feature space on a scatter plot: each point represents a data point of the dataset, and the axes represent the features. The decision boundary divides the data points into regions, and each region corresponds to the class to which its data points belong.

Importance / significance of decision boundaries:

After training a machine learning model on a dataset, we usually need to visualize the classes of the data points in the feature space. The decision boundary on the scatter plot serves this purpose. The scatter plot contains data points belonging to different classes (represented by colors or shapes), and the decision boundary can be drawn using several different strategies:

Single-line decision boundary: the basic strategy for drawing a decision boundary on a scatter plot is to find a single line that divides the data points into regions representing different classes. This line is found from the parameters of the machine learning algorithm obtained after training the model. The coordinates of the line are then computed from those parameters and the reasoning behind the algorithm. This strategy cannot be applied if you do not know how the ML algorithm works.

Contour-based decision boundary: another strategy is to draw contours, which are regions enclosing the data points with matching or closely matching colors. The colors of the data points depict the classes they belong to, and the contours depict the predicted classes. This is the most commonly used strategy because it does not rely on the parameters and related calculations of the machine learning algorithm obtained after model training. On the other hand, it does not yield a single clean line separating the data points; such a line can only be obtained with the first strategy, from the trained parameters.

Example walkthrough of a single-line decision boundary:

Here, I will demonstrate the single-line decision boundary of a machine learning model based on logistic regression.

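Consider the logistic regression hypothesis:

$$h(z) = \frac{1}{1 + e^{-z}}$$

where z is defined as:

$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

Here $\theta_0, \theta_1, \theta_2, \dots, \theta_n$ are the parameters of the logistic regression ($\theta_0$ being the intercept) and $x_1, x_2, \dots, x_n$ are the features.

Therefore, h(z) is a sigmoid function whose values lie between 0 and 1 (it approaches both limits but never reaches them for finite z).

When drawing the decision boundary, set h(z) equal to the threshold used in logistic regression, which is conventionally 0.5. That is:

$$\frac{1}{1 + e^{-z}} = 0.5$$

Then:

$$z = 0$$

Now, drawing the decision boundary requires two features, plotted along the x-axis and y-axis of the scatter plot. So,

$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$$

That is to say (assuming $\theta_2 \neq 0$),

$$x'_2 = -\frac{\theta_0 + \theta_1 x'_1}{\theta_2}, \qquad x'_1 \in \{\min(x_1), \max(x_1)\}$$

where $x_1$ is the original feature of the dataset.

Therefore, two values of $x'_1$ and two corresponding values of $x'_2$ are obtained. The $x'_1$ values are the x extremes of the single-line decision boundary, and the $x'_2$ values are its y extremes.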
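The boundary relation x2 = -(theta_0 + theta_1 * x1) / theta_2 can be checked numerically. The short sketch below uses made-up parameter values and made-up feature extremes (they are assumptions for illustration, not values from the article's dataset) and confirms that the sigmoid evaluates to the 0.5 threshold at both end points of the line.

```python
import numpy as np

# Hypothetical parameters [theta_0, theta_1, theta_2], chosen only for illustration.
theta = np.array([-10.0, 1.0, 1.0])

# Hypothetical minimum and maximum of the first feature x_1.
x1_ext = np.array([2.0, 8.0])

# Solve theta_0 + theta_1 * x1 + theta_2 * x2 = 0 for x2 at both extremes.
x2_ext = -(theta[0] + theta[1] * x1_ext) / theta[2]

# Plug the end points back into the hypothesis: z should be 0 and h(z) should be 0.5.
z = theta[0] + theta[1] * x1_ext + theta[2] * x2_ext
h = 1.0 / (1.0 + np.exp(-z))

print(x2_ext)  # [8. 2.]
print(h)       # [0.5 0.5]
```

Any point on the segment joining (2, 8) and (8, 2) gives z = 0 for these parameters, which is why two end points are enough to draw the line.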

Application to a fictitious dataset:

The dataset contains the scores of 100 students in two exams and a label (1 or 0) indicating whether the student will be admitted to the university. Related dataset link: https://github.com/navoneel1092283/logistic_regression.git

Problem statement: "Based on the scores obtained in the two exams, use logistic regression to predict whether the student will be admitted to the university."

Here, the scores of the two exams are the two features to be considered.

The implementation is described in detail at the following link:

Related link: https://hackernoon.com/logistic-regression-in-python-from-scratch-954c0196d258

Logistic regression on the dataset:

Obtaining the parameter vector:
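As a rough illustration of how a parameter vector can be obtained, here is a minimal sketch that fits logistic regression with plain batch gradient descent. It is not the code from the linked article: the function name `fit_logistic`, the learning rate, the iteration count, and the synthetic data standing in for standardized exam scores are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function h(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the mean log-loss.

    Returns theta = [theta_0, theta_1, ..., theta_n], with theta_0 the intercept.
    """
    Xb = np.c_[np.ones(len(X)), X]      # prepend a column of ones for theta_0
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

# Synthetic stand-in data (an assumption): 100 points with two standardized features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = fit_logistic(X, y)
accuracy = np.mean((sigmoid(theta[0] + X @ theta[1:]) >= 0.5) == (y == 1))
print(theta, accuracy)
```

Raw exam scores on a 0 to 100 scale would usually be standardized first, or fitted with a dedicated optimizer, because plain gradient descent converges slowly on unscaled features.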

Obtaining the predictions (predicted classes) of the data points:
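Given a parameter vector, the predicted class follows from thresholding the hypothesis at 0.5. The sketch below assumes a hypothetical helper named `predict` and made-up parameters and scores; it only illustrates the thresholding step.

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Return 1 where h(z) >= threshold and 0 otherwise."""
    z = theta[0] + X @ theta[1:]        # theta[0] is the intercept
    h = 1.0 / (1.0 + np.exp(-z))
    return (h >= threshold).astype(int)

# Hypothetical parameters and data points, for illustration only.
theta = np.array([-10.0, 1.0, 1.0])
X = np.array([[2.0, 3.0],    # z = -5 -> class 0
              [7.0, 6.0],    # z =  3 -> class 1
              [5.0, 5.0]])   # z =  0 -> h = 0.5, exactly on the boundary

labels = predict(theta, X)
print(labels)  # [0 1 1]
```

The third point lies exactly on the decision boundary; with the `>=` comparison used here it is assigned to class 1, which is a convention of this sketch rather than a rule.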

Draw a single-line decision boundary:

Figure: the resulting single-line decision boundary.
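A plot of this kind can be sketched as follows. The function names, the parameter vector, and the four data points are hypothetical and are not taken from the article's dataset; matplotlib is imported inside the plotting function so that the end-point helper depends on NumPy only.

```python
import numpy as np

def single_line_boundary(theta, X, pad=0.0):
    """End points of the line theta_0 + theta_1*x1 + theta_2*x2 = 0 over the range of x1."""
    x1_ext = np.array([X[:, 0].min() - pad, X[:, 0].max() + pad])
    x2_ext = -(theta[0] + theta[1] * x1_ext) / theta[2]
    return x1_ext, x2_ext

def plot_single_line_boundary(theta, X, y, ax=None):
    """Scatter the two classes and draw the boundary line on top."""
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(X[y == 1, 0], X[y == 1, 1], marker="o", label="admitted (1)")
    ax.scatter(X[y == 0, 0], X[y == 0, 1], marker="x", label="not admitted (0)")
    x1_ext, x2_ext = single_line_boundary(theta, X)
    ax.plot(x1_ext, x2_ext, color="black", label="decision boundary")
    ax.set_xlabel("exam 1 score")
    ax.set_ylabel("exam 2 score")
    ax.legend()
    return ax

# Hypothetical parameters and scores, for illustration only.
theta = np.array([-120.0, 1.0, 1.0])
X = np.array([[30.0, 40.0], [45.0, 50.0], [70.0, 80.0], [90.0, 60.0]])
y = np.array([0, 0, 1, 1])

x1_ext, x2_ext = single_line_boundary(theta, X)
print(x1_ext, x2_ext)  # [30. 90.] [90. 30.]
```

Calling `plot_single_line_boundary(theta, X, y)` followed by `plt.show()` would display the scatter plot with the line; the `pad` argument can extend the line slightly beyond the data range.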

In this way, a single-line decision boundary can be drawn for any machine learning model based on logistic regression. For models based on other machine learning algorithms, the corresponding hypothesis and the reasoning behind it must be known.

Example walkthrough of contour-based decision boundaries:

Using the same dataset and trained model, the contour-based decision boundary is drawn.

Figure: the resulting contour-based decision boundary, where yellow -> admitted and blue -> not admitted.

This method is obviously more convenient because it requires neither the hypothesis nor the mathematics behind the machine learning algorithm. All it takes is some skill in advanced Python programming.

Therefore, it is a general method for drawing the decision boundary of any machine learning model.
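The contour approach can be sketched as: build a grid over the feature space, ask the model for a prediction at every grid point, and fill the regions by predicted class. In the sketch below, `grid_predictions`, `plot_contour_boundary`, the stand-in classifier `predict_fn`, and the data are all hypothetical; any trained model exposing a predict-style function could take the classifier's place.

```python
import numpy as np

def grid_predictions(predict_fn, X, step=0.5, pad=1.0):
    """Evaluate predict_fn on a regular grid covering the two features of X."""
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    grid = np.c_[xx.ravel(), yy.ravel()]        # one row per grid point
    zz = predict_fn(grid).reshape(xx.shape)     # predicted class per grid point
    return xx, yy, zz

def plot_contour_boundary(predict_fn, X, y, ax=None):
    """Filled contours of the predicted class with the data points on top."""
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting
    if ax is None:
        _, ax = plt.subplots()
    xx, yy, zz = grid_predictions(predict_fn, X)
    ax.contourf(xx, yy, zz, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    return ax

# Hypothetical classifier standing in for a trained model.
def predict_fn(points):
    return (points[:, 0] + points[:, 1] >= 120.0).astype(int)

# Hypothetical exam scores, for illustration only.
X = np.array([[30.0, 40.0], [45.0, 50.0], [70.0, 80.0], [90.0, 60.0]])
y = np.array([0, 0, 1, 1])

xx, yy, zz = grid_predictions(predict_fn, X, step=1.0)
print(zz.shape, np.unique(zz))
```

Because only `predict_fn` is called, the same two functions work unchanged for non-linear models; a finer `step` gives a smoother boundary at the cost of more predictions.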

In real life and in some advanced projects, many features are involved. So how can the decision boundary be drawn on a two-dimensional scatter plot?

In such cases, I see several possible solutions:

1. Use the feature importance scores given by a random forest classifier to pick the two most important features, then draw the decision boundary on a scatter plot of those two features.

2. Use a dimensionality reduction technique such as principal component analysis (PCA) or linear discriminant analysis (LDA) to embed the N features into 2 features (n_components = 2), so that the information in the N features is condensed into 2. Then draw the decision boundary on a scatter plot of these two features. Note that LDA yields at most (number of classes - 1) components, so it can only produce 2 components when there are at least three classes.
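As an illustration of the dimensionality reduction option, the sketch below projects N features onto the first two principal components using NumPy's SVD. This is a hand-rolled stand-in for what a library PCA with n_components = 2 would do; the function name `pca_2d` and the random data are assumptions made for this example.

```python
import numpy as np

def pca_2d(X):
    """Project X (n_samples x n_features) onto its first two principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                              # top-2 principal directions
    return Xc @ components.T, components

# Hypothetical data: 100 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X2, components = pca_2d(X)
print(X2.shape)  # (100, 2)
```

A classifier would then be trained on `X2`, and its decision boundary drawn on the scatter plot of the two projected features, since a boundary learned on the original N features cannot be displayed directly in two dimensions.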

This is the visualization of decision boundaries.
