Simplified method of how to use Python

2025-04-01 Update From: SLTechnology News&Howtos


This article shares a simplified way to build a logistic regression model in Python. The editor finds it very practical and shares it here as a reference.

What is the goal of logistic regression?

In logistic regression, we want to model the dependent variable (Y) based on one or more independent variables (X). It is a classification method: the algorithm assigns the dependent variable to a class. Y is modeled with a function whose output lies between 0 and 1 for all values of X. In logistic regression, that function is the sigmoid (also known as logistic) function.
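To make the "output between 0 and 1" property concrete, here is a minimal sketch of the sigmoid function using NumPy. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original article.

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Large negative inputs give values near 0, an input of 0 gives 0.5,
# and large positive inputs give values near 1.
z = np.array([-5.0, 0.0, 5.0])
probabilities = sigmoid(z)
print(probabilities)
```

A classifier typically turns such a probability into a class label by comparing it with a threshold, commonly 0.5.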

Model evaluation using the confusion matrix

After training the logistic regression model on some training data, we evaluate its performance on test data. To do this, we use a confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. Its four cells are TP, TN, FP, and FN.

TP stands for true positive: we predict "yes" and the actual value is "yes". TN stands for true negative: we predict "no" and the actual value is "no". FP stands for false positive: we predict "yes" but the actual value is "no". FN stands for false negative: we predict "no" but the actual value is "yes".

What do we infer from the confusion matrix?

The confusion matrix tells us how often the model's predictions are correct, in other words, the accuracy of the model. For an example with 165 test cases, it gives:

(TP + TN) / Total = (100 + 50) / 165 = 0.91

This means that the accuracy of the model is 91%. The confusion matrix is also used to measure the error rate, which is given by the following formula:

(FP + FN) / Total = 15 / 165 = 0.09

The model is wrong in 9% of cases.
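The two formulas above can be checked with a few lines of plain Python. This is only a sketch: it assumes the 100 and 50 in the accuracy formula are TP and TN in that order, and it keeps FP + FN as a single count of 15 because the text does not give the split.

```python
# Counts from the worked example in the text.
tp, tn = 100, 50            # correct predictions (assumed order: TP, then TN)
errors = 15                 # FP + FN combined; the individual values are not given
total = tp + tn + errors    # 165 test cases

accuracy = (tp + tn) / total
error_rate = errors / total

print(round(accuracy, 2), round(error_rate, 2))
```

Accuracy and error rate always sum to 1, so either one determines the other.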

In this article, we walk through a few simple steps to build a logistic regression model in Python.

The Python code, explained step by step

We will inspect the data, analyze and visualize it, clean it, split it into training and test sets, build a logistic regression model, make predictions, and finally evaluate them. All of this is done step by step. The data we work with is the Titanic data set provided by kaggle.com. It is a very well-known data set and is often the first one students use when learning classification-based machine learning. We are trying to predict one of two categories: survival or death.

First, we will import the numpy and pandas libraries:
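The original listing is not preserved here. A conventional version of this step, using the customary aliases `np` and `pd`, would be:

```python
import numpy as np
import pandas as pd
```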

Next, we import the visualization libraries, such as seaborn.

Next, we load the Titanic data set into a pandas DataFrame. After that, we check the head of the DataFrame to get a clear view of all its columns.
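A minimal sketch of the loading step follows. The three rows are a made-up stand-in with a subset of the Kaggle column names so that the example runs on its own. With the real file you would pass its path to `pd.read_csv`; the file name in the comment is an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle file. With the real data, use
# something like pd.read_csv("titanic_train.csv") (file name assumed).
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked\n"
    "1,0,3,male,22,7.25,,S\n"
    "2,1,1,female,38,71.28,C85,C\n"
    "3,1,3,female,,7.92,,S\n"
)
train = pd.read_csv(sample_csv)

# head() shows the first rows so the columns can be inspected.
print(train.head())
```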

Most real-world data sets contain missing values. We will examine the missing data and visualize it to decide how to handle it.

Here we get Boolean values: True indicates that a value is null, and False indicates that it is present. Because the data set is large, we use the seaborn library to display the null values as a heat map, which makes the task much easier.
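The Boolean mask described here can be sketched with pandas alone. The small frame below is a hypothetical miniature of the Titanic data with gaps in Age and Cabin.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic frame, with gaps in Age and Cabin.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
})

null_mask = train.isnull()      # True where a value is missing
null_counts = null_mask.sum()   # number of missing values per column
print(null_counts)
```

With seaborn imported as `sns`, passing the same mask to `sns.heatmap(train.isnull(), cbar=False)` draws it as an image, which is easier to read than a table when there are hundreds of rows.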

The Age and Cabin columns contain null values. I covered how to handle NA values in a previous blog post; have a look if you are interested.

It is good practice to explore the data and to make full use of visualization libraries to understand it.

This is a count plot showing the number of survivors; survival is our target variable. We can also draw count plots broken down by sex (Sex) and passenger class (Pclass).

Here we see a trend: more women survived than men.

The plot by passenger class shows that third-class passengers (Pclass 3) have the highest number of deaths.
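The numbers behind such count plots can be tabulated directly with pandas. The six rows below are invented purely to show the calls and say nothing about the real data set.

```python
import pandas as pd

# Invented rows, only to demonstrate the tabulation.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [3, 1, 2, 3, 3, 1],
})

survived_counts = train["Survived"].value_counts()          # overall counts
by_sex = pd.crosstab(train["Sex"], train["Survived"])       # counts per sex
by_class = pd.crosstab(train["Pclass"], train["Survived"])  # counts per class

print(survived_counts, by_sex, by_class, sep="\n\n")
```

The plotted equivalent in seaborn is `sns.countplot(x="Survived", hue="Sex", data=train)`, with `hue="Pclass"` for the breakdown by class.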

The data can be visualized in many more ways, but I will not discuss them here because we need to move on to building the model.

Data cleaning

We want to fill in the missing Age values rather than simply dropping the rows where Age is missing. One way is to fill in the average age of all passengers (imputation). However, we can be smarter and look at the average age per passenger class. For example:

We can see that the wealthier passengers in the higher classes tend to be older, which makes sense. We will use these average ages to impute Age based on Pclass.

Now apply this function!
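The original imputation function is not preserved. One way to sketch the same idea is to compute the mean age per class and use it to fill the gaps. The frame and the name `class_mean_age` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one known and one missing age in each class.
train = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Age": [40.0, np.nan, 30.0, np.nan, 20.0, np.nan],
})

# Mean age per class; missing values are ignored by mean().
class_mean_age = train.groupby("Pclass")["Age"].mean()

# Replace each missing age with the mean age of that passenger's class.
train["Age"] = train["Age"].fillna(train["Pclass"].map(class_mean_age))
print(train)
```

Mapping `Pclass` through the per-class means yields a Series aligned with the original rows, so `fillna` only touches the missing entries.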

Now let's check the heat map again.

Great! Next, let's look at the Cabin column.

Converting categorical features

We need to convert the categorical features into dummy variables using pandas. Otherwise, our machine learning algorithm cannot take these features directly as input.

Here, we create dummy variables for the Sex and Embarked columns. After that, we drop the other columns we do not need.

We then concatenate the new Sex and Embarked dummy columns onto the DataFrame.

Now, the DataFrame looks like this:
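The conversion can be sketched with `pd.get_dummies`. The rows are invented, and the choice to drop Name and Ticket along with the encoded columns is an assumption about which columns count as unwanted.

```python
import pandas as pd

# Invented rows with the categorical columns discussed in the text.
train = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Ticket": ["T1", "T2", "T3"],
    "Embarked": ["S", "C", "Q"],
})

# drop_first=True removes one level per feature so the remaining dummy
# columns are not perfectly collinear.
sex = pd.get_dummies(train["Sex"], drop_first=True)
embark = pd.get_dummies(train["Embarked"], drop_first=True)

# Drop the original text columns, then attach the dummy columns.
train = train.drop(["Sex", "Embarked", "Name", "Ticket"], axis=1)
train = pd.concat([train, sex, embark], axis=1)
print(train)
```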

Train/test split
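A minimal sketch of the split with scikit-learn follows. Synthetic data from `make_classification` stands in for the cleaned Titanic features, and the 30% test size and the random seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned features (X) and Survived labels (y).
X, y = make_classification(n_samples=200, n_features=5, random_state=101)

# Hold out part of the data for testing (30% here, an assumed value).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)
print(X_train.shape, X_test.shape)
```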

Training and prediction
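Fitting the model and predicting on the held-out data can be sketched as follows, again on synthetic stand-in data. The variable name `logmodel` and the `max_iter` setting are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned Titanic features and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

logmodel = LogisticRegression(max_iter=1000)
logmodel.fit(X_train, y_train)                        # learn from training data

predictions = logmodel.predict(X_test)                # class labels (0 or 1)
probabilities = logmodel.predict_proba(X_test)[:, 1]  # P(class 1) per row
print(predictions[:10])
```

`predict_proba` returns the sigmoid output for each row, and `predict` returns the class with the higher probability.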

Evaluation

We can use a classification report to check precision, recall, and F1 score.
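The evaluation step can be sketched with scikit-learn's metrics on the same kind of synthetic stand-in data. It also ties the result back to the accuracy formula from the confusion matrix section.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split and fitted as in the earlier steps.
X, y = make_classification(n_samples=200, n_features=5, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)
logmodel = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions = logmodel.predict(X_test)

# For binary labels 0/1, the matrix is laid out as [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, predictions)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # same formula as (TP + TN) / Total

report = classification_report(y_test, predictions)
print(cm)
print(report)
```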

Thank you for reading! This is the end of this article on a simplified way to use Python for logistic regression. I hope it is helpful to you. If you think the article is good, feel free to share it so more people can see it.
