What is the method of Python data analysis? 04/19 Update SLTechnology News&Howtos

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces one method of Python data analysis: the TGI (Target Group Index), worked through with pandas on a practical case. Many people run into this kind of problem in real work, so let's go through how to handle it step by step.

01 Breaking down the index

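The TGI formula is: TGI index = (proportion of the target group with a certain feature) / (proportion of the overall population with the same feature) * 100. Three key elements in it need to be broken down further: the feature, the overall population, and the target group.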

As an example, suppose we want to study the hair-loss TGI index at Company A:

The feature is the behavior or state we want to analyze; here it is hair loss (suffering from hair loss).

The overall population is everyone we are studying, that is, everyone at Company A.

The target group is the subgroup of the population that we are interested in. Assuming we focus on the data department, the target group is the data department.

Therefore, the numerator in the formula, "the proportion of the target group with a certain feature", can be understood as "the proportion of people with hair loss in the data department". Suppose the data department has 15 people and 9 of them suffer from hair loss; the proportion is 9/15, which is 60%.

The denominator, "the proportion of the overall population with the same feature", corresponds to "the proportion of people with hair loss in the whole company". Assuming the company has 500 people and 120 of them suffer from hair loss, the proportion is 120/500, which is 24%.

Therefore, the hair-loss TGI index of the data department is 60% / 24% * 100 = 250. The calculation for other departments follows the same logic: the department's hair-loss proportion / the company's hair-loss proportion * 100.
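The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name tgi and the variable names are made up for illustration.

```python
def tgi(target_share: float, overall_share: float) -> float:
    """TGI = share of the target group with the feature
    / share of the overall population with the feature * 100."""
    return target_share / overall_share * 100


dept_share = 9 / 15         # 9 of 15 people in the data department
company_share = 120 / 500   # 120 of 500 people in the whole company

data_dept_tgi = tgi(dept_share, company_share)
print(round(data_dept_tgi))  # 250
```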

A TGI index above 100 indicates that this group of users has a stronger-than-average tendency or preference for the feature, and the higher the value, the stronger the preference. Below 100 means the tendency is weaker than average, and exactly 100 means the tendency is at the average level.

In the example we just made up, the hair-loss TGI index of the data department is 250, far above 100. The risk of hair loss in the data department looks very high; data work, it seems, really does push the hairline back.

Next, we use a case to consolidate the concept and get some practice with pandas along the way.

02 TGI example analysis

Project background

The boss tossed over an order detail sheet: "Xiao Z, we are going to launch a product with a higher average order value soon, and we plan to pilot it in a few cities first. Take a look at this data and see which cities' customers prefer high-ticket purchases. Please pick five for me."

Xiao Z quickly opened the table to see what the data looked like.

The order data includes the brand name, buyer nickname, payment time, order status, region and other fields, 28,832 rows in total, with no missing values.

After a quick look at the source data, Xiao Z promptly clarified the requirement: "Boss, how do we define a high average order value?"

"Based on our product line and historical data, a single purchase of more than 50 yuan counts as high-ticket."

With the definition of high-ticket confirmed, the goal is clear: rank the cities by their preference for high-ticket purchases. This preference can be measured with the TGI index, so let's review the three core elements of TGI again:

The feature is a high-ticket purchase, that is, a customer spending more than 50 yuan in a single purchase.

The target group is each city; we can calculate the high-ticket preference of customers in every city.

The overall population is straightforward: all customers included in the calculation.

The key to solving the problem is to calculate the number and proportion of high-ticket customers in each city.

Labeling each user

As the first step, we determine whether each user belongs to the high-ticket group, so we group by the buyer's nickname and look at the average amount each user paid. The average is used because some customers buy several times and the amount differs from order to order.

Next, define a labeling function: if a user's average amount paid is greater than 50, label the user "high-ticket", otherwise "low-ticket". Then call it with the apply function.
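A minimal sketch of this step, assuming the order table is a pandas DataFrame. The sample rows, the column names (buyer_nickname, amount_paid, province_city) and the labels "high" and "low" are made up for illustration and are not the original data.

```python
import pandas as pd

# Made-up sample orders; the column names are hypothetical stand-ins.
orders = pd.DataFrame({
    "buyer_nickname": ["a", "a", "b", "c", "c"],
    "amount_paid": [40.0, 80.0, 30.0, 50.0, 50.0],
    "province_city": ["X", "X", "Y", "Y", "Y"],
})

# Average amount paid per buyer (a buyer may have several orders).
user_avg = orders.groupby("buyer_nickname")["amount_paid"].mean().reset_index()


def label_order_value(avg_amount: float) -> str:
    """Label a buyer using the 50 yuan threshold from the requirement."""
    return "high" if avg_amount > 50 else "low"


user_avg["order_class"] = user_avg["amount_paid"].apply(label_order_value)
print(user_avg)
```

Note that a buyer whose average is exactly 50 is labeled "low" here, because the stated rule is "more than 50 yuan".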

This completes the preliminary labeling of users as high-ticket or low-ticket.

Matching the city

Each user's average amount and ticket label are now settled. The next step is to add each user's region field, which can be done with the pd.merge function. Since the source data has not been deduplicated, we have to deduplicate it by nickname first; otherwise the merged result will contain many duplicate rows.
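A minimal sketch of the deduplicate-then-merge step. The sample rows and column names are again made up; drop_duplicates keeps the first row seen for each nickname, which assumes a buyer's region is the same on every order.

```python
import pandas as pd

# Made-up sample orders; the column names are hypothetical stand-ins.
orders = pd.DataFrame({
    "buyer_nickname": ["a", "a", "b", "c"],
    "amount_paid": [40.0, 80.0, 30.0, 70.0],
    "province_city": ["X", "X", "Y", "Y"],
})

# Per-buyer average and label, as in the previous step.
user_avg = orders.groupby("buyer_nickname")["amount_paid"].mean().reset_index()
user_avg["order_class"] = user_avg["amount_paid"].apply(
    lambda avg: "high" if avg > 50 else "low"
)

# One row per buyer, so the merge cannot multiply rows.
buyer_city = orders[["buyer_nickname", "province_city"]].drop_duplicates(
    subset="buyer_nickname"
)

users = pd.merge(user_avg, buyer_city, on="buyer_nickname", how="left")
print(users)
```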

Calculating the high-ticket TGI index

To calculate the high-ticket TGI index of each city, we need the number of high-ticket and low-ticket customers in each city. With an Excel PivotTable this is very simple: drag the province and city field to the rows area, the ticket category to the columns area, and any field to the values area, as long as it is summarized by count.

Don't panic: this operation is just as easy in Python, and the pivot_table function does it in one line.
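A minimal sketch of the pivot on made-up data with hypothetical column names. Passing values as a list is an assumption made here so that the result carries the two-level column index discussed next.

```python
import pandas as pd

# Made-up per-buyer table: one row per buyer with a label and a city.
users = pd.DataFrame({
    "buyer_nickname": ["a", "b", "c", "d", "e"],
    "order_class": ["high", "low", "high", "low", "low"],
    "province_city": ["X", "X", "Y", "Y", "Z"],
})

# Rows: city; columns: ticket category; values: number of buyers.
counts = pd.pivot_table(
    users,
    index="province_city",
    columns="order_class",
    values=["buyer_nickname"],
    aggfunc="count",
)
print(counts)
```

City Z has no "high" buyers in this sample, so that cell comes out as NaN rather than 0.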

The result has a hierarchical (multi-level) column index, which we will not go into here for reasons of space. It is enough to know that to get the "high-ticket" column, we first index by "buyer nickname" and then by "high-ticket".

In this way we get the number of high-ticket customers for each province and city, then the number of low-ticket customers, and merge the two horizontally.
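A minimal sketch of pulling the two columns out of the hierarchical index and putting them side by side. pd.concat with axis=1 is used here as one way to merge horizontally; the original may have used a different call. Data and names are made up.

```python
import pandas as pd

# Made-up per-buyer table: one row per buyer with a label and a city.
users = pd.DataFrame({
    "buyer_nickname": ["a", "b", "c", "d", "e"],
    "order_class": ["high", "low", "high", "low", "low"],
    "province_city": ["X", "X", "Y", "Y", "Z"],
})
counts = pd.pivot_table(
    users,
    index="province_city",
    columns="order_class",
    values=["buyer_nickname"],
    aggfunc="count",
)

# First level: the counted field; second level: the ticket category.
high = counts["buyer_nickname"]["high"]
low = counts["buyer_nickname"]["low"]

# Align on the city index and place the two columns side by side.
tgi_table = pd.concat([high, low], axis=1).reset_index()
print(tgi_table)
```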

Next, compute the total number of customers in each city and the proportion of high-ticket customers, which gives the "proportion of the target group with a certain feature".
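A minimal sketch of this calculation on a made-up city table; the column names total and high_share are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up city-level counts; NaN marks a city with no high-ticket buyers.
tgi_table = pd.DataFrame({
    "province_city": ["X", "Y", "Z"],
    "high": [30.0, 10.0, np.nan],
    "low": [70.0, 40.0, 5.0],
})

# Total customers per city; NaN in either column makes the total NaN.
tgi_table["total"] = tgi_table["high"] + tgi_table["low"]

# Share of high-ticket customers within each city (the TGI numerator).
tgi_table["high_share"] = tgi_table["high"] / tgi_table["total"]
print(tgi_table)
```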

In some very small cities, the number of high-ticket or low-ticket customers is 1 or none at all, and these values, null values in particular, affect the result. We need to check the data first.

Sure enough, both the high-ticket and low-ticket columns contain null values (which can be understood as 0), which makes the total null as well. The TGI index is meaningless for null values, so we drop the rows that contain them.
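A minimal sketch of the check and the cleanup on a made-up table. It follows the text in dropping rows with nulls; filling them with 0 would be another option that the article does not take.

```python
import numpy as np
import pandas as pd

# Made-up city-level table with missing counts.
tgi_table = pd.DataFrame({
    "province_city": ["X", "Y", "Z", "W"],
    "high": [30.0, 10.0, np.nan, 1.0],
    "low": [70.0, 40.0, 5.0, np.nan],
})
tgi_table["total"] = tgi_table["high"] + tgi_table["low"]

# Count the null values in each column.
null_counts = tgi_table.isnull().sum()
print(null_counts)

# Drop every row that contains a null value.
cleaned = tgi_table.dropna()
print(cleaned)
```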

Then calculate the proportion of high-ticket customers among all customers, which corresponds to the denominator in the standard formula, "the proportion of the overall population with the same feature".

The last step is to calculate the TGI index and, while we are at it, sort the result.
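A minimal sketch of the last two steps on a made-up table. The column names are hypothetical, and the overall share is taken over the rows that remain in the table, which is an assumption about how the original handled the cities that were dropped.

```python
import pandas as pd

# Made-up cleaned city table.
tgi_table = pd.DataFrame({
    "province_city": ["X", "Y"],
    "high": [30.0, 10.0],
    "total": [100.0, 50.0],
})
tgi_table["high_share"] = tgi_table["high"] / tgi_table["total"]

# Denominator: share of high-ticket customers among all customers.
overall_share = tgi_table["high"].sum() / tgi_table["total"].sum()

# TGI per city, sorted from strongest to weakest preference.
tgi_table["tgi"] = tgi_table["high_share"] / overall_share * 100
tgi_table = tgi_table.sort_values("tgi", ascending=False)
print(tgi_table)
```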

With the result in hand, Xiao Z happily prepared to report to the boss right away, but glanced at the data before pressing Enter and found a serious problem: the cities at the top of the high-ticket TGI ranking had barely more than 10 customers in total, which was entirely unconvincing.

The TGI index shows the strength of a preference, but it easily hides the actual sample size, which deserves special attention.

What to do? To make the result more reliable overall, Xiao Z decided to filter on the total number of customers first, using the average total as the threshold and keeping only the cities whose total is above the average.
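A minimal sketch of the filter on a made-up table, followed by picking the five cities the boss asked for. Column and variable names are hypothetical.

```python
import pandas as pd

# Made-up city table: two large cities and two tiny ones with extreme TGI.
tgi_table = pd.DataFrame({
    "province_city": ["X", "Y", "Z", "W"],
    "total": [1000, 800, 8, 12],
    "tgi": [110.0, 95.0, 300.0, 250.0],
})

# Use the mean city size as the minimum sample size.
threshold = tgi_table["total"].mean()
reliable = tgi_table[tgi_table["total"] > threshold]

# Rank the remaining cities by TGI and keep at most five.
top5 = reliable.sort_values("tgi", ascending=False).head(5)
print(top5)
```

The mean is only one possible threshold; a few very large cities pull it up, so a median or a fixed minimum count may suit other data sets better.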
