An example Analysis of TGI Index in Database

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article walks through an example analysis of the TGI index: what the index means, how it is calculated, and how to apply it to real order data with Pandas.

Introduction

Professional data analysis reports often mention the TGI index, for example: "based on the so-and-so TGI index, we find that certain types of users prefer XX." For readers who are not familiar with the definition of TGI, statements like this can be baffling. This time, let's talk about what the TGI index is and how to run a simple TGI preference analysis on case data.

The encyclopedia explains it this way: the TGI index, in full the Target Group Index, reflects how strong or weak a target group is within a particular scope of research.

The official explanation is professional but obscure. Roughly translated, the TGI index is an indicator of preference. That is still not clear enough, so let's understand it through the formula.

TGI index = (proportion of the target group that has a given characteristic / proportion of the overall population that has the same characteristic) * 100 (the standard number)

Feeling even dizzier? That is fine. If it were not confusing, there would be nothing to explain.

Breaking down the index

The TGI formula has three key parts that need to be broken down further: the characteristic, the overall population, and the target group.

Take an example: suppose we want to study the hair loss TGI index at Company A.

The characteristic is the behavior or state we want to analyze. Here it is hair loss (or suffering from hair loss).

The overall population is everyone we are studying, that is, all employees of Company A.

The target group is the subgroup of the overall population that we are interested in. If the group we focus on is the data department, then the target group is the data department.

The numerator of the formula, the proportion of the target group that has the characteristic, is therefore the proportion of people in the data department who suffer from hair loss. Assuming the data department has 15 people and 9 of them suffer from hair loss, the proportion is 9/15, which equals 60%.

The denominator, the proportion of the overall population that has the same characteristic, is the proportion of people in the whole company who suffer from hair loss. If the company has 500 people in total and 120 of them suffer from hair loss, the proportion is 24%.

The hair loss TGI index of the data department is therefore 60% / 24% * 100 = 250. The hair loss TGI index of any other department follows the same logic: the proportion of hair loss in that department / the proportion of hair loss in the company * 100.
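The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name tgi_index and its parameter names are made up for illustration, and the numbers are the ones from the hair loss example.

```python
def tgi_index(target_with_feature, target_total, overall_with_feature, overall_total):
    """Return the TGI index: target-group share / overall share * 100."""
    target_share = target_with_feature / target_total      # 9 / 15 = 60%
    overall_share = overall_with_feature / overall_total   # 120 / 500 = 24%
    return target_share / overall_share * 100


# Data department: 9 of 15 people; whole company: 120 of 500 people.
data_dept_tgi = tgi_index(9, 15, 120, 500)
print(round(data_dept_tgi, 2))  # 250.0
```

The same function can be reused for any other department by swapping in that department's counts while keeping the company-wide counts fixed.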

A TGI index above 100 means that this type of user has a stronger tendency or preference than average, and the higher the value, the stronger the preference. Below 100 means the tendency is weaker than average, and exactly 100 means the group sits at the average level.

In the example we just made up, the hair loss TGI index of the data department is 250, far above 100. The risk of hair loss looks very high; data work really is what drives the hairline back.

Next, we work through a case to consolidate the concept and get some Pandas practice along the way.

TGI case analysis

Background: we are about to launch a product with a high price per order, and we plan to trial it in a few cities first. The request was: look at this data, find out which cities show a preference for high orders, and pick 5 of them.

Let's see what the data looks like.

The order data includes the brand name, buyer nickname, payment time, order status, region and other fields. There are 28,832 rows in total, with no missing values.

Definition of a high order: based on the product line and historical data, a single purchase of more than 50 yuan counts as a high order.

With the definition of a high order settled, the goal is clear: rank the cities by their preference for high orders. The preference can be measured with the TGI index, so let's review the three core parts of TGI again:

The characteristic is the high order, that is, a customer spends more than 50 yuan on a single purchase.

The target group is each city. We can calculate the high-order preference of the customers in every city.

The overall population is straightforward: all the customers included in the calculation.

The key to solving the problem is to calculate the number and proportion of high-order customers in each city.

Labeling individual users

The first step is to decide whether each user belongs to the high-order group. We group the data by buyer nickname and look at the average amount each user paid. The average is used because some customers buy many times and the amount differs from order to order.
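The grouping step might look like the sketch below. The toy data and the column name 'amount paid' are assumptions made for illustration (the real dataset's payment column is not shown here); gp_user is the name the later merge step uses for the per-user result.

```python
import pandas as pd

# Toy order data; the values and the 'amount paid' column name are assumptions.
df = pd.DataFrame({
    "buyer nickname": ["ann", "ann", "bob", "cat"],
    "amount paid": [80.0, 40.0, 30.0, 55.0],
})

# Average amount paid per user across all of that user's orders.
gp_user = df.groupby("buyer nickname")["amount paid"].mean().reset_index()
print(gp_user)
```

Calling reset_index() turns the nickname back into an ordinary column, which makes the later merge on 'buyer nickname' straightforward.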

Next, define a judgment function. If the average amount paid by a user is greater than 50, the user falls into the high-order category, otherwise into the low-order category. The function is then called with apply:

    def if_high(x):
        # An average payment above 50 yuan counts as a high order.
        if x > 50:
            return 'high order'
        else:
            return 'low order'
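The apply call itself is not shown, so here is a minimal sketch of how it might look. The 'amount paid' column and the toy values are assumptions; 'ticket category' is the column name the pivot table step relies on later.

```python
import pandas as pd


def if_high(x):
    # An average payment above 50 yuan counts as a high order.
    return "high order" if x > 50 else "low order"


# Toy per-user averages; the 'amount paid' column name is an assumption.
gp_user = pd.DataFrame({
    "buyer nickname": ["ann", "bob", "cat"],
    "amount paid": [60.0, 30.0, 50.0],
})

# Label every user by applying the function to the average amount.
gp_user["ticket category"] = gp_user["amount paid"].apply(if_high)
print(gp_user)
```

Note that a user whose average is exactly 50 lands in the low-order group, because the rule is "more than 50 yuan".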

This completes the preliminary labeling of users as high order or low order.

Matching cities

The average amount and the ticket category of each user are now settled. The next step is to add each user's region fields, which a single pd.merge call can do. Since the source data has not been deduplicated, we first deduplicate it by buyer nickname; otherwise the matching result would contain many duplicate rows:

    # Keep only the first row for each buyer nickname.
    df_dup = df.loc[~df.duplicated('buyer nickname'), :]
    df_merge = pd.merge(gp_user, df_dup, left_on='buyer nickname', right_on='buyer nickname', how='left')
    df_merge.head()
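To see why the deduplication matters, here is a small sketch on made-up data. The names naive and the toy values are illustrative only, and the sketch assumes each buyer belongs to a single region, so keeping the first row per nickname loses nothing.

```python
import pandas as pd

# One row per user, as produced by the labeling step (toy values).
gp_user = pd.DataFrame({
    "buyer nickname": ["ann", "bob"],
    "ticket category": ["high order", "low order"],
})

# Raw orders: ann bought twice, so her nickname appears twice.
df = pd.DataFrame({
    "buyer nickname": ["ann", "ann", "bob"],
    "province": ["Zhejiang", "Zhejiang", "Sichuan"],
    "city": ["Hangzhou", "Hangzhou", "Chengdu"],
})

# Merging against the raw orders repeats ann once per order.
naive = pd.merge(gp_user, df, on="buyer nickname", how="left")

# Deduplicating first keeps exactly one row per user.
df_dup = df.loc[~df.duplicated("buyer nickname"), :]
df_merge = pd.merge(gp_user, df_dup, on="buyer nickname", how="left")

print(len(naive), len(df_merge))  # 3 2
```

If the duplicated rows were left in, repeat buyers would be counted several times in the per-city head counts that follow.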

Calculating the high-order TGI index

To calculate the high-order TGI index of each city, we need the number of high-order and low-order customers in each city. With an Excel PivotTable this is simple: drag province and city to the rows area, the ticket category to the columns area, and any field to the values area, as long as it is summarized by count.

Don't panic, this is just as easy in Python. The pivot_table function does it in one line:

    df_merge = df_merge[['buyer nickname', 'ticket category', 'province', 'city']]
    result = pd.pivot_table(df_merge, index=['province', 'city'], columns='ticket category', aggfunc='count')
    result.head()

The result has a hierarchical column index, which we will not go into here for lack of space. All we need to know is that to get the 'high order' column, we first index by 'buyer nickname' and then by 'high order':

    result['buyer nickname']['high order'].reset_index().head()
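For readers who have not met hierarchical columns before, the sketch below rebuilds the pivot table on four made-up users and shows what the two-level index looks like. The data is invented; the column names follow the ones used in this article.

```python
import pandas as pd

# Toy merged data: one row per user (values are made up).
df_merge = pd.DataFrame({
    "buyer nickname": ["ann", "bob", "cat", "dan"],
    "ticket category": ["high order", "low order", "high order", "high order"],
    "province": ["Zhejiang", "Zhejiang", "Zhejiang", "Sichuan"],
    "city": ["Hangzhou", "Hangzhou", "Hangzhou", "Chengdu"],
})

result = pd.pivot_table(df_merge, index=["province", "city"],
                        columns="ticket category", aggfunc="count")

# The columns are (counted field, ticket category) pairs.
print(result.columns.tolist())

# Index the outer level first, then the inner level.
high = result["buyer nickname"]["high order"].reset_index()
print(high)
```

Chengdu has no low-order user in this toy data, so its low-order cell comes out as a null value rather than 0, which is exactly the situation the article runs into further down.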

This gives the number of high-order customers for each province and city. We get the number of low-order customers the same way and merge the two side by side:

    tgi = pd.merge(result['buyer nickname']['high order'].reset_index(),
                   result['buyer nickname']['low order'].reset_index(),
                   left_on=['province', 'city'], right_on=['province', 'city'], how='inner')
    tgi.head()

Next, compute the total number of customers in each city and the share of high-order customers. This completes the numerator of the formula, the proportion of the target group that has the characteristic:

    tgi['total number'] = tgi['high order'] + tgi['low order']
    tgi['high order ratio'] = tgi['high order'] / tgi['total number']
    tgi.head()

In some very small cities, the number of high-order or low-order customers is 1 or missing altogether, and these values, especially the nulls, will distort the results. We need to check the data beforehand.
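The check itself is not shown. One common way to do it, sketched here on made-up data, is to count the null values per column with isnull().sum(); the city rows and the name null_counts are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy per-city counts; the last city has no high-order customer at all.
tgi = pd.DataFrame({
    "province": ["Zhejiang", "Sichuan", "Hainan"],
    "city": ["Hangzhou", "Chengdu", "Sansha"],
    "high order": [120.0, 80.0, np.nan],
    "low order": [300.0, 260.0, 1.0],
})
tgi["total number"] = tgi["high order"] + tgi["low order"]

# Count the null values in every column.
null_counts = tgi.isnull().sum()
print(null_counts)
```

Because adding a null to a number gives a null, any city missing either count also ends up with a null total.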

Sure enough, both the high-order and low-order columns contain null values (which can be read as 0), so the total number is null for those rows as well. The TGI index is meaningless for such rows, so we drop the rows that contain null values:

    tgi = tgi.dropna()

Then compute the share of high-order customers in the overall population, which corresponds to the denominator of the standard formula, the proportion of the overall population that has the same characteristic. The result is stored in total_percentage.
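The code for this step is not shown, so the following is only a plausible sketch on made-up numbers: the overall share is the sum of high-order customers across all cities divided by the sum of all customers.

```python
import pandas as pd

# Toy per-city counts after the null rows have been dropped.
tgi = pd.DataFrame({
    "city": ["Hangzhou", "Chengdu"],
    "high order": [120, 80],
    "total number": [420, 340],
})

# Overall share of high-order customers: 200 out of 760 customers.
total_percentage = tgi["high order"].sum() / tgi["total number"].sum()
print(round(total_percentage, 4))
```

Dividing the two sums is not the same as averaging the per-city ratios; the sums weight every customer equally, which is what "proportion of the overall population" calls for.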

The last step is to calculate the TGI index and, while we are at it, sort by it:

    tgi['high order TGI index'] = tgi['high order ratio'] / total_percentage * 100
    tgi = tgi.sort_values('high order TGI index', ascending=False)
    tgi.head(10)

This reveals a serious problem: the cities at the top of the high-order TGI ranking almost all have 10 customers or fewer in total, so their high share of high-order customers is not convincing at all. The TGI index shows the strength of a preference, but it easily hides the sample size behind it, which calls for special attention.

What should we do? To make the result more reliable, we first filter on the total number of customers, using the average total as the threshold and keeping only the cities whose total is greater than the average:

    tgi.loc[tgi['total number'] > tgi['total number'].mean(), :].head(10)
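Putting the filter and the ranking together, picking the final cities could look like the sketch below. The six cities and their numbers are invented, and the names threshold, reliable and top_cities are illustrative.

```python
import pandas as pd

# Toy result table: cities A and B have tiny samples but huge TGI values.
tgi = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "total number": [5, 8, 400, 900, 650, 37],
    "high order TGI index": [380.0, 300.0, 130.0, 95.0, 110.0, 150.0],
})

# Keep only cities whose customer count is above the average.
threshold = tgi["total number"].mean()
reliable = tgi.loc[tgi["total number"] > threshold, :]

# Rank the remaining cities by TGI and take up to five.
top_cities = reliable.sort_values("high order TGI index", ascending=False).head(5)
print(top_cities)
```

The average is a blunt threshold: a few very large cities pull it up, so in practice it may be worth trying other cutoffs (a fixed minimum sample size, for example) and checking how stable the ranking is.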
