Click-through rate model AUC

2025-04-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

1. Background

Consider an example with 100 samples:

Samples: positive (90), negative (10)

Model 1 predicts: the 90 positives as positive, the 10 negatives as positive

Model 2 predicts: of the 90 positives, 70 as positive and 20 as negative; of the 10 negatives, 5 as positive and 5 as negative

Conclusion:

Model 1 is 90% accurate.

Model 2 is 75% accurate.

Judged by the ability to predict both positive and negative samples, model 2 is clearly better than model 1, even though its accuracy is lower. When positive and negative samples are this unevenly distributed, accuracy cannot tell a good classifier from a bad one, so the AUC metric is needed to evaluate models on skewed samples.
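The accuracy figures above can be reproduced from the per-class counts. This is a minimal sketch, assuming the example means that model 1 labels all 100 samples positive, while model 2 labels 70 of the 90 positives and 5 of the 10 negatives as positive. The function name `accuracy` is illustrative only.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all samples whose predicted class matches the label."""
    return (tp + tn) / float(tp + fn + fp + tn)

# Model 1 predicts every sample as positive (assumed reading of the example).
acc_model_1 = accuracy(tp=90, fn=0, fp=10, tn=0)
# Model 2 gets 70 positives and 5 negatives right.
acc_model_2 = accuracy(tp=70, fn=20, fp=5, tn=5)

print("model 1 accuracy: %.2f" % acc_model_1)
print("model 2 accuracy: %.2f" % acc_model_2)
```

Model 1 reaches the higher accuracy without ever identifying a negative sample, which is why accuracy alone is misleading here.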

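Binary confusion matrix

Predicted \ Actual    1     0
1                     TP    FP
0                     FN    TN

TPR = TP / P = TP / (TP + FN). Intuition: of the samples that are actually 1, what fraction is predicted correctly?

FPR = FP / N = FP / (FP + TN). Intuition: of the samples that are actually 0, what fraction is wrongly predicted as 1?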

AUC is the area under the ROC curve, whose horizontal and vertical axes are FPR and TPR respectively. The closer the curve lies to the upper-left corner, the better the classifier, so model 2 is better.

          TPR             FPR
Model 1   90/90 = 1       10/10 = 1
Model 2   70/90 = 0.78    5/10 = 0.5
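TPR and FPR for the two models follow directly from the counts in the example. The sketch below also computes the area under the ROC curve when each model is treated as a single operating point joined to (0, 0) and (1, 1), which works out to (1 + TPR - FPR) / 2. The helper names `tpr_fpr` and `single_point_auc` are assumptions made for illustration, not part of the original article.

```python
def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / float(tp + fn)  # share of actual positives predicted positive
    fpr = fp / float(fp + tn)  # share of actual negatives predicted positive
    return tpr, fpr

def single_point_auc(tpr, fpr):
    """Area under the polyline (0,0) -> (fpr,tpr) -> (1,1)."""
    return (1.0 + tpr - fpr) / 2.0

model_1 = tpr_fpr(tp=90, fn=0, fp=10, tn=0)
model_2 = tpr_fpr(tp=70, fn=20, fp=5, tn=5)

print("model 1: TPR=%.2f FPR=%.2f" % model_1)
print("model 2: TPR=%.2f FPR=%.2f" % model_2)
print("model 1 single-point AUC: %.3f" % single_point_auc(*model_1))
print("model 2 single-point AUC: %.3f" % single_point_auc(*model_2))
```

Model 1 sits at (1, 1), which is no better than chance, while model 2 sits above the diagonal.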

The ROC points of models 1 and 2 are (FPR, TPR) = (1, 1) and (0.5, 0.78) respectively, as plotted in the original figure. Model 2 lies closer to the upper-left corner, so model 2 is clearly better.

2. Current research

Intuitively, AUC is the probability that, for a randomly drawn pair of one positive and one negative sample, the positive sample scores higher than the negative sample.

Calculation: form every (positive, negative) pair; AUC = (number of pairs in which the positive sample's score exceeds the negative sample's score) / (total number of pairs).

Example: compute the AUC of models 1 and 2.

The four samples have labels y1, y2, y3 and y4. The pairs listed in the solution imply that y1 and y2 are positive and y3 and y4 are negative.

Model 1 predicts y1 = 0.9, y2 = 0.5, y3 = 0.2, y4 = 0.6

Model 2 predicts y1 = 0.1, y2 = 0.9, y3 = 0.8, y4 = 0.2

Solution:

Model 1: the pairs in which the positive sample scores higher than the negative sample are (y1, y3), (y1, y4) and (y2, y3), so AUC = 3/4 = 0.75.

Model 2: the pairs in which the positive sample scores higher than the negative sample are (y2, y3) and (y2, y4), so AUC = 2/4 = 0.5.
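The pair-counting definition can be checked with a few lines of code. This is a minimal sketch, assuming y1 and y2 are the positive samples and y3 and y4 the negative ones, as the pairs listed in the solution imply. The function name `pairwise_auc` and the 1/0 label encoding are illustrative choices.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Only a strictly greater positive score counts as correct, matching the
    definition in the text; tied pairs count as incorrect in this sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / float(len(pos) * len(neg))

labels = [1, 1, 0, 0]            # y1, y2 positive; y3, y4 negative (assumed)
model_1 = [0.9, 0.5, 0.2, 0.6]
model_2 = [0.1, 0.9, 0.8, 0.2]

print(pairwise_auc(labels, model_1))  # 0.75
print(pairwise_auc(labels, model_2))  # 0.5
```

Counting pairs costs O(P × N) comparisons, which is why the sort-based method described next is preferred for large sample sets.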

Calculation reference paper: "An introduction to ROC analysis" (Tom Fawcett)

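Method:

1 Sort the samples by score in descending order.

2 Go through the samples in turn: if the label is positive, TP increases by 1; otherwise FP increases by 1. Each time the score changes, add the area of the trapezoid formed since the previous score.

3 After all samples have been accumulated, divide the total area by TP × FP to obtain the AUC.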

Code:

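import sys

def score_auc(labels, probs):  # function name lost in the source; assumed
    # Sample indices ordered by predicted score, highest first.
    i_sorted = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    auc_temp = 0.0
    TP = TP_pre = FP = FP_pre = 0.0
    last_prob = probs[i_sorted[0]] + 1.0
    for i in range(len(probs)):
        if last_prob != probs[i_sorted[i]]:
            # Score changed: add the trapezoid built since the last change.
            auc_temp += (TP + TP_pre) * (FP - FP_pre) / 2.0
            TP_pre, FP_pre = TP, FP
            last_prob = probs[i_sorted[i]]
        if labels[i_sorted[i]] == 1:
            TP += 1
        else:
            FP += 1
    auc_temp += (TP + TP_pre) * (FP - FP_pre) / 2.0
    return auc_temp / (TP * FP)

def read_file(f_name):
    # Column positions were lost in the source; "label score" per line is assumed.
    labels, probs = [], []
    with open(f_name) as f:
        for line in f:
            line = line.strip().split()
            labels.append(int(line[0]))
            probs.append(float(line[1]))
    return labels, probs

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("Usage: python %s file" % sys.argv[0])
    labels, probs = read_file(sys.argv[1])
    print("%f" % score_auc(labels, probs))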
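One property of the trapezoid sweep is worth seeing in isolation: samples with the same score are closed off as one trapezoid, which amounts to counting each tied (positive, negative) pair as half a correct pair. The sketch below is a compact, self-contained variant written for illustration. The names `trapezoid_auc` and `pairwise_auc_half_ties` are hypothetical, and the data is made up to contain ties.

```python
def trapezoid_auc(labels, scores):
    """AUC by sweeping samples in descending score order (trapezoid rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = tp_prev = fp_prev = 0.0
    area = 0.0
    last = None
    for i in order:
        if last is not None and scores[i] != last:
            # Close the trapezoid for the previous score value.
            area += (tp + tp_prev) * (fp - fp_prev) / 2.0
            tp_prev, fp_prev = tp, fp
        last = scores[i]
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
    area += (tp + tp_prev) * (fp - fp_prev) / 2.0
    return area / (tp * fp)

def pairwise_auc_half_ties(labels, scores):
    """Pair counting where a tied pair contributes 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0
                for p in pos for n in neg)
    return total / (len(pos) * len(neg))

# Illustrative data with two tied score groups (0.9 and 0.4).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.9, 0.7, 0.4, 0.4, 0.1]

print(trapezoid_auc(labels, scores))
print(pairwise_auc_half_ties(labels, scores))
```

Both calls give 6/9 on this data. When no scores are tied, the sweep agrees with strict pair counting as well.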

3. AUC calculation for click-through rate models

As shown in the original figure, take two buckets as an example. The AUC area computed within each bucket is the shaded part of the figure. The global AUC must additionally include the area of the P3 region, which equals sum(click) over the preceding i-1 buckets multiplied by the noclick count of bucket i.

The overall AUC is the area under the curve divided by the area of the rectangle spanned by the start and end points of the curve.
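The decomposition can be written out for a small case. This is a sketch under stated assumptions: each bucket holds impressions with one pctr value, so its own area is the triangle click × noclick / 2, and the supplementary rectangle is the clicks of all preceding buckets multiplied by the bucket's noclick count. The function name `bucketed_auc` and the sample counts are made up for illustration.

```python
def bucketed_auc(buckets):
    """buckets: list of (click, noclick) tuples, sorted by pctr descending."""
    area = 0.0
    clicks_before = 0.0   # sum(click) over the preceding buckets
    noclick_total = 0.0
    for click, noclick in buckets:
        own = click * noclick / 2.0            # area inside the bucket (all ties)
        supplement = clicks_before * noclick   # rectangle below it (the P3 part)
        area += own + supplement
        clicks_before += click
        noclick_total += noclick
    # Normalise by the rectangle: total clicks x total no-clicks.
    return area / (clicks_before * noclick_total)

# Two buckets: high pctr (3 clicks, 1 no-click), low pctr (1 click, 5 no-clicks).
buckets = [(3, 1), (1, 5)]
print(bucketed_auc(buckets))
```

For these two buckets the areas are 1.5 and 2.5, the supplement is 3 × 5 = 15, and the result is 19 / 24.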

Steps

1 Aggregate sum_show and sum_clk by pctr.

2 Sort the samples by pctr in descending order.

3 Go through the samples in turn and compute the area of each small trapezoid enclosed by noclk and clk.

Code:

import sys

# init auc dict
params_auc_dict = {"last_ctr": 1.1, "slot_show_sum": 0, "slot_click_sum": 0,
                   "auc_temp": 0.0, "click_sum": 0.0, "old_click_sum": 0.0,
                   "no_click": 0.0, "no_click_sum": 0.0}
# init Q distribution
q_bucket = 1000
params_Q_dict = {"count_list": [0] * (q_bucket + 1)}

# Input: tab-separated "pctr, show, click" lines, already sorted by pctr (step 2).
for line in sys.stdin:
    lineL = line.strip().split('\t')
    if len(lineL) < 3:
        continue
    pctr = float(lineL[0])
    # pctr = float(lineL[0]) / 1e6
    show = int(float(lineL[1]))
    click = int(float(lineL[2]))
    slot_info = '-'
    ### calculate auc
    params_auc_dict["slot_show_sum"] += show
    params_auc_dict["slot_click_sum"] += click
    if params_auc_dict["last_ctr"] != pctr:
        # pctr changed: close the trapezoid of the previous pctr group
        params_auc_dict["auc_temp"] += (params_auc_dict["click_sum"] +
            params_auc_dict["old_click_sum"]) * params_auc_dict["no_click"] / 2.0
        params_auc_dict["old_click_sum"] = params_auc_dict["click_sum"]
        params_auc_dict["no_click"] = 0.0
        params_auc_dict["last_ctr"] = pctr
    params_auc_dict["no_click"] += show - click
    params_auc_dict["no_click_sum"] += show - click
    params_auc_dict["click_sum"] += click
    ### calculate Q distribution
    index = int(pctr / (1.0 / q_bucket))  # interval [0, 0.001): left closed, right open
    count_list = params_Q_dict["count_list"]
    count_list[index] += show

# last instance for auc
params_auc_dict["auc_temp"] += (params_auc_dict["click_sum"] +
    params_auc_dict["old_click_sum"]) * params_auc_dict["no_click"] / 2.0
# The comparison value was lost in the source; 0 is assumed (avoids dividing by zero).
if params_auc_dict["auc_temp"] > 0:
    auc = params_auc_dict["auc_temp"] / (params_auc_dict["click_sum"] *
                                         params_auc_dict["no_click_sum"])
else:
    auc = 0
print("AUC:%s\tshow_sum:%s\tclk_sum:%s" % (auc, params_auc_dict["slot_show_sum"],
                                           params_auc_dict["slot_click_sum"]))

# print Q distribution result
count_list = params_Q_dict["count_list"]
print("Max bucket num:%s" % (sum(count_list)))
for i in range(q_bucket + 1):
    if i < (q_bucket - 1):
        print(str((i + 1) * (1.0 / q_bucket)) + '\t' + str(count_list[i]))
    else:
        # merge the last two buckets so that pctr == 1.0 is included
        print('1.0\t' + str(count_list[i] + count_list[i + 1]))
        break
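The aggregated calculation is easier to check when it is wrapped in a function and compared with a brute-force count over individual impressions. The sketch below is an assumption-laden illustration rather than the article's script: the names `aggregated_auc` and `expanded_auc` are hypothetical, the rows are invented, and the function sorts its input itself instead of expecting pre-sorted stdin.

```python
def aggregated_auc(rows):
    """rows: iterable of (pctr, show, click); returns AUC, or 0.0 if undefined."""
    area = 0.0
    click_sum = old_click_sum = 0.0
    no_click = no_click_sum = 0.0
    last_ctr = None
    for pctr, show, click in sorted(rows, key=lambda r: r[0], reverse=True):
        if last_ctr is not None and pctr != last_ctr:
            # pctr changed: close the trapezoid of the previous pctr group.
            area += (click_sum + old_click_sum) * no_click / 2.0
            old_click_sum = click_sum
            no_click = 0.0
        last_ctr = pctr
        no_click += show - click
        no_click_sum += show - click
        click_sum += click
    area += (click_sum + old_click_sum) * no_click / 2.0
    denom = click_sum * no_click_sum
    return area / denom if denom > 0 else 0.0

def expanded_auc(rows):
    """Reference: expand rows into single impressions; ties count as half."""
    pos = [p for p, show, click in rows for _ in range(click)]
    neg = [p for p, show, click in rows for _ in range(show - click)]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return total / (len(pos) * len(neg))

# Invented rows: (pctr, show, click); two rows share pctr 0.30.
rows = [(0.30, 4, 3), (0.05, 6, 1), (0.30, 2, 1), (0.10, 3, 1)]
print(aggregated_auc(rows))
print(expanded_auc(rows))
```

Both functions return 0.75 for these rows. The aggregated version needs one pass over the sorted rows, whereas the expanded version compares every click with every non-click.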
