How does PyTorch calculate KL divergence? This article walks through torch.nn.functional.kl_div and the details of using it correctly.
First, the description from the official documentation (pytorch.org/docs/stable/nn.functional.html):
torch.nn.functional.kl_div(input, target, size_average=None, reduce=None, reduction='mean')
Parameters
input - Tensor of arbitrary shape
target - Tensor of the same shape as input
size_average (bool, optional) - Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) - Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) - Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'sum' | 'mean'. 'none': no reduction will be applied. 'batchmean': the sum of the output will be divided by the batch size. 'sum': the output will be summed. 'mean': the output will be divided by the number of elements in the output. Default: 'mean'
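As a quick check on how the reduction options relate to each other, here is a minimal sketch; the tensor shapes and random inputs are arbitrary assumptions for illustration, not part of the original example.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 5)                      # hypothetical batch of 4 rows, 5 classes
y = torch.randn(4, 5)

logp_x = F.log_softmax(x, dim=-1)          # first argument: log-probabilities
p_y = F.softmax(y, dim=-1)                 # second argument: probabilities

elementwise = F.kl_div(logp_x, p_y, reduction='none')   # same shape as the inputs
print(torch.allclose(F.kl_div(logp_x, p_y, reduction='sum'), elementwise.sum()))
print(torch.allclose(F.kl_div(logp_x, p_y, reduction='mean'), elementwise.mean()))
print(torch.allclose(F.kl_div(logp_x, p_y, reduction='batchmean'),
                     elementwise.sum() / x.shape[0]))    # sum divided by the batch size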
Now let's see how to use it:
The first argument must be a matrix of log-probabilities and the second a matrix of probabilities. This is important: if the inputs are not prepared this way, the computed KL divergence may come out negative.
Suppose we now have two matrices X and Y. Because KL divergence is asymmetric, there is a distinction between the guiding distribution and the guided one, so the order in which the matrices are passed has to be decided.
For example:
If you want Y to guide X, pass X as the first argument and Y as the second. That is, the guided distribution goes first (as log-probabilities) and the guiding distribution second (as probabilities).
import torch
import torch.nn.functional as F

# Define two matrices
x = torch.randn((4, 5))
y = torch.randn((4, 5))

# Since y guides x, take the log-probabilities of x and the probabilities of y
logp_x = F.log_softmax(x, dim=-1)
p_y = F.softmax(y, dim=-1)

kl_sum = F.kl_div(logp_x, p_y, reduction='sum')
kl_mean = F.kl_div(logp_x, p_y, reduction='mean')

print(kl_sum, kl_mean)
>>> tensor(3.4165) tensor(0.1708)
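To confirm which direction is being computed, the result can be cross-checked against the textbook definition KL(p_y || p_x) = sum of p_y * (log p_y - log p_x). The sketch below uses the same shapes as above with freshly sampled inputs.

import torch
import torch.nn.functional as F

x = torch.randn((4, 5))
y = torch.randn((4, 5))
logp_x = F.log_softmax(x, dim=-1)
p_y = F.softmax(y, dim=-1)

# F.kl_div(logp_x, p_y, ...) computes KL(p_y || p_x): the second (probability)
# argument plays the role of the target, i.e. the guiding distribution.
manual = (p_y * (p_y.log() - logp_x)).sum()
print(torch.allclose(F.kl_div(logp_x, p_y, reduction='sum'), manual))   # True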
Supplement: why does the KL divergence computed in PyTorch come out negative?
F.kl_div() (or its module form nn.KLDivLoss()) is the PyTorch function for computing KL divergence, and its usage has several details worth noting.
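The module and the functional form compute the same quantity; here is a minimal sketch showing them agreeing (the random inputs are just placeholders for illustration).

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logp_x = F.log_softmax(torch.randn(4, 5), dim=-1)   # hypothetical inputs
p_y = F.softmax(torch.randn(4, 5), dim=-1)

loss_fn = nn.KLDivLoss(reduction='sum')              # module form
print(torch.allclose(loss_fn(logp_x, p_y),
                     F.kl_div(logp_x, p_y, reduction='sum')))   # True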
input
As before, the first argument must be a log-probability matrix and the second a probability matrix. Because KL divergence is asymmetric, the order of the arguments encodes which distribution guides which: if Y is to guide X, pass X (as log-probabilities) first and Y (as probabilities) second.
So, for example, if we start from randomly initialized tensors, the first input needs softmax (so that the values along the chosen dimension sum to 1) followed by log, while the second input only needs softmax, as in the snippet below.
import torch
import torch.nn.functional as F

a = torch.tensor([[0, 0, 1.1, 2, 0, 10, 0],
                  [0, 0, 1,   2, 0, 10, 0]])
log_a = F.log_softmax(a, dim=-1)

b = torch.tensor([[0, 0, 1.1, 2, 0, 7,  0],
                  [0, 0, 1,   2, 0, 10, 0]])
softmax_b = F.softmax(b, dim=-1)

kl_mean = F.kl_div(log_a, softmax_b, reduction='mean')
print(kl_mean)

Why does the KL divergence come out negative?
First, make sure that softmax + log has been applied to the first argument and softmax to the second. Without the softmax normalization, the result can be negative.
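As a quick illustration (the numbers here are made up, not from the article): if the second argument is not a proper probability distribution, the result can indeed go negative.

import torch
import torch.nn.functional as F

q = torch.tensor([[0.3, 0.7]])     # a valid distribution, passed in log-space
p = torch.tensor([[0.2, 0.2]])     # NOT normalized: sums to 0.4, not 1

print(F.kl_div(q.log(), p, reduction='sum'))   # prints a negative value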
Then check whether your inputs differ only many digits after the decimal point. With such tiny values, floating-point precision means the softmax output may not sum to exactly 1 along each dimension, and the resulting KL divergence can come out slightly negative, as shown below:
a = torch.tensor([[0., 0, 0.000001, 0.0000002, 0, 0.0000007, 0]])
log_a = F.log_softmax(a, dim=-1)
print("log_a:", log_a)

b = torch.tensor([[0., 0, 0.00001, 0.000002, 0, 0.000007, 0]])
softmax_b = F.softmax(b, dim=-1)
print("softmax_b:", softmax_b)

kl_mean = F.kl_div(log_a, softmax_b, reduction='mean')
print("kl_mean:", kl_mean)

That covers how to calculate KL divergence in PyTorch. Thank you for reading!