This article explains how to deal with sample imbalance when using BCEWithLogitsLoss. The approach is simple and practical, so let's take a look.
Try increasing the loss weight of the positive samples; see the BCEWithLogitsLoss docstring:

Examples::

    >>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
    >>> output = torch.full([10, 64], 0.999)  # A prediction (logit)
    >>> pos_weight = torch.ones([64])  # All weights are equal to 1
    >>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    >>> criterion(output, target)  # -log(sigmoid(0.999))
    tensor(0.3135)

Args:
    weight (Tensor, optional): a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size `nbatch`.
    size_average (bool, optional): Deprecated (see :attr:`reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field :attr:`size_average` is set to ``False``, the losses are instead summed for each minibatch. Ignored when reduce is ``False``. Default: ``True``
    reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per batch element instead and ignores :attr:`size_average`. Default: ``True``
    reduction (string, optional): Specifies the reduction to apply to the output: ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will be applied, ``'mean'``: the sum of the output will be divided by the number of elements in the output, ``'sum'``: the output will be summed. Note: :attr:`size_average` and :attr:`reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override :attr:`reduction`. Default: ``'mean'``
    pos_weight (Tensor, optional): a weight of positive examples. Must be a vector with length equal to the number of classes.
I was puzzled by the use of pos_weight here: `pos_weight = torch.ones([64])  # All weights are equal to 1`. Why are there 64 classes, when BCE loss is meant for binary classification? After some searching, I learned that this is for multi-label classification.
In multi-label classification there are multiple labels, and each label takes one of two values (0 or 1), so this kind of task can also use BCE loss.
Now let's talk about how to use pos_weight in BCEWithLogitsLoss
For example, suppose we have 100 positive samples and 400 negative samples. We want to weight the losses of positive and negative samples, amplifying the loss of the positive samples by a factor of 4 (400 / 100). In this way we can alleviate the sample imbalance problem.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([4]))
# pos_weight (Tensor, optional): a weight of positive examples.
# Must be a vector with length equal to the number of classes.
pos_weight must be a tensor whose length equals the number of labels (classes). Since this is a binary classification with a single label, we only need to give the loss weight of the positive sample for that one label.
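As a concrete illustration of the binary case above, here is a minimal sketch (the 100/400 counts and the tensor shapes are assumptions taken from this example, not a fixed recipe) that derives pos_weight from the class counts and applies it:

import torch
import torch.nn as nn

num_pos, num_neg = 100, 400                      # assumed class counts from the example above
pos_weight = torch.tensor([num_neg / num_pos])   # 400 / 100 = 4.0, one entry per label
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                       # raw model outputs (before sigmoid)
targets = torch.randint(0, 2, (8, 1)).float()    # 0/1 labels as float
loss = criterion(logits, targets)                # positive-sample terms are scaled by 4
print(loss.item())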
If it is a multi-label classification with 64 labels, then:
Examples::

    >>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
    >>> output = torch.full([10, 64], 0.999)  # A prediction (logit)
    >>> pos_weight = torch.ones([64])  # All weights are equal to 1
    >>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    >>> criterion(output, target)  # -log(sigmoid(0.999))
    tensor(0.3135)
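In the multi-label case each of the 64 labels can get its own weight. A minimal sketch, assuming you have per-label positive and negative counts (the count tensors below are made-up placeholders):

import torch
import torch.nn as nn

num_labels = 64
pos_counts = torch.randint(10, 100, (num_labels,)).float()    # assumed per-label positive counts
neg_counts = torch.randint(100, 1000, (num_labels,)).float()  # assumed per-label negative counts

pos_weight = neg_counts / pos_counts         # one weight per label: #neg / #pos
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(10, num_labels)         # batch of raw scores
targets = torch.randint(0, 2, (10, num_labels)).float()
loss = criterion(logits, targets)
print(loss.item())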
PyTorch -- BCEWithLogitsLoss()
I. Equivalent expressions
1. In PyTorch: torch.sigmoid() + torch.nn.BCELoss()
2. Written by hand:

def ce_loss(y_pred, y_train, alpha=1):
    p = torch.sigmoid(y_pred)
    # p = torch.clamp(p, min=1e-9, max=0.99)
    loss = torch.sum(- alpha * torch.log(p) * y_train
                     - torch.log(1 - p) * (1 - y_train)) / len(y_train)
    return loss

3. Verification:

import torch
import torch.nn as nn

torch.cuda.manual_seed(300)   # set the random seed for the current GPU
torch.manual_seed(300)        # set the random seed for the CPU

def ce_loss(y_pred, y_train, alpha=1):
    # compute the loss by hand
    p = torch.sigmoid(y_pred)
    # p = torch.clamp(p, min=1e-9, max=0.99)
    loss = torch.sum(- alpha * torch.log(p) * y_train
                     - torch.log(1 - p) * (1 - y_train)) / len(y_train)
    return loss

py_lossFun = nn.BCEWithLogitsLoss()
input = torch.randn((10000, 1), requires_grad=True)
target = torch.ones((10000, 1))
target.requires_grad_(True)

py_loss = py_lossFun(input, target)
py_loss.backward()
print("*********BCEWithLogitsLoss***********")
print("loss: ")
print(py_loss.item())
print("Gradient: ")
print(input.grad)

input = input.detach()
input.requires_grad_(True)
self_loss = ce_loss(input, target)
self_loss.backward()
print("*******SelfCELoss******")
print("loss: ")
print(self_loss.item())
print("Gradient: ")
print(input.grad)
- As you can see from the above results, the loss written by hand is essentially the same as the one provided by PyTorch.
- But is that enough? No! Here is where BCEWithLogitsLoss() really shines:
- BCEWithLogitsLoss() handles nan well. In my code (a four-layer neural network with ReLU activations between layers and a sigmoid on the output layer), a data-processing issue caused the hand-written CE loss to produce nan, for the following reasons:
- First, if the network's raw output (the pre-sigmoid logit) is very large, p saturates to 1 after the sigmoid, so torch.log(1 - p) becomes -inf and the loss turns into nan;
- Although clamp() can suppress this nan, during training the network outputs may keep growing (the hidden layers use ReLU), so the hand-written loss gets stuck at a fixed value and can no longer be optimized. BCEWithLogitsLoss() handles this case gracefully and gives better results (see the sketch after this list).
- The purpose of my experiment was to compare CE with focal loss (FL); since FL has to be written by hand, CE also has to be written by hand for a fair comparison, so BCEWithLogitsLoss() could not be used.
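To illustrate the stability difference described above, here is a minimal sketch (the large logit value is an assumption chosen for demonstration) comparing the naive sigmoid + log formulation with BCEWithLogitsLoss on a saturating input:

import torch
import torch.nn as nn

logit = torch.tensor([[100.0]])   # assumed very large raw output; sigmoid saturates to 1.0
target = torch.tensor([[0.0]])    # negative label, so the -log(1 - p) term matters

# Naive formulation: sigmoid then log, the log(1 - p) term blows up
p = torch.sigmoid(logit)
naive = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))
print(naive)                      # tensor([[inf]]) -- not usable for optimization

# BCEWithLogitsLoss fuses sigmoid and BCE using the log-sum-exp trick and stays finite
stable = nn.BCEWithLogitsLoss()(logit, target)
print(stable)                     # tensor(100.) -- finite, still optimizable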
II. Use scenario
binary + sigmoid()
Classification problems whose output layer uses sigmoid as the nonlinearity (although it can also handle multi-label classification, it is generally used for binary classification, with only one node in the last layer).
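A minimal sketch of this scenario (the layer sizes and data below are made-up placeholders): a network whose last layer has a single node, trained with BCEWithLogitsLoss so no explicit sigmoid is needed inside the model:

import torch
import torch.nn as nn

# Hypothetical tiny binary classifier: the last layer has a single output node
model = nn.Sequential(
    nn.Linear(20, 16),
    nn.ReLU(),
    nn.Linear(16, 1),      # one node; no sigmoid here, BCEWithLogitsLoss applies it internally
)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 20)                    # made-up features
y = torch.randint(0, 2, (32, 1)).float()   # 0/1 labels as float

logits = model(x)
loss = criterion(logits, y)
loss.backward()
optimizer.step()
print(loss.item())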
III. Precautions
Input format
Both the input and the target must be float tensors.
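For example (a minimal sketch; integer labels straight from a dataset are a common source of this error), convert the labels to float before computing the loss:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1)
labels = torch.randint(0, 2, (4, 1))       # int64 labels straight from a dataset

# criterion(logits, labels)                # would raise an error: target must be float
loss = criterion(logits, labels.float())   # convert the target to float first
print(loss.item())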
At this point, I believe you have a deeper understanding of how to deal with sample imbalance in BCEWithLogitsLoss, so go ahead and try it out in practice!