

8 Common Bugs in Computer Vision Deep Learning


2019-12-10 20:40:25

Author: Arseny Kravchenko

Translation: ronghuaiyang

Introduction

Here is a summary of 8 common bugs in computer vision deep learning. Most of us have probably run into at least some of them, and I hope this article helps you avoid a few.

People are not perfect. We often make mistakes in software. Sometimes these errors are easy to find: your code doesn't work at all, your application crashes, and so on. But some bugs are hidden, which makes them even more dangerous.

Deep learning problems are especially prone to this kind of bug because of their inherent uncertainty: it's easy to see whether a web application routes a request correctly, but it's not easy to check whether your gradient descent step is correct. However, many of these mistakes can be avoided.

I'd like to share some of my experience with the mistakes I have seen or made in my computer vision work over the past two years. I gave a talk on this topic (https://datafest.ru/ia/), and many people told me afterwards: "Yes, I have a lot of bugs like that, too." I hope my article can help you avoid at least some of these problems.

1. Flipping images and key points

Suppose we're working on a keypoint detection problem. The data looks like a pair: an image and a sequence of keypoint tuples, where each keypoint is a pair of x and y coordinates.

Let's write a basic augmentation for this data:

from typing import Sequence

import numpy as np

def flip_img_and_keypoints(img: np.ndarray, kpts: Sequence[Sequence[int]]):
    img = np.fliplr(img)
    h, w, *_ = img.shape
    kpts = [(y, w - x) for y, x in kpts]
    return img, kpts

Looks right, doesn't it? Let's visualize it.

import matplotlib.pyplot as plt

image = np.ones((10, 10), dtype=np.float32)
kpts = [(0, 1), (2, 2)]

image_flipped, kpts_flipped = flip_img_and_keypoints(image, kpts)
img1 = image.copy()
for y, x in kpts:
    img1[y, x] = 0
img2 = image_flipped.copy()
for y, x in kpts_flipped:
    img2[y, x] = 0

_ = plt.imshow(np.hstack((img1, img2)))

The asymmetry looks strange! What if we check the extreme values?

image = np.ones((10, 10), dtype=np.float32)
kpts = [(0, 0)]
image_flipped, kpts_flipped = flip_img_and_keypoints(image, kpts)
# IndexError: the flipped keypoint lands at x = w - 0 = 10, out of bounds for width 10

Not good! This is a typical off-by-one error. The correct code looks like this:

def flip_img_and_keypoints(img: np.ndarray, kpts: Sequence[Sequence[int]]):
    img = np.fliplr(img)
    h, w, *_ = img.shape
    kpts = [(y, w - x - 1) for y, x in kpts]
    return img, kpts

We found this problem visually, but a unit test using an "x = 0" point would also have helped. A fun fact: three people on one team (including myself) made almost exactly the same mistake independently.
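As a sketch, such a unit test could look like this (assuming the fixed flip_img_and_keypoints above; the test name is illustrative):

import numpy as np

def test_flip_keypoints_stay_in_bounds():
    img = np.ones((10, 10), dtype=np.float32)
    kpts = [(0, 0), (9, 9)]  # deliberately include the extreme corners
    flipped_img, flipped_kpts = flip_img_and_keypoints(img, kpts)
    h, w, *_ = flipped_img.shape
    for y, x in flipped_kpts:
        assert 0 <= y < h and 0 <= x < w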

2. Continuing with key points

Even after the above function has been fixed, there is still a danger. Now it's more about semantics than just a piece of code.

Suppose we augment an image that contains two palms. It looks safe: the hands get flipped left to right.

But wait! We don't know much about the semantics of the key points. What if the points mean something like this:

kpts = [
    (20, 20),   # left pinky
    (20, 200),  # right pinky
    ...
]

This means the augmentation actually changes the semantics: left becomes right and right becomes left, but we don't swap the keypoint indices in the array. This brings a lot of noise into training and hurts the metric.
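A sketch of a semantics-aware flip, under the assumption that keypoints come in a fixed order with known left/right counterparts (the pair table below is hypothetical and depends on your annotation format):

# indices of (left, right) counterpart keypoints to swap after a horizontal flip
FLIP_PAIRS = [(0, 1)]  # e.g. 0 = left pinky, 1 = right pinky

def flip_img_and_keypoints_semantic(img, kpts):
    img, kpts = flip_img_and_keypoints(img, kpts)  # geometric flip from section 1
    kpts = list(kpts)
    for left, right in FLIP_PAIRS:
        kpts[left], kpts[right] = kpts[right], kpts[left]
    return img, kpts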

We should learn a lesson:

- Understand and think about the data structure and its semantics before applying augmentations or other fancy features.
- Keep your experiments atomic: add a small change (such as a new transformation), check how it goes, and merge it only if the score improves.

3. Write your own loss function

People familiar with semantic segmentation probably know the IoU (intersection over union) metric. Unfortunately, we can't optimize it directly with SGD, so the common approach is to approximate it with a differentiable loss function.

def iou_continuous_loss(y_pred, y_true):
    eps = 1e-6
    def _sum(x):
        return x.sum(-1).sum(-1)
    numerator = (_sum(y_true * y_pred) + eps)
    denominator = (_sum(y_true ** 2) + _sum(y_pred ** 2)
                   - _sum(y_true * y_pred) + eps)
    return (numerator / denominator).mean()

It looks good. Let's do a little test first:

In [3]: ones = np.ones((1, 3, 10, 10))
   ...: x1 = iou_continuous_loss(ones * 0.01, ones)
   ...: x2 = iou_continuous_loss(ones * 0.99, ones)

In [4]: x1, x2
Out[4]: (0.010099999897990103, 0.9998990001020204)

x1 is the loss for a prediction completely different from the ground truth, while x2 is the loss for a prediction very close to it. We expect x1 to be large, because the prediction is bad, and x2 to be close to zero. What happened?

The function above is a good approximation of the metric. But a metric is not a loss: a metric is usually (including in this case) the higher the better, while a loss to be minimized with SGD should behave the opposite way:

def iou_continuous(y_pred, y_true):
    eps = 1e-6
    def _sum(x):
        return x.sum(-1).sum(-1)
    numerator = (_sum(y_true * y_pred) + eps)
    denominator = (_sum(y_true ** 2) + _sum(y_pred ** 2)
                   - _sum(y_true * y_pred) + eps)
    return (numerator / denominator).mean()

def iou_continuous_loss(y_pred, y_true):
    return 1 - iou_continuous(y_pred, y_true)

These problems can be caught in two ways:

- Write a unit test that checks the direction of the loss: formalize the expectation that a prediction closer to the ground truth should produce a lower loss (see the sketch below).
- Run a sanity check: try to get your model to overfit a single batch.
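A minimal sketch of such a direction test, reusing the numbers from the check above (the test name is illustrative):

import numpy as np

def test_iou_loss_direction():
    ones = np.ones((1, 3, 10, 10))
    bad = iou_continuous_loss(ones * 0.01, ones)   # far from the ground truth
    good = iou_continuous_loss(ones * 0.99, ones)  # close to the ground truth
    assert good < bad
    assert good < 0.1  # a near-perfect prediction should yield a near-zero loss

4. When we use PyTorch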

Suppose you have a pretrained model and start running inference.

from ceevee.base import AbstractPredictor

class MySuperPredictor(AbstractPredictor):
    def __init__(self, weights_path: str):
        super().__init__()
        self.model = self._load_model(weights_path=weights_path)

    def process(self, x, *kw):
        with torch.no_grad():
            res = self.model(x)
        return res

    @staticmethod
    def _load_model(weights_path):
        model = ModelClass()
        weights = torch.load(weights_path, map_location='cpu')
        model.load_state_dict(weights)
        return model

Is this code correct? Maybe! It does work for some models: for example, when the model has no dropout or normalization layers such as torch.nn.BatchNorm2d, or when the model needs to use the actual per-image norm statistics (e.g. many pix2pix-based architectures require this).

But for most computer vision applications, the code ignores something important: switching to evaluation mode.

This problem is easy to spot if you try to convert your dynamic PyTorch graph into a static one; torch.jit is used for this conversion.

In [3]: model = nn.Sequential(
   ...:     nn.Linear(10, 10),
   ...:     nn.Dropout(.5)
   ...: )
   ...:
   ...: traced_model = torch.jit.trace(model, torch.rand(10))
/Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/jit/__init__.py:914: TracerWarning: Trace had nondeterministic nodes. Did you forget call .eval() on your model? Nodes:
  ... : Float(10) = aten::dropout(%input, ...), scope: Sequential/Dropout[1] # /Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/functional.py:806:0
This may cause errors in trace checking. To disable trace checking, pass check_trace=False to torch.jit.trace()
  check_tolerance, _force_outplace, True, _module_class)
/Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/jit/__init__.py:914: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[5] (0.0 vs. 0.5454154014587402) and 5 other locations (60.00%)
  check_tolerance, _force_outplace, True, _module_class)

A simple fix:

In [4]: model = nn.Sequential(
   ...:     nn.Linear(10, 10),
   ...:     nn.Dropout(.5)
   ...: )
   ...:
   ...: traced_model = torch.jit.trace(model.eval(), torch.rand(10))
# No more warnings!

Here, torch.jit.trace runs the model several times and compares the results; differences between runs are suspicious.

However, torch.jit.trace is not a panacea: it's a nuance you should know and remember.
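Back in the predictor class, the fix itself is a single line. A sketch of the corrected _load_model, under the same assumptions as the snippet above:

    @staticmethod
    def _load_model(weights_path):
        model = ModelClass()
        weights = torch.load(weights_path, map_location='cpu')
        model.load_state_dict(weights)
        model.eval()  # switch dropout/batchnorm layers to inference behavior
        return model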

5. Copy-paste problems

Many things exist in pairs: training and validation, width and height, latitude and longitude.

def make_dataloaders(train_cfg, val_cfg, batch_size):
    train = Dataset.from_config(train_cfg)
    val = Dataset.from_config(val_cfg)
    shared_params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': cpu_count()}
    train = DataLoader(train, **shared_params)
    val = DataLoader(train, **shared_params)  # bug: `train` is passed instead of `val`
    return train, val
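For completeness, a corrected sketch:

def make_dataloaders(train_cfg, val_cfg, batch_size):
    train = Dataset.from_config(train_cfg)
    val = Dataset.from_config(val_cfg)
    shared_params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': cpu_count()}
    return DataLoader(train, **shared_params), DataLoader(val, **shared_params)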

I'm not the only one who has made this kind of silly mistake. For example, there was a similar bug in a version of the very popular albumentations library.

# https://github.com/albu/albumentations/blob/0.3.0/albumentations/augmentations/transforms.py
def apply_to_keypoint(self, keypoint, crop_height=0, crop_width=0, h_start=0, w_start=0, rows=0, cols=0, **params):
    keypoint = F.keypoint_random_crop(keypoint, crop_height, crop_width, h_start, w_start, rows, cols)
    scale_x = self.width / crop_height  # bug: should be crop_width
    scale_y = self.height / crop_height
    keypoint = F.keypoint_scale(keypoint, scale_x, scale_y)
    return keypoint

Don't worry, it has since been fixed.

How can you avoid this? Don't copy and paste code; try to write code in a way that doesn't require copying and pasting.

datasets = []
data_a = get_dataset(MyDataset(config['dataset_a']), config['shared_param'], param_a)
datasets.append(data_a)
data_b = get_dataset(MyDataset(config['dataset_b']), config['shared_param'], param_b)
datasets.append(data_b)

The same without copy-paste:

datasets = []
for name, param in zip(('dataset_a', 'dataset_b'),
                       (param_a, param_b)):
    datasets.append(get_dataset(MyDataset(config[name]), config['shared_param'], param))

6. Appropriate data types

Let's write a new augmentation:

def add_noise(img: np.ndarray) -> np.ndarray:
    mask = np.random.rand(*img.shape) + .5
    img = img.astype('float32') * mask
    return img.astype('uint8')

The image has changed. Is this what we expected? Hmm, it has changed far too much.

There's a dangerous operation here: converting float32 to uint8. It can cause an overflow:

def add_noise(img: np.ndarray) -> np.ndarray:
    mask = np.random.rand(*img.shape) + .5
    img = img.astype('float32') * mask
    return np.clip(img, 0, 255).astype('uint8')

img = add_noise(cv2.imread('two_hands.jpg')[:, :, ::-1])
_ = plt.imshow(img)

It looks much better, doesn't it?

By the way, there's another way to avoid this problem: don't reinvent the wheel. Instead of writing augmentation code from scratch, use an existing transform, e.g. albumentations.augmentations.transforms.GaussNoise.

I once made another bug of the same origin.

raw_mask = cv2.imread('mask_small.png')
mask = raw_mask.astype('float32') / 255
mask = cv2.resize(mask, (64, 64), interpolation=cv2.INTER_LINEAR)
mask = cv2.resize(mask, (128, 128), interpolation=cv2.INTER_CUBIC)
mask = (mask * 255).astype('uint8')

_ = plt.imshow(np.hstack((raw_mask, mask)))

What went wrong here? First, resizing a mask with cubic interpolation is a bad idea. Second, there's the same float32-to-uint8 problem: cubic interpolation can output values larger than its input, which leads to overflow.

I found this problem while doing visualization. It's also a good idea to scatter assertions throughout your training loop.
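For example, a small guard like this (a sketch; the helper name is illustrative):

import numpy as np

def to_uint8_safe(img: np.ndarray) -> np.ndarray:
    # fail loudly instead of silently wrapping around on overflow
    assert img.min() >= 0 and img.max() <= 255, (img.min(), img.max())
    return img.astype('uint8')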

7. Typos

Suppose you need to run inference with a fully convolutional network (e.g. for a semantic segmentation problem) on a huge image, one so big that there's no chance of fitting it into your GPU memory; it could be a medical or satellite image.

In this case, you can slice the image into a grid, run inference on each patch independently, and finally merge the predictions. Overlapping the patches also helps smooth out artifacts near the patch borders.

from typing import Optional

from tqdm import tqdm

class GridPredictor:
    """
    This class can be used to predict a segmentation mask for the big image
    when you have GPU memory limitation
    """
    def __init__(self, predictor: AbstractPredictor, size: int, stride: Optional[int] = None):
        self.predictor = predictor
        self.size = size
        self.stride = stride if stride is not None else size // 2

    def __call__(self, x: np.ndarray):
        h, w, _ = x.shape
        mask = np.zeros((h, w, 1), dtype='float32')
        weights = mask.copy()
        for i in tqdm(range(0, h - 1, self.stride)):
            for j in range(0, w - 1, self.stride):
                a, b, c, d = i, min(h, i + self.size), j, min(w, j + self.size)
                patch = x[a:b, c:d, :]
                mask[a:b, c:d, :] += np.expand_dims(self.predictor(patch), -1)
                weights[a:b, c:d, :] = 1
        return mask / weights

There is a typo here, and the snippet is big enough that it's not easy to find. I doubt anyone can quickly spot it just by reading the code. But there's an easy way to check whether the code is correct:

class Model(nn.Module):
    def forward(self, x):
        return x.mean(axis=-1)

model = Model()
grid_predictor = GridPredictor(model, size=128, stride=64)

simple_pred = np.expand_dims(model(img), -1)
grid_pred = grid_predictor(img)

np.testing.assert_allclose(simple_pred, grid_pred, atol=.001)

AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      9 grid_pred = grid_predictor(img)
     10
---> 11 np.testing.assert_allclose(simple_pred, grid_pred, atol=.001)

~/.pyenv/versions/3.6.6/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_allclose(actual, desired, rtol, atol, equal_nan, err_msg, verbose)
   1513     header = 'Not equal to tolerance rtol=%g, atol=%g' % (rtol, atol)
   1514     assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
-> 1515                          verbose=verbose, header=header, equal_nan=equal_nan)
   1516
   1517

~/.pyenv/versions/3.6.6/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    839                                 verbose=verbose, header=header,
    840                                 names=('x', 'y'), precision=precision)
--> 841             raise AssertionError(msg)
    842         except ValueError:
    843             import traceback

AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.001

Mismatch: 99.6%
Max absolute difference: 765.
Max relative difference: 0.75000001
 x: array([[[215.333333],
        [192.666667],
        [250.      ], ...
 y: array([[[215.33333 ],
        [192.66667 ],
        [250.      ], ...

Here is the correct version of the __call__ method:

def __call__(self, x: np.ndarray):
    h, w, _ = x.shape
    mask = np.zeros((h, w, 1), dtype='float32')
    weights = mask.copy()
    for i in tqdm(range(0, h - 1, self.stride)):
        for j in range(0, w - 1, self.stride):
            a, b, c, d = i, min(h, i + self.size), j, min(w, j + self.size)
            patch = x[a:b, c:d, :]
            mask[a:b, c:d, :] += np.expand_dims(self.predictor(patch), -1)
            weights[a:b, c:d, :] += 1  # was `= 1`
    return mask / weights

If you still don't see it, look at the weights line: with `= 1` instead of `+= 1`, pixels covered by several overlapping patches accumulate several predictions in mask but keep a weight of 1, so the division no longer computes an average.

8. ImageNet normalization

When you need to do transfer learning, it's usually a good idea to normalize your images the same way they were normalized when training on ImageNet.

Let's use the albumentations library that we are already familiar with.

from albumentations import Normalize

norm = Normalize()

img = cv2.imread('img_small.jpg')
mask = cv2.imread('mask_small.png', cv2.IMREAD_GRAYSCALE)
mask = np.expand_dims(mask, -1)  # shape (64, 64) -> shape (64, 64, 1)

normed = norm(image=img, mask=mask)
img, mask = [normed[x] for x in ['image', 'mask']]

def img_to_batch(x):
    x = np.transpose(x, (2, 0, 1)).astype('float32')
    return torch.from_numpy(np.expand_dims(x, 0))

img, mask = map(img_to_batch, (img, mask))
criterion = F.binary_cross_entropy

It's time to train a network and overfit it on this single image; as I mentioned, this is a good debugging technique:

model_a = UNet(3, 1)
optimizer = torch.optim.Adam(model_a.parameters(), lr=1e-3)

losses = []
for t in tqdm(range(20)):
    loss = criterion(model_a(img), mask)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

_ = plt.plot(losses)

The curve looks good, but cross-entropy isn't supposed to reach a value like -300. What's the problem?

Normalization works fine for the image, but not for the mask: it needs to be scaled to [0, 1] manually.

model_b = UNet(3, 1)
optimizer = torch.optim.Adam(model_b.parameters(), lr=1e-3)

losses = []
for t in tqdm(range(20)):
    loss = criterion(model_b(img), mask / 255.)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

_ = plt.plot(losses)

A simple runtime assertion in the training loop (e.g. assert mask.max() <= 1) would have caught the problem quickly, and so would a unit test.
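A sketch of such a guard, assuming binary masks (the helper name is illustrative):

import torch

def check_batch(mask: torch.Tensor) -> None:
    # binary_cross_entropy expects targets in [0, 1]
    assert mask.min().item() >= 0 and mask.max().item() <= 1, (mask.min(), mask.max())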
