2019-12-10 20:40:25
Author: Arseny Kravchenko
Compiled by: ronghuaiyang
Introduction

A summary of 8 common bugs in deep learning for computer vision. We have all run into at least some of them, and I hope this article helps you avoid a few.
People are not perfect. We often make mistakes in software. Sometimes these errors are easy to find: your code doesn't work at all, your application crashes, and so on. But some bugs are hidden, which makes them even more dangerous.
When solving deep learning problems, this type of bug is especially easy to introduce because of the uncertainty involved: it is easy to see whether a web application routes a request correctly, but not so easy to check whether your gradient descent step is correct. Many of these mistakes, however, are avoidable.
I would like to share some of my experience with the mistakes I have seen or made in computer vision work over the past two years. I talked about this topic at a conference (https://datafest.ru/ia/), and many people told me afterwards: "Yes, I have a lot of bugs like that, too." I hope my article helps you avoid at least some of these problems.
1. Flipping images and key points
Suppose we are working on a key point detection problem. The data looks like a pair: an image and a sequence of key point tuples, where each key point is a pair of x and y coordinates.

Let's write a basic augmentation for this data:
from typing import Sequence

import numpy as np

def flip_img_and_keypoints(img: np.ndarray, kpts: Sequence[Sequence[int]]):
    img = np.fliplr(img)
    h, w, *_ = img.shape
    kpts = [(y, w - x) for y, x in kpts]
    return img, kpts
Looks right, doesn't it? Let's visualize it.
import matplotlib.pyplot as plt

image = np.ones((10, 10), dtype=np.float32)
kpts = [(0, 1), (2, 2)]
image_flipped, kpts_flipped = flip_img_and_keypoints(image, kpts)
img1 = image.copy()
for y, x in kpts:
    img1[y, x] = 0
img2 = image_flipped.copy()
for y, x in kpts_flipped:
    img2[y, x] = 0

_ = plt.imshow(np.hstack((img1, img2)))
Asymmetric, which looks strange! What if we check the extreme values?

image = np.ones((10, 10), dtype=np.float32)
kpts = [(0, 0)]
image_flipped, kpts_flipped = flip_img_and_keypoints(image, kpts)
# the point at x = 0 is mapped to w - x = 10, which is out of bounds for a width-10 image

Not good! This is a typical off-by-one error. The correct code looks like this:
def flip_img_and_keypoints(img: np.ndarray, kpts: Sequence[Sequence[int]]):
    img = np.fliplr(img)
    h, w, *_ = img.shape
    kpts = [(y, w - x - 1) for y, x in kpts]
    return img, kpts
We discovered this problem visually, but a unit test using the x = 0 point would also have helped. An amusing fact: three people on our team (including me) made almost exactly the same mistake independently.
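A minimal sketch of such a test, assuming the corrected flip_img_and_keypoints above (the test name and data are made up):

def test_flip_keeps_keypoints_in_bounds():
    img = np.ones((10, 10), dtype=np.float32)
    kpts = [(0, 0), (5, 9)]  # covers the extreme x = 0 and x = w - 1 cases
    img_flipped, kpts_flipped = flip_img_and_keypoints(img, kpts)
    for y, x in kpts_flipped:
        assert 0 <= x < img_flipped.shape[1]
    # Flipping twice must be the identity.
    _, kpts_double = flip_img_and_keypoints(img_flipped, kpts_flipped)
    assert kpts_double == kpts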
2. Still on key points
Even after the function above has been fixed, danger remains. Now it is about semantics rather than just a piece of code.

Suppose you want to augment an image showing two palms. It looks safe: the left and right hands are mirrored.

But wait! We don't actually know anything about the semantics of the key points. What if the points mean something like this:
kpts = [
    (20, 20),   # left pinky
    (20, 200),  # right pinky
    ...
]
The augmentation actually changes the semantics: left becomes right and right becomes left, but we never swap the key point indices in the array. This brings a lot of noise into training and worse metrics.
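For illustration, a semantics-aware flip could swap the paired indices explicitly. A minimal sketch under the keypoint layout shown above; SWAP_PAIRS and the function name are hypothetical:

# Hypothetical: indices of key points whose meaning swaps under a horizontal flip,
# e.g. (left pinky, right pinky) in the layout above.
SWAP_PAIRS = [(0, 1)]

def flip_img_and_keypoints_semantic(img: np.ndarray, kpts):
    img = np.fliplr(img)
    h, w, *_ = img.shape
    kpts = [(y, w - x - 1) for y, x in kpts]
    # Reorder paired points so "left" still means left in the flipped image.
    for i, j in SWAP_PAIRS:
        kpts[i], kpts[j] = kpts[j], kpts[i]
    return img, kpts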
We should learn a lesson: understand and think about the data structure and its semantics before applying augmentations or other fancy features, and keep your experiments atomic: add a small change (such as a new transformation), check how it behaves, and merge it only if the score improves.

3. Write your own loss function
People familiar with semantic segmentation probably know the IoU metric. Unfortunately, we can't optimize it directly with SGD, so the common approach is to approximate it with a differentiable loss function:
def iou_continuous_loss(y_pred, y_true):
    eps = 1e-6

    def _sum(x):
        return x.sum(-1).sum(-1)

    numerator = (_sum(y_true * y_pred) + eps)
    denominator = (_sum(y_true ** 2) + _sum(y_pred ** 2)
                   - _sum(y_true * y_pred) + eps)
    return (numerator / denominator).mean()
It looks good. Let's run a quick check first:
In [3]: ones = np.ones((1, 3, 10, 10))
   ...: x1 = iou_continuous_loss(ones * 0.01, ones)
   ...: x2 = iou_continuous_loss(ones * 0.99, ones)

In [4]: x1, x2
Out[4]: (0.010099999897990103, 0.9998990001020204)
For x1, we computed the loss of something completely different from the ground truth, while x2 is the result for something very close to the ground truth. We expect x1 to be large, because the prediction is bad, and x2 to be close to zero. What happened?

The function above is a good approximation of the metric. But a metric is not a loss: a metric is usually (including in this case) the higher the better. Since we minimize the loss with SGD, we should use the opposite:
def iou_continuous(y_pred, y_true):
    eps = 1e-6

    def _sum(x):
        return x.sum(-1).sum(-1)

    numerator = (_sum(y_true * y_pred) + eps)
    denominator = (_sum(y_true ** 2) + _sum(y_pred ** 2)
                   - _sum(y_true * y_pred) + eps)
    return (numerator / denominator).mean()

def iou_continuous_loss(y_pred, y_true):
    return 1 - iou_continuous(y_pred, y_true)
These problems can be caught in two ways:

Write a unit test that checks the direction of the loss: formalize the expectation that a prediction closer to the ground truth should produce a lower loss (see the sketch below). Also run a sanity check: make your model overfit a single batch.
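A minimal sketch of such a direction test for the loss above, using numpy inputs as in the earlier check (the test name is made up):

def test_loss_direction():
    ones = np.ones((1, 3, 10, 10))
    far = iou_continuous_loss(ones * 0.01, ones)   # prediction far from ground truth
    near = iou_continuous_loss(ones * 0.99, ones)  # prediction close to ground truth
    # Closer to the ground truth must mean a lower loss.
    assert near < far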
4. When we use PyTorch

Suppose you have a pre-trained model and want to run inference.
import torch

from ceevee.base import AbstractPredictor

class MySuperPredictor(AbstractPredictor):
    def __init__(self,
                 weights_path: str,
                 ):
        super().__init__()
        self.model = self._load_model(weights_path=weights_path)

    def process(self, x, *kw):
        with torch.no_grad():
            res = self.model(x)
        return res

    @staticmethod
    def _load_model(weights_path):
        model = ModelClass()
        weights = torch.load(weights_path, map_location='cpu')
        model.load_state_dict(weights)
        return model
Is this code correct? Maybe! It really is correct for some models, for example, when the model has no dropout or norm layers such as torch.nn.BatchNorm2d, or when the model needs the actual per-image norm statistics (many pix2pix-based architectures need that).

But for most computer vision applications, the code is missing something important: the switch to evaluation mode.
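The direct fix for the predictor above is a single call to model.eval() after loading the weights; a sketch:

@staticmethod
def _load_model(weights_path):
    model = ModelClass()
    weights = torch.load(weights_path, map_location='cpu')
    model.load_state_dict(weights)
    model.eval()  # switch dropout / batchnorm layers to inference behavior
    return model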
This problem is also easy to spot if you try to convert the dynamic PyTorch graph into a static one. torch.jit is used for this conversion.
In [3]: model = nn.Sequential(
   ...:     nn.Linear(10, 10),
   ...:     nn.Dropout(.5)
   ...: )
   ...:
   ...: traced_model = torch.jit.trace(model, torch.rand(10))
/Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/jit/__init__.py:914: TracerWarning: Trace had nondeterministic nodes. Did you forget call .eval() on your model? Nodes:
    : Float(10) = aten::dropout(%input,), scope: Sequential/Dropout[1] # /Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/nn/functional.py:806:0
This may cause errors in trace checking. To disable trace checking, pass check_trace=False to torch.jit.trace()
  check_tolerance, _force_outplace, True, _module_class)
/Users/Arseny/.pyenv/versions/3.6.6/lib/python3.6/site-packages/torch/jit/__init__.py:914: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[5] (0.0 vs. 0.5454154014587402) and 5 other locations (60.00%)
  check_tolerance, _force_outplace, True, _module_class)
A simple fix:
In [4]: model = nn.Sequential(
   ...:     nn.Linear(10, 10),
   ...:     nn.Dropout(.5)
   ...: )
   ...:
   ...: traced_model = torch.jit.trace(model.eval(), torch.rand(10))
# No more warnings!
Under the hood, torch.jit.trace runs the model several times and compares the results; a difference between runs is suspicious.

However, torch.jit.trace is not a panacea here. It is a nuance you should know and remember.
5. Copy-paste problems

Many things exist in pairs: training and validation, width and height, latitude and longitude. If you read carefully, you can easily spot a bug caused by copying code from one member of a pair to the other:
def make_dataloaders(train_cfg, val_cfg, batch_size):
    train = Dataset.from_config(train_cfg)
    val = Dataset.from_config(val_cfg)
    shared_params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': cpu_count()}
    train = DataLoader(train, **shared_params)
    val = DataLoader(train, **shared_params)
    return train, val
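The bug: the validation loader wraps train instead of val. A fixed sketch, keeping the original's shared parameters (in practice you would usually also disable shuffling for validation):

def make_dataloaders(train_cfg, val_cfg, batch_size):
    train = Dataset.from_config(train_cfg)
    val = Dataset.from_config(val_cfg)
    shared_params = {'batch_size': batch_size, 'num_workers': cpu_count()}
    # Build each loader from its own dataset; shuffle only the training set.
    train_loader = DataLoader(train, shuffle=True, **shared_params)
    val_loader = DataLoader(val, shuffle=False, **shared_params)
    return train_loader, val_loader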
I'm not the only one who has made this silly mistake. For example, a similar bug existed in the very popular albumentations library:
# https://github.com/albu/albumentations/blob/0.3.0/albumentations/augmentations/transforms.py
def apply_to_keypoint(self, keypoint, crop_height=0, crop_width=0, h_start=0, w_start=0, rows=0, cols=0, **params):
    keypoint = F.keypoint_random_crop(keypoint, crop_height, crop_width, h_start, w_start, rows, cols)
    scale_x = self.width / crop_height
    scale_y = self.height / crop_height
    keypoint = F.keypoint_scale(keypoint, scale_x, scale_y)
    return keypoint
Don't worry, it has since been fixed.
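The bug is that scale_x is computed from crop_height. Presumably the fix scales each axis by its own crop dimension; a sketch, not the library's exact current code:

scale_x = self.width / crop_width
scale_y = self.height / crop_height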
How do you avoid this? Instead of copying and pasting code, try to write code in a way that doesn't require copy-pasting:
datasets = []
data_a = get_dataset(MyDataset(config['dataset_a']), config['shared_param'], param_a)
datasets.append(data_a)
data_b = get_dataset(MyDataset(config['dataset_b']), config['shared_param'], param_b)
datasets.append(data_b)

The same without copy-paste:

datasets = []
for name, param in zip(('dataset_a', 'dataset_b'),
                       (param_a, param_b)):
    datasets.append(get_dataset(MyDataset(config[name]), config['shared_param'], param))

6. Appropriate data types
Let's write a new augmentation:
def add_noise(img: np.ndarray) -> np.ndarray:
    mask = np.random.rand(*img.shape) + .5
    img = img.astype('float32') * mask
    return img.astype('uint8')
The image has changed. Is this what we expected? Hmm, it has changed far too much.

There is a dangerous operation here: casting float32 to uint8. It can cause an overflow:
def add_noise(img: np.ndarray) -> np.ndarray:
    mask = np.random.rand(*img.shape) + .5
    img = img.astype('float32') * mask
    return np.clip(img, 0, 255).astype('uint8')

img = add_noise(cv2.imread('two_hands.jpg')[:, :, ::-1])
_ = plt.imshow(img)
It looks much better, doesn't it?
By the way, there is another way to avoid this problem: don't reinvent the wheel. Instead of writing augmentation code from scratch, use an existing library, e.g. albumentations.augmentations.transforms.GaussNoise.
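For example, a sketch assuming the 0.3.x-era albumentations API used elsewhere in this article:

from albumentations import GaussNoise

aug = GaussNoise(p=1.0)  # always apply, default variance range
noisy = aug(image=img)['image']  # clipping and dtype handling are done for you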
I once made another bug of the same origin:
raw_mask = cv2.imread('mask_small.png')
mask = raw_mask.astype('float32') / 255
mask = cv2.resize(mask, (64, 64), interpolation=cv2.INTER_LINEAR)
mask = cv2.resize(mask, (128, 128), interpolation=cv2.INTER_CUBIC)
mask = (mask * 255).astype('uint8')
_ = plt.imshow(np.hstack((raw_mask, mask)))
What's wrong here? First of all, resizing a mask with cubic interpolation is a bad idea. And there is the same float32-to-uint8 problem: cubic interpolation can output values outside the input range, which leads to overflow.

I found this problem while doing visualization. It is also a good idea to put assertions everywhere in your training loop.
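For example, a hypothetical guard before the cast back to uint8 fails loudly instead of overflowing silently:

import numpy as np

def to_uint8_safe(img: np.ndarray) -> np.ndarray:
    # Interpolation or noise may have pushed values outside [0, 255]; catch it here.
    assert img.min() >= 0 and img.max() <= 255, (img.min(), img.max())
    return img.astype('uint8')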
7. Typos
Suppose you need to run inference with a fully convolutional network (for example, for a semantic segmentation problem) on a huge image, one so big that there is no chance of fitting it into your GPU; it could be a medical or satellite image.

In that case, you can slice the image into a grid, run inference on each tile independently, and finally merge the results. In addition, some overlap between predictions can help smooth out artifacts near the tile borders.
from typing import Optional

from tqdm import tqdm

class GridPredictor:
    """
    This class can be used to predict a segmentation mask for a big image
    when you have a GPU memory limitation
    """

    def __init__(self, predictor: AbstractPredictor, size: int, stride: Optional[int] = None):
        self.predictor = predictor
        self.size = size
        self.stride = stride if stride is not None else size // 2

    def __call__(self, x: np.ndarray):
        h, w, _ = x.shape
        mask = np.zeros((h, w, 1), dtype='float32')
        weights = mask.copy()
        for i in tqdm(range(0, h - 1, self.stride)):
            for j in range(0, w - 1, self.stride):
                a, b, c, d = i, min(h, i + self.size), j, min(w, j + self.size)
                patch = x[a:b, c:d, :]
                mask[a:b, c:d, :] += np.expand_dims(self.predictor(patch), -1)
                weights[a:b, c:d, :] = 1
        return mask / weights
There is a typo here, and the snippet is big enough that it isn't easy to spot. I doubt you would find it quickly just by reading the code. But it is easy to check whether the code is correct:
class Model(nn.Module):
    def forward(self, x):
        return x.mean(axis=-1)

model = Model()
grid_predictor = GridPredictor(model, size=128, stride=64)

simple_pred = np.expand_dims(model(img), -1)
grid_pred = grid_predictor(img)

np.testing.assert_allclose(simple_pred, grid_pred, atol=.001)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      9 grid_pred = grid_predictor(img)
     10
---> 11 np.testing.assert_allclose(simple_pred, grid_pred, atol=.001)

~/.pyenv/versions/3.6.6/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_allclose(actual, desired, rtol, atol, equal_nan, err_msg, verbose)
   1513     header = 'Not equal to tolerance rtol=%g, atol=%g' % (rtol, atol)
   1514     assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
-> 1515                          verbose=verbose, header=header, equal_nan=equal_nan)
   1516
   1517

~/.pyenv/versions/3.6.6/lib/python3.6/site-packages/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    839                                 verbose=verbose, header=header,
    840                                 names=('x', 'y'), precision=precision)
--> 841                 raise AssertionError(msg)
    842             except ValueError:
    843                 import traceback

AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.001

Mismatch: 99.6%
Max absolute difference: 765.
Max relative difference: 0.75000001
 x: array([[215.333333],
        [192.666667],
        [250.      ], ...
 y: array([[215.33333],
        [192.66667],
        [250.     ], ...
Here is the correct version of the __call__ method:
def __call__(self, x: np.ndarray):
    h, w, _ = x.shape
    mask = np.zeros((h, w, 1), dtype='float32')
    weights = mask.copy()
    for i in tqdm(range(0, h - 1, self.stride)):
        for j in range(0, w - 1, self.stride):
            a, b, c, d = i, min(h, i + self.size), j, min(w, j + self.size)
            patch = x[a:b, c:d, :]
            mask[a:b, c:d, :] += np.expand_dims(self.predictor(patch), -1)
            weights[a:b, c:d, :] += 1
    return mask / weights
If you still can't see the problem, look at the line with weights: the buggy version assigned weights[a:b, c:d, :] = 1, so overlapping regions were divided by 1 instead of by the number of overlapping predictions.
8. ImageNet normalization

When you need to do transfer learning, it is usually a good idea to normalize your images in the same way as was done during the ImageNet training.
Let's use the albumentations library that we are already familiar with.
import cv2
import torch
import torch.nn.functional as F
from albumentations import Normalize

norm = Normalize()

img = cv2.imread('img_small.jpg')
mask = cv2.imread('mask_small.png', cv2.IMREAD_GRAYSCALE)
mask = np.expand_dims(mask, -1)  # shape (64, 64) -> shape (64, 64, 1)

normed = norm(image=img, mask=mask)
img, mask = [normed[x] for x in ['image', 'mask']]

def img_to_batch(x):
    x = np.transpose(x, (2, 0, 1)).astype('float32')
    return torch.from_numpy(np.expand_dims(x, 0))

img, mask = map(img_to_batch, (img, mask))
criterion = F.binary_cross_entropy
It's time to train a network and overfit it on a single image, which, as I mentioned, is a good debugging technique:
model_a = UNet(3, 1)
optimizer = torch.optim.Adam(model_a.parameters(), lr=1e-3)
losses = []
for t in tqdm(range(20)):
    loss = criterion(model_a(img), mask)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

_ = plt.plot(losses)
The curve looks fine, but a cross-entropy loss of -300 is not what we expect. What's the problem?

Normalization worked well for the image, but not for the mask: it has to be scaled to [0, 1] manually.
model_b = UNet(3, 1)
optimizer = torch.optim.Adam(model_b.parameters(), lr=1e-3)
losses = []
for t in tqdm(range(20)):
    loss = criterion(model_b(img), mask / 255.)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

_ = plt.plot(losses)
A simple run-time assertion in the training loop (for example, assert mask.max() <= 1) would catch this problem quickly.