

AI tutorial | YOLOv3 mask detection using the Didi Cloud Notebook service




In this tutorial, you will learn how to use YOLOv3, a classic one-stage object detection network, to perform mask detection. For more details on YOLOv3, you can read the paper. All project files for this tutorial can be downloaded from the Didi Yun S3 storage service.

The Didi Cloud Notebook service comes with CUDA, cuDNN, Python, TensorFlow, PyTorch, MXNet, Keras, and other deep learning frameworks preinstalled, so there is nothing for users to install.
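If you want to confirm what the image provides, a quick check from the first notebook cell works; a minimal sketch (the exact versions depend on the image you chose):

import torch

print(torch.__version__)                  # the PyTorch build shipped with the image
print(torch.cuda.is_available())          # True once the GPU instance is ready
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU model of the chosen spec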

I. Purchase the Notebook service

You can purchase the Notebook service after registering a Didi Cloud account and completing real-name verification.

Go to the console Notebook page and click the Create Notebook Instance button.

Select the basic configuration:

- Choose the payment method: currently, only pay-by-time is supported.
- Select the availability zone: choose the zone closest to your customers; Guangzhou Zone 1 and Zone 2 are available.
- Select the configuration specification: choose according to the required CPU, GPU, video memory, and RAM.
- Select the image: Jupyter Notebook and Jupyter Lab images are provided; here, select jupyter-lab-v1.
- Set up the system disk: choose a size between 80 GB and 500 GB as needed.
- Set the name and tags: enter a Notebook name, then enter a tag key and value and click Add; multiple tags can be added.

Access the Notebook

Go to the My Notebook page and click Open Notebook in the Actions column.

Alternatively, go to the Notebook details page and click Open Notebook.

II. Debug the code

import cv2
import math
import matplotlib.pyplot as plt
import numpy as np
import os
import random
import time
import torch
import torchvision
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
import xml.etree.ElementTree as ET
from torch.utils.data import Dataset, DataLoader

Download the mask detection dataset and upload it to the Notebook server. Here we take the AIZOO open-source dataset as an example. Download address: https://pan.baidu.com/s/1nsQf_Py5YyKm87-8HiyJeQ, extraction code: eyfz. The Didi Yun S3 storage service can be used for uploading large files. To make downloading easier, we have uploaded the dataset to the public S3 object store in advance; simply execute the shell commands below. If you uploaded the data yourself, you can skip this step.

!wget https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/part1.tgz
!wget https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/part2.tgz
!wget https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/val.tgz
!tar -zxf part1.tgz
!tar -zxf part2.tgz
!tar -zxf val.tgz

--2020-03-05 14:59:25--  https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/part1.tgz
Resolving dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)... 125.94.54.9
Connecting to dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)|125.94.54.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 270682151 (258M) [application/gzip]
Saving to: 'part1.tgz'
part1.tgz 100%[===================>] 258M 6.15MB/s in 42s
2020-03-05 15:00:06 (6.18 MB/s) - 'part1.tgz' saved [270682151]
(similar output follows for part2.tgz, 337432016 bytes (322M), and val.tgz, 184383116 bytes (176M))
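Before writing the loader, it is worth confirming what was unpacked. A small sketch, assuming the archives extract to part1/, part2/, and val/ directories of paired .jpg/.xml files (the same layout the Dataset class below expects):

import os

for folder in ['part1', 'part2', 'val']:
    files = os.listdir(folder)
    jpgs = [f for f in files if f.endswith('.jpg')]
    xmls = [f for f in files if f.endswith('.xml')]
    print('%s: %d images, %d annotations' % (folder, len(jpgs), len(xmls)))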

Load the data. Here we need to write a custom Dataset class for the dataset:

VOC_CLASSES = ('face', 'face_mask')

class AnnotationTransform(object):
    def __init__(self, class_to_ind=None, keep_difficult=True):
        self.class_to_ind = class_to_ind or dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target):
        res = np.empty((0, 5))
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')
            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res = np.vstack((res, bndbox))  # [xmin, ymin, xmax, ymax, label_ind]
        return res  # [[xmin, ymin, xmax, ymax, label_ind], ...]

def preproc_for_test(image, input_size, mean, std):
    interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
    interp_method = interp_methods[random.randrange(5)]
    image = cv2.resize(image, input_size, interpolation=interp_method)
    image = image.astype(np.float32)
    image = image[:, :, ::-1]
    image /= 255.
    if mean is not None:
        image -= mean
    if std is not None:
        image /= std
    return image.transpose(2, 0, 1)

class TrainTransform(object):
    def __init__(self, rgb_means=None, std=None, max_labels=50):
        self.means = rgb_means
        self.std = std
        self.max_labels = max_labels

    def __call__(self, image, targets, img_size):
        boxes = targets[:, :4].copy()  # Nx4
        labels = targets[:, 4].copy()
        if len(boxes) == 0:
            targets = np.zeros((self.max_labels, 5), dtype=np.float32)
            image = preproc_for_test(image, img_size, self.means, self.std)
            image = np.ascontiguousarray(image, dtype=np.float32)
            return torch.from_numpy(image), torch.from_numpy(targets)
        height, width, _ = image.shape
        boxes_o = targets[:, :4]
        labels = targets[:, 4]
        b_x_o = (boxes_o[:, 2] + boxes_o[:, 0]) * .5
        b_y_o = (boxes_o[:, 3] + boxes_o[:, 1]) * .5
        b_w_o = (boxes_o[:, 2] - boxes_o[:, 0]) * 1.
        b_h_o = (boxes_o[:, 3] - boxes_o[:, 1]) * 1.
        boxes_o[:, 0] = b_x_o
        boxes_o[:, 1] = b_y_o
        boxes_o[:, 2] = b_w_o
        boxes_o[:, 3] = b_h_o
        # resize
        interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
        interp_method = interp_methods[random.randrange(5)]
        image_t = cv2.resize(image, img_size, interpolation=interp_method)
        boxes = boxes_o
        boxes[:, 0::2] /= width
        boxes[:, 1::2] /= height
        boxes[:, 0::2] *= img_size[0]
        boxes[:, 1::2] *= img_size[1]
        image_t = preproc_for_test(image_t, img_size, self.means, self.std)
        labels = np.expand_dims(labels, 1)
        targets_t = np.hstack((labels, boxes))
        padded_labels = np.zeros((self.max_labels, 5))
        padded_labels[range(len(targets_t))[:self.max_labels]] = targets_t[:self.max_labels]
        padded_labels = np.ascontiguousarray(padded_labels, dtype=np.float32)
        image_t = np.ascontiguousarray(image_t, dtype=np.float32)
        return torch.from_numpy(image_t), torch.from_numpy(padded_labels)

# dataset type definition
class VOCDetection(Dataset):
    def __init__(self, root, preproc=None, target_transform=AnnotationTransform(),
                 img_size=(416, 416), split='train'):
        super().__init__()
        self.root = root
        self.preproc = preproc
        self.target_transform = target_transform
        self.img_size = img_size
        self._annopath = os.path.join('%s', 'Annotations', '%s.xml')
        self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg')
        self._classes = VOC_CLASSES
        self._year = '2012'  # options: '2007'; related to the eval protocol
        self.item_container = set()
        if split == 'train':
            for folder in ['part1', 'part2']:
                for item in os.listdir(os.path.join(self.root, folder)):
                    self.item_container.add(os.path.join(self.root, folder, item[:-4]))
        else:
            for folder in ['val']:
                for item in os.listdir(os.path.join(self.root, folder)):
                    self.item_container.add(os.path.join(self.root, folder, item[:-4]))
        self.item_container = list(self.item_container)

    def __getitem__(self, index):
        item = self.item_container[index]
        target = ET.parse(item + '.xml').getroot()
        img = cv2.imread(item + '.jpg')
        height, width, _ = img.shape
        if self.target_transform is not None:
            target = self.target_transform(target)
        if self.preproc is not None:
            img, target = self.preproc(img, target, self.img_size)
        img_info = (width, height)
        return img, target, img_info, item

    def __len__(self):
        return len(self.item_container)

dataset = VOCDetection(root='./', preproc=TrainTransform(), split='train')
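As a quick sanity check of the pipeline, you can pull one sample from the dataset just defined and inspect its shapes; a minimal sketch:

img, target, img_info, item = dataset[0]
print(img.shape)     # torch.Size([3, 416, 416]) after TrainTransform
print(target.shape)  # torch.Size([50, 5]): up to max_labels rows of [label, cx, cy, w, h]
print(img_info)      # (original width, original height)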

Define the model and initialize its parameters. This requires importing the custom yolo library, which can be downloaded from the Didi Yun S3 storage service.

from yolo import YOLOv3

model = YOLOv3(num_classes=len(VOC_CLASSES))

def init_yolo(M):
    for m in M.modules():
        if isinstance(m, nn.Conv2d):
            init.kaiming_normal_(m.weight, a=0.1, mode='fan_in')
            if m.bias is not None:
                init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            init.ones_(m.weight)
            init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            init.normal_(m.weight, 0, 0.01)
            init.zeros_(m.bias)

model.apply(init_yolo)
model.train()
torch.backends.cudnn.benchmark = True
device = torch.device("cuda")
model = model.to(device)
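Before training, it can be helpful to check the model size. This sketch uses only generic PyTorch calls and does not depend on the custom yolo library's internals:

num_params = sum(p.numel() for p in model.parameters())
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('parameters: %.2fM (trainable: %.2fM)' % (num_params / 1e6, num_trainable / 1e6))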

Now we use the Adam algorithm to train the model.

# Before training, define some hyperparameters
batch_size = 8       # per-batch training size; should not be too small, limited by GPU memory
base_lr = 0.0001     # base learning rate
warmup_epochs = 10   # epochs during which the learning rate ramps up to base_lr
epochs = 70          # total number of training epochs
save_interval = 10   # epoch interval at which the model is saved
steps = [50, 60]     # epochs at which the learning rate is decayed

dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
optimizer = optim.Adam(model.parameters(), lr=base_lr, weight_decay=0.0005)
epoch_size = len(dataset) // (batch_size * 1)
epoch = 1

def set_lr(tmp_lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = tmp_lr

while epoch < epochs + 1:
    print('\n[Epoch {} started]'.format(epoch))
    for iter_i, (imgs, targets, _, _) in enumerate(dataloader):
        start = time.time()
        if epoch % save_interval == 0:
            torch.save(model.state_dict(), 'yolov3_mask_detection_{}.pth'.format(epoch))
        # update the learning rate
        if epoch < warmup_epochs:
            tmp_lr = base_lr * pow((iter_i + epoch * epoch_size) * 1. / (warmup_epochs * epoch_size), 1)
            set_lr(tmp_lr)
        elif epoch == warmup_epochs:
            tmp_lr = base_lr
            set_lr(tmp_lr)
        elif epoch in steps and iter_i == 0:
            tmp_lr = tmp_lr * 0.1
            set_lr(tmp_lr)
        optimizer.zero_grad()
        imgs = imgs.to(device).to(torch.float32)
        targets = targets.to(device).to(torch.float32)
        loss_dict = model(imgs, targets, epoch)
        loss = sum(loss for loss in loss_dict['losses'])
        loss.backward()
        optimizer.step()
        end = time.time()
        if iter_i % 1 == 0:
            # print training progress
            print('\r[Epoch %d/%d][Iter %d/%d][LR %.6f]'
                  '[Loss: l1 %.2f, conf %.6f, cls %.6f][Time: %.2f s]......'
                  % (epoch, epochs, iter_i + 1, epoch_size, tmp_lr,
                     sum(l1_loss for l1_loss in loss_dict['l1_losses']).item(),
                     sum(conf_loss for conf_loss in loss_dict['conf_losses']).item(),
                     sum(cls_loss for cls_loss in loss_dict['cls_losses']).item(),
                     end - start), end='')
    epoch += 1

torch.save(model.state_dict(), 'yolov3_mask_detection_final.pth')

[Epoch 1 started][Epoch 1/70][Iter 765/765][LR 0.000020][Loss: l1 4.36, conf 1747.822632, cls 1.573278][Time: 0.61 s]........
[Epoch 2 started][Epoch 2/70][Iter 765/765][LR 0.000030][Loss: l1 5.01, conf 781.659180, cls 1.723707][Time: 0.63 s].........
[Epoch 3 started][Epoch 3/70][Iter 765/765][LR 0.000040][Loss: l1 18.94, conf 331.378754, cls 5.523039][Time: 0.69 s].......
[Epoch 4 started][Epoch 4/70][Iter 78/765][LR 0.000041][Loss: l1 6.72, conf 293.010193, cls 1.820950][Time: 0.66 s].......
KeyboardInterrupt:

Once the trained model is available, we can start testing.

class ValTransform(object):
    def __init__(self, rgb_means=None, std=None, swap=(2, 0, 1)):
        self.means = rgb_means
        self.swap = swap
        self.std = std

    def __call__(self, img, res, input_size):
        interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
        interp_method = interp_methods[0]
        img = cv2.resize(np.array(img), input_size, interpolation=interp_method).astype(np.float32)
        img = img[:, :, ::-1]
        img /= 255.
        if self.means is not None:
            img -= self.means
        if self.std is not None:
            img /= self.std
        img = img.transpose(self.swap)
        img = np.ascontiguousarray(img, dtype=np.float32)
        return torch.from_numpy(img), torch.zeros(1, 5)

transform = ValTransform()
im = cv2.imread("val/test_00000760.jpg")  # input image
ori_im = im.copy()
height, width, _ = im.shape
test_size = (416, 416)
im_input, _ = transform(im, None, test_size)
im_input = im_input.to(device).type(torch.float32).unsqueeze(0)
model.load_state_dict(torch.load('yolov3_mask_detection_final.pth'))  # load the trained weights
device = torch.device("cuda")
model = model.to(device)
model.eval()
outputs = model(im_input)

Post-process the model output: discard bounding boxes with low confidence, then use non-maximum suppression (NMS) to remove same-class boxes with large IoU.

def postprocess(prediction, num_classes=2, conf_thre=0.3, nms_thre=0.45):
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]
    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # Get score and class with highest confidence
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
        # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
        detections = torch.cat((image_pred[:, :5], class_conf, class_pred.float()), 1)
        detections = detections[conf_mask]
        if not detections.size(0):
            continue
        # Iterate through all predicted classes
        unique_labels = detections[:, -1].unique()
        for c in unique_labels:
            # Get the detections with the particular class
            detections_class = detections[detections[:, -1] == c]
            nms_out_index = torchvision.ops.nms(detections_class[:, :4],
                                                detections_class[:, 4] * detections_class[:, 5],
                                                nms_thre)
            detections_class = detections_class[nms_out_index]
            if output[i] is None:
                output[i] = detections_class
            else:
                output[i] = torch.cat((output[i], detections_class))
    return output

outputs = postprocess(outputs, 2, 0.01, 0.35)
outputs = outputs[0].cpu().data
bboxes = outputs[:, 0:4]
bboxes[:, 0::2] *= width / test_size[0]
bboxes[:, 1::2] *= height / test_size[1]
bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 0]
bboxes[:, 3] = bboxes[:, 3] - bboxes[:, 1]
cls = outputs[:, 6]
scores = outputs[:, 4] * outputs[:, 5]
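The per-class suppression above relies on torchvision.ops.nms, which keeps the highest-scoring box and drops any remaining box whose IoU with a kept box exceeds the threshold. A standalone illustration on dummy boxes:

import torch
import torchvision

# Two heavily overlapping boxes plus one far-away box, in (x1, y1, x2, y2) format.
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 12., 52., 52.],
                      [100., 100., 140., 140.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.45)
print(keep)  # tensor([0, 2]): box 1 overlaps box 0 (IoU ~ 0.82) and is suppressed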

Finally, we visualize the detection results.

def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None, color=None):
    # 6-anchor palette interpolated by get_color to produce a distinct color per class
    colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1],
                                [0, 1, 0], [1, 1, 0], [1, 0, 0]])

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        cls_conf = scores[i]
        if cls_conf < conf:
            continue
        x1 = int(box[0])
        y1 = int(box[1])
        x2 = int(box[0] + box[2])
        y2 = int(box[1] + box[3])
        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if class_names is not None:
            cls_conf = scores[i]
            cls_id = int(cls_ids[i])
            class_name = class_names[cls_id]
            classes = len(class_names)
            offset = cls_id * 123456 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, '%s: %.2f' % (class_name, cls_conf), (x1, y1),
                              cv2.FONT_HERSHEY_SIMPLEX, 0.5, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 1)
    return img

pred_im = vis(ori_im, bboxes.numpy(), scores.numpy(), cls.numpy(), conf=0.3, class_names=VOC_CLASSES)
plt.rcParams['figure.figsize'] = (20, 12)
plt.imshow(pred_im[:, :, ::-1])

Note that this tutorial is intended only as an introduction: the input image size is fixed at 416x416, and no sophisticated data augmentation or training optimization tricks are used. These limitations cause the model to miss detections in dense crowds and on small faces. For better results, you can train a face detection model with the mask dataset mixed into general face data, which lets you exploit far more annotated faces, and then classify each detected face image into the two categories.
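As a concrete reading of that suggestion, the second stage could be a small two-class classifier run on each detected face crop. A minimal sketch, assuming face crops from a detector are available; the resnet18 backbone, input size, and preprocessing here are illustrative choices, not part of this tutorial's code:

import cv2
import numpy as np
import torch
import torch.nn as nn
import torchvision

# Two-class head (face vs. face_mask) on an ImageNet-pretrained backbone;
# in practice this would be fine-tuned on cropped faces from the mixed dataset.
clf = torchvision.models.resnet18(pretrained=True)
clf.fc = nn.Linear(clf.fc.in_features, 2)
clf.eval()

def classify_crop(crop_bgr):
    # crop_bgr: HxWx3 uint8 face crop produced by the face detector
    x = cv2.resize(crop_bgr, (224, 224))[:, :, ::-1] / 255.  # BGR -> RGB, scale to [0, 1]
    x = torch.from_numpy(np.ascontiguousarray(x.transpose(2, 0, 1))).float().unsqueeze(0)
    with torch.no_grad():
        return int(clf(x).argmax(1))  # 0: face, 1: face_mask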

Author: Zhu Zhongtao [DiDi product expert]

Didi Yun Technology Salon, packed with hands-on technical content to help improve R&D efficiency, is now open for registration!

Follow the Didi Yun official account now:

Reply "class" to register for free.

Reply "server" to get a free one-month trial of an entry-level CVM.
