OpenCV parking lot real-time detection project analysis
1. Write at the front
Today we walk through the third hands-on project in this introduction-to-OpenCV series. The first two articles covered credit card digit recognition and document OCR scanning, which relied mostly on basic image preprocessing techniques in OpenCV such as contour detection, edge detection, morphological operations, and perspective transformation. The project in this article not only needs some basic image preprocessing, it also needs a model for recognition and prediction. So this project is interesting because it strings together the whole pipeline, from image preprocessing to modeling, and applies it to a real scene.
The task of real-time parking-space detection is: given a video of a parking lot, accomplish two things:
Detect how many cars there are and how many parking spaces are still available in the whole lot.
Mark the vacant spaces, so that a driver can go straight to an empty spot, which saves a lot of parking time.
So the project has real practical value. It took me about a day and a half to finish. I referred to Mr. Tang's introductory OpenCV video, but his version handles the task fairly roughly, so I made some optimizations based on my own understanding. The main changes are as follows:
On the processing side, after the parking lot is divided into column boxes, I manually adjust the coordinates of each column box so that no parking space is missed and none is superfluous, and then fine-tune the coordinates of each individual space to make the marking as accurate as possible.
On the model side, the original video uses transfer learning based on keras and fine-tunes a VGG network, while my model is based on pytorch: fine-tuning a pre-trained ResNet34 reaches a validation accuracy above 0.94. The first version still made a small number of inaccurate predictions, so I added data augmentation on the existing frame images plus some extra data, which raised the accuracy to about 0.98.
In terms of overall structure, I treated the video only as a source of ideas and refactored the whole project according to my own understanding. The advantage is that various optimizations become easy later: data augmentation, different preprocessing, training more advanced models, and so on.
That said, a small resnet already turns out to be powerful enough, and the final prediction result looks like this:
This is one frame of the video. When the project actually runs, it reads the video, splits it into frames, makes such a prediction and marking for every frame, and displays it in real time. So at every moment we can see dynamically which parking spaces in the lot are free.
Below I sort out the key techniques used in this project. The project is fairly large and has a lot of code, so not everything can be shown here; instead I want to record my thinking process, the motivation for each processing step, and how it is handled, which I think will be useful later.
2. Overall process
First of all, after getting the task you have to sort out the process before deciding on a course of action. We start from a video of the parking lot, so to complete the parking-space detection task above, two steps need to be considered:
First, extract every parking space in the parking lot.
Then, with each parking space extracted, train a model to predict whether there is a car in it, mark the spaces without cars, and count them.
From a macro point of view there are really only these two big steps. The questions then become: how to extract each parking space, and how to train the model to predict?
Here I divide the work into two major parts, data preprocessing and model training and prediction:
Data preprocessing
Take one frame of the video as the processing unit.
Remove the redundant parts of the picture through binarization, graying, edge detection, specific point calibration, etc., keeping only the parking-lot region.
Use the Hough transform to detect straight lines in the picture; based on the line coordinates, first frame the parking spaces column by column, then fine-tune the column boxes manually.
Within each column, lock the position of every parking space, number each space, and save all of this into a dictionary.
With the position of each space, the corresponding image patch can be cropped out and used as the data set for later model training and validation, though it has to be labeled manually.
Model training and prediction
Through the steps above we accumulate some data, about 800 pictures. We can then train a model, but because the amount of data is so small, training from scratch usually does not work well, so transfer learning is used here: a pre-trained resnet34 is fine-tuned with these 800-odd images.
After training, the model is saved. Then, for each image, the parking-space dictionary lets us crop every space directly, and the model predicts for each space whether there is a car in it.
With this process in place, we can decompose and refine it further, and then focus on the key details of each step.
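To make the two stages concrete before diving into the details, here is a minimal sketch of how they fit together; extract_spot_dict, load_model and predict_frame are hypothetical placeholders for the preprocessing, model-loading and per-frame prediction steps described above, not the project's actual function names.

import cv2

def run_pipeline(video_path):
    cap = cv2.VideoCapture(video_path)
    ret, first_frame = cap.read()
    # stage 1: locate every parking space once, on a single frame (hypothetical helper)
    spot_dict = extract_spot_dict(first_frame)
    # stage 2: a binary classifier (empty / occupied) fine-tuned on cropped spaces (hypothetical helper)
    model = load_model()
    while ret:
        ret, frame = cap.read()
        if not ret:
            break
        # crop each space, classify it, and mark the empty ones (hypothetical helper)
        marked = predict_frame(frame, spot_dict, model)
        cv2.imshow('parking', marked)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()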
3. Data preprocessing
3.1 Background filtering
First, read an image in, and the original image is as follows:
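The reading and display helpers are not shown in the article; a minimal sketch is given below. cv_imshow is not an OpenCV function but a small display wrapper (assumed here to match the helper of that name used later on), and the file name 'test_image.jpg' is only an example; in the project the image is a frame taken from the video.

import cv2

def cv_imshow(title, img):
    # small helper around cv2.imshow for looking at intermediate results
    cv2.imshow(title, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

test_image = cv2.imread('test_image.jpg')   # example path; in the project this is a video frame
cv_imshow('original', test_image)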
First, the background is filtered out with a binary mask to highlight the important information, and the result is then converted to a grayscale image.
def select_rgb_white_yellow(image):
    # filter the background
    lower = np.uint8([120, 120, 120])
    upper = np.uint8([255, 255, 255])
    # within the three channels, pixels below lower or above upper become 0,
    # pixels between lower and upper become 255; this acts as a mask that
    # filters the background and keeps the pixel values we care about
    white_mask = cv2.inRange(image, lower, upper)
    masked_img = cv2.bitwise_and(image, image, mask=white_mask)
    return masked_img

masked_img = select_rgb_white_yellow(test_image)
Seeing inRange() here, I thought of the binarization method threshold used before and wondered: what is the difference between the two, and why not use threshold here? Here is what I learned from exploring:
cv2.threshold(src, thresh, maxval, type[, dst]): for single-channel (grayscale) images, the standard binarization. With type=THRESH_BINARY: if x > thresh then x = maxval, else x = 0; type=THRESH_BINARY_INV is the opposite. These two are the commonly used types.
cv2.inRange(src, lowerb, upperb): the input can be a single-channel image or a three-channel image, and it can also binarize. The rule is: if x >= lowerb and x <= upperb then x = 255, else x = 0.
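As a minimal side-by-side illustration of the two calls (using the test_image frame from above; the threshold value 127 is only an example):

import cv2
import numpy as np

# cv2.threshold: single-channel input, one threshold
img_gray = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY)   # x > 127 -> 255, else 0

# cv2.inRange: works channel-wise with a lower and an upper bound
lower = np.uint8([120, 120, 120])
upper = np.uint8([255, 255, 255])
mask = cv2.inRange(test_image, lower, upper)   # inside [lower, upper] -> 255, else 0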
After the background filtering, graying and edge detection, and after cropping the region of interest, the straight lines in the picture are found with the Hough transform: cv2.HoughLinesP(image, rho, theta, threshold, minLineLength, maxLineGap) → lines. Its parameters are:
image: the input image; note that it must be the output of edge detection.
rho: the resolution of the polar radius r, in pixels; usually 1.
theta: the angular resolution, in radians; usually 1 degree (np.pi/180).
threshold: the minimum number of curve intersections (votes) required to detect a line.
minLineLength: the minimum length that can form a line; anything shorter is not considered a line.
maxLineGap: the maximum gap between two segments; if the gap is smaller than this value, they are treated as one line.
So, this function is used directly.
def hough_lines(image):
    # the input image needs to be the result of edge detection
    # minLineLength: the shortest length of a line, shorter ones are ignored
    # maxLineGap: the maximum gap between two segments for them to be treated as one line
    # rho: distance accuracy, theta: angle accuracy, threshold: votes needed to detect a segment
    return cv2.HoughLinesP(image, rho=0.1, theta=np.pi/10, threshold=15,
                           minLineLength=9, maxLineGap=4)

list_of_lines = hough_lines(roi_image)   # (2338, 1, 4)
Unexpectedly, 2338 lines were detected, so a lot of them must be unusable, and the later processing needs a filtering pass. The filtering principle: a line must not be slanted, and its horizontal length must be neither too long nor too short. The specific code is shown below; here is the effect of the filtering first.
After filtering, 628 lines remain.
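The drawing code for that intermediate result is not listed in the article; a small illustrative sketch (the length bounds here only demonstrate the filtering principle; the values actually used are in identity_blocks below) could look like this:

line_img = test_image.copy()
for line in list_of_lines:
    for x1, y1, x2, y2 in line:
        # the filtering principle from above: not slanted, horizontal length within a range
        if abs(y2 - y1) <= 1 and 25 <= abs(x2 - x1) <= 55:
            cv2.line(line_img, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv_imshow('filtered_lines', line_img)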
3.5 Dividing the parking spaces by column
The following code is a little more complicated, so I will explain the ideas block by block.
First we have the parking-lot lines and their coordinates from the filtering above. The next step is to sort these lines so that they are arranged column by column, and within each column from top to bottom.
def identity_blocks(image, lines, make_copy=True):
    if make_copy:
        new_image = image.copy()
    # filter part of the lines: a line must not be slanted, and its horizontal
    # length must be neither too long nor too short
    stayed_lines = []
    for line in lines:
        for x1, y1, x2, y2 in line:
            # not slanted, horizontal length within a range (the upper bound is assumed;
            # the original value was not legible)
            if abs(y2 - y1) <= 1 and abs(x2 - x1) >= 25 and abs(x2 - x1) <= 55:
                stayed_lines.append((x1, y1, x2, y2))

The filtered lines are then sorted by their coordinates and clustered into columns, each column is framed with a rectangle, and the coordinates of these column rectangles (rects) are adjusted by hand. Each column rectangle is then split from top to bottom into individual spaces, and every space is written into spot_dict. For each column (key is the column index, gap the height of one space, num_splits the number of splits in that column), the core of the splitting logic looks like this:
        if key > 0 and key < len(rects) - 1:
            # vertical split line: the middle columns hold two rows of spaces
            x = int((x1 + x2) / 2)
            cv2.line(new_image, (x, y), (x, y2), (0, 0, 255), 2)
        # count the spaces: except for the first and the last column, every column has two rows
        if key == 0 or key == len(rects) - 1:
            tot_spots += num_splits + 1
        else:
            tot_spots += 2 * (num_splits + 1)
        # write the space coordinates into the dictionary
        if key == 0 or key == len(rects) - 1:
            for i in range(0, num_splits + 1):
                cur_len = len(spot_dict)
                y = int(y1 + i * gap) + fine_tune_y[key]
                spot_dict[(x1, y, x2, y + gap)] = cur_len + 1
        else:
            for i in range(0, num_splits + 1):
                cur_len = len(spot_dict)
                y = int(y1 + i * gap) + fine_tune_y[key]
                x = int((x1 + x2) / 2)
                spot_dict[(x1, y, x, y + gap)] = cur_len + 1
                spot_dict[(x, y, x2, y + gap)] = cur_len + 2

    return new_image, spot_dict

The fine_tune_y here is also something I added later, again so that each column divides the parking spaces as accurately as possible.

Judging from the result, the parking spaces are basically separated one by one. After separating them, you will find that some of the boxes are not actual parking spaces but still got framed. This affects both the count and the later parking guidance, so I went through them one by one and removed these invalid spaces.

# remove the redundant parking spaces
invalid_spots = [10, 11, 33, 34, 37, 38, 61, 62, 93, 94, 95, 97, 98, 135, 137, 138,
                 187, 249, 250, 253, 254, 323, 324, 327, 328, 467, 468, 531, 532]
valid_spots_dict = {}
cur_idx = 1
for k, v in spot_dict.items():
    if v in invalid_spots:
        continue
    valid_spots_dict[k] = cur_idx
    cur_idx += 1

The processed space information can then be visualized and fine-tuned again; in my case, thanks to the earlier fine-tuning, the result already looked fine, so I did not adjust anything further.

# mark every valid parking space
tmp_img = test_image.copy()
for k, v in valid_spots_dict.items():
    cv2.rectangle(tmp_img, (int(k[0]), int(k[1])), (int(k[2]), int(k[3])), (0, 255, 0), 2)
cv_imshow('valid_pot', tmp_img)

The result looks like this:

If you want the later model to predict each space accurately, this division must be as careful and precise as possible. If a rectangle does not line up with a real space, for example a rectangle stuck across the boundary between two spaces, the cropped image given to the model will easily be misjudged.

The final dictionary is also very important, because it stores the position of every parking space. With it, given any frame, every space can be located directly and handed to the model for prediction. And for the same parking lot the spaces are fixed, so the dictionary never changes and is shared by all frames of the video, which is what keeps the detection real-time.

3.7 Generating prediction images for the CNN

With the exact position of every space, each space can be cropped out directly according to these coordinates, which yields the training and validation data set for the later CNN.

def save_images_for_cnn(image, spot_dict, folder_name='../cnn_pred_data'):
    for spot in spot_dict.keys():
        (x1, y1, x2, y2) = spot
        (x1, y1, x2, y2) = (int(x1), int(y1), int(x2), int(y2))
        # crop the space
        spot_img = image[y1:y2, x1:x2]
        spot_img = cv2.resize(spot_img, (0, 0), fx=2.0, fy=2.0)
        spot_id = spot_dict[spot]
        filename = 'spot_{}.jpg'.format(str(spot_id))
        # print(spot_img.shape, filename, (x1, x2, y1, y2))
        cv2.imwrite(os.path.join(folder_name, filename), spot_img)

save_images_for_cnn(test_image, valid_spots_dict)

This prepares the training data set for the model, organized in the file system like this: each directory contains the small parking-space images, manually split into the two classes (car / no car). So the later model actually does a binary classification task: given such a small image of a parking space, predict whether it is empty or occupied.
Let's start with the details of the model.
4. Training and prediction of models
Since the current data set is very small, it is not enough to train a large model to convergence, so transfer learning with a pre-trained model is used here.
The difference from the video is that I write the training and test code uniformly in pytorch, for two reasons: I have recently been reproducing the classic CV networks in pytorch, and this project lets me practice; and keras is less flexible, for example data preprocessing is not as convenient as the transforms in torchvision. So I use pytorch directly, with the resnet34 pre-trained model. The reason for choosing resnet34 is simply that I happened to reproduce resnet in the past few days and am familiar with it, so I can apply what I just learned; there is no particular preference behind it.
There is a lot of code here, so not all of it is listed; I will just go through the logic, and anyone interested can look at the project itself.
The first is the training model.
4.1 Model training
The overall logic can be seen as follows:
def train_model():
    # get the dataloaders
    data_root = os.getcwd()
    image_path = os.path.join(data_root, "train_data")
    train_data_path = os.path.join(image_path, "train")
    val_data_path = os.path.join(image_path, "test")
    train_loader, validat_loader, train_num, val_num = get_dataloader(
        train_data_path, val_data_path, data_transform_pretrain, batch_size=8)

    # create the model; note that the number of classes is not specified here,
    # so the default is 1000 classes
    net = resnet34()
    model_weight_path = 'saved_model_weight/resnet34_pretrain_ori_low_torch_version.pth'
    # load the pre-trained parameters, then finetune
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

    # change the fc layer structure: change the output dimension of fc to 2
    in_channel = net.fc.in_features
    net.fc = nn.Linear(in_channel, 2)
    net.to(device)

    # training configuration
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)
    epochs = 30
    save_path = "saved_model_weight/resnet34_pretrain.pth"
    best_acc = 0.
    train_steps = len(train_loader)

    model_train(net, train_loader, validat_loader, epochs, device, optimizer,
                loss_function, train_steps, val_num, save_path, best_acc)
Because several steps are encapsulated into functions here, the logic should be fairly clear. In pytorch, model training first wraps the data into a DataLoader, and training then reads batches from it. The principles of Dataset and DataLoader are not covered in detail here; I sorted them out in earlier pytorch notes.
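get_dataloader itself is not listed in the article; a minimal sketch of what such a helper could look like, assuming the usual ImageFolder layout (one sub-folder per class) and the transform dictionary defined below, is:

from torchvision import datasets
from torch.utils.data import DataLoader

def get_dataloader(train_data_path, val_data_path, data_transform, batch_size=8):
    # hypothetical sketch of the helper called in train_model above
    train_dataset = datasets.ImageFolder(root=train_data_path, transform=data_transform["train"])
    val_dataset = datasets.ImageFolder(root=val_data_path, transform=data_transform["val"])
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    validat_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, validat_loader, len(train_dataset), len(val_dataset)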
One detail worth noting is data_transform_pretrain, the data preprocessing pipeline:
data_transform_pretrain = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),     # random cropping; used for training, not for validation
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # the normalization parameters must be the officially given ones: the per-channel
        # mean and std of the ImageNet images; they cannot be chosen arbitrarily
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ]),
    "val": transforms.Compose([
        # for validation, resize and center-crop instead of random cropping and flipping
        transforms.Resize(256),
        transforms.CenterCrop(224),            # crop to the 224x224 input size of resnet
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ]),
    "test": transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])
}
Since the officially pre-trained resnet is used here, the normalization has to use the officially given parameters: the pre-training was done on the ImageNet data set, so the per-channel mean and variance should not be chosen arbitrarily but taken from the official values.
With the dataloader ready, the next step is to create the model: a resnet34 into which the pre-trained parameters are imported. When importing, you will notice that my parameter file name contains low_torch_version; that is because an earlier import produced this error:
xxx.pt is a zip archive (did you mean to use torch.jit.load()?)
The reason for this error is that the official pre-trained parameters were saved with pytorch 1.6 or later, and PyTorch 1.6 switched torch.save to a new zipfile-based file format.
torch.load still retains the ability to load files in the old format; if you want torch.save to use the old format, pass the kwarg _use_new_zipfile_serialization=False.
The pytorch version on my laptop is 1.0, so importing model parameters saved with version 1.6 or above throws exactly this error. How did I solve it? On my server I ran the following code:
model_weight_path = "saved_models/resnet34_pretrain_ori.pth"
state_dict = torch.load(model_weight_path)
torch.save(state_dict, 'saved_models/resnet34_pretrain_ori_low_torch_version.pth',
           _use_new_zipfile_serialization=False)
The pytorch version on my server is 1.10, which can load these parameters; after loading, they are saved again with the old-format flag, and the older installation can then import them.
With this problem solved, back to the pre-trained model. After importing the parameters, the last layer of the network has to be modified, because resnet itself does 1000-way classification and its final layer has 1000 neurons, whereas we only need two classes here, so it is changed to 2.
In addition, there are three ways of transfer learning:
Load the weights and retrain all parameters - requires good hardware.
Load the weights and train only the last few layers, freezing the earlier layers; or lower the learning rate of the earlier layers and raise it for the final fully connected layer, i.e. adjust the learning rate by parameter groups.
Load the weights, add another fully connected layer after the original network, and train only that last layer.
Here I train all the parameters, but it is still worth sorting out what to do if you only want to train the later layers, or want different learning rates for the front and back layers:
# create the model; the number of classes is not specified here, so the default is 1000 classes
net = resnet34()
model_weight_path = 'saved_model_weight/resnet34_pretrain_ori_low_torch_version.pth'
# load the pre-trained parameters, then finetune
net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

# change the fc layer structure: change the output dimension of fc to 2
in_channel = net.fc.in_features
net.fc = nn.Linear(in_channel, 2)
net.to(device)

# training configuration
loss_function = nn.CrossEntropyLoss()

# during training you can also freeze the parameters of the convolution layers,
# or give the parameters of different layers different learning rates
res_params, conv_params, fc_params = [], [], []
# named_parameters() returns the name and the parameters of every layer
for name, param in net.named_parameters():
    # the 'layer' series are the residual layers
    if 'layer' in name:
        res_params.append(param)
    # fully connected layer
    elif 'fc' in name:
        fc_params.append(param)
    else:
        param.requires_grad = False

params = [
    {'params': res_params, 'lr': 0.0001},
    {'params': fc_params, 'lr': 0.0002},
]
optimizer = optim.Adam(params)
In this way the optimizer receives the parameter groups, each with its own learning rate.
After this, the model training function is called and the training can be done directly. This script is a routine operation, so the code will not be posted here.
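For completeness, here is a minimal sketch of what such a model_train routine typically looks like; it is an assumed standard finetune/validate loop matching the arguments passed above, not the project's exact code.

import torch

def model_train(net, train_loader, validat_loader, epochs, device, optimizer,
                loss_function, train_steps, val_num, save_path, best_acc):
    for epoch in range(epochs):
        # training phase
        net.train()
        running_loss = 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # validation phase
        net.eval()
        correct = 0
        with torch.no_grad():
            for images, labels in validat_loader:
                outputs = net(images.to(device))
                preds = torch.max(outputs, dim=1)[1]
                correct += torch.eq(preds, labels.to(device)).sum().item()
        val_acc = correct / val_num
        print('epoch %d  train_loss: %.3f  val_acc: %.3f'
              % (epoch + 1, running_loss / train_steps, val_acc))

        # keep the best weights
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(net.state_dict(), save_path)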
4.2 Model Prediction
With the saved model, we take an image, crop the parking spaces one by one according to the space dictionary, and let the model predict whether each space is empty; if it is, we mark it on the original image.
So here is the core prediction function of the whole project:
def predict_on_img(img, spot_dict, model, class_indict, make_copy=True,
                   color=[0, 255, 0], alpha=0.5, save=True):
    # img is the panoramic image of the parking lot
    if make_copy:
        new_image = np.copy(img)
        overlay = np.copy(img)
    cnt_empty, all_spots = 0, 0
    for spot in tqdm(spot_dict.keys()):
        all_spots += 1
        (x1, y1, x2, y2) = spot
        (x1, y1, x2, y2) = (int(x1), int(y1), int(x2), int(y2))
        spot_img = img[y1:y2, x1:x2]
        spot_img_pil = Image.fromarray(spot_img)
        label = model_infer(spot_img_pil, model, class_indict)
        if label == 'empty':
            cv2.rectangle(overlay, (int(x1), int(y1)), (int(x2), int(y2)), color, -1)
            cnt_empty += 1

    cv2.addWeighted(overlay, alpha, new_image, 1 - alpha, 0, new_image)

    # display the result
    cv2.putText(new_image, "Available: %d spots" % cnt_empty, (30, 95),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.putText(new_image, "Total: %d spots" % all_spots, (30, 125),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    if save:
        filename = 'with_marking_predict.jpg'
        cv2.imwrite(filename, new_image)
    # cv_imshow('new_image', new_image)
    return new_image
The core of the prediction is the model_infer function, which is routine model inference and is not explained in detail here.
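model_infer is not listed in the article either; a minimal sketch of such an inference helper, assuming it reuses the test transforms from above and that class_indict maps prediction indices to names including 'empty' (the exact class names and the index-to-name mapping are assumptions), could be:

import torch

def model_infer(spot_img_pil, model, class_indict):
    # preprocess one cropped parking-space image and return the predicted label
    img = data_transform_pretrain["test"](spot_img_pil)   # same transforms as in training
    img = torch.unsqueeze(img, dim=0).to(device)          # add the batch dimension
    model.eval()
    with torch.no_grad():
        output = model(img)
        pred = torch.max(output, dim=1)[1].item()
    return class_indict[str(pred)]   # assumed mapping, e.g. {'0': 'empty', '1': 'occupied'}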
A video is nothing more than a sequence of frames; running each frame through this function gives real-time prediction on the video:
def predict_on_video(video_path, spot_dict, model, class_indict, ret=True):
    cap = cv2.VideoCapture(video_path)
    count = 0
    while ret:
        ret, image = cap.read()
        count += 1
        if count == 5:
            count = 0
            new_image = predict_on_img(image, spot_dict, model, class_indict, save=False)
            cv2.imshow('frame', new_image)
            if cv2.waitKey(10) & 0xFF == ord('q'):
                break
    cv2.destroyAllWindows()
    cap.release()