This article describes how a TensorFlow license plate recognition system can be built based on Google's Street View multi-digit recognition technology.
Over the past few weeks I have been dabbling in deep learning, particularly convolutional neural network models. Google recently released a good paper on its Street View multi-digit recognition technology, describing a single end-to-end neural network for extracting house numbers from Street View imagery. The authors then describe how the same network architecture was used to crack Google's own CAPTCHA system. To get hands-on experience implementing a neural network, I decided to design a system that solves a similar problem: automatic license plate recognition. There are three reasons for building such a system:
I should be able to build the same or a similar network architecture based on Google's paper: the architecture Google presented performs well on CAPTCHA recognition, so it should presumably work well for reading license plate numbers too. Starting from a well-known architecture greatly simplifies my first steps in learning about CNNs.
I can easily generate training data. A big problem in training neural networks is the need for large numbers of labeled samples; hundreds of thousands of labeled images are often required to train a network. Fortunately, UK license plates are fairly consistent, so I can synthesize the training data.
Curiosity. Traditional automatic license plate recognition systems rely on hand-written algorithms for plate localization, normalization, segmentation, character recognition and so on, and the code implementing them can run to thousands of lines. I was more interested in how good a system could be built with relatively little code and minimal domain-specific knowledge.
The project requires Python, TensorFlow, OpenCV, and NumPy. The source code is here.
Input, output and sliding window
To simplify the generated training images and reduce the amount of computation, I decided the network would operate on 128×64 grayscale input images.
A 128×64 input image is small enough to allow training with modest resources in a reasonable time, and large enough for the plate to be readable.
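As a rough illustration of this input format (a minimal sketch, not the project's actual preprocessing code; the file name car.jpg is a placeholder), an image can be converted to a 128×64 grayscale array with OpenCV:

```python
import cv2
import numpy as np

# Load an image, convert to grayscale, and resize to the 128x64 input size.
# "car.jpg" is a placeholder path, not a file from the project.
img = cv2.imread("car.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
resized = cv2.resize(gray, (128, 64))           # (width, height)
net_input = resized.astype(np.float32) / 255.   # scale pixel values to [0, 1]
print(net_input.shape)                          # (64, 128)
```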
To detect license plates in larger images, a multi-scale sliding window is used.
The image on the right is the 128×64 input to the neural network, while the image on the left shows the sliding window in the context of the original input image.
For each window position, the network outputs:
The probability that a license plate is present in the input image (shown by the green box in the animation above).
The probability of each character at each position: for each of the seven possible positions, the network returns a probability distribution over 36 possible characters. (In this project I assume the plate number is exactly 7 characters, which is usually the case for UK plates.)
A license plate is considered present if and only if the following hold (a small sketch of this check follows the list):
The plate falls entirely within the image bounds.
The plate's width is less than 80% of the image width, and its height is less than 87.5% of the image height.
The plate's width is greater than 60% of the image width, or its height is greater than 60% of the image height.
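Expressed as code, the rule above might look like the following sketch (an illustration of the stated thresholds, not code taken from the project; the helper name and box format are assumptions):

```python
def plate_counts_as_present(plate_box, window_size=(128, 64)):
    """Illustrative check of the presence rule described above.

    plate_box is (x0, y0, x1, y1) in window coordinates.
    """
    win_w, win_h = window_size
    x0, y0, x1, y1 = plate_box
    plate_w, plate_h = x1 - x0, y1 - y0

    fully_inside = 0 <= x0 and 0 <= y0 and x1 <= win_w and y1 <= win_h
    not_too_big = plate_w < 0.8 * win_w and plate_h < 0.875 * win_h
    big_enough = plate_w > 0.6 * win_w or plate_h > 0.6 * win_h
    return fully_inside and not_too_big and big_enough
```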
To detect plates at these sizes, we can use a sliding window that moves 8 pixels at a time, combined with an image pyramid whose zoom levels differ by a factor of $\sqrt{2}$. This avoids missing any plates while not generating an excessive number of matching windows for any single plate. The duplicate detections that do occur are merged in post-processing (explained later).
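A minimal sketch of such a sliding-window scheme, assuming an image pyramid whose levels shrink by a factor of $\sqrt{2}$ and a stride of 8 pixels (the function name and the choice to only downscale are illustrative, not taken from the project):

```python
import numpy as np

def sliding_windows(image_shape, window=(128, 64), stride=8):
    """Yield (scale, x, y) window positions over an image pyramid.

    Successive pyramid levels differ by a factor of sqrt(2).
    """
    img_h, img_w = image_shape
    win_w, win_h = window
    scale = 1.0
    while img_w * scale >= win_w and img_h * scale >= win_h:
        scaled_w, scaled_h = int(img_w * scale), int(img_h * scale)
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                yield scale, x, y
        scale /= np.sqrt(2)   # shrink the image by sqrt(2) at each level
```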
Synthesizing images
To train any neural network, a set of training inputs with correct outputs must be provided. Here that means a set of 128×64 images together with the expected outputs. Below is a sample of the training data generated for this project:
The first part of the expected output is the number the network should emit, and the second part is the "presence" value it should emit. For data labeled as not present, the plate text is shown in parentheses.
The process of generating the image is shown in the following figure:
The text and plate colors are chosen randomly, but the text must be darker than the plate. This simulates lighting variation in real scenes. Finally, some noise is added, both to account for real sensor noise and to prevent the network from relying too heavily on sharp contour boundaries, as would be the case with out-of-focus input images.
Having a background is important: the network must learn to identify plate boundaries without "cheating". With a black background, for example, the network might learn to locate plates simply by finding non-black pixels, which would confuse it when shown real pictures of cars.
The background images come from the SUN database:
Http://vision.cs.princeton.edu/projects/2010/SUN/
which contains more than 100,000 images. Importantly, the large number of images prevents the network from simply memorizing background images.
The plate transformation uses an affine transform based on random roll, pitch, yaw, translation, and scaling. The allowed range for each parameter covers the range of poses in which a plate might plausibly be seen. For example, yaw is allowed to vary much more than roll (you are more likely to see a car turning a corner than one tipped onto its side).
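A simplified sketch of how such a random affine transform could be applied with OpenCV (the parameter ranges and the omission of pitch and yaw are illustrative assumptions; the actual transformation code lives in gen.py):

```python
import numpy as np
import cv2

def random_affine(plate_img, out_size=(128, 64)):
    """Apply a random rotation (roll), scale and translation to a plate image.

    Parameter ranges are illustrative guesses; a full implementation would
    also include pitch and yaw.
    """
    out_w, out_h = out_size
    h, w = plate_img.shape[:2]
    roll = np.random.uniform(-0.15, 0.15)                 # radians
    scale = np.random.uniform(0.6, 0.875)                 # relative to window
    tx = np.random.uniform(-0.2, 0.2) * out_w             # translation jitter
    ty = np.random.uniform(-0.2, 0.2) * out_h

    M = cv2.getRotationMatrix2D((w / 2, h / 2), np.degrees(roll), scale)
    # Move the plate centre to the window centre, plus the random jitter.
    M[:, 2] += (out_w / 2 - w / 2 + tx, out_h / 2 - h / 2 + ty)
    return cv2.warpAffine(plate_img, M, (out_w, out_h))
```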
The image-generation code is relatively short (about 300 lines) and can be read in gen.py.
Network structure
The network structure used is shown in the following figure:
An introduction to the CNN components used here can be found on Wikipedia. The network structure is actually based on Stark's paper, which gives more details about the architecture than Google's paper does:
Https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf
The output layer has one node (on the left) that is used as an indicator of plate presence. The remaining nodes encode the probability of a particular plate number: each column in the figure corresponds to one character position in the plate, and each node gives the probability of the corresponding character appearing at that position. For example, the node in column 2, row 3 gives the probability that the second character of the plate is "C".
Apart from the output layer, all layers use the ReLU activation function, the standard choice for deep neural networks. The presence node uses a sigmoid activation, as is typical for binary outputs. The other output nodes use a softmax across the characters (so that the probabilities in each column sum to 1), the standard way to model a discrete probability distribution.
The code that defines the network structure is in model.py.
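For illustration, a minimal sketch of the output structure in TensorFlow 1.x-style code follows (the hidden-layer size and the use of a single dense layer are placeholders; the real convolutional architecture is the one defined in model.py):

```python
import tensorflow as tf  # written against the TensorFlow 1.x API used at the time

# One presence logit plus 7 x 36 character logits.
x = tf.placeholder(tf.float32, [None, 64 * 128])           # flattened 128x64 input
hidden = tf.layers.dense(x, 2048, activation=tf.nn.relu)   # ReLU hidden layer

presence_logit = tf.layers.dense(hidden, 1)                # is a plate present?
char_logits = tf.reshape(tf.layers.dense(hidden, 7 * 36),
                         [-1, 7, 36])                      # 7 positions x 36 chars

presence_prob = tf.nn.sigmoid(presence_logit)              # binary output
char_probs = tf.nn.softmax(char_logits)                    # each column sums to 1
```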
The loss function is defined as the cross entropy between the labels and the network output. For numerical stability, the activations of the final layer are folded into the cross-entropy calculation using softmax_cross_entropy_with_logits and sigmoid_cross_entropy_with_logits. For a detailed and intuitive introduction to cross entropy, see the relevant section of Michael A. Nielsen's free online book.
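A sketch of how these two loss terms might be combined (the stand-in placeholders for the logits and the unweighted sum of the two terms are assumptions; the actual loss is defined in train.py):

```python
import tensorflow as tf  # TensorFlow 1.x-style API

# Stand-in placeholders for the logits, which would normally come from the
# network defined in model.py.
presence_logit = tf.placeholder(tf.float32, [None, 1])
presence_label = tf.placeholder(tf.float32, [None, 1])
char_logits = tf.placeholder(tf.float32, [None, 7, 36])
char_labels = tf.placeholder(tf.float32, [None, 7, 36])    # one-hot per position

presence_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=presence_label,
                                            logits=presence_logit))
char_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=char_labels,
                                            logits=char_logits))
loss = presence_loss + char_loss
```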
Training with train.py takes about 6 hours on an NVIDIA GTX 970, with training data generated on the fly by a background process running on the CPU.
Output processing
To actually detect and recognize license plates in an input image, a detection network much like the one above is applied across many positions and scales with a 128×64 sliding window, as described in the sliding window section.
The detection network differs from the training network in that its last two layers are convolutional rather than fully connected, so its input is not restricted to 128×64. Feeding an entire image at a particular scale through this network returns an image-shaped output in which each "pixel" holds a presence/character probability value. Because adjacent sliding windows share many convolutional features, running the whole image through the network in one pass avoids computing the same features multiple times.
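Conceptually, the fully-connected-to-convolutional trick looks like the following sketch (the feature-map and kernel sizes are illustrative, not the ones used in model.py):

```python
import tensorflow as tf  # TensorFlow 1.x-style API

# A dense layer that expects, say, an 8x16x64 feature map is equivalent to an
# 8x16 convolution, so the same weights can slide over feature maps of any size.
features = tf.placeholder(tf.float32, [None, None, None, 64])

# Equivalent of a dense layer with 2048 units applied to each 8x16x64 window:
dense_as_conv = tf.layers.conv2d(features, filters=2048, kernel_size=(8, 16),
                                 padding="valid", activation=tf.nn.relu)

# Equivalent of the output layer (1 presence logit + 7*36 character logits),
# evaluated at every window position in a single pass:
output_map = tf.layers.conv2d(dense_as_conv, filters=1 + 7 * 36,
                              kernel_size=(1, 1), padding="valid")
```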
Visualizing the presence portion of the network output gives an image like the following:
The bounding boxes in the figure are regions where the network's plate-presence probability exceeds 99%. The threshold is set this high to compensate for a bias introduced during training: about half of the training images contain a plate, whereas plates are far rarer in real-world images, so a 50% threshold would produce too many false positives.
After the network output has been computed, a form of non-maximum suppression (NMS) is applied to filter out redundant bounding boxes:
Overlapping rectangles are first grouped, and for each group we then output (a sketch of this step follows the list):
The intersection of all bounding boxes in the group.
The plate number corresponding to the bounding box with the highest plate-presence probability in the group.
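A simplified sketch of this grouping step (the box format and the non-transitive grouping are simplifying assumptions; the actual implementation is in detect.py):

```python
import numpy as np

def overlaps(a, b):
    """True if boxes a and b, given as (x0, y0, x1, y1), overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def group_detections(detections):
    """Group overlapping detections and reduce each group to one result.

    Each detection is (box, presence_prob, plate_code). In this sketch a
    detection joins the first group it overlaps, rather than merging groups
    transitively.
    """
    groups = []
    for det in detections:
        for group in groups:
            if any(overlaps(det[0], other[0]) for other in group):
                group.append(det)
                break
        else:
            groups.append([det])

    results = []
    for group in groups:
        boxes = np.array([box for box, _, _ in group])
        # Intersection of all boxes in the group.
        x0, y0 = boxes[:, 0].max(), boxes[:, 1].max()
        x1, y1 = boxes[:, 2].min(), boxes[:, 3].min()
        # Plate code taken from the most confident detection.
        _, _, best_code = max(group, key=lambda d: d[1])
        results.append(((x0, y0, x1, y1), best_code))
    return results
```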
Here is the result for the license plate image shown at the beginning of the article:
Oops: the character R has been mistakenly detected as P. The sliding window with the highest plate-presence probability in the image above is shown below:
At first glance this looks like an easy case for the detector, but it turns out to be a problem of overfitting. The following figure shows the "R" from the font used to generate the training images:
Notice how the angle of the R's leg differs from that of the R in the input image. Because the network has only ever seen the R above, it gets confused by Rs in other fonts. To test this hypothesis, I modified the image in GIMP to bring the R closer to the training font:
After the improvement, the detection gets the correct output:
The detection source code is here, in detect.py.
I have shown that, with relatively little code (about 800 lines), it is possible to build a system that automatically recognizes license plate numbers without importing any domain-specific libraries and without much domain-specific knowledge. Furthermore, by synthesizing images on the fly, I sidestepped the need for the thousands of real training images usually required by deep neural networks.
On the other hand, my system also has some shortcomings:
It only works with a specific plate format. In particular, the network architecture explicitly assumes that the output is exactly seven characters.
It only works with specific fonts.
It is too slow. The system takes several seconds to process a moderately sized image.
To solve the first problem, the Google team split the top layers of their network into multiple subnetworks, each assuming a different number of digits in the output. A parallel subnetwork then determines how many digits are present. I suspect this approach would work here too, but I did not implement it in this project.
Regarding the second point, I showed above how an R was misread because of a slightly different font. The false detections would be worse when trying to read US plates, which use a greater variety of fonts. One possible solution is to draw the training data from a wider selection of fonts, though it is not clear how many fonts would be needed for this to succeed.
The slowness mentioned in the third point kills many potential applications: processing a moderately sized input image takes a few seconds even on a fairly powerful GPU. I don't think this can be avoided without introducing a cascade-style detection structure, such as a Haar cascade, a HOG detector, or a simpler neural network placed in front of this one.
I would be interested in comparing this approach with other machine learning methods; in particular, pose regression looks promising, possibly followed by a basic classification stage. Using a machine learning library such as scikit-learn should make this similarly straightforward.