What are the more practical character recognition interfaces in artificial intelligence?

Updated 2025-04-06. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/01

This article introduces some of the more practical text recognition interfaces in artificial intelligence. Many readers may not be familiar with them, so it is shared here for reference; I hope you get something out of it.

Here are some of the more practical text recognition interfaces. Integrating with the Baidu AI interfaces is easy, and no request signing or encryption is involved. The only drawback is that the interface documentation is incomplete, so it is easy to run into pitfalls. The previous article covered the first practical interface, ID card recognition, using only the front side of the card as an example. That API does not accept an image URL; the image data must be BASE64-encoded. Here is the key code:

Careful readers may notice that my code has changed a great deal since the previous article. Let's first look at how the previous article BASE64-encoded the image:

var image = fs.readFileSync('./../public/images/begin.jpg').toString('base64')

As you can see, the previous article placed the picture in the project's public static folder, read the file synchronously with Node.js (fs.readFileSync), and then BASE64-encoded the data. In real development, however, this logic is unreasonable, for two main reasons:

1. Only photos stored locally in the project can be recognized, which often does not meet real requirements.

2. If too many photos are stored locally, efficiency may drop significantly.

So I have changed the interface accordingly: the front end passes the image URL as a parameter, the back end fetches the image with http.get() and collects the data chunks as they arrive, converts them into a single Buffer object, and finally BASE64-encodes it.
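The flow just described might look roughly like the following. This is a minimal sketch, not the author's actual code: the names `chunksToBase64` and `fetchImageAsBase64` are made up for illustration, and only plain `http` URLs are handled (an `https` URL would need the `https` module instead).

```javascript
const http = require('http');

// Join the chunks received from the response stream into one Buffer
// and BASE64-encode the result.
function chunksToBase64(chunks) {
  return Buffer.concat(chunks).toString('base64');
}

// Hypothetical helper: download an image over plain http and hand its
// BASE64 encoding to an error-first callback.
function fetchImageAsBase64(imageUrl, callback) {
  http.get(imageUrl, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      return callback(new Error('Unexpected status code: ' + res.statusCode));
    }
    const chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () { callback(null, chunksToBase64(chunks)); });
  }).on('error', callback);
}
```

Collecting the chunks in an array and calling `Buffer.concat` once at the end avoids re-copying the data on every `data` event, and keeps binary image data intact (string concatenation would corrupt it).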

We can take a look at the effect of the modified API:

(the picture is from Baidu, fake data)

You can see that the image URL we passed in is converted to BASE64, and the API then successfully parses the text on the ID card. What are the pros and cons of this approach?

Advantages: compared with reading local photos, the user can pass the URL of any picture; the back end buffers the data and BASE64-encodes it, so text recognition works on whatever image the user wants to submit.

Disadvantages: recognizing a local photo from a phone or computer becomes cumbersome. The user first has to upload the photo to a server (such as a Qiniu image server) and then pass the image URL to the interface, which greatly degrades the user experience.

This approach is therefore still not ideal. How could it be improved? I think there are two options:

1. If the image only needs to be parsed and does not need to be saved, the front end lets the user select a local image, encodes it as BASE64 directly, and passes the BASE64 string to the back end, which calls the API to parse the text in the image.

2. If the picture needs to be saved, the front end sends the picture in binary form, and the back end uploads it to the server or directly to object storage (OSS), obtains the image path, loads the buffered data using the existing method, BASE64-encodes it, and finally calls the API to parse the text in the image.
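One detail of the first option is worth sketching. When a browser reads a local file with `FileReader.readAsDataURL`, the result is a data URL such as `data:image/jpeg;base64,...`, whereas the recognition API presumably expects only the bare BASE64 payload (an assumption here; check the API documentation). A back-end helper, hypothetically named `stripDataUrlPrefix`, might normalize the input like this:

```javascript
// Hypothetical helper: accept either a data URL or a bare BASE64 string
// and return only the BASE64 payload.
function stripDataUrlPrefix(input) {
  const marker = ';base64,';
  if (input.startsWith('data:')) {
    const index = input.indexOf(marker);
    if (index !== -1) {
      return input.slice(index + marker.length);
    }
  }
  // No data URL prefix found: assume the input is already bare BASE64.
  return input;
}
```

With this in place, the back end can pass the returned string straight into the `image` parameter regardless of which form the front end sent.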

I did not optimize this part any further. If you are interested, you can combine it with my earlier article on uploading images, Node upload file (1), and continue optimizing the interface.

Now that the approach behind the current interface implementation has been explained, we can implement several practical interfaces and look at the results. Let's start with bank card photo recognition.

Bank card photo recognition

The purpose of this interface is clear from its name: it recognizes a bank card and returns the card number, expiration date, issuing bank, and card type. First, let's look at how the documentation describes the interface:

The request parameters of this API are simple and clear: access_token is used for authentication, and image is the BASE64 encoding of the image data. Next, the key code:
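The author's original code is not reproduced in this copy of the article, so the following is only a rough sketch of what such a call could look like. The endpoint path is an assumption based on Baidu's public OCR documentation and should be verified there; `buildOcrRequest` and `postOcr` are hypothetical names, and the form-encoded POST body is likewise an assumption about how the API expects the `image` parameter.

```javascript
const https = require('https');
const querystring = require('querystring');

// Assumption: bank card endpoint as listed in Baidu's public OCR docs.
const BANKCARD_URL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/bankcard';

// Build the request URL (access_token in the query string) and the
// form-encoded body (image as URL-encoded BASE64).
function buildOcrRequest(endpoint, accessToken, imageBase64) {
  return {
    url: endpoint + '?access_token=' + encodeURIComponent(accessToken),
    body: querystring.stringify({ image: imageBase64 })
  };
}

// Hypothetical helper: POST the image and pass the parsed JSON response
// to an error-first callback.
function postOcr(endpoint, accessToken, imageBase64, callback) {
  const request = buildOcrRequest(endpoint, accessToken, imageBase64);
  const req = https.request(request.url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(request.body)
    }
  }, function (res) {
    const chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      let data;
      try {
        data = JSON.parse(Buffer.concat(chunks).toString('utf8'));
      } catch (err) {
        return callback(err);
      }
      callback(null, data);
    });
  });
  req.on('error', callback);
  req.end(request.body);
}
```

A call would then look like `postOcr(BANKCARD_URL, token, imageBase64, function (err, data) { ... })`. Because only the endpoint differs between interfaces, the same helper could in principle serve the other recognition interfaces in this article as well.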

We can test whether the interface extracts valid information from a bank card photo:

You can see that, given the image URL, the interface successfully parses the basic information on the bank card photo. Here is the documentation's description of the return parameters:

Business license recognition

As the name implies, this interface recognizes photos of a business license and returns the values of key fields, including company name, type, legal representative, address, validity period, certificate number, social credit code, and so on. First, the documentation's description of the request parameters:

The required parameters are again access_token and image. There are also two optional parameters that can be passed as needed. Next, the key code:

I picked a business license image link to test the interface:

You can see that the business information was parsed successfully. Too much information was returned to fit in one screenshot, so only part of it is shown. Here is the documentation's description of the return parameters:

Passport recognition

This interface supports structured recognition of the information page of mainland China resident passports, including country code, name, pinyin, gender, passport number, date of birth, place of birth, date of issue, period of validity, and place of issue. First, the documentation's description of the request parameters:

The request parameters are again very concise: only access_token for authentication and the BASE64-encoded image parameter are needed. Here is the key code:

Let's test the interface directly:

Here is the documentation's description of the return parameters, which explains the meaning of each field:

Table text recognition (two interfaces)

This interface automatically identifies table lines and table contents, and outputs the header, footer, and the text content of each cell in structured form. It is asynchronous and is split into two APIs: a submit request API and a get result API. The use of both is described below.

Submit request API

First, the documentation's description of the request parameters:

This API again has two required parameters: access_token and image. Here is the key code:

We can take a look at the documentation of the request API about the return parameters:

In other words, when we submit a request with image, a request_id is returned, and we can then obtain the parsing result by passing that request_id to the get result API. Now let's look at the interface in action:

You can see that we have successfully obtained the request_id. Next, let's look at the second interface, which fetches the parsed table. First, the description of the request parameters:

The parsing result is retrieved using access_token and the request_id obtained in the previous step. The result_type parameter specifies whether the result is returned in excel or json format. Next, we extend the code from the first step with the second step:
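The two-step flow can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the two endpoint paths and the shape of the submit response (`result[0].request_id`) are assumptions based on Baidu's public documentation, `recognizeTable` is a hypothetical name, and the `post(url, params, callback)` function is injected so that any request helper with that error-first shape can be plugged in.

```javascript
// Assumptions: endpoint paths as listed in Baidu's public docs; verify before use.
const SUBMIT_URL = 'https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/request';
const RESULT_URL = 'https://aip.baidubce.com/rest/2.0/solution/v1/form_ocr/get_request_result';

// Hypothetical helper: submit the image, then fetch the result with the
// returned request_id. `post` is any function (url, params, callback).
function recognizeTable(post, accessToken, imageBase64, resultType, callback) {
  const query = '?access_token=' + encodeURIComponent(accessToken);

  // Step 1: submit the image and read the request_id from the response.
  post(SUBMIT_URL + query, { image: imageBase64 }, function (err, data) {
    if (err) return callback(err);
    // Assumed response shape: { result: [ { request_id: '...' } ] }
    const first = data && Array.isArray(data.result) ? data.result[0] : null;
    if (!first || !first.request_id) {
      return callback(new Error('No request_id in submit response'));
    }

    // Step 2: ask for the parsed table in the requested format.
    post(RESULT_URL + query, {
      request_id: first.request_id,
      result_type: resultType // 'json' or 'excel'
    }, callback);
  });
}
```

Because recognition is asynchronous, the result may not be ready when the second request is made; production code would probably need to retry the second step after a short delay until the response reports completion.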

The first call,

PostHelper.baseRequestBase(url, param, function (err, data) { ... })

returns the request_id. A second request carrying that request_id then fetches the parsed table. Let's look at the returned result:

You can see that the data in the table picture has been parsed successfully. If we instead set the result_type parameter of the second request to excel, the API converts the data in the table image into an Excel spreadsheet and returns a link to it. We can test the effect:

You can see that the interface returns the download link for the Excel spreadsheet. One benefit is that we can, for example, take a screenshot of a data table from a database design document, feed it to the interface, and get a spreadsheet link to download, saving the time and effort of building the table by hand.

Those are the main interfaces that are both practical and free. Baidu AI actually provides a whole series of interfaces, but I am not going to introduce them one by one here. If you are interested, you can consult the Baidu AI text recognition documentation yourself:

https://ai.baidu.com/docs#/OCR-API/87932804

In fact, if face recognition, text recognition, and other AI interfaces are introduced appropriately into business development, they can greatly improve the user experience and help retain potential customers. Baidu AI also provides a further set of interfaces that require applying for permission, which makes our development considerably easier. We do not have to investigate how the underlying layer recognizes the text in a picture; we can simply call the API to get the recognition features we need.

That is all for this article, "What are the more practical text recognition interfaces in artificial intelligence?" Thank you for reading! I hope the content shared here is helpful.
