
How to use pytesseract for character recognition in Python development

Updated: 2025-04-04. From: SLTechnology News&Howtos


Shulou (Shulou.com), 06/01 report

This article explains how to use pytesseract for text recognition in Python development, walking through the environment setup and a basic recognition script with a practical example.

pytesseract is an open-source Python library for recognizing text in images. It is a wrapper around the Tesseract OCR (Optical Character Recognition) engine and can recognize Chinese, English, and many other languages. Before using it, you must install the tesseract-ocr engine, which is available for Windows, Linux, and macOS, and then install the pytesseract library. With the language data that ships with the engine you can already do simple text recognition. Results are sometimes inaccurate, but you can train your own recognition data to improve them.
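To make the split between engine and wrapper concrete: pytesseract does its work by running the `tesseract` command-line program as a subprocess. The sketch below is a simplified stand-in, not pytesseract's actual implementation, that calls the engine directly using only the standard library. It assumes Python 3.7+ and a `tesseract` binary on `PATH`; `build_tesseract_command` and `ocr_via_cli` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build a tesseract CLI call that prints the recognized text.

    Passing "stdout" as the output base tells tesseract to write the text
    to standard output instead of creating a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_via_cli(image_path: str, lang: str = "eng") -> str:
    """Run the tesseract engine directly (hypothetical helper, not pytesseract)."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Only builds the command; nothing is executed here.
    print(build_tesseract_command("picture.png", lang="chi_sim"))
```

In practice pytesseract handles image conversion, temporary files, and error reporting for you, which is why the rest of this article uses it rather than the raw CLI.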

Tesseract's OCR engine was originally developed at HP Labs starting in 1985, and by 1995 it was one of the three most accurate engines in the OCR industry. However, HP soon abandoned the OCR business, and Tesseract was shelved.

A few years later, HP decided that rather than leaving Tesseract on the shelf, it would be better to contribute it to the open-source community and give it new life. In 2005, Tesseract was handed over to the Information Science Research Institute at the University of Nevada, Las Vegas, and Google was asked to improve it, fix bugs, and optimize it.

The following steps show how to set up a Python tesseract-ocr environment on CentOS 7 and use Python for simple image recognition.

First, install tesseract-ocr. Installation instructions are available on GitHub at https://github.com/tesseract-ocr/tesseract/wiki. On CentOS 7 the simplest approach is to install with yum: it requires an Internet connection, but you do not need to compile the source code.

# Add the tesseract repository to the yum configuration
# (yum-config-manager is provided by the yum-utils package)
sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
# Update yum
sudo yum update
# Install tesseract
sudo yum install tesseract
# Install the Simplified Chinese language pack
sudo yum install tesseract-langpack-chi_sim
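After these packages are installed, `tesseract --list-langs` prints the language codes the engine can load, which is a quick way to confirm the Simplified Chinese pack (`chi_sim`) is visible. The Python sketch below wraps that call with the standard library. It assumes Python 3.7+ and a `tesseract` binary on `PATH` (it returns an empty list otherwise); `parse_list_langs` and `installed_languages` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The first line is a header such as 'List of available languages (3):';
    every following non-empty line is one language code.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> List[str]:
    """Ask the installed tesseract engine which language packs it can load."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some tesseract versions print this list to stderr, so check both streams.
    return parse_list_langs("\n".join([result.stdout, result.stderr]))


if __name__ == "__main__":
    print("chi_sim installed:", "chi_sim" in installed_languages())
```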

After installing the engine, install the Python package with pip:

pip install pytesseract
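Before running any recognition, it can help to confirm that pytesseract is importable and can actually find the engine. The sketch below uses `pytesseract.get_tesseract_version()` and `pytesseract.TesseractNotFoundError`, both part of pytesseract's public interface; `pytesseract_status` is a hypothetical helper name, and the `/usr/bin/tesseract` path in the comment is only an example.

```python
import importlib.util
from typing import Optional, Tuple


def pytesseract_status() -> Tuple[bool, Optional[str]]:
    """Return (pytesseract importable, tesseract engine version or None)."""
    if importlib.util.find_spec("pytesseract") is None:
        return False, None
    import pytesseract

    # If tesseract is installed outside PATH, point pytesseract at it, e.g.:
    # pytesseract.pytesseract.tesseract_cmd = "/usr/bin/tesseract"
    try:
        return True, str(pytesseract.get_tesseract_version())
    except pytesseract.TesseractNotFoundError:
        # The Python package is present, but the engine binary was not found.
        return True, None


if __name__ == "__main__":
    installed, version = pytesseract_status()
    print("pytesseract installed:", installed)
    print("tesseract engine version:", version)
```

If the first value is `True` but the version is `None`, the pip install worked but the engine from the previous step is missing or not on `PATH`.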

A simple image recognition example:

import pytesseract
from PIL import Image

# Open the picture
image = Image.open('picture path')

# Convert the text in the picture into a string (chi_sim = Simplified Chinese)
text = pytesseract.image_to_string(image, lang='chi_sim')

# Output the string
print(text)
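As noted earlier, recognition is sometimes inaccurate. Besides training custom data, two general techniques (not from the original article) often help: simple image preprocessing and discarding low-confidence words. The sketch below binarizes the image with Pillow and then uses `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` to get per-word confidences. It assumes Pillow and pytesseract are installed; the threshold of 150 and cutoff of 60 are arbitrary illustrative values, and `recognize_words`/`confident_words` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def confident_words(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[str]:
    """Keep recognized words whose confidence is at least min_conf.

    `data` is shaped like image_to_data(..., output_type=Output.DICT) output:
    parallel lists under "text" and "conf", with conf -1 for non-word rows.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be a str or a number depending on the pytesseract version
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def recognize_words(
    image_path: str, lang: str = "chi_sim", threshold: int = 150, min_conf: float = 60.0
) -> List[str]:
    """Grayscale and binarize the image, then OCR it and keep confident words."""
    # Imported here so confident_words() works without Pillow/pytesseract.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        gray = img.convert("L")  # convert() returns a new, fully loaded image
    # Simple global threshold: pixels brighter than `threshold` become white.
    binary = gray.point(lambda p: 255 if p > threshold else 0)
    data = pytesseract.image_to_data(
        binary, lang=lang, output_type=pytesseract.Output.DICT
    )
    return confident_words(data, min_conf=min_conf)
```

A fixed global threshold is the simplest choice; images with uneven lighting may need a different threshold or no binarization at all, so compare results on your own pictures.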
