Updated 2025-04-04. From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/01 Report--
This article explains how to use pytesseract for text recognition in Python development. It covers installing the Tesseract OCR engine on CentOS 7, installing the pytesseract package, and recognizing the text in an image.
Pytesseract is an open-source Python library for recognizing text in images (OCR, Optical Character Recognition). It can recognize Chinese, English and many other languages. It relies on the tesseract-ocr engine, which must be installed first and is available for Windows, Linux and macOS. With the engine in place, installing the pytesseract library is enough for simple text recognition. Recognition with the bundled language data is sometimes inaccurate, but you can train your own recognition data.
The Tesseract OCR engine was first developed at HP Labs in 1985, and by 1995 it had become one of the three most accurate recognition engines in the OCR industry. However, HP soon decided to leave the OCR business, and Tesseract was shelved.
Some years later, HP concluded that rather than leaving Tesseract on the shelf, it would be better to contribute it to the open-source community and give it new life. In 2005 HP released Tesseract as open source with the help of the Information Science Research Institute at the University of Nevada, Las Vegas, and Google then took on the work of improving it, fixing bugs and optimizing it.
The following steps show how to set up a Python tesseract-ocr environment on CentOS 7 and use Python for simple image recognition.
First install tesseract-ocr. Installation instructions are on GitHub at https://github.com/tesseract-ocr/tesseract/wiki. On CentOS 7 the simplest route is yum, which requires an Internet connection but avoids compiling from source.
# Add the repository that provides tesseract
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
# Update yum
yum update
# Install tesseract
yum install tesseract
# Install the Simplified Chinese language pack
yum install tesseract-langpack-chi_sim
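Once the packages are installed, it can be worth confirming that the engine actually sees the Simplified Chinese data. The following is a minimal sketch, assuming the tesseract binary is on PATH: it runs `tesseract --list-langs` and treats the first line of the output as a header. The helper names parse_langs and installed_langs are made up for this illustration.

```python
import shutil
import subprocess


def parse_langs(listing):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    The first line is assumed to be a header and is skipped; blank lines are ignored.
    """
    lines = listing.splitlines()[1:]
    return [line.strip() for line in lines if line.strip()]


def installed_langs():
    """Return the language codes tesseract reports, or [] if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the list goes to stderr
        text=True,
        check=False,
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    langs = installed_langs()
    print("chi_sim available:", "chi_sim" in langs)
```

If chi_sim is not reported, the language pack step above probably did not complete, and recognition with lang='chi_sim' is likely to fail.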
After installing the engine, install the Python package with pip. The installation command is:
pip install pytesseract
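Because pytesseract is only a wrapper around the engine, a quick check that both pieces are in place can save some confusion before running any recognition code. The sketch below uses only the standard library. check_ocr_setup is a hypothetical helper name, and the check for PIL assumes Pillow is the imaging library in use, as in the recognition code in this article.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report whether the Python packages and the tesseract binary can be found."""
    return {
        # the pip package installed above
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Pillow provides the PIL module used to open images
        "PIL": importlib.util.find_spec("PIL") is not None,
        # full path of the engine binary, or None if it is not on PATH
        "tesseract": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    for name, found in check_ocr_setup().items():
        print(name, "->", found)
```

If the engine is installed somewhere outside PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the binary.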
Simple image recognition code:
import pytesseract
from PIL import Image
# Open the image
image = Image.open('picture path')
# Convert the text in the image into a string
code = pytesseract.image_to_string(image, lang='chi_sim')
# Output the string
print(code)
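Images often mix Chinese and English text. Tesseract accepts several languages joined with `+` in the lang argument, so a variation along the following lines may help. This is a sketch, not part of the original example: it assumes the chi_sim and eng language data are both installed, and join_langs, recognize and the file name sample.png are placeholders chosen for illustration.

```python
def join_langs(*codes):
    """Build the lang string Tesseract expects, for example 'chi_sim+eng'."""
    return "+".join(code.strip() for code in codes if code.strip())


def recognize(path, *codes):
    """Run OCR on the image at `path` using the given language codes."""
    # Imported here so join_langs can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        text = pytesseract.image_to_string(image, lang=join_langs(*codes))
    return text.strip()


# Example call (the path is a placeholder):
# print(recognize("sample.png", "chi_sim", "eng"))
```

Listing only the languages that actually appear in the image is usually the safer choice, since extra languages give the engine more ways to misread a character.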
© 2024 shulou.com SLNews company. All rights reserved.