2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to recognize CAPTCHAs with Python, Pillow, and pytesseract. It covers the environment setup and then works through three examples, each of which preprocesses a CAPTCHA image with OpenCV before passing it to Tesseract for recognition.
I. Environment configuration
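Two libraries are required, pillow and pytesseract, and both can be installed with pip. The examples below also import cv2, which is provided by the opencv-python package and can be installed the same way.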
pip install pillow -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
pip install pytesseract -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
Install Tesseract-OCR (on Windows, run the Tesseract-OCR .exe installer).
Configure the pytesseract library: locate pytesseract.py in the installed package, open it, find the tesseract_cmd variable, and set its value to the full path of the tesseract.exe that was just installed.
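Editing the library file works, but the change is lost whenever pytesseract is reinstalled or upgraded. As an alternative, a script can set pytesseract.pytesseract.tesseract_cmd at run time. The sketch below is one way to do that; the helper names find_tesseract and configure_pytesseract and the Windows path in CANDIDATE_PATHS are assumptions for illustration, not part of the original article.

```python
import shutil
from pathlib import Path

# Assumed install location on Windows; adjust it to your own machine.
CANDIDATE_PATHS = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return the path of a tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # already on the system PATH?
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract(path):
    """Point pytesseract at the given executable; return True on success."""
    if path is None:
        return False
    try:
        import pytesseract
    except ImportError:  # pytesseract is not installed
        return False
    pytesseract.pytesseract.tesseract_cmd = path
    return True


if __name__ == "__main__":
    print(configure_pytesseract(find_tesseract()))
```

If this prints False, either Tesseract was not found in the listed locations or pytesseract is not installed.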
II. CAPTCHA recognition
To recognize a CAPTCHA, the image must first be preprocessed to remove the interference lines and noise that would otherwise reduce recognition accuracy.
Example 1

    import cv2 as cv
    import pytesseract
    from PIL import Image


    def recognize_text(image):
        # Edge-preserving filtering
        dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
        # Convert to grayscale
        gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
        # Binarize (inverted, threshold chosen by Otsu's method)
        ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
        # Morphological operations: erosion followed by dilation
        erode = cv.erode(binary, None, iterations=2)
        dilate = cv.dilate(erode, None, iterations=1)
        cv.imshow('dilate', dilate)
        # Invert so the background is white and the text is black, which is easier to recognize
        cv.bitwise_not(dilate, dilate)
        cv.imshow('binary-image', dilate)
        # Recognize
        test_message = Image.fromarray(dilate)
        text = pytesseract.image_to_string(test_message)
        print(f'Recognition result: {text}')


    src = cv.imread(r'./test/044.png')  # adjust the path to your CAPTCHA image
    cv.imshow('input image', src)
    recognize_text(src)
    cv.waitKey(0)
    cv.destroyAllWindows()
The output is as follows:
Recognition result: 3n3D
Process finished with exit code 0
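The string returned by pytesseract.image_to_string usually ends with whitespace and can contain stray characters. A minimal sketch of two ways to tidy the result follows: passing Tesseract options through the config argument, and filtering the returned text. The helper names build_config and clean_captcha_text are made up for this sketch. The options --psm 7 (treat the image as a single text line) and tessedit_char_whitelist are Tesseract options, but whether the whitelist is honored depends on the Tesseract version and engine in use.

```python
import string

# Characters a typical alphanumeric CAPTCHA may contain (an assumption).
ALLOWED = string.ascii_letters + string.digits


def build_config(whitelist=ALLOWED, psm=7):
    """Build a Tesseract option string: one text line, restricted character set."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_captcha_text(raw, allowed=ALLOWED):
    """Drop whitespace and every character outside the allowed set."""
    return "".join(ch for ch in raw if ch in allowed)


# With pytesseract and Tesseract installed, the helpers would be used like this:
#   text = pytesseract.image_to_string(test_message, config=build_config())
#   print(clean_captcha_text(text))

print(clean_captcha_text("3n3D\n\x0c"))  # prints 3n3D
```

Filtering after the fact is the safer of the two, since it does not rely on how a particular Tesseract build treats the whitelist.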
Example 2

    import cv2 as cv
    import pytesseract
    from PIL import Image


    def recognize_text(image):
        # Edge-preserving filtering
        blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
        cv.imshow('dst', blur)
        # Convert to grayscale
        gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
        # Binarize (inverted, threshold chosen by Otsu's method)
        ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
        print(f'Binarization adaptive threshold: {ret}')
        cv.imshow('binary', binary)
        # Morphological operations: build a structuring element, then apply opening
        kernel = cv.getStructuringElement(cv.MORPH_RECT, (3, 2))
        bin1 = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
        cv.imshow('bin1', bin1)
        # The source passed cv.MORPH_OPEN as the shape; it has the same value as cv.MORPH_ELLIPSE
        kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (2, 3))
        bin2 = cv.morphologyEx(bin1, cv.MORPH_OPEN, kernel)
        cv.imshow('bin2', bin2)
        # Invert so the background is white and the text is black, which is easier to recognize
        cv.bitwise_not(bin2, bin2)
        cv.imshow('binary-image', bin2)
        # Recognize
        test_message = Image.fromarray(bin2)
        text = pytesseract.image_to_string(test_message)
        print(f'Recognition result: {text}')


    src = cv.imread(r'./test/045.png')  # adjust the path to your CAPTCHA image
    cv.imshow('input image', src)
    recognize_text(src)
    cv.waitKey(0)
    cv.destroyAllWindows()
The output is as follows:
Binarization adaptive threshold: 181.0
Recognition result: 8A62N1
Process finished with exit code 0
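The "adaptive threshold" printed above is the value selected by the cv.THRESH_OTSU flag. Otsu's method picks the threshold that maximizes the variance between the two resulting classes of pixels. The pure-Python sketch below illustrates the idea on a flat list of 8-bit gray values; it is not OpenCV's implementation, which may differ in details such as tie-breaking, and the function name otsu_threshold and the sample data are made up for illustration.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance,
    where one class holds values <= t and the other holds values > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is in the lower class; nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Made-up data: 50 dark pixels (value 10) and 50 light pixels (value 200).
sample = [10] * 50 + [200] * 50
print(otsu_threshold(sample))  # prints 10, the first value that separates the two groups
```

Any threshold from 10 to 199 separates the two groups in this sample equally well; the sketch simply reports the first one.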
Example 3

    import cv2 as cv
    import pytesseract
    from PIL import Image


    def recognize_text(image):
        # Edge-preserving filtering to remove noise
        blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
        cv.imshow('dst', blur)
        # Convert to grayscale
        gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
        # Binarize with a manually set threshold; with the adaptive threshold
        # the yellow 4 cannot be extracted
        ret, binary = cv.threshold(gray, 185, 255, cv.THRESH_BINARY_INV)
        print(f'Threshold set for binarization: {ret}')
        cv.imshow('binary', binary)
        # Invert so the background is white and the text is black, which is easier to recognize
        cv.bitwise_not(binary, binary)
        cv.imshow('bg_image', binary)
        # Recognize
        test_message = Image.fromarray(binary)
        text = pytesseract.image_to_string(test_message)
        print(f'Recognition result: {text}')


    src = cv.imread(r'./test/045.jpg')  # adjust the path to your CAPTCHA image
    cv.imshow('input image', src)
    recognize_text(src)
    cv.waitKey(0)
    cv.destroyAllWindows()
The output is as follows:
Threshold set for binarization: 185.0
Recognition result: 7364
Process finished with exit code 0
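Example 3 thresholds with cv.THRESH_BINARY_INV and then inverts the result with cv.bitwise_not. The sketch below mimics both steps on a tiny nested list to show what happens to each pixel. The three functions are plain-Python stand-ins written for this illustration, not OpenCV calls, and the pixel values are made up.

```python
def threshold_binary_inv(gray, thresh, maxval=255):
    """Mimic cv.THRESH_BINARY_INV: pixels above thresh become 0, the rest maxval."""
    return [[0 if p > thresh else maxval for p in row] for row in gray]


def bitwise_not(img):
    """Mimic cv.bitwise_not for 8-bit values."""
    return [[255 - p for p in row] for row in img]


def threshold_binary(gray, thresh, maxval=255):
    """Mimic cv.THRESH_BINARY: pixels above thresh become maxval, the rest 0."""
    return [[maxval if p > thresh else 0 for p in row] for row in gray]


# Made-up image: light background (250) with darker strokes (120 and 100).
gray = [[250, 250, 120],
        [250, 100, 250]]

inverted = threshold_binary_inv(gray, 185)  # strokes white, background black
final = bitwise_not(inverted)               # strokes black, background white

print(final)                                 # [[255, 255, 0], [255, 0, 255]]
print(final == threshold_binary(gray, 185))  # True
```

With no morphological step between the two operations, the pair gives the same result as a plain binary threshold with maxval 255. In Examples 1 and 2 the inverted image is still needed as an intermediate, because erosion, dilation, and opening act on the white regions, which there are the characters.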
That concludes this walkthrough of CAPTCHA recognition with Python, Pillow, and pytesseract. In all three examples the recognition step is the same; what changes is the preprocessing, which has to be tuned to the noise and colors of the particular CAPTCHA.