2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to use the Tencent Cloud function SCF to quickly deploy a CAPTCHA recognition interface in a Serverless setup. Many readers may not be familiar with the approach, so the steps are summarized below.
CAPTCHA recognition is a problem that crawlers and automation scripts can hardly avoid. The recognition program is usually deployed either locally or on a server. Deploying it on a server means building and configuring the network environment and writing the calling interface, which is tedious and time-consuming.
With the Tencent Cloud function SCF, a local CAPTCHA recognition program can now be published online quickly, which greatly improves development efficiency.
Results
The recognition results are quite good, at times even better than recognition by the naked eye.
Steps
The traditional CAPTCHA recognition process is:
Image preprocessing (grayscale conversion, denoising, segmentation, binarization, interference-line removal, etc.)
CAPTCHA character feature extraction (SVM, CNN, etc.)
CAPTCHA recognition
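To make the preprocessing stage concrete, here is a minimal sketch of grayscale conversion followed by binarization on a tiny in-memory image. It uses plain Python lists instead of PIL so that it runs anywhere. The helper names `to_gray` and `binarize` and the sample pixels are assumptions made for illustration; the threshold of 110 is borrowed from the article's clearline function.

```python
def to_gray(rgb_pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    # ITU-R 601 luma weights, the weighting PIL documents for convert('L')
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in rgb_pixels]


def binarize(gray_pixels, threshold=110):
    """Map every gray level to pure white (255) or pure black (0)."""
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_pixels]


# Hypothetical 2x2 image: white, black, light gray, dark blue-gray
image = [[(255, 255, 255), (0, 0, 0)],
         [(200, 200, 200), (50, 60, 70)]]

gray = to_gray(image)
black_and_white = binarize(gray)
print(gray)
print(black_and_white)
```

Light pixels end up white and dark pixels end up black, which is what the later pixel-by-pixel comparison relies on.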
Next, I will walk you through creating, writing, and publishing an online CAPTCHA recognition cloud function.
Step 1: Create a new Python cloud function
See the earlier article in this series, "Everything Can Be Serverless: Using SCF+COS to Quickly Develop Full-Stack Applications."
Step 2: Write the CAPTCHA recognition cloud function
Life is short, show me the code.
I will use the simplest possible CAPTCHA recognition program as an example and go straight to the code.
import io
import os
import time
from PIL import Image as image
import json

# Character features (one list of 432 binary values per character)
chars = {
    '1': [1, 1, 1, 0, 1, ...],
    '2': [1, 0, 0, 1, 0, ...],
    '3': [0, 1, 0, 0, 1, ...],
    # other character features...
}

# Grayscale conversion
def covergrey(img):
    return img.convert('L')

# Remove the CAPTCHA border (paint a 3-pixel frame white)
def clearedge(img):
    for y in range(img.size[1]):
        img.putpixel((0, y), 255)
        img.putpixel((1, y), 255)
        img.putpixel((2, y), 255)
        img.putpixel((img.size[0] - 1, y), 255)
        img.putpixel((img.size[0] - 2, y), 255)
        img.putpixel((img.size[0] - 3, y), 255)
    for x in range(img.size[0]):
        img.putpixel((x, 0), 255)
        img.putpixel((x, 1), 255)
        img.putpixel((x, 2), 255)
        img.putpixel((x, img.size[1] - 1), 255)
        img.putpixel((x, img.size[1] - 2), 255)
        img.putpixel((x, img.size[1] - 3), 255)
    return img

# Remove interference lines and convert the image to black and white
def clearline(img):
    for y in range(img.size[1]):
        for x in range(img.size[0]):
            if int(img.getpixel((x, y))) >= 110:
                img.putpixel((x, y), 0xff)
            else:
                img.putpixel((x, y), 0x0)
    return img

# Denoising; pnum is the denoising threshold
def del_noise(im, pnum=3):
    w, h = im.size
    white = 255
    black = 0
    # Paint the outermost rows and columns white
    for i in range(0, w):
        im.putpixel((i, 0), white)
        im.putpixel((i, h - 1), white)
    for i in range(0, h):
        im.putpixel((0, i), white)
        im.putpixel((w - 1, i), white)
    for i in range(1, w - 1):
        for j in range(1, h - 1):
            val = im.getpixel((i, j))
            # Count black pixels in the 3x3 window centered on (i, j)
            cnt = 0
            for ii in range(-1, 2):
                for jj in range(-1, 2):
                    if im.getpixel((i + ii, j + jj)) == black:
                        cnt += 1
            if val == black:
                # An isolated black pixel is treated as noise
                if cnt < pnum:
                    im.putpixel((i, j), white)
            else:
                # A white pixel almost surrounded by black is filled in
                if cnt >= 7:
                    im.putpixel((i, j), black)
    return im

# Binarize image data
def two_value(code_data):
    table = []
    for i in code_data:
        if i < 140:
            table.append(0)
        else:
            table.append(1)
    return table

# Image preprocessing
def pre_img(img):
    img = covergrey(img)  # remove color
    img = clearedge(img)  # remove border
    img = clearline(img)  # remove interference lines
    img = del_noise(img)  # remove noise
    return img

# Process image data
def data_img(img):
    code_data = []  # CAPTCHA data list
    for i in range(4):  # cut out the four characters
        x = 5 + i * 18  # the cutting positions can be measured in an image editor such as PS
        # Each crop is 18 x 24 pixels, matching the 432 values compared in identify();
        # the bottom coordinate 33 is inferred from that size
        code_data.append(img.crop((x, 9, x + 18, 33)).getdata())
        code_data[i] = two_value(code_data[i])  # binarize the data
    return code_data

# CAPTCHA recognition
def identify(data):
    code = [''] * 4  # CAPTCHA character list
    diff_min = [432] * 4  # initial minimum distance: number of mismatched data points (432 in total)
    for char in chars:  # traverse the known characters (each is compared against all 4 positions)
        diff = [0] * 4  # distance per position (reset before judging each character)
        for i in range(4):  # the four CAPTCHA characters
            for j in range(432):  # compare features pixel by pixel
                if data[i][j] != chars[char][j]:
                    diff[i] += 1  # distance + 1
        for i in range(4):
            if diff[i] < diff_min[i]:  # smaller than the current distance (a better match)
                diff_min[i] = diff[i]  # update the minimum distance
                code[i] = char  # update the best match
    return ''.join(code)  # output the result

def predict(imgs):
    code = ''
    img = imgs.read()  # imgs must be a file-like object that supports read()
    img = image.open(io.BytesIO(img))
    img = pre_img(img)  # preprocess the image
    data = data_img(img)  # get the image data
    code = identify(data)  # recognize the CAPTCHA
    return code

def apiReply(reply, code=200):
    return {
        "isBase64Encoded": False,
        "statusCode": code,
        "headers": {'Content-Type': 'application/json', "Access-Control-Allow-Origin": "*"},
        "body": json.dumps(reply, ensure_ascii=False)
    }

def main_handler(event, context):
    main_start = time.time()
    flag = True if 'image' in event['queryString'] else False
    code = predict(event['queryString']['image']) if 'image' in event['queryString'] else 'invalid request'
    return apiReply({'ok': flag, 'code': code, 'spendTime': str(time.time() - main_start)})
As usual, let's first walk through the flow of the whole cloud function.
def main_handler(event, context):
    main_start = time.time()
    flag = True if 'image' in event['queryString'] else False
    code = predict(event['queryString']['image']) if 'image' in event['queryString'] else 'invalid request'
    return apiReply({'ok': flag, 'code': code, 'spendTime': str(time.time() - main_start)})
First, we read the query string of the API request from the event argument and check whether the image parameter is present. If it is missing, we return a response saying that the request is invalid.
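The parameter check can be tried locally without deploying anything. The sketch below is a simplified stand-in, not the article's handler: `handle` and `api_reply` are hypothetical names, `recognize` is a stub that replaces the real predict function, and the event is a hand-built dict that only mimics the `queryString` key used above.

```python
import json


def api_reply(reply, status=200):
    """Wrap a dict in the API-gateway style response used in the article."""
    return {
        "isBase64Encoded": False,
        "statusCode": status,
        "headers": {"Content-Type": "application/json",
                    "Access-Control-Allow-Origin": "*"},
        "body": json.dumps(reply, ensure_ascii=False),
    }


def handle(event, recognize):
    # Hypothetical helper: `recognize` stands in for the real predict()
    query = event.get("queryString") or {}
    if "image" not in query:
        return api_reply({"ok": False, "code": "invalid request"})
    return api_reply({"ok": True, "code": recognize(query["image"])})


# Hand-built events; a real trigger event carries many more fields
missing = handle({"queryString": {}}, recognize=lambda image: "1234")
present = handle({"queryString": {"image": "..."}}, recognize=lambda image: "1234")
print(json.loads(missing["body"]))
print(json.loads(present["body"]))
```

Passing the recognizer in as an argument keeps the request handling testable on its own, separate from the image code.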
def predict(imgs):
    code = ''
    img = imgs.read()
    img = image.open(io.BytesIO(img))
    img = pre_img(img)  # preprocess the image
    data = data_img(img)  # get the image data
    code = identify(data)  # recognize the CAPTCHA
    return code
If the image request parameter exists, the predict function is called to recognize the CAPTCHA. The process is as follows:
Read the CAPTCHA image
Preprocess the CAPTCHA image
Recognize the preprocessed CAPTCHA
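The article does not say how the image is encoded in the image parameter, while predict expects an object with a read() method. One possible arrangement, shown here purely as an assumption, is for the client to send the image bytes as URL-safe base64 text, which the function turns into a file-like object before recognition. `to_file_like` is a hypothetical helper name.

```python
import base64
import io


def to_file_like(image_param):
    """Decode a URL-safe base64 string into a file-like object.

    Assumption: the client base64-encodes the raw image bytes. The
    article does not specify the encoding of the `image` parameter.
    """
    # Restore any '=' padding that was stripped for use in a URL
    padded = image_param + "=" * (-len(image_param) % 4)
    return io.BytesIO(base64.urlsafe_b64decode(padded))


# Round trip with placeholder bytes standing in for a real image file
raw = b"\x89PNG placeholder bytes"
param = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
stream = to_file_like(param)
print(stream.read() == raw)
```

The returned object supports read(), so it could be handed to a function shaped like predict.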
# Image preprocessing
def pre_img(img):
    img = covergrey(img)  # remove color
    img = clearedge(img)  # remove border
    img = clearline(img)  # remove interference lines
    img = del_noise(img)  # remove noise
    return img
Now let's look at the image preprocessing steps:
Remove the color from the CAPTCHA and convert it to a grayscale image
Remove the black border of the CAPTCHA
Remove the CAPTCHA's interference lines
Remove the CAPTCHA's noise
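The noise-removal step can be illustrated on a small grid of pixel values. This is a sketch of the same neighborhood-counting idea, assuming a list-of-lists image rather than a PIL object. Unlike the article's del_noise, it writes into a copy instead of modifying the image while scanning, and it leaves the outermost pixels untouched. The name `denoise` and the `fill` parameter are illustrative.

```python
WHITE, BLACK = 255, 0


def denoise(grid, pnum=3, fill=7):
    """Clean a black-and-white grid using 3x3 neighborhood counts."""
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # work on a copy of the input
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Black pixels in the 3x3 window, center pixel included
            blacks = sum(grid[y + dy][x + dx] == BLACK
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            if grid[y][x] == BLACK and blacks < pnum:
                out[y][x] = WHITE  # isolated speck: erase it
            elif grid[y][x] == WHITE and blacks >= fill:
                out[y][x] = BLACK  # hole inside a stroke: fill it
    return out


# A lone black pixel in the middle of a white 5x5 image
speck = [[WHITE] * 5 for _ in range(5)]
speck[2][2] = BLACK
cleaned = denoise(speck)
print(cleaned[2][2])
```

A larger `pnum` removes bigger specks at the cost of eroding thin strokes, so the value is worth tuning per CAPTCHA style.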
# Character features
chars = {
    '1': [1, 1, 1, 0, 1, 1, ...],
    '2': [1, 0, 0, 1, 0, ...],
    '3': [0, 1, 0, 0, 1, ...],
    # other character features...
}

# CAPTCHA recognition
def identify(data):
    code = [''] * 4  # CAPTCHA character list
    diff_min = [432] * 4  # initial minimum distance: number of mismatched data points (432 in total)
    for char in chars:  # traverse the known characters (each is compared against all 4 positions)
        diff = [0] * 4  # distance per position (reset before judging each character)
        for i in range(4):  # the four CAPTCHA characters
            for j in range(432):  # compare features pixel by pixel
                if data[i][j] != chars[char][j]:
                    diff[i] += 1  # distance + 1
        for i in range(4):
            if diff[i] < diff_min[i]:  # smaller than the current distance (a better match)
                diff_min[i] = diff[i]  # update the minimum distance
                code[i] = char  # update the best match
    return ''.join(code)  # output the result
PS: The character features in chars are not complete in this article; you may need to extract the full set of features yourself.
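The article does not describe how its features were extracted. One simple option, offered only as an assumption, is to label a few binarized character crops by hand and keep, for every pixel position, the value that most samples agree on. `build_features` is a hypothetical helper, and the 3-value vectors below stand in for real 432-value ones.

```python
def build_features(samples):
    """Build one 0/1 template per character by per-pixel majority vote.

    samples maps a character to a list of equal-length 0/1 vectors,
    each taken from a hand-labelled, binarized character crop.
    """
    features = {}
    for char, vectors in samples.items():
        count = len(vectors)
        # zip(*vectors) yields the values observed at each pixel position
        features[char] = [1 if 2 * sum(column) >= count else 0
                          for column in zip(*vectors)]
    return features


# Toy data: three labelled samples of the character '1'
labelled = {'1': [[1, 0, 1], [1, 1, 0], [1, 0, 0]]}
templates = build_features(labelled)
print(templates)
```

Voting across several samples makes a template less sensitive to leftover noise in any single crop.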
Finally, let's look at how the CAPTCHA is recognized. Here we crudely take every pixel of the processed image as the character's features (simplicity is the ultimate sophistication). Each character to be recognized is then compared against the features of all known characters one by one, and the most similar character is taken as the result.
If all goes well, you will get the correct recognition result.
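The comparison described here amounts to nearest-template matching under Hamming distance, that is, counting the positions where two binary vectors differ. Below is a minimal sketch using 6-value toy vectors in place of the 432-value ones; the names `hamming` and `classify` and the template data are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(feature, templates):
    """Return the template character closest to the given feature vector."""
    return min(templates, key=lambda char: hamming(feature, templates[char]))


# Toy templates; real ones would hold 432 values per character
templates = {
    '1': [0, 1, 0, 0, 1, 0],
    '7': [1, 1, 1, 0, 0, 1],
}
unknown = [0, 1, 0, 0, 1, 1]  # differs from '1' in one position

print(hamming(unknown, templates['1']), hamming(unknown, templates['7']))
print(classify(unknown, templates))
```

On a tie, `min` keeps the first template it encounters, which matches the strict less-than comparison in identify.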
That covers the rapid deployment of a CAPTCHA recognition API in Serverless using the cloud function SCF.