Updated 2025-04-16. From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/03 Report--
This article shows how to use selenium and opencv in a Python crawler to solve a slider CAPTCHA and simulate logging in to Zhihu. I hope you find it useful. Let's go through it together.
Sliding verification distance
Download two images, the CAPTCHA background and the slider piece. Process each with the opencv library: apply a Gaussian blur to reduce noise, then run Canny edge detection. Finally, call matchTemplate to locate the slider's outline in the background; the x coordinate of the best match is the sliding distance. Note that for Zhihu's CAPTCHA, the result must be offset a further 10px to the right.
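To make the matching step more concrete, here is a minimal pure-Python sketch of the score that cv.TM_CCOEFF_NORMED is documented to compute: a zero-mean normalized correlation between the template and each window of the image. It is an illustration on a toy edge map, not opencv's implementation; the helper names `ccoeff_normed` and `best_match_x` and the toy data are assumptions made for this example.

```python
import math


def ccoeff_normed(patch, template):
    """Zero-mean normalized correlation of two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A flat window has zero variance; treat it as "no match".
    return num / den if den else 0.0


def best_match_x(image, template):
    """Slide the template over the image; return (x, score) of the best window."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_x = -2.0, 0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ccoeff_normed(patch, template)
            if score > best_score:
                best_score, best_x = score, x
    return best_x, best_score


# Toy "edge map": a hollow 3x3 square outline placed at x=7, y=2.
template = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
image = [[0] * 12 for _ in range(6)]
for dy in range(3):
    for dx in range(3):
        image[2 + dy][7 + dx] = template[dy][dx]

x, score = best_match_x(image, template)
print(x, score)
```

The x value plays the role of max_loc[0] in the article's code. On real CAPTCHA images the best score is usually well below 1, which is why only the location of the maximum is used rather than a fixed threshold.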
    def get_distance(self, bg_img_path='./bg.png', slider_img_path='./slider.png'):
        """Get the distance the slider must move."""
        # Background image processing
        bg_img = cv.imread(bg_img_path, 0)  # read as grayscale
        bg_img = cv.GaussianBlur(bg_img, (3, 3), 0)  # Gaussian blur to reduce noise
        bg_img = cv.Canny(bg_img, 50, 150)  # Canny edge detection
        # The slider image gets the same treatment
        slider_img = cv.imread(slider_img_path, 0)
        slider_img = cv.GaussianBlur(slider_img, (3, 3), 0)
        slider_img = cv.Canny(slider_img, 50, 150)
        # Find the best match
        res = cv.matchTemplate(bg_img, slider_img, cv.TM_CCOEFF_NORMED)
        # Minimum and maximum values, and the locations of each
        min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
        # e.g. (-0.0577279390818596, 0.3098162417411804, (0, 0), (196, 1))
        top_left = max_loc[0]  # x coordinate
        return top_left

Slider trajectory
To imitate human behavior, keep sliding a little past the gap, then fall back to the exact position.
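The overshoot-then-return idea can be sketched as a standalone function. This is a variant for illustration, not the article's method: the name `build_tracks`, the `overshoot` and `rng` parameters, the one-pixel minimum step, and the clamping of each step are all assumptions added here so that the loop always terminates and the steps sum exactly to the requested distance.

```python
import random


def build_tracks(distance, overshoot=10, rng=None):
    """Return per-step x offsets that overshoot the target, then back up.

    Assumes `distance` and `overshoot` are non-negative integers.
    """
    rng = rng or random.Random()
    target = distance + overshoot  # slide past the gap first
    tracks, current, v, t = [], 0, 0.0, 0.2
    while current < target:
        # Accelerate for the first 5/8 of the way, then decelerate.
        if current < target * 5 / 8:
            a = rng.randint(1, 3)
        else:
            a = -rng.randint(2, 4)
        step = round(v * t + 0.5 * a * t * t)
        step = max(step, 1)                 # assumption: always advance >= 1px
        step = min(step, target - current)  # assumption: never pass the overshoot
        tracks.append(step)
        current += step
        v = max(v + a * t, 0.0)             # assumption: speed never goes negative
    # Fall back by exactly `overshoot` pixels in small steps.
    remaining = overshoot
    while remaining > 0:
        back = min(rng.randint(1, 3), remaining)
        tracks.append(-back)
        remaining -= back
    return tracks


tracks = build_tracks(120, overshoot=10, rng=random.Random(0))
print(len(tracks), sum(tracks))
```

Passing a seeded `random.Random` makes a run reproducible for debugging; in real use the default unseeded generator gives a different track each time.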
    def get_tracks(self, distance):
        '''Sliding track'''
        tracks = []
        v = 0
        t = 0.2  # unit time
        current = 0  # current displacement of the slider
        distance += 10  # overshoot by 10px, then back up
        while current < distance:
            if current < distance * 5 / 8:
                a = random.randint(1, 3)
            else:
                a = -random.randint(2, 4)
            v0 = v  # initial velocity
            track = v0 * t + 0.5 * a * (t ** 2)  # distance covered in this step
            tracks.append(round(track))  # add to the track
            current += round(track)
            v = v0 + a * t
        # Fall back to the approximate position
        for i in range(5):
            tracks.append(-random.randint(1, 3))
        return tracks

Mouse slide operation
Use selenium's mouse action chain (ActionChains) to drag the slider along the track.
    def mouse_move(self, slide, tracks):
        '''Drag the slider with the mouse.'''
        # Click and hold the slider
        ActionChains(self.driver).click_and_hold(slide).perform()
        # Move along the track
        for track in tracks:
            ActionChains(self.driver).move_by_offset(track, 0).perform()
        ActionChains(self.driver).release(slide).perform()

Circumventing Zhihu's selenium detection
When crawling Zhihu with selenium automation, login fails with the message: error code 10001: exception, please upgrade the client and try again. This happens because Zhihu can detect that the browser is being driven by a selenium script.
The workaround is to start Chrome in remote debugging mode and have selenium attach to that already running browser, which keeps the website from detecting selenium.
Add an environment variable
Add the directory containing chrome.exe to the system PATH environment variable, for example C:\Program Files\Google\Chrome\Application, so that you can start the browser by typing chrome.exe on the command line.
Open a cmd window and run the command:
    chrome.exe --remote-debugging-port=9222 --user-data-dir="E:\eliwang\selenium_data"
Make sure the port is not already in use. user-data-dir sets the directory that holds the browser profile; you can choose any path.
A browser window opens with a new tab.
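The launch step above could also be prepared from Python. The sketch below only builds the argument list and offers a simple check that the port is free; the helper names `port_is_free` and `chrome_debug_command` are hypothetical, and the path is the example path from this article. The two flags are the ones used in the command above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
    except OSError:
        return False
    return True


def chrome_debug_command(port, user_data_dir, chrome="chrome.exe"):
    """Build the argument list for starting Chrome in remote debugging mode."""
    return [
        chrome,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


cmd = chrome_debug_command(9222, r"E:\eliwang\selenium_data")
print(cmd)
# To launch, one option is subprocess.Popen(cmd) once port_is_free(9222) is True.
```

Because the arguments are passed as a list, the profile path needs no extra quoting even if it contains spaces.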
Key code for selenium to take over the browser
    options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

Closing the browser window
1. Call the browser object's close() method, not its quit() method.
2. Open and close the browser manually.
Complete login code
    # coding:utf-8
    import cv2 as cv
    import time
    import random
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as WAIT
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By
    from urllib.request import urlretrieve


    class Zhihu_login:
        '''Zhihu simulated login'''

        def __init__(self):
            options = webdriver.ChromeOptions()
            # Take control of the Chrome instance started in remote debugging mode
            options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
            self.driver = webdriver.Chrome(options=options)
            self.wait = WAIT(self.driver, 5)
            self.url = 'https://www.zhihu.com/'
            self.bg_img_path = './bg.png'
            self.slider_img_path = './slider.png'

        def run(self):
            '''Execution entry point'''
            self.driver.get(self.url)
            try:
                if WAIT(self.driver, 3).until(EC.presence_of_element_located((By.ID, 'Popover15-toggle'))):
                    print('Login succeeded')
                    self.save_cookie()
                    self.driver.close()
            except Exception:
                # Switch to password login
                self.wait.until(EC.element_to_be_clickable((By.XPATH, '//div[contains(@class,"SignFlow-tabs")]/div[2]'))).click()
                name_input = self.driver.find_element(By.NAME, 'username')
                name_input.clear()
                name_input.send_keys('account')
                pass_input = self.driver.find_element(By.NAME, 'password')
                pass_input.clear()
                pass_input.send_keys('password')
                # Click the login button
                self.wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@type="submit"]'))).click()
                time.sleep(1)
                # Slide verification, with up to 5 attempts
                if self.slide_verify():
                    print('Login succeeded')
                    self.save_cookie()
                    self.driver.close()
                else:
                    print('Login attempt 1 failed')
                    for i in range(4):
                        print('Starting login attempt %d' % (i + 2))
                        if self.slide_verify():
                            print('Login attempt %d succeeded' % (i + 2))
                            self.save_cookie()
                            self.driver.close()
                            return
                        print('Login attempt %d failed' % (i + 2))
                    print('Login failed 5 times, giving up')
                    self.driver.close()

        def slide_verify(self):
            '''Slide verification'''
            slider_button = self.wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@class="yidun_slider"]')))
            # URL of the CAPTCHA background image
            self.bg_img_url = self.wait.until(EC.presence_of_element_located((By.XPATH, '//img[@class="yidun_bg-img"]'))).get_attribute('src')
            # URL of the CAPTCHA slider image
            self.slider_img_url = self.wait.until(EC.presence_of_element_located((By.XPATH, '//img[@class="yidun_jigsaw"]'))).get_attribute('src')
            urlretrieve(self.bg_img_url, self.bg_img_path)
            urlretrieve(self.slider_img_url, self.slider_img_path)
            distance = self.get_distance(self.bg_img_path, self.slider_img_path)
            distance += 10  # the actual distance needs a 10px offset to the right
            tracks = self.get_tracks(distance)
            self.mouse_move(slider_button, tracks)
            try:
                element = self.wait.until(EC.presence_of_element_located((By.ID, 'Popover15-toggle')))
            except Exception:
                return False
            else:
                return True

        def save_cookie(self):
            cookie = {}
            for item in self.driver.get_cookies():
                cookie[item['name']] = item['value']
            print(cookie)
            print('Cookie information obtained after logging in to Zhihu')

        def mouse_move(self, slide, tracks):
            '''Drag the slider with the mouse.'''
            # Click and hold the slider
            ActionChains(self.driver).click_and_hold(slide).perform()
            # Move along the track
            for track in tracks:
                ActionChains(self.driver).move_by_offset(track, 0).perform()
            ActionChains(self.driver).release(slide).perform()

        def get_distance(self, bg_img_path='./bg.png', slider_img_path='./slider.png'):
            """Get the distance the slider must move."""
            # Background image processing
            bg_img = cv.imread(bg_img_path, 0)  # read as grayscale
            bg_img = cv.GaussianBlur(bg_img, (3, 3), 0)  # Gaussian blur to reduce noise
            bg_img = cv.Canny(bg_img, 50, 150)  # Canny edge detection
            # The slider image gets the same treatment
            slider_img = cv.imread(slider_img_path, 0)
            slider_img = cv.GaussianBlur(slider_img, (3, 3), 0)
            slider_img = cv.Canny(slider_img, 50, 150)
            # Find the best match
            res = cv.matchTemplate(bg_img, slider_img, cv.TM_CCOEFF_NORMED)
            # Minimum and maximum values, and the locations of each
            min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
            # e.g. (-0.0577279390818596, 0.3098162417411804, (0, 0), (196, 1))
            top_left = max_loc[0]  # x coordinate
            return top_left

        def get_tracks(self, distance):
            '''Sliding track'''
            tracks = []
            v = 0
            t = 0.2  # unit time
            current = 0  # current displacement of the slider
            distance += 10  # overshoot by 10px, then back up
            while current < distance:
                if current < distance * 5 / 8:
                    a = random.randint(1, 3)
                else:
                    a = -random.randint(2, 4)
                v0 = v  # initial velocity
                track = v0 * t + 0.5 * a * (t ** 2)  # distance covered in this step
                tracks.append(round(track))  # add to the track
                current += round(track)
                v = v0 + a * t
            # Fall back to the approximate position
            for i in range(5):
                tracks.append(-random.randint(1, 3))
            return tracks


    if __name__ == '__main__':
        Zhihu_login().run()

After reading this article, I believe you have a working understanding of how to use selenium and opencv in a Python crawler to solve slider verification and simulate logging in to Zhihu. If you want to know more, you are welcome to follow the industry information channel. Thank you for reading!