2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use Selenium in Python to search Baidu automatically: installing the libraries, driving the browser, locating page elements, and saving the search results to a file. Let's take a look.
Install Selenium
You can use pip to install Python's Selenium library: pip install selenium
(Optional: to run the project and control a browser, Selenium needs a browser-specific WebDriver binary.
Download the WebDriver binary and put it in a directory listed in the system PATH environment variable.)
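Whether a driver binary is already reachable through PATH can be checked from Python before running anything. The following is a minimal sketch using only the standard library; the helper name find_driver is illustrative, and the default binary name "chromedriver" is an assumption (on Windows the file is usually chromedriver.exe, which shutil.which also resolves).

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of a driver binary found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path:
        print(f"chromedriver found at: {path}")
    else:
        print("chromedriver is not on PATH")
```

If the function returns None, either add the driver's directory to PATH or let a helper library manage the driver.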
Because upgrading the local browser leads to version mismatches with the driver, and setting the PATH environment variable is tedious, I use webdriver_manager instead.
Install the manager:
pip install webdriver-manager
Write the code
Import the modules:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
First, we define a class Search_Baidu with three parts: a constructor for initialization, a method that carries out the automated steps, and a method that closes the browser at the end.
class Search_Baidu:
    def __init__(self):
        ...

    def search(self, keyword):
        ...

    def tear_down(self):
        ...
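Because the class separates the search step from the closing step, it helps to make sure the closing step runs even when the search fails. The sketch below shows one way to do that with try/finally. FakeSearch and run_search are hypothetical names introduced only for illustration; FakeSearch stands in for Search_Baidu so that the pattern runs without a browser.

```python
class FakeSearch:
    """Stand-in for Search_Baidu so the pattern runs without a browser."""

    def __init__(self):
        self.closed = False

    def search(self, keyword):
        if not keyword:
            raise ValueError("keyword must not be empty")
        return f"searched for {keyword}"

    def tear_down(self):
        self.closed = True


def run_search(session, keyword):
    """Run one search and always release the session afterwards."""
    try:
        return session.search(keyword)
    finally:
        # Runs whether search() returned normally or raised
        session.tear_down()


if __name__ == "__main__":
    session = FakeSearch()
    print(run_search(session, "selenium"))
    print("closed:", session.closed)
```

The same wrapper would work with any object that has search and tear_down methods.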
Next, we look at the implementation of each method in turn.
    def __init__(self):
        # Constructor: initializes the Selenium WebDriver
        url = "https://www.baidu.com/"  # the address to visit
        self.url = url

        options = webdriver.ChromeOptions()
        # Do not load images, which speeds up page loads
        options.add_experimental_option(
            "prefs", {"profile.managed_default_content_settings.images": 2}
        )
        # Important: exclude the enable-automation switch so that websites
        # are less likely to recognize that Selenium is in use
        options.add_experimental_option("excludeSwitches", ["enable-automation"])

        # Use Chrome with the driver installed by webdriver_manager and the options above.
        # Selenium 4 takes the driver path through a Service object, not as the
        # first positional argument (webdriver.ChromeService is exported by recent 4.x releases).
        service = webdriver.ChromeService(executable_path=ChromeDriverManager().install())
        self.browser = webdriver.Chrome(service=service, options=options)
        # Page controls need time to load, so set a default wait timeout of 10 seconds
        self.wait = WebDriverWait(self.browser, 10)

    def tear_down(self):
        # Finally, close the browser; quit() also ends the driver session
        self.browser.quit()
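The 10-second timeout works by polling: the condition is checked repeatedly until it succeeds or the time runs out. The sketch below illustrates that idea with the standard library only. It is a simplified illustration, not Selenium's implementation; wait_until and its parameters are hypothetical names, and the real WebDriverWait raises its own TimeoutException and ignores NoSuchElementException while polling.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    checks = []

    def element_appears_on_third_check():
        # Simulates a page control that only exists after some loading time
        checks.append(1)
        return "search box" if len(checks) >= 3 else None

    print(wait_until(element_appears_on_third_check, timeout=2, poll_interval=0.01))
```

Polling with a deadline returns as soon as the element is ready, so a fast page does not pay the full 10 seconds.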
Next comes the highlight: the steps that drive the browser. We open the browser, go to the Baidu home page, enter the search keyword (Selenium), wait for the search results, and save the title and URL of each result to a file.
    def search(self, keyword):
        # Open the Baidu home page
        self.browser.get(self.url)
        # Wait up to 10 seconds for the search box to appear; otherwise a timeout error is raised
        search_input = self.wait.until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="kw"]'))
        )
        # Type the search keyword
        search_input.send_keys(keyword)
        # Press Enter
        search_input.send_keys(Keys.ENTER)
        # Let element lookups retry for up to 10 seconds while the results load
        self.browser.implicitly_wait(10)
        # Find all the search results
        # (find_elements_by_css_selector was removed in Selenium 4.3)
        results = self.browser.find_elements(By.CSS_SELECTOR, ".t a, em, .c-title-text")
        # Iterate over the search results and save them
        with open("search_result.txt", "w", encoding="utf-8") as file:
            for result in results:
                if result.get_attribute("href"):
                    # Title of the search result
                    title = result.get_attribute("text").strip()
                    print(title)
                    # URL of the search result
                    link = result.get_attribute("href")
                    # Write to the file
                    file.write(f"Title: {title}, Link is: {link}\n")
Locating web page elements
The key point here is how to locate web page elements.
For example:
search_input = self.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="kw"]')))
And:
self.browser.find_elements(By.CSS_SELECTOR, ".t a, em, .c-title-text")
A courier finds your home by its address and delivers your package. In the same way, the XPath and the CSS selector here are the addresses of web page elements. So how do we get them?
The first tool is the developer tools built into Chrome. Open them with the F12 shortcut key, or find them in the browser menu as shown in the following figure:
Then right-click the Baidu search box and choose Inspect:
This finds the HTML element of the input box.
Right-click the HTML element and copy its XPath.
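The copied XPath, //*[@id="kw"], reads as "any element, anywhere in the document, whose id attribute is kw". The sketch below tries the same expression outside a browser with the standard library's ElementTree, which supports only a subset of XPath and needs a leading "." for a relative path. The markup is hypothetical and heavily simplified, not Baidu's real page, and find_by_id is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
PAGE = """
<html>
  <body>
    <form id="form">
      <input id="kw" name="wd" type="text" />
      <input id="su" type="submit" value="Search" />
    </form>
  </body>
</html>
"""


def find_by_id(root, element_id):
    """Return the first descendant whose id attribute equals element_id, or None."""
    # ElementTree requires a relative path, hence the leading "."
    return root.find(f".//*[@id='{element_id}']")


root = ET.fromstring(PAGE)
search_box = find_by_id(root, "kw")
print(search_box.tag, search_box.attrib["name"])
```

Because an id is meant to be unique within a page, an id-based XPath like this one identifies exactly one element, which is why it suits the search box.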
Copying the XPath from the developer tools is a relatively simple way to locate a single element. Locating the search results is harder, as shown in the following figure:
We can't locate each result individually. Instead, we need to find the pattern they share, match all the search results at once, and get back a list that we can iterate over. How can this be done?
This is where a very handy tool comes in: SelectorGadget
SelectorGadget is a CSS selector generator. Its official documentation has detailed instructions; here is a brief introduction:
First, start SelectorGadget by clicking its icon.
The following box appears in the browser:
Then left-click the element on the web page that we want to locate.
The page then looks like the following:
Everything highlighted in yellow is currently matched. If we don't want an element, we click it to turn it red, which removes it. If an element we need is not selected, we left-click it to turn it green. In the end, every element we want should be green or yellow, as shown in the following figure:
We can then copy the contents of the box and use it as the CSS selector.
Find all the search results with the CSS selector:
results = self.browser.find_elements(By.CSS_SELECTOR, ".t a, em, .c-title-text")
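The selector is a comma-separated group, so an element matches if it satisfies any one part: ".t a" matches every a element nested inside an element with class t, "em" matches every em element, and ".c-title-text" matches every element with class c-title-text. The sketch below imitates only the ".t a" part with the standard library's HTMLParser, to show what "descendant of an element with class t" means. The markup and the TitleLinkParser name are hypothetical, and the parser assumes well-formed HTML; in the browser, Selenium does this matching for us.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not tracked as open elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TitleLinkParser(HTMLParser):
    """Collect (text, href) for every <a> nested inside an element with class "t"."""

    def __init__(self):
        super().__init__()
        self.stack = []      # class lists of the currently open elements
        self.current = None  # (text_parts, href) while inside a matching <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside_t = any("t" in classes for classes in self.stack)
        if tag == "a" and inside_t and self.current is None:
            self.current = ([], attrs.get("href"))
        if tag not in VOID_TAGS:
            self.stack.append((attrs.get("class") or "").split())

    def handle_data(self, data):
        if self.current is not None:
            self.current[0].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if tag == "a" and self.current is not None:
            text_parts, href = self.current
            self.links.append(("".join(text_parts).strip(), href))
            self.current = None


# Hypothetical result markup, not copied from a real results page.
SAMPLE = """
<div class="result">
  <h3 class="t"><a href="https://example.com/a">Selenium <em>docs</em></a></h3>
  <p><a href="https://example.com/ad">not a title</a></p>
  <h3 class="t c-title"><a href="https://example.com/b">Second result</a></h3>
</div>
"""

parser = TitleLinkParser()
parser.feed(SAMPLE)
for title, href in parser.links:
    print(title, href)
```

The link inside the p element is skipped because none of its ancestors has class t, which is the same reason a well-chosen selector returns the result titles and not every link on the page.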
That completes this small application. Selenium automates operations on web page elements for us, so locating those elements is the top priority. I hope this article gives you some help.
The complete code is attached below:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys


class Search_Baidu:
    def __init__(self):
        url = "https://www.baidu.com/"
        self.url = url

        options = webdriver.ChromeOptions()
        # Do not load images, which speeds up page loads
        options.add_experimental_option(
            "prefs", {"profile.managed_default_content_settings.images": 2}
        )
        # Important: exclude the enable-automation switch so that websites
        # are less likely to recognize that Selenium is in use
        options.add_experimental_option("excludeSwitches", ["enable-automation"])

        # Selenium 4 takes the driver path through a Service object
        service = webdriver.ChromeService(executable_path=ChromeDriverManager().install())
        self.browser = webdriver.Chrome(service=service, options=options)
        self.wait = WebDriverWait(self.browser, 10)  # timeout of 10 seconds

    def search(self, keyword):
        # Open the Baidu home page
        self.browser.get(self.url)
        # Wait up to 10 seconds for the search box to appear; otherwise a timeout error is raised
        search_input = self.wait.until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="kw"]'))
        )
        # Type the search keyword
        search_input.send_keys(keyword)
        # Press Enter
        search_input.send_keys(Keys.ENTER)
        # Let element lookups retry for up to 10 seconds while the results load
        self.browser.implicitly_wait(10)
        # Find all the search results
        results = self.browser.find_elements(By.CSS_SELECTOR, ".t a, em, .c-title-text")
        # Iterate over the search results and save them
        with open("search_result.txt", "w", encoding="utf-8") as file:
            for result in results:
                if result.get_attribute("href"):
                    # Title of the search result
                    title = result.get_attribute("text").strip()
                    print(title)
                    # URL of the search result
                    link = result.get_attribute("href")
                    # Write to the file
                    file.write(f"Title: {title}, Link is: {link}\n")

    def tear_down(self):
        # Close the browser and end the driver session
        self.browser.quit()


if __name__ == "__main__":
    search = Search_Baidu()
    search.search("selenium")
    search.tear_down()

This is the end of the article on "how to use Selenium to search Baidu automatically in Python". Thank you for reading!