2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python and Selenium to crawl the daily weather charts published on the website of the Central Meteorological Observatory. The content is simple and clear, and I hope it helps resolve your questions.
I. Preparatory work
1. Introduction and installation of Selenium
Selenium is a Web automation (testing) tool that allows browsers to load pages automatically and get the data they need according to our instructions.
A crawler is a program that automatically collects valuable information from the Internet, just like a bug crawling tirelessly around in a building.
Traditional crawlers fetch site content by simulating HTTP requests directly. Because such requests look clearly different from real browser traffic, many sites apply anti-crawling measures against them. Selenium instead drives a real browser, so its behavior is almost the same as a user's, and you do not have to analyze the parameters of each request, which makes it easier to learn than a traditional crawler. The main disadvantage is speed: if crawl speed is not a requirement, Selenium is a very good choice.
Installing Selenium is simple: run pip install selenium from the command line. Note that the code in this article uses the Selenium 3 API (the executable_path argument and the find_element_by_* helpers), which later Selenium 4 releases removed.
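To confirm that the installation succeeded and to see which Selenium release pip installed, one option is to query the package metadata from Python. This is a minimal sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("selenium is not installed; run: pip install selenium")
    else:
        print("selenium", version)
```

Knowing the version matters here because the article's code targets the Selenium 3 API.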
2. Download and install chromedriver
Selenium has no browser functionality of its own and must be used together with a third-party browser. Google's Chrome works well for this once its driver, chromedriver, is installed.
Download address: http://chromedriver.storage.googleapis.com/index.html
How do you check the browser version? Type chrome://version/ in the address bar; the version information is on the first line. Taking the author's own browser as an example, the version is 76.0.3809.132 (32-bit). (It can also be found on the browser's "About" page.) Download the chromedriver release that matches this version.
Extract the downloaded file to get chromedriver.exe. Since this is a tutorial for beginners, the author does not recommend setting up any environment variables; instead, refer to chromedriver by its absolute path later on.
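Because chromedriver releases are tied to Chrome versions, the driver's major version should normally equal the browser's. The sketch below shows one way to check that and to build the absolute path used later. The function names are hypothetical, and the driver version string in the usage example is only an illustrative input.

```python
import os


def major_version(version_text):
    """Return the leading (major) number of a string such as '76.0.3809.132 (32-bit)'."""
    return int(version_text.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


def driver_path(folder, file_name="chromedriver.exe"):
    """Build the absolute path to chromedriver inside the given folder."""
    return os.path.abspath(os.path.join(folder, file_name))


if __name__ == "__main__":
    # Second argument is an illustrative driver version, not taken from the article.
    print(versions_match("76.0.3809.132 (32-bit)", "76.0.3809.126"))
    print(driver_path("."))
```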
II. Python+Selenium crawler
After completing the preparatory work, you can crawl. First of all, we use Selenium's browser driver interface tool webdriver to open our crawling target: the daily weather chart on the website of the Central Meteorological Observatory.
Code:
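    from selenium import webdriver  # import Selenium's browser driver interface

    path = 'C:/path/to/'  # placeholder: folder that contains chromedriver.exe
    chrome_driver = path + 'chromedriver.exe'  # chromedriver file location
    # executable_path is the Selenium 3 argument; Selenium 4 passes the path through a Service object
    driver = webdriver.Chrome(executable_path=chrome_driver)  # load the browser driver
    driver.get('http://www.nmc.cn/publish/observations/china/dm/weatherchart-h000.htm')  # open the page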
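On recent Selenium 4 releases the executable_path argument is no longer accepted by webdriver.Chrome; the driver location goes through a Service object instead. The following is a sketch of the equivalent start-up under that assumption. The function name open_weather_page and its driver_file argument are illustrative, and the Selenium imports sit inside the function so the file can be loaded on a machine without Selenium.

```python
NMC_URL = "http://www.nmc.cn/publish/observations/china/dm/weatherchart-h000.htm"


def open_weather_page(driver_file, url=NMC_URL):
    """Start Chrome through chromedriver (Selenium 4 style) and open the chart page."""
    # Imported here so this module can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(executable_path=driver_file))
    driver.get(url)
    return driver
```

A call would look like driver = open_weather_page(r"C:\path\to\chromedriver.exe"), where the path is a placeholder; call driver.quit() when finished so the browser closes.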
1. Selenium page element positioning
Suppose we want to download the basic weather analysis chart at the 500hPa level. What would we normally do? We would click 500hPa on the page, select the basic weather analysis, and then right-click the picture and choose Save As. With Selenium we control the web page in the same way: crawling here means simulating this series of manual operations and letting the machine perform them automatically.
Selenium controls a web page through its HTML elements. Locating those elements is the foundation: only after the right element has been found accurately can the subsequent automated actions be carried out.
1.1 View page elements
Locating elements with Selenium requires a little basic knowledge of HTML, but it does not matter if you are unfamiliar with it; there is a shortcut. Using Chrome's developer tools (right-click -> Inspect on the page), we can easily view the corresponding page elements.
For example, if we inspect the 500hPa level button, the developer tools jump to the HTML element behind it. Looking at its properties, we can see that the 500hPa button is a link, so it can be located by its link text.
1.2 Element positioning
After viewing an element, we need to locate it, that is, find it in the page so that it can be controlled precisely afterwards. Selenium provides many ways to locate elements; this article mainly uses two:
Id location: find_element_by_id()
Link text location: find_element_by_link_text()
    button1 = driver.find_element_by_link_text('500hPa')  # locate the level button by its link text (must match the page exactly)
    button2 = driver.find_element_by_id('plist')  # locate the time drop-down by id
    elem_pic = driver.find_element_by_id('imgpath')  # locate the chart image by id
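The find_element_by_* helpers were removed in later Selenium 4 releases. The general find_element(by, value) call exists in both Selenium 3 and 4, so the same three lookups can be sketched as below. The function name locate_controls and the level_text parameter are illustrative; the two strategy strings are the values that Selenium's By.LINK_TEXT and By.ID constants hold, written out here so the sketch has no import of its own.

```python
# Locator strategy names. With Selenium installed these are available as
# By.LINK_TEXT and By.ID (from selenium.webdriver.common.by import By).
LINK_TEXT = "link text"
ID = "id"


def locate_controls(driver, level_text):
    """Return the level button, the time drop-down and the chart image element."""
    level_button = driver.find_element(LINK_TEXT, level_text)  # e.g. the 500hPa link
    time_list = driver.find_element(ID, "plist")  # drop-down with observation times
    chart_image = driver.find_element(ID, "imgpath")  # <img> holding the chart
    return level_button, time_list, chart_image
```

Pass the label exactly as it appears on the page for level_text, since link-text matching is case-sensitive.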
2. ActionChains simulates mouse operation
After the elements have been located, we still need to simulate mouse hovering and clicking in order to select and download the target picture. Selenium provides a class for handling such events: ActionChains. The two operations used here are:
Mouse hover: move_to_element()
Left mouse click: click()
The complete implementation script is given below:
    from selenium import webdriver  # import Selenium's browser driver interface
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.support.ui import Select  # needed for the drop-down below
    import time
    import os

    path = 'C:/path/to/'  # placeholder: folder that contains chromedriver.exe
    chrome_driver = path + 'chromedriver.exe'  # chromedriver file location
    driver = webdriver.Chrome(executable_path=chrome_driver)  # load the browser driver (Selenium 3 API)
    driver.get('http://www.nmc.cn/publish/observations/china/dm/weatherchart-h000.htm')  # open the page
    # driver.maximize_window()
    time.sleep(1)

    # simulate selecting the height level with the mouse
    button1 = driver.find_element_by_link_text('500hPa')  # locate the element by its link text
    action = ActionChains(driver).move_to_element(button1)  # hover over the element
    action.click(button1).perform()  # mouse click
    time.sleep(1)  # wait a moment to avoid failures caused by acting too fast

    for p in range(1, 3):  # download the weather charts for 08:00 and 20:00 yesterday
        # simulate selecting the time with the mouse
        button2 = driver.find_element_by_id('plist')  # locate the drop-down by id
        action = ActionChains(driver).move_to_element(button2)  # hover over the element
        action.click(button2).perform()  # mouse click
        time.sleep(1)
        Select(button2).select_by_index(p)  # drop-down option indexes start from 0
        time.sleep(1)

        # simulate selecting the picture with the mouse
        elem_pic = driver.find_element_by_id('imgpath')  # locate the image by id
        action = ActionChains(driver).move_to_element(elem_pic)
        action.context_click(elem_pic).perform()  # right-click the image (opens the context menu only)
        filename = str(elem_pic.get_attribute('src')).split('/')[-1].split('?')[0]  # get the file name
        print(filename)
        time.sleep(1)
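The script as given stops at the right-click and only prints the file name. One possible way to actually save the chart, which is an assumption and not the author's method, is to download the URL held in the image's src attribute. The sketch below reuses the script's file-name logic; the function names are hypothetical, and whether the site serves the image to a plain HTTP request without browser cookies or headers has not been verified here.

```python
import os
import urllib.request


def filename_from_src(src):
    """Take the last path segment of an image URL and drop any query string."""
    return str(src).split("/")[-1].split("?")[0]


def save_image(src, folder="."):
    """Download the image at src into folder and return the path it was saved to."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename_from_src(src))
    urllib.request.urlretrieve(src, target)  # plain HTTP download, no browser involved
    return target
```

Inside the loop this could be called as save_image(elem_pic.get_attribute('src'), 'charts'), where 'charts' is an arbitrary output folder name.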