Many beginners are unsure how to use selenium in Python to crawl dynamic pages. This article walks through the setup, the problems you are likely to hit, and a complete working example, so that after reading it you can solve the problem yourself.
1. Install selenium
Selenium is simple to install with pip. Open cmd and type:
pip install selenium
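To confirm the installation succeeded, a quick sanity check (assuming a standard pip install) is to import the package and print its version:
# quick check that selenium is importable; prints the installed version
import selenium
print(selenium.__version__)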
2. Install chromedriver
Chromedriver is the driver for Google Chrome; I use it here because chrome is my everyday browser.
Note that the chromedriver version must match the version of Chrome you have installed. You can check your Chrome version from the menu in the upper right corner of the browser, under Help > About Google Chrome. The specific correspondence is as follows:
chromedriver version    supported Chrome versions
v2.40                   v66-68
v2.39                   v66-68
v2.38                   v65-67
v2.37                   v64-66
v2.36                   v63-65
v2.35                   v62-64
v2.34                   v61-63
v2.33                   v60-62
v2.32                   v59-61
v2.31                   v58-60
v2.30                   v58-60
v2.29                   v56-58
v2.28                   v55-57
v2.27                   v54-56
v2.26                   v53-55
v2.25                   v53-55
v2.24                   v52-54
v2.23                   v51-53
v2.22                   v49-52
After downloading chromedriver, add its directory to the system Path. If you skip this step, the program will raise an error at runtime telling you that the driver executable is not on the Path.
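If you prefer not to modify Path, selenium 3.x also accepts an explicit driver location. A minimal sketch, with a placeholder path you would replace with your own:
from selenium import webdriver

# hypothetical location; point this at wherever you unpacked chromedriver
driver_path = r"C:\tools\chromedriver.exe"
browser = webdriver.Chrome(executable_path=driver_path)
browser.quit()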
3. Start crawling
The URL to crawl today is https://www.upbit.com/service_center/notice. Clicking the page-turn button shows that the address in the browser does not change, but inspecting the requests with the F12 developer tools reveals addresses of the form
https://www.upbit.com/service_center/notice?id=1
The only thing that changes is the trailing id: 1, 2, 3, and so on, one per notice.
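Since only the id changes, the page URLs can be generated directly from the pattern; a small sketch:
# build the first few notice URLs from the id pattern observed above
base = "https://www.upbit.com/service_center/notice?id={}"
urls = [base.format(i) for i in range(1, 6)]
print(urls)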
Before writing the crawler itself, the browser has to be set up:
from selenium import webdriver
import time

# set options for Google Chrome
opt = webdriver.ChromeOptions()
# make the browser headless, i.e. no window is shown while crawling
opt.set_headless()
# create a Chrome browser object using the options set above
browser = webdriver.Chrome(options=opt)
save = []
home = 'https://www.upbit.com/home'
# once the browser object exists, a URL can be loaded with the get() method
browser.get(home)
time.sleep(15)
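A note on versions: set_headless() was deprecated in selenium 3 and removed in selenium 4, so the call above may fail on a newer install. The equivalent, more portable way to request headless mode is a command-line argument:
# portable way to request headless mode; works on selenium 3 and 4
opt = webdriver.ChromeOptions()
opt.add_argument("--headless")
browser = webdriver.Chrome(options=opt)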
Next comes locating elements in the html. In selenium, the methods for locating elements are:
find_element_by_id(self, id_)
find_element_by_name(self, name)
find_element_by_class_name(self, name)
find_element_by_tag_name(self, name)
find_element_by_link_text(self, link_text)
find_element_by_partial_link_text(self, link_text)
find_element_by_xpath(self, xpath)
find_element_by_css_selector(self, css_selector)
The id, name, and so on can be read off in the browser's developer tools. The purpose of locating an element is to get at the information we want, then parse it and save it; the visible text of an element is available through its .text attribute, as in the sketch below.
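As an illustration of the locators above (the class name here is made up for the example, not taken from upbit.com):
# locate an element by class name and read its visible text;
# 'example-title' is a hypothetical class for illustration
element = browser.find_element_by_class_name('example-title')
print(element.text)

# the same element found via an xpath locator instead
element = browser.find_element_by_xpath("//div[@class='example-title']")
print(element.text)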
The full crawler code is posted below for reference.
from selenium import webdriver
import time
from tqdm import trange
from collections import OrderedDict
import pandas as pd

def stringpro(inputs):
    # normalize a scraped string: drop newlines, tabs and surrounding spaces
    inputs = str(inputs)
    return inputs.strip().replace("\n", "").replace("\t", "").lstrip().rstrip()

opt = webdriver.ChromeOptions()
opt.set_headless()
browser = webdriver.Chrome(options=opt)
save = []
home = 'https://www.upbit.com/home'
browser.get(home)
time.sleep(15)

for page in trange(500):
    try:
        rows = OrderedDict()
        url = "https://www.upbit.com/" \
              "service_center/notice?id={}".format(page)
        browser.get(url)
        # the notice body, title block, title and timestamp of each page
        content = browser.find_element_by_class_name('txtB').text
        title_class = browser.find_element_by_class_name('titB')
        title = title_class.find_element_by_tag_name('strong').text
        times_str = title_class.find_element_by_tag_name('span').text
        times = times_str.split('|')[0].split(" ")[1:]
        num = times_str.split("|")[1].split(" ")[1]
        rows['title'] = title
        rows['times'] = " ".join(times)
        rows['num'] = num
        rows['content'] = stringpro(content)
        save.append(rows)
        print("{}, {}".format(page, rows))
    except Exception as e:
        # skip pages that fail to load or lack the expected elements
        continue

df = pd.DataFrame(save)
df.to_csv("./datasets/www_upbit_com.csv", index=None)
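One robustness improvement worth considering (my suggestion, not part of the original code): instead of the fixed time.sleep(15), an explicit wait blocks only until the element actually appears, which is both faster and more reliable:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 15 seconds for the notice body to be present, then return it
content_elem = WebDriverWait(browser, 15).until(
    EC.presence_of_element_located((By.CLASS_NAME, "txtB"))
)
print(content_elem.text)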
After reading the above, have you mastered how to use selenium to implement a dynamic crawler in Python? Thank you for reading!