2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python to collect and analyze lipstick data to help select a Valentine's Day gift. The editor finds it very practical and shares it here for reference; follow along and take a look.
Preparation: installing the browser driver
Before implementing the case, we have to install a browser driver, because selenium controls the browser through that driver, automating it to simulate human operations.
Taking Google Chrome as an example: open the browser and check its version, then download the ChromeDriver whose version is the same as (or closest to) your browser's version. After downloading, extract it and put the executable into your Python environment's directory, or next to your code.
Modules used
selenium: install with pip install selenium. Entering just selenium installs the latest version by default; append a version number (pip install selenium==3.141.0, for example) to install that specific version. Note that this article uses the find_element_by_* methods, which were removed in Selenium 4, so a 3.x release is needed to run the code as written.
csv: a built-in module, no installation needed, used to save the data to a CSV file (which Excel can open).
time: a built-in module, no installation needed, used mainly for delays and waiting.
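Since the scraped rows are saved with csv.DictWriter later in the article, here is a minimal standalone sketch of how that module works; the file name and field names here are illustrative, not the ones the scraper uses.

```python
import csv

# Illustrative field names; the scraper later defines its own set
fieldnames = ['title', 'price']
with open('demo.csv', mode='w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()  # first row: the column names
    writer.writerow({'title': 'lipstick A', 'price': '199'})
    writer.writerow({'title': 'lipstick B', 'price': '259'})

# Read the file back to confirm what was written
with open('demo.csv', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))
print(rows)
```

Each writerow call takes a dict keyed by the fieldnames, which is exactly the shape of the product dict built later.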
Process analysis
When we visit a website we start by entering a URL, and the code does exactly the same thing.
First import the module
from selenium import webdriver
Do not name your file or package selenium, or the import will fail. webdriver can be thought of as the driver of the browser: to drive the browser you must go through webdriver, and it supports many browsers.
Instantiate a browser object. I use Chrome here and suggest you do too, for convenience.
driver = webdriver.Chrome()
We use get() to visit a URL, and the page opens automatically.
driver.get('https://www.jd.com/')
Run it.
After opening the URL, take buying lipstick as an example.
First we search with the keyword of the product we want to buy, then collect the information from the search results.
Next we need to locate the input box: right-click a blank area of the page, choose Inspect, and switch to the Elements panel.
Click the arrow button at the top left of the panel, then click the search box on the page; DevTools jumps straight to the search box's tag. Right-click that tag, choose Copy, then Copy selector (or Copy XPath, if you prefer XPath). Then type in what we want to search for:
driver.find_element_by_css_selector('#key').send_keys('lipstick')
When it runs again, it will automatically open a browser and go to the target URL to search for lipstick.
In the same way, find the search button and click.
driver.find_element_by_css_selector('.button').click()
Running it again automatically clicks search and the results page appears. When we browse a results page normally we scroll down, right? So let's make it scroll down automatically. First import the time module:
import time
Perform the operation of scrolling the page
def drop_down():
    """Perform the page scrolling operation"""
    for x in range(1, 12, 2):  # x takes 1, 3, 5, 7, 9, 11; the page height keeps changing as you scroll
        time.sleep(1)
        j = x / 9
        # document.documentElement.scrollTop specifies the position of the scroll bar
        # document.documentElement.scrollHeight gets the maximum height of the browser page
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)  # execute our JS code
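The scroll targets that loop produces can be checked without a browser. Note that the last fraction is greater than 1; that is harmless, because the browser clamps scrollTop at the bottom of the page.

```python
# Reproduce the fractions j = x / 9 computed inside drop_down()
fractions = [round(x / 9, 4) for x in range(1, 12, 2)]
# The page is scrolled to each of these fractions of its full height in turn
print(fractions)
```

So the page is scrolled in six steps, reaching the bottom on the fifth step and staying there on the sixth.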
Write the loop, and then call it.
drop_down()
Let's give it another delay.
driver.implicitly_wait(10)
This is an implicit wait: it waits for elements to appear on the page, which helps when the network is slow and the page loads slowly.
An implicit wait does not always wait the full ten seconds. If the element appears within ten seconds, execution continues as soon as it is found; only if it still cannot be found after ten seconds does selenium give up and raise an error.
There is another kind of waiting where you always wait exactly as many seconds as you wrote, which is less flexible:
time.sleep(10)
After loading the data, we need to find the source of commodity data.
Price / title / evaluation / cover / store, etc.
Again right-click, choose Inspect, and in the Elements panel use the small arrow to click the data you want to view.
You can see that each product's content lives in an li tag, and the structure is the same for every product, so we can copy the selector directly. The copied selector points at just the first li (shown in the lower left corner), so delete the part after li in the search box to match them all; it then reports 60 matches, i.e. 60 products per page. Copy the generalized selector and receive the results in lis.
lis = driver.find_elements_by_css_selector('#J_goodsList ul li')
Because we are getting all of the tags, the method name has one more s than before: find_elements instead of find_element.
Print it.
print(lis)
lis is a list [] of element objects.
Go through it and take out all the elements.
for li in lis:
    title = li.find_element_by_css_selector('.p-name em').text.replace('\n', '')  # get the tag's text data
    price = li.find_element_by_css_selector('.p-price strong i').text  # price
    commit = li.find_element_by_css_selector('.p-commit strong a').text  # number of comments
    shop_name = li.find_element_by_css_selector('.J_im_icon a').text  # store name
    href = li.find_element_by_css_selector('.p-img a').get_attribute('href')  # product details page
    icons = li.find_elements_by_css_selector('.p-icons i')
    icon = ','.join([i.text for i in icons])  # list comprehension; ','.join joins the list items into one string
    dit = {
        'Product title': title,
        'Product price': price,
        'Number of comments': commit,
        'Store name': shop_name,
        'Label': icon,
        'Product details page': href,
    }
    csv_writer.writerow(dit)
    print(title, price, commit, href, icon, sep=' |')
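One caveat for later analysis: JD displays comment counts as strings such as "5000+" or "2万+" ("万" means 10,000), which do not sort numerically. The helper below is my own addition, not part of the original tutorial, and it assumes only those display formats.

```python
def parse_commit(text):
    """Convert a JD-style comment count string to an int.

    Assumes formats like '123', '5000+', or '2万+' ('万' = 10,000);
    anything unrecognized returns 0.
    """
    text = text.strip().rstrip('+')
    if text.endswith('万'):
        return int(float(text[:-1]) * 10000)
    return int(text) if text.isdigit() else 0

print(parse_commit('2万+'))   # 20000
print(parse_commit('5000+'))  # 5000
```

Applying this to the commit field before saving would make the CSV directly sortable by popularity.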
Search function
key_world = input('Please enter the product data you want to get: ')
After obtaining the data, save it to a CSV file.
import csv

f = open(f'JD.com {key_world} product data.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    'Product title',
    'Product price',
    'Number of comments',
    'Store name',
    'Label',
    'Product details page',
])
csv_writer.writeheader()
And then write an automatic page turn.
for page in range(1, 11):
    print(f'Crawling the data on page {page}')
    time.sleep(1)
    drop_down()
    get_shop_info()  # download the data
    driver.find_element_by_css_selector('.pn-next').click()  # click the next page

Complete code

from selenium import webdriver
import time
import csv


def drop_down():
    """Perform the page scrolling operation"""
    for x in range(1, 12, 2):  # x takes 1, 3, 5, 7, 9, 11
        time.sleep(1)
        j = x / 9
        # document.documentElement.scrollTop specifies the position of the scroll bar
        # document.documentElement.scrollHeight gets the maximum height of the browser page
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)  # execute the JS code


key_world = input('Please enter the product data you want to get: ')

f = open(f'JD.com {key_world} product data.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    'Product title',
    'Product price',
    'Number of comments',
    'Store name',
    'Label',
    'Product details page',
])
csv_writer.writeheader()

# Instantiate a browser object
driver = webdriver.Chrome()
driver.get('https://www.jd.com/')  # visit the URL and open the browser
# Find the #key tag (the search box) via CSS selector syntax and type in the keyword
driver.find_element_by_css_selector('#key').send_keys(key_world)
driver.find_element_by_css_selector('.button').click()  # find the search button and click it
# time.sleep(10)  # fixed wait
# driver.implicitly_wait(10)  # implicit wait


def get_shop_info():
    # Step 1: get the content of all the li tags
    driver.implicitly_wait(10)
    lis = driver.find_elements_by_css_selector('#J_goodsList ul li')  # get multiple tags
    # lis is a list [] of element objects
    # print(len(lis))
    for li in lis:
        title = li.find_element_by_css_selector('.p-name em').text.replace('\n', '')  # get the tag's text data
        price = li.find_element_by_css_selector('.p-price strong i').text  # price
        commit = li.find_element_by_css_selector('.p-commit strong a').text  # number of comments
        shop_name = li.find_element_by_css_selector('.J_im_icon a').text  # store name
        href = li.find_element_by_css_selector('.p-img a').get_attribute('href')  # product details page
        icons = li.find_elements_by_css_selector('.p-icons i')
        icon = ','.join([i.text for i in icons])  # ','.join joins the list items into one string
        dit = {
            'Product title': title,
            'Product price': price,
            'Number of comments': commit,
            'Store name': shop_name,
            'Label': icon,
            'Product details page': href,
        }
        csv_writer.writerow(dit)
        print(title, price, commit, href, icon, sep=' |')


for page in range(1, 11):
    print(f'Crawling the data on page {page}')
    time.sleep(1)
    drop_down()
    get_shop_info()  # download the data
    driver.find_element_by_css_selector('.pn-next').click()  # click the next page

driver.quit()  # close the browser

Effect display
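The article's title promises analysis, so here is a minimal sketch of what one could do with the saved CSV once scraping finishes: load the rows back and rank products by price. The column names match the ones used above, but the helper functions are my own additions, not part of the original tutorial.

```python
import csv


def load_rows(path):
    """Read the scraped CSV back into a list of dicts."""
    with open(path, encoding='utf-8') as f:
        return list(csv.DictReader(f))


def cheapest(rows, n=3):
    """Return the n lowest-priced products; rows with non-numeric prices are skipped."""
    priced = [r for r in rows
              if r.get('Product price', '').replace('.', '', 1).isdigit()]
    return sorted(priced, key=lambda r: float(r['Product price']))[:n]


# Example with in-memory rows (real data would come from load_rows(...)):
sample = [
    {'Product title': 'A', 'Product price': '199'},
    {'Product title': 'B', 'Product price': '89.9'},
    {'Product title': 'C', 'Product price': 'sold out'},
]
print([r['Product title'] for r in cheapest(sample, 2)])  # ['B', 'A']
```

From here, sorting by comment count or filtering by store would work the same way: parse the column, then sort or filter the list of dicts.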
Thank you for reading! That is the end of "how to use Python to analyze lipstick data to select Valentine's Day gifts". I hope the content above has been of some help and lets you learn something new; if you think the article is good, share it so more people can see it!
© 2024 shulou.com SLNews company. All rights reserved.