In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces Pythonr based on selenium how to achieve different mall commodity price difference analysis system related knowledge, the content is detailed and easy to understand, simple and fast operation, has a certain reference value, I believe that everyone read this Pythonr based on selenium how to achieve different mall commodity price difference analysis system article will have a harvest, let's take a look at it.
1. Preface
Selenium is originally an automated testing tool, which is often used in crawlers because of its excellent ability to parse page data and simulate user behavior, which makes the crawling process of crawlers easier and faster.
Compared with other types of programs, crawlers essentially provide processing logic for data, except that the data of crawlers comes from HTML code snippets.
How to accurately find the tags (or nodes, elements, components) of the data on the page has become the key to the crawler. Only when this step is established, the subsequent data extraction, cleaning and aggregation are possible.
Compared with the Beaufulsoup module, the bottom layer of selenium relies on a powerful browser engine, which is calm and determined in the ability of page parsing.
This paper will use selenium to automatically simulate the search behavior of users, obtain the price information of the same type of goods in different malls, and finally generate the price difference comparison table of goods in different malls.
2. Program design process 2.1 requirement analysis:
This procedure realizes that users do not open the browser, only need to enter a commodity keyword, can automatically find commodity prices in different shopping malls, and summarize some price differences information.
1. When the program is running, prompt the user to enter the keywords of the product that need to be searched.
This procedure is only to explore the wonders of selenium, feel its king style, without painstaking efforts on the program structure and interface.
2. Use selenium to simulate users to open the home pages of JD.com and SUNING.
Why choose JD.com and Su Ningyi instead of Taobao?
Because there is no need for login verification when these two websites use the search function, the code of this program can be simplified.
3. Use selenium to automatically enter commodity keywords in the text search box on the home page, and then automatically trigger the click event of the search button to enter the product list page.
4. Use selenium to analyze and crawl the commodity names and price data in the commodity list pages of different shopping malls.
5. After a simple analysis of the commodity price data, use the CSV module to save in the form of file.
This paper mainly analyzes the differences of the average price, the lowest price and the highest system of goods in different malls.
Of course, if necessary, more data analysis conclusions can be obtained with the help of other modules or analysis logic.
2.2 understanding selenium
Although this article does not delve into the details of selenium API, since you want to use it, you still need to cover all aspects of its use process.
1. Install:
Selenium is the third library of python. If you want to install it before use, there is no need to spend a lot of ink on the installation details.
Pip3 install selenium
In addition to installing the selenium module, you need to download a browser driver for it, or it won't work.
What is a browser driver? Why do you need it?
To explain this problem, we need to start with how selenium works.
2. The working principle of selenium is briefly introduced.
Beautiful soup uses a specific parser program to parse HTML pages. Selenium is more straightforward and directly relies on the parsing capabilities of browsers. By calling the browser's underlying API to complete the page data search, but also kneel down, not only crawl, but also send operation instructions to the browser to simulate user behavior.
Do you feel that the browser is a string puppet in the hands of selenium (playing with the browser in the palm of your hand). The job of selenium is to drive the browser, send instructions to the browser or receive feedback from the browser. In this process, the browser driver (webdriver) plays the role of uploading and downloading.
A typical component development model.
Obviously, due to the differences in the kernel of different browsers, the drivers must be different, so before downloading the drivers, please make sure that you use the type and version of the browser.
This article uses Google browser, and you need to download the webdriver driver corresponding to Google browser.
Go to the https://www.selenium.dev/downloads/ website, select the python language, and select the latest stable version.
Please select a driver that is consistent with the browsing version you are using.
After downloading, specify a directory to store the driver, and this article is stored in D:\ chromedriver\ chromedriver.exe. It can also be stored in the browser's installation directory.
2.3 functional function design
When the preparations are in place, start coding:
Import the modules needed by the program and define the variables needed by the program.
From selenium import webdriverfrom selenium.webdriver.chrome.service import Servicefrom selenium.webdriver.common.by import Byimport csvimport timeimport math# browser object chrome_browser = None# commodity keyword search_keyword = None# saves the commodity data searched in JD.com mall in the format {product name: price} jd_data = {} # saves the commodity data searched in SUNING mall in the format {product name: price} sn_data = {}
Webdriver: used to build browser objects, from the underlying design point of view, is the interface layer between the selenium and the browser. Selenium provides users with advanced application interfaces up and down through webdriver to communicate with browsers without barriers.
Service: the parameter type when webdriver builds the browser object.
By: * * encapsulates various ways to find page components. Selenium** provides many advanced methods for querying HTML page components, such as ID, style, style selector, XPATH, etc. These schemes are encapsulated by By.
Such as: find_element_by_class_name (), find_element_by_id (), find_element_by_ (), find_element_by_tag_name (), find_element_by_class_name (), find_element_by_xpath (), find_element_by_css_selector ()
The above methods have been marked as obsolete, please use the find_element () method, in conjunction with the By object switching method.
Csv: used to save the acquired data in csv format.
Time: time module, used to simulate network latency.
Math: mathematical module, auxiliary data analysis.
Initialization function: initialize browser objects and user input data
'' initial browser object''def init_data (): # driver storage path webdriver_path = r "D:\ chromedriver\ chromedriver.exe" service= Service (webdriver_path) # build browser object browser = webdriver.Chrome (service=service) # waiting for browser ready browser.implicitly_wait (10) return browser''' initial user input trade name pass Key''def input_search_key (): info = input ("Please enter commodity keywords:") return info
Inquire about JD.com 's commodity information. Query goods in JD.com Mall, in two steps, enter commodity keywords on the home page, click on the search, and then query the price information on the results page. The complete code is as follows:
'' enter JD.com Mall to inquire about merchandise information''def search_jd (): global jd_data products_names = [] products_prices = [] # JD.com Home jd_index_url = r "https://www.jd.com/" # Open JD.com 's first face try: if chrome_browser is None: raise Exception () else: # Open JD.com homepage chrome_browser.get (jd_index_url) # simulate network delay chrome_browser.implicitly_wait (10) # find the text input component search_input = chrome_browser.find_element (By.ID) "key") # enter the commodity keyword search_input.send_keys (search_keyword) chrome_browser.implicitly_wait (5) # in the text box to find the search button here use CSS selector scheme search_button = chrome_browser.find_element (By.CSS_SELECTOR "# search > div > div.form > button") # trigger button event search_button.click () chrome_browser.implicitly_wait (5) # get all open windows (there should be 2 when the button is clicked) windows = chrome_browser.window_handles # switch the newly opened window Use the negative index to find the last open window chrome_browser.switch_to.window (windows [- 1]) chrome_browser.implicitly_wait (5) # to get the commodity price product_price_divs = chrome_browser.find_elements (By.CLASS_NAME "p-price") for i in range (5): div = product_price_ divs [I] if len (div.text)! = 0: # Delete the dollar sign products_prices.append (float (div.text [1:])) # acquirer before the price Product name product_name_divs = chrome_browser.find_elements (By.CLASS_NAME "p-name") chrome_browser.implicitly_wait (10) for i in range (5): div = product_name_ divs [I] if len (div.text)! = 0: products_names.append (div.text) jd_data = dict (zip (products_names) Products_prices) jd_data ["average price"] = sum (products_prices) / len (products_prices) jd_data ["minimum price"] = min (products_prices) jd_data ["maximum price"] = max (products_prices) # write document csv_save using CSV module ("JD.com Mall") Jd_data) except Exception as e: print (e)
Chrome_browser: a browser-mapped object built by webdriver, through which selenium controls all operations to the browser.
This object has a find_element () core method that is used to find (locate) HTML page elements. When looking up, you can specify the lookup method through the By object (factory design pattern is used here). The values of By can be ID, CSS_SELECTOR, XPATH, CLASS_NAME, CSS_SELECTOR, TAG_NAME, LINK_TEX, PARTIAL_LINK_TEXT.
After opening JD.com 's home page, first locate the text search box and the search button.
Using the developer tool of the browser, check that the source code of the text box is a piece of input html. In order to accurately locate this component, generally try to analyze whether this component has unique attributes or eigenvalues. Id is a good choice. The html syntax specification id value should be a unique value.
Search_input = chrome_browser.find_element (By.ID, "key")
After you find a component, you can perform a series of actions on it, commonly used operations:
Text property: gets the text content of the component.
The send_keys () method: assign a value to this component.
The get_attribute () method: gets the property value of the component.
Here, send_keys is used to give the text component a user input commodity keyword.
Search_input.send_keys (search_keyword)
Then look for the search button component:
The button component is a piece of button html code with no overly significant property values, and to find this unique component, you can use XPATH or CSS selectors. Right-click the code snippet, find the copy command in the pop-up shortcut menu, and then find the CSS selector value for this component.
Search_button = chrome_browser.find_element (By.CSS_SELECTOR, "# search > div > div.form > button")
Call the click () method of the button component to simulate the user click action, which opens a new window and displays the searched product data in a list.
Search_button.click ()
After selenium receives the feedback from the browser after opening a new window, it can use the window_handles property to get all the windows that have been opened in the browser and store the action references for each window in the form of a list.
Windows = chrome_browser.window_handles
When locating a page element, there is a concept of the current window (the window that is currently available and is being operated on). At first you operated in the home window, but now you want to operate in the search results window, so switch to the new window that just opened. Use a negative index to get the window you just opened (the window you just opened must be the last window).
Chrome_browser.switch_to.window (windows [- 1])
Notice that when you switch to the search results window, you can search for the desired components in this window.
On this page, you only need to get the specific information of the top 5 goods, including the name and price of the goods. As for the specific data to be obtained, you can decide according to your own needs. This procedure only needs the price and name of the product, then check the page and find the corresponding html fragment.
The trade name information is stored in a div fragment, and the div has a class attribute with a value of p-name. You can get it using CSS-NAME, because all items use the same fragment template, so you can use the find_elements () method here.
Product_name_divs = chrome_browser.find_elements (By.CLASS_NAME, "p-name")
The find_elements method returns a list of components with the same CSS-NAME, writes code to iterate over each component, gets the data, and stores it in the list of trade names.
For i in range (5): div = product_name_ divs [I] if len (div.text)! = 0: products_names.append (div.text)
In the same way, price data are obtained. Then make the product name and price data into a dictionary, and make a simple analysis of the price data.
Jd_data = dict (zip (products_names, products_prices)) jd_data ["average price"] = sum (products_prices) / len (products_prices) jd_data ["minimum price"] = min (products_prices) jd_data ["maximum price"] = max (products_prices) csv_save ("JD.com Mall", jd_data)
Store the data crawled from JD.com Mall in CSV format.
Get the commodity data on SUNING. The same logic as getting data from JD.com (two pieces of code can be integrated into one function, and this article is written separately for ease of understanding). The difference between the two lies in the page structure, the page component that hosts the data, or the property settings of the component.
Def search_sn (): global sn_data # Save commodity name products_names = [] # Save commodity price products_prices = [] # SUNING homepage sn_index_url = r "https://www.suning.com/" try: if chrome_browser is None: raise Exception () else: # Open the first page Page chrome_browser.get (sn_index_url) # simulated network delay chrome_browser.implicitly_wait (10) # find text input component search_input = chrome_browser.find_element (By.ID "searchKeywords") # enter the commodity keyword search_input.send_keys (search_keyword) time.sleep (2) # in the text box to find the search button here using the CSS selector scheme search_button = chrome_browser.find_element (By.ID "searchSubmit") # trigger button event search_button.click () time.sleep (3) # get all open windows (there should be 2 when the button is clicked) windows = chrome_browser.window_handles # switch the newly opened window Use the negative index to find the last open window chrome_browser.switch_to.window (windows [- 1]) chrome_browser.implicitly_wait (20) # to get the label product_price_divs = chrome_browser.find_elements (By.CLASS_NAME) where the commodity price is located "def-price") # only view the first five commodity information for i in range (5): div = product_price_ DVS [I] # Delete the dollar sign if len (div.text) before the price! = 0: products_prices.append (div.text [ 1:]) chrome_browser.implicitly_wait (10) # get the trade name product_name_divs = chrome_browser.find_elements (By.CLASS_NAME "title-selling-point") for i in range (5): products_names.append (product_name_ divs [I] .text) # sn_data = dict (zip (products_names) Products_prices) sn_data ["average price"] = sum (products_prices) / len (products_prices) sn_data ["minimum price"] = min (products_prices) sn_data ["maximum price"] = max (products_prices) # write document csv_save using CSV module ("SUNING Mall") Sn_data) except Exception as e: print (e)
After obtaining the commodity data on SUNING, it is also stored in CSV format.
Store the final analysis results. This paper only analyzes the difference of the average price, the lowest price and the highest price of the same type of goods in the two malls.
Def price_result (): if len (jd_data)! = 0 and len (sn_data)! = 0: with open ("DVR / Commodity comparison Table .csv", "w" Newline='') as f: csv_writer = csv.writer (f) jd_name = list (jd_data.keys ()) jd_price = list (jd_data.values ()) sn_price = list (sn_data.values ()) csv_writer.writerow (["comparison item", "JD.com Price", "SUNING Price" "Price difference") for i in range (5, len (jd_price)): csv_writer.writerow ([jd_name [I], jd_price [I], sn_price [I], math.fabs (jd_ print [I]-sn_ print [I])
The average, minimum, maximum and absolute difference of commodity prices in the two malls are preserved.
Final test code
If _ _ name__ = ='_ main__': search_keyword = input_search_key () chrome_browser = init_data () search_jd () time.sleep (2) search_sn () price_result ()
Please enter the commodity keyword: Huawei meta 40
This is the end of the article on "how to realize the commodity price difference analysis system of different shopping malls based on Pythonr based on selenium". Thank you for reading! I believe that everyone has a certain understanding of the knowledge of "how to realize the commodity price difference analysis system of different shopping malls based on selenium". If you want to learn more knowledge, you are welcome to follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.