2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to grab Taobao commodity information with Python. Many people have questions about this in daily work, so I have gathered material and organized it into a simple, easy-to-follow method. I hope it answers the question of "how to grab Taobao commodity information with Python" for you. Let's learn together!
For web pages that use asynchronous loading, it is sometimes difficult to design a crawler through reverse engineering alone. If you want to obtain asynchronously loaded data with Python, you can often use Selenium to simulate a real browser instead.
Selenium is a tool for testing web applications. It runs in the browser and simulates a user's real browsing actions: loading pages, entering search keywords, clicking to turn pages, and so on. Even for pages built with asynchronous loading, you can therefore simulate page turning to reach different pages and collect the data you want.
The Selenium module is available as a third-party library. In PyCharm, open Preferences -> Project -> Python Interpreter, click + to add a package, then search for "selenium" and install it (alternatively, run pip install selenium from the command line).
Since Selenium does not ship with its own browser, it must be used together with a browser installed on your machine; here we simulate crawling with the commonly used Chrome browser (which also requires a matching ChromeDriver). The operations that can be simulated include filling in input boxes, clicking buttons, taking screenshots, sliding, and so on. When logging in to a website, we therefore no longer need to construct forms or submit cookies; we only need Python code that simulates typing the account and password to log in.
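As a concrete starting point, here is a minimal sketch of driving Chrome with Selenium. It assumes Chrome and a matching ChromeDriver are installed; the import is deferred into the function so the snippet can be loaded even without Selenium present.

```python
def open_page(url):
    """Launch Chrome via Selenium and load the given URL."""
    # Deferred import: Selenium is only required when the function is called.
    from selenium import webdriver
    driver = webdriver.Chrome()   # needs a matching ChromeDriver on PATH
    driver.maximize_window()
    driver.get(url)
    return driver
```

For example, `open_page('https://www.taobao.com')` would return a driver object ready for the simulated operations described below.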
(1) The most commonly used code for simulating search and login:
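The original snippets for this step did not survive extraction, so below is a hedged sketch of what simulated search and login typically look like. Every locator here (q, btn-search, fm-login-id, fm-login-password, fm-button) is an assumption about Taobao's markup at the time, not code taken from the article.

```python
def search_taobao(driver, keyword):
    """Type a keyword into the search box and click the search button."""
    from selenium.webdriver.common.by import By  # deferred Selenium import
    box = driver.find_element(By.ID, 'q')        # assumed search-box id
    box.clear()
    box.send_keys(keyword)
    driver.find_element(By.CLASS_NAME, 'btn-search').click()  # assumed button class

def login_taobao(driver, account, password):
    """Fill in the account/password inputs and click the login button."""
    from selenium.webdriver.common.by import By
    driver.find_element(By.ID, 'fm-login-id').send_keys(account)         # assumed id
    driver.find_element(By.ID, 'fm-login-password').send_keys(password)  # assumed id
    driver.find_element(By.CLASS_NAME, 'fm-button').click()              # assumed class
```

The real locators should always be re-checked against the live page source, since Taobao's markup changes over time.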
(2) Common code for obtaining data after login:
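Again as a hedged sketch (the original snippet is missing): after login, data is usually read off the page through Selenium element objects and their .text attribute. The XPath passed in is whatever you determined from the page source.

```python
def extract_texts(driver, xpath):
    """Return the visible text of every element matching the XPath."""
    from selenium.webdriver.common.by import By  # deferred Selenium import
    return [elem.text for elem in driver.find_elements(By.XPATH, xpath)]
```

For instance, `extract_texts(driver, '//div[@class="title"]')` (a hypothetical locator) would collect all product titles on the current page.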
Note: the difference between the two XPath usages is that with Selenium you append .text to the element object, while with lxml's xpath you put /text() inside the path expression itself.
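The note above can be illustrated on a small inline HTML fragment: the lxml route puts /text() inside the path and returns a list of text nodes, while the Selenium route (shown in a comment, not executed) appends .text to the element.

```python
from lxml import etree

html = '<div class="price"><strong>2999</strong></div>'
tree = etree.HTML(html)

# lxml style: /text() inside the path returns a list of text nodes
texts = tree.xpath('//div[@class="price"]/strong/text()')
print(texts)  # ['2999']

# Selenium style, for comparison (not executed here):
#   driver.find_element(By.XPATH, '//div[@class="price"]/strong').text  -> '2999'
```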
That covers the basics of Selenium; now let's put it into practice and grab Taobao commodity information.
Tools and language: Selenium + Chrome + PyCharm + Python
Target platform: www.taobao.com
Approach: as an example, I will search for a projector, an item I have been planning to buy recently.
(1) Open Taobao, enter "projector", and reach the product information page.
(2) Open the page source and check the position of the search box after entering "projector", as shown in the figure below. Determining its location prepares us to enter the keyword later.
(3) Sometimes after you click search, Taobao forces you to a login page. In that case, check the locations of the account and password input boxes in the same way, then repeat the simulated-input operation to enter the site; I won't repeat the details here.
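Since the forced login page may take a moment to render, an explicit wait helps ensure the inputs exist before typing. This is a sketch under the same assumed locators as before (fm-login-id, fm-login-password, fm-button), not the article's original code.

```python
def wait_and_login(driver, account, password, timeout=10):
    """Wait until the login inputs appear, then fill them in and submit."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    wait = WebDriverWait(driver, timeout)
    user_box = wait.until(EC.presence_of_element_located((By.ID, 'fm-login-id')))
    user_box.send_keys(account)
    driver.find_element(By.ID, 'fm-login-password').send_keys(password)
    driver.find_element(By.CLASS_NAME, 'fm-button').click()
```

An explicit wait is generally more reliable than a fixed time.sleep(), because it proceeds as soon as the element actually appears.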
(4) Next, we need to simulate the page-turning operation. As before, find the position of the "next page" button, as shown below.
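Once the button's position is known, clicking it is one line; a hedged sketch (the XPath locator is an assumption you would verify against the page source):

```python
def click_next_page(driver, wait_seconds=3):
    """Click the 'next page' link and pause while the new page loads."""
    import time
    from selenium.webdriver.common.by import By
    driver.find_element(By.XPATH, '//li[@class="item next"]/a').click()  # assumed locator
    time.sleep(wait_seconds)  # give the asynchronously loaded results time to render
```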
(5) At this point, all of the information on the page, such as the number of buyers and the price, can be inspected and test-extracted.
(6) Finally, the data can be stored in a database; see the complete code below for the storage details.
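The storage step can be sketched independently of Selenium: the function below only assumes a DB-API cursor (e.g. from pymysql) and the taobao_touyingyi_data table created in the complete code. The function name is my own; remember to call db.commit() afterwards so the rows persist.

```python
def save_item(cursor, price, sell, detail):
    """Insert one scraped record via a DB-API cursor (e.g. pymysql)."""
    cursor.execute(
        "insert into taobao_touyingyi_data (price, sell, detail) values (%s, %s, %s)",
        (str(price), str(sell), str(detail)))
```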
The complete code is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import etree
import time
import pymysql

# Connect to MySQL and (re)create the destination table
db = pymysql.connect(host='localhost', user='root', passwd='database password',
                     db='database name', port=3306, charset='utf8')
print("database connected")
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS Learn_data.taobao_touyingyi_data")
sql = """CREATE TABLE IF NOT EXISTS Learn_data.taobao_touyingyi_data (
    id int auto_increment primary key,
    price CHAR(100),
    sell CHAR(100),
    detail CHAR(100)
) DEFAULT CHARSET=utf8"""
cursor.execute(sql)

driver = webdriver.Chrome()
driver.maximize_window()

def get_info(url, page):
    page = page + 1
    driver.get(url)
    driver.implicitly_wait(10)
    selector = etree.HTML(driver.page_source)
    infos = selector.xpath('//div[@class="items"]/div[@class="item J_MouserOnverReq "]')
    for info in infos:
        price = info.xpath('div[2]/div/div/strong/text()')[0]
        sell = info.xpath('div[2]/div/div[2]/text()')[0]
        detail = info.xpath('div[2]/div/div[2]/text()')[1]
        print(price, sell, detail)
        cursor.execute(
            "insert into taobao_touyingyi_data (price, sell, detail) values (%s, %s, %s)",
            (str(price), str(sell), str(detail)))
    db.commit()  # commit so the inserted rows persist
    # The original listing breaks off at "if page"; a typical continuation
    # (reconstructed -- the locator and page limit are assumptions) clicks
    # "next page" and recurses until the limit is reached:
    if page <= 10:
        driver.find_element(By.XPATH, '//li[@class="item next"]/a').click()
        time.sleep(3)
        get_info(driver.current_url, page)