Shulou (Shulou.com), 06/02 report. Updated 2025-01-18; from SLTechnology News&Howtos, Internet Technology.
This article explains how to use the BeautifulSoup selector to grab product information from JD.com. Many people have questions about how to do this in practice, so the editor consulted a range of materials and put together a simple, easy-to-use method. I hope it helps answer the question "how do I use the BeautifulSoup selector to grab JD.com product information?" Please follow along with the editor!
Yesterday, the editor scraped JD.com product information with Python regular expressions. Readers who saw that code could hardly stand it: so many patterns and so much code, it was painful to look at. Don't worry, though. Today the editor uses Beautiful Soup to show you how to match JD.com product information precisely.
An HTML document is organized as a set of tags written in angle brackets. Tags nest inside one another, and these parent-child relationships form a tag tree. The Beautiful Soup library can therefore be described as a library for parsing, traversing, and maintaining this "tag tree".
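To make the "tag tree" idea concrete, here is a small sketch that parses a made-up HTML fragment (not JD.com markup) with Beautiful Soup and walks down and back up the tree. It assumes the `bs4` package is installed and uses Python's built-in `"html.parser"`, so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document: <ul> contains two <li> tags, and each <li>
# contains an <a> tag. This nesting is what forms the "tag tree".
html = """
<ul id="goods">
  <li><a href="/item/1">Dog food A</a></li>
  <li><a href="/item/2">Dog food B</a></li>
</ul>
"""

# "html.parser" is Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

ul = soup.find("ul")          # the parent node
items = ul.find_all("li")     # its children, one level down
first_link = items[0].a       # descend one more level into the tree

print(ul["id"])                           # goods
print(first_link.parent.name)             # walk back up: li
print([li.a.get_text() for li in items])  # ['Dog food A', 'Dog food B']
```

The same three moves (find a node, list its children, step up or down) are all that the JD.com example needs.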
Figure: JD.com search results page for "dog food"
First, go to JD.com, enter the product you want to search for, and send the request to the server. Here the editor again uses the keyword "dog food" (狗粮) as the search term, which produces this URL: https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8. The keyword parameter holds the search term we entered, in this case the URL-encoded "dog food". For details, see the earlier article on processing JD.com product information with Python regular expressions. In short, as long as we supply the keyword parameter and URL-encode it, we can build the target URL. We then request the page, get the response, and use the bs4 selector for the next step of data collection.
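As a sketch of "supply the keyword and encode it", the function below rebuilds the search URL from a plain keyword. The URL format is copied from the example above; the helper name `build_search_url` is hypothetical, and it assumes `urllib.parse.quote` with its default UTF-8 encoding.

```python
from urllib.parse import quote

def build_search_url(keyword: str) -> str:
    """Hypothetical helper: build a JD.com search URL in the format shown above."""
    # quote() percent-encodes the keyword's UTF-8 bytes,
    # e.g. "狗粮" (dog food) becomes "%E7%8B%97%E7%B2%AE".
    return f"https://search.jd.com/Search?keyword={quote(keyword)}&enc=utf-8"

print(build_search_url("狗粮"))
# https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

Changing the keyword argument is enough to target a different product search.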
Part of the page source for the product listings on JD.com is shown below:
Figure: page source for the dog food listings on JD.com
If we look closely at the source code, we find that all the information we need sits inside one particular tag. From there we can work like peeling an onion, going down layer by layer until we reach the information we want.
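The "peeling an onion" approach can be sketched as a chain of lookups, each one narrowing the search to the layer inside the previous one. The tag and class names below (`goods-list`, `item`, `p-name`) are hypothetical placeholders, because the actual JD.com tag names are not reproduced in the text. The sketch assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real JD.com tag and class names may differ.
html = """
<div id="goods-list">
  <ul>
    <li class="item"><div class="p-name"><a><em>Dog food A</em></a></div></li>
    <li class="item"><div class="p-name"><a><em>Dog food B</em></a></div></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Layer 1: the container that holds every product.
container = soup.find("div", id="goods-list")
# Layer 2: one <li> per product inside that container.
products = container.find_all("li", class_="item")
# Layer 3: inside each product, the block that holds its name.
names = [li.find("div", class_="p-name").get_text(strip=True) for li in products]

print(names)  # ['Dog food A', 'Dog food B']
```

Searching inside `container` rather than the whole `soup` keeps the later lookups from accidentally matching tags elsewhere on the page.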
Here is the code, shown in the figure below:
Figure: requesting the web page and getting its source code with the Python standard library
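A minimal sketch of this request step, using only the standard library's `urllib.request`. The browser-like `User-Agent` header is an assumption: the article does not say which headers JD.com expects, and the site may still refuse automated requests. The helper names `make_request` and `fetch_html` are hypothetical.

```python
import urllib.request

# Assumption: a browser-like User-Agent header. JD.com's actual requirements
# are not given in the article.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def make_request(url: str) -> urllib.request.Request:
    """Wrap the URL in a Request object that carries our headers."""
    return urllib.request.Request(url, headers=HEADERS)

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it into a string."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as resp:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `html = fetch_html("https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8")` returns the page source as a string, ready to pass to `BeautifulSoup`.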
URL encoding (percent-encoding) converts each character that needs encoding into the form %xx, where each byte of the character's encoded form becomes one %xx pair. URL encoding is generally based on UTF-8, although some details depend on the browser or platform. Python's urllib library provides the quote function (urllib.parse.quote in Python 3), which percent-encodes a string so that it can be placed in a URL and lead to the corresponding page.
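To see what %xx means in practice, the sketch below percent-encodes a string by hand: encode it to UTF-8 bytes, then write each byte as `%` plus two hex digits. The helper `percent_encode` is hypothetical and encodes every byte. `urllib.parse.quote` instead leaves ASCII letters, digits and a few safe characters unchanged, so the two agree only for text such as Chinese characters, where every byte needs encoding.

```python
from urllib.parse import quote, unquote

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every UTF-8 byte of the text."""
    # Step 1: turn the characters into UTF-8 bytes.
    # Step 2: write each byte as "%" followed by two uppercase hex digits.
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))

word = "狗粮"  # "dog food"
print(percent_encode(word))          # %E7%8B%97%E7%B2%AE
print(quote(word))                   # %E7%8B%97%E7%B2%AE
print(unquote(quote(word)) == word)  # True: unquote reverses quote
```

Each Chinese character here takes three UTF-8 bytes, which is why two characters produce six %xx pairs.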
Next, use Beautiful Soup to extract the target information, such as each item's name, link, picture and price. The code is shown below:
Figure: extracting the target information with Beautiful Soup
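A sketch of extracting the four fields (name, link, picture, price) into one dictionary per product. The markup and the class names `item`, `p-img`, `p-price` and `p-name` are hypothetical stand-ins for JD.com's real page structure, and `parse_products` is a hypothetical helper. It assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical product markup; the real JD.com class names may differ.
html = """
<ul>
  <li class="item">
    <div class="p-img"><a href="//item.example/1"><img src="//img.example/1.jpg"></a></div>
    <div class="p-price"><i>99.00</i></div>
    <div class="p-name"><a href="//item.example/1"><em>Dog food A</em></a></div>
  </li>
  <li class="item">
    <div class="p-img"><a href="//item.example/2"><img src="//img.example/2.jpg"></a></div>
    <div class="p-price"><i>45.50</i></div>
    <div class="p-name"><a href="//item.example/2"><em>Dog food B</em></a></div>
  </li>
</ul>
"""

def parse_products(page: str) -> list:
    """Return one dict per product with its name, link, picture and price."""
    soup = BeautifulSoup(page, "html.parser")
    products = []
    for li in soup.find_all("li", class_="item"):
        name_link = li.find("div", class_="p-name").a
        img = li.find("div", class_="p-img").img
        products.append({
            "name": name_link.get_text(strip=True),
            "link": name_link.get("href"),
            # .get() returns None rather than raising if src is missing
            "picture": img.get("src") if img is not None else None,
            "price": li.find("div", class_="p-price").i.get_text(strip=True),
        })
    return products

for product in parse_products(html):
    print(product)
```

Collecting the results as dictionaries makes it easy to print them, as in the final output below, or to save them later.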
One thing to note in this example: some of the image links are empty, so the extraction code has to allow for that. There are two solutions. The first is to use img.get('src') instead of img['src']: img['src'] raises an error when the attribute is missing, whereas get('src') does not raise and simply returns None. The second is to wrap the lookup in try/except and pass when there is no match. Readers can test both for themselves; testing this code is also shown in the figure above. Using the get method to retrieve attributes is a handy little bs4 technique, and I hope everyone will learn it and put it to use.
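The sketch below puts the two solutions side by side on a made-up fragment with one `<img>` that has a `src` and one that does not. It assumes `bs4` is installed. The square-bracket form raises `KeyError`, `.get()` returns `None`, and `try`/`except` lets the loop skip the missing link.

```python
from bs4 import BeautifulSoup

# Two <img> tags: the first has a src attribute, the second does not.
html = '<img src="//img.example/1.jpg"><img>'
imgs = BeautifulSoup(html, "html.parser").find_all("img")

# img["src"] raises KeyError when the attribute is missing.
try:
    imgs[1]["src"]
except KeyError:
    print("img['src'] raised KeyError")

# Solution 1: .get() returns None instead of raising.
print([img.get("src") for img in imgs])  # ['//img.example/1.jpg', None]

# Solution 2: try/except, skipping images without a src.
srcs = []
for img in imgs:
    try:
        srcs.append(img["src"])
    except KeyError:
        pass  # no link for this image: ignore it
print(srcs)  # ['//img.example/1.jpg']
```

`.get()` keeps a placeholder (`None`) for every image, while `try`/`except` drops the missing ones. Which is better depends on whether the picture list must stay aligned with the other fields.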
The final output looks like this:
Figure: the final output
This concludes the study of how to use the BeautifulSoup selector to grab JD.com product information. I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it! If you want to learn more about related topics, please keep following the website; the editor will keep working to bring you more practical articles!
© 2024 shulou.com SLNews company. All rights reserved.