In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
This article will explain in detail how python crawls pictures of Baidu Tieba. The editor thinks it is very practical, so I share it for you as a reference. I hope you can get something after reading this article.
1, goal:
Crawl every post, master drawing, and save it.
As most of the pictures are sent by the main building, it will waste a lot of time to find all of them.
2, analysis
I choose to climb the post bar as the picture, you can choose the post you want to climb.
2.1, get the page
We write the code that crawls the page into a get_html () method, passing him the url parameter
The code is shown below:
Get normal, no problem.
We use chrome developer mode to analyze the connection of each post and use location to locate a post, which makes it easy for us to find the information we want quickly.
As shown in the figure:
2.2 use regular expressions to find the connections we want
After searching, we found that each post is under class= "col2_right j_threalist_li/_right".
We can make it a marker, and through it we can continue to look down. He has two class names, and we can choose the latter.
. * (. *?)
What is returned is an array, and in order to see that we return it in a dictionary, we can understand it as a return value with yield. On the basis of python, we will pass in the acquired page as a parameter to implement the get_url method.
As shown in the figure:
Let's print it and see what we got.
The result is shown in the figure:
Obviously, we need to splice it up to get the complete url, and when we click on an entry, we can see that the url goes like this: https://tieba.baidu.com/p/5768252315, we got the second half, that's easy, just splice it, and the result becomes:
After getting the link, we need to send the request again to get the content of each post, that is, call the get_html () method we wrote above.
2.3 find the picture link posted by the main post of each post
In the same way, open the developer mode, find the picture, find the flag bit, and write the rule. I won't go into details here, but the rule is:
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.