2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many inexperienced readers don't know how to crawl Xiaohongshu (Little Red Book) with Python, so this article summarizes the problem and walks through a solution step by step. I hope it helps you solve it.
Xiaohongshu (Little Red Book)
First, open the Charles proxy we configured earlier.
We will capture traffic from the Xiaohongshu WeChat Mini Program (note: the Mini Program, not the app).
The reason for avoiding the app is that Xiaohongshu's app is harder to intercept. Following ideas found online, the Mini Program is the easier target.
1. Analyze the Mini Program by capturing packets in Charles
Following the path shown, you can see that the list data has been captured.
But you think it's over?
No, no, no.
From this capture we know the data can be fetched through this API endpoint.
But once we start writing the crawler, we find two troublesome parameters in the headers:
"authorization" and "x-sign"
These two values change constantly, and it is not obvious where to get them.
So:
2. Use mitmproxy to capture the parameters
In fact, the Charles capture already made the plan clear:
get the "authorization" and "x-sign" parameters, then make a GET request to the URL.
mitmproxy, used here, is similar to Charles; both are packet-capture tools.
But mitmproxy can be scripted with Python,
which is much more comfortable.
Here is an example:
def request(flow):
    print(flow.request.headers)
mitmproxy provides this hook: through the request object we can read the url, cookies, host, method, port, scheme, and the request headers.
Isn't that exactly what we want?
We intercept the "authorization" and "x-sign" parameters directly,
fill them into our headers,
and the whole thing is done.
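In outline, the replayed request's headers look something like this. This is a sketch: the two signing values are placeholders that in practice come from the mitmproxy hook above, and the user-agent is just any plausible value, not something the article specifies.

```python
# Sketch of the headers for replaying the search request.
# The signing values below are placeholders captured via mitmproxy.

SEARCH_API = "https://www.xiaohongshu.com/fe_api/burdock/weixin/v2/search/notes"

def build_headers(authorization, x_sign):
    """Combine the captured signing values with an ordinary browser UA."""
    return {
        "authorization": authorization,
        "x-sign": x_sign,
        "user-agent": "Mozilla/5.0",  # any plausible user agent string
    }

headers = build_headers("<captured-authorization>", "<captured-x-sign>")
# A GET to SEARCH_API with these headers (e.g. via requests) returns the list data.
```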
That is the whole crawling idea. Now let's explain how to write the code.
The code is actually not hard to write.
First, we intercept the requests going to the search API so we can pull information out of them:
if 'https://www.xiaohongshu.com/fe_api/burdock/weixin/v2/search/notes' in flow.request.url:
We check whether the flow's request URL contains the search API,
which identifies the requests we want to capture.
    # read the volatile values straight from the mitmproxy Headers object
    authorization = flow.request.headers.get("authorization", "")
    x_sign = flow.request.headers.get("x-sign", "")
    url = flow.request.url
With the code above we have the three most critical values; what follows is ordinary JSON parsing.
Finally, we get the data we want.
If you want a single note, you can take the note id and fetch the detail page:
"https://www.xiaohongshu.com/discovery/item/">
The headers for this page need a cookie. You can get one by visiting any page on the site; at the moment the value appears to be fixed.
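A minimal sketch of building that single-note request, assuming you already have a note id and a cookie (both placeholders here; the article does not specify their formats):

```python
# Build the URL and headers for fetching one note's detail page.
BASE = "https://www.xiaohongshu.com/discovery/item/"

def build_note_request(note_id, cookie):
    """Return the URL and headers for a single note page."""
    return BASE + note_id, {"cookie": cookie, "user-agent": "Mozilla/5.0"}

url, headers = build_note_request("<note-id>", "<your-cookie>")
# A GET to url with these headers (e.g. via requests) returns the note page.
```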
Finally, you can write the data out to a CSV file.
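The CSV step can be done with Python's standard csv module. The field names below are hypothetical; use whichever keys you actually pull out of the JSON.

```python
import csv

def save_to_csv(rows, path):
    """Write a list of dicts (one per note) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "title", "likes"])
        writer.writeheader()
        writer.writerows(rows)

# Example with made-up data:
save_to_csv([{"id": "1", "title": "demo", "likes": 3}], "notes.csv")
```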
Summary
Crawling Xiaohongshu is not especially difficult; the key lies in the line of thinking and the method you choose.
After reading the above, have you mastered how to crawl Xiaohongshu with Python? If you want to learn more skills or go deeper, you are welcome to follow the industry information channel. Thank you for reading!