Shulou (Shulou.com), 06/01 report. Updated 2025-01-19. From: SLTechnology News&Howtos.
This article shows how to crawl the comments on a WeChat official account with Python. The process has three steps: get the login cookie, construct the comment-list URL, and grab and parse the data.
1. Obtain Cookie
Because the comments being crawled belong to your own WeChat official account, you first need to log in to the official account backend, where the comments on each article can be viewed. Logging in involves a cookie: only a request that carries the cookie returns the correct data. So the first step is to get the cookie information.
Open the Chrome browser and inspect a request to the backend (for example in the developer tools); you will see that the Cookie information is sent to WeChat automatically with each request. Copy this Cookie data and use Python to build a cookie object for requests to use.
from http.cookies import SimpleCookie

import requests

# Cookie string copied from the browser; most of it is omitted here
raw_cookie = "gsScrollPos-5517=; ... (much omitted) ... bizuin=2393828"
cookie = SimpleCookie(raw_cookie)
# requests accepts a plain {name: value} dict
requests_cookies = {name: morsel.value for name, morsel in cookie.items()}
r = requests.get(url, cookies=requests_cookies)  # url is built in step 2
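As a quick check of the cookie-parsing step, here is a minimal sketch that runs SimpleCookie on a made-up cookie string and builds the plain dict that requests accepts through its cookies argument. The helper name cookie_string_to_dict and the sample names and values are placeholders for illustration, not real WeChat cookies.

```python
from http.cookies import SimpleCookie


def cookie_string_to_dict(raw_cookie):
    """Turn a raw Cookie header value into a {name: value} dict."""
    cookie = SimpleCookie()
    cookie.load(raw_cookie)
    return {name: morsel.value for name, morsel in cookie.items()}


# Placeholder cookie string, for illustration only
sample = "sessionid=abc123; lang=zh_CN; bizuin=2393828"
cookies = cookie_string_to_dict(sample)
print(cookies)
```

The resulting dict can be passed directly as requests.get(url, cookies=cookies). SimpleCookie may silently skip entries it cannot parse, so it is worth checking that the expected names are present before sending requests.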
2. Construct URL
Open the comment list of any article and you will find that its URL structure is very clear; the meaning of each parameter can mostly be guessed from its name. The most important parameter here is begin, which is the starting position for paging. The other parameters stay the same from one request to the next.
url = (
    "https://mp.weixin.qq.com/misc/appmsgcomment?"
    "action=list_comment&"
    "mp_version=7&"
    "type=0&"
    "comment_id=2881104117&"  # ID of the commented article
    "begin=0&"                # paging parameter (start position)
    "count=10&"               # return 10 comments at a time
    "token=1300595798&"
    "lang=zh_CN"
)
3. Grab data
With both the cookie and the URL worked out, we can simulate a browser to grab the data and then clean it. The first idea was to parse the web page with BeautifulSoup, but that failed.
To find out why, save the crawled page as an HTML file and search its source for keywords from the comments. It turns out that the comments are not in div tags but in a block of JS code that looks like JSON data, which is apparently rendered into the page locally by JavaScript.
So switch to regular expressions, cut out the required data, and finally store it in the database. About 10 lines of code do the job.
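import json
import re
import time

import requests


def main():
    # Total number of regular messages and of selected messages
    normal_count, selected_count = 141, 100
    # URL for regular messages: the parameters from step 2, with begin left
    # as a placeholder that crawler() fills in for each page
    normal_url = ("https://mp.weixin.qq.com/misc/appmsgcomment?"
                  "action=list_comment&mp_version=7&type=0&comment_id=2881104117&"
                  "begin={begin}&count=10&token=1300595798&lang=zh_CN")
    # The URL for selected messages is not shown in this article; once it is
    # known, add (selected_count, selected_url) to this list
    for count, url in [(normal_count, normal_url)]:
        crawler(count, url)


def crawler(count, url):
    for i in range(0, count, 10):
        r = requests.get(url.format(begin=i), cookies=requests_cookies)
        # The comments sit in a JSON-like array inside a block of JS code
        match = re.search(r'"comment":(\[\{.*\}\])', r.text, re.S)
        if match:
            data = json.loads(match.group(1))
            # conn is a database collection created elsewhere (not shown here)
            conn.insert_many(data)
        time.sleep(1)  # pause between requests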
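To see the regular-expression step in isolation, here is a small sketch that applies a similar pattern to a made-up page fragment. The fragment, the variable name pageData, the comment fields and the helper name extract_comments are all invented for illustration; the real page layout may differ. The pattern also tolerates optional whitespace after the colon, which is an addition to the one used above.

```python
import json
import re

# Made-up page fragment: the comment list sits inside a script block,
# not inside div tags
sample_html = """
<script>
var pageData = {
    "comment": [{"id": 1, "content": "Great book list"}, {"id": 2, "content": "Thanks!"}],
    "total": 2
};
</script>
"""

# re.S lets "." match newlines, so the array may span several lines
COMMENT_PATTERN = re.compile(r'"comment":\s*(\[\{.*\}\])', re.S)


def extract_comments(page_text):
    """Return the list of comment dicts found in the page, or [] if none."""
    match = COMMENT_PATTERN.search(page_text)
    if not match:
        return []
    return json.loads(match.group(1))


comments = extract_comments(sample_html)
for comment in comments:
    print(comment["id"], comment["content"])
```

Because ".*" is greedy, the match runs to the last "}]" in the text. That works when the comment array is the last such structure on the page; if another "}]" follows it, json.loads would raise an error and a stricter pattern would be needed.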
The comments crawled in this example are those left on the article recommending advanced Python books, which also ran a book giveaway.