How to crawl WeChat official account articles with Python

2025-04-28 Update From: SLTechnology News&Howtos


This article shows how to crawl WeChat official account articles. Many people are unfamiliar with the process, so the steps are written up here for reference. I hope you find them useful.

The specific steps are as follows:

First, install the proxy server

AnyProxy is used here. Its key feature is that it can capture the content of HTTPS requests.

1.1 Run npm install -g anyproxy on the command line or terminal. On macOS, sudo is required.

1.2 Generate the root CA certificate, which is required for HTTPS: run the command sudo anyproxy --root (sudo may not be required on Windows).

1.3 Start AnyProxy with the command sudo anyproxy -i. The -i parameter tells AnyProxy to parse HTTPS traffic.

1.4 Install the certificate on the phone: open http://localhost:8002/fetchCrtFile in the phone's browser to download the rootCA.crt file.

Replace localhost with the IP address of the computer running AnyProxy. Note that the phone and the computer must be on the same local area network.

1.5 Set the proxy: in the phone's Wi-Fi connection settings, configure a proxy. The proxy server address is the IP address of the computer running AnyProxy, and the default proxy port is 8001.

Now open WeChat and tap any official account's message history or article. You should see response codes scrolling in the terminal.

1.6 On the computer, open http://localhost:8002 in a browser to see AnyProxy's web interface. Open a message history page in WeChat, then look at the web interface: the request address of the history page appears in the scrolling list.
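The addresses used in steps 1.4 to 1.6 all derive from the IP address of the computer running AnyProxy. The following is a minimal Node.js sketch that looks up that address and prints the values to enter on the phone. The function names lanAddress and anyproxyAddresses and the fallback address 192.168.1.10 are illustrative assumptions, not part of AnyProxy; the ports are the defaults mentioned above.

```javascript
const os = require('os');

// Return the first non-internal IPv4 address of this machine, or null.
function lanAddress() {
  const interfaces = os.networkInterfaces();
  for (const name of Object.keys(interfaces)) {
    for (const net of interfaces[name]) {
      // `family` is the string 'IPv4' on most Node versions; some report 4.
      const isV4 = net.family === 'IPv4' || net.family === 4;
      if (isV4 && !net.internal) return net.address;
    }
  }
  return null;
}

// Build the addresses needed for the setup, given the computer's IP.
// Ports 8001 (proxy) and 8002 (web interface) are the defaults named above.
function anyproxyAddresses(ip) {
  return {
    certificateUrl: 'http://' + ip + ':8002/fetchCrtFile', // open on the phone
    proxyHost: ip,                                         // phone Wi-Fi proxy
    proxyPort: 8001,
    webInterface: 'http://localhost:8002'                  // open on the computer
  };
}

// 192.168.1.10 is only a placeholder used when no LAN address is found.
console.log(anyproxyAddresses(lanAddress() || '192.168.1.10'));
```

If the machine has several network interfaces, the first address found may not be the one on the phone's network, so check the printed value against the computer's Wi-Fi settings.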

Second, use SPY to crawl the list of articles

Because I wanted to save the results to a database, I used SPY, a crawler tool I developed myself. If you don't need to save to a database, you can use Chrome instead.
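For readers who take the Chrome route, one possible approach is to define a small stand-in object before pasting the script from this section. SPY's real interface is not documented here, so this sketch mirrors only the two calls that the script uses, spy.save and spy.getResult. The name createMemorySpy and the sample record are hypothetical.

```javascript
// Hypothetical stand-in for SPY: keeps records in memory instead of a database.
function createMemorySpy() {
  var saved = [];
  return {
    // Called once per article record.
    save: function (record) {
      saved.push(record);
    },
    // Called once at the end with all collected records.
    getResult: function (results) {
      var json = JSON.stringify(results, null, 2);
      console.log(json);
      return json;
    },
    saved: saved
  };
}

var spy = createMemorySpy();

// Example with a made-up record:
spy.save({ title: 'Example title', content_url: 'http://example.com/1' });
spy.getResult(spy.saved);
```

The printed JSON can then be copied out of the console and stored wherever is convenient.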

2.1 On the phone, open the official account's article history and scroll down to the bottom so that all the articles are loaded.

2.2 Open SPY, enter the address http://localhost:8002, and paste the code.

The general logic of the code is:

A. Get the article list data returned by mp/profile_ext?action=home&__biz=MzA3ODkyNDg4OA=

B. Because the article list data is loaded asynchronously, you need to scroll down the article list on the phone manually until all the articles are loaded.

C. Then extract all the article data in SPY and save it to the database.
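The script in this section selects requests by matching their address against a regular expression for action=getmsg, which presumably corresponds to the asynchronous loads triggered by scrolling in step B. A minimal sketch of that filter on its own follows. The function name isArticleListRequest is illustrative, and the sample addresses (host name and query values) are placeholders rather than captured traffic.

```javascript
// Pattern for requests that load further pages of the article list.
var GETMSG_PATTERN = /\/mp\/profile_ext\?action=getmsg&/i;

// True when the address looks like an article-list (getmsg) request.
function isArticleListRequest(url) {
  return typeof url === 'string' && GETMSG_PATTERN.test(url);
}

// Placeholder addresses for illustration only.
var captured = [
  'https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=EXAMPLE',
  'https://mp.weixin.qq.com/mp/profile_ext?action=getmsg&__biz=EXAMPLE&offset=10',
  'https://mp.weixin.qq.com/s?__biz=EXAMPLE'
];

console.log(captured.filter(isArticleListRequest));
```

Only the second address matches; the home request and the article page are skipped.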

The code is as follows:

var results = []
// Request rows in the AnyProxy web interface marked .record_status_done
var doms = document.querySelectorAll('.record_status_done')
var pages = []

// Keep only the rows whose address is an article-list (getmsg) request
doms.forEach(function (dom, i) {
  var isUrl = dom.children[4].getAttribute('title')
  if (isUrl && isUrl.match(/\/mp\/profile_ext\?action=getmsg&/i)) {
    pages.push(dom)
  }
});

var step = 0
// Only start when at least one matching request was found
if (pages.length > 0) {
  stepByStep()
}

function stepByStep() {
  // Click the row so that its response body is displayed
  pages[step].click()
  var res
  // Wait 1 second for the response body to render before reading it
  setTimeout(function () {
    if (document.querySelector('.resBodyContent')) {
      // general_msg_list is a JSON string inside the JSON body, so parse twice
      res = JSON.parse(JSON.parse(document.querySelector('.resBodyContent').innerText).general_msg_list).list
    }
    if (res) {
      res.forEach(function (r, i) {
        if (r.app_msg_ext_info) {
          var target = r.app_msg_ext_info
          console.log(target, step, 'num')
          var obj_save = {
            author: target.author,
            content_url: target.content_url,
            cover: target.cover,
            digest: target.digest,
            title: target.title
          }
          spy.save(obj_save)
          results.push(obj_save)
          console.log(results.length, step)
        }
      });
    } else {
      console.log(res, document.querySelector('.resBodyContent'))
    }
    step = step + 1
    // Close the detail view after 1 second
    setTimeout(function () {
      document.querySelector('.escBtn').click()
    }, 1000)
    if (step < pages.length) {
      // Process the next request after 3 seconds
      setTimeout(function () {
        window.stepByStep()
      }, 3000)
    } else {
      // All requests processed: hand the collected records to SPY
      spy.getResult(results)
    }
  }, 1000)
}
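The core of the script is the double JSON.parse: the response body is JSON, and its general_msg_list field is itself a JSON string that holds the list. The sketch below isolates that step as a plain function so it can be tried outside the browser. The function name extractArticles is illustrative, and the sample body is made up to have the same shape the script expects, not taken from a real response.

```javascript
// Extract article records from the text of one article-list response body.
function extractArticles(bodyText) {
  var body = JSON.parse(bodyText);
  // general_msg_list is a JSON string inside the JSON body: parse it again.
  var list = JSON.parse(body.general_msg_list).list || [];
  return list
    .filter(function (r) { return r.app_msg_ext_info; })
    .map(function (r) {
      var target = r.app_msg_ext_info;
      return {
        author: target.author,
        content_url: target.content_url,
        cover: target.cover,
        digest: target.digest,
        title: target.title
      };
    });
}

// Made-up sample with one article entry and one entry without article info.
var sample = JSON.stringify({
  general_msg_list: JSON.stringify({
    list: [
      { app_msg_ext_info: { author: 'A', content_url: 'http://example.com/1',
                            cover: 'http://example.com/c.jpg', digest: 'd', title: 'T' } },
      { other_field: 1 }
    ]
  })
});

console.log(extractArticles(sample));
```

Entries without app_msg_ext_info are skipped, which mirrors the check in the script above.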

That is all for "How to crawl WeChat official account articles with Python". Thank you for reading! I hope the steps above are helpful.
