2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to use anyproxy to improve the efficiency of collecting WeChat official account articles. The method is simple, fast and practical; interested readers can follow along below.
The main causes of interrupted collection are:
1. A poor network environment
2. The WeChat client crashing on the phone or emulator
3. Other network transmission errors
I care a lot about the operating cost of the collection system, which includes hardware, computing power and labor. Every interruption requires manual intervention and raises the labor cost, so stable operation is essential. With that in mind, I made some further improvements to anyproxy and used a few other tools to keep it running reliably. The specific solutions follow:
I. Code upgrade
1) Fix the white screen in the WeChat browser
Solution: modify requestHandler.js, which is in the same directory as rule_default.js (on macOS: /usr/local/lib/node_modules/anyproxy/lib/; on Windows, as commenter cnbattle pointed out: C:\Users\Administrator\AppData\Roaming\npm\node_modules\anyproxy\lib).
In that file, find the function proxyReq.on("error", function (e) { and change its body as follows:
// userRes.end();   // comment out this line
userRes.end('<script>setTimeout(function () { window.location.reload(); }, 2000);</script>');   // insert this line
This way, when a proxy error occurs, the browser receives a small script that reloads the current page after 2 seconds, so collection continues instead of stalling on a blank page.
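To make the mechanism concrete, here is a minimal standalone sketch of the same idea outside anyproxy. The names `buildReloadScript` and `respondWithReload` are hypothetical, and the sketch assumes only that the client response object behaves like Node's `http.ServerResponse` (`headersSent`, `writeHead()`, `end()`). It wraps the reload call in a `<script>` tag so the browser executes it when it arrives as the page body.

```javascript
// Hypothetical helper: builds an HTML body whose only job is to reload
// the current page after `delayMs` milliseconds.
function buildReloadScript(delayMs) {
  return (
    "<script>setTimeout(function () { window.location.reload(); }, " +
    delayMs +
    ");</script>"
  );
}

// Hypothetical error handler: instead of ending the client response with an
// empty body (which leaves the WeChat browser on a white screen), send the
// reload snippet so the page retries on its own.
function respondWithReload(userRes, delayMs = 2000) {
  // Only write a status line if nothing has been sent to the client yet.
  if (!userRes.headersSent && typeof userRes.writeHead === "function") {
    userRes.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
  }
  userRes.end(buildReloadScript(delayMs));
}

// Illustrative wiring (assumption, mirroring the handler named above):
// proxyReq.on("error", () => respondWithReload(userRes));
```

Keeping the delay as a parameter makes it easy to back off further on slow networks without touching the handler itself.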
2) Replace all images to lighten the browser's load
First, create a very small image; I use a 1x1-pixel transparent PNG. Put it in any folder, then modify rule_default.js:
Add the following code among the var declarations at the top of the file:
var fs = require("fs"),
    img = fs.readFileSync("/Library/WebServer/Documents/space.png");   // replace this absolute path with your own
Find the function shouldUseLocalResponse: function (req, reqBody) { and insert this code inside it:
if (/mmbiz\.qpic\.cn/i.test(req.url)) {
    req.replaceLocalFile = true;
    return true;
} else {
    return false;
}
Next, find the function dealLocalResponse: function (req, reqBody, callback) { and insert this code inside it:
if (req.replaceLocalFile) {
    callback(200, { "content-type": "image/png" }, img);
}
Together, these three snippets replace every image on official account pages with the local placeholder, which cuts network traffic and browser memory use and noticeably improves throughput.
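The two rule hooks can also be kept together in one small module. The sketch below assumes the older anyproxy rule interface used in this article (`shouldUseLocalResponse` and `dealLocalResponse` with the signatures shown above); the factory name `createImageReplaceRule` is hypothetical, and the placeholder image is passed in as a Buffer rather than read from a hard-coded path.

```javascript
// Hypothetical factory (not part of anyproxy): packages the two rule hooks
// so the placeholder image is injected instead of read from a fixed path.
function createImageReplaceRule(img) {
  // Matches WeChat's image CDN host, as in the rule above.
  const imageHost = /mmbiz\.qpic\.cn/i;

  return {
    // Called for each request; returning true asks the proxy to answer locally.
    shouldUseLocalResponse(req, reqBody) {
      if (imageHost.test(req.url)) {
        req.replaceLocalFile = true;
        return true;
      }
      return false;
    },

    // Answers flagged requests with the tiny local PNG instead of the real image.
    dealLocalResponse(req, reqBody, callback) {
      if (req.replaceLocalFile) {
        callback(200, { "content-type": "image/png" }, img);
      }
    },
  };
}

// Usage inside a rule file (path is an assumption):
// const rule = createImageReplaceRule(require("fs").readFileSync("/path/to/space.png"));
```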
3) Block the phone or emulator from visiting useless, error-prone sites
Again in rule_default.js, find the function replaceRequestOption: function (req, option) { and insert this code inside it:
var newOption = option;
// The regex can list any URL fragments you do not want the device to visit, separated by |.
// btrace is a Tencent Video domain that, in practice, is especially likely to crash the browser.
if (/google|btrace/i.test(newOption.headers.host)) {
    newOption.hostname = "127.0.0.1";   // this IP can be replaced with another one
    newOption.port = "80";
}
return newOption;
This change was mentioned in a previous article; here it is explained in detail. It is broadly useful: different phones and emulators may request useless addresses that slow the device down, and this code blocks those requests.
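If the blocklist grows, it can help to build the regex from a plain list. The following is a minimal sketch under that assumption; `buildBlockPattern` and `makeReplaceRequestOption` are hypothetical names, and it keeps the same hook signature `(req, option)` and the same redirect to 127.0.0.1:80 as the snippet above. Unlike the original, it returns a copy rather than mutating the proxy's option object.

```javascript
// Hypothetical helper: turns a list of unwanted host fragments into one
// case-insensitive regex, escaping characters such as "." so they match literally.
function buildBlockPattern(fragments) {
  if (fragments.length === 0) return /(?!)/; // empty list: a pattern that never matches
  const escaped = fragments.map((f) => f.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(escaped.join("|"), "i");
}

// Sketch of a replaceRequestOption-style hook: requests whose Host header
// matches the blocklist are sent to a local address so they fail fast.
function makeReplaceRequestOption(fragments, sinkHost = "127.0.0.1", sinkPort = "80") {
  const blocked = buildBlockPattern(fragments);
  return function replaceRequestOption(req, option) {
    const host = (option.headers && option.headers.host) || "";
    if (!blocked.test(host)) {
      return option; // normal traffic is left untouched
    }
    // Copy instead of mutating the original option object.
    return Object.assign({}, option, { hostname: sinkHost, port: sinkPort });
  };
}

// Usage: replaceRequestOption: makeReplaceRequestOption(["google", "btrace"])
```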
II. Use pm2 to manage anyproxy processes
PM2 is a process manager for Node.js applications with built-in load balancing.
PM2 is a good fit when you want stand-alone code to use all the CPUs on your servers, keep its processes alive indefinitely, and reload with zero downtime. It suits IaaS setups well but should not be used for PaaS scenarios (a PaaS solution is to be developed later).
Main features:
Built-in load balancing (using the Node cluster module)
Runs in the background
Zero-downtime reload (as I understand it, the service stays up during maintenance and upgrades)
Startup scripts for Ubuntu and CentOS
Stops unstable processes (avoids infinite restart loops)
Console monitoring
HTTP API
Remote control and real-time interface API (a Node.js module for interacting with the PM2 process manager)
Tested with Node.js v0.11, v0.10 and v0.8; compatible with CoffeeScript; runs on Linux and macOS.
Install pm2 first:
sudo npm install -g pm2
Run anyproxy under pm2:
sudo pm2 start anyproxy -x -i
Now anyproxy is running in the pm2 environment.
The following pm2 commands help manage and monitor anyproxy:
# view the run log (the bracketed part is optional)
sudo pm2 logs anyproxy [--lines 10]
# stop anyproxy and remove it from pm2's process list
sudo pm2 delete anyproxy
# restart anyproxy
sudo pm2 restart anyproxy
# monitor memory consumption
sudo pm2 monit
# check running status
sudo pm2 list
Note: once anyproxy is running under pm2, you can close the terminal window.
The main reason for managing anyproxy with pm2 is that when anyproxy exits because of an error, pm2 restarts it automatically.
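In practice pm2 should be used as described. Purely to illustrate the restart-on-crash behavior it provides, here is a minimal sketch built on Node's `child_process.spawn`. It is not pm2 (no clustering, log files or boot integration); `supervise` and its option names are hypothetical, and the spawn and scheduling functions are injectable so the logic can be tested without launching anything.

```javascript
const { spawn } = require("child_process");

// Minimal sketch of a restart-on-exit supervisor (NOT pm2).
function supervise(command, args, options = {}) {
  const maxRestarts = options.maxRestarts ?? 10;          // give up after this many exits
  const restartDelayMs = options.restartDelayMs ?? 1000;  // pause before each restart
  const spawnFn = options.spawnFn || spawn;               // injectable for testing
  const schedule = options.schedule || setTimeout;        // injectable for testing
  const onRestart = options.onRestart || (() => {});

  let restarts = 0;
  let stopped = false;
  let current = null;

  function start() {
    current = spawnFn(command, args, { stdio: "inherit" });
    // Report spawn failures (e.g. command not found) instead of crashing the supervisor.
    current.on("error", (err) => console.error("failed to start:", err.message));
    // Restart whenever the child exits, whatever its exit code.
    current.on("exit", (code) => {
      if (stopped || restarts >= maxRestarts) return;
      restarts += 1;
      onRestart(restarts, code);
      schedule(start, restartDelayMs);
    });
  }

  start();

  return {
    // Stop supervising and terminate the current child process.
    stop() {
      stopped = true;
      if (current) current.kill();
    },
    get restarts() {
      return restarts;
    },
  };
}

// Example (assumes an `anyproxy` executable on PATH):
// const sup = supervise("anyproxy", []);
```

The restart cap plays the same role as pm2's "stop unstable processes" feature: a process that keeps dying immediately is not restarted forever.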
III. Remove the sudo password prompt and start pm2 at boot
The steps below are for macOS. Windows should have a similar method; if you know it, feel free to send me a private message.
1) First, remove the sudo password prompt
Run the command:
sudo visudo
Find the line:
%admin ALL = (ALL) ALL
Change it to:
%admin ALL = (ALL) NOPASSWD: ALL
This removes the sudo password prompt, so pm2 can then be added to the startup items.
2) Set up automatic startup
Enter the following commands in the terminal:
cd
touch autoexec.sh
vim autoexec.sh
This opens vim. Press the letter i to enter insert mode, then paste the following:
#!/bin/sh
sudo pm2 start anyproxy -x -i
sudo pm2 monit
When you are done, press Esc, then type :wq and press Enter to save and exit.
Then run:
chmod 755 autoexec.sh
This makes the script executable.
Next, open System Preferences on the Mac and go to Users & Groups. Select the current user on the left and open Login Items on the right. Click the + button, go to the current user's home directory (shortcut Shift+Command+H), select autoexec.sh and add it to the login items. The script will then run automatically each time you log in after startup.
After these changes, anyproxy is much more stable than before. In fact, most anyproxy errors were caused by an unstable emulator or phone. In my tests, anyproxy now runs for long periods without crashing. The WeChat client, however, still crashed after about 6 hours; at one page every 2 seconds, that is roughly 10,000 pages collected. If read counts are not collected, that can cover the history message pages of about 10,000 official accounts.
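A quick check of those figures: a session of about 6 hours at one page every 2 seconds gives 6 × 3600 / 2 = 10,800 pages, consistent with the rough 10,000 quoted. The helper name below is hypothetical.

```javascript
// Rough capacity of one uninterrupted WeChat session, using the figures above.
function pagesPerSession(hours, secondsPerPage) {
  return Math.floor((hours * 3600) / secondsPerPage);
}

console.log(pagesPerSession(6, 2)); // 10800
```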
When the WeChat client crashes, it exits the WeChat browser and stays on the official account page. To push automation further, you can write a Touch Wizard automation script that periodically relaunches the WeChat browser and taps into the history message page. That should make long-running, fully automatic collection possible.
© 2024 shulou.com SLNews company. All rights reserved.