2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains in detail how to investigate alarms in WeChat development. The editor finds it very practical and shares it here as a reference.
Overview
The WeChat public platform provides an alarm API. When the number of failed message pushes from the WeChat server to a developer reaches a predetermined threshold, an alarm message is sent to the specified WeChat alarm group (to set it up: public platform -> Developer Center -> API alarm). Developers should watch for these alarms, fix the failure immediately, and so improve the service quality of their WeChat official account.
To make it easier to troubleshoot from the example at the tail of an alarm message (an openid and a timestamp are provided), developers should write detailed logs containing this key information at each layer, such as the access layer and the logic layer, so that the problem can be located quickly.
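A minimal sketch of such logging is shown below. It writes one line per layer, keyed by the openid and timestamp, so that the example at the tail of an alarm can be searched for directly. The function names, the layer names and the field layout are assumptions made for illustration and are not defined by the WeChat platform.

```python
import logging
import time

logger = logging.getLogger("wechat.push")


def format_push_log(layer, openid, msg_type, timestamp, elapsed_ms):
    """Build one log line keyed by the fields an alarm example provides.

    `timestamp` is a Unix time in seconds; it is rendered in UTC here, so
    convert it if your alarms are reported in another time zone.
    """
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return (f"layer={layer} openid={openid} msg_type={msg_type} "
            f"time={when} elapsed_ms={elapsed_ms}")


def log_push(layer, openid, msg_type, timestamp, elapsed_ms):
    """Write the line at INFO level; call once in every layer that handles the push."""
    logger.info(format_push_log(layer, openid, msg_type, timestamp, elapsed_ms))
```

Calling log_push in both the access layer and the logic layer with the same openid makes it possible to follow one pushed message through the whole system.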
At present, there are two types of alarm:
1. General alarms, which all developers need to pay attention to.
2. Official account third-party platform alarms. Only developers who have applied to become an official account third-party platform on the WeChat Open Platform (open.weixin.qq.com) need to pay attention to these.
The following sections give examples of specific alarms and explain how to troubleshoot them.
Description of alarm content
Each alarm message contains the following fields:
A) appid: official account appid
B) nickname: official account nickname
C) time: the time the exception first occurred (for example, the time of the first timeout or the first failed response)
D) content: specific description of the error
E) times: the number of failures that occurred
F) error example: information that helps locate the problem, such as the developer's IP and the pushed message type for the first failure. For a failed response, the error example also contains the developer's response packet from the first failed response.
In general, the IP, time and message type provided by the alarm are enough to quickly locate the cause of a problem on the third party's side.
Alarm example 1: request timeout
Appid: wxxxxxx
Nickname: WxNickName
Time: 2014-12-01 20:12:00
Content: after the WeChat server pushed a message or event to the official account, the developer did not return within 5 seconds
Times: 1272 times in 5 minutes
Error example: [IP=203.205.140.29] [Event=UnSubscribe]
This alarm means that when the WeChat server pushed unfollow events to the developer, the developer did not return a result within 5 seconds. This happened 1272 times in the five minutes from 2014-12-01 20:12:00 to 2014-12-01 20:17:00. The first timeout in those 5 minutes occurred at 2014-12-01 20:12:00, the developer's IP was 203.205.140.29, and the event type was unfollow (UnSubscribe).
Alarm example 2: response failed
Appid: wxxxx
Nickname: WxNickName
Time: 2014-12-01 20:12:00
Content: after the WeChat server pushed a message or event to the official account, the response it received was invalid.
Times: 1320 times in 5 minutes
Error example: [Event=Click] [ip=58.248.9.218] [response_length=10] [response_content=Error 500:]
This alarm means that when the WeChat server pushed custom menu click events to the developer, the result returned by the developer was invalid. This happened 1320 times in the five minutes from 2014-12-01 20:12:00 to 2014-12-01 20:17:00. The first failed response in those 5 minutes occurred at 2014-12-01 20:12:00, the developer's IP was 58.248.9.218, the event type was a menu click event, and the content returned by the third party was 10 bytes long: "Error 500:".
Alarm example 3: connection timeout
Appid: wxxxx
Nickname: WxNickName
Time: 2015-02-04 20:13:09
Content: a timeout occurred when the WeChat server connected to the official account developer's server. The timeout period is 5 seconds.
Times: 7289 times in 5 minutes
Error example: [IP=180.150.190.135] [Msg=Text]
This alarm means that when the WeChat server pushed text messages sent by followers to the developer, it could not connect to the server address filled in by the developer. This happened 7289 times in the 5 minutes from 2015-02-04 20:13:09 to 2015-02-04 20:18:00. The first connection timeout in those 5 minutes occurred at 2015-02-04 20:13:09, the developer's IP was 180.150.190.135, and the message type was a text message sent by a user.
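The bracketed fields in the "Error example" line of the three alarms above can be extracted mechanically and then matched against your own logs. The sketch below assumes the field layout is exactly the "[key=value]" form shown in the examples. The function name is hypothetical, and a value that itself contains "]" would be cut short.

```python
import re

# Matches one "[key=value]" group; the value runs up to the closing bracket.
_FIELD = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_error_example(text):
    """Return the bracketed fields as a dict with lower-cased keys."""
    return {key.strip().lower(): value.strip() for key, value in _FIELD.findall(text)}
```

Lower-casing the keys hides the difference between "IP" and "ip", which both appear in the examples.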
Troubleshooting each type of alarm
1. DNS failure
This error means that DNS resolution failed when the WeChat server pushed a message to the developer. If you receive this alarm, please check:
A) whether the URL or the domain name was entered incorrectly
B) whether the domain name has changed, for example expired or been updated
If neither of these is the cause, please contact the WeChat public platform.
2. DNS timeout
This error does not currently occur.
3. Connection timeout
This error means that the WeChat server could not connect to the developer's server. The alarm message gives the time of the first failed connection and the IP it tried to connect to. If you receive this alarm, please check:
A) whether the IP is incorrect.
B) whether the IP machine is overloaded and has too many connections.
C) if the server is hosted by a third party, whether the host is malfunctioning.
D) whether the network operator is malfunctioning.
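A quick way to work through the checks above is to try opening a TCP connection to the IP from the alarm, using the same 5-second limit. This is only a sketch: the function name is hypothetical, and a check run from your own network does not reproduce the exact path taken by the WeChat server.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

For example, can_connect("180.150.190.135", 80) tests the address from alarm example 3. Port 80 is an assumption here, so use the port of your configured callback URL.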
4. Request timeout
The WeChat server pushed a message or event to the developer's server, but the developer did not return within 5 seconds. The alarm message gives the time of the first request timeout, the developer's IP and the message type. Please check:
A) whether the IP is incorrect
B) whether the machine at that IP received a request of the message type given in the alarm message
C) whether the request takes too long to process
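To answer check C) from your own logs, each handler can record how long it took. The decorator below is a sketch that logs a warning when a call uses most of the 5-second limit. The names and the 80% warning ratio are assumptions chosen for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger("wechat.timing")
PUSH_TIMEOUT_SECONDS = 5.0  # limit stated in the alarm content


def timed(warn_ratio=0.8):
    """Decorator that logs a warning when a handler uses most of the 5-second budget."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed >= PUSH_TIMEOUT_SECONDS * warn_ratio:
                    logger.warning("%s took %.3fs", func.__name__, elapsed)
        return wrapper
    return decorator
```

Applying @timed() to the function that handles a pushed message leaves its return value unchanged and only adds the timing log.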
5. Response failed
If the developer does not reply in the reply message format described in the wiki, or if a network error occurs, the response fails. The alarm message gives the time of the first failed response, the developer's IP, the message type and the content of the response. Please check:
A) whether the IP is incorrect
B) whether a network error occurred on that IP
C) whether the business logic failed to reply in the format specified by the wiki, or entered an exception path.
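Check C) can be partly automated by validating a reply body before it is sent. The sketch below assumes the reply is an XML document with an <xml> root containing ToUserName, FromUserName, CreateTime and MsgType, and that an empty body or the literal "success" is accepted as "no reply". These details are not stated in this article and should be confirmed against the wiki. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed required fields of a reply message; confirm against the official wiki.
REQUIRED_FIELDS = ("ToUserName", "FromUserName", "CreateTime", "MsgType")


def check_reply(body):
    """Return a list of problems found in a reply body; an empty list means it looks valid."""
    if body in ("", "success"):
        return []  # assumed to be accepted as "no reply"
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "xml":
        problems.append(f"root element is <{root.tag}>, expected <xml>")
    for name in REQUIRED_FIELDS:
        node = root.find(name)
        if node is None or not (node.text or "").strip():
            problems.append(f"missing or empty <{name}>")
    return problems
```

A body such as "Error 500:" from alarm example 2 is rejected at the parsing step, which is the kind of exception-path output this check is meant to catch.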
6. MarkFail (automatic blocking)
The WeChat backend counts each developer's failures in real time. When a large number of pushes to a developer fail, the WeChat server automatically blocks the developer, stops pushing any messages for 1 minute, and sends an alarm to the WeChat group. This is the highest-level alarm. When you receive it, fix the backend failure as soon as possible and restore service. Before this alarm is sent, the developer will always have received other alarms such as connection timeout, request timeout or response failure. Resolve those immediately to avoid being blocked by the WeChat server, which seriously affects the official account service.
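Because blocking follows a burst of failures, it can help to count failures on your own side and raise an internal alert first. The sliding-window counter below is a sketch. The class name, the 60-second window and the threshold of 100 are hypothetical values, and the threshold WeChat itself uses is not given here.

```python
import time
from collections import deque


class FailureWindow:
    """Counts failures seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60.0, threshold=100, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.threshold = threshold  # hypothetical limit; tune to your traffic
        self._clock = clock
        self._events = deque()

    def record_failure(self):
        """Remember the time of one failed or timed-out push."""
        self._events.append(self._clock())

    def count(self):
        """Drop events older than the window and return how many remain."""
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events)

    def should_alert(self):
        """True once the number of recent failures reaches the threshold."""
        return self.count() >= self.threshold
```

The clock is injectable so the window can be tested without waiting. This sketch is not thread-safe and would need a lock if several threads record failures.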
7. Push component_verify_ticket timeout & 8. Failed to push component_verify_ticket & 9. Push component message timeout & 10. Failed to push component message
These four alarms are received only by official account third-party platform developers. Other official account developers do not need to pay attention to them. Because a third-party platform serves many official accounts, its service quality is held to stricter requirements, so these four events are alarmed separately. Troubleshooting is the same as for items 4 and 5 above and is not repeated here. For how to apply for and develop an official account third-party platform, see the WeChat Open Platform (open.weixin.qq.com).
Common problems
1. How to troubleshoot DNS failures?
1. Ping the domain name in the URL configured on the public platform (MP) to make sure it resolves to the correct IP. If it does not resolve or returns an error, check the configuration in your domain name provider's management system.
2. If step 1 returns the correct IP but the DNS failure alarm persists, test with DNS server 182.254.116.116. On Linux: dig @182.254.116.116 <domain name>. On Windows: change the DNS server address in the network configuration, then ping the domain name. If the IP you get is incorrect or cannot be obtained, contact the WeChat team.
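Step 1 can also be scripted. The sketch below extracts the host name from a callback URL and resolves it with the system resolver. The function names are hypothetical. It does not query 182.254.116.116, so the dig command remains the way to test against that specific server.

```python
import socket
from urllib.parse import urlparse


def callback_host(url):
    """Extract the host name from the callback URL configured on the public platform."""
    return urlparse(url).hostname


def resolve_ipv4(host):
    """Return the sorted IPv4 addresses the system resolver gives for `host`.

    Raises socket.gaierror when the name cannot be resolved.
    """
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

Comparing the result of resolve_ipv4(callback_host(url)) with the IP of your server shows whether the domain points where you expect.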
2. How to solve connection timeouts?
1. Check whether there is a network problem.
(1) Use the public platform API to obtain the IPs of the WeChat callback servers: api.weixin.qq.com/cgi-bin/getcallbackip?access_token=ACCESS_TOKEN
(2) Run a ping test from your server to check the network quality between your server and the WeChat callback servers. If there is a network problem, contact your server provider to resolve it.
2. Check the access layer: the number of server connections, the load, the nginx configuration and the number of connections allowed. Check the nginx error log for "Connection reset by peer" or "Connection timed out" entries. If they are present, nginx is overloaded with connections.
3. It is recommended to set up probing tools that check the system heartbeat, and to monitor and alarm on system load, connection count, request count and processing time in real time.
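The callback-IP lookup in step 1 (1) can be sketched as follows. The https scheme, the "ip_list" and "errcode" fields of the JSON response, and all function names are assumptions not stated in this article, so verify them against the official API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

CALLBACK_IP_URL = "https://api.weixin.qq.com/cgi-bin/getcallbackip"


def build_callback_ip_url(access_token):
    """Build the request URL for the callback-IP API."""
    return CALLBACK_IP_URL + "?" + urllib.parse.urlencode({"access_token": access_token})


def parse_callback_ips(payload):
    """Extract the IP list from the JSON response; raise if the API reported an error."""
    data = json.loads(payload)
    if data.get("errcode"):
        raise RuntimeError(f"getcallbackip failed: {data}")
    return data.get("ip_list", [])


def fetch_callback_ips(access_token, timeout=5.0):
    """Call the API and return the list of callback server IPs."""
    with urllib.request.urlopen(build_callback_ip_url(access_token), timeout=timeout) as resp:
        return parse_callback_ips(resp.read().decode("utf-8"))
```

The returned addresses are the ones to ping from your server in step 1 (2).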
For nginx configuration, see the official documentation at nginx.org/en/docs/, focusing on connection-count and log settings. Some important nginx configuration examples are as follows:
    worker_processes 16;                 # number of CPU cores
    error_log logs/error.log info;       # error log
    worker_rlimit_nofile 102400;         # maximum number of open file handles
    events {
        worker_connections 102400;       # maximum number of connections allowed
    }
    # Request logging. Key fields: request_time (total time of the request),
    # upstream_response_time (backend processing time).
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" "$host" "$cookie_ssl_edition" '
                    '"$upstream_addr" "$upstream_status" "$request_time" '
                    '"$upstream_response_time"';
    access_log logs/access.log main;
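The nginx error log can be scanned for the two messages named earlier, "Connection reset by peer" and "Connection timed out". The sketch below counts lines that contain each message. The function name is hypothetical, and it makes no assumption about the rest of the log line format.

```python
from collections import Counter

PATTERNS = ("Connection reset by peer", "Connection timed out")


def count_connection_errors(lines):
    """Count error-log lines that contain each of the two messages."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return dict(counts)
```

An open file can be passed directly, for example by opening logs/error.log and handing the file object to count_connection_errors.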
3. How to solve request timeouts?
Each module needs complete logs that show how much time each request spent in that module. Combined with the information in the WeChat alarm, this makes it easy to locate which server has the problem. Common causes are:
1) machine load is too high, so processing time increases
2) an exception during processing causes the message to be lost
3) machine failure
For processing exceptions, fix the bug as soon as possible. For machine failures, take the faulty machine out of service as soon as possible. For excessive machine load, two feasible options are:
Option 1: optimize performance and expand capacity. Check the load (CPU, memory, IO, network; see appendix for details) and apply different optimizations according to the specific performance bottleneck.
Option 2: asynchronous processing. If a message pushed by the WeChat server cannot be processed in real time, store the message first and return success to the WeChat server, then let the backend process the message later. If you need to reply to the user, call the customer service message API afterwards to send the reply.
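Asynchronous processing can be sketched with an in-process queue and a background worker. The function names and the literal "success" acknowledgement body are assumptions made for illustration. A real deployment would more likely store messages durably (for example in a database or message queue), because an in-memory queue is lost when the process restarts.

```python
import logging
import queue
import threading

logger = logging.getLogger("wechat.async")
pending = queue.Queue()


def handle_push(raw_message):
    """Store the pushed message and acknowledge at once, without doing the slow work."""
    pending.put(raw_message)
    return "success"  # assumed acknowledgement body; confirm against the wiki


def worker(process, stop_event):
    """Drain the queue in the background, calling `process` for each stored message."""
    while not stop_event.is_set() or not pending.empty():
        try:
            message = pending.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(message)
        except Exception:
            # Keep the worker alive; the failed message is only logged here.
            logger.exception("processing failed")
        finally:
            pending.task_done()
```

The worker is started with threading.Thread(target=worker, args=(process, stop_event)). Any reply to the user would then be sent from process through the customer service message API instead of in the HTTP response.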
4. How to store and use access_token?
Third parties often report service interruptions caused by access_token. Troubleshooting by the public platform shows that most of these third parties refresh access_token far too frequently, exceeding the API frequency limit so that access_token becomes invalid. A relatively simple scheme for storing and using access_token is:
1) A central control server calls the WeChat API periodically (every 1 hour is recommended) to refresh access_token, and stores the new access_token in MySQL (or other storage).
2) The other working servers read access_token from MySQL (or other storage) whenever they call the WeChat API, and may cache it in memory for a short period (1 minute is recommended).
The public platform guarantees that the old access_token remains usable for 5 minutes after a refresh, so that third-party calls to the WeChat API do not fail while access_token is being updated.
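The worker-side half of this scheme can be sketched as a small cache that re-reads the token from shared storage at most once per minute. The class name and the load_token callable (standing in for the MySQL read) are hypothetical. The hourly refresh on the central control server is not shown.

```python
import time


class TokenCache:
    """Worker-side cache: re-read the token from shared storage at most once per `ttl` seconds."""

    def __init__(self, load_token, ttl=60.0, clock=time.monotonic):
        self._load_token = load_token  # hypothetical callable that reads from MySQL or other storage
        self._ttl = ttl
        self._clock = clock
        self._token = None
        self._loaded_at = None

    def get(self):
        """Return the cached token, reloading it once the cache entry is `ttl` seconds old."""
        now = self._clock()
        if self._token is None or now - self._loaded_at >= self._ttl:
            self._token = self._load_token()
            self._loaded_at = now
        return self._token
```

A 1-minute cache stays well inside the 5-minute period during which the old access_token remains valid, so a worker should pick up the new token before the old one stops working. The sketch is not thread-safe, so guard get with a lock if several threads share one cache.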