Updated: 2025-04-03. From: SLTechnology News&Howtos
This article explains how to implement monitoring and alerting with a Serverless architecture. It walks through the analysis and working examples in detail, in the hope of giving readers who face this problem a simpler and more practical approach.
In production we often need monitoring scripts that check the availability of web services or API services. The traditional approach is to use a website monitoring platform (such as DNSPod monitoring, 360 web service monitoring, or Aliyun monitoring). With these platforms, the user sets the address of the service to monitor and a time threshold, and the platform sends requests on a regular schedule to judge whether the website or service is available.
These platforms are popular and general-purpose, but they do not fit every scenario. For example, suppose we need to monitor a website's status code and its latency from different regions, derive a threshold from the collected data, and raise an alert once the threshold is exceeded. Most monitoring platforms cannot meet these requirements, so we need to develop a custom monitoring tool.
Operations, monitoring, and alerting are an important application scenario for Serverless services. This article therefore deploys a website status monitoring script on an existing Serverless platform to monitor the availability of a target website and raise alerts.
Web service monitoring and alerting
For a web service, we first design a simple monitoring and alerting flow. In this flow we monitor only the status code of the website: if the returned status is 200, we consider the website to be working normally; otherwise we raise an alert:
    # -*- coding: utf8 -*-
    import ssl
    import json
    import smtplib
    import urllib.error
    import urllib.request
    from email.mime.text import MIMEText
    from email.header import Header

    # Skip HTTPS certificate verification for the monitored site
    ssl._create_default_https_context = ssl._create_unverified_context


    def sendEmail(content, to_user):
        # Send an HTML alert email through the SMTP server over SSL
        sender = 'service@anycodes.cn'
        receivers = [to_user]
        mail_msg = content
        message = MIMEText(mail_msg, 'html', 'utf-8')
        message['From'] = Header("Website Monitoring", 'utf-8')
        message['To'] = Header("webmaster", 'utf-8')
        subject = "Website monitoring alarm"
        message['Subject'] = Header(subject, 'utf-8')
        try:
            smtpObj = smtplib.SMTP_SSL("smtp.exmail.qq.com", 465)
            smtpObj.login('email address', 'password')
            smtpObj.sendmail(sender, receivers, message.as_string())
        except smtplib.SMTPException as e:
            print(e)


    def getStatusCode(url):
        # urlopen raises HTTPError for 4xx/5xx responses and URLError when the
        # site is unreachable, so both are caught and reported as "not 200"
        try:
            return urllib.request.urlopen(url).getcode()
        except urllib.error.HTTPError as e:
            return e.code
        except urllib.error.URLError:
            return None


    def main_handler(event, context):
        url = "http://www.anycodes.cn"
        if getStatusCode(url) == 200:
            print("Your website %s is accessible!" % (url))
        else:
            sendEmail("Your website %s is not accessible!" % (url), "recipient email address")
        return None
The function can be deployed with Serverless Framework, and a timer trigger can be added at deployment time:
    MyWebMonitor:
      component: "@serverless/tencent-scf"
      inputs:
        name: MyWebMonitor
        codeUri: ./code
        handler: index.main_handler
        runtime: Python3.6
        region: ap-guangzhou
        description: Website monitoring
        memorySize: 64
        timeout: 20
        events:
          - timer:
              name: timer
              parameters:
                cronExpression: '*/5 * * * *'
                enable: true
Here, timer declares a timer trigger and cronExpression is its Cron expression:
When creating a timer trigger, you specify when it fires with a standard Cron expression. Timer triggers now support second-level granularity; to stay compatible with older timer triggers, Cron expressions can be written in two ways.
Cron expression syntax 1 (recommended)
The Cron expression has seven required fields, separated by spaces. In order, they are: second, minute, hour, day of month, month, day of week, and year.
Each field has its own range of allowed values.
Cron expression syntax 2 (not recommended)
The Cron expression has five required fields, separated by spaces. In order, they are: minute, hour, day of month, month, and day of week. Each field has its own range of allowed values.
Wildcard character
Notes
When both the day-of-month and day-of-week fields of a Cron expression specify values, they are combined with OR: the trigger fires when either condition is met.
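The OR rule can be illustrated with a small check. This is a minimal sketch, not the platform's scheduler: the function name `day_or_weekday_matches` is hypothetical, and the weekday numbering follows Python's `date.weekday()` (0 = Monday), which is an assumption and may differ from the numbering the platform uses.

```python
import datetime


def day_or_weekday_matches(date, days_of_month, weekdays):
    """Return True when the date satisfies EITHER restriction (OR semantics).

    days_of_month: set of ints in 1..31
    weekdays: set of ints using Python's convention, 0 = Monday .. 6 = Sunday
    """
    return date.day in days_of_month or date.weekday() in weekdays


# 2024-05-15 is the 15th of the month and a Wednesday (weekday() == 2).
sample = datetime.date(2024, 5, 15)
print(day_or_weekday_matches(sample, {1}, {2}))   # weekday matches, day does not
print(day_or_weekday_matches(sample, {1}, {0}))   # neither matches
```

With AND semantics the first call would not match, which is why a schedule that sets both fields fires more often than one might expect.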
Examples
    */5 * * * * * *          triggered every 5 seconds
    0 0 2 1 * * *            triggered at 2:00 a.m. on the 1st of each month
    0 15 10 * * MON-FRI *    triggered at 10:15 every day from Monday to Friday
    0 0 10,14,16 * * * *     triggered every day at 10:00, 14:00, and 16:00
    0 */30 9-17 * * * *      triggered every half hour from 9:00 to 17:00 every day
    0 0 12 * * WED *         triggered every Wednesday at 12:00
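To make the two syntaxes easier to compare, an expression can be split into named fields. The sketch below only splits and labels; it does not validate ranges or compute trigger times. The helper name `split_cron` and the field names are assumptions for illustration (the order second, minute, hour, day, month, weekday, year is inferred from the examples above).

```python
# Field names are illustrative; the order is inferred from the examples.
SEVEN_FIELDS = ("second", "minute", "hour", "day", "month", "weekday", "year")
FIVE_FIELDS = ("minute", "hour", "day", "month", "weekday")


def split_cron(expression):
    """Split a 7-field or 5-field Cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) == 7:
        names = SEVEN_FIELDS
    elif len(parts) == 5:
        names = FIVE_FIELDS
    else:
        raise ValueError("expected 5 or 7 fields, got %d" % len(parts))
    return dict(zip(names, parts))


fields = split_cron("0 15 10 * * MON-FRI *")
print(fields["hour"], fields["minute"], fields["weekday"])
```

Labeling the fields this way makes it obvious that the same text, such as `*/5` in the first position, means seconds in the seven-field form and minutes in the five-field form.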
The code above is therefore run periodically by the timer trigger. A seven-field expression such as */5 * * * * * * fires every 5 seconds, and you can adjust the interval to match how closely the website needs to be monitored. When the website becomes unavailable, an alert email is sent.
This kind of website monitoring is relatively simple, and its accuracy can be a problem. When monitoring a website or service, you cannot look only at the returned status code. Connection time, download time, and the latency seen from different regions and carriers also matter.
So we need to extend and optimize this code:
Capture the requests of an online speed test website to learn the request characteristics for different regions and carriers.
Write a crawler that wraps this as an online speed test module.
Integrate the module into the project above.
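Before turning to a third-party speed test site, the timing idea itself can be sketched from the function's own vantage point. This is a minimal sketch under stated assumptions: `timed_fetch` is a hypothetical helper, the `opener` and `clock` parameters exist only so the logic can be exercised without a network, and "connect_ms" here is really the time until `urlopen` returns (response headers received), not a pure TCP connect time. It measures from a single location only, which is exactly the limitation the multi-node approach addresses.

```python
import time
import urllib.request


def timed_fetch(url, opener=urllib.request.urlopen, clock=time.perf_counter, timeout=10):
    """Fetch a URL and report rough timing figures in milliseconds."""
    start = clock()
    response = opener(url, timeout=timeout)   # returns once headers have arrived
    connected = clock()
    body = response.read()                    # download the response body
    finished = clock()
    return {
        "status": response.getcode(),
        "connect_ms": (connected - start) * 1000,
        "download_ms": (finished - connected) * 1000,
        "total_ms": (finished - start) * 1000,
        "size": len(body),
    }
```

In a function handler, `timed_fetch("http://www.anycodes.cn")` could replace the plain status-code check, with the timing values compared against thresholds of your choosing.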
The following uses the website speed test tool of a Chinese webmaster tools site as an example; the relevant information is obtained by inspecting its web page.
We wrap the website speed test tool as follows.
Analyzing the page gives the request characteristics, including the URL, form data, headers, and other relevant information. The site selects which monitoring point sends the request through the guid parameter in the form data. For example, the guids of some monitoring points are:
    Guangdong Foshan [Telecom]      f403cdf2-27f8-4ccd-8f22-6f5a28a01309
    Jiangsu Suqian [Multiline]      74cb6a5c-b044-49d0-abee-bf42beb6ae05
    Jiangsu Changzhou [Mobile]      5074fb13-4ab9-4f0a-87d9-f8ae51cb81c5
    Zhejiang Jiaxing [Unicom]       ddfeba9f-a432-4b9a-b0a9-ef76e9499558
At this point we can write basic crawler code and do a first pass at parsing the response. The following code uses the node 62a55a0e-387e-4d87-bf69-5e0c9dd6b983, Jiangsu Suqian [Telecom], as an example:
    import urllib.request
    import urllib.parse

    url = "* address of a speed testing website *"

    # Form fields captured from the speed test page; guid selects the node
    form_data = {
        'guid': '62a55a0e-387e-4d87-bf69-5e0c9dd6b983',
        'host': 'anycodes.cn',
        'ishost': '1',
        'encode': 'ECvBP9vjbuXRi0CVhnXAbufDNPDryYzO',
        'checktype': '1',
    }

    headers = {
        'Host': 'tool.chinaz.com',
        'Origin': '* address of a speed testing website *',
        'Referer': '* address of a speed testing website *',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }

    # POST the form and print the raw response text
    print(urllib.request.urlopen(
        urllib.request.Request(
            url=url,
            data=urllib.parse.urlencode(form_data).encode('utf-8'),
            headers=headers
        )
    ).read().decode("utf-8"))
This returns:
    ({state:1,msg:'',result:{ip:'119.28.190.46',httpstate:200,alltime:'212',dnstime:'18',conntime:'116',downtime:'78',filesize:'-',downspeed:'4.72',ipaddress:'Singapore',headers:'HTTP/1.1 OK<br>Server:',pagehtml:''}})
From this result we can extract some data, such as the basic timings for Jiangsu Suqian [Telecom] visiting the target website:
    Total time:       alltime:'212'
    Connection time:  conntime:'116'
    Download time:    downtime:'78'
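The extraction step can be sketched as a small parser. The response is JavaScript-style text rather than valid JSON (keys are unquoted), so this sketch uses a regular expression instead of `json.loads`. The helper name `parse_speed_result` and the output key names are hypothetical; the input field names (`alltime`, `conntime`, `downtime`) come from the response shown above.

```python
import re

# Matches key:'value' pairs such as alltime:'212'
FIELD_PATTERN = re.compile(r"(\w+):\s*'([^']*)'")


def parse_speed_result(text):
    """Extract the timing fields (in milliseconds) from the raw response text."""
    fields = dict(FIELD_PATTERN.findall(text))
    return {
        "total_ms": int(fields["alltime"]),
        "connect_ms": int(fields["conntime"]),
        "download_ms": int(fields["downtime"]),
    }


sample = ("({state:1,msg:'',result:{ip:'119.28.190.46',httpstate:200,"
          "alltime:'212',dnstime:'18',conntime:'116',downtime:'78'}})")
print(parse_speed_result(sample))
```

A missing field raises `KeyError` and a non-numeric value raises `ValueError`, which the caller can treat as an abnormal node result.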
At this point we can modify the code to test more nodes:
    Jiangsu Suqian [Telecom]      Total time: 223    Connection time: 121    Download time: 81
    Guangdong Foshan [Telecom]    Total time: 44     Connection time: 27     Download time: 17
    Huizhou [Telecom]             Total time: 56     Connection time: 34     Download time: 22
    Guangdong Shenzhen [Telecom]  Total time: 149    Connection time: 36     Download time: 25
    Zhejiang Huzhou [Telecom]     Total time: 3115   Download time: 75
    Liaoning Dalian [Telecom]     Total time: 458    Connection time: 255    Download time: 170
    Jiangsu Taizhou [Telecom]     Total time: 104    Connection time: 104
    Anhui Hefei [Telecom]         Total time: 3115   Connection time: 110    Download time: 73
Then update index.py in the project accordingly:
    # -*- coding: utf8 -*-
    import ssl
    import json
    import re
    import socket
    import smtplib
    import urllib.parse
    import urllib.request
    from email.mime.text import MIMEText
    from email.header import Header

    # Global socket timeout in seconds. The value is missing in the source;
    # 5 is a placeholder, adjust it to your needs.
    socket.setdefaulttimeout(5)
    ssl._create_default_https_context = ssl._create_unverified_context


    def getWebTime():
        final_list = []
        final_status = True

        # One node per line: guid, a space, then the node name
        total_list = '''62a55a0e-387e-4d87-bf69-5e0c9dd6b983 Jiangsu Suqian [Telecom]
        f403cdf2-27f8-4ccd-8f22-6f5a28a01309 Guangdong Foshan [Telecom]
        5bea1430-f7c2-4146-88f4-17a7dc73a953 Henan Xinxiang [Multiline]
        1f430ff0-eae9-413a-af2a-1c2a8986cff0 Henan Xinxiang [Multiline]
        ea551b59-2609-4ab4-89bc-14b2080f501a Henan Xinxiang [Multiline]
        2805fa9f-05ea-46bc-8ac0-1769b782bf52 Heilongjiang Harbin [Unicom]
        722e28ca-dd02-4ccd-a134-f9d4218505a5 Guangdong Shenzhen [Mobile]
        8e7a403c-d998-4efa-b3d1-b67c0dfabc41 Guangdong Shenzhen [Mobile]'''

        url = "* the address of a speed measurement website *"
        for eve in total_list.split('\n'):
            # Split only on the first space, because node names contain spaces
            id_data, node_name = eve.strip().split(" ", 1)
            form_data = {
                'guid': id_data,
                'host': 'anycodes.cn',
                'ishost': '1',
                'encode': 'ECvBP9vjbuXRi0CVhnXAbufDNPDryYzO',
                'checktype': '1',
            }
            headers = {
                'Host': '* a speed measurement website address *',
                'Origin': '* a speed measurement website address *',
                'Referer': '* a speed measurement website address *',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36',
                'X-Requested-With': 'XMLHttpRequest'
            }
            try:
                result_data = urllib.request.urlopen(
                    urllib.request.Request(
                        url=url,
                        data=urllib.parse.urlencode(form_data).encode('utf-8'),
                        headers=headers
                    )
                ).read().decode("utf-8")
                try:
                    # Pull the timing fields out of the JavaScript-style response
                    alltime = re.findall("alltime:'(.*?)'", result_data)[0]
                    conntime = re.findall("conntime:'(.*?)'", result_data)[0]
                    downtime = re.findall("downtime:'(.*?)'", result_data)[0]
                    final_string = "%s\tTotal time: %s\tConnection time: %s\tDownload time: %s" % (
                        node_name, alltime, conntime, downtime)
                except Exception:
                    # The response did not contain the expected fields
                    final_string = "%s connection exception!" % (node_name)
                    final_status = False
            except Exception:
                # The request itself failed or timed out
                final_string = "%s connection timed out!" % (node_name)
                final_status = False
            final_list.append(final_string)
            print(final_string)
        return (final_status, final_list)


    def sendEmail(content, to_user):
        sender = 'service@anycodes.cn'
        receivers = [to_user]
        mail_msg = content
        message = MIMEText(mail_msg, 'html', 'utf-8')
        message['From'] = Header("Website Monitoring", 'utf-8')
        message['To'] = Header("webmaster", 'utf-8')
        subject = "Website monitoring alarm"
        message['Subject'] = Header(subject, 'utf-8')
        try:
            smtpObj = smtplib.SMTP_SSL("smtp.exmail.qq.com", 465)
            smtpObj.login('service@anycodes.cn', 'password')
            smtpObj.sendmail(sender, receivers, message.as_string())
        except smtplib.SMTPException:
            pass


    def getStatusCode(url):
        return urllib.request.urlopen(url).getcode()


    def main_handler(event, context):
        url = "http://www.anycodes.cn"
        final_status, final_list = getWebTime()
        if not final_status:
            # The mail body is HTML, so node results are joined with <br>
            sendEmail("Status of your site %s:<br>%s" % (url, "<br>".join(final_list)), "service@52exe.cn")
Since this article is for learning purposes, the node list is trimmed to only a few entries. After deployment, the function reports the timing results for each node.
In real production use, the sensitivity of the alert and the frequency of monitoring can be adjusted to your own needs.
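One common way to tune sensitivity is to alert only after several consecutive bad probes rather than on a single slow response. The sketch below is one possible policy, not something the original project implements: `should_alert`, `max_total_ms`, and `min_consecutive` are hypothetical names, and the probe history is assumed to be kept somewhere outside the function (a cloud function is stateless between invocations, so it would need external storage).

```python
def should_alert(history, max_total_ms=1000, min_consecutive=3):
    """Decide whether to alert based on recent probe results.

    history: total times in milliseconds, oldest first; None marks a failed probe.
    Alerts only when the last `min_consecutive` probes all failed or were too slow.
    """
    recent = history[-min_consecutive:]
    if len(recent) < min_consecutive:
        return False  # not enough data yet
    return all(t is None or t > max_total_ms for t in recent)


print(should_alert([200, None, 1500, 3000]))  # last three probes are all bad
print(should_alert([None, None, 200]))        # most recent probe recovered
```

Raising `min_consecutive` makes the alert less sensitive to brief glitches; lowering `max_total_ms` makes it stricter about latency.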
Cloud service monitoring and alerting
In the previous section we monitored the status and health of a website and raised alerts. In real production operations we also need to monitor services: for example, node health when using Hadoop and Spark, multi-dimensional indicators such as the API gateway and etcd when using K8S, and data backlog, Topic, Consumer, and other indicators when using Kafka.
However, these services cannot be judged by a simple URL and a few status values. The traditional operations practice is to set up a scheduled task on an additional machine that monitors the related services out of band. In this section we use Serverless technology to monitor cloud products and raise alerts.
When using Kafka on the cloud, we usually need to watch the data backlog. If the Consumer cluster goes down, or if consumption capacity drops suddenly and data piles up, the service may be affected in unpredictable ways. Monitoring the Kafka data backlog and alerting on it is therefore particularly important.
This section takes monitoring Tencent Cloud's Ckafka as an example and combines several cloud products (including Cloud Monitoring, Ckafka, Cloud API, and Cloud SMS) to implement SMS, email, and WeCom alerts.
First, design a simple flow.
Before starting the project, we need to prepare some basic modules.
Kafka data backlog acquisition module:
    import binascii
    import hashlib
    import hmac
    import json
    import random
    import sys
    import time
    import urllib.parse
    import urllib.request


    def GetSignature(param):
        # Common parameters
        param["SecretId"] = ""
        param["Timestamp"] = int(time.time())
        param["Nonce"] = random.randint(1, sys.maxsize)
        param["Region"] = "ap-guangzhou"
        # param["SignatureMethod"] = "HmacSHA256"

        # Build the string to sign: method + host + path + sorted parameters
        sign_str = "GETckafka.api.qcloud.com/v2/index.php?"
        sign_str += "&".join("%s=%s" % (k, param[k]) for k in sorted(param))

        # Generate the signature (HMAC-SHA1, base64-encoded)
        secret_key = ""
        if sys.version_info[0] > 2:
            sign_str = bytes(sign_str, "utf-8")
            secret_key = bytes(secret_key, "utf-8")
        hashed = hmac.new(secret_key, sign_str, hashlib.sha1)
        signature = binascii.b2a_base64(hashed.digest())[:-1]
        if sys.version_info[0] > 2:
            signature = signature.decode()

        # URL-encode the signature string
        signature = urllib.parse.quote(signature)
        return signature


    def GetGroupOffsets(max_lag, phoneList):
        param = {}
        param["Action"] = "GetGroupOffsets"
        param["instanceId"] = ""
        param["group"] = ""
        signature = GetSignature(param)

        # Build the request URL; param already contains Action
        param["Signature"] = signature
        url = "https://ckafka.api.qcloud.com/v2/index.php?"
        url += "&".join("%s=%s" % (k, param[k]) for k in sorted(param))
        req_attr = urllib.request.urlopen(url)
        res_data = req_attr.read().decode("utf-8")
        json_data = json.loads(res_data)

        # Collect every topic whose total lag across partitions exceeds max_lag
        result_list = []
        for eve_topic in json_data['data']['topicList']:
            temp_lag = 0
            for eve_partition in eve_topic["partitions"]:
                lag = eve_partition["lag"]
                temp_lag = temp_lag + lag
            if temp_lag > max_lag:
                result_list.append({"topic": eve_topic["topic"], "lag": temp_lag})
        print(result_list)
        if len(result_list) > 0:
            KafkaLagRobot(result_list)
            KafkaLagSMS(result_list, phoneList)
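The backlog check itself can be isolated from the API call, which makes it easy to try against a sample response. This is a minimal sketch: the helper name `topics_over_lag` is hypothetical, and the sample data is made up. Only the response shape (`data` → `topicList` → `partitions` → `lag`) is taken from the code above.

```python
def topics_over_lag(response, max_lag):
    """Return the topics whose lag, summed over all partitions, exceeds max_lag."""
    alerts = []
    for topic in response["data"]["topicList"]:
        total = sum(partition["lag"] for partition in topic["partitions"])
        if total > max_lag:
            alerts.append({"topic": topic["topic"], "lag": total})
    return alerts


# Made-up sample in the same shape as the GetGroupOffsets response
sample_response = {
    "data": {
        "topicList": [
            {"topic": "orders", "partitions": [{"lag": 70}, {"lag": 50}]},
            {"topic": "logs", "partitions": [{"lag": 3}, {"lag": 4}]},
        ]
    }
}
print(topics_over_lag(sample_response, max_lag=100))
```

Keeping this step as a pure function means the threshold logic can be checked without credentials or network access.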
WeCom robot module:
    def KafkaLagRobot(content):
        # url is the webhook address of the WeCom robot
        url = ""
        data = {
            "msgtype": "markdown",
            "markdown": {
                "content": content,
            }
        }
        data = json.dumps(data).encode("utf-8")
        req_attr = urllib.request.Request(url, data)
        resp_attr = urllib.request.urlopen(req_attr)
        return_msg = resp_attr.read().decode("utf-8")
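The robot expects the markdown content to be text, while the backlog module produces a list of dictionaries. One way to bridge the two is to format the list before sending. The sketch below only builds the request body and sends nothing; `build_robot_payload` and the message wording are hypothetical, and the payload structure (`msgtype`, `markdown`, `content`) follows the module above.

```python
import json


def build_robot_payload(alerts):
    """Format lag alerts as markdown text and return the encoded request body."""
    lines = ["Kafka lag alarm"]
    for alert in alerts:
        lines.append("- %s: lag %d" % (alert["topic"], alert["lag"]))
    body = {
        "msgtype": "markdown",
        "markdown": {"content": "\n".join(lines)},
    }
    return json.dumps(body).encode("utf-8")


payload = build_robot_payload([{"topic": "orders", "lag": 120}])
print(payload.decode("utf-8"))
```

The returned bytes can be passed as the `data` argument of `urllib.request.Request`, in the same way the module above does.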
Tencent Cloud SMS module:
    def KafkaLagSMS(infor, phone_list):
        # url is the SMS API address
        url = ""
        strMobile = phone_list
        strAppKey = ""
        strRand = str(random.randint(1, sys.maxsize))
        strTime = int(time.time())

        # Signature: SHA-256 over the app key, random number, time, and mobiles
        strSign = "appkey=%s&random=%s&time=%s&mobile=%s" % (
            strAppKey, strRand, strTime, ",".join(strMobile))
        sig = hashlib.sha256()
        sig.update(strSign.encode("utf-8"))

        phone_dict = []
        for eve_phone in phone_list:
            phone_dict.append({"mobile": eve_phone, "nationcode": "86"})

        data = {
            "ext": "",
            "extend": "",
            "params": [infor, ],
            "sig": sig.hexdigest(),
            "sign": "your sign",
            "tel": phone_dict,
            "time": strTime,
            "tpl_id": 0,  # replace with your template id
        }
        data = json.dumps(data).encode("utf-8")
        req_attr = urllib.request.Request(url=url, data=data)
        resp_attr = urllib.request.urlopen(req_attr)
        return_msg = resp_attr.read().decode("utf-8")
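The signature step of the SMS module can be pulled out and checked on its own with fixed inputs. This is a sketch of the string layout used in the module above, assuming the mobile numbers are joined with commas; `sms_signature` is a hypothetical helper name and the sample values are made up.

```python
import hashlib


def sms_signature(app_key, rand, timestamp, mobiles):
    """Return the SHA-256 hex digest over the appkey/random/time/mobile string."""
    raw = "appkey=%s&random=%s&time=%s&mobile=%s" % (
        app_key, rand, timestamp, ",".join(mobiles))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Made-up values, only to show the call shape
print(sms_signature("demo-key", "7226249334", 1457336869, ["13800000000", "13900000000"]))
```

Because the digest depends on the timestamp and random number, the same values must also be sent in the request body so the server can recompute it.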
Email alert module:
    import logging
    import smtplib
    from email.mime.text import MIMEText
    from email.header import Header


    def sendEmail(content, to_user):
        sender = 'service@anycodes.cn'
        message = MIMEText(content, 'html', 'utf-8')
        message['From'] = Header("Monitoring", 'utf-8')
        message['To'] = Header("webmaster", 'utf-8')
        message['Subject'] = Header("alarm", 'utf-8')
        try:
            smtpObj = smtplib.SMTP_SSL("smtp.exmail.qq.com", 465)
            smtpObj.login('service@anycodes.cn', 'password')
            smtpObj.sendmail(sender, [to_user], message.as_string())
        except smtplib.SMTPException as e:
            logging.debug(e)
Once the modules are written, deploy the project in the same way as above. After a successful deployment, testing shows that the features work:
SMS alarm style:
WeCom alarm style:
A website monitoring program is only a very basic scenario. I hope you will combine Serverless technology with more monitoring and alerting tasks, such as monitoring the load on your own MySQL instance or the metrics of your existing servers. Monitoring these indicators and alerting on them not only lets administrators discover potential risks to the service in time, but also makes it possible to automate project operations through automated workflows.
That concludes this look at how to implement monitoring and alerting with a Serverless architecture. I hope the content above is of some help to you.