A must-read for operations: six principles for avoiding failures and not taking the blame!

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

As we all know, failures are the never-ending pain of operations staff! I believe every operations engineer's KPIs include one item: availability.

High availability means avoiding failures. Companies set different availability targets and grade incidents differently, but the ways to avoid failures are the same.

So how should operations staff avoid failures? Here is a brief list:

01. Make every change reversible, and test both the change and its rollback in an identical environment

Every change must have a rollback plan, and both the change and the rollback must be tested in an identical environment. Anything you have never rehearsed will hit you somewhere you least expect. Years of operations experience tell us that changes which were never rehearsed are the ones most likely to go wrong.

So every change must be reversible: for each step, consider what could go wrong and how to roll back to the original state. Good operators stay away from operations that have no rollback plan. In a sense, operations is a discipline built on experience, and on trial and error.
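As a minimal sketch of the idea that every step should carry its own way back, the Python below pairs each step of a change with a rollback action and, if any step fails, undoes the already completed steps in reverse order. The `Step` class, `run_change` function and the configuration values are hypothetical names for illustration; real change tooling would also log every action and deal with a rollback that itself fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of a change, paired with the action that undoes it."""
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def run_change(steps: List[Step]) -> bool:
    """Apply steps in order; on failure, roll back completed steps in reverse.

    Returns True if every step succeeded, False if the change was rolled back.
    Only fully completed steps are rolled back; a step that fails halfway is
    assumed to leave nothing behind (a simplification).
    """
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as exc:
            print(f"step {step.name!r} failed: {exc}; rolling back")
            for finished in reversed(done):
                finished.rollback()
            return False
        done.append(step)
    return True


# Usage: a hypothetical two-step change whose second step fails.
config = {"max_connections": 100}


def raise_limit() -> None:
    config["max_connections"] = 500


def restore_limit() -> None:
    config["max_connections"] = 100


def broken_restart() -> None:
    raise RuntimeError("simulated failure")


ok = run_change([
    Step("raise limit", raise_limit, restore_limit),
    Step("restart service", broken_restart, lambda: None),
])
print(ok, config)  # False {'max_connections': 100}
```

Writing the rollback at the same time as the step keeps the "how do I get back?" question from being postponed until something has already gone wrong.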

02. Be careful with destructive operations

What counts as a destructive operation? For a database, once DROP TABLE, DROP DATABASE, TRUNCATE TABLE, or a DELETE that removes all rows has been executed, it is nearly impossible to roll back all the data, and even when it can be done, the cost is very high. Such a statement is easy to execute but very hard to undo, and the data is very hard to recover, so these operations must be performed with extra care.
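One way to add that extra care is a confirmation gate in front of whatever tool executes SQL. The sketch below is an assumption, not a feature of any particular database client: the hypothetical `is_destructive` function uses simple regular expressions (not a real SQL parser) to flag the statement types listed above, and `guarded_execute` refuses to run them unless a `confirm` callback approves.

```python
import re
from typing import Callable

# Rough textual patterns for the statement types named above. This is not a
# SQL parser: comments, odd formatting or "DELETE ... WHERE 1=1" can slip past.
_DESTRUCTIVE = [
    re.compile(r"^\s*DROP\s+(TABLE|DATABASE|SCHEMA)\b", re.IGNORECASE),
    re.compile(r"^\s*TRUNCATE\s+(TABLE\s+)?\w", re.IGNORECASE),
    # A DELETE with no WHERE clause removes every row in the table.
    re.compile(r"^\s*DELETE\s+FROM\s+\S+\s*;?\s*$", re.IGNORECASE),
]


def is_destructive(sql: str) -> bool:
    """True if the statement looks like one of the destructive forms."""
    return any(pattern.search(sql) for pattern in _DESTRUCTIVE)


def guarded_execute(sql: str,
                    execute: Callable[[str], object],
                    confirm: Callable[[str], bool]) -> bool:
    """Run sql via execute() unless it is destructive and confirm() says no."""
    if is_destructive(sql) and not confirm(sql):
        print("refused:", sql.strip())
        return False
    execute(sql)
    return True
```

In an interactive tool, `confirm` could be something like `lambda s: input(f"Type yes to run {s!r}: ") == "yes"`. Because a textual check is easy to bypass, it complements backups rather than replacing them.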

03. Customize your command prompt

Configure the prompt so that it tells you which database you are connected to and which directory you are in. If you have several terminal tabs open and every tab shows the same title, it is easy to switch back and forth and type into the wrong one. Once the prompt and tab titles are set up, the chance of this mistake drops considerably.
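Shells and database clients have their own prompt settings (for example, the mysql client's `--prompt` option). Purely as an illustration of what the prompt should carry, the hypothetical helpers below build a prompt showing user, host, database and directory, and produce the escape sequence that xterm-compatible terminals use to set the window or tab title, so each tab can be labeled with the server it is connected to.

```python
def build_prompt(db: str, user: str, host: str, cwd: str) -> str:
    """Prompt text showing who you are, which host, which database, which directory."""
    return f"{user}@{host} [{db}] {cwd}> "


def title_escape(text: str) -> str:
    """xterm-style OSC 0 sequence that sets the terminal window/tab title."""
    return f"\033]0;{text}\007"
```

A wrapper tool could, for instance, print `title_escape("db01 [orders_prod]")` once at startup and then read commands with `input(build_prompt(...))`, so the host and database stay visible on every line and in every tab.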

04. Back up and verify that the backup is valid

People make mistakes, and machines can fail without warning. What should we do? Prepare backups. Once you have backups, can you rest easy? Not yet: you also need to verify that they are valid. No backup is guaranteed to restore 100% of its data correctly. A backup process therefore includes not only taking the backup but also verifying it; a backup that cannot restore correct data is just a waste of space.
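The article does not prescribe a specific tool, so here is a minimal sketch that uses SQLite only because it ships with Python. It copies a database with `sqlite3.Connection.backup`, reopens the copy as a restore would, and treats the backup as valid only if it passes `PRAGMA integrity_check` and a sample table has the same row count as the source. `backup_and_verify` and the `users` table are hypothetical; for MySQL or Oracle the same idea means restoring the backup onto a scratch instance and checking the data there.

```python
import os
import sqlite3
import tempfile


def backup_and_verify(src_path: str, dest_path: str, table: str) -> bool:
    """Back up an SQLite database, then check that the copy is usable.

    "Usable" here means the copy opens, passes PRAGMA integrity_check, and
    holds the same number of rows in `table` as the source.
    """
    src = sqlite3.connect(src_path)
    try:
        dest = sqlite3.connect(dest_path)
        try:
            src.backup(dest)  # online backup API (Python 3.7+)
        finally:
            dest.close()

        # Reopen the copy from disk, as a real restore would.
        restored = sqlite3.connect(dest_path)
        try:
            (status,) = restored.execute("PRAGMA integrity_check").fetchone()
            if status != "ok":
                return False
            # `table` is trusted input here; never interpolate user input into SQL.
            query = f"SELECT COUNT(*) FROM {table}"
            return restored.execute(query).fetchone() == src.execute(query).fetchone()
        except sqlite3.DatabaseError:
            return False  # unreadable or corrupt copy
        finally:
            restored.close()
    finally:
        src.close()


# Usage: build a small throwaway database, back it up and verify the copy.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "app.db")
    conn = sqlite3.connect(src_file)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    conn.close()
    ok = backup_and_verify(src_file, os.path.join(tmp, "app.bak.db"), "users")
    print(ok)  # True
```

A row count is a cheap sanity check, not proof of correctness; a stronger verification would compare checksums of critical tables or run the application's own smoke tests against the restored copy.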

05. Handovers and vacations are the most failure-prone times for changes, so be careful

This comes from experience. When we reviewed past failures, we found that during departmental reorganizations and work handovers, failures occurred more than 50% more often than usual. Some joke that the machines and applications have feelings and don't want their operators to leave.

Setting feelings aside, let's make a simple rational analysis. Companies and departments inevitably reorganize, and change is the only constant. Operations staff are the front-line people doing the actual work, and a departmental reorganization or a new manager may shift the focus of the work and change both how things are done and how work is evaluated. While people adapt, some things are bound to be overlooked, so an increase in failures is understandable.

Therefore, both the operations department and individual operators need to handle such changes as calmly as possible. When you take over someone else's work, confirm the change plan again and again; asking questions does not mean you are incompetent. Before going on vacation, it is best to prepare a document describing how to perform each task and whom to contact. When you cover for a colleague who is on vacation, genuinely apply the rule "postpone if you can": if a change must go ahead, take great pains to confirm the details of the operation with the original operator.

06. Set up alarms to get error information promptly

Set up performance monitoring to understand history, identify trends, and predict the future. The highest level of operations is not staying calm when Mount Tai collapses in front of you, but having no failures at all, nipping them in the bud. Let's applaud the unsung people who keep thinking about what hidden risks remain in our systems, how to fix them, and how to detect them as early as possible. They are the most admirable people, and the tools they depend on are alarms and monitoring. After many years of development, Oracle's AWR (Automatic Workload Repository) and related performance statistics are fairly comprehensive; MySQL is catching up, with more and more supporting tools.
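To make "identify trends and predict the future" concrete, the sketch below fits a straight line to hypothetical hourly disk-usage samples with ordinary least squares and estimates how many hours remain before usage crosses a threshold. `fit_line`, `hours_until` and the sample values are illustrative assumptions; a straight line is only a first approximation, and monitoring products typically offer more robust forecasting.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Needs at least two distinct x values (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(xs: Sequence[float], ys: Sequence[float],
                threshold: float) -> Optional[float]:
    """Hours from the last sample until the fitted trend reaches threshold.

    Returns None when the trend is flat or falling (no crossing ahead).
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])


# Usage: disk usage (%) sampled once an hour.
hours = [0, 1, 2, 3, 4]
usage = [60, 62, 64, 66, 68]
print(hours_until(hours, usage, 90))  # 11.0
```

Here usage rises 2 percentage points per hour, so the fitted line reaches 90% at hour 15, eleven hours after the last sample: enough warning to clean up or add capacity before the disk fills.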

Alarms let you know promptly when something is wrong with the system. Performance monitoring gives you the system's historical performance data: you can analyze what happened when a failure occurred, confirm its real cause, understand how metrics are trending, spot early warning signs, and optimize and tune as early as possible. In practice, alarms and performance monitoring are not completely separate; many monitored performance metrics can also trigger alarms.
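As an illustration of a monitored metric doubling as an alarm, the hypothetical rule format below fires only when a metric exceeds its threshold on several consecutive samples, which keeps a single spike from paging anyone. `AlertRule`, `firing_alerts`, the metric names and the sample values are all assumptions for this sketch, not the configuration format of any real monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlertRule:
    metric: str         # name of a monitored metric, e.g. "replication_lag_s"
    threshold: float    # fire when the value is above this...
    for_samples: int    # ...on this many consecutive samples

    def __post_init__(self) -> None:
        if self.for_samples < 1:
            raise ValueError("for_samples must be at least 1")


def firing_alerts(history: Dict[str, List[float]],
                  rules: List[AlertRule]) -> List[str]:
    """Messages for rules whose metric breached the threshold on each of the
    last `for_samples` samples; too few samples means the rule cannot fire."""
    alerts = []
    for rule in rules:
        recent = history.get(rule.metric, [])[-rule.for_samples:]
        if len(recent) == rule.for_samples and all(v > rule.threshold for v in recent):
            alerts.append(f"{rule.metric} > {rule.threshold} for "
                          f"{rule.for_samples} samples (last={recent[-1]})")
    return alerts


# Usage: replication lag stays high, while the CPU had only one brief spike.
history = {
    "replication_lag_s": [1, 2, 45, 50, 62],
    "cpu_pct": [20, 95, 30, 25, 28],
}
rules = [AlertRule("replication_lag_s", 30, 3), AlertRule("cpu_pct", 90, 3)]
print(firing_alerts(history, rules))
# ['replication_lag_s > 30 for 3 samples (last=62)']
```

The same history that feeds the alarm rules is what you later review when analyzing a failure, which is why the two are best built on one set of collected metrics.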

© 2024 shulou.com SLNews company. All rights reserved.
