2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
As the job of operations and maintenance (O&M) engineer becomes more and more sought after, more and more people are choosing it to start their careers. So what does an operations engineer's day actually look like? How do they spend it? I have collected some first-person accounts from operations veterans on Baidu Tieba and Zhihu. See if you recognize yourself in any of them.
Chen Zhanwei, works in operations
After interviewing a number of candidates for operations positions, I believe a large share of operations engineers in China live the typical day described below.
My least favorite day:
As soon as I arrive at the office in the morning, a colleague runs over and interrupts me: he has a request. Other colleagues raise requests over IM, email and phone. I have no choice but to silently add them all to my todo list.
As soon as I sit down, I am dragged into an unplanned meeting, where a colleague explains how I should help him.
I have just got back when I discover there is an interview in 10 minutes.
Back from the interview, I find there is a scheduled meeting in 10 minutes.
Back from the meeting, functional testing of the product has finished, and I need to assist with the production release.
The release process is not standardized; an error occurs in the production environment and we roll back urgently.
I round up everyone involved in the release to discuss why the incident happened and how to avoid it in the future.
Afterwards we prepare to release again, and this time I follow the whole process.
Finally, the release completes normally.
Oh, no. The feature is live, but it turns out there is still a serious performance problem. Back to firefighting.
I adjust parameters and tune performance, and the server load finally comes down.
I look at the time. It's almost time to go home.
Facing the ever-growing todo list, I am at a loss.
The above is a little exaggerated, but the constant, unpredictable interruptions really are terrible. With so many interruptions and context switches, a lot of people are simply buried in them.
Personally, I think the ideal time allocation for an operations engineer is:
20% of the time: dealing with urgent and important matters.
80% of the time: working on important but non-urgent matters.
Urgent and important work is easy to understand: it is essentially firefighting.
Important but non-urgent work is what best reflects the value of operations:
Monitoring systems. This is a big topic. Besides passively monitoring whether services are healthy, we also actively develop systems that assist with system analysis and help plan the future of the whole system.
Performance tuning is one of my favorite aspects. I love finding performance bottlenecks and solving performance problems.
Developing tool-based systems is a way to improve the productivity of yourself and everyone on the team, especially tools that can quickly resolve those interruptions.
Study. This is the most important thing. Operations covers a wide range of knowledge; only continuous learning lets you solve the problems above smoothly and quickly, and only by constantly experimenting do you build up enough experience to take on whatever comes your way.
Only by doing the important but non-urgent work well every day can we make operations more efficient, the whole system more stable, and future development more predictable.
Shili, operations engineer at Taobao
On a normal day I get up at 08:30 and arrive at the company at 09:30 to start work.
1) Look at yesterday's timeout report to see which systems had the most timeouts.
2) Using the monitoring charts, find the machines where timeouts are concentrated and check their basic monitoring: whether there is a hardware failure, whether someone made an operational mistake, and whether anyone accessed the engine without notice. Find the cause, agree on a solution and deadline with the developers, and reply to the email.
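The first step of that routine, finding which systems and machines time out most, can be sketched in a few lines of Python. This is only an illustration: the report format (one `(system, host)` pair per timeout) and the names `rank_timeouts`, `report`, `search`, `cart` and `host-a` are assumptions, not anything from Taobao's actual tooling.

```python
from collections import Counter


def rank_timeouts(names, top=3):
    """Return up to `top` (name, count) pairs, most frequent first."""
    return Counter(names).most_common(top)


# Hypothetical report rows: one (system, host) pair per timeout seen yesterday.
report = [
    ("search", "host-a"),
    ("search", "host-a"),
    ("search", "host-b"),
    ("cart", "host-c"),
]

# Which systems time out most, and on which machines are timeouts concentrated?
by_system = rank_timeouts(system for system, _ in report)
by_host = rank_timeouts(host for _, host in report)

print(by_system)
print(by_host)
```

The machines at the top of `by_host` are the ones worth checking first on the monitoring charts.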
Chen Xiaosheng, operations engineer for online game systems
Firefighting: sudden failures are inevitable.
Interruptions: product, engineering, QC, anyone may come to you; the requests can be odd, and they do not arrive one at a time.
Knowledge: there is a lot you need to know, including how to "deal with" the interruptions above.
Development: building the various systems that support operations.
Patching gaps: existing bugs, foreseeable problems, defects.
Planning: strong foresight and a view of the whole.
Yang Jian, good at repairing computers
After several years in operations, let me share how it feels.
When I open Nagios in the morning, I see a string of alarms: log space below 80%, a backup that did not succeed, a scheduled task that failed to run, a database index build that failed, and so on. By about 11:00 I have dealt with them all by hand.
Then I read yesterday's on-duty log: all kinds of launches, decommissions and tinkering. The main nginx configuration grew by 14 lines, plus 8 configuration files; the DNS configuration grew by N lines; two hard drives and one storage head need replacing and are already offline in the machine room, waiting for DELL to swap them. I call a colleague at the IDC to confirm the mess.
Development and QA say the performance of a project must be raised to 200,000 per hour (in reality, the project sees no more than 200 unique IPs a day). The editors say tens of thousands of articles need to be converted to UIDs. I email the heads of the three departments: "No more servers for this project; give the UID conversion task to the DBAs." Then the VP calls me into his office to say we should do our best to cooperate with other departments and not pass things back and forth. I go back and email the colleague on duty to add 2 servers to the project and, for fear of being scolded, do the UID conversion myself. And that is the end of the day.
GNUer, pit digger
while (1)
{
I usually get up at 9:00 and get to the office at 09:30, typically eating a pancake from a street stall while reading the news and the technical articles I subscribe to on my Kindle. In the morning I work on things that interest me and write scripts to improve my current work. I take questions from development and QA and help them with problems in the R&D environment. The afternoon is busier: I sit in meetings while handling online issues, usually more than three at once. My brain is not hyper-threaded, but I generally end up doing N things at the same time.
When it is time to go home, there is still a lot of work left. I want to leave early every day, but every time I am ready to go, the emails, IMs and phone calls start again.
When I get home I go online, read documentation, and read classics on fundamentals such as OS and TCP/IP to cultivate my mind. At 12:00 I go to bed.
}
Li Zhenyu, operations / Alibaba
I was invited to answer, so briefly:
1. Handle alarms: check the cause, resolve it with the developers, and look for ways to prevent recurrence, for example by adding scheduled cleanup scripts.
2. Handle releases. This is mostly automated, but there are always times when a release fails or has to be rolled back; then I intervene manually, find the cause, and discuss with the developers whether to roll back or redo it.
3. Automate whatever daily work can be automated.
4. Some operations-related projects get started, so sometimes I do project development part time.
5. Study: read the news, study materials and so on.
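A scheduled cleanup script of the kind mentioned in item 1 might look roughly like the sketch below. It is a minimal illustration, not the author's script: the function name `clean_old_files`, the `*.log` pattern and the age threshold are all assumptions, and it defaults to a dry run so that nothing is deleted by accident.

```python
import time
from pathlib import Path


def clean_old_files(directory, max_age_days, pattern="*.log", dry_run=True):
    """Remove files matching `pattern` that are older than `max_age_days`.

    Returns the names of the stale files. With dry_run=True (the default)
    the files are only reported, not deleted.
    """
    cutoff = time.time() - max_age_days * 86400  # age limit in seconds
    stale = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            stale.append(path.name)
    return stale
```

Run from a scheduler such as cron, a call like `clean_old_files(log_dir, 7, dry_run=False)` keeps a log directory from filling up again and triggering the same alarm.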
Aiirii wong
1. Wait for alarms from the monitoring system.
2. Inspect the core systems and check that the backup system's backup tasks completed.
3. Wait for users to report faults.
4. Write the routine task schedule and carry out routine tasks, such as adding new users and expanding storage.
5. Project work, such as verifying newly purchased storage, a new monitoring system, a new operating system, or an application system.
6. Learn new things; read technical documents or company notices.
7. Meet with various suppliers (engineers) and internal staff.
Cheng Keke, operations engineer
I look at the server logs. Our servers are always going down, for endless reasons, and then QA and the PO come with a pile of questions asking me to show them what is going on. There are all kinds of meetings where you have to sit and listen. The biggest part of the work is developing automated operations tools. When a version release is coming up it gets especially grueling, with no rest at all.
Gu Paul
Read the paper, drink tea, check my phone, because they say Linux doesn't crash.
Original link: http://www.magedu.com/71469.html