What aspects should Linux operation and maintenance engineers pay attention to?

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article covers what Linux operation and maintenance engineers should pay attention to. The content is detailed but easy to follow, and I hope you will take something useful away from it. Let's take a look.

I. Online operating standards

1. Test use

When I first learned Linux, everything from the basics to services to clusters was done in virtual machines. Although the teacher told us that there is no difference between a virtual machine and a real machine, our desire for a real environment grew day by day. Meanwhile, virtual machine snapshots let us develop all kinds of careless habits, so that when we finally got operating rights on a real server, we couldn't wait to try things out.

I remember that on my first day at work, the boss gave me the root password. I could only use putty there but wanted to use xshell, so I quietly logged in to the server and tried to switch to xshell with key-based login. Because I had not tested the change and had not kept a spare ssh connection open, I was locked out of the server after restarting the sshd service. Fortunately, I had backed up the sshd_config file beforehand, and I later asked the computer room staff to cp it back. Luckily this was a small company; otherwise I would have been in serious trouble. I'm glad my luck held that time.

The second example is about file synchronization. We all know that rsync synchronization is very fast, but it also deletes files much faster than rm -rf. rsync has an option that makes one directory mirror another, deleting anything in the destination that is not in the source. If the first directory is empty, you can imagine the result: the directory that actually held the data is emptied. I wrote the two directories the wrong way round, through carelessness and lack of testing. The worst part is that there was no backup. The production environment data was deleted.

With no backup, you can imagine the consequences. The importance of backups is self-evident.
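The rsync accident above suggests two cheap guards: refuse to mirror from an empty source, and default to a dry run. The following is a minimal sketch in Python, not the author's method. The function name build_mirror_command is hypothetical, and it only builds the argument list (using rsync's real -a, --delete and --dry-run options) rather than running anything.

```python
import os


def build_mirror_command(src, dst, dry_run=True):
    """Return an rsync argv that mirrors src into dst, or raise if src looks wrong."""
    if not os.path.isdir(src):
        raise ValueError(f"source is not a directory: {src}")
    if not os.listdir(src):
        # Mirroring an empty source with --delete would empty the destination.
        raise ValueError(f"source is empty, refusing to mirror: {src}")
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing it
    # Trailing separator on src: sync the directory's contents, not the directory itself.
    cmd += [os.path.join(src, ""), dst]
    return cmd
```

Reviewing the dry-run output first, and only then calling the function again with dry_run=False, gives one more chance to notice that the two directories are swapped.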

2. Confirm repeatedly before Enter

As for mistakes like rm -rf /var, I believe they are quite likely to happen to people with fast hands, or when the network connection is slow.

By the time you notice what you have just executed, your heart is already half cold.

You may say that you have pressed Enter so many times without anything going wrong, so there is nothing to fear. I just want to say that once it happens to you, you will understand. Don't think that operation and maintenance accidents only happen to other people. If you don't pay attention, you will be next.

3. Do not let several people operate at once

At the last company I worked for, operation and maintenance management was quite chaotic. To cite the most typical example, operation and maintenance staff who had left several generations earlier still had the servers' root passwords.

Usually, when we operation and maintenance staff receive a task, we do a simple check first and ask others for help if we can't solve it. But when the problem gets serious, the customer service supervisor, the network administrator and your boss all end up debugging the same server together. After all kinds of Baidu searches and comparisons, you find that the server's configuration file is different from the way you last left it, so you change it back. Then you search Google, excitedly find the cause and fix it, only for someone else to tell you that he has fixed it too, by modifying a different parameter. At that point I really don't know which change addressed the real cause.

Of course, that is still the good outcome: the problem is solved and everyone is happy. But have you ever modified a file, tested it and found the change had no effect, then gone back to edit it again and found that someone else had modified it in the meantime? It is really annoying. Do not let several people operate at once.

4. Backup before operation

Make it a habit to back up anything you are about to modify, such as a .conf configuration file.

In addition, when modifying a configuration file, it is recommended to comment out the original option, then copy it and modify the copy.

Furthermore, if there had been a database backup in the rsync example above, wouldn't the mistake have been put right quickly?

A database is not lost overnight, and with any backup at all, things would not have been so miserable.
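The habit of backing up a file before touching it is easy to automate. Here is a minimal sketch, assuming a timestamped copy next to the original is acceptable; the helper name backup_file and the .bak naming scheme are illustrative choices, not something the article prescribes.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy path to <name>.<YYYYmmdd-HHMMSS>.bak beside it and return the backup path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also preserves mtime and permission bits
    return dst
```

Calling backup_file on a configuration file right before editing it leaves a dated copy that can simply be copied back if the change goes wrong.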

II. Data

1. Be careful with rm -rf

There are many examples on the Internet: all kinds of rm -rf /, all kinds of deleted primary databases, all kinds of operation and maintenance accidents.

A small mistake can cause a great deal of loss. If you really need to delete something, be careful.
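One way to be careful is to put a check between the path and the delete. The sketch below is an assumption-laden illustration: the PROTECTED set is a hypothetical deny-list that you would adapt to your own hosts, and the function only answers yes or no; it never deletes anything itself.

```python
import os

# Hypothetical deny-list of paths that must never be removed wholesale.
PROTECTED = {"/", "/bin", "/boot", "/etc", "/home", "/lib", "/root", "/usr", "/var"}


def is_safe_to_delete(path):
    """Return False for empty, relative, or protected top-level paths."""
    if not path or not os.path.isabs(path):
        return False
    normalized = os.path.normpath(path)  # collapses "/./", "/../" and trailing "/"
    if normalized.startswith("//"):
        # POSIX normpath keeps exactly two leading slashes; fold them into one.
        normalized = "/" + normalized.lstrip("/")
    return normalized not in PROTECTED
```

Normalizing first matters because "/var/", "//var" and "/var/log/.." all name the same directory as "/var".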

2. Backup matters more than anything else

Backups have already come up several times above, but I want to file them under data and emphasize once more that backup is very important.

I remember my teacher saying that you can never be too cautious when it comes to data.

I work for a company that runs third-party payment websites and online loan platforms. The third-party payment system is fully backed up every two hours, and the online loan platform is backed up every 20 minutes.

I won't say any more. Make your own decision.
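A backup schedule such as "every 20 minutes" is only useful if someone notices when it stops running. The following is a minimal freshness check, assuming each backup run writes or touches a file whose modification time can be inspected; the function names are hypothetical.

```python
import os
import time


def backup_age_minutes(path, now=None):
    """Minutes elapsed since the backup file at path was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 60.0


def is_backup_fresh(path, max_age_minutes, now=None):
    """True if the newest backup is no older than the agreed interval."""
    return backup_age_minutes(path, now) <= max_age_minutes
```

With the intervals mentioned above, the payment backup would be checked against 120 minutes and the loan platform backup against 20, ideally with some slack for the time the backup itself takes.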

3. Stability is above all else.

In fact, not only for data but for the entire server environment, stability matters more than anything else. Aim not for the fastest but for the most stable and the most available.

So do not put new, untested software on a server. Take nginx + php-fpm as an example: when php misbehaves in the production environment, you end up just restarting it or switching back to apache.

4. Secrecy is above all else.

These days private-photo leak scandals and router backdoors are everywhere, so when it comes to data, keeping it confidential is a must.

III. Security

1. SSH

Change the default port (of course, if an expert wants to hack you, a scan will still find it)

Disable root login

Use normal user + key authentication + sudo rules + IP address restrictions + user restrictions

Use anti-brute-force software along the lines of DenyHosts (block the source directly after several failed attempts)

Restrict which of the users in /etc/passwd are allowed to log in
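The first few items in this list can be checked mechanically. Below is a rough sketch that audits the text of an sshd_config file for three of them, using the real directive names Port, PermitRootLogin and PasswordAuthentication. It is a simplification under stated assumptions: it ignores Match blocks, Include files and the keyword=value form, and the function names and finding messages are made up for illustration.

```python
def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first occurrence of each keyword."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out options
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd uses the first value it reads for a keyword, so never overwrite.
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def audit_sshd_config(text):
    """Return a list of findings; an empty list means all three checks passed."""
    s = parse_sshd_config(text)
    findings = []
    if s.get("port", "22") == "22":
        findings.append("default port 22 in use")
    if s.get("permitrootlogin", "unset").lower() != "no":
        findings.append("root login not disabled")
    if s.get("passwordauthentication", "yes").lower() != "no":
        findings.append("password authentication still allowed")
    return findings
```

A check like this belongs before the sshd restart, together with a second open session, so that a bad configuration does not lock you out.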

2. Firewalls

In a production environment the firewall must be turned on and must follow the principle of least privilege: drop everything, then open only the service ports that are required.
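The "drop everything, then open what is needed" rule can be sketched as a small generator of iptables commands. This is an illustration only: it covers inbound TCP on the INPUT chain, the function name is hypothetical, and it returns the commands as strings for review rather than executing them.

```python
def build_iptables_rules(allowed_tcp_ports):
    """Return iptables commands for a default-drop INPUT chain with a port allow-list."""
    rules = [
        # Keep loopback traffic and replies to existing connections working.
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for port in sorted(set(allowed_tcp_ports)):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Set the DROP policy last so the ssh rule is already in place when it takes effect.
    rules.append("iptables -P INPUT DROP")
    return rules
```

Putting the policy change at the end is a deliberate ordering choice: applied the other way round over an ssh session, the first command would cut off the connection issuing the rest.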

3. Fine-grained permissions and control

Any service that can be started by an ordinary user should never be run as root. Keep the permissions of every service to the minimum, and make the control fine-grained.

4. Intrusion detection and log monitoring

Use third-party software to continuously detect changes to key system files and service configuration files, for example /etc/passwd, /etc/my.cnf and /etc/httpd/conf/httpd.conf.
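Dedicated tools do this far more thoroughly, but the core idea of change detection is just comparing file hashes against a stored baseline. Here is a minimal sketch of that idea; the function names are hypothetical, and it does not cover deleted files, permissions or ownership.

```python
import hashlib


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each path to its current digest."""
    return {p: sha256_of(p) for p in paths}


def changed_files(baseline, current):
    """Paths whose digest differs from the baseline, or that are missing from current."""
    return sorted(p for p in baseline if current.get(p) != baseline[p])
```

The baseline snapshot needs to be stored somewhere an intruder on the same machine cannot rewrite it, otherwise the comparison proves nothing.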

Use a centralized log monitoring system to watch /var/log/secure, /var/log/messages, ftp upload and download records, and other alarm and error logs.

In addition, for port scanning, you can use third-party software that adds a host to hosts.deny as soon as it detects a scan from it. This information is very helpful for troubleshooting after a system has been compromised. It has been said that what a company invests in security is proportional to what it stands to lose from attacks. Security is a big topic.

It is also very basic work. If the foundation is done well, the security of the system improves greatly, and the rest can be left to security experts.
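Tools that block brute-force sources work by counting failed logins per address in the authentication log. The sketch below shows that counting step only, assuming the usual sshd "Failed password for ... from ADDRESS port N" wording; the threshold and the function name offenders are illustrative, and nothing here edits hosts.deny.

```python
import re
from collections import Counter

# Matches sshd failure lines for both existing and "invalid user" accounts.
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def offenders(lines, threshold=5):
    """Return {address: failures} for addresses with at least threshold failures."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return {addr: n for addr, n in counts.items() if n >= threshold}
```

Feeding it the lines of /var/log/secure gives a candidate block list that a person, or a stricter tool, can then act on.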

IV. Daily monitoring

1. System operation monitoring

Many people start with monitoring when they first step into operation and maintenance. Large companies generally have dedicated staff monitoring around the clock. System operation monitoring generally covers hardware usage (commonly memory, hard disk, CPU and network card) and the OS, including login monitoring and monitoring of key system files.

Regular monitoring helps predict hardware failures and provides very practical input for tuning.
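At its simplest, system monitoring is collecting a few numbers and comparing them with limits. Here is a small sketch using only the Python standard library; the metric names and the example limits are assumptions chosen for illustration, and a real deployment would use a proper monitoring system.

```python
import os
import shutil


def collect_metrics(mount="/"):
    """Sample disk usage for one mount point and the 1-minute load per CPU."""
    usage = shutil.disk_usage(mount)
    load1, _, _ = os.getloadavg()
    return {
        "disk_used_pct": 100.0 * usage.used / usage.total,
        "load_per_cpu": load1 / (os.cpu_count() or 1),
    }


def alerts(metrics, limits):
    """Names of the metrics that exceed their configured limit."""
    return sorted(name for name, limit in limits.items() if metrics.get(name, 0) > limit)
```

For example, alerts(collect_metrics(), {"disk_used_pct": 85, "load_per_cpu": 1.5}) returns the names of whichever of the two metrics is currently over its limit.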

2. Service operation monitoring

Service monitoring generally covers the various applications, such as web, db and lvs. It tracks a set of indicators so that when the system hits a performance bottleneck, it can be found and resolved quickly.

3. Log monitoring

Log monitoring here is similar to the security log monitoring above, but it generally covers error and alarm messages from the hardware, the OS and applications.

Monitoring really does seem useless while the system is running steadily, but if something goes wrong and you have no monitoring, you will be caught on the back foot.

V. Performance tuning

1. In-depth understanding of the operating mechanism

In fact, with only a little more than a year of operation and maintenance experience, anything I say about tuning is basically empty talk, so I will only sum up briefly here. If I gain a deeper understanding, I will update this.

Before optimizing a piece of software, you need to understand deeply how it runs. Take nginx and apache: we all say that nginx is fast, so we must know why it is fast, what principle it uses to handle more requests than apache, be able to explain that to others in plain words, and be able to read the source code when necessary. Otherwise, any document that treats parameters alone as the object of tuning is nonsense.

2. Tuning framework and priority

Once you are familiar with the underlying mechanism, you need a framework and an order for tuning. For example, when the database becomes a bottleneck, many people go straight to changing the database configuration file. My suggestion is to analyze the bottleneck first, check the logs, write down the tuning direction, and only then start. Tuning the database server should be the last step; the hardware and the operating system should come first. Today's database servers are released only after extensive testing and are suited to all kinds of operating systems, so they should not be the first place you start.

3. Adjust only one parameter at a time

Adjust only one parameter at a time. I believe everyone knows this: adjust several at once and you will confuse yourself.

4. Benchmark test

Benchmark testing is needed to determine whether a tuning change actually helps, and to test the stability and performance of a new software version. It involves many factors.

Whether a test comes close to the real needs of the business depends on the tester's experience. For more on this, you can refer to the third edition of "High Performance MySQL".
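The two rules above, change one parameter at a time and benchmark the result, combine naturally into a before-and-after comparison. Below is a toy sketch using Python's timeit module; the two callables stand in for "current setting" and "setting with one parameter changed", and real service benchmarks need realistic load rather than a micro-benchmark like this.

```python
import statistics
import timeit


def median_seconds(func, repeat=5, number=1000):
    """Median time per call of func, taken over several repeated runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return statistics.median(runs) / number


def compare(baseline, candidate, **kwargs):
    """Time both callables the same way and report the speedup of candidate."""
    base = median_seconds(baseline, **kwargs)
    cand = median_seconds(candidate, **kwargs)
    return {"baseline_s": base, "candidate_s": cand, "speedup": base / cand}
```

Using the median of several runs rather than a single measurement reduces the chance of crediting a parameter change with what was really background noise.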

My teacher once said that there are no universal parameters: any parameter change and any tuning must fit the business scenario. So stop googling for tuning parameters. It will have no long-term effect on your own growth or on your business environment.

VI. Mentality of operation and maintenance

1. Control the state of mind

A lot of rm -rf /data accidents happen at the peak of irritability, a few minutes before leaving work. So shouldn't you keep your state of mind under control?

Someone said that you still have to go to work when you are upset. True, but you can try to avoid touching critical data environments while you are upset.

The more pressure there is, the calmer you should be, or you will lose even more.

Most people have had the experience of running rm -rf /data/mysql, and you can imagine the feeling after the deletion. But what use is that feeling if you have no backup? In general, you should calmly think through the worst case. For mysql, if the physical files are deleted, part of the table data is still held in memory, so cut off business traffic but do not shut down the mysql database; this helps recovery a great deal. Then use dd to copy the hard drive before you attempt the restore.

Of course, most of the time you will have to go to a data recovery company.

Imagine the alternative: the data is deleted, you perform various operations, shut down the database and then try to repair it. Not only may the files be overwritten, but the tables in memory are gone too.

2. Be responsible for the data

The production environment is not a game and the database is not a toy. We must be responsible for the data. The consequences of not backing up are very serious.

3. Get to the root cause

Many operation and maintenance staff are busy, so once a problem goes away they look no further. I remember that last year a customer's website could not be opened. From the errors reported by the php code, I found that the session and whos_online tables were corrupted. The previous operation and maintenance staff had fixed this with repair, and I fixed it the same way, but a few hours later it happened again. After three or four rounds of this, I googled for causes of unexplained table corruption: first, a myisam bug; second, a mysql bug; third, mysql being killed in the middle of a write. In the end it turned out that the machine did not have enough memory, so the OOM killer killed the mysqld process, and there was no swap partition, even though the background monitoring showed memory as sufficient. The final fix was to upgrade the physical memory.
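In a case like this one, the kernel log usually records which process the OOM killer chose. The following sketch pulls those entries out of log lines, assuming the common "Killed process PID (name)" wording; the exact message text varies between kernel versions, and the function name is hypothetical.

```python
import re

# The kernel logs the victim as "Killed process <pid> (<name>)"; the text around it varies.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(log_lines):
    """Return (pid, process name) for every OOM kill found in the given log lines."""
    victims = []
    for line in log_lines:
        match = OOM_KILL.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

Running this over /var/log/messages (or the output of dmesg) after an unexplained mysqld restart is a quick way to confirm or rule out the memory explanation before repairing tables yet again.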

4. Test and production environments

Before any important operation, make sure you check which machine you are on, and try to avoid keeping many windows open at once.
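One way to force that check is to make the operator type the name of the machine before a destructive step runs. Here is a minimal sketch of the comparison; the function name confirm_host is hypothetical, and the prompt that collects the typed name is left to the calling script.

```python
import socket


def confirm_host(typed_name, actual=None):
    """True only if the operator typed the name of the machine they are really on."""
    actual = socket.gethostname() if actual is None else actual
    return typed_name.strip() == actual
```

A maintenance script could call confirm_host(input("Type this host's name to continue: ")) and abort on False, which catches the case where the command was pasted into the production window instead of the test one.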
