2025-01-18 Update From: SLTechnology News & Howtos > Servers
Shulou (Shulou.com) 06/02 Report
Operations engineering is hard going in the early stage: you may find yourself repairing computers, crimping network cables, and moving machines. Your time is fragmented by all kinds of trivia, it is hard to show personal value, and it is easy to grow confused about the industry and feel there is no future.
This dull work can indeed be discouraging, but from a technical standpoint these are basic skills, and they quietly help with later operations work. I came up this way myself and know it firsthand. So during this period, keep a positive attitude and keep learning; one day, I believe, it will pay you back!
All right, let's get to the point. Based on my years of operations experience, here is a learning route toward becoming a senior operations engineer.
Beginner
1. Linux Foundation
To start, become familiar with Linux/Windows operating system installation, the directory structure, the boot process, and so on.
2. System management
Focus on the Linux system. In a production environment, nearly all work is done at the command line, so you must master dozens of basic management commands, covering user management, disk partitioning, package management, file permissions, text processing, process management, performance analysis tools, and so on.
3. Network foundation
You must be familiar with the OSI and TCP/IP models, and you should know the basic concepts and working principles of switches and routers.
4. The basis of Shell script programming
Master the basic syntax of Shell and be able to write simple scripts.
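As a minimal sketch of those basics, the script below uses a variable, an if test, and a for loop; all the names are arbitrary:

```shell
#!/bin/bash
# Minimal illustration of core Shell syntax: variables, a test, and a loop.
name="ops"
count=0

# Conditional: string comparison with [ ].
if [ "$name" = "ops" ]; then
    greeting="hello, $name"
fi

# Loop: iterate over a word list and count the items.
for host in web1 web2 db1; do
    count=$((count + 1))
done

echo "$greeting"   # prints: hello, ops
echo "$count"      # prints: 3
```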
Intermediate
1. Network services
You must be able to deploy the most commonly used network services, such as vsftpd, nfs, samba, bind, dhcp, etc.
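As a taste of what deploying one of these services involves, here is a hedged example of an NFS export; the path and subnet are hypothetical:

```
# /etc/exports -- hypothetical NFS share: export /data read-only to one
# trusted subnet, with root squashed to an unprivileged user.
/data  192.168.1.0/24(ro,sync,root_squash)
```

After editing the file, `exportfs -ra` reloads the export table.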
A version control system for code is indispensable; learn the mainstream ones, SVN and Git, well enough to deploy and use them comfortably.
Data is often transferred between servers, so learn rsync and scp.
Real-time data synchronization: inotify/sersync.
Repetitive work can be written as a script and run on a schedule, so you need to be able to configure scheduled tasks with the crond service under Linux.
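Putting crond and rsync together, a crontab might look like the sketch below; the paths, schedule, and backup host are all hypothetical:

```
# crontab -e -- hypothetical entries.
# Field order: minute hour day-of-month month day-of-week command

# Push /data to a backup server every night at 02:30 (incremental via rsync).
30 2 * * *  rsync -az --delete /data/ backup@10.0.0.5:/backup/data/

# Rotate application logs every Sunday at 04:00.
0 4 * * 0   /usr/local/bin/rotate_logs.sh
```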
2. Web service
Almost every company has a website, and to make it run you need to build a Web service platform.
If it is developed in PHP, you will usually build a LAMP or LNMP platform. These acronyms are just combinations of component names; taken separately, you must be able to deploy Apache, Nginx, MySQL, and PHP.
If it is developed in Java, Tomcat is usually used to run the project. To improve access speed, you can put Nginx in front as a reverse proxy for Tomcat: Nginx handles static pages and Tomcat handles dynamic pages, achieving static/dynamic separation.
It is not just about deployment; you also need to know how the HTTP protocol works and how to do simple performance tuning.
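A hedged configuration sketch of the static/dynamic separation described above, with Nginx in front of Tomcat; the server name and paths are assumptions:

```nginx
# nginx.conf fragment -- hypothetical static/dynamic separation:
# Nginx serves static files directly and proxies everything else to Tomcat.
server {
    listen 80;
    server_name example.com;

    # Static pages: served straight from disk by Nginx.
    location ~* \.(html|css|js|png|jpg|gif)$ {
        root /var/www/static;
        expires 7d;
    }

    # Dynamic pages: reverse-proxied to the local Tomcat instance.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```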
3. Database
For databases, MySQL is the most widely used open source database in the world; you cannot go wrong learning it. You also need to know simple SQL statements, user management, common storage engines, and database backup and recovery.
To go deeper, you must master master-slave replication, performance optimization, and mainstream cluster solutions such as MHA and MGR. NoSQL is popular enough to be necessary too: learn Redis and MongoDB.
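A few hedged examples of the SQL basics mentioned above; the database, table, and user names are all hypothetical:

```sql
-- Hypothetical user management: create a user and grant limited privileges.
CREATE USER 'app'@'10.0.0.%' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE ON shop.* TO 'app'@'10.0.0.%';

-- A simple query.
SELECT id, name FROM customers WHERE created_at > '2024-01-01';
```

A basic logical backup and restore can then be done with `mysqldump -u root -p shop > shop.sql` and `mysql -u root -p shop < shop.sql`.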
4. Security
Security is very important. Do not wait until the system has been compromised to put security policies in place; by then it is too late. As soon as a server goes online, access control policies should be applied, for example using iptables to allow access only from trusted source IPs, and shutting down unused services and ports.
You must also know the common attack types, otherwise you cannot prescribe the right remedy: CC, DDoS, ARP, and so on.
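As an illustration of such an access-control policy, here is a hypothetical baseline ruleset in iptables-save format; the trusted subnet and open ports are assumptions:

```
# /etc/iptables/rules.v4 -- hypothetical baseline: drop inbound by default,
# allow established traffic, SSH only from a trusted subnet, HTTP/HTTPS open.
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp -s 192.168.1.0/24 --dport 22 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT
COMMIT
```

Such a file can be loaded with `iptables-restore < /etc/iptables/rules.v4`.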
5. Monitoring system
Monitoring is essential; finding and tracing problems in time is a lifesaver. You can start with Zabbix, a mainstream open source monitoring system that is rich in features and meets most basic monitoring needs. Monitoring targets include basic server resources, interface status, service performance, PV/UV, logs, and so on.
You can also set up a dashboard such as Grafana to display a few key metrics in real time, which looks very cool.
6. Advanced Shell scripting
Shell scripting is Linux's tool of choice for automating work, and you must become proficient at writing it, so go further and learn functions, arrays, signals, sending mail, and so on.
You also have to be really good with the text-processing "three musketeers" (grep, sed, awk); text processing under Linux depends on them.
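A small runnable sketch of what each of the three is good at, on a made-up log line: grep filters lines, sed edits streams, awk works on fields.

```shell
#!/bin/bash
log='2024-05-01 10:00:01 ERROR disk /dev/sda1 92% full'

# grep: keep only lines containing ERROR.
errors=$(printf '%s\n' "$log" | grep 'ERROR')

# sed: rewrite ERROR to CRITICAL in the stream.
rewritten=$(printf '%s\n' "$log" | sed 's/ERROR/CRITICAL/')

# awk: print selected whitespace-separated fields (5th = device, 6th = usage).
fields=$(printf '%s\n' "$log" | awk '{print $5, $6}')

echo "$errors"
echo "$rewritten"
echo "$fields"     # prints: /dev/sda1 92%
```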
7. Python development basics
Shell scripts can handle basic tasks; to accomplish more complex ones, such as calling an API or running multiple processes, you need to learn a higher-level language.
Python is the most widely used language in the operations field and is easy to pick up. At this stage, master the basics: syntax, file object operations, functions, iterable objects, exception handling, e-mail, database programming, and so on.
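A small sketch covering several of those basics at once: a function, file object operations, iteration, and exception handling. The file name and contents are made up for the example.

```python
import os
import tempfile

def parse_sizes(lines):
    """Sum integer sizes from 'name size' lines, skipping malformed rows."""
    total = 0
    for line in lines:                 # iterate over any iterable of lines
        try:
            _, size = line.split()
            total += int(size)         # may raise ValueError
        except ValueError:
            continue                   # exception handling: skip bad rows
    return total

# File object operations: write a small file, then read it back.
path = os.path.join(tempfile.gettempdir(), "sizes.txt")
with open(path, "w") as f:
    f.write("app.log 120\nbad-line\ndb.dump 300\n")

with open(path) as f:
    total = parse_sizes(f)

print(total)  # 420
```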
Advanced
1. Web static cache
Users keep complaining that the website is slow, yet server resources still look plentiful. Slow access is not necessarily caused by saturated server resources; many factors are involved, such as the network and the number of forwarding layers.
On the network side there is the classic interconnection problem between China's northern and southern carriers: access across them is slow, and a CDN solves it. At the same time, static pages can be cached so that requests are intercepted as far upstream as possible, reducing back-end requests and response time.
If you do not use a CDN, you can also use caching services such as Squid, Varnish, or Nginx to cache static pages, placed at the traffic entrance.
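For the Nginx option, a hedged configuration sketch of a static-page cache at the traffic entrance; the cache path, upstream name `backend_pool`, and backend addresses are assumptions:

```nginx
# nginx.conf fragment (http context) -- hypothetical static-page cache:
# responses from the backend are cached on disk for 10 minutes.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:10m max_size=1g;

upstream backend_pool {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_cache static;
        proxy_cache_valid 200 10m;
        proxy_pass http://backend_pool;
    }
}
```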
2. Cluster
A single server's resources are limited and cannot withstand high traffic. The key technology here is load balancing: scale Web servers horizontally behind a load balancer so they serve simultaneously, and capacity grows multiplicatively. The mainstream open source load balancers are LVS, HAProxy, and Nginx; be sure to become familiar with one or two.
Once the Web tier bottleneck is solved, the database becomes the critical one, and clustering applies again. Take MySQL: build a one-master, multi-slave architecture, and on top of it implement read/write splitting. The master handles writes, the slaves handle reads, and slaves can be scaled horizontally. With a layer-4 load balancer in front, such a setup can carry tens of millions of PV.
High-availability software is also needed to avoid single points of failure, for example Keepalived and Heartbeat.
A website with many images will quickly outgrow NFS shared storage, which becomes painfully slow. The answer is a distributed file system: parallel task processing, no single point of failure, high reliability, and high performance. Mainstream choices include FastDFS, MFS, HDFS, Ceph, and GFS. To start, I suggest learning FastDFS, which meets the needs of small and medium enterprises.
3. Virtualization
Hardware server resource utilization is often very low, which is a waste. Idle servers can be virtualized into many virtual machines, each a complete operating system, greatly improving resource utilization. I recommend learning KVM with the open source OpenStack cloud platform.
A virtual machine works as a basic platform, but elastic scaling of applications on it is too heavyweight: it takes minutes to start, and images are so large that scaling out quickly is hard.
Containers, by contrast, offer rapid deployment and environment isolation: a service is packaged into an image, and hundreds of containers can be created in minutes.
The mainstream container technology is Docker.
Of course, standalone Docker usually cannot meet production needs. Kubernetes or Swarm can be deployed to manage container clusters, forming a large resource pool under centralized management and providing strong support for the infrastructure.
4. Automation
Repetitive work neither improves efficiency nor realizes value.
Standardize all operations work first: environment versions, directory structures, operating systems, and so on. On top of standardization, automate; completing a complex task with one click or a few commands is a great feeling.
So automate every operation as far as possible, to reduce human error and improve efficiency.
Mainstream centralized server management tools: Ansible, SaltStack.
Either of the two will do.
Continuous integration tool: Jenkins.
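As an illustration, a centralized management tool such as Ansible drives many servers from a single playbook. A minimal sketch, in which the host group "web" and the task list are hypothetical:

```yaml
# site.yml -- hypothetical Ansible playbook: install and start Nginx
# on every host in the "web" inventory group.
- hosts: web
  become: yes
  tasks:
    - name: Install nginx
      package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      service:
        name: nginx
        state: started
        enabled: yes
```

Run with `ansible-playbook -i inventory site.yml`.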
5. Python development is advanced
Go further with Python and master object-oriented programming.
It is also best to learn a Web framework such as Django or Flask, mainly to build operations management systems: move complex procedures into a platform, then integrate the centralized management tools to create an operations platform of your own.
6. Log analysis system
Logs are also very important. Analyzing them regularly can reveal potential risks and extract valuable information.
An open source log system: ELK.
Learn to deploy and use it, and provide developers with the log viewing they need.
7. Performance optimization
Deployment alone is far from enough; performance optimization maximizes a service's carrying capacity.
This area is quite difficult and is one of the keys to a high salary, so it is worth real effort.
You can think about it along four dimensions: the hardware layer, the operating system layer, the software layer, and the architecture layer.
Mindset
1. Persevere
Learning is a long process and a lifelong undertaking for each of us.
The most important thing is persistence; the difficulty lies in persistence; success lies in persistence!
2. Goal
Work without goals is not work; goals without quantification are not goals.
At each stage, set a goal.
For example: first set a small goal you can achieve, say, earn 100 million!
3. Share
Learn to share. The value of technology lies in effectively passing knowledge on to others, so that more people know it.
If everyone contributes something, think about what that adds up to.
As long as the direction is right, you need not fear the road is long!
Ten pieces of Linux common knowledge
1. GNU and GPL
The GNU Project is a free-software, mass-collaboration project publicly launched by Richard Stallman on September 27, 1983. Its goal is to create a completely free operating system.
The GPL is the GNU General Public License, which embodies the "copyleft" (anti-copyright) idea. It is one of the GNU licenses, and its purpose is to guarantee that GNU software can be freely used, copied, studied, modified, and distributed, and that the software must be released with its source code.
The GNU system combined with the Linux kernel forms a complete operating system: a Linux-based GNU system, usually called "GNU/Linux", or Linux for short.
2. Linux distribution
A typical Linux distribution includes the Linux kernel, GNU libraries and tools, a command-line shell, the X Window System with a graphical desktop environment such as KDE or GNOME, and thousands of applications ranging from office suites and compilers to text editors and scientific tools.
Mainstream distributions:
Red Hat Enterprise Linux, CentOS, SUSE, Ubuntu, Debian, Fedora, Gentoo
3. Unix and Linux
Linux is modeled on Unix and is Unix-like. The Unix operating system supports multi-user, multi-task, multi-thread operation and multi-CPU architectures. Linux inherits Unix's network-centric design philosophy and is a stable multi-user network operating system.
4. Swap partition
The swap partition, or swap area, is what the system exchanges memory with when physical RAM runs short: part of the hard disk is released for use by currently running programs, and when a swapped-out program runs again, its saved data is restored from Swap to memory. The programs whose memory is released are generally ones that have been idle for a long time.
Swap space should generally be greater than or equal to physical memory, with a minimum of 64 MB and a maximum of about twice physical memory.
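Two related configuration fragments, assuming a swap file has already been created with dd and mkswap (the path and size are hypothetical): an /etc/fstab entry to activate it at boot, and a sysctl setting that tells the kernel to prefer RAM:

```
# /etc/fstab -- activate a swap file at boot (created beforehand with
# dd/mkswap, sized per the rule of thumb above).
/swapfile  none  swap  sw  0  0

# /etc/sysctl.conf -- prefer RAM; only swap under real memory pressure.
vm.swappiness = 10
```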
5. The concept of GRUB
GNU GRUB (GRand Unified Bootloader, "GRUB" for short) is a multi-operating-system boot manager from the GNU Project.
On a computer with multiple operating systems, GRUB lets the user choose which one to run at startup. GRUB can also boot different kernels on a Linux system partition and pass startup parameters to the kernel, for example to enter single-user mode.
6. Buffer and Cache
Cache: temporary storage sitting between the CPU and memory. Cache capacity is much smaller than memory, but its exchange speed is much faster. By caching file data blocks, Cache eases the mismatch between CPU speed and memory read/write speed and improves the speed of data exchange between CPU and memory. The larger the cache, the faster the CPU can be fed.
Buffer: a buffer for data blocks bound for disk (I/O devices). It speeds up access to data on disk, reduces I/O, and increases the speed of data exchange between memory and the hard disk (or other I/O devices). Roughly speaking, Buffer holds data about to be written to disk, while Cache holds data that has been read from disk.
7. TCP three-way handshake
(1) The requester sends a SYN packet (seq=x) and waits for the responder's confirmation.
(2) The responder receives the SYN and returns its own SYN (seq=y) together with an ACK (ack=x+1) to the requester.
(3) The requester receives the responder's SYN+ACK packet and sends a final ACK (ack=y+1) back to the responder.
The requester and responder have now established the TCP connection; the three-way handshake is complete and data transmission begins.
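The handshake itself is performed by the kernel inside connect()/accept(), which this small Python sketch makes visible: by the time connect() returns, the three packets above have been exchanged over loopback.

```python
import socket

# Listener side: bind to loopback on a free port chosen by the OS.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

# Requester side: connect() triggers SYN -> SYN+ACK -> ACK in the kernel.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
conn, addr = server.accept()

established = True                     # connect() returned: handshake done
client.close(); conn.close(); server.close()
print(established)
```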
8. Linux system directory structure
The Linux file system uses a tree directory structure: there is a single root directory (represented by "/") containing subdirectories and files, and each subdirectory can in turn contain lower-level subdirectories or files.
/: the root of the entire file system hierarchy, the entry point and top-level directory.
/boot: files required by the Linux kernel and system bootstrap, such as the kernel and initrd; the GRUB boot manager is also in this directory.
/bin: basic system commands, similar to /usr/bin. Files in this directory are executable, and ordinary users can run them.
/sbin: basic system maintenance commands, usable only by the superuser.
/etc: all system configuration files.
/dev: device files, such as terminals, disks, and optical drives.
/var: frequently changing data, such as logs and mail.
/home: the default home directory location for ordinary users.
/opt: third-party software, such as user-defined or locally compiled packages, is installed here.
/lib: library files and kernel modules, including all shared libraries needed by system programs.
9. Hard links and soft links
Hard link (Hard Link): a hard link shares the same index node (inode number), i.e. multiple file names can point to the same inode. Hard links cannot link directories or cross partitions. Deleting one hard link does not affect the inode's source file or the other hard links to it.
ln source new-link
Soft link (symbolic link, Symbolic Link): a symbolic link stores a path, similar to a Windows shortcut. Symbolic links let you create multiple file names linked to the same source file; if the source file is deleted, all soft links to it stop working. Soft links support directories, cross-partition, and cross-file-system links.
ln -s source new-link
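The difference can be verified in a few shell commands: the hard link shares the source's inode and survives deletion of the source, while the soft link is left dangling.

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"

echo "hello" > source.txt
ln source.txt hard.txt        # hard link: same inode as source.txt
ln -s source.txt soft.txt     # soft link: new inode that stores the path

# Same inode number for source and hard link (first column of ls -i).
inode_src=$(ls -i source.txt | awk '{print $1}')
inode_hard=$(ls -i hard.txt | awk '{print $1}')

# Deleting the source breaks the soft link but not the hard link.
rm source.txt
hard_ok=$(cat hard.txt)               # still "hello"
soft_target=$(readlink soft.txt)      # still "source.txt", now dangling

echo "$inode_src $inode_hard $hard_ok $soft_target"
```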
10. RAID technology
RAID (Redundant Array of Independent Disks): a redundant array of inexpensive (independent) disks.
RAID combines multiple independent physical hard disks in different ways into a hard disk group (a logical disk), providing higher storage performance than a single disk plus data redundancy. By combining disks into one logical volume, RAID provides disk spanning; by splitting data into blocks and writing/reading multiple disks in parallel, it improves disk access speed; and through mirroring or parity operations it provides fault tolerance. Which of these functions are available depends on the RAID level.
To the user, a RAID group looks like a single hard disk, which can be partitioned, formatted, and so on. RAID is much faster than a single disk and provides automatic data redundancy and good fault tolerance.
RAID levels. Different disk combinations form different RAID levels:
RAID 0: known as striping, all disks are read and written in parallel. It is the simplest form of disk array, requires only 2 or more disks, has low cost, and delivers the performance and capacity of the whole set. But RAID 0 provides no redundancy or error recovery, so a single disk failure loses all data. (RAID 0 simply raises capacity and performance without any reliability guarantee, and suits environments where data security is not critical.)
RAID 1: mirrored storage. Data redundancy is achieved by mirroring data from one disk onto another, so two disks hold backup copies but the usable capacity equals one disk. When data is written to one disk, a mirror copy is written to the other, maximizing reliability and repairability without affecting performance; when the original is busy, data can be read directly from the mirror (from the faster of the two disks), improving read performance, though writes are comparatively slow. RAID 1 generally supports hot swapping: a failed disk can be removed or replaced while the system runs, without interruption. RAID 1 has the highest per-disk cost of any array level, but it provides high data security, reliability, and availability; when a disk fails, the system automatically switches to the mirror with no need to rebuild the failed data.
RAID 0+1: also known as RAID 10, actually a combination of RAID 0 and RAID 1: it mirrors each disk for redundancy while striping data in bits or bytes and reading/writing multiple disks in parallel. Each disk has its own physical mirror, providing redundancy; one disk can fail without affecting data availability, and read/write is fast. RAID 0+1 requires at least 4 disks to build a striped set over the mirrors. It ensures both high data reliability and high read/write efficiency.
RAID 5: a solution that balances storage performance, data security, and storage cost. RAID 5 can be seen as a compromise between RAID 0 and RAID 1 and requires at least 3 disks. It provides data security, though weaker than mirroring, with higher disk space utilization than mirroring. Read speed is close to RAID 0; writes are slightly slower than a single disk because parity information must also be written. Because many data blocks share one parity block, RAID 5 has higher space utilization and lower storage cost than RAID 1, making it a widely used solution today.