What is the role of Linux operation and maintenance?

2025-03-01 Update From: SLTechnology News&Howtos

This article explains the role of Linux operation and maintenance: the main work, how that work is divided, the tools and skills involved, and how the field has developed.

First of all, congratulations on choosing to study Linux. You may be about to start a career working with Linux. Before you set off, here is an overview of Linux and of Linux operation and maintenance.

Because it is efficient, easy to tailor and widely applicable, Linux has become the main operating system for high-end servers, a position that is hard to replace. Linux can be installed on many kinds of hardware, such as mobile phones, tablets, routers, video game consoles, desktops, mainframes and supercomputers. With the rapid growth of Linux in the Chinese market, the domestic shortage of Linux talent is becoming more and more prominent, and Linux roles have become some of the most sought-after in recruitment.

Linux is a very broad subject, and it is impossible to learn all of it at once. Ideally, understanding Linux would let you do every kind of work, but in practice it is better to decide what kind of work you want to do and then learn the part of Linux that it requires.

Based on personal experience, here are the common areas of Linux and the jobs each one leads to.

1) Linux applications. Strictly speaking, this part is not Linux itself but the applications that run on Linux, such as web, network and IT services. Occupations include system research and development, back-end development, server performance optimization, operation and maintenance, and so on.

2) Linux customization. This part mostly involves the user-space packages of Linux distributions, with some kernel work, and mainly covers customization and services for commercial Linux distributions such as Red Hat. Many of these companies are foreign; domestically they mostly recruit for on-site support and similar roles.

3) Linux kernel development. This part is mainly the development of Linux kernel drivers, and it is almost entirely programming. Employers are mainly chip companies, such as Intel and Marvell, and product development companies that use those chips, such as ZTE and Huawei.

4) Android derivatives. Android, like the gradually spreading Tizen, uses the Linux kernel, so the reasoning is the same as in 3). Mobile chip companies such as Qualcomm and TI, and mobile device development companies, are therefore also employers of Linux developers.

I. The main work of Linux operation and maintenance

Among these jobs, Linux operation and maintenance has the largest number of practitioners and the highest salaries, so this article focuses on it. The article was written jointly by enthusiasts and Ma GE Education, an institution specializing in study and career development for Linux operation and maintenance.

Linux operation and maintenance at an Internet company is centered on service, with stability, security and efficiency as its three basic goals, and ensures that the company's Internet business can provide high-quality service to users 7 × 24. Its responsibility covers the whole product life cycle: design, release, operation and maintenance, change and upgrade, and decommissioning.

These responsibilities are important and extensive throughout the product life cycle, but an operation and maintenance engineer's job does not end there. Engineers also need to summarize the problems encountered at work, extract the relevant technical directions, and develop tools and platforms that support and optimize business development and improve operation and maintenance efficiency. The related technical work mainly includes:

Service monitoring: research, development and application of monitoring platforms, and ensuring that service monitoring is accurate, timely and comprehensive.

Service fault management: design and automatic execution of fault-handling plans, fault summaries, and feedback to the product / system design level to improve product stability.

Service capacity management: measuring service capacity and planning data center construction, capacity expansion, migration, etc.

Service performance optimization: improving service performance and response speed, and with them user experience, from every direction, including network, operating system, application and client optimization.

Service global traffic scheduling: allocating incoming service traffic among data centers according to capacity and service status.

Service task scheduling: triggering and status monitoring of the service's scheduled and unscheduled tasks.

Service security: service access security, attack protection, access control, etc.

Data transmission technology: development and application of P2P and other transmission technologies, and solving problems such as long-distance transfer of large data volumes.

Automatic release and deployment of services: research and development of deployment platforms / tools, and using them to release services safely and efficiently.

Service cluster management: server management for services, large-scale cluster management, etc.

Service cost optimization: reducing the resources a service uses as far as possible, and so reducing its operating cost.

Database management (DBA): designing, developing and managing high-performance database clusters so that database services are more stable, more efficient and easier to manage.

Platform development: development and management of Docker-like platforms, and service onboarding technology.

Distributed storage: development, optimization and onboarding of distributed storage platforms.

In short, all work related to service quality, efficiency, cost and security, together with the technologies, components, tools and platforms involved, falls within the technical scope of operation and maintenance. Doing well in each technical direction and building the corresponding components, tools and platforms helps fulfil the responsibilities of operation and maintenance and has a key impact on the development of the business.
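As an illustration of the global traffic scheduling item in the list above, here is a minimal sketch of splitting traffic among data centers in proportion to capacity while skipping unhealthy ones. The function name, the data layout and the proportional rule are assumptions made for this example; the article does not describe a specific algorithm.

```python
def allocate_traffic(total_qps, data_centers):
    """Split total_qps across healthy data centers in proportion to capacity.

    data_centers maps a name to (capacity_qps, is_healthy).
    This layout is hypothetical and chosen only for the sketch.
    """
    # Keep only data centers that are healthy and have capacity to offer.
    healthy = {
        name: capacity
        for name, (capacity, is_healthy) in data_centers.items()
        if is_healthy and capacity > 0
    }
    total_capacity = sum(healthy.values())
    if total_capacity == 0:
        raise RuntimeError("no healthy capacity available")
    if total_qps > total_capacity:
        raise RuntimeError("demand exceeds healthy capacity")
    # Each data center receives a share equal to its share of capacity.
    return {
        name: total_qps * capacity / total_capacity
        for name, capacity in healthy.items()
    }


if __name__ == "__main__":
    centers = {"dc-a": (100, True), "dc-b": (300, True), "dc-c": (200, False)}
    print(allocate_traffic(200, centers))
```

A real scheduler would also weigh latency, cost and gradual ramp-up; the proportional split is only the simplest rule that respects both capacity and service status.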

II. Classification of Linux operation and maintenance work

Operation and maintenance work covers many directions. As the business grows, the more mature an Internet company is, the more finely its operation and maintenance positions are divided. Many of today's large Internet companies had only system operation and maintenance in their start-up period, and the work was gradually subdivided as scale and service quality requirements grew. In general, the work and responsibilities of the operation and maintenance team are classified as follows (see figure 1-1).

Figure 1-1 work classification of the operation and maintenance team

2.1 Application operation and maintenance (SRE): responsible for online service changes, service status monitoring, service disaster recovery and data backup, as well as routine troubleshooting and emergency handling of services. Responsibilities: design review, service management, resource management, routine inspection, contingency plan management, data backup.

2.2 System operation and maintenance (SYS): responsible for building the IDC, network, CDN and basic services (LVS, NTP, DNS), and for asset management and server selection, delivery and maintenance. Responsibilities: IDC data center construction, network construction, LVS load balancing and SNAT construction, CDN planning and construction, server selection, delivery and maintenance, kernel selection and OS-related maintenance, asset management, basic service construction.

2.3 Database operation and maintenance (DBA): responsible for data storage scheme design, database table design, index design and SQL optimization, and for database changes, monitoring, backup, high availability design, etc. Responsibilities: design review, capacity planning, data backup and disaster recovery, database monitoring, database security, database high availability and performance optimization, automation system construction. Operation and maintenance research and development, which section IV counts as a category of its own, covers the operation and maintenance platform, the monitoring system and the automatic deployment system.

2.4 Operation and maintenance security (SEC): responsible for network, system and business security hardening, routine security scanning, penetration testing, research and development of security tools and systems, and emergency handling of security incidents. Responsibilities: establishing the security system, security training, risk assessment, security construction, security compliance, emergency response.

III. Software and skills used daily by Linux operation and maintenance engineers

The operation and maintenance platforms and tools used by operation and maintenance engineers include:

Web servers: apache, tomcat, nginx, lighttpd

Monitoring: nagios, ganglia, cacti, zabbix

Automatic deployment: ansible, sshpt, salt

Configuration management: puppet, cfengine

Load balancing: lvs, haproxy, nginx

Transmission tools: scribe, flume

Backup tools: rsync, wget

Database: mysql, oracle, sqlserver

Distributed platforms: hdfs, mapreduce, spark, storm, hive

Distributed databases: hbase, cassandra, redis, MongoDB

Containers: lxc, docker

Virtualization: openstack, xen, kvm

Security: kerberos, selinux, acl, iptables

Troubleshooting: netstat, top, tcpdump, last
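Beyond ready-made tools, engineers often write small checks of their own. The following is a minimal sketch of the kind of host-level check that monitoring tools automate, using only the Python standard library. The function names, metric names and threshold values are assumptions for illustration and are not taken from any of the tools listed above.

```python
import os
import shutil


def collect_snapshot(path="/"):
    """Read disk usage for path and the 1-minute load average (Unix only)."""
    usage = shutil.disk_usage(path)
    load_1min, _, _ = os.getloadavg()
    return {
        "disk_used_ratio": usage.used / usage.total,
        # Normalize load by CPU count so the limit is comparable across hosts.
        "load_per_cpu": load_1min / (os.cpu_count() or 1),
    }


def find_alerts(snapshot, limits):
    """Return one message per metric whose value is above its limit."""
    return [
        f"{metric}={snapshot[metric]:.2f} exceeds {limit:.2f}"
        for metric, limit in sorted(limits.items())
        if snapshot[metric] > limit
    ]


if __name__ == "__main__":
    # Hypothetical thresholds; real values depend on the service.
    limits = {"disk_used_ratio": 0.90, "load_per_cpu": 2.00}
    for message in find_alerts(collect_snapshot(), limits):
        print("ALERT:", message)
```

Keeping collection separate from evaluation means the threshold logic can be tested without touching a real host.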

Operation and maintenance is grounded in technology: it uses technology to ensure that products deliver higher-quality service. Its responsibilities and its position in the business mean that operation and maintenance engineers need broad knowledge and deep technical ability:

Solid basic computer knowledge, including computer system architecture, operating system, network technology, etc.

Knowledge of common applications: an understanding of operating systems, networks, security, storage, CDN, databases, etc., and of the principles behind them.

Programming ability: everything from writing operation and maintenance tools to building large-scale operation and maintenance systems / platforms requires good programming skills.

Data analysis ability: the ability to organize and analyze system data, identify problems and find solutions.

Rich system knowledge, including system tools, typical system architecture, common platform selection, etc.

Ability to make comprehensive use of tools and platforms

IV. The development process of Linux operation and maintenance work

In the early days, with few staff, the operation and maintenance team mainly handles data center construction, basic network construction, server procurement, and server installation and delivery. It is rarely involved in changing, monitoring or managing online services. At this stage the team acts mostly as an infrastructure provider, supplying a simple, usable network and system environment.

As business products mature, the requirements for quality of service rise. The operation and maintenance team then also takes on some server monitoring, and becomes responsible for load-balancing work at layer 4 and layer 7 (LVS, Nginx and the like) that has nothing to do with business logic. At this stage, service changes are mostly manual or rely on simple batch scripts. Monitoring focuses on server status and resource usage, with little monitoring of the state of service applications, and mostly uses open source systems such as Nagios and Cacti.

As business scale and complexity keep increasing, the operation and maintenance team gradually splits into two parts: application operation and maintenance, and system operation and maintenance. Application operation and maintenance begins to take over online business and gradually takes on sorting out service monitoring, data backup and service changes. As their understanding of the services deepens, application operation and maintenance engineers become able to make some simple optimizations. At the same time, to cope with the large number of service changes every day, they begin to write all kinds of operation and maintenance tools that make batch changes easy for specific services. As the business grows, infrastructure failures caused by insufficient capacity planning or weak risk resistance become more frequent, forcing engineers to devote more energy to multi-data-center disaster recovery and contingency plan management.

Once the business reaches a certain scale, open source monitoring systems can no longer meet its needs in performance or functionality. With a large number of service changes and complex service relationships, manual record-keeping and tool-based changes can no longer meet its needs in efficiency or accuracy. On the security side, incidents large and small force the team to invest more energy in security defense. Gradually the operation and maintenance team forms the five major job categories mentioned earlier, each of which requires specialized personnel. At this point, system operation and maintenance focuses on building and maintaining infrastructure, providing a stable and efficient network environment, and delivering servers and other resources to application operation and maintenance engineers. Application operation and maintenance focuses on the running status and efficiency of services. Database operation and maintenance is a refinement of application operation and maintenance that focuses on automation, performance optimization and security defense in the database field. Operation and maintenance R & D and operation and maintenance security provide platforms and tools that further improve the efficiency of operation and maintenance engineers and make business services run more stably, efficiently and securely.

We divide the development process of operation and maintenance into four stages, as shown in figure 1-2.

Figure 1-2 the development process of operation and maintenance

Manual management stage: business traffic is low, servers are relatively few and the system is not very complex. For day-to-day operations, engineers mostly log on to servers one by one and work by hand, each in their own way. Everyone has their own mode of operation, and the necessary operating standards and process mechanisms are missing; for example, the business directory layout varies from server to server.

Tool batch operation stage: as the number of servers and system complexity grow, fully manual operation can no longer keep up with the rapid development of the business. Engineers gradually begin to use batch operation tools, and different scripts appear for different types of operation. But each team has its own tools, which must be adjusted every time the operational requirements change. The main cause is the lack of standards for environments and operations, which makes operations hard to automate programmatically. Efficiency improves somewhat at this stage but soon hits a bottleneck. Operation quality does not improve much, and batch execution may even cause larger-scale problems. Teams begin to establish many process rules, such as review mechanisms, observing the first server for 10 minutes after it goes online before continuing, and observing for at least 20 minutes after an upgrade is completed. These rules rely mainly on people to supervise and carry them out, but in practice they are often not followed properly and instead reduce work efficiency.
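A batch operation tool of the kind described in this stage can be sketched roughly as follows. This is an illustrative example only: `run_batch` and the `action` callback are hypothetical names, and the actual per-host operation (for instance a command run over SSH) is left to the caller rather than assumed here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(hosts, action, max_workers=4):
    """Apply action(host) to every host in parallel.

    Returns (succeeded, failed): a list of hosts that completed and a dict
    mapping each failed host to its error message.
    """
    def attempt(host):
        try:
            action(host)
            return host, None
        except Exception as exc:  # record the failure instead of aborting
            return host, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input hosts.
        results = list(pool.map(attempt, hosts))

    succeeded = [host for host, error in results if error is None]
    failed = {host: error for host, error in results if error is not None}
    return succeeded, failed


if __name__ == "__main__":
    def restart(host):
        # Stand-in for a real remote operation.
        if host == "web2":
            raise RuntimeError("disk full")

    print(run_batch(["web1", "web2", "web3"], restart))
```

Note that nothing in this sketch stops a bad change from reaching every host at once, which is exactly the larger-scale risk the paragraph above describes.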

Platform management stage: at this stage there are higher requirements for operation and maintenance efficiency and for a lower misoperation rate, so the team starts building an operation and maintenance platform that embodies the standards and processes, freeing up manpower and improving quality. Service change actions are abstracted into unified standards covering operation methods, service directory layout, service running mode and so on. For example, every program's start-stop interface must include start, stop and reload. The platform also constrains the operation process, such as the rule above of observing the first server for 10 minutes after it goes online: a pause checkpoint is enforced in the platform, and after the first server is done, the engineer must fill in the corresponding check items before subsequent deployment actions can continue.
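The two ideas in this stage, a uniform control interface and an enforced pause checkpoint, might look like the sketch below. The class and function names are hypothetical, and `checkpoint_ok` stands in for whatever check items a real platform would require the engineer to fill in.

```python
from abc import ABC, abstractmethod


class ManagedService(ABC):
    """Control interface that every service must expose to the platform."""

    @abstractmethod
    def start(self):
        """Start the service."""

    @abstractmethod
    def stop(self):
        """Stop the service."""

    @abstractmethod
    def reload(self):
        """Reload configuration without a full restart."""


def staged_deploy(servers, deploy, checkpoint_ok):
    """Deploy to the first server, then continue only if the checkpoint passes.

    Returns the list of servers that were actually deployed to.
    """
    if not servers:
        return []
    first, rest = servers[0], servers[1:]
    deploy(first)
    deployed = [first]
    # Enforced pause: the remaining servers stay untouched until the check
    # on the first server has been confirmed.
    if not checkpoint_ok(first):
        return deployed
    for server in rest:
        deploy(server)
        deployed.append(server)
    return deployed
```

Because the checkpoint lives in the platform rather than in a written rule, it cannot be skipped by an operator in a hurry, which is the point of moving from process documents to platform constraints.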

System self-scheduling stage: with more services, more complex service relationships and many operation and maintenance platforms, the earlier approach of turning batch operations into platform operations is no longer adequate, and service changes need to be abstracted to a higher level. Each server is abstracted as a container, the scheduling system deploys each service to a suitable server according to resource usage, and the integration with surrounding operation and maintenance systems, such as monitoring, logging and backup, is completed automatically. The self-scheduling system can scale capacity dynamically according to how the service is running and handle common service failures automatically. The work of operation and maintenance staff also moves forward into the product design stage, where they help R & D staff adapt services so that they can be connected to the self-scheduling system.
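To make "scheduled according to resource usage" concrete, here is a deliberately simple placement sketch. It assumes a single resource (CPU cores) and a greedy rule that places the largest service first on the server with the most free capacity. Both the rule and the names are assumptions for illustration; production schedulers consider many more dimensions.

```python
def schedule(services, servers):
    """Assign each service to a server with enough free CPU.

    services maps service name -> CPU cores required.
    servers maps server name -> CPU cores currently free.
    Returns a dict of service name -> server name.
    """
    if not servers:
        raise RuntimeError("no servers available")
    free = dict(servers)  # work on a copy so the caller's data is unchanged
    placement = {}
    # Largest services first, names as a tie-breaker for determinism.
    for name, need in sorted(services.items(), key=lambda kv: (-kv[1], kv[0])):
        target = max(free, key=lambda server: (free[server], server))
        if free[target] < need:
            raise RuntimeError(f"no server can host {name}")
        free[target] -= need
        placement[name] = target
    return placement


if __name__ == "__main__":
    print(schedule({"api": 4, "cache": 2, "cron": 1}, {"n1": 5, "n2": 4}))
```

Dynamic scaling and automatic failure handling, as described above, would be built on top of a placement step like this by re-running it when load or server health changes.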

Throughout this development, the goal is for all work to be automated, to reduce repetitive manual work and the cost of knowledge transfer, to make operation and maintenance delivery more efficient and safer, and to make products run more stably. For fault handling, the goal is to move from after-the-fact handling to early detection, and from manual handling to automatic disaster recovery.

V. Cutting-edge skills that Linux operation and maintenance engineers must master

All of this is only the tip of the iceberg of the profound changes taking place in the technology world. So the question is: how should a traditional operation and maintenance engineer adapt?

Here is some advice: learn these four areas:

Automated operation and maintenance (Ansible, Puppet, SaltStack, etc.)

DevOps (Docker, K8s, Jenkins, Jira, etc.)

Cloud service technology (virtualization, OpenStack, the product and service architectures of AWS and Aliyun, etc.)

Python

