How fast can applications move to the cloud?


2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

1 Summary

This article explains why a good public or private cloud needs an orchestration system to support cloud automation, and what makes such a system hard to build. It also presents a prototype orchestration system, covering the theoretical analysis and an agent plug-in framework, and offers some suggestions on detailed control. I hope it helps you better understand the concepts of "resource orchestration" and "application orchestration". I also hope to work with you, with an open mind, to make the cloud as natural and ubiquitous as water and electricity.

2 Why do you need cloud automation

Automation needs little introduction in IT; every programmer knows it is a must. Automated scripts, automated testing, automated deployment and so on all exist so that programs, and the programmers around them, can run more smoothly. So do we still need automation on the cloud? Put simply, first-time users need not think about it, but heavy users do need cloud automation. The need shows up in the following ways:

2.1 Repetitive actions

When verifying an application launch on the cloud, many things have to be repeated: destroying and rebuilding the environment, for example, or configuring several new instances during capacity expansion. Once such operations become frequent, say once or several times a day, they become tedious, and you start trying to automate the whole process so that every execution is repeatable. You might write Shell or Python scripts, call the cloud provider's API directly, or use tools such as Chef or Puppet.

Repetition is the first condition for automation.

2.2 Time saving

Some operations on cloud services are time-consuming: creating a database or a VM can take minutes. When several such tasks have to be created in series, the wait adds up. If the whole process is automated, nobody has to sit and wait, and the programmer is free for more valuable work.

Automating a process on the cloud does not reduce the total time the actions take, but it lets you move the waiting elsewhere, for example to late at night. For the same reason (the time is not reduced, only moved), the time saved by automation depends on repeatability. For a one-time operation, the "time saved by automation" generally does not justify the "time to complete automation".

2.3 Replication of the underlying environment

The underlying environment here means infrastructure: the collection of all cloud services an application needs to run on the cloud, for example the typical 3-tier structure of a website (front end + back end + database). Once a complete system has been built in one cloud region (such as North China), the need to replicate it arises when the same environment must be rebuilt in South China, or even on another cloud provider. Should programmers install the components one by one by hand, or should deployment be automated and repeatable? When the latter is available, it is of course the first choice.

Many cloud vendors now promote a concept called Infrastructure as Code, which replaces manual, interactive configuration with machine-readable configuration files. These files can be versioned like code in a version control system. This brings an enterprise three main benefits: lower cost, higher efficiency and lower risk.

Lower cost is easy to understand: as mentioned above, automation frees people for other tasks and raises programmer output. Higher efficiency comes mainly from shortening environment installation through automated configuration, especially when multiple components or teams are involved. Lower risk comes from eliminating human error, and repeatable execution also makes the process more reliable.

2.4 Self-service

A cloud service done well should be self-service and paid for on demand, like tap water and electricity. Only then can automated on-demand supply and on-demand expansion be supported, which is the point of the cloud in the first place.

This reason is really a requirement on cloud providers: the platform should support self-service, on-demand use of its cloud services, and provide the corresponding usage metering (billing) and usage reports. Only when the platform's back-end processes are fully automated can the user experience be completely self-service. It is like a Taobao merchant's promise that whatever is listed can be ordered right away; otherwise you would have to talk to the store before placing an order, and on-demand use would be impossible.

2.5 Summary: the cost of automation

Anything worth automating must first be something you repeat; the need for automation only arises when its benefits outweigh the cost of repeating the work by hand. If the task is a one-off, there is no need to automate it. On the contrary, careful manual operation is then more cost-effective than automating the process.

For example, when you are thirsty on the road you do not install a water supply; you buy a bottle of mineral water. At home, though, you need tap water installed, because you use water every day.

Automation on the cloud provides reliability: cloud resources and services are created the same way every time, execution is repeatable for any user or organization, and problems caused by human error are eliminated. For heavy cloud users it is a necessity.
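The cost argument of this section can be made concrete with a small break-even calculation. The sketch below is only an illustration: the function name, its parameters and the sample numbers are assumptions, and "minutes" means the human time spent, since automation moves the waiting rather than shortening the run itself.

```python
import math


def break_even_runs(build_minutes, manual_minutes, automated_minutes):
    """Return how many runs it takes before automation pays for itself.

    build_minutes:     one-off human time spent writing the automation
    manual_minutes:    human time per run when done by hand
    automated_minutes: human time per run once automated (e.g. a quick check)
    """
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        return math.inf  # automation never pays off
    return math.ceil(build_minutes / saved_per_run)


# Illustrative numbers: 8 hours to automate, 30 minutes by hand, 2 minutes attended.
runs = break_even_runs(480, 30, 2)
print(runs)  # 18
```

With these made-up numbers the automation only pays off from the 18th run onward, which is why a one-off task is usually better done carefully by hand.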

3 Evolution of automation on the cloud

3.1 Difficulties faced by automation

(1) Cloud services are so varied that comprehensive automation is hard. A typical cloud platform offers a great many services: ECS (virtual machines), EVS (disks), VPC (networks), RDS (databases), ELB (load balancing) and so on. There is even a new term, "AWS fatigue": AWS launches so many new services and features every year that users grow tired of keeping up with them.

(2) Creating cloud services involves complex dependencies. Typically, an EIP has to be bound to a VM, a VM needs a Subnet to be created first, and a Subnet in turn requires a VPC. Layered and crossing dependencies obstruct developers who try to automate and make automation much more expensive. By the earlier argument, once the cost outweighs the benefit of repetition, automation is abandoned.

(3) Resources on the cloud are used differently from traditional ones. The user is no longer the full owner of a resource, only its user. With reduced back-end permissions you cannot control everything, which makes it harder to locate the cause of a failed resource initialization (perhaps a bug in the cloud platform itself); sometimes you have to ask the cloud provider for help. The workflow also changes slightly: where you used to copy your package straight to the verification environment, on the cloud you may need a jump host in between. All of this makes automation harder to implement.

3.2 Attempts to automate

Here a single diagram summarizes the attempts at cloud automation and gives a more intuitive view of how this field has developed. The boundary between automated resource provisioning and resource orchestration is not sharp, however; the main difference lies in flexible syntax. Gradually adding flexible syntax to an existing automation template is basically enough to achieve flexible orchestration.

4 The ultimate automation system: orchestration

Automation means that a task process needs no human intervention. Orchestration means that multiple task processes can be planned in advance and can cooperate with one another, running in parallel or in series. By this most direct definition, only arbitrary automatic process control can be called orchestration; it is an upgraded form of automation. It follows that a cloud vendor's orchestration system that cannot handle even basic automation processes is not a good orchestration system.

4.1 The orchestration benchmark on the cloud

Any discussion of cloud orchestration has to mention AWS CloudFormation (CFN), which is essentially a standard part of the AWS ecosystem: it underpins the application marketplace and the service catalog, and can initialize any IT software or IT infrastructure.

Its main principle is that the user supplies the properties of the object to be created, and CFN then creates it. Initializing EC2, for example, amounts to creating a VM, so the user has to provide properties such as the hostname, the image, the disk size, the network, the host specification and the security groups. With these properties, CFN can work out how to call the EC2 API to create the VM.

What makes it so powerful is that it not only controls execution order but also provides many built-in functions at the syntax level, through which users can reference variables, concatenate values and control execution details. Together with its very rich set of orchestration objects, CFN can automate the creation of essentially any AWS resource.
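To make the idea of "properties plus built-in functions" more tangible, here is a minimal sketch of a CloudFormation-style template, built as a Python dictionary and printed as JSON. The resource name WebServer, the two parameters and the instance type are illustrative choices, not taken from the article; Ref, Fn::Join and Fn::GetAtt are the built-in functions used to reference variables, concatenate values and read attributes.

```python
import json

# A small CloudFormation-style template expressed as a Python dict.
# The logical names (WebServer, ImageId, SubnetId, Url) are illustrative.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "ImageId": {"Type": "String"},   # which image to boot
        "SubnetId": {"Type": "String"},  # which network to attach to
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},     # reference a parameter
                "InstanceType": "t3.micro",        # host specification
                "SubnetId": {"Ref": "SubnetId"},
            },
        },
    },
    "Outputs": {
        "Url": {
            # Concatenate a fixed prefix with an attribute of the created VM.
            "Value": {
                "Fn::Join": ["", ["http://", {"Fn::GetAtt": ["WebServer", "PublicIp"]}]]
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

The user only declares what the VM should look like; working out which API calls to make, and in which order, is left to the orchestration system.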

4.2 Comparison of cloud orchestration systems

Here we compare the orchestration capabilities of three typical cloud vendors (Amazon CFN, Alibaba ROS, Huawei AOS). Please contact us to correct any inaccuracies.

√ means "strong / well done", O means "general / to be enhanced", and X means "without this feature".

| Function | Characteristics | AWS / ROS / AOS | Description |
|---|---|---|---|
| Template syntax | Input parameter / object / output | √ | Basic functions of orchestration |
| | Look-up table parameters | √ | Mapping-table syntax that fixes variable values in advance |
| | Conditional deployment | √ | Condition syntax that flexibly controls whether an object is created |
| | Orchestration object | √ | Types of cloud services |
| | Built-in function | √ | String concatenation: Fn::Join; get attribute: Fn::GetAtt |
| | Built-in variable | √ | In AWS: AWS::Region; in ROS: ALIYUN::StackName |
| | Resource startup sequence | √ | Such as DependsOn dependencies |
| | Header file reference | √ | Long template files are split into multiple template files for management |
| Stack execution | Resource strategy | √ | For example, whether some stack resources are retained when the stack is destroyed |
| | Metadata definition | √ | Add custom extended attributes to the object |
| | Stack nesting | √ | A stack contains another stack; required in large collaboration scenarios (such as solutions) |
| | Help tool | √ | Such as cfn-init/cfn-hup, auxiliary tools for deploying applications on VMs |
| | Stack update | √ | ChangeSet, which gives detailed hints about changes |
| | K8S application | √ | Orchestration of applications in the Kubernetes ecosystem |
| Designer | Element drag and drop | √ | |
| | Dependent connection | √ | |
| | Zoom positioning | √ | |
| | Interactive editing of picture and text | √ | ROS does not support IDE plain text editing |
| | Picture preview | √ | |
| | Single element editing | √ | |
| | Element attribute association | √ | The cursor is automatically associated with the available attribute fields of the element |
| | Attribute structure display | √ | Complex attribute definitions, memory-free editing |
| | Grammar check | √ | |
| | Function fast insertion | | |
| | Element document hint | √ | |

Note: the Heat orchestration capability of OpenStack is similar to that of AWS, but it has no graphical designer, so it is not listed here.

4.3 Shortcomings of the orchestration system

Current orchestration systems need a description file that describes the execution process the user wants. This file is generally called a "template".

This is determined by the complexity of the orchestrated target object: creating an RDS database requires more control parameters than creating a single VM. A new template syntax is therefore tantamount to a new programming language. Anyone who has written code knows that coding quickly requires proper IDE support. For this reason, the more powerful orchestration systems ship a graphical designer, positioned as the IDE for writing templates.

For example, AWS, Alibaba and Huawei all offer an online template-editing IDE. A designer is judged by how convenient it makes writing templates.

5 How to implement a cloud orchestration system

The core of an orchestration system is a workflow engine, which analyzes the dependencies among steps and controls their execution order according to a DAG (directed acyclic graph) model. Orchestration on the cloud, more specifically, means creating each cloud service in dependency order.

At the algorithmic level, we can call each cloud service an element. The process of creating various cloud services is the process of creating elements in sequence.

5.1 Directed acyclic graph (DAG)

A directed acyclic graph (DAG) is a directed graph that, as the name says, contains no cycles. It is often used to represent dependencies between events and to manage the scheduling of tasks.

Figure: an example of a directed acyclic graph

Topological sorting of the nodes is an algorithm frequently applied to directed acyclic graphs, and our system prototype is built on this theoretical basis. It determines, from the DAG dependencies, which elements come first and which come later; the algorithm itself is easy to find online or in textbooks, so it is not covered in detail here. After sorting, the underlying elements are completed first and then the upper ones, until all elements are initialized. This is the theoretical basis of our orchestration system model.
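As a quick illustration of topological sorting, the sketch below orders the cloud resources mentioned earlier (an EIP bound to a VM, a VM in a Subnet, a Subnet in a VPC) using Python's standard graphlib module. This is not the prototype's own code, only a minimal example of the technique; the dictionary layout (element mapped to the elements it depends on) is an assumption made for the example.

```python
from graphlib import CycleError, TopologicalSorter

# element -> set of elements it depends on (its predecessors)
depends_on = {
    "EIP": {"VM"},
    "VM": {"Subnet"},
    "Subnet": {"VPC"},
    "VPC": set(),
}

# static_order() yields every element after all of its dependencies.
creation_order = list(TopologicalSorter(depends_on).static_order())
print(creation_order)  # ['VPC', 'Subnet', 'VM', 'EIP']

# A cycle makes ordering impossible, which is why the graph must be acyclic.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)  # True
```

Destroying a stack can reuse the same order in reverse, so that an element is removed only after everything that depends on it is gone.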

5.2 Orchestration system prototype

Here we assume that there is a system initialization process as follows:

To ensure all elements are created in the required order, we follow two principles: (1) execute in parallel by default; (2) execute an element only once it has no outstanding dependencies. In the concrete algorithm, we first decompose the element startup sequence into a directed graph and traverse it to compute the number of dependencies of each node, as follows:

Note: only directly adjacent nodes count towards a node's dependency number.

Following the two principles above: the dependency number of elements B and D is 0, so these two elements can be initialized first. B and D are also independent of each other and can run in parallel.

After any element finishes, the dependency number of each node that depends on it is reduced by one, giving the updated dependency numbers of all nodes:

The elements that can run this time are C and F, because their dependency number is now 0. After these two elements finish, subtract one from the dependency number of each element that depends on them, and recompute the dependency numbers of all nodes:

Repeat this logic until all elements have been executed, and the entire workflow is complete. This guarantees that the whole process respects the order and finishes in the shortest time. The principle of the workflow shows that orchestration capability is less about process control than about the richness of the orchestration elements and syntax. A good orchestration system lets the driver for a new element be developed quickly, so that orchestration of new services can be offered.
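The counting procedure described above can be sketched in a few lines. The dependency graph below is hypothetical: it is chosen only so that B and D start first and C and F follow, as in the text, and the remaining edges are assumptions. For simplicity this sketch waits for a whole batch to finish before starting the next one, whereas a real engine could start an element as soon as its own counter reaches zero.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical graph: element -> elements it directly depends on.
DEPENDS_ON = {
    "A": ["C"],
    "B": [],
    "C": ["D"],
    "D": [],
    "E": ["C", "F"],
    "F": ["B"],
}


def run_in_waves(depends_on, create):
    """Create elements batch by batch; each batch runs in parallel."""
    # Dependency number of each node (direct dependencies only).
    remaining = {name: len(deps) for name, deps in depends_on.items()}
    # Reverse edges: who is waiting for me?
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    waves = []
    ready = sorted(n for n, count in remaining.items() if count == 0)
    with ThreadPoolExecutor() as pool:
        while ready:
            list(pool.map(create, ready))  # run the batch, wait for all of it
            waves.append(ready)
            next_ready = []
            for done in ready:
                for child in dependents[done]:
                    remaining[child] -= 1  # one dependency satisfied
                    if remaining[child] == 0:
                        next_ready.append(child)
            ready = sorted(next_ready)
    if sum(len(wave) for wave in waves) != len(depends_on):
        raise ValueError("dependency cycle detected")
    return waves


created = []
waves = run_in_waves(DEPENDS_ON, created.append)
print(waves)  # [['B', 'D'], ['C', 'F'], ['A', 'E']]
```

Here `created.append` stands in for the real call that creates a cloud service; any callable taking the element name would do.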

5.3 Information transfer between elements

If every element had to record information about other elements when it is initialized, the implementation would be tightly coupled. To keep each element independent at execution time (when the current element is initialized, it need not know anything about other elements), the main framework maintains global information and, when initializing an element, passes it just the information it needs. The element has no idea what other elements exist, yet it has all the information it requires.

For example, the orchestration framework maintains global information recording which parameters each element needs for initialization. In the figure, the green parameters must be provided by the user, while the red ones are obtained automatically once the object depended on has been created. For example, creating a VM requires the ID of a VPC, which is known once the VPC has been created, so the user does not need to provide this field.

So once element D has been initialized, element C can start initializing. At this point all parameters for creating C have confirmed values, and no information is missing when the initialization API of service C is invoked. The create and destroy APIs for C can therefore be implemented independently, dealing only with service C itself.

As the figure shows, developing a new service only requires understanding that service; all the information it needs (whether provided directly by the user or obtained through dependencies) is managed and passed along by the framework.

This is our plug-in framework. It makes adding a service very easy, because each service driver is developed completely independently.
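A minimal sketch of this information passing, under stated assumptions: the two driver functions, their parameter names and the fake IDs they return are all hypothetical, and the elements are assumed to be listed already in dependency order. The point is only that each driver sees its own inputs while the framework keeps the global information.

```python
# Hypothetical drivers: each one only knows its own inputs and outputs.
def create_vpc(cidr):
    return {"vpc_id": f"vpc-for-{cidr}"}


def create_vm(vpc_id, flavor):
    return {"vm_id": f"vm-{flavor}-in-{vpc_id}"}


PLUGINS = [
    # (element name, create function, names of the inputs it needs)
    ("VPC", create_vpc, ["cidr"]),
    ("VM", create_vm, ["vpc_id", "flavor"]),
]


def run(plugins, user_inputs):
    """Create elements in order, handing each one just what it needs."""
    context = dict(user_inputs)  # global information kept by the framework
    for name, create, needs in plugins:
        missing = [key for key in needs if key not in context]
        if missing:
            raise KeyError(f"{name} is missing inputs: {missing}")
        outputs = create(**{key: context[key] for key in needs})
        context.update(outputs)  # outputs become inputs for later elements
    return context


result = run(PLUGINS, {"cidr": "10.0.0.0/16", "flavor": "small"})
print(result["vm_id"])  # vm-small-in-vpc-for-10.0.0.0/16
```

The user supplies `cidr` and `flavor`; `vpc_id` is produced by the VPC driver and forwarded to the VM driver by the framework, so neither driver refers to the other.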

5.4 Plug-in design

5.4.1 Life cycle of elements

In the eyes of the orchestration system, each cloud service object is an element. For a new element to be added to orchestration, it must provide basic operations such as create, delete, update and query. The plug-in management framework of the orchestration system invokes the corresponding API of the element according to the action the user performs, such as creation or destruction.

With the element execution framework from the previous section, adding an orchestration object only requires implementing the drivers for that element's behaviors. For example, as long as there is an API to create and destroy a VM, an EC2 service can be added as an orchestration element and used in templates. The orchestration framework treats it as just another element.
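One way such a driver contract could look is sketched below. The class and function names (ElementPlugin, register, dispatch, InMemoryVm) are invented for illustration and only the create and destroy actions are shown; update and query would follow the same pattern. The in-memory VM stands in for real cloud API calls.

```python
from abc import ABC, abstractmethod


class ElementPlugin(ABC):
    """Hypothetical driver interface every orchestration element implements."""

    @abstractmethod
    def create(self, properties):
        """Create the element and return its ID."""

    @abstractmethod
    def delete(self, element_id):
        """Destroy the element with the given ID."""


REGISTRY = {}  # element type name -> plug-in instance


def register(type_name, plugin):
    REGISTRY[type_name] = plugin


def dispatch(type_name, action, argument):
    """Framework side: route a user action to the element's driver."""
    if action not in ("create", "delete"):
        raise ValueError(f"unsupported action: {action}")
    return getattr(REGISTRY[type_name], action)(argument)


class InMemoryVm(ElementPlugin):
    """Stand-in driver that keeps 'VMs' in a dict instead of calling a cloud."""

    def __init__(self):
        self.instances = {}

    def create(self, properties):
        element_id = f"vm-{len(self.instances) + 1}"
        self.instances[element_id] = properties
        return element_id

    def delete(self, element_id):
        del self.instances[element_id]


vm_plugin = InMemoryVm()
register("VM", vm_plugin)
vm_id = dispatch("VM", "create", {"flavor": "small"})
dispatch("VM", "delete", vm_id)
print(vm_id, vm_plugin.instances)  # vm-1 {}
```

The framework never looks inside a driver; registering a new type name is all it takes to make a new service available for orchestration.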

5.4.2 User-defined plug-ins

Because each element in the plug-in framework has an independent driver, and because Kubernetes resources can likewise be extended through custom resource definitions, we can design an element plug-in that lets users define their own K8S orchestration objects. The "information" supplied by the user is passed unchanged to the underlying API, and the underlying system interprets it. The orchestration system is then reduced to process control and a channel for passing information.

5.4.3 Waiting and progress

As mentioned earlier, some cloud service operations are very time-consuming. Without intuitive feedback on overall progress, the user experience is poor, to the point that users may abort the whole execution. Element drivers must therefore report progress and waiting status, so that the orchestration framework is aware of execution progress. The user can then see which element is currently executing and how far it has got, which keeps the overall orchestration process as direct and friendly as possible.
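A driver could expose progress through simple polling, as in the sketch below. The function name, its parameters and the simulated progress values are assumptions for illustration; a real driver would query the cloud service's status API inside `poll`.

```python
import time


def wait_with_progress(poll, report, interval=0.0, timeout=5.0):
    """Poll an element until it reaches 100 percent, reporting each change.

    poll:   callable returning the current progress as a percentage (0..100)
    report: callable invoked with the percentage whenever it changes
    """
    deadline = time.monotonic() + timeout
    last = None
    while True:
        percent = poll()
        if percent != last:
            report(percent)  # let the framework (and the user) see progress
            last = percent
        if percent >= 100:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not finish in time")
        time.sleep(interval)


# Simulated slow element: progress values returned by successive polls.
steps = iter([0, 40, 40, 80, 100])
seen = []
wait_with_progress(lambda: next(steps), seen.append)
print(seen)  # [0, 40, 80, 100]
```

Reporting only on change keeps the feedback readable, and the timeout ensures that a stuck element surfaces as an error instead of an endless wait.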

5.5 TOSCA model

With the orchestration framework and the plug-in framework in place, what remains is the syntax of the configuration file. The main syntaxes available for reference today are AWS CloudFormation and TOSCA. AWS CFN is centered on resource initialization, whereas TOSCA is described as "a specification that aims to standardize how we describe software applications and everything that is required for them to run in the cloud", which shows that TOSCA is more application-oriented. With container technology so popular, more and more applications ship as independent containers and traditional VMs matter less, so we think TOSCA is a good choice for template syntax.

In practice, you will find that template syntax is not the key point of automation. As long as the process can be automated, templates will not differ much, so what matters is the automation capability. It is like choosing between Java and Go: when writing a binary tree traversal, nobody cares whether a for or a while loop is used. The main difference between programming languages lies in their built-in functions and libraries, so the goal is to offer rich automation conveniences in the template syntax. AWS, which provides many built-in functions, is the example to learn from.
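To show why built-in functions are cheap to offer once the framework exists, here is a toy resolver that evaluates two CFN-style functions, Ref and Fn::Join, inside a template fragment. It is a sketch of the idea only, not how AWS implements them; the function name `resolve` and the sample properties are assumptions.

```python
def resolve(value, params):
    """Recursively evaluate Ref and Fn::Join inside a template value."""
    if isinstance(value, dict) and len(value) == 1:
        (key, arg), = value.items()
        if key == "Ref":
            return params[arg]  # look up a parameter by name
        if key == "Fn::Join":
            separator, parts = arg
            return separator.join(str(resolve(part, params)) for part in parts)
    if isinstance(value, dict):
        return {k: resolve(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params) for v in value]
    return value  # plain literal: leave untouched


props = {
    "Name": {"Fn::Join": ["-", [{"Ref": "Env"}, "web"]]},
    "Ports": [80, {"Ref": "Port"}],
}
resolved = resolve(props, {"Env": "prod", "Port": 8080})
print(resolved)  # {'Name': 'prod-web', 'Ports': [80, 8080]}
```

Each additional built-in function is one more branch in the resolver, which is why a rich function library matters more than the surface syntax of the template.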

6 Summary

On the cloud, automation is a hard requirement; a complete cloud ecosystem can only be built on a finished automation base. As an advanced automation capability, orchestration carries the task of completing the cloud ecosystem, and it is a real test of a cloud vendor's strength.

The Huawei PaaS team has years of exploration and experience in automation and orchestration on the cloud, especially on PaaS. I hope to share this with the industry and advance the field of cloud orchestration, so that using the cloud brings a better user experience and cloud automation becomes as ubiquitous as the cloud itself.

At present, Huawei Cloud AOS products are holding a challenge for the fastest applications to go to the cloud.

Here, you can see a comprehensive scenario of the solution applied to the cloud template.

From now on, create your application template and win rich gifts.

Https://bbs.huaweicloud.com/forum/thread-11376-1-1.html
