2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
How does a SaaS cloud data warehouse combined with Serverless continue to redefine the data warehouse? Many newcomers are unclear on this, so this article explains it in detail.
First, a brief introduction to Serverless
The following figure shows the Serverless architecture of MaxCompute, which consists of four parts: data access services, multiple computing environments, storage services, and management modules.
The main features of each module are as follows:
(1) Serverless data access service
Tunnel provides batch and streaming import, conversion to the MaxCompute (MC) columnar storage format, and automatic scaling, all free of charge.
The LOAD/UNLOAD commands import from and export to OSS, also free of charge.
(2) Serverless multi-computing environment
Serverless computing resource pool: a large-scale shared pool, allocated on demand and billed per job.
Dedicated computing resources: prepaid billing, with workload management (load isolation, priorities, time-sharing scaling, and so on).
The runtime environment supports ETL, OLAP, ML, and other big data analytics scenarios.
(3) Serverless storage service
Storage is independent of computing, scales on its own, and provides capacity from the GB level to the EB level.
Billing follows the actual storage size, which reduces cost.
Data is optimized for analytics by default (columnar storage, compression), with nothing to configure.
Optimizations such as partitioning, bucketing, and Z-order are supported.
(4) Serverless management
It works out of the box, with complete management capabilities built in and available through the API, SDK, and web console.
The platform needs no operation and maintenance by the user, which reduces cost.
The above is a brief description of the Serverless architecture. The focus of this article is how to use MaxCompute's Serverless computing resources to meet the needs of a data warehouse.
The following figure shows the logical model of how MaxCompute computing resources are managed and used. A Project in MaxCompute corresponds to a logically isolated unit of the data warehouse, and we can create different Projects for different management goals, for example one Project for testing and another for development. Each Project has its own data and its own permission system, and the two share nothing, which gives management isolation. Isolated spaces alone are not enough, though, because computing jobs must be bound to computing resources. We therefore bind each Project to a billing method, choosing a different one per Project as needed, so that different isolated spaces use different computing resources.
This design gives MaxCompute some distinctive characteristics. The first is multi-tenancy: after activating MaxCompute, we can create multiple isolated data warehouse spaces to match different management requirements. An enterprise can also purchase multiple sets of logical computing resources. Combining multiple computing resources with multiple isolated environments covers a wider range of scenarios.
In the ideal Serverless resource model, shown in the following figure, the resources in use track actual demand exactly (black line in the figure).
In practice, however, customers have different resource requirements and many distinct demand scenarios. The main ones are:
Stable, periodic jobs
High business growth with rapidly changing demand
Regular demand accompanied by sudden bursts
Testing and development
These scenarios show that big data workloads do not simply need pure on-demand Serverless resources. Requirements differ from stage to stage and from one type of workload to another. The demand for computing resources has the following characteristics:
(1) Business agility
During a long growth period, processing capacity must keep up with natural business growth, especially while the business is changing rapidly.
This may be the early stage of an enterprise or a new venture inside an innovation department.
(2) Pronounced periodic peaks and valleys
Daily and monthly peaks and valleys fluctuate widely, and planning capacity for the peak makes it hard to balance cost against SLA.
Regular computing power plus elastic computing power is required, with job resource policies specified by the scheduler or manually.
(3) Stable business, focused on guaranteeing mission-critical output under an SLA
Baseline jobs, unlike non-critical jobs, carry SLA requirements: their output time must be guaranteed.
Non-critical jobs are processed at the lowest possible cost without affecting critical jobs.
(4) Resource governance: demand moves from rapidly changing to stable and predictable
Capacity planning, with demand converted to and calculated in CU.
Fine-grained workload management of fixed resources.
Overall, the main goal for computing resources in practice is to minimize cost while still meeting these differentiated needs.
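The trade-off between planning for the peak and combining a fixed baseline with elastic capacity can be illustrated with a small calculation. This is a minimal sketch under simplifying assumptions: the demand profile, the two prices, and the idea of billing on-demand capacity per CU-hour are all hypothetical and are not MaxCompute's actual pricing.

```python
def peak_provisioned_cost(hourly_demand_cu, fixed_price):
    """Cost of reserving enough fixed capacity to cover the peak hour all day."""
    return max(hourly_demand_cu) * len(hourly_demand_cu) * fixed_price


def hybrid_cost(hourly_demand_cu, baseline_cu, fixed_price, on_demand_price):
    """Cost of a fixed baseline plus on-demand capacity for demand above it."""
    fixed = baseline_cu * len(hourly_demand_cu) * fixed_price
    burst_cu_hours = sum(max(0, d - baseline_cu) for d in hourly_demand_cu)
    return fixed + burst_cu_hours * on_demand_price


# Hypothetical day: 20 quiet hours at 100 CU and 4 peak hours at 500 CU.
demand = [100] * 20 + [500] * 4

# Hypothetical prices per CU-hour, in arbitrary units (on-demand costs 3x fixed).
print(peak_provisioned_cost(demand, fixed_price=1))                # 12000
print(hybrid_cost(demand, 100, fixed_price=1, on_demand_price=3))  # 7200
```

Even with on-demand capacity priced at three times the fixed rate, the hybrid plan is cheaper here because the peak lasts only four hours. With a longer peak or a higher on-demand price the comparison can reverse, which is why the demand profile matters.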
Second, Serverless helps the business stay agile
So how does MaxCompute's Serverless offering address these scenarios and requirements? For a fast-growing, fast-changing enterprise, we recommend MaxCompute's Serverless on-demand computing resources. From a management point of view, we can create separate Projects for isolation, for example one development and test environment and one production environment. Analysts often need to explore detailed data at short notice or run machine learning analyses, which produces sudden and sometimes very large demands for computing power. Because these jobs are infrequent but process huge amounts of data, they are usually isolated from the other environments.
We can also divide by organization. Many enterprises are large enough to be split by department, with each department getting its own isolated environment. As an independent organization, each department needs relatively independent data and computing resources, and the Serverless on-demand allocation model fits this well. With this model, an enterprise does not need to do capacity planning: in the initial stage it can use pay-as-you-go billing and let the large shared resource pool meet each department's needs without competition for resources.
In general, Serverless handles a wide range of job conditions well. For a single job, it supplies the resources needed regardless of the job's scale; for many concurrent jobs, it serves them all and avoids competition for resources. Where job cost must be kept under control, MaxCompute also provides cost estimation and cost control to block excessively expensive jobs. In these ways, MaxCompute with Serverless can greatly improve business agility and shorten time to value.
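The idea of estimating a job's cost and blocking it when the estimate is too high can be sketched as follows. This is an illustration only: the `JobEstimate` type, the cost formula (input size times a complexity factor times a unit price), the price, and the limit are all assumptions made for the example, not MaxCompute's actual estimation interface or pricing.

```python
from dataclasses import dataclass


@dataclass
class JobEstimate:
    name: str
    input_gb: float    # estimated amount of data the job will scan
    complexity: float  # hypothetical multiplier for how heavy the query is


def estimated_cost(job, price_per_gb):
    """Assumed cost model: scanned data x complexity x unit price."""
    return job.input_gb * job.complexity * price_per_gb


def admit(job, price_per_gb, cost_limit):
    """Return True if the job may run, False if it should be blocked."""
    return estimated_cost(job, price_per_gb) <= cost_limit


small = JobEstimate("daily_report", input_gb=10, complexity=1.0)
large = JobEstimate("full_history_scan", input_gb=10_000, complexity=2.0)

for job in (small, large):
    verdict = "run" if admit(job, price_per_gb=0.5, cost_limit=100) else "blocked"
    print(job.name, estimated_cost(job, 0.5), verdict)
```

The check runs before the job is submitted, so an expensive exploratory query is stopped before it consumes any on-demand resources.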
Other enterprises prefer a relatively stable resource pool that fits their day-to-day management, because they already have some capability for resource planning and resource management. In this case, we purchase resources of a fixed size, isolate environments by function or organization, and use the quota group management capability provided by MaxCompute to divide the resources into multiple resource groups. This meets the needs of different businesses and organizations while keeping spending predictable. The key technical features of this model are:
Load isolation: workloads do not compete with each other, and resources go first to key projects and organizations.
Job priority: critical jobs are protected, with dynamic detection of job links.
Time-sharing scaling: separate resource allocation strategies for day and night make the most of the purchased resources.
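Time-sharing scaling of a fixed resource pool can be pictured as a schedule that splits the purchased CU between quota groups differently at different times of day. The sketch below is an illustration under assumptions: the group names, the hours, and the 100 CU total are hypothetical, and the code does not call any MaxCompute API.

```python
TOTAL_CU = 100  # hypothetical size of the purchased fixed resource pool

# Each entry: (start hour inclusive, end hour exclusive, CU per quota group).
SCHEDULE = [
    (0, 8, {"etl": 80, "adhoc": 20}),   # night: batch ETL gets most resources
    (8, 24, {"etl": 30, "adhoc": 70}),  # day: analysts get most resources
]


def validate(schedule, total_cu):
    """Check that windows cover 0-24 without gaps and use exactly total_cu."""
    expected_start = 0
    for start, end, split in schedule:
        if start != expected_start or end <= start:
            raise ValueError("windows must be contiguous and non-empty")
        if sum(split.values()) != total_cu:
            raise ValueError("each split must add up to the purchased CU")
        expected_start = end
    if expected_start != 24:
        raise ValueError("windows must cover the whole day")


def quota_at(hour, schedule=SCHEDULE):
    """Return the CU split between quota groups that applies at a given hour."""
    for start, end, split in schedule:
        if start <= hour < end:
            return split
    raise ValueError(f"hour {hour} is outside 0-23")


validate(SCHEDULE, TOTAL_CU)
print(quota_at(3))   # {'etl': 80, 'adhoc': 20}
print(quota_at(14))  # {'etl': 30, 'adhoc': 70}
```

Validating that every window adds up to the purchased total keeps the plan from leaving capacity idle or promising more than was bought.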
The third scenario combines cost control with business agility. The manager of a data platform often faces several kinds of work. Daily jobs usually run on a fixed-size resource, where cost is controllable and predictable. Some key jobs are worth paying extra to speed up for business reasons, so we want additional computing power for them. Data scientists' exploratory jobs should not interfere with production, yet data scientists still need powerful computing resources to test business hypotheses and ideas quickly; this work can go to an on-demand resource pool. A complex enterprise may also have innovative businesses that need a new data development and application environment. For them we can build an isolated data warehouse environment and allocate resources on demand, helping them verify business hypotheses quickly.
On the product side, we provide two main capabilities to users:
Pay-as-you-go Project: jobs submitted in the Project use Serverless resources, and the resource group bound to the Project can be switched.
User-specified routing: computing resources and resource routing can be set temporarily at the job level as needed.
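The job-level routing described in this scenario can be sketched as a small decision rule: routine jobs stay on the fixed pool, exploratory jobs go to the on-demand pool, and critical jobs spill over to on-demand when the fixed pool cannot serve them. The job kinds, pool names, and the `choose_pool` function are hypothetical names for illustration; they are not MaxCompute settings.

```python
FIXED = "fixed-quota"
ON_DEMAND = "on-demand"


def choose_pool(kind, required_cu, fixed_free_cu):
    """Pick a resource pool for one job.

    kind          -- "daily", "critical", or "exploratory" (assumed categories)
    required_cu   -- CU the job is expected to need
    fixed_free_cu -- CU currently free in the prepaid fixed pool
    """
    if kind == "exploratory":
        # Keep data scientists' ad hoc work away from production resources.
        return ON_DEMAND
    if kind == "critical" and required_cu > fixed_free_cu:
        # Pay for extra computing power rather than delay a key job.
        return ON_DEMAND
    # Daily jobs, and critical jobs that fit, stay on the predictable pool.
    return FIXED


print(choose_pool("daily", required_cu=50, fixed_free_cu=10))
print(choose_pool("critical", required_cu=50, fixed_free_cu=10))
print(choose_pool("exploratory", required_cu=5, fixed_free_cu=100))
```

Daily jobs never leave the fixed pool in this sketch, so their cost stays predictable even when the pool is busy; they simply wait.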
The three scenarios above are common in daily operation. In a fourth, a customer's business stabilizes after a period on pay-as-you-go billing, and the customer wants to move the project to a fixed, prepaid resource pool. This raises a question: how much resource is needed? Under pay-as-you-go there was never any need to estimate demand. MaxCompute provides capacity planning for this. The principle is to use the metadata service (Information Schema) provided by MaxCompute to estimate the project's overall computing power requirement from its historical consumption. The key information is:
The daily consumption of the project's recent jobs, computed from Information Schema (unit: CU).
The day with the highest consumption, found from the same daily statistics, and the computing power required in each hour of that day (unit: CU).
From this information we can predict the computing requirements of the business according to certain rules and carry out capacity planning. For more detail, see the corresponding articles in the Aliyun Community.
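The two statistics above can be computed with a few lines of code once the job history is available. The sketch below assumes the history has already been aggregated into `(day, hour, cu_hours)` records; that record layout and the sample values are hypothetical, and the query that would produce them from Information Schema is not shown.

```python
from collections import defaultdict


def daily_cu_hours(records):
    """Total CU-hours consumed per day."""
    totals = defaultdict(float)
    for day, _hour, cu_hours in records:
        totals[day] += cu_hours
    return dict(totals)


def busiest_day_hourly_cu(records):
    """Find the most expensive day and its CU demand for each of 24 hours.

    CU-hours consumed within one hour equal the average CU in use that hour.
    """
    if not records:
        raise ValueError("no job history to analyze")
    daily = daily_cu_hours(records)
    busiest = max(daily, key=daily.get)
    hourly = [0.0] * 24
    for day, hour, cu_hours in records:
        if day == busiest:
            hourly[hour] += cu_hours
    return busiest, hourly


# Hypothetical history: (day, hour of day, CU-hours consumed in that hour).
history = [
    ("2024-05-01", 1, 100.0),
    ("2024-05-01", 2, 50.0),
    ("2024-05-02", 1, 300.0),
    ("2024-05-02", 1, 100.0),
    ("2024-05-02", 9, 40.0),
]

day, hourly = busiest_day_hourly_cu(history)
print(daily_cu_hours(history))  # {'2024-05-01': 150.0, '2024-05-02': 440.0}
print(day, max(hourly))         # 2024-05-02 400.0
```

The peak hourly value on the busiest day gives an upper bound for sizing a prepaid pool. Whether to buy for that peak or for a lower baseline topped up with pay-as-you-go capacity is the planning decision the statistics inform.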
This article has described how to use Serverless services to manage resources better and meet the resource needs of different businesses at low cost. In summary:
(1) Pay-as-you-go billing suits businesses that are developing and changing rapidly. Combined with MaxCompute's cost control features, it meets the computing needs of the business while keeping cost under control.
(2) With prepaid resources, we can split the computing resources into several groups through quota management, apply load isolation and time-sharing management, and use DataWorks + MaxCompute (dw+mc) baseline job priority to guarantee the SLA of key jobs.
(3) When prepaid fixed resources are combined with flexible pay-as-you-go billing, we can choose computing resources at the job level. Sudden jobs use pay-as-you-go resources to cover bursts in demand, and the peaks of periodic jobs are covered the same way, so that resources are used effectively and cost is reduced.
(4) We can use metadata to evaluate computing needs and plan capacity when switching between pay-as-you-go and prepaid billing. We can also use metadata to analyze resource consumption, reduce jobs with high resource consumption, and manage resources accordingly.