This article shares the practice of Serverless in large-scale data processing. The approach is quite practical, so it is shared here for reference; I hope you can take something useful away after reading it.
Preface
When you first encounter Serverless, one of its less obvious benefits is this: compared with the traditional server-based approach, a Serverless platform lets your application scale out quickly and makes parallel processing far more efficient. This is mainly because with Serverless you do not pay for idle resources and you do not have to worry about reserving too little capacity. Under the traditional model, a user must reserve hundreds of servers for highly parallel but short-lived tasks, and must pay for every server even when some of them are sitting idle.
Taking Alibaba Cloud's Serverless product Function Compute as an example, it addresses both of the following scenarios well:
The task itself is not computationally heavy, but a large number of concurrent task requests need to be processed in parallel, such as multimedia file processing or document conversion.
A single task is computationally heavy and needs to finish quickly, while multiple such tasks must still be processed in parallel.
In these scenarios, the only thing you need to ensure is that the task can be split into subtasks that run in parallel. A long task that would take an hour can be divided into 360 independent 10-second subtasks and processed simultaneously, so work that previously took an hour can finish in roughly 10 seconds. Because billing is pay-as-you-go, the total amount of computation, and hence the cost, stays roughly the same, whereas the traditional model inevitably wastes money on reserved resources, and that waste is also borne by you.
The rest of this article describes the practice of Serverless in large-scale data processing in detail.
Extreme elastic scaling to cope with computational fluctuations
Before diving into the examples of large-scale data processing, here is a brief introduction to Function Compute.
1. A brief introduction to Function Compute
Developers write applications and services in a programming language; for the languages supported by Function Compute, see the list of supported development languages.
Developers upload the application to Function Compute.
A trigger fires the function: triggers include OSS, API Gateway, Log Service, Table Store, the Function Compute API and SDK, and more.
Function Compute scales out dynamically to serve requests, and the process is transparent to you and your users.
Billing follows the actual execution time of the function: after the function runs, you can view the cost on your bill, with billing granularity down to 100 milliseconds.
Details: Function Compute official website
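To make this flow concrete, here is a minimal sketch of such a function in the Python runtime. It assumes the standard handler(event, context) entry point; the payload handling and return value are placeholders rather than part of any specific sample project:

# -*- coding: utf-8 -*-
# Minimal sketch of a Function Compute handler (Python runtime).
# The platform calls handler(event, context): `event` is the raw request
# payload (bytes) and `context` carries metadata such as the request id.
import json

def handler(event, context):
    payload = json.loads(event)  # payload shape depends on the trigger
    print("handling request", context.request_id)
    # ... the real work (e.g. transcoding a video, converting a document)
    # would go here ...
    return json.dumps({"status": "ok", "input": payload})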
With this picture of how Function Compute works, let's illustrate it with the case of large-scale parallel video transcoding. Suppose a company in online education or entertainment: teachers' lecture videos or new video sources are usually produced in batches, and you want them transcoded quickly so that customers can watch the recordings as soon as possible. During the recent epidemic, for example, the volume of online courses surged, with class attendance typically peaking around 10:00, 12:00, 16:00, and 18:00. Processing all newly uploaded videos within a fixed window (say, half an hour) is a common and general requirement.
2. An elastic and highly available audio and video processing system
OSS trigger
As shown in the figure above, a user uploads a video to OSS, the OSS trigger automatically fires the function, Function Compute scales out automatically, the function logic invokes FFmpeg in its execution environment to transcode the video, and the transcoded video is saved back to OSS.
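A hedged sketch of such an OSS-triggered function is shown below. It assumes the Python runtime, the oss2 SDK, and an FFmpeg binary shipped with the function code; the output prefix and endpoint naming are illustrative placeholders rather than the configuration of any particular sample project:

# -*- coding: utf-8 -*-
# Sketch of an OSS-triggered transcoding function (assumptions noted above).
import json
import os
import subprocess
import oss2

OUTPUT_PREFIX = "video/output/"  # hypothetical output prefix

def handler(event, context):
    evt = json.loads(event)["events"][0]          # OSS trigger event format
    bucket_name = evt["oss"]["bucket"]["name"]
    object_key = evt["oss"]["object"]["key"]

    # Temporary credentials issued to the function instance.
    creds = context.credentials
    auth = oss2.StsAuth(creds.access_key_id,
                        creds.access_key_secret,
                        creds.security_token)
    endpoint = "oss-{}-internal.aliyuncs.com".format(context.region)
    bucket = oss2.Bucket(auth, endpoint, bucket_name)

    # Download the source video, transcode it with FFmpeg, upload the result.
    src = "/tmp/" + os.path.basename(object_key)
    dst = os.path.splitext(src)[0] + ".mp4"
    bucket.get_object_to_file(object_key, src)
    subprocess.check_call(["ffmpeg", "-y", "-i", src, dst])
    bucket.put_object_from_file(OUTPUT_PREFIX + os.path.basename(dst), dst)

    return "transcoded {} to {}{}".format(
        object_key, OUTPUT_PREFIX, os.path.basename(dst))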
Message trigger
As shown in the figure above, the application only needs to send a message to automatically trigger the function to perform the audio and video processing task; Function Compute scales out automatically, the function logic invokes FFmpeg in its execution environment to transcode the video, and the transcoded video is saved back to OSS.
Directly invoking the SDK to perform audio and video processing tasks
Taking Python as an example, it looks roughly as follows:
# -*- coding: utf-8 -*-
import fc2
import json

client = fc2.Client(
    endpoint="http://123456.cn-hangzhou.fc.aliyuncs.com",
    accessKeyID="xxxxxxxx",
    accessKeySecret="yyyyyy")

# synchronous / asynchronous call
resp = client.invoke_function(
    "FcOssFFmpeg",
    "transcode",
    payload=json.dumps({
        "bucket_name": "test-bucket",
        "object_key": "video/inputs/a.flv",
        "output_dir": "video/output/a_out.mp4"
    })).data
print(resp)
As the examples above show, there are many ways to trigger function execution. Combined with a simple SLS (Log Service) configuration, you can quickly build an elastic, pay-as-you-go audio and video processing system, complete with dashboards that provide zero operations overhead, business-level data visualization, and powerful custom monitoring and alerting.
Audio and video workloads already in production include UC, Finch, Luping Design Home, Tiger, and several leading online-education customers. Some of these customers elastically use more than 10,000 CPU cores at peak, with video-processing concurrency reaching 1,700+, at a very attractive performance-to-price ratio.
For details, please refer to:
simple-video-processing
fc-oss-ffmpeg
Divide-and-conquer of tasks for parallel acceleration
Applying the divide-and-conquer idea to Function Compute is interesting. For example, suppose you have a very large 20 GB 1080p HD video to transcode. Even on a well-equipped machine the job may take hours, and if anything goes wrong you have to start the whole transcode over. With divide-and-conquer plus Function Compute, the transcoding process becomes slice -> transcode the slices in parallel -> merge the slices, which solves both pain points:
Slicing and merging are essentially memory-level copy operations that require very little computation; the truly expensive transcoding is split into many subtasks that run in parallel. In this model, the time to transcode the whole large video is roughly the transcoding time of the largest slice.
Even if transcoding fails for one slice, only that slice needs to be retried; the whole job does not have to be restarted.
By decomposing large tasks sensibly, using Function Compute, and writing only a small amount of code, you can quickly build a large-scale data processing system that is elastic, highly available, parallel-accelerated, and pay-as-you-go; a sketch of this pattern follows below.
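Here is a minimal local sketch of the slice, parallel-transcode, merge pattern. In the real system each slice would be transcoded by a separate Function Compute invocation; a thread pool merely stands in for that fan-out here, and the input file name and slice length are hypothetical:

# -*- coding: utf-8 -*-
# Local sketch of slice -> parallel transcode -> merge (see assumptions above).
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

def split(src, segment_seconds=45):
    # Stream-copy the source into fixed-length slices (cheap, no re-encoding).
    subprocess.check_call([
        "ffmpeg", "-y", "-i", src, "-c", "copy", "-f", "segment",
        "-segment_time", str(segment_seconds), "-reset_timestamps", "1",
        "slice_%04d.mp4"])
    return sorted(glob.glob("slice_*.mp4"))

def transcode(slice_path):
    # The expensive step: re-encode a single slice.
    out = "out_" + slice_path
    subprocess.check_call(["ffmpeg", "-y", "-i", slice_path,
                           "-c:v", "libx264", "-c:a", "aac", out])
    return out

def merge(parts, dst="merged.mp4"):
    # Concatenate the transcoded slices without re-encoding them again.
    with open("parts.txt", "w") as f:
        f.writelines("file '{}'\n".format(p) for p in parts)
    subprocess.check_call(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                           "-i", "parts.txt", "-c", "copy", dst])

if __name__ == "__main__":
    slices = split("input_1080p.mov")  # hypothetical large source video
    with ThreadPoolExecutor(max_workers=8) as pool:
        outputs = list(pool.map(transcode, slices))
    merge(outputs)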
Before introducing this solution, here is a brief introduction to Serverless Workflow, which orchestrates functions, other cloud services, and self-built services.
1. Introduction to Serverless Workflow
Serverless Workflow is a fully managed cloud service for coordinating the execution of multiple distributed tasks. In Serverless Workflow you can orchestrate distributed tasks sequentially, by branch, in parallel, and so on. Serverless Workflow reliably coordinates task execution according to the defined steps, tracks the state transitions of each task, and executes user-defined retry logic when necessary to ensure the workflow completes smoothly. It takes care of tedious work such as task coordination, state management, and error handling that would otherwise be needed to develop and run business processes, letting you focus on business logic.
Details: Serverless Workflow official website
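As an illustration of how an application might hand a task to Serverless Workflow from Python, here is a hedged sketch using the aliyun-python-sdk-fnf package. The flow name, region, credentials, and input payload are placeholders, and the exact request class path may differ between SDK versions:

# -*- coding: utf-8 -*-
# Sketch: start a Serverless Workflow (FnF) execution from Python.
# Assumes the aliyun-python-sdk-core and aliyun-python-sdk-fnf packages
# and a flow named "video-transcode-flow" that already exists.
import json
from aliyunsdkcore.client import AcsClient
from aliyunsdkfnf.request.v20190315.StartExecutionRequest import StartExecutionRequest

client = AcsClient("<access-key-id>", "<access-key-secret>", "cn-hangzhou")

request = StartExecutionRequest()
request.set_FlowName("video-transcode-flow")       # hypothetical flow name
request.set_Input(json.dumps({
    "bucket": "test-bucket",
    "object_key": "video/inputs/a.mov",
    "formats": ["mp4", "flv"]
}))

# Returns the execution metadata as a JSON string.
print(client.do_action_with_exception(request))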
Next, the case of quickly transcoding a large video illustrates how Serverless Workflow scheduling decomposes a large computing task and processes the subtasks in parallel, so that a single large task finishes quickly.
2. Fast transcoding of a large video into multiple target formats
As shown in the figure above, when a user uploads a video in mov format to OSS, the OSS trigger automatically fires the function, the function starts an FnF execution, and FnF transcodes the video into one or more formats at the same time (controlled by the DST_FORMATS parameter in template.yml). Suppose it is configured to transcode into both mp4 and flv.
A video file can be transcoded into multiple formats at once, and other custom steps can run as well, such as adding a watermark or updating information in a database as an after-process step.
When multiple files are uploaded to OSS at the same time, Function Compute scales out automatically and processes the files in parallel, and each file is transcoded into multiple formats in parallel.
Combined with NAS and video slicing, very large videos can be handled: each video is first sliced, the slices are transcoded in parallel, and the results are merged. With a reasonable slice duration, transcoding of large videos can be greatly accelerated.
FnF tracks the execution of every step and supports per-step custom retries, which improves the robustness of the task system; see, for example, retry-example.
For more information, please see: fc-fnf-video-processing
In the task-splitting and parallel-acceleration cases above, the decomposition was CPU-intensive, but IO-intensive tasks can be decomposed the same way. Consider this requirement: transfer a 20 GB file from an OSS bucket in the Shanghai region to an OSS bucket in Hangzhou within seconds. Divide and conquer applies here too: after receiving the transfer task, a Master function assigns byte ranges of the very large file to Worker subfunctions, each Worker transfers its own range in parallel, and once all Workers have finished, the Master submits a request to merge the parts, completing the whole transfer.
For more information, please refer to: using Function Compute with many concurrent instances to transfer very large files in seconds.
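A hedged sketch of this Master/Worker byte-range idea is shown below. In the real solution each Worker is a separate Function Compute invocation; here a thread pool stands in for that fan-out, and the bucket names, endpoints, part size, and object key are placeholders:

# -*- coding: utf-8 -*-
# Sketch of the Master/Worker cross-region transfer pattern (see assumptions above).
from concurrent.futures import ThreadPoolExecutor
import oss2

PART_SIZE = 128 * 1024 * 1024  # bytes handled by each "Worker" (hypothetical)

auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
src = oss2.Bucket(auth, "https://oss-cn-shanghai.aliyuncs.com", "src-bucket")
dst = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "dst-bucket")

def transfer(key):
    total = src.head_object(key).content_length
    upload_id = dst.init_multipart_upload(key).upload_id

    # "Master": split the object into byte ranges, one per Worker.
    ranges = [(i, start, min(start + PART_SIZE, total) - 1)
              for i, start in enumerate(range(0, total, PART_SIZE), start=1)]

    # "Worker": stream one range from Shanghai and upload it as one part.
    def copy_part(job):
        part_no, start, end = job
        data = src.get_object(key, byte_range=(start, end)).read()
        etag = dst.upload_part(key, upload_id, part_no, data).etag
        return oss2.models.PartInfo(part_no, etag)

    with ThreadPoolExecutor(max_workers=16) as pool:
        parts = list(pool.map(copy_part, ranges))

    # "Master" again: merge all parts to finish the transfer.
    dst.complete_multipart_upload(key, upload_id, parts)

transfer("video/archive/20GB-source.bin")  # hypothetical object key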
This article has shown that a Serverless platform lets your application scale out quickly and makes parallel processing more effective, with concrete practical examples. Whether the scenario is CPU-intensive or IO-intensive, Function Compute plus Serverless Workflow addresses the following concerns well:
You do not pay for idle resources.
You do not need to worry about reserving too little computing capacity.
Computation-heavy tasks are processed quickly.
Better tracking of task flows.
Complete monitoring and alerting, zero operations, business data visualization, and more.
...
This article uses Serverless audio and video processing only as an example to show the capabilities and unique advantages of Function Compute and Serverless Workflow in offline computing scenarios. The same approach can be extended to other large-scale data processing practices such as AI, genomics computing, and scientific simulation.
These are the practices of Serverless in large-scale data processing. Some of these techniques are likely to come up in everyday work, and I hope this article has helped you learn more about them.