2025-04-05 Update From: SLTechnology News&Howtos
This article is about how Serverless solves the pain points of data collection and analysis. The editor finds it very practical and shares it with you here; I hope you get something out of it.
As everyone knows, the game industry is an evergreen of today's Internet. In 2019, before the outbreak, revenue in China's game market was about 288.48 billion yuan, up 17.1% year on year. In 2020 the epidemic pushed the industry to grow by leaps and bounds; playing games became one of the most common forms of entertainment for Chinese netizens, especially during the epidemic. According to incomplete statistics, by 2019 China's mobile games had about 660 million users, accounting for 77.92% of all Chinese Internet users. Games, as a low-threshold, low-cost form of entertainment, have clearly become a habitual part of most people's lives.
For players, there are a huge number of games on the market, so how a game gets discovered and recognized by players, and keeps them playing, is a question every game manufacturer has to think about. Add to that the suspension of game license approvals in 2018, and manufacturers treasure every title that has obtained a license all the more. This has made "deeply polishing product quality" and "refining operations" the development ideas of most game manufacturers; new and old games alike try to implement these two points:
New games: give players ampler promotion resources and more complete game content.
Old games: analyze user behavior and invest more effort and budget in producing higher-quality version content.
Here we focus on new games. Suppose a game company has worked hard for three years, waiting for a new title to launch and take off. The question is: how does the new game get seen by the mass of players?
First, let's look at the classification of companies in the game industry:
Game developer: a company that develops games and produces the game content. For example, all of Arena of Valor's hero designs, battle scenes, and battle logic are provided by the development company.
Game publisher: a publisher's work falls into three parts: marketing, operations, and customer service. Publishers control the lifeblood of a game; the core of marketing is bringing in players, and the core of operations is maximizing user value to earn more profit.
Game platform / channel: the core purpose of platforms and channels is to expose the game so that as many people as possible can find it.
Some companies specialize independently in one of these three businesses, while others take on all of them, but either way the relationship between the three does not change:
So it is not hard to see that if you want more players to find your game, publishing and operations are the key. Generally speaking, if your game appears in ads on all the well-known platforms, the number of new sign-ups will at least be considerable. This introduces a key term: buying traffic (买量).
According to industry data, the average number of mobile games buying traffic each month reached 6,000+ in 2019, up from 4,200 in 2018. Meanwhile, as super apps such as Douyin and Weibo tilt resources toward the game traffic market, the effectiveness and efficiency of mobile buying have improved, and game manufacturers are all the more willing to buy traffic to attract users.
Note, however, that as the targeting accuracy of bought traffic keeps improving, its cost also keeps rising. Only by reasonably balancing bought traffic, channels, and integrated marketing can the effect of publicity resources be maximized.
In plain terms, buying traffic means placing ads on the major mainstream platforms. When users see a game ad they may click it, land on the manufacturer's promotion page, and leave some of their information behind. The manufacturer then runs big data analysis on the collected user information for further targeted promotion.
Core demands of game operation
Game manufacturers spend money buying traffic in exchange for user information and new registrations, all to keep the game running. The core demand of this scenario is therefore the completeness of the collected user information.
For example, if a manufacturer spends 50 million yuan a day on ads and a certain platform generates 10,000 ad clicks per second during some period, then the user information behind every one of those clicks must be collected in full during that period and stored for later analysis. This places high demands on the data collection system.
The core point is that the exposed interface of the system must smoothly absorb irregular traffic pulses during the buying period. Manufacturers usually place ads on multiple platforms at different times, so traffic pulses occur throughout the day. If this link fails, money equivalent to the bought traffic is wasted.
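To get a feel for the scale these numbers imply, here is a back-of-the-envelope estimate (the ~1 KB per click event is an assumption for illustration, not a figure from the text; "1w" = 10,000):

```python
# Back-of-the-envelope ingest estimate for the buying scenario above.
# Assumption (not from the text): each click event serializes to ~1 KB.
clicks_per_second = 10_000        # "1w per second" peak click rate
bytes_per_event = 1024            # assumed average event size

peak_mb_per_second = clicks_per_second * bytes_per_event / 1024 / 1024
events_per_hour = clicks_per_second * 3600

print(f"peak ingest: ~{peak_mb_per_second:.1f} MB/s")     # ~9.8 MB/s
print(f"events in a one-hour pulse: {events_per_hour:,}")
```

Even at this modest event size, a single one-hour pulse means 36 million events that must all be captured, which is exactly the requirement a fixed-capacity cluster struggles with.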
Traditional architecture of data acquisition system
The picture above shows a fairly traditional data collection architecture. The most important part is the one that exposes the HTTP interface and returns data: if anything goes wrong there, the data collection link is cut off. This part often faces two challenges:
When a traffic pulse arrives, can this part scale out fast enough to absorb the impact?
Game operations are tidal in nature; campaigns do not run every day, so resource utilization needs to be optimized.
Usually, before an operational campaign, the ops staff are notified in advance to add nodes to this link, but the needed increase cannot be predicted; only a rough number can be chosen. This is a common scenario under the traditional architecture and leads to two problems:
Traffic is larger than expected and too few nodes were added, so part of the data is not collected.
Traffic is smaller than expected and too many nodes were added, wasting resources.
Serverless architecture of data acquisition system
We can use Function Compute (FC) to replace the part of the traditional architecture that exposes the HTTP interface and returns data, which neatly solves the problems above.
Both problems of the traditional architecture are addressed by Function Compute's elasticity at the 100-millisecond level. We no longer need to estimate how much traffic a marketing campaign will bring, worry about the capacity of the data collection system, or have ops staff prepare ECS instances in advance.
Because of Function Compute's extreme elasticity, when there is no buying or marketing activity, the number of running instances is zero. When a buying campaign produces a traffic pulse, Function Compute quickly pulls up instances to carry the load; when traffic drops, it promptly releases instances that receive no requests to scale down. The advantages of the Serverless architecture are:
R&D can build it quickly, without ops involvement.
Traffic large or small is absorbed smoothly.
The number of instances Function Compute pulls up closely tracks the traffic curve, optimizing resource utilization; combined with pay-as-you-go billing, cost is minimized.
Architecture analysis
As the architecture diagram above shows, the data collection stage is split into two functions. The first simply exposes an HTTP interface to receive data; the second processes the data and then sends it to the message queue Kafka and the database RDS.
1. Receive data function
We open the Function Compute console and create a function:
Function type: HTTP (i.e., the trigger is HTTP)
Function name: receiveData
Runtime: Python 3
Function instance type: elastic instance
Memory: 512 MB
Timeout: 60 seconds
Instance concurrency: 1
Trigger type: HTTP trigger
Trigger name: defaultTrigger
Authentication: anonymous (no authentication required)
Request methods: GET, POST
After creating the function, we write the code in the online editor:
# -*- coding: utf-8 -*-
import logging
import json
import urllib.parse

HELLO_WORLD = b'Hello world!\n'

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    request_uri = environ['fc.request_uri']
    for k, v in environ.items():
        if k.startswith('HTTP_'):
            # process custom request headers
            pass
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except (ValueError):
        request_body_size = 0
    # receive and log the data
    request_body = environ['wsgi.input'].read(request_body_size)
    request_body_str = urllib.parse.unquote(request_body.decode("GBK"))
    request_body_obj = json.loads(request_body_str)
    logger.info(request_body_obj["action"])
    logger.info(request_body_obj["articleAuthorId"])
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]
The code at this point is very simple: it receives the parameters sent by the user. We can call the API to verify it:
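The handler's body-parsing steps can also be exercised on their own. The sketch below mirrors the decode → unquote → json.loads sequence in the code above; the sample payload and field values are made up for illustration:

```python
import json
import urllib.parse

def parse_request_body(raw: bytes, encoding: str = "GBK") -> dict:
    """Mirror the handler's decode -> unquote -> json.loads steps."""
    body_str = urllib.parse.unquote(raw.decode(encoding))
    return json.loads(body_str)

# Simulate what a client might POST (field names follow the handler's
# code; the values are made up).
payload = {"action": "click", "articleAuthorId": "10001"}
raw = urllib.parse.quote(json.dumps(payload)).encode("GBK")

parsed = parse_request_body(raw)
print(parsed["action"])  # click
```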
You can see the log of this call in the log query of the function:
At the same time, we can inspect the function's call trace to analyze the time spent in each step, for example when the function receives a request → cold start (when no instance is active) → code preparation → initializer → entry-function logic:
As the call trace shows, the request above included a cold start because no instance was active at the time: the whole process took 418 milliseconds, while the entry-function code itself executed in 8 milliseconds.
When the interface is called again, the entry-function logic runs directly because an instance is already alive, and the whole call takes only 2.3 milliseconds.
2. Processing-data function
The first function was created in the Function Compute console. Since the runtime is Python 3, we can check in the official documentation which modules the preset Python 3 runtime ships with. Because the second function operates on Kafka and RDS, we need to confirm the corresponding modules.
The documentation shows that the built-in modules include an SDK for RDS but no SDK for Kafka, so we need to install the Kafka SDK module manually and create the function in another way.
1) Funcraft
Funcraft is a command-line tool for deploying Serverless applications; it helps us easily manage resources such as Function Compute, API Gateway, and Log Service. It assists development, build, and deployment through a resource configuration file (template.yml).
So for the second function we need to use Fun, in four steps:
Install the Fun tool.
Write a template.yml template file to describe the function.
Install the third-party dependencies we need.
Upload the deployment function.
2) install Fun
Fun provides three ways to install:
Manage the installation through the npm package-suitable for developers on all platforms (Windows/Mac/Linux) who have npm pre-installed.
By downloading binary installation-suitable for all platforms (Windows/Mac/Linux).
Install through the Homebrew package manager-suitable for the Mac platform, more in line with the habits of MacOS developers.
The sample environment in this article is a Mac, so we install with npm. It is very simple; one command does it:
sudo npm install @alicloud/fun -g
After the installation is complete, enter the fun command in the terminal to view the version information:
$ fun --version
3.6.20
Before using fun for the first time, run the fun config command and follow the prompts to configure Account ID, Access Key Id, Secret Access Key, and Default Region Name in turn. The Account ID and Access Key Id can be obtained from the top right of the Function Compute console home page:
$ fun config
? Aliyun Account ID * 01
? Aliyun Access Key ID * qef6j
? Aliyun Access Key Secret * UFJG
? Default region name cn-hangzhou
? The timeout in seconds for each SDK client invoking 60
? The maximum number of retries for each SDK client 3
3) write template.yml
Create a new directory and, inside it, a YAML file named template.yml that mainly describes the configuration of the function to be created. To put it plainly, the configuration set in the Function Compute console is written into this file in YAML format:
ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  FCBigDataDemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
      VpcConfig:
        VpcId: 'vpc-xxxxxxxxxxx'
        VSwitchIds: ['vsw-xxxxxxxxxx']
        SecurityGroupId: 'sg-xxxxxxxxx'
      LogConfig:
        Project: fcdemo
        Logstore: fc_demo_store
    dataToKafka:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Initializer: index.my_initializer
        Handler: index.handler
        CodeUri: './'
        Description: ''
        Runtime: python3
Let's parse the core content of the above file:
FCBigDataDemo: the custom service name. The Type attribute below it, Aliyun::Serverless::Service, marks it as a service.
Properties: the items under Properties are the service's configuration.
VpcConfig: the service's VPC configuration, including VpcId (the VPC ID), VSwitchIds (vSwitch IDs; an array, so multiple vSwitches can be configured), and SecurityGroupId (the security group ID).
LogConfig: the Log Service (SLS) configuration bound to the service, including Project (the Log Service project) and Logstore (the LogStore name).
dataToKafka: the custom function name under this service. The Type attribute below it, Aliyun::Serverless::Function, marks it as a function.
Properties: the items under Properties are the function's configuration.
Initializer: the initializer entry point.
Handler: the entry function.
Runtime: the function's runtime environment.
4) Install third-party dependencies
With the service and function template written, let's install the third-party dependency we need. In this example scenario, the second function uses the Kafka SDK, which can be installed with the fun tool together with the Python package manager pip:
fun install --runtime python3 --package-type pip kafka-python
A prompt message is shown after the command executes.
At this point we will find a .fun folder generated in the directory; the dependency packages we installed are inside it:
5) Deploy the function
Now that the template file is written and the Kafka SDK installed, all that remains is to add our code file, index.py, as follows:
# -*- coding: utf-8 -*-
import logging
import json
from kafka import KafkaProducer

producer = None

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init kafka producer")
    global producer
    producer = KafkaProducer(bootstrap_servers='XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092')

def handler(event, context):
    logger = logging.getLogger()
    # receive the forwarded data
    event_str = json.loads(event)
    event_obj = json.loads(event_str)
    logger.info(event_obj["action"])
    logger.info(event_obj["articleAuthorId"])
    # send the message to Kafka; flush rather than close, because the
    # producer is shared across requests on a warm instance
    global producer
    producer.send('ikf-demo', json.dumps(event_str).encode('utf-8'))
    producer.flush()
    return 'hello world'
The code is simple; here is a quick walkthrough:
my_initializer: when a function instance is pulled up, this function runs first, before handler; while the instance stays alive, subsequent requests do not execute my_initializer again. It is generally used to initialize connections of all kinds; the Kafka Producer is initialized here to avoid constructing it repeatedly.
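The init-once behavior can be illustrated outside Function Compute with a toy sketch. FakeProducer here is a stand-in that only counts constructions, not the real KafkaProducer, and the guard emulates the platform running the initializer once per instance:

```python
producer = None
init_count = 0

class FakeProducer:
    """Stand-in for KafkaProducer; it only counts how often it is built."""
    def __init__(self):
        global init_count
        init_count += 1

def my_initializer(context=None):
    # Emulates FC running the initializer once per instance: the guard
    # makes repeated calls on a warm instance no-ops.
    global producer
    if producer is None:
        producer = FakeProducer()

def handler(event, context=None):
    my_initializer()  # in real FC the platform calls this, not the handler
    return "hello world"

for _ in range(5):  # five requests served by the same warm instance
    handler({})
print(init_count)   # 1 -- the producer was constructed exactly once
```

Without the initializer, every request would pay the cost of establishing the Kafka connection, which matters under a traffic pulse.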
handler: this function has only two pieces of logic: receiving the forwarded data and sending it to the specified Kafka Topic.
The function is deployed with the fun deploy command, which does two things: creates the service and function from the configuration in template.yml, and uploads index.py and .fun to the function.
Log in to the Function Compute console and you can see the service and function deployed through the fun command.
Entering the function, you can also clearly see the directory structure of the third-party dependency packages.
3. Calling between functions
With both functions created, the remaining work is for the first function to pull up the second one after receiving data, so that a message is sent to Kafka. We only need to make a few changes to the first function:
# -*- coding: utf-8 -*-
import logging
import json
import urllib.parse
import fc2

HELLO_WORLD = b'Hello world!\n'
client = None

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init fc client")
    global client
    client = fc2.Client(
        endpoint="http://your_account_id.cn-hangzhou-internal.fc.aliyuncs.com",
        accessKeyID="your_ak",
        accessKeySecret="your_sk"
    )

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    request_uri = environ['fc.request_uri']
    for k, v in environ.items():
        if k.startswith('HTTP_'):
            # process custom request headers
            pass
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except (ValueError):
        request_body_size = 0
    # receive the data
    request_body = environ['wsgi.input'].read(request_body_size)
    request_body_str = urllib.parse.unquote(request_body.decode("GBK"))
    request_body_obj = json.loads(request_body_str)
    logger.info(request_body_obj["action"])
    logger.info(request_body_obj["articleAuthorId"])
    # invoke the second function asynchronously
    global client
    client.invoke_function(
        'FCBigDataDemo', 'dataToKafka',
        payload=json.dumps(request_body_str),
        headers={'x-fc-invocation-type': 'Async'}
    )
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]
As shown above, three changes have been made to the first function's code:
Import the Function Compute library: import fc2
Add an initialization method that creates a Function Compute Client:

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init fc client")
    global client
    client = fc2.Client(
        endpoint="http://your_account_id.cn-hangzhou-internal.fc.aliyuncs.com",
        accessKeyID="your_ak",
        accessKeySecret="your_sk"
    )

Note that when we add an initialization method to the code, we must also specify the initializer entry point in the function's configuration.
Invoke the second function through the Client:

global client
client.invoke_function(
    'FCBigDataDemo', 'dataToKafka',
    payload=json.dumps(request_body_str),
    headers={'x-fc-invocation-type': 'Async'}
)
The invoke_function call takes four parameters:
First parameter: the name of the service the target function belongs to.
Second parameter: the name of the target function.
Third parameter: the data passed to the target function.
Fourth parameter: the Request Header information for invoking the second function. The key x-fc-invocation-type sets whether the invocation is synchronous or asynchronous; here it is set to Async, an asynchronous call.
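As a small sketch of how those four arguments fit together (the actual fc2.Client call needs credentials and a network, so it is left commented out; the payload values are illustrative):

```python
import json

# Inputs to client.invoke_function, assembled the same way as in the
# handler (service/function names come from template.yml; the payload
# values are made up).
service_name = "FCBigDataDemo"    # 1st parameter: service name
function_name = "dataToKafka"     # 2nd parameter: function name
payload = json.dumps('{"action": "click", "articleAuthorId": "10001"}')  # 3rd: data

# 4th parameter: "Async" queues the call and returns immediately;
# "Sync" would wait for the second function's result.
headers = {"x-fc-invocation-type": "Async"}

# client.invoke_function(service_name, function_name,
#                        payload=payload, headers=headers)
print(headers["x-fc-invocation-type"])  # Async
```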
With this in place, we can verify the whole flow: a request to the HTTP API exposed by the first function collects the data → the second function is invoked → the data is sent to Kafka as a message.
The purpose of using two functions
Some readers may wonder here: why two functions? Why not send the data to Kafka directly from the first function?
When we invoke a function asynchronously, the request data is by default put into a message queue for a first round of peak shaving, and then each queue feeds the corresponding function instances; through instance elasticity, multiple instances are pulled up for a second round of peak shaving. This is one of the core reasons this architecture can stably absorb large numbers of concurrent requests.
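The two rounds of peak shaving can be pictured with a toy queue model. All the numbers below (per-instance throughput, instance ceiling, pulse size) are illustrative assumptions, not measurements:

```python
from collections import deque

# Toy model: a burst of requests arrives faster than a fixed worker can
# drain, but the queue absorbs the spike while workers scale with the
# backlog (the "second peak shaving").
queue = deque()
MAX_PER_WORKER_PER_TICK = 100      # assumed per-instance throughput
MAX_WORKERS = 50                   # assumed elasticity ceiling

arrivals = [10_000, 10_000, 0, 0, 0]   # traffic pulse, then silence
processed_total = 0

for tick, arriving in enumerate(arrivals):
    queue.extend(range(arriving))
    # Scale workers to the backlog, capped by the elasticity ceiling.
    workers = min(MAX_WORKERS, max(1, len(queue) // MAX_PER_WORKER_PER_TICK))
    drained = min(len(queue), workers * MAX_PER_WORKER_PER_TICK)
    for _ in range(drained):
        queue.popleft()
    processed_total += drained
    print(f"tick {tick}: backlog={len(queue)} workers={workers}")

print(f"processed {processed_total} of 20000")
```

No request is lost: the queue carries the overflow until the scaled-out workers catch up after the pulse ends.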
4. Configure Kafka
In the game operation scenario the data volume is large, so the performance requirements on Kafka are high. Compared with self-hosting open source Kafka, Kafka on the cloud saves a great deal of operations work, for example:
We no longer need to maintain the individual nodes of the Kafka cluster.
We do not need to care about data synchronization between master and replica nodes.
Cluster specifications can be scaled quickly and dynamically, and Topics and partition counts can be increased on the fly.
Complete metric monitoring and message query features.
In general, the SLA is guaranteed by the cloud provider; we only need to focus on sending and consuming messages.
Therefore, we can open the Kafka activation page and activate a Kafka instance with one click according to the needs of the actual scenario. After activation, log in to the console; in the basic information you can see Kafka's access points:
Default access point: for VPC intranet scenarios.
SSL access point: for public network scenarios.
Configure the default access point into the second Function Compute function:

...
producer = KafkaProducer(bootstrap_servers='XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092')
...
Then click Topic Management on the left of the console and create a Topic.
Configure the created Topic into the second Function Compute function:

...
# the first parameter is the Topic name
producer.send('ikf-demo', json.dumps(event_str).encode('utf-8'))
...
The advantages of Kafka on the cloud were listed above, such as dynamically increasing a Topic's partition count; we can adjust the number of partitions in the Topic list.
A single Topic supports up to 360 partitions, which a self-built open source cluster cannot do.
Next, click Consumer Group Management on the left of the console and create a Consumer Group.
At this point Kafka on the cloud is fully configured: the Producer can send messages to the newly created Topic, and a Consumer can use the newly created GID to subscribe to the Topic for message reception and consumption.
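One detail worth noting before wiring up a consumer: the second function double-encodes the payload (json.dumps of a string that is itself JSON), so a consumer has to decode twice. The roundtrip can be checked without a broker; the sample event is made up:

```python
import json

# What the second function does before producer.send(...):
# event_str is already a JSON string, and json.dumps wraps it again.
event_str = '{"action": "click", "articleAuthorId": "10001"}'
message_bytes = json.dumps(event_str).encode('utf-8')

# What a consumer must do with the received bytes: decode twice.
inner_str = json.loads(message_bytes.decode('utf-8'))
event_obj = json.loads(inner_str)
print(event_obj["action"])  # click
```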
Flink Kafka consumers
In this scenario, Flink usually sits downstream of Kafka, so here is a brief introduction to creating a Kafka Consumer and consuming data in Flink. The code snippet is as follows:
final ParameterTool parameterTool = ParameterTool.fromArgs(args);
String kafkaTopic = parameterTool.get("kafka-topic", "ikf-demo");
String brokers = parameterTool.get("brokers", "XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092");
Properties kafkaProps = new Properties();
kafkaProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "ikf-demo");
FlinkKafkaConsumer kafka = new FlinkKafkaConsumer(kafkaTopic, new UserBehaviorEventSchema(), kafkaProps);
kafka.setStartFromLatest();
kafka.setCommitOffsetsOnCheckpoints(false);
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource dataStreamByEventTime = env.addSource(kafka);
The above is the code snippet for building Flink Kafka Consumer and adding Kafka Source, which is still very simple.
Pressure test verification
At this point, the entire data collection architecture has been built; let's verify its performance with a load test. Aliyun PTS is used here for the stress test.
Create a stress test scene
Open the PTS console and, in the left menu, click Create Stress Test / Create PTS Scene.
In the scenario configuration, use the HTTP interface exposed by the first Function Compute function as the serial link.
After the interface is configured, let's configure the pressure:
Stress mode: concurrent mode specifies how many concurrent virtual users send requests at the same time; RPS mode specifies the number of requests per second.
Incremental mode: during the stress test, the pressure can be adjusted manually or increased automatically by percentage.
Maximum concurrency: how many virtual users initiate requests at the same time.
Incremental percentage: if incrementing automatically, increase by this percentage each step.
Duration per level: how long each pressure gradient is held before full pressure is reached.
Total test duration: the total time the stress test runs.
Here, because of resource cost, the number of concurrent users is set to 2,500 for verification.
Judging from the stress test in the figure above, TPS reached the ceiling of 20,000, and 99.99% of the 5.49-million-plus requests succeeded. The 369 exceptions can also be inspected by clicking through; all were caused by request timeouts in the stress testing tool.
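A quick self-consistency check on the reported numbers (reading "549w+" as roughly 5.49 million requests, with w = 10,000):

```python
total_requests = 5_490_000   # "549w+" requests (w = 10,000)
failed = 369                 # timeouts reported by the stress test tool

success_rate = (total_requests - failed) / total_requests * 100
print(f"success rate: {success_rate:.4f}%")  # 99.9933%
```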
At this point, the whole Serverless-based architecture for big data collection and transmission has been built and verified by stress testing. The overall performance is good, and the architecture is simple and easy to understand. It applies not only to the game operation industry, but to any scenario of big data collection and transmission. Many customers are already running the Serverless architecture in production, or are on the way to adopting it.
The above is how Serverless solves the pain points of data collection and analysis. The editor believes there are knowledge points here that we may see or use in daily work; I hope you can learn more from this article.