A Brief Discussion of the Lambda and Kappa Architectures and the Computational Exploration of Immutable Data (04/24 Update, SLTechnology News&Howtos)


2025-04-24 Update. From: SLTechnology News&Howtos (Shulou)

Shulou (Shulou.com) 06/03 Report

The Lambda architecture is conceptually simple: it combines the components of a distributed system to build a robust, scalable, low-latency distributed computing system. It takes its name from the lambda calculus because its core principle is that data stays immutable and computations stay independent of one another throughout processing.

First, let's look at where the name comes from. The lambda calculus is the formal foundation of functional programming, a programming paradigm with the following characteristics:

♦ Immutability of data: operations on data have no side effects.

♦ Non-dependency of data: given the same input, a function always returns the same result.

♦ Functions are first-class: functions have the same standing as other data types. A function can be assigned to a variable, passed to another function as an argument, or returned as the value of another function.
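To make these three properties concrete, here is a minimal Python sketch. Python is chosen only for illustration, and the names `append_event`, `total`, and `compose` are hypothetical, not part of any framework.

```python
from typing import Callable, Tuple

Events = Tuple[int, ...]  # tuples are immutable, so they model unchangeable data

def append_event(events: Events, value: int) -> Events:
    """Immutability: 'updating' builds a new tuple; the input is left untouched."""
    return events + (value,)

def total(events: Events) -> int:
    """Non-dependency: the result depends only on the argument (no hidden state)."""
    return sum(events)

def compose(f: Callable[[Events], int], g: Callable[[int], int]) -> Callable[[Events], int]:
    """First-class functions: functions are passed in and returned like any value."""
    return lambda events: g(f(events))

original: Events = (3, 4)
updated = append_event(original, 5)
doubled_total = compose(total, lambda x: x * 2)

print(original)                          # (3, 4): unchanged
print(updated)                           # (3, 4, 5)
print(total(updated) == total(updated))  # True: same input, same output
print(doubled_total(updated))            # 24
```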

Nathan Marz of Twitter believed that the essential logic of big data computation coincides with the ideas of functional programming, so, drawing on his many years of experience developing distributed data systems, he proposed the Lambda architecture. (Marz is the author of Storm, an Apache Software Foundation (ASF) top-level project and an excellent distributed stream processing system.) Let's take a look at the Lambda architecture Marz proposed:

As the figure above shows, a typical Lambda architecture is divided into three layers: the Batch Layer, the Speed Layer, and the Serving Layer.

Batch Layer

Speed Layer

Serving Layer

Here is how the three layers divide the work. New data serves as the source for the whole system. The Batch Layer, the batch-processing tier, processes the raw data and loads the resulting Batch View into the Serving Layer. (This corresponds to the full data set.)

The Speed Layer processes newly arriving data in real time and produces a Realtime View of the results computed over this incremental data. (This corresponds to the incremental data.)

End-user queries obtain the final result by merging the Batch View and the Realtime View.

Over time, the results in the Batch View gradually replace those in the Realtime View. The business layer can query the Batch View provided by the Serving Layer with low latency, while the Realtime View reflects business results in real time.
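The division of labor among the three layers can be sketched as a toy, single-process Python model. The assumptions are ours, not the article's: events are hypothetical `(timestamp, key)` pairs, both views are simple per-key counts, and a timestamp `cutoff` marks which events the last batch run covered. A real deployment would use systems such as Hadoop or Spark for the Batch Layer and Storm for the Speed Layer.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[int, str]   # hypothetical (timestamp, key) record
View = Dict[str, int]     # per-key counts

def batch_layer(master: List[Event], cutoff: int) -> View:
    """Batch Layer: recompute the Batch View from scratch over events before `cutoff`."""
    return dict(Counter(key for ts, key in master if ts < cutoff))

def speed_layer(realtime: View, event: Event) -> View:
    """Speed Layer: fold one new event into a new Realtime View (input left unchanged)."""
    _, key = event
    updated = dict(realtime)
    updated[key] = updated.get(key, 0) + 1
    return updated

def serving_query(batch_view: View, realtime_view: View, key: str) -> int:
    """Serving Layer: answer a query by merging the Batch View and the Realtime View."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

master: List[Event] = [(1, "a"), (2, "b"), (3, "a")]
batch_view = batch_layer(master, cutoff=4)   # {'a': 2, 'b': 1}
realtime_view: View = {}

# New data arrives after the batch run: it is appended to the master dataset
# (existing records are never changed) and folded into the Realtime View.
for event in [(4, "a"), (5, "b")]:
    master.append(event)
    realtime_view = speed_layer(realtime_view, event)

print(serving_query(batch_view, realtime_view, "a"))  # 3

# The next batch run covers everything before t=6, so it absorbs those events.
# No event has ts >= 6 yet, so the Realtime View starts empty again.
batch_view = batch_layer(master, cutoff=6)   # {'a': 3, 'b': 2}
realtime_view = {}
print(serving_query(batch_view, realtime_view, "a"))  # 3
```

The query answer stays the same across the batch rerun; only the place where each event is counted moves from the Realtime View to the Batch View.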

In the Lambda architecture, then, all data must satisfy the requirements of immutability and non-dependency. When a data problem occurs (an error, data loss, and so on), you only need to rerun the algorithm over the raw data to recover the required results.
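Recovery by recomputation can be illustrated with a hypothetical purchase dataset. Because the raw records are never modified, fixing a buggy view function and rerunning it over the same data is enough to repair the view. The record format and function names below are assumptions for illustration only.

```python
from typing import Dict, Iterable, Tuple

Purchase = Tuple[str, int]  # hypothetical (user, amount_in_cents)

# The master dataset: an immutable tuple of raw records.
MASTER: Tuple[Purchase, ...] = (("alice", 1200), ("bob", 500), ("alice", 300))

def spend_view_buggy(records: Iterable[Purchase]) -> Dict[str, int]:
    """Buggy view: overwrites instead of summing, so only the last purchase counts."""
    view: Dict[str, int] = {}
    for user, amount in records:
        view[user] = amount
    return view

def spend_view_fixed(records: Iterable[Purchase]) -> Dict[str, int]:
    """Fixed view: total spend per user."""
    view: Dict[str, int] = {}
    for user, amount in records:
        view[user] = view.get(user, 0) + amount
    return view

bad_view = spend_view_buggy(MASTER)
# The raw data was never touched, so recovery is just a rerun with the fixed logic.
good_view = spend_view_fixed(MASTER)
print(bad_view)   # {'alice': 300, 'bob': 500}
print(good_view)  # {'alice': 1500, 'bob': 500}
```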

Below, the author uses a business scenario to briefly explain the Lambda model. The scenario is based only on the author's understanding of e-commerce recommendation; the e-commerce site in question may not actually use the model described here:

1: The picture below shows the advertisements displayed on the XBao home page:

These recommendations can be understood as a Batch View produced by running my personal history data through the Batch Layer. (For example, Spark MLlib or Mahout on Hadoop can analyze historical data to produce recommendations. Such algorithms are usually time-consuming and resource-intensive, so their results can be computed in advance and stored in MySQL, then read directly when the user visits later.)

2: Next, the author searched for MacBook Pro and ThinkPad x207 on XBao. These searches can be processed in real time as stream data by the Speed Layer (using a stream processor such as Storm).

3: The author switched back to the XBao home page and found one more recommended advertisement: a Dell laptop with an 8th-generation CPU and a professional graphics card, bundled with a free half-year Aiqiyi membership card. Clearly, the XBao home page recommendations, composed of the Realtime View from stream processing and the Batch View, respond well to the user's real-time needs.
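The home-page scenario can be modeled as follows. This is a hypothetical sketch, not how XBao actually works: `BATCH_RECS` stands in for the MySQL table of precomputed recommendations, `SEARCH_TO_AD` is an invented lookup the Speed Layer uses to turn searches into ads, and the Serving Layer simply puts fresh real-time items ahead of batch items.

```python
from typing import Dict, List

# Stand-in for a MySQL table filled offline by the Batch Layer (hypothetical data).
BATCH_RECS: Dict[str, List[str]] = {"user42": ["running shoes", "coffee grinder"]}

# Hypothetical mapping the Speed Layer uses to turn a live search into an ad.
SEARCH_TO_AD: Dict[str, str] = {"macbook pro": "laptop promotion", "thinkpad": "laptop promotion"}

def update_realtime_view(view: Dict[str, List[str]], user: str, query: str) -> Dict[str, List[str]]:
    """Speed Layer: map a search to an ad and return a new Realtime View."""
    ad = SEARCH_TO_AD.get(query.lower())
    if ad is None:
        return view
    updated = {k: list(v) for k, v in view.items()}  # copy; never mutate the old view
    items = updated.setdefault(user, [])
    if ad not in items:
        items.append(ad)
    return updated

def home_page(user: str, realtime_view: Dict[str, List[str]]) -> List[str]:
    """Serving Layer: real-time ads first, then precomputed batch ads, without duplicates."""
    merged: List[str] = []
    for item in realtime_view.get(user, []) + BATCH_RECS.get(user, []):
        if item not in merged:
            merged.append(item)
    return merged

realtime: Dict[str, List[str]] = {}
print(home_page("user42", realtime))  # ['running shoes', 'coffee grinder']
for q in ["MacBook Pro", "ThinkPad"]:
    realtime = update_realtime_view(realtime, "user42", q)
print(home_page("user42", realtime))  # ['laptop promotion', 'running shoes', 'coffee grinder']
```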

By combining the results of real-time processing and batch processing, the Lambda architecture responds well to query requirements and balances speed against reliability while remaining scalable. In the Lambda architecture, every query can be expressed as a function:
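Marz's formulation is commonly summarized by the following equation, reconstructed here because the original illustration is not reproduced:

```latex
\text{query} = \text{function}(\text{all data})
```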

The Lambda architecture then subdivides the data and the computing system into layers:
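This subdivision is usually written as three equations, again a reconstruction of Marz's formulation rather than the article's original figure. The Batch View is recomputed from all data, the Realtime View is updated incrementally from new data, and queries merge the two:

```latex
\begin{aligned}
\text{batch view} &= \text{function}(\text{all data}) \\
\text{realtime view} &= \text{function}(\text{realtime view}, \text{new data}) \\
\text{query} &= \text{function}(\text{batch view}, \text{realtime view})
\end{aligned}
```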

However, this architecture also has problems: it requires operating and maintaining two different computing systems and merging their query results, which inevitably increases complexity.

After the Lambda architecture was introduced, Jay Kreps, a technical lead at LinkedIn, raised some questions about it and proposed his own improved version, which he named the Kappa architecture.

The most troublesome problem with the Lambda architecture is that new logic must be coded twice, and that code must be run and debugged on two systems, each with its own operational burden. Kreps therefore argues that trying to build an abstraction layer on top of two different programming paradigms, as the Lambda architecture does, is very difficult.

The Kappa architecture instead tries to handle both kinds of logic with a single stream processing system. Let's look at how it is designed:

The Kappa architecture reprocesses data by relying on the parallelism of the stream processing system, raising parallelism when a replay is needed. Many people assume that stream processing cannot reach the high throughput needed for historical data. Kreps's answer is to retain the complete log of the data you may need to reprocess and replay only that log: if you need to be able to reprocess 30 days of data, configure Kafka to retain 30 days of data.
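Kafka controls how long a topic keeps its records through the topic-level `retention.ms` setting. The sketch below computes that value for a 30-day window and simulates the effect of retention on an in-memory log of hypothetical `(timestamp_ms, payload)` records. It is simplified: Kafka actually deletes whole log segments, so records slightly older than the window may linger for a while.

```python
from datetime import timedelta
from typing import List, Tuple

# To reprocess up to 30 days of history, the topic must retain at least 30 days.
RETENTION = timedelta(days=30)
RETENTION_MS = int(RETENTION.total_seconds() * 1000)

# Kafka topic-level setting, written as a string as in a Kafka properties file.
topic_config = {"retention.ms": str(RETENTION_MS)}
print(topic_config)  # {'retention.ms': '2592000000'}

Record = Tuple[int, str]  # hypothetical (timestamp_ms, payload)

def apply_retention(log: List[Record], now_ms: int, retention_ms: int) -> List[Record]:
    """Keep only records inside the retention window (simplified model of Kafka retention)."""
    return [r for r in log if r[0] >= now_ms - retention_ms]

DAY_MS = 24 * 60 * 60 * 1000
log = [(0, "day 0"), (10 * DAY_MS, "day 10"), (40 * DAY_MS, "day 40")]
print(apply_retention(log, now_ms=40 * DAY_MS, retention_ms=RETENTION_MS))
# [(864000000, 'day 10'), (3456000000, 'day 40')]
```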

When reprocessing is needed, start a second instance of the stream processing job that reads the retained log from the beginning and writes its output to a new output table. When this second job has caught up, switch reads over to the new table, stop the old job, and delete the old output table.
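The switchover Kreps describes can be simulated in Python. Everything here is a stand-in: a list plays the role of the retained Kafka topic, a dict of dicts plays the role of the output tables, and `run_job` (a hypothetical name) processes the log in one pass rather than running continuously. The point is the sequence of steps: replay from offset 0 with the new logic into a new table, switch readers, then drop the old table.

```python
from typing import Callable, Dict, List

Log = List[str]          # simulated retained topic
Table = Dict[str, int]   # simulated output table: key -> count

def run_job(log: Log, logic: Callable[[str], str], start_offset: int = 0) -> Table:
    """Simulated streaming job: consume the log from `start_offset` and count keys."""
    table: Table = {}
    for record in log[start_offset:]:
        key = logic(record)
        table[key] = table.get(key, 0) + 1
    return table

log: Log = ["MacBook", "thinkpad", "macbook", "ThinkPad", "Dell"]
tables: Dict[str, Table] = {}

# Version 1 of the job has a bug: it does not normalize case.
tables["output_v1"] = run_job(log, lambda r: r)
serving_table = "output_v1"   # readers currently query this table

# Reprocessing: a second instance with fixed logic replays the log from offset 0
# into a NEW output table while version 1 keeps serving.
tables["output_v2"] = run_job(log, lambda r: r.lower(), start_offset=0)

# Once version 2 has caught up: switch readers, stop version 1 (nothing to stop
# in this simulation), and delete the old output table.
serving_table = "output_v2"
del tables["output_v1"]

print(tables[serving_table])  # {'macbook': 2, 'thinkpad': 2, 'dell': 1}
```

Because both versions read the same retained log, the stream job is the only code path; there is no separate batch implementation to keep in sync.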

Likewise, the shopping-advertisement example above could also be implemented with the Kappa architecture. The core of the Kappa architecture is to solve both problems with a single paradigm, so there is no need to operate and maintain an additional computing system.

So far, the author has roughly described two distributed computing architectures. In the author's view, the Lambda architecture is an excellent approach to distributed computing, but it requires operating different big data systems and writing the logic twice, which is a real test for developers and operations staff. The Kappa architecture simplifies this model, but because stream processing has difficulty performing complete, heavy batch-style computation, the accuracy of its results can be limited (that is, it is selective about business scenarios). I don't think either architecture is a silver bullet; choosing between them requires a thorough evaluation by developers.

Spark, however, can handle both batch computation and stream computation within a single computing framework, which makes it worth the attention of developers and operations staff.

