How to deal with data flow from RxJS to Flink 07/03 Update SLTechnology News&Howtos

How to deal with data flow from RxJS to Flink

2025-07-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

How to deal with data flow from RxJS to Flink, this article introduces the corresponding analysis and answer in detail, hoping to help more partners who want to solve this problem to find a more simple and easy way.

What is a front-end developer developing?

In the process of front-end development, you may have thought about such a question: what on earth is front-end development? In my opinion, the essence of front-end development is to enable web page views to respond correctly to relevant events. There are three keywords in this sentence: "Web page view", "respond correctly" and "related events".

"related events" may include page clicks, mouse slips, timers, server requests, etc., "correct response" means that we have to modify some state based on related events, and "web view" is the most familiar part of our front-end development.

From this point of view, we can give the formula of view = response function (event):

View = reactionFn (Event)

In front-end development, events that need to be handled can be classified into the following three categories:

The user performs page actions, such as click, mousemove, and so on.

The remote server interacts with local data, such as fetch, websocket.

Local asynchronous events, such as setTimeout, setInterval async_event.

In this way, our formula can be further deduced as:

View = reactionFn (UserEvent | Timer | Remote API) logic processing in the second application

In order to better understand the relationship between this formula and front-end development, we take a news website as an example, which has the following three requirements:

Click Refresh: click Button to refresh the data.

Check refresh: automatically refresh when Checkbox is checked, otherwise stop automatic refresh.

Drop-down refresh: refreshes data when the user pulls down from the top of the screen.

From the front-end point of view, these three requirements correspond to:

Click Refresh: click-> fetch

Check refresh: change-> (setInterval + clearInterval)-> fetch

Drop-down refresh: (touchstart + touchmove + touchend)-> fetch news_app

1 MVVM

In MVVM mode, the response function (reactionFn) corresponding to the above is executed between Model and ViewModel or View and ViewModel, while events (Event) are processed between View and ViewModel.

MVVM can well abstract the view layer and the data layer, but the response function (reactionFn) will be scattered in different conversion processes, which will make it difficult to accurately track the data assignment and collection process. In addition, because the handling of Event is closely related to the view part of the model, it is difficult to reuse the logic of event handling between View and ViewModel.

2 Redux

In the simplest model of Redux, the combination of several events (Event) corresponds to an Action, and the reducer function can be directly thought of as corresponding to the response function (reactionFn) mentioned above.

But in Redux:

State can only be used to describe intermediate states, not intermediate processes.

The relationship between Action and Event is not one-to-one correspondence, which makes it difficult for State to track the actual source of change.

3 responsive programming and RxJS

This is how responsive programming is defined in Wikipedia:

In computing, responsive programming or reactive programming (English: Reactive programming) is a declarative programming paradigm for data flow and change propagation. This means that static or dynamic data streams can be easily expressed in programming languages, and the relevant computing models automatically propagate the changed values through the data streams.

Reconsider the process of using the application in terms of the data flow dimension:

Click button-> trigger refresh event-> send request-> Update View

Check automatic refresh

Finger touch the screen

Automatic refresh interval-> trigger refresh event-> send request-> update view

Fingers slide on the screen

Automatic refresh interval-> trigger refresh event-> send request-> update view

Fingers stop sliding on the screen-> trigger drop-down refresh event-> send request-> update view

Automatic refresh interval-> trigger refresh event-> send request-> update view

Turn off automatic refresh

Represented by a Marbles diagram:

Split the logic above and you get three steps to develop the current news application using responsive programming:

Define the source data stream

Combine / convert data streams

Consume data streams and update views

Let's describe it in detail separately.

Define the source data stream

Using RxJS, we can easily define all kinds of Event data streams.

1) Click action

Involves click data streams.

Click$ = fromEvent (document.querySelector ('button'),' click')

2) check the operation

Involves change data streams.

Change$ = fromEvent (document.querySelector ('input'),' change')

3) drop-down operation

It involves three data streams: touchstart, touchmove and touchend.

Touchstart$ = fromEvent (document, 'touchstart'); touchend$ = fromEvent (document,' touchend'); touchmove$ = fromEvent (document, 'touchmove')

4) regular refresh

Interval$ = interval (5000)

5) Server request

Fetch$ = fromFetch ('https://randomapi.azurewebsites.net/api/users');

Combine / convert data streams

1) Click to refresh the event stream

When clicking refresh, we want multiple clicks in a short period of time to trigger only the last one, which can be achieved through RxJS's debounceTime operator.

ClickRefresh$ = this.click$.pipe (debounceTime)

2) automatically refresh the stream

The switchMap using RxJS works with the previously defined interval$ data flow.

AutoRefresh$ = change$.pipe (switchMap (enabled = >) enabled? Interval$: EMPTY)

3) drop-down refresh stream

Combine the previously defined touchstart$touchmove$ and touchend$ data streams.

PullRefresh$ = touchstart$.pipe (switchMap (touchStartEvent = > touchmove$.pipe (map (touchMoveEvent = > touchMoveEvent.touches [0] .pageY), takeUntil (touchend$)), filter (position = > position > = 300), take (1), repeat ()

Finally, we merge the defined clickRefresh$autoRefresh$ with pullRefresh$ through the merge function to get the refresh data flow.

Refresh$ = merge (clickRefresh$, autoRefresh$, pullRefresh$))

Consume data streams and update views

Flattening the refresh data stream directly through switchMap to the defined fetch$, in the first step, we get the view data flow.

You can map a view flow directly to a view by directly async pipe in the Angular framework:

In other frameworks, the real data in the data stream can be obtained through subscribe, and then the view can be updated.

So far, we have completed the current news application using responsive programming, and the sample code [1] is developed by Angular with no more than 160lines.

Let's summarize the relationship between the three processes experienced in developing a front-end application using the idea of responsive programming and the formula in section 1:

View = reactionFn (UserEvent | Timer | Remote API)

1) describe the source data stream

Corresponding to event UserEvent | Timer | Remote API. The corresponding functions in RxJS are:

UserEvent: fromEvent

Timer: interval, timer

Remote API: fromFetch, webSocket

2) Combinatorial conversion data stream

Corresponding to the response function (reactionFn), some of the corresponding methods in RxJS are:

COMBINING: merge, combineLatest, zip

MAPPING: map

FILTERING: filter

REDUCING: reduce, max, count, scan

TAKING: take, takeWhile

SKIPPING: skip, skipWhile, takeLast, last

TIME: delay, debounceTime, throttleTime

3) update the view of consumer data stream

Corresponding to View, it can be used in RxJS and Angular:

Async pipe

What are the advantages of responsive programming over MVVM or Redux?

Describes the occurrence of the event itself, rather than the computational process or intermediate state.

It provides a way to combine and transform data streams, which also means that we have a way to reuse continuously changing data.

Since all data streams are obtained by layers of composition and transformation, this means that we can accurately track the source of events and data changes.

If we blur the timeline of RxJS's Marbles diagram and add vertical sections each time the view is updated, we will find two interesting things:

Action is a simplification of EventStream.

State is the correspondence of Stream at some point.

No wonder we can have such a sentence on the Redux website: if you have already used RxJS, chances are that you no longer need Redux.

The question is: do you really need Redux if you already use Rx? Maybe not. It's not hard to re-implement Redux in Rx. Some say it's a two-liner using Rx.scan () method. It may very well be!

At this point, can we further abstract the statement that the web page view can correctly respond to the relevant events?

All events-- find-- > related events-- make-- > respond

Events that occur in chronological order are essentially data streams, which can be further extended to:

Source data stream-transform-> Intermediate data stream-subscription-> consumption data stream

This is the basic idea that responsive programming works perfectly at the front end. But is this idea only applied in front-end development?

The answer is no, this idea can be applied not only to front-end development, but also to back-end development and even real-time computing.

Third, break the wall of information

The front and back end developers are usually separated by an information wall called REST API. REST API separates the responsibilities of the front and back end developers and improves the development efficiency. But it also separates the horizons of front and rear developers by this wall, so let's try to tear down this wall of information and take a glimpse of the application of the same idea in real-time computing.

1 Real-time computing and Apache Flink

Before moving on to the next section, let's introduce Flink. Apache Flink is an open source stream processing framework developed by the Apache Software Foundation for stateful computing on borderless and bounded data streams. Its data flow programming model provides single event (event-at-a-time) processing capabilities on finite and infinite datasets.

In practical applications, Flink is usually used to develop the following three applications:

Event-driven application event-driven applications extract data from one or more event streams and trigger calculations, status updates, or other external actions based on incoming events. Scenarios include rule-based alarm, anomaly detection, anti-fraud, and so on.

Data analysis application data analysis task needs to extract valuable information and indicators from the original data. For example, Singles' Day turnover calculation, network quality monitoring and so on.

Data Pipeline (ETL) Application extract-transform-load (ETL) is a common method for data conversion and migration between storage systems. ETL jobs are usually triggered periodically to copy data to an analytical database or data warehouse.

Here we take the calculation of the hourly turnover of the e-commerce platform Singles Day as an example to see whether the scheme can still be used in the previous chapter.

In this scenario, we first need to obtain the user purchase order data, then calculate the hourly transaction data, then transfer the hourly transaction data to the database and be cached by Redis, and finally display it on the page after obtaining it through the API.

The data flow processing logic in this link is:

User order data stream-transform-> hourly transaction data stream-subscribe-> write to database

As described in the previous chapter:

Source data stream-transform-> Intermediate data stream-subscription-> consumption data stream

Thinking is exactly the same.

If we use Marbles to describe this process, we will get this result, it seems very simple, it seems that the same function can be done using RxJS's window operator, but is this really the case?

2 hidden complexity

Real real-time computing is much more complex than responsive programming in the front end. Here are a few examples:

Events out of order

In the process of front-end development, we will also encounter events out of order. In the most classic case, the request initiated first and then the response is received, which can be represented by the following Marbles diagram. There are many ways to deal with this situation at the front end, and we will skip it here.

What we want to talk about today is the time disorder faced in data processing. In front-end development, we have a very important premise, which greatly reduces the complexity of developing front-end applications, that is, the occurrence time and processing time of front-end events are the same.

Imagine the development complexity of the entire front end if the user performs page actions, such as click, mousemove, and other events become asynchronous events, and the response time is unknown.

However, the occurrence time of the event is different from the processing time, which is an important premise in the field of real-time computing. We still take the calculation of hourly turnover as an example, when the original data stream is transmitted layer by layer, the sequence of the data at the computing node is probably out of order.

If we still divide the window according to the arrival time of the data, the final calculation will produce an error:

In order for the window2 window to calculate correctly, we need to wait for the late event to arrive, but we are faced with a dilemma:

Wait indefinitely: the late event may be lost in transit, and the window2 window will never produce data.

The waiting time is too short: the late event has not arrived yet, and the calculation result is wrong.

Flink introduces Watermark mechanism to solve this problem. Watermark defines when to stop waiting for late event, which essentially provides a compromise between the accuracy of real-time computing and real-time performance.

There is a vivid analogy about Watermark: at school, the teacher will close the door of the class and say, "all the students who come from this point are late and will be punished." In Flink, Watermark acts as the teacher closes the door.

Data backpressure

When using RxJS in browsers, I wonder if you have considered such a situation: when observable is generated faster than operator or observer consumption, a large amount of unconsumed data will be cached in memory. This situation is called backpressure, and fortunately, the creation of data backpressure at the front end will only result in a large amount of browser memory consumption, but there will be no more serious consequences.

But in real-time computing, what should be done when the speed of data generation is faster than the processing capacity of intermediate nodes, or exceeds the consumption capacity of downstream data?

For many streaming applications, data loss is unacceptable, and to ensure this, Flink designed a mechanism:

Ideally, data is buffered in a persistent channel.

When the speed of data generation is higher than the processing capacity of intermediate nodes, or exceeds the consumption capacity of downstream data, the slower receiver will slow down the sender immediately after the buffer of the queue is exhausted. A more vivid analogy is that when the flow rate of the data flow slows down, the entire pipeline is "back-pressurized" from the sink to the water source, and the water source is throttled so that the speed can be adjusted to the slowest part, thus reaching a stable state.

Checkpoint

In the field of real-time computing, there may be billions of data processed per second, and the processing of these data cannot be done independently by a single machine. In fact, in Flink, the operator operation logic will be executed by different subtask on different taskmanager, so we are faced with another problem: when something goes wrong on a machine, how to deal with the overall operation logic and state to ensure the correctness of the final operation result?

Checkpoint mechanism is introduced into Flink to ensure that the state and computing location of the job can be restored, and checkpoint makes the state of Flink have good fault tolerance. Flink uses a variant of the Chandy-Lamport algorithm algorithm called Asynchronous barrier Snapshots (asynchronous barrier snapshotting).

When you start checkpoint, it asks all sources to record their offsets and inserts numbered checkpoint barriers into their streams. These barriers mark the part of the stream before and after each checkpoint as they pass through each checkpoint.

When an error occurs, Flink can restore the state according to the state stored in checkpoint to ensure the correctness of the final result.

The model is universal in both responsive programming and real-time computing. I hope this article can make you think more about the idea of data flow.

This is the answer to the question about how to deal with the data flow from RxJS to Flink. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel to learn more about it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.