How to optimize the data repository for Java multithreaded applications

Updated 2025-03-01. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article shows how to optimize the use of data repositories in Java multithreaded applications by batching queries issued by concurrent threads.

Data repositories are often the bottleneck of demanding systems. In these systems, the number of queries being executed is very high. DelayedBatchExecutor is a component used to reduce the number of required queries by batching them in Java multithreaded applications.

n queries with 1 parameter vs. 1 query with n parameters

Suppose you have a Java application that performs a query against a relational database to retrieve a Product entity (row) given its unique identifier (id).

The query is as follows:

SELECT * FROM PRODUCT WHERE ID = <productId>

Now, to retrieve n Products, there are two ways:

Perform n independent queries with 1 parameter:

SELECT * FROM PRODUCT WHERE ID = <id1>
SELECT * FROM PRODUCT WHERE ID = <id2>
...
SELECT * FROM PRODUCT WHERE ID = <idn>

Perform 1 query with n parameters to retrieve the n Products at once, using the IN operator or a chain of ORs:

-- Example using the IN operator
SELECT * FROM PRODUCT WHERE ID IN (<id1>, <id2>, ..., <idn>)

The latter is more efficient in terms of network traffic and database server resources (CPU and disk) because:

The number of roundtrips to the database is 1, not n.

The database engine can optimize data access for the n parameters, so a table may need to be scanned only once instead of n times.

This applies not only to SELECT operations but also to INSERTs, UPDATEs, and DELETEs. In fact, the JDBC API includes batch operations for these statements.

The same applies to NoSQL repositories, most of which explicitly provide BULK operations.
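As a minimal sketch of the single-query form described above, the helper below builds an IN query with n placeholders and binds the ids through a JDBC PreparedStatement. The class and method names (InQueryBuilder, buildInQuery, bindIds) are hypothetical and chosen only for illustration; the PRODUCT table and ID column come from the example query.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
final class InQueryBuilder {

    private InQueryBuilder() { }

    // Builds "SELECT * FROM PRODUCT WHERE ID IN (?, ?, ..., ?)" with n placeholders.
    static String buildInQuery(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(n, "?"));
        return "SELECT * FROM PRODUCT WHERE ID IN (" + placeholders + ")";
    }

    // Binds each id to its placeholder; JDBC parameter indexes start at 1.
    static void bindIds(PreparedStatement statement, List<Integer> ids) throws SQLException {
        for (int i = 0; i < ids.size(); i++) {
            statement.setInt(i + 1, ids.get(i));
        }
    }
}
```

Using placeholders rather than concatenating the ids into the SQL text keeps the statement safe from injection. Many databases limit the number of elements allowed in an IN list, so very large id lists may need to be split into chunks.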

DelayedBatchExecutor

Java applications that need to retrieve data from a database, such as REST microservices or asynchronous message processors, are typically implemented as multithreaded applications (*1) where:

Each thread executes the same query (each query has different parameters) at some point in its execution.

The number of concurrent threads is high (tens or hundreds per second).

In this scenario, the database is likely to execute the same query multiple times within a short time interval.

As mentioned earlier, if n queries with 1 parameter are replaced with a single equivalent query with n parameters, the application uses fewer database server and network resources.

The good news is that this can be achieved with a time-window mechanism, as follows:

The thread that first attempts to execute the query opens a time window, so its parameters are stored in a list, and the thread is suspended. The remaining threads executing the same query within the time window add their parameters to the list and are also paused. At this point, no queries are executed on the database.

When the time window ends or the list is full (the maximum capacity limit was previously defined), a single query is executed using all parameters stored in the list. Finally, once the database provides the results of the query, each thread receives the corresponding results, and all threads automatically resume.
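The mechanism above can be sketched in plain Java. This is not the DelayedBatchExecutor source (which is based on Reactor); it is a simplified illustration built on java.util.concurrent, and the class name TimeWindowBatcher and its members are hypothetical. It assumes the batch function returns one result per parameter, in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration of the time-window idea; not the DelayedBatchExecutor source.
final class TimeWindowBatcher<A, Z> {

    private final long windowMillis;
    private final int maxSize;
    private final Function<List<A>, List<Z>> batchQuery;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "time-window-batcher");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });

    // The three fields below are guarded by "this".
    private List<A> params = new ArrayList<>();
    private List<CompletableFuture<Z>> waiters = new ArrayList<>();
    private long windowId = 0;

    TimeWindowBatcher(long windowMillis, int maxSize, Function<List<A>, List<Z>> batchQuery) {
        this.windowMillis = windowMillis;
        this.maxSize = maxSize;
        this.batchQuery = batchQuery;
    }

    // Blocks the calling thread until the batch containing its parameter has run.
    Z execute(A param) {
        CompletableFuture<Z> result = new CompletableFuture<>();
        Runnable flush = null;
        synchronized (this) {
            if (params.isEmpty()) {
                // The first caller opens the window and schedules its end.
                long openedWindow = windowId;
                timer.schedule(() -> flushIfStillOpen(openedWindow), windowMillis, TimeUnit.MILLISECONDS);
            }
            params.add(param);
            waiters.add(result);
            if (params.size() >= maxSize) {
                flush = closeWindow(); // list is full: close the window early
            }
        }
        if (flush != null) {
            flush.run(); // run the single query outside the lock
        }
        return result.join();
    }

    private void flushIfStillOpen(long openedWindow) {
        Runnable flush;
        synchronized (this) {
            if (openedWindow != windowId) {
                return; // that window was already closed because the list filled up
            }
            flush = closeWindow();
        }
        flush.run();
    }

    // Must be called while holding the lock. Detaches the current batch and
    // returns the work that executes one query and resumes every waiting thread.
    private Runnable closeWindow() {
        List<A> batchParams = params;
        List<CompletableFuture<Z>> batchWaiters = waiters;
        params = new ArrayList<>();
        waiters = new ArrayList<>();
        windowId++;
        return () -> {
            try {
                List<Z> results = batchQuery.apply(batchParams);
                for (int i = 0; i < batchWaiters.size(); i++) {
                    batchWaiters.get(i).complete(results.get(i));
                }
            } catch (RuntimeException e) {
                for (CompletableFuture<Z> waiter : batchWaiters) {
                    waiter.completeExceptionally(e);
                }
            }
        };
    }
}
```

In this sketch the query runs either on the thread that filled the list or on the single timer thread, so a slow query delays the end of later windows. A production implementation would likely hand the query to a separate executor.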

The author has built a simple, lightweight implementation of this mechanism (DelayedBatchExecutor) that is easy to use in new or existing applications. It is based on the Reactor library and uses a Flux that buffers the parameter list with a timeout.

Throughput and Delay Analysis with DelayedBatchExecutor

Suppose a REST microservice for Products exposes an endpoint that retrieves the Product data for a given productId from the database. Without DelayedBatchExecutor, if the endpoint receives 200 hits per second, the database executes 200 queries per second. If the endpoint uses a DelayedBatchExecutor configured with a time window of 50 ms and a maximum capacity of 10 parameters, the database executes only 20 queries per second, each with 10 parameters, at the cost of increasing the latency of each thread by up to 50 ms (*2).

In other words, in exchange for up to 50 ms of added latency (*2), the database receives 10 times fewer queries per second while the overall throughput of the system is maintained. Not bad!

Other interesting configurations:

Window time = 100 ms, maximum capacity = 20 parameters → 10 queries per second with 20 parameters each (20 times fewer queries)

Window time = 500 ms, maximum capacity = 100 parameters → 2 queries per second with 100 parameters each (100 times fewer queries)
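The figures above follow from simple arithmetic, sketched below under the assumption that requests arrive evenly over time. The class and method names are hypothetical. A batch holds whatever arrives during one window, capped at the maximum capacity, and the database sees the hit rate divided by that batch size.

```java
// Hypothetical helper that reproduces the arithmetic of the configurations above.
final class BatchingEstimate {

    private BatchingEstimate() { }

    // Parameters collected per batch: arrivals during one window, capped at maxSize.
    static long batchSize(long hitsPerSecond, long windowMillis, long maxSize) {
        long arrivalsPerWindow = hitsPerSecond * windowMillis / 1000;
        return Math.max(1, Math.min(maxSize, arrivalsPerWindow));
    }

    // Queries per second reaching the database, rounded up.
    static long queriesPerSecond(long hitsPerSecond, long windowMillis, long maxSize) {
        long size = batchSize(hitsPerSecond, windowMillis, maxSize);
        return (hitsPerSecond + size - 1) / size;
    }
}
```

For 200 hits per second, queriesPerSecond(200, 50, 10) gives 20, queriesPerSecond(200, 100, 20) gives 10, and queriesPerSecond(200, 500, 100) gives 2. Real traffic is bursty, so actual batch sizes will vary around these estimates.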

DelayedBatchExecutor in execution

Let's look more closely at the Product microservice example. Assume that for every incoming HTTP request, the microservice controller retrieves a Product (a Java bean) by its id by invoking the following method of the DAO component (ProductDAO):

public Product getProductById(Integer productId)

The following are the DAO implementations without and with DelayedBatchExecutor.

Without DelayedBatchExecutor

    public class ProductDAO {

        public Product getProductById(Integer id) {
            Product product = ... // execute the query SELECT * FROM PRODUCT WHERE ID = <id>
                                  // using your favourite API: JDBC, JPA, Hibernate...
            return product;
        }
        ...
    }

With DelayedBatchExecutor

    import java.time.Duration;
    import java.util.List;

    // Singleton
    public class ProductDAO {

        DelayedBatchExecutor2<Product, Integer> delayedBatchExecutorProductById =
                DelayedBatchExecutor.define(Duration.ofMillis(50), 10, this::retrieveProductsByIds);

        public Product getProductById(Integer id) {
            Product product = delayedBatchExecutorProductById.execute(id);
            return product;
        }

        private List<Product> retrieveProductsByIds(List<Integer> idList) {
            List<Product> productList = ... // execute the query:
                                            // SELECT * FROM PRODUCT WHERE ID IN (idList.get(0), ..., idList.get(n))
                                            // using your favourite API: JDBC, JPA, Hibernate...
            // The positions of the elements of the list to return must match the ones in the parameters list.
            // For instance, the first Product of the list to be returned must be the one with
            // the id in the first position of idList, and so on.
            // NOTE: null can be used as a value, meaning that no Product exists for the given id.
            return productList;
        }
        ...
    }

First, you must create a DelayedBatchExecutor instance in the DAO, in this case delayedBatchExecutorProductById. The following three parameters are required:

Time window (50 ms in this example)

Maximum size of parameter list (10 parameters in this example)

The method to be invoked with the parameter list (see below for details). In this example, the method is retrieveProductsByIds

Second, the DAO method public Product getProductById(Integer productId) has been refactored to simply call the execute method of the delayedBatchExecutorProductById instance. All the "magic" is done by DelayedBatchExecutor.

delayedBatchExecutorProductById is a DelayedBatchExecutor2 because its execute method takes one argument (an Integer) and returns a Product instance. A DelayedBatchExecutor3 would be defined if the execute method took two arguments (for example, an Integer and a String) and returned a Product instance.

Finally, the retrieveProductsByIds method must return a List<Product> and accept a List<Integer> as its argument.

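If DelayedBatchExecutor3 is used, the method accepts two lists instead, one for each argument (for example, a List<Integer> and a List<String>).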
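An IN query usually returns rows in no guaranteed order and omits ids that have no row, while the code comments above require the returned list to match the parameter list position by position, with null for missing entries. One way to meet that requirement is sketched below. The class ResultAligner and its method are hypothetical helpers and are not part of DelayedBatchExecutor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class ResultAligner {

    private ResultAligner() { }

    // Returns a list in which position i holds the row whose key equals ids.get(i),
    // or null when the query returned no row for that id.
    static <K, V> List<V> alignToIds(List<K> ids, List<V> rows, Function<V, K> keyOf) {
        Map<K, V> rowsByKey = new HashMap<>();
        for (V row : rows) {
            rowsByKey.put(keyOf.apply(row), row);
        }
        List<V> aligned = new ArrayList<>(ids.size());
        for (K id : ids) {
            aligned.add(rowsByKey.get(id)); // null when the id has no row
        }
        return aligned;
    }
}
```

Inside retrieveProductsByIds, this could be called as alignToIds(idList, rowsFromQuery, Product::getId), assuming Product exposes a getId accessor. An id that appears twice in the parameter list receives the same row in both positions.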

That's it.

Once running, the concurrent threads executing the controller logic call getProductById(Integer id) at some point, and this method returns the corresponding Product. The threads are unaware that they have been suspended and resumed by the DelayedBatchExecutor.

Beyond data repositories

Although this article focuses on data repositories, DelayedBatchExecutor can also be used elsewhere, for example when making requests to REST microservices. There too, issuing n GET requests with 1 parameter is usually much more expensive than issuing 1 GET request with n parameters.

Room for improvement in DelayedBatchExecutor

The author created DelayedBatchExecutor and has used it for some time in personal projects, where it effectively handled the many queries issued by concurrent threads. Believing it might be useful to others, he decided to make it public.

That said, there is plenty of room to improve and extend DelayedBatchExecutor. The most interesting addition would be the ability to change its parameters (window time and maximum capacity) dynamically, depending on the conditions at execution time, in order to minimize latency while still benefiting from queries with n parameters.
