This article introduces how to handle large objects under high concurrency. It walks through several practical strategies for making objects smaller and keeping operations focused, and I hope it helps resolve common doubts about the topic.
Developers who have spent years in high-concurrency Internet work follow some unwritten rules when writing code: they would rather split a request into ten 1-second requests than issue a single 5-second request, and they would rather split an object into a thousand 10KB pieces than produce a single 1MB object.
Why? This is the fear of "big".
"large object" is a generalized concept that may be stored in JVM, may be being transmitted over the network, or may exist in a database.
Why do large objects affect the performance of our applications? There are three reasons.
Large objects take up a lot of resources, and the garbage collector has to spend part of its effort reclaiming them.
Large objects are exchanged between different devices, which consumes network bandwidth and expensive I/O.
Parsing and processing large objects is time-consuming, and if an object's responsibilities are not focused, it incurs additional performance overhead.
Next, xjjdog will take a step-by-step look at some strategies for making objects smaller and keeping operations focused, along both the structural dimension and the time dimension of the data.
1. Substring method of String
As we all know, String is immutable in Java, and if you change its contents, it will generate a new string.
If we want to use part of the data in the string, we can use the substring method.
When we need a substring, the substring method generates a new string, which is constructed through Arrays.copyOfRange in the String constructor.
This method is fine from JDK 7 onward, but in JDK 6 it carries a risk of memory leaks. We can look at this case to see the problems that can arise from reusing large objects.
The relevant source is in the official JDK 6. As you can see, when a substring is created, it does not copy only the characters it needs; instead the new string references the entire original value array. If the original string is large, that memory will not be freed even when only the substring is still in use.
For example, an article may contain several MB of content while we only need its summary, so we must not keep a reference to the whole large object.
String content = dao.getArticle(id);
String summary = content.substring(0, 100);
articles.put(id, summary);
The lesson for us: if you create a large object and derive other information from it, be sure to sever the reference to that large object.
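A minimal sketch of that advice, assuming the pre-JDK 7 substring behavior described above (dao and articles are the same placeholder names used in the snippet):

String content = dao.getArticle(id);                      // may be several MB
// new String(...) copies only the 100 summary characters,
// so the large original char array is no longer referenced and can be collected
String summary = new String(content.substring(0, 100));
articles.put(id, summary);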
2. Capacity expansion of large collections
Capacity expansion is a common phenomenon for Java objects, for example StringBuilder, StringBuffer, HashMap, ArrayList, and so on. In short, the amount of data in Java collections, including List, Set, Queue, Map, and so on, is not known in advance; when the capacity becomes insufficient, an expansion operation takes place.
Let's first take a look at the expansion code of StringBuilder.
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}

When the capacity is insufficient, the buffer is doubled and Arrays.copyOf is used to copy the source data.

Below is the expansion code of HashMap; after expansion its size is also doubled. Its expansion is much more complex: besides the influence of the load factor, it also has to rehash the original data. Since it cannot use the native Arrays.copy mechanism, it is much slower.

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
You can look at the code of List yourself; its expansion is also a blocking copy, and the growth strategy is 1.5 times the original length.
Because collections are used very frequently in code, if you know the upper bound on the number of items, it is worth setting a reasonable initial size. For example, a HashMap that has to hold 1024 elements will be expanded seven times starting from its default size, which hurts application performance.
Note, however, that for collections with a load factor, such as HashMap, the initial size should be: number of elements needed / load factor + 1. If you are unsure about the underlying details, you might as well keep the default.
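A minimal sketch of this advice (the 1024 figure is taken from the example above; everything else is illustrative):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreSizedCollections {
    public static void main(String[] args) {
        int expected = 1024;

        // ArrayList: reserve the final capacity up front to avoid repeated 1.5x grow-and-copy
        List<Integer> ids = new ArrayList<>(expected);

        // HashMap: initial size = expected / loadFactor + 1, so no resize happens while filling it
        Map<Integer, String> users = new HashMap<>((int) (expected / 0.75f) + 1);

        for (int i = 0; i < expected; i++) {
            ids.add(i);
            users.put(i, "user-" + i);
        }
    }
}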
3. Maintain appropriate object granularity
I once worked on a business system with very high concurrency that frequently needed the user's basic data. Because that basic information was stored in another service, every use of it required a network round trip. What was even harder to accept was that even when only the user's gender attribute was needed, all of the user's information had to be queried and pulled over.
To speed up access, the data was first cached in redis. Query performance improved greatly, but a lot of redundant data still had to be queried each time.
The original redis key was designed like this.
Type: string
Key: user_${userid}
Value: json
There are two problems with this design: (1) to read a single field, you have to fetch the whole json value and parse it yourself; (2) to update a single field, you have to rewrite the entire json string, which is expensive.
For this kind of large-grained json information, it can be optimized by breaking up, so that every update and query has a focused goal.
Next, the data in redis is designed as follows, using hash structure instead of json structure:
Type: hash
Key: user_${userid}
Value: {sex: f, id: 1223, age: 23}
In this way, we can use the hget command or the hmget command to get the data we want and speed up the flow of information.
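A minimal sketch of this access pattern with the Jedis client (the key layout follows the design above; the field values are just illustrative):

import redis.clients.jedis.Jedis;
import java.util.List;

public class UserHashExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String key = "user_1223";

            // write individual fields of the hash
            jedis.hset(key, "sex", "f");
            jedis.hset(key, "id", "1223");
            jedis.hset(key, "age", "23");

            // read only the field we care about
            String sex = jedis.hget(key, "sex");

            // or read a focused subset of fields
            List<String> sexAndAge = jedis.hmget(key, "sex", "age");

            System.out.println(sex + " " + sexAndAge);
        }
    }
}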
4. Bitmap makes objects smaller
Can this be further optimized? For example, our system frequently uses the user's gender to hand out gifts, recommend friends of the opposite sex, or periodically loop over users for cleanup actions; or it stores user status information, such as whether the user is online, has checked in, or has sent a message recently, and counts active users.
Such yes/no values can be compressed with a Bitmap structure.
As the code below shows, a single int can hold 32 boolean values, one per bit:
int a = 0b0001_0001_1111_1101_1001_0001_1111_1101;
Bitmap is a data structure that records data with bits: each stored value is either 0 or 1. The related class in Java is java.util.BitSet, which is backed by a long array, so its minimum capacity is 64.
One billion boolean values need only about 128MB of memory. The following gender-determination logic uses two such BitSets, about 256MB in total, and can cover one billion user ids.
static BitSet missSet = new BitSet(010000000000);   // octal literal: 2^30 bits, roughly 1 billion
static BitSet sexSet = new BitSet(010000000000);

String getSex(int userId) {
    boolean notMiss = missSet.get(userId);
    if (!notMiss) {
        // lazy fetch
        String lazySex = dao.getSex(userId);
        missSet.set(userId, true);
        sexSet.set(userId, "female".equals(lazySex));
    }
    return sexSet.get(userId) ? "female" : "male";
}
Even so, this data is still too large to keep in heap memory. Fortunately, Redis also supports the Bitmap structure; if memory is under pressure, we can move this structure into Redis, and the judgment logic is similar.
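A minimal sketch of the same lazy-loading logic backed by Redis bitmaps via Jedis (the key names and the SexDao stub are assumptions, not part of the original design):

import redis.clients.jedis.Jedis;

public class RedisSexBitmap {
    interface SexDao { String getSex(long userId); }   // hypothetical slow lookup
    private final SexDao dao = userId -> "female";     // stub standing in for the real service

    private final Jedis jedis = new Jedis("localhost", 6379);

    String getSex(long userId) {
        // "sex:miss" records whether the user has been loaded; "sex:value" holds the gender bit
        boolean loaded = jedis.getbit("sex:miss", userId);
        if (!loaded) {
            String lazySex = dao.getSex(userId);
            jedis.setbit("sex:miss", userId, true);
            jedis.setbit("sex:value", userId, "female".equals(lazySex));
        }
        return jedis.getbit("sex:value", userId) ? "female" : "male";
    }
}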
There are many similar problems: given a machine with 1GB of memory and 6 billion int values, how can you quickly tell which values are duplicated? You can reason about it by analogy.
Bitmap is a relatively low-level structure; on top of it there is a structure called the Bloom filter. A Bloom filter can determine that a value definitely does not exist, or that it may exist.
Compared with Bitmap, it has one more layer of hash algorithm. Since it is a hash algorithm, there will be conflicts, so it is possible that multiple values fall on the same bit.
There is a BloomFilter class in Guava, which can easily implement related functions.
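A minimal sketch using Guava's BloomFilter (the expected-insertion count and false-positive rate are just illustrative assumptions):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterExample {
    public static void main(String[] args) {
        // expect about 1 million user ids, with roughly a 1% false-positive rate
        BloomFilter<Integer> seenUsers =
                BloomFilter.create(Funnels.integerFunnel(), 1_000_000, 0.01);

        seenUsers.put(1223);

        // false means "definitely not present"; true only means "possibly present"
        System.out.println(seenUsers.mightContain(1223));   // true
        System.out.println(seenUsers.mightContain(9999));   // almost certainly false
    }
}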
5. Hot and cold separation of data
In essence, the optimizations above are also ways of turning large objects into small ones, and there are many similar ideas in software design. For a newly published article, the summary is what is used frequently, so there is no need to query the whole text; for a user's feed, only the visible information needs to be fast, while the complete data is kept in a slower, larger store.
Besides the horizontal, structural dimension, data also has a vertical, time dimension. The most effective optimization along the time dimension is hot/cold separation.
So-called hot data is data that is close to the user and frequently used, while cold data has a very low access frequency and is very old. The same complex SQL performs far worse on a table with tens of millions of rows than on one with millions of rows. So although your system is fast when it first goes live, it gradually slows down as the volume of data grows over time.
Hot/cold separation splits the data into two parts. Generally a full copy of the data is maintained in the cold store and used for time-consuming statistical operations.
The following is a brief introduction to three schemes for hot/cold separation.
(1) Double writing. Put the insert, update, and delete operations on the hot store and the cold store into a single transaction. Because the hot store (such as MySQL) and the cold store (such as HBase) are of different types, this transaction is likely to be a distributed one. At the start of a project this approach is feasible, but when retrofitting legacy systems, distributed transactions are basically not practical, so I usually drop this option outright.
(2) Writing to MQ for distribution. Using MQ's publish/subscribe capability, the data is not written to the database directly but sent to MQ instead. Separate consumer processes then write the data from MQ into the cold store and the hot store respectively. Business transformed this way has clear logic and an elegant structure. Systems such as orders, which have a clear structure and low ordering requirements, can be distributed through MQ. But if you have a very large number of database entities, you have to weigh the complexity this adds to the program. (A minimal sketch of this pattern follows the three schemes below.)
(3) Binlog synchronization. For MySQL, you can synchronize via the binlog. Using the Canal component, you can continuously obtain the latest binlog data and, combined with MQ, synchronize it to other data sources.
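Below is a minimal sketch of scheme (2). The HotColdFanOut class, its Store interface, and the in-memory BlockingQueue standing in for a real MQ topic are all illustrative assumptions; it only shows the fan-out shape, not a production setup:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HotColdFanOut {
    interface Store { void save(String record); }       // hypothetical storage abstraction

    // the queue stands in for a real MQ topic
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Store hotStore  = r -> System.out.println("hot  <- " + r);   // e.g. MySQL
    private final Store coldStore = r -> System.out.println("cold <- " + r);   // e.g. HBase

    // business code publishes instead of writing to the database directly
    void publish(String record) throws InterruptedException {
        queue.put(record);
    }

    // a separate consumer drops each message into the hot and cold stores
    void startConsumer() {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String record = queue.take();
                    hotStore.save(record);
                    coldStore.save(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    public static void main(String[] args) throws InterruptedException {
        HotColdFanOut fanOut = new HotColdFanOut();
        fanOut.startConsumer();
        fanOut.publish("order-1001");
        Thread.sleep(200);   // give the consumer a moment before the JVM exits
    }
}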
End
With regard to large objects, we can give two more examples.
Take our commonly used database indexes: they are also a kind of reorganization and acceleration of data. A B+ tree can effectively reduce the number of interactions between the database and the disk; through a B+-tree-like structure, the most commonly used data is indexed and stored in a limited amount of space.
Then there is serialization, which is commonly used in RPC. Some services still use SOAP-based WebService, an XML-based protocol whose large payloads make transmission slow and inefficient. Nowadays most web services exchange json data, and json is more efficient than SOAP. You have probably also heard of Google's protobuf: because it is a binary protocol and the data is compressed, its performance is excellent. After protobuf compression, the data is only about 1/10 the size of json, yet performance improves by 5-100 times. The design of protobuf is worth borrowing from: it packs data very compactly through its tag | length | value segments, and parsing and transmission are very fast.
For large objects, we have two optimization dimensions: structure and time. In the structural dimension, splitting objects into an appropriate granularity lets operations focus on small data structures and reduces processing cost, and compressing, transforming, or extracting hot data avoids the storage and transmission cost of large objects. In the time dimension, hot/cold separation keeps frequently used data on fast storage, reducing the volume of data to process and speeding up processing.
This concludes the study of how to handle large objects under high concurrency. I hope it resolves your doubts; combining theory with practice is the best way to learn, so go and try it. If you want to keep learning more about this topic, stay tuned.