Example Analysis of Data Loading in Open Source Levin

2025-07-01 Update From: SLTechnology News&Howtos

This article describes the open source project Levin, a solution for lightning-fast data loading.

Internet services often run into the following situation: a service must load a large amount of data into memory, the data set can reach tens of gigabytes, the data is updated infrequently (daily, hourly, or every few minutes), and it is used only for static queries. Typical examples are business order data, offline-mined strategy rules, and map network data. For stability, an online service usually also keeps at least two versions of the data loaded, so startup commonly takes several minutes. The resulting problems include long rollout times and high labor cost, requirements that queue up and cannot be iterated on quickly, slow rollbacks, and a stability risk because a failed instance cannot recover until it has reloaded all of its data.

Levin is a fast loading solution for exactly this kind of data: infrequently updated, statically used, and large in scale. It hosts large-scale static data efficiently and speeds up both the cold start and the hot reload of large-memory services.

1. Principle

For a service, speed is everything. Yet in simple service changes (such as a release, a rollback, or fault recovery) no data changes at all, and the process restart still destroys everything in heap and stack memory, so the data has to be reloaded after startup. Can data be passed between processes and reused? The most efficient way to share data between processes is shared memory. It outlives the process that created it, so it can be reused across processes; it is accessed as efficiently as ordinary in-memory objects; and there is ample address space available for it (see the memory mapping area in the figure below). In other words, you can have both fast startup and efficient queries.

[Figure: Levin1.jpg, showing the memory mapping area]
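The idea that mapped memory can outlive the code that filled it can be sketched as follows. This is a minimal illustration for Linux/POSIX, not Levin's implementation: it uses a file-backed `MAP_SHARED` mapping as a stand-in for a named shared memory segment (on Linux, a file placed under `/dev/shm` is RAM-backed), and the helper names `map_shared`, `write_then_detach`, and `attach_and_read` are hypothetical. Mapping and unmapping twice inside one process simulates two successive service processes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

constexpr std::size_t kSegmentSize = 4096;  // arbitrary size for this sketch

// Hypothetical helper (not a Levin API): map the file at `path` as shared
// memory, creating it if needed. Returns nullptr on failure.
inline void* map_shared(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    // Ensure the backing object has the expected size; existing bytes are kept.
    if (ftruncate(fd, static_cast<off_t>(kSegmentSize)) != 0) {
        close(fd);
        return nullptr;
    }
    void* addr = mmap(nullptr, kSegmentSize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return addr == MAP_FAILED ? nullptr : addr;
}

// Simulates the "old" process: write data into the segment, then detach.
inline bool write_then_detach(const char* path, const char* text) {
    void* seg = map_shared(path);
    if (seg == nullptr) return false;
    std::strncpy(static_cast<char*>(seg), text, kSegmentSize - 1);
    return munmap(seg, kSegmentSize) == 0;
}

// Simulates the "restarted" process: attach again and read without reloading.
inline std::string attach_and_read(const char* path) {
    void* seg = map_shared(path);
    if (seg == nullptr) return std::string();
    std::string out(static_cast<const char*>(seg));
    munmap(seg, kSegmentSize);
    return out;
}
```

Calling `write_then_detach(path, "order-data-v1")` and later `attach_and_read(path)` returns the same text, even though no mapping existed in between; the segment lives until its backing object is removed.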

Now consider data updates, which usually mean switching data versions. Here reading from disk is unavoidable, so is there a more efficient way to turn tens of gigabytes of data files into in-memory data objects (usually STL containers)? If the memory layout of the data objects is compiled offline and written directly into a binary file, the online service only needs a single shared memory allocation and a single I/O read pass at startup, which further improves loading efficiency.

Once shared memory and offline compilation of container data are settled, the key question is how to place a container in shared memory. The main obstacles are pointers and the non-contiguous memory of containers. Levin's answer is to flatten the container: the memory layout of a container object is made one-dimensional, so the whole object can be described, read, and copied as a start address plus a length. Because the same shared memory segment is mapped at different virtual addresses in different processes, the containers store offsets instead of pointers, which makes them address-independent.
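A minimal sketch of an offset-based, address-independent container is shown below. It is an illustration under assumptions, not Levin's actual container: `FlatVector` and `flatten` are hypothetical names, the element type must be trivially copyable, and the block is assumed to start at a suitably aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical flat vector: the header stores a byte offset relative to its
// own address instead of a raw pointer, so the bytes stay valid wherever the
// block is mapped or copied.
template <typename T>
struct FlatVector {
    std::uint64_t size;    // number of elements
    std::int64_t  offset;  // byte distance from `this` to the first element

    const T* data() const {
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + offset);
    }
    const T& operator[](std::size_t i) const {
        assert(i < size);
        return data()[i];
    }
};

// Flatten a std::vector into one contiguous block: header, then elements.
// The whole container is now just "start address + length".
template <typename T>
std::vector<char> flatten(const std::vector<T>& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "elements must be copyable as raw bytes");
    std::vector<char> block(sizeof(FlatVector<T>) + src.size() * sizeof(T));
    FlatVector<T> header{static_cast<std::uint64_t>(src.size()),
                         static_cast<std::int64_t>(sizeof(FlatVector<T>))};
    std::memcpy(block.data(), &header, sizeof(header));
    if (!src.empty()) {
        std::memcpy(block.data() + sizeof(header), src.data(),
                    src.size() * sizeof(T));
    }
    return block;
}
```

Copying the block byte for byte to another address and reinterpreting its start as `const FlatVector<int>*` yields the same elements, which is the property a segment mapped at different virtual addresses in different processes relies on.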

We also evaluated an existing solution, the Boost.Interprocess containers, and found that they performed poorly in benchmarks: queries on the most commonly used containers, vector and hashmap, were about 10% and 20% slower, respectively, than on the standard containers. Levin therefore uses custom shared memory containers with a series of optimizations that rely on the data being static. They are easy to use, efficient, and memory-saving. Levin also implements features that are indispensable in engineering practice, such as memory checks for shared containers and version management.

2. Functions and features

▍ STL-like shared memory container

Levin supports containers hosted in shared memory segments, including the common containers vector, set, map, hashset, and hashmap. Custom shared memory containers can be built through adaptation, composition, and specialization. Benchmarks show that Levin containers query faster than the standard containers and use noticeably less memory (see the benchmark for details).

[Figure: Levin2.jpg]

▍ offline data compilation

Using data efficiently is the goal of an online service, and standardized data is a prerequisite for building complex systems. Many data-centric systems split the data flow into offline compilation and online loading, and offline compilation is the step where data is converted efficiently. A single offline node converts the data into a ready-to-use format once, which spares every online node from repeating the same conversion and construction work. Levin supports offline data compilation: it compiles raw data into binary files that hold the memory layout of shared container objects and can be read directly by a process, giving online services a more efficient data source.
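The offline step can be sketched roughly as follows. The file format here is an assumption made for illustration (the text does not describe Levin's real on-disk layout): `FileHeader`, `kMagic`, and `compile_to_file` are hypothetical names, and the file is assumed to be produced and consumed on machines with the same endianness and type sizes. The sort stands in for any construction work that is paid once offline rather than on every online node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical file header for this sketch only.
struct FileHeader {
    std::uint32_t magic;      // identifies the file type
    std::uint32_t elem_size;  // size of one element in bytes
    std::uint64_t count;      // number of elements that follow
};

constexpr std::uint32_t kMagic = 0x4C56494E;  // arbitrary tag chosen here

// Offline step: build the final layout once (here, a sorted array that can
// be binary-searched in place) and write header + raw element bytes.
inline bool compile_to_file(std::vector<std::uint64_t> ids, const char* path) {
    assert(path != nullptr);
    std::sort(ids.begin(), ids.end());  // construction cost is paid offline

    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return false;

    FileHeader h{kMagic,
                 static_cast<std::uint32_t>(sizeof(std::uint64_t)),
                 static_cast<std::uint64_t>(ids.size())};
    bool ok = std::fwrite(&h, sizeof(h), 1, f) == 1;
    if (ok && !ids.empty()) {
        ok = std::fwrite(ids.data(), sizeof(std::uint64_t), ids.size(), f) ==
             ids.size();
    }
    // fclose flushes buffered data, so its result is part of success.
    ok = (std::fclose(f) == 0) && ok;
    return ok;
}
```

An online process that reads this file gets the elements already in query-ready order and does not need to insert them one by one into a container.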

▍ online data loading

Online loading places the data files compiled in the offline phase into a named shared memory region and supports allocating, verifying, loading, and releasing shared container objects. Levin loads data with a single shared memory allocation and read, which avoids the many brk/mmap memory allocation system calls that container construction would otherwise issue, reduces the number of I/O operations, and further speeds up data loading for online services. The following figure shows the recommended data usage flow described above.

[Figure: Levin3.jpg, the recommended data usage flow]
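The "one allocation, one read" loading pattern might look like the sketch below. It is not Levin's loader: `LoadedSegment`, `load_file`, and `release` are hypothetical names, and an anonymous shared mapping is used as a stand-in for the named shared memory region so that the example needs no extra setup. The point is that the whole data set costs one `mmap` call and one sequential read, instead of one allocation per container node.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical handle for a loaded data file.
struct LoadedSegment {
    void* addr = nullptr;
    std::size_t size = 0;
};

// Load the whole file at `path` into a single shared mapping.
// Returns an empty LoadedSegment (addr == nullptr) on failure.
inline LoadedSegment load_file(const char* path) {
    assert(path != nullptr);
    LoadedSegment seg;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return seg;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return seg;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    // One allocation for the entire data set.
    void* addr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return seg;
    }

    // One sequential read pass; the loop only handles short reads.
    std::size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, static_cast<char*>(addr) + done, size - done);
        if (n <= 0) break;
        done += static_cast<std::size_t>(n);
    }
    close(fd);

    if (done != size) {  // treat an incomplete read as a failed load
        munmap(addr, size);
        return seg;
    }
    seg.addr = addr;
    seg.size = size;
    return seg;
}

// Release the mapping and reset the handle.
inline void release(LoadedSegment& seg) {
    if (seg.addr != nullptr) munmap(seg.addr, seg.size);
    seg.addr = nullptr;
    seg.size = 0;
}
```

A real loader would also verify the file (for example against a size or checksum stored in its header) before exposing the data to queries; that step is omitted here.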

▍ management module

When a service uses many shared containers, there is no overall view of which containers are in use, and they have to be released one by one. After a service exception, stale data can stay resident in memory and waste the node's memory resources. Levin provides a management module that manages shared containers in groups. Shared containers with the same lifecycle can be hosted by one management module instance, which creates, loads, and releases them together. The management module also supports global lookup of shared containers, safe release and cleanup (so that container data is not destroyed abnormally and useless containers do not linger on the host), and customizable data file verification modes that reduce file verification time.

▍ version switching

With the Levin management module, the container data of one version is managed as a group. When a specific version is unloaded, its whole set of shared containers is released safely and in one step, which supports hot swapping of data versions.
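Group-based management and version hot swapping can be illustrated with the sketch below. It does not reproduce Levin's management module: `Container` and `VersionManager` are hypothetical, the containers are ordinary heap objects rather than shared memory, and the class is not thread-safe. It only shows the lifecycle: load a version as a group, activate it, then release the old group as a unit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one shared container.
struct Container {
    std::string name;
    std::vector<int> data;
};

// Hypothetical manager: one group of containers per data version.
class VersionManager {
public:
    using Group = std::vector<Container>;

    // Register a fully loaded group under a version label.
    void load(const std::string& version, Group containers) {
        assert(!version.empty());
        groups_[version] = std::make_shared<Group>(std::move(containers));
    }

    // Serve queries from `version`. Returns false if it is not loaded.
    bool activate(const std::string& version) {
        auto it = groups_.find(version);
        if (it == groups_.end()) return false;
        active_ = it->second;
        active_version_ = version;
        return true;
    }

    // Release all containers of `version` in one step. The active version
    // is never released, so queries cannot lose their data.
    bool unload(const std::string& version) {
        if (version == active_version_) return false;
        return groups_.erase(version) > 0;
    }

    std::shared_ptr<const Group> active() const { return active_; }
    std::size_t loaded_count() const { return groups_.size(); }

private:
    std::map<std::string, std::shared_ptr<const Group>> groups_;
    std::shared_ptr<const Group> active_;
    std::string active_version_;
};
```

A typical switch under these assumptions is `load("v2", ...)`, `activate("v2")`, then `unload("v1")`. Because groups are held through `std::shared_ptr`, a query that obtained the old group from `active()` before the switch can finish safely even after `unload`.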

3. Internal practice

Levin has been applied internally with the following results. The cold start and hot reload times of the services that adopted it dropped from minutes to seconds. Memory usage improved noticeably: the static data in Levin containers is hosted in shared memory and is separated from the dynamic data of service sessions. During data version switches, the number of disk I/O operations fell sharply, and the CPU jitter caused by switching was significantly reduced. As a result, the wasted labor and time and the stability risks described at the beginning of this article are resolved.
