How to analyze the Nacos configuration center

2025-04-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article analyzes the Nacos configuration center: its basic concepts, how it stores data consistently, how clients subscribe to changes, and how grayscale release and history rollback work.

Background

Last time we covered the Nacos registry: its consistency protocol and how subscription and registration work. If you are interested, see the previous article, "Nacos registry that you should know". Nacos has another particularly important function, the configuration center. Before introducing what a configuration center is, let's think about why one is needed.

When we first build simple learning projects, we run into things that need to be configured, such as the size of the database connection pool or a user blacklist. We tend to write them directly in the code, for example if (userId == 123) { do something }, and this kind of code ends up everywhere in the project. Once I started working, I found that this approach does not manage configuration well: the configuration is scattered throughout the code and cannot be adjusted for the runtime environment, so online and offline environments are forced to share the same values. You can work around this with if/else, but that is tedious, so I started putting configuration into XML, YAML, and similar files and loading a different file in each environment. This covers most needs, until you have to change configuration dynamically. With files, the only option is to edit the file and redeploy the service, which is painful. That is why the configuration center was born.

If you wanted to implement a configuration center, what features would it need? Here are some:

The configuration can be modified dynamically.

A failure of the configuration center does not affect use of the configuration.

The configuration can be shared by multiple services.

Permission management is supported: only users who have been granted permission can view and modify the configuration.

The configuration can be rolled back: when a configuration causes a problem, we can roll it back just as we would roll back a service.

Grayscale release: a few machines use the new configuration first, and if there is no problem it is rolled out to all machines.

The configuration center itself can sustain sufficient QPS. As one of a company's basic services, this must be guaranteed.

There are many open source configuration centers, such as Spring Cloud Config and Apollo. Apollo is Ctrip's open source configuration center and is well known in the industry. This article focuses on the Nacos configuration center; interested readers can look up introductions to the other configuration centers.

Basic concepts

Similarly, let's first introduce some basic terms related to the configuration center:

Namespace: as in the registry, the namespace is the top-level structure in Nacos and is used for tenant-level isolation. The most common use is to isolate environments, such as test and production.

Configuration management: editing, storage, distribution, change management, historical version management, change auditing, and other configuration-related activities.

Configuration item: a specific configurable parameter and its value range, usually in the form param-key=param-value. For example, the log output level of a system (logLevel=INFO | WARN | ERROR) is a configuration item.

Configuration set: a set of related or unrelated configuration items. In a system, a configuration file is usually one configuration set that contains configuration for all aspects of the system. For example, a configuration set may contain items such as the data source, thread pool, and log level.

Configuration set ID (Data ID): the ID of a configuration set in Nacos. It is one of the dimensions used to organize and partition configuration.

Configuration group: a grouping of configuration sets in Nacos, and another of the dimensions used to organize configuration.

Configuration snapshot: the Nacos client SDK keeps a local snapshot of the configuration. When the client cannot connect to the Nacos server, it can fall back to the snapshot, which improves the overall disaster tolerance of the system. A snapshot is similar to a local commit in Git, and also similar to a cache that is updated at appropriate times but has no notion of expiration.
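To make the snapshot idea concrete, here is a minimal sketch of a client that refreshes a local file whenever the server answers and falls back to that file when it does not. It is not the Nacos SDK: the class SnapshotFallback, the ".snapshot" file suffix, and the remoteFetch supplier (assumed to throw a RuntimeException when the server is unreachable) are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical client-side helper; not the Nacos SDK.
class SnapshotFallback {
    private final Path snapshotDir;

    SnapshotFallback(Path snapshotDir) {
        this.snapshotDir = snapshotDir;
    }

    // Convenience factory that keeps snapshots in a fresh temporary directory.
    static SnapshotFallback inTempDir() {
        try {
            return new SnapshotFallback(Files.createTempDirectory("config-snapshot"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // remoteFetch stands in for the call to the server; it is assumed to throw
    // a RuntimeException when the server cannot be reached.
    String load(String dataId, Supplier<String> remoteFetch) {
        Path snapshot = snapshotDir.resolve(dataId + ".snapshot");
        try {
            String value;
            try {
                value = remoteFetch.get();
            } catch (RuntimeException serverUnavailable) {
                // Server unreachable: serve the last snapshot, or null if none exists.
                return Files.exists(snapshot)
                        ? new String(Files.readAllBytes(snapshot), StandardCharsets.UTF_8)
                        : null;
            }
            // Server answered: refresh the snapshot. Snapshots never expire.
            Files.write(snapshot, value.getBytes(StandardCharsets.UTF_8));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SnapshotFallback client = SnapshotFallback.inTempDir();
        System.out.println(client.load("app.properties", () -> "logLevel=INFO"));
        // Simulate an outage: the snapshot written above is returned instead.
        System.out.println(client.load("app.properties", () -> {
            throw new IllegalStateException("server down");
        }));
    }
}
```

The snapshot is only ever overwritten by a successful fetch, which mirrors the "cache without expiration" behavior described above.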

The architecture of the configuration center is as follows:

Users can add or modify configuration in the console, or through the client API.

Every change is first applied on the leader via Raft and then synchronized to the other replicas.

Clients that want to subscribe to a configuration do so through long polling.

Consistent storage

The most important question for a configuration center is how to store its data. Broadly there are two approaches. One is to keep everything in memory, which gives very high performance but makes it complex to keep the memory of different machines consistent. The other is to use a database and keep no state in memory: every machine can write, complexity is low, and consistency is left to the database. But because every read and write goes to the database, performance is not guaranteed. Nacos improves on both approaches so that it gets good performance while keeping both complexity and consistency under control.

Since Nacos 1.3, two storage modes are provided: MySQL, and Raft + Derby. We introduce them in turn.

MySQL + asynchronous full notification

Initially, Nacos provided only MySQL. Every machine can read and write, with no distinction between master and slave, as shown in the following figure:

Data is read and written directly through MySQL. The code is in ExternalStoragePersistServiceImpl, which uses JdbcTemplate directly, so we will not walk through it here; read the class if you are interested.

If only MySQL is used, a natural question is how to keep the database from becoming a bottleneck. The simplest answer is to buy high-end MySQL hardware, but solving the problem with money is not very dependable and only suits teams with deep pockets. So how can it be optimized? Typically, business teams put a cache layer such as Redis or Memcached in front of MySQL.

Nacos also uses caching to relieve pressure on the database, but slightly differently from an ordinary cache:

A HashMap in ConfigService caches the metadata of every config (its MD5, type, and so on).

The values themselves are not kept in memory but are stored on local disk. The reason is that we cannot bound the size of a config value: if every value is large, memory will inevitably run out. Here the two open source projects, Nacos and Apollo, take different approaches:

Apollo uses a Guava cache with an eviction policy that evicts entries that are not used frequently.

Nacos caches all metadata in memory and stores the values on disk, keeping the two separate. With this design, requests that only touch metadata are served entirely from memory, and unlike Apollo, Nacos never has to go back to the database because an entry was evicted.
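The split between in-memory metadata and on-disk values can be sketched as follows. This is an illustration under assumptions, not Nacos's implementation: the class SplitConfigCache, its Meta type, and the one-file-per-key layout are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache: metadata lives in memory, values live on local disk.
class SplitConfigCache {
    // Metadata is small and fixed-size, so all of it can stay in memory.
    static final class Meta {
        final String md5;
        final String type;

        Meta(String md5, String type) {
            this.md5 = md5;
            this.type = type;
        }
    }

    private final Map<String, Meta> metaByKey = new ConcurrentHashMap<>();
    private final Path valueDir;

    SplitConfigCache(Path valueDir) {
        this.valueDir = valueDir;
    }

    static SplitConfigCache inTempDir() {
        try {
            return new SplitConfigCache(Files.createTempDirectory("config-values"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The value goes to disk (one file per key); only its metadata is kept in memory.
    void put(String key, String type, String value) {
        try {
            Files.write(valueDir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        metaByKey.put(key, new Meta(md5Hex(value), type));
    }

    // Served from memory only.
    Meta meta(String key) {
        return metaByKey.get(key);
    }

    // Served from local disk; the database is not involved.
    String value(String key) {
        try {
            return new String(Files.readAllBytes(valueDir.resolve(key)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitConfigCache cache = SplitConfigCache.inTempDir();
        cache.put("app.yaml", "yaml", "logLevel: INFO");
        System.out.println(cache.meta("app.yaml").md5 + " " + cache.value("app.yaml"));
    }
}
```

Memory use grows with the number of configs rather than with the size of their values, which is the property the comparison with Apollo turns on.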

Dump

Nacos caches all metadata in memory and stores values on disk, which raises a question: when data changes on one machine, how do the caches on the other machines get updated? This is what the asynchronous full notification is for. Every time data is modified, a ConfigDataChange event is published; the local machine handles it and then sends the change message to all other machines.

After the other machines receive the change notification, they perform a dump operation:

The machine first looks up the MD5 in the metadata. The MD5 is computed from the configured value, so comparing it quickly tells us whether the value has changed. If it has, the new value is written to disk.

If the machine has just started, it has no cache and no dump files yet. In that case DumpService traverses all the data in the database and caches it on the machine so that it can be served.
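The two dump cases above can be sketched in a few lines. This is a simplified model, not DumpService itself: the class DumpSketch is hypothetical, a Map stands in for the database, and a second Map stands in for the dump files on local disk so the example stays self-contained.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one node's dump logic.
class DumpSketch {
    private final Map<String, String> database;                     // shared source of truth
    private final Map<String, String> md5Cache = new HashMap<>();   // in-memory metadata
    private final Map<String, String> localDisk = new HashMap<>();  // stands in for dump files
    int diskWrites = 0;

    DumpSketch(Map<String, String> database) {
        this.database = database;
    }

    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Called when another node reports that the given key changed.
    void onChangeNotified(String key) {
        String latest = database.get(key);
        if (latest == null) {
            // The config was deleted: drop the cached copy.
            md5Cache.remove(key);
            localDisk.remove(key);
            return;
        }
        String latestMd5 = md5Hex(latest);
        if (latestMd5.equals(md5Cache.get(key))) {
            return; // same MD5: the value did not change, skip the disk write
        }
        localDisk.put(key, latest);
        md5Cache.put(key, latestMd5);
        diskWrites++;
    }

    // Called on a freshly started node that has no cache or dump files yet.
    void dumpAll() {
        for (String key : database.keySet()) {
            onChangeNotified(key);
        }
    }

    String read(String key) {
        return localDisk.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("app.properties", "logLevel=INFO");
        DumpSketch node = new DumpSketch(db);
        node.dumpAll();                          // cold start: full dump
        db.put("app.properties", "logLevel=WARN");
        node.onChangeNotified("app.properties"); // MD5 differs, so the copy is rewritten
        System.out.println(node.read("app.properties") + " writes=" + node.diskWrites);
    }
}
```

A repeated notification for an unchanged value costs one MD5 comparison and no disk write, which is why the MD5 check comes first.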

Raft + Derby

Since 1.3.0, Nacos provides a new storage mode that uses the Raft protocol for data consistency and Apache Derby for embedded storage. The goal is to remove the cost of maintaining a MySQL cluster and to simplify cluster deployment: you can package and deploy the Nacos image directly without deploying a separate database.

Nacos uses SOFAJRaft, a high-performance Java implementation of Raft open-sourced by Ant. Readers who are not familiar with Raft can read the Raft paper. Those who know Raft will recall that it greatly strengthens the role of the leader:

A leader must exist in the system, and there can be only one leader at a time. Only the leader accepts requests from clients.

The leader actively communicates with all followers, sends proposals to all of them, and collects responses from a majority.

The leader also actively sends heartbeats to all followers to maintain its leadership.
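The "responses from a majority" rule reduces to a small piece of arithmetic. The sketch below is a deliberate simplification (it ignores terms, log indexes, and everything else Raft tracks), and the class and method names are hypothetical.

```java
// Hypothetical helper showing only the majority arithmetic, not a Raft implementation.
class RaftQuorum {
    // Smallest number of nodes that forms a majority of the cluster.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // followerAcks counts followers that accepted the proposal; the leader counts itself.
    static boolean isCommitted(int followerAcks, int clusterSize) {
        return followerAcks + 1 >= majority(clusterSize);
    }

    public static void main(String[] args) {
        // In a 5-node cluster the leader needs acknowledgements from 2 followers.
        System.out.println(majority(5) + " " + isCommitted(2, 5));
    }
}
```

In a 3-node cluster one follower acknowledgement is enough, while in a 5-node cluster two are needed, so the cluster keeps working as long as a majority of nodes is reachable.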

Everything goes through the leader, so performance is bounded by the leader. This is why Nacos chose SOFAJRaft, which includes many optimizations on top of Raft itself:

Batching: batching is an optimization used by many systems, and JRaft uses it throughout. It consumes work in batches using Disruptor's MPSC model, which enables the following batch operations and improves performance considerably:

Batch submission of tasks

Batch network transmission

Batch writes for local I/O

Batch application to the state machine

Pipeline replication: with pipelining, the leader keeps putting requests into the pipe without waiting for the reply to the previous one, unlike the request-response model, and reads the results afterwards. Turning on pipeline in JRaft increases performance by 30%.

Parallelization: the leader persists the log and sends it to followers in parallel, and also sends to different followers in parallel.

Linearizable read: in the basic Raft protocol, a read request is processed like a log entry. The result is obtained through log replication and state machine execution and then returned to the client. The drawback is that every read needs log storage and replication, with the associated disk flush, storage, and network overhead, so performance suffers badly in read-heavy scenarios. SOFAJRaft implements the ReadIndex and Lease Read optimizations so that reads can be served locally, which improves performance significantly.
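Of the optimizations above, batching is the easiest to illustrate in isolation. The sketch below is not JRaft code and does not use Disruptor: it assumes a plain java.util.concurrent.BlockingQueue as the task queue, and the class BatchDrain is hypothetical. The point is only that the consumer pays the per-operation cost (a disk flush, a network send) once per batch rather than once per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical consumer that groups queued tasks into bounded batches.
class BatchDrain {
    // Groups everything currently queued into batches of at most maxBatch tasks.
    static <T> List<List<T>> drainInBatches(BlockingQueue<T> queue, int maxBatch) {
        List<List<T>> batches = new ArrayList<>();
        while (true) {
            List<T> batch = new ArrayList<>(maxBatch);
            queue.drainTo(batch, maxBatch); // moves up to maxBatch tasks in one call
            if (batch.isEmpty()) {
                return batches;
            }
            batches.add(batch); // a real consumer would flush or send once here
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 7; i++) {
            tasks.add("task-" + i);
        }
        for (List<String> batch : drainInBatches(tasks, 3)) {
            System.out.println(batch.size() + " tasks handled in one operation");
        }
    }
}
```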

Apache Derby is a lightweight database written in Java. With this design, Nacos effectively builds a lightweight distributed database: every machine has its own database storing the data, and the Raft protocol keeps the data on all machines consistent.

The embedded database mode is not better than the MySQL mode in every respect. On performance, the MySQL mode keeps a lot in cache, stores content on local disk, and rarely hits the database on reads, so it is actually the better of the two. The embedded mode, however, has a clear advantage in operations and deployment. Users need to choose based on their own situation.

Client subscription to changes

We mentioned in the previous article that subscription in the Nacos registry uses UDP push plus scheduled polling, while the configuration center uses long polling to subscribe to changes. Why do the two use different mechanisms? The registry stores small data such as node IPs and ports, but the size of a configuration in the configuration center is not under our control. Suppose a service subscribes to 100 configurations of 1 MB each: pulling 100 MB on every scheduled poll is clearly impractical. So long polling is used instead, and it works as follows:

Step 1: the client periodically sends a long-polling request, with a default timeout of 30 seconds. The request carries the MD5 of the content of every subscribed configuration. The full content is not sent, otherwise each request would carry a large amount of data, as described above.

Step 2: on receiving the request, the server uses the asynchronous support in Servlet 3.0 to start an AsyncContext.

Step 3: the server holds the AsyncContext and waits for a configuration change, which is signaled by a DataChangeEvent. It then checks whether the MD5 in the pending request matches the newly updated MD5. If they differ, the change information is written to the response of the AsyncContext.

Step 4: if the timeout expires first, no configuration was updated during this round, and the client goes back to Step 1.

In this way, each request is small, and the real data is returned only when it has actually been updated.
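The four steps can be modeled in a single process. In this sketch a CompletableFuture stands in for the Servlet AsyncContext, and the class LongPollServer with its poll, publish, and await methods is hypothetical; it is not Nacos's LongPollingService. The immediate reply when the client is already out of date is an assumption added to avoid missing a change that happened between two polls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical in-process model of the server side of long polling.
class LongPollServer {
    // One held request: the MD5s the client sent and the reply to complete later.
    private static final class Pending {
        final Map<String, String> clientMd5;
        final CompletableFuture<List<String>> response;

        Pending(Map<String, String> clientMd5, CompletableFuture<List<String>> response) {
            this.clientMd5 = clientMd5;
            this.response = response;
        }
    }

    private final Map<String, String> serverMd5 = new ConcurrentHashMap<>();
    private final List<Pending> held = new CopyOnWriteArrayList<>();

    // Steps 1 and 2: the request carries only key -> MD5 and is answered asynchronously.
    CompletableFuture<List<String>> poll(Map<String, String> clientMd5) {
        CompletableFuture<List<String>> response = new CompletableFuture<>();
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> e : clientMd5.entrySet()) {
            String current = serverMd5.get(e.getKey());
            if (current != null && !current.equals(e.getValue())) {
                stale.add(e.getKey());
            }
        }
        if (stale.isEmpty()) {
            held.add(new Pending(clientMd5, response)); // nothing changed yet: hold it
        } else {
            response.complete(stale); // assumption: reply at once if already out of date
        }
        return response;
    }

    // Step 3: on a change, answer every held request whose MD5 for that key differs.
    void publish(String key, String newMd5) {
        serverMd5.put(key, newMd5);
        for (Pending p : held) {
            String known = p.clientMd5.get(key);
            if (known != null && !known.equals(newMd5)) {
                held.remove(p);
                p.response.complete(Collections.singletonList(key));
            }
        }
    }

    // Step 4: wait up to timeoutMs; an empty list means "no change, poll again".
    List<String> await(CompletableFuture<List<String>> response, long timeoutMs) {
        try {
            return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            held.removeIf(p -> p.response == response);
            return Collections.emptyList();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LongPollServer server = new LongPollServer();
        server.publish("app.properties", "md5-v1");
        Map<String, String> subscribed = new HashMap<>();
        subscribed.put("app.properties", "md5-v1");

        CompletableFuture<List<String>> pending = server.poll(subscribed);
        server.publish("app.properties", "md5-v2");      // change arrives while held
        System.out.println(server.await(pending, 100));  // lists the changed key
    }
}
```

For the example of 100 subscribed configurations, the digests in each request amount to 100 x 32 hexadecimal characters, about 3.2 KB, compared with 100 MB if the contents themselves were pulled every time.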

Configuration grayscale

We may need to verify whether a configuration change affects the business. Normally, modifying a configuration in the configuration center updates every machine at once, so a bad configuration causes a full outage. Nacos provides a grayscale feature that applies a configuration to only some machines, so we can validate it with a small share of traffic first.

Grayscale publishing in Nacos is also called beta publishing, as shown in the following figure:

After modifying a configuration, check the beta release box and select the machines that should receive the grayscale release.

In Nacos this is implemented with a separate table that stores the beta-related information:

The beta_ips field stores the machines selected for grayscale. When a client subscribes through long polling, the server also checks whether the client is a grayscale machine and, if so, returns the beta update. This logic is in LongPollingService.
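The filtering idea can be sketched as below. To be clear, this is not the code of LongPollingService, which is not reproduced here. It is a hypothetical class, BetaConfigSketch, and it assumes that beta_ips holds a comma-separated list of client IPs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of beta (grayscale) selection; not Nacos's implementation.
class BetaConfigSketch {
    private final String stableValue;
    private final String betaValue;
    private final Set<String> betaIps = new HashSet<>();

    // betaIpsCsv is assumed to be a comma-separated list such as "10.0.0.1,10.0.0.2".
    BetaConfigSketch(String stableValue, String betaValue, String betaIpsCsv) {
        this.stableValue = stableValue;
        this.betaValue = betaValue;
        for (String ip : betaIpsCsv.split(",")) {
            String trimmed = ip.trim();
            if (!trimmed.isEmpty()) {
                betaIps.add(trimmed);
            }
        }
    }

    // Grayscale machines see the beta value; every other client sees the stable one.
    String valueFor(String clientIp) {
        return betaIps.contains(clientIp) ? betaValue : stableValue;
    }

    public static void main(String[] args) {
        BetaConfigSketch config =
                new BetaConfigSketch("timeout=3s", "timeout=5s", "10.0.0.1, 10.0.0.2");
        System.out.println(config.valueFor("10.0.0.2")); // a grayscale machine
        System.out.println(config.valueFor("10.0.0.9")); // a regular machine
    }
}
```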

History rollback

Historical versions are also available in Nacos, much like commits in Git: as long as you have the commit ID, you can roll back to the corresponding version. Nacos saves them in a history table (his_config_info in the Nacos schema), through which we can retrieve the full history of a configuration and roll back.
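A minimal sketch of history plus rollback is shown below, with an in-memory list standing in for the history table. The class ConfigHistorySketch and the choice to implement rollback as re-publishing an old version (so that earlier history is never rewritten) are assumptions made for this illustration, not a description of Nacos's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a config history table.
class ConfigHistorySketch {
    private final List<String> versions = new ArrayList<>(); // list index = history id

    // Stores a new version and returns its history id.
    int publish(String content) {
        versions.add(content);
        return versions.size() - 1;
    }

    String current() {
        return versions.isEmpty() ? null : versions.get(versions.size() - 1);
    }

    // Rolling back appends the old content as a new version instead of deleting history.
    int rollbackTo(int historyId) {
        return publish(versions.get(historyId));
    }

    public static void main(String[] args) {
        ConfigHistorySketch history = new ConfigHistorySketch();
        int good = history.publish("pool.size=16");
        history.publish("pool.size=1600"); // a bad change
        history.rollbackTo(good);
        System.out.println(history.current());
    }
}
```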

Nacos has many other features, such as permission management, that are not covered here. The most ingenious parts of the Nacos configuration center are storage and subscription. For storage, Nacos provides two modes, MySQL + cache + local disk and Raft + Derby, each with its own advantages and disadvantages. For subscription, Nacos takes a completely different approach from the registry: long polling delivers updates in near real time without consuming many resources. If you are interested in Nacos, the source code is worth reading.

