A tutorial on the Design method of 100 million user Center 04/15 Update SLTechnology News&Howtos

2025-04-15 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

This tutorial describes a design method for a user center that serves 100 million users. It analyzes the problems involved and their solutions in detail, in the hope of helping readers who face the same problem find a simpler, more practical approach.

The user center, as its name suggests, is where users are managed, and it is one of the core subsystems of almost every Internet company. Its core functions are login and registration. Its other main functions include changing passwords, changing mobile numbers, retrieving and modifying user information, and some extended services, plus generating and verifying a Token after login. Let's break the user center down along several dimensions.

I. Service architecture

The user center not only serves users directly but also receives frequent calls from other businesses. Because it serves users, it carries business logic such as risk control or SMS verification during login, and each of those dependencies is a potential source of unavailability. By contrast, an interface such as fetching user information has few dependencies and may only need the database or cache. Both the user-information API and the core login and registration interfaces must be stable. When we add a policy or make a change at the interface level, we do not want an online problem to take down the entire service, and we do not want every release to require a full regression test of the whole service, which wastes a great deal of resources.

Therefore, based on these business characteristics, we can split the user center into three separate microservices: a gateway service, a core service, and an asynchronous consumer service. The gateway service exposes HTTP endpoints and aggregates business logic and service calls, such as the risk-control check or SMS verification required at login. The core service handles simple business logic and data storage; it sits at the end of the call chain and rarely calls other services. Operations such as verifying a Token or fetching user information rely only on Redis or the database. The asynchronous consumer service processes and consumes asynchronous messages; it is described in more detail below.
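To make the layering concrete, here is a minimal in-process sketch of the three roles. The class names, the policy list, and the use of a plain list as a stand-in for the message queue are all hypothetical; real services would talk over HTTP/RPC and an MQ. The point it illustrates is that new login-time policies are added to the gateway only, while the core service keeps depending on storage alone.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CoreService:
    """End of the call chain: depends only on storage (a dict stands in for Redis/DB)."""

    def __init__(self, store: Dict[int, dict]):
        self.store = store

    def get_user(self, user_id: int) -> Optional[dict]:
        return self.store.get(user_id)


class Gateway:
    """HTTP-facing layer (hypothetical): runs policies such as risk control, then calls the core."""

    def __init__(self, core: CoreService,
                 policies: List[Callable[[int], bool]],
                 events: List[Tuple[str, int]]):
        self.core = core
        self.policies = policies  # new policies go here; CoreService is not redeployed
        self.events = events      # stand-in for MQ, drained by the async consumer

    def get_profile(self, user_id: int) -> Optional[dict]:
        if not all(policy(user_id) for policy in self.policies):
            return None
        user = self.core.get_user(user_id)
        if user is not None:
            self.events.append(("profile_viewed", user_id))
        return user


def async_consumer(events: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drain pending events; a real consumer would subscribe to MQ instead of a list."""
    drained = list(events)
    events.clear()
    return drained
```

For example, `Gateway(CoreService({1: {"name": "alice"}}), [lambda uid: uid != 2], [])` serves user 1 and rejects user 2 at the gateway without the core service ever knowing about that policy.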

With this design, when new features go live, the core service and the asynchronous consumer service rarely need to be redeployed; only the gateway service does. Third parties that depend on our core service can rely on it with confidence, and the layering is clear. The price is a longer call chain: when a change touches both the gateway and the core service, both must be released and tested for compatibility.

II. Interface design

The user center's interfaces handle users' core information, so their security requirements are high; they also serve many third-party callers, so their availability requirements are high as well. The interfaces are therefore designed as follows.

First, the interfaces are split into Web-facing and App-facing ones. The Web API must support cross-domain single sign-on, and its encryption, signature verification, and token verification differ from those on the App side.

Second, core interfaces get special treatment. For example, the login interface is optimized in both its logic and its call chain. Why single these interfaces out? If users cannot log in, they panic, and customer complaints arrive immediately.

How is this done? On the one hand, we keep the core user information table simple. User information includes fields such as userId, mobile number, password, avatar, and nickname. If all of it is stored in one table, the table becomes very large and changing its columns becomes very difficult. So the user table is split: core information such as userId, username, mobile number, password, and a randomly generated salt stays in the user table, while information such as gender, avatar, and nickname is stored in a separate extended-information table.
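A minimal schema sketch of this split, using SQLite only so the example is self-contained. The table names `user_core` and `user_profile` are hypothetical (chosen to avoid `user`, which is reserved in some databases), and the columns follow the fields listed above. Login reads only the small core table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core login information: small, stable, read on every login.
CREATE TABLE user_core (
    user_id   INTEGER PRIMARY KEY,
    username  TEXT UNIQUE,
    mobile    TEXT UNIQUE,
    password  TEXT NOT NULL,   -- password hash, never plaintext
    salt      TEXT NOT NULL    -- randomly generated per user
);
-- Extended profile information: changes often and can gain columns freely.
CREATE TABLE user_profile (
    user_id   INTEGER PRIMARY KEY REFERENCES user_core(user_id),
    gender    TEXT,
    avatar    TEXT,
    nickname  TEXT
);
""")


def login_lookup(mobile: str):
    """Login touches only the core table; returns (user_id, password, salt) or None."""
    return conn.execute(
        "SELECT user_id, password, salt FROM user_core WHERE mobile = ?", (mobile,)
    ).fetchone()
```

Adding a new profile field then means altering `user_profile` only, leaving the table on the login path untouched.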

On the other hand, we keep the core login chain as short as possible, so that it depends only on reading the user database. Normally, after login, the user's login record must be written and services such as risk control or SMS must be called. Any failure anywhere in that chain can stop users from logging in, so how do we keep the chain as short as possible? By letting dependent services degrade automatically. For example, if the anti-fraud check has a problem, it degrades automatically to its default policy. In the extreme case, login performs only password verification, and if the primary database goes down, user information is read from a replica.
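The degradation rules above can be sketched as a login function whose dependencies are injected as callables. `ServiceDown`, `default_risk_policy`, and the "allow" default are hypothetical; the article only says the anti-fraud check falls back to "its default policy". Password verification is the one step that is never skipped.

```python
from typing import Callable, Optional


class ServiceDown(Exception):
    """Raised by a dependency (primary DB, anti-fraud service, ...) that is unavailable."""


def default_risk_policy(user_id: int) -> bool:
    # Hypothetical default policy: allow, relying on the password check alone.
    return True


def login(user_id: int, password: str, *,
          primary: Callable[[int], Optional[dict]],
          replica: Callable[[int], Optional[dict]],
          risk_check: Callable[[int], bool],
          verify_password: Callable[[str, dict], bool]) -> bool:
    # 1. Read the user record; fall back to the replica if the primary is down.
    try:
        record = primary(user_id)
    except ServiceDown:
        record = replica(user_id)
    if record is None:
        return False
    # 2. Password verification is never degraded away.
    if not verify_password(password, record):
        return False
    # 3. Risk control degrades to its default policy instead of blocking login.
    try:
        return risk_check(user_id)
    except ServiceDown:
        return default_risk_policy(user_id)
```

Writing the login record and sending SMS are left out on purpose: following the asynchronous design described later, they would be published as events rather than called on this path.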

Finally, there is interface security checking. App interfaces need anti-replay protection and signature verification. Signature verification may be familiar, but anti-replay may be less so. As the name suggests, anti-replay prevents a request from being sent repeatedly: a given request is accepted only once within a specified period. Even if an attacker intercepts the request, it cannot be replayed during that period, and if the attacker tampers with it and resends it, the request fails verification. In addition, with big-data support combined with device information, we can store a behavior profile for each user (or call a third-party service). When a user initiates a request, the API can decide, based on that profile, whether to require additional checks such as the mobile number, real-name authentication, or face or liveness verification.
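Signature verification plus anti-replay is commonly built from an HMAC signature over the request parameters, a timestamp, and a one-time nonce. The sketch below assumes that scheme; the secret, the 60-second window, and the `key=value&...` canonical form are illustrative choices, not the article's, and the in-memory nonce dict stands in for a shared store such as Redis.

```python
import hashlib
import hmac
import time
from typing import Optional

APP_SECRET = b"demo-app-secret"  # hypothetical per-app shared secret
WINDOW = 60                      # assumed validity window, in seconds


def sign(params: dict, secret: bytes = APP_SECRET) -> str:
    """HMAC-SHA256 over parameters sorted by key (canonical form is an assumption)."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()


class ReplayGuard:
    def __init__(self, window: int = WINDOW):
        self.window = window
        self.seen = {}  # nonce -> expiry; a shared store such as Redis in production

    def check(self, params: dict, signature: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # 1. Signature: any tampering with the parameters breaks it.
        if not hmac.compare_digest(sign(params).encode(), signature.encode()):
            return False
        # 2. Timestamp: requests outside the window are rejected outright.
        ts = int(params["timestamp"])
        if abs(now - ts) > self.window:
            return False
        # 3. Nonce: accepted once while its timestamp is still inside the window.
        self.seen = {n: exp for n, exp in self.seen.items() if exp >= now}
        if params["nonce"] in self.seen:
            return False
        self.seen[params["nonce"]] = ts + self.window
        return True
```

The nonce is kept until its timestamp leaves the window; after that, the timestamp check alone rejects a replay, so the nonce store stays bounded.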

III. Database and table sharding

As users grow and the data exceeds 100 million rows, what should we do? The common approach is to shard databases and tables. Consider the user center's common tables: the user information table, the third-party login association table, and the user event table. The user-related tables grow relatively slowly, because user growth has a ceiling. The user event table grows much faster, because each user can log in, change passwords, and change mobile numbers any number of times.

Therefore, first of all, we can split the user information table vertically. As described above, common fields such as user ID, password, mobile number, and salt are separated out of the user information table, and other user-related information goes into its own table. In addition, the user event table is migrated to a separate database. Vertical splitting costs less and is simpler to operate than horizontal splitting. Because the core user information table holds relatively little data, even at 100 million rows its performance problems can be handled with the database's caching mechanisms.

Secondly, we can treat user-side (front-end) and operations-side (back-office) access differently. User-side access: users log in with a username or mobile number, or query user information by uid. These are usually single-row lookups, and consistency and high availability can be achieved by indexing the fields used for lookup. Operations-side access: queries by age, gender, login time range, registration time range, and so on, which are mostly paged batch queries. Because this is an internal system, query volume is low and consistency requirements are relaxed. If the user side and the operations side share one database, the operations side's sorting queries drive up CPU on the whole database, reduce query efficiency, and affect users. Therefore, the operations side can use an offline MySQL database holding the same data as the user side. To speed up operations-side queries further, Elasticsearch (ES), a non-relational search engine, can be used. ES supports sharding and replication, which make horizontal partitioning and scaling easy; replication gives ES high availability and high throughput, which meets the operations side's query needs.

Finally, if we still need horizontal splitting to keep the system performant, which sharding strategy should we use? The common ones are the index table method and the gene method. In the index table method, a UID routes directly to a database, but a mobile number or username cannot. An index table records the mapping from mobile number to UID and from username to UID to solve this. This mapping data is usually small enough that it does not need sharding, but compared with a direct query, each lookup costs one extra database query, each new user costs one extra mapping insert, and the transaction becomes larger. The gene method instead embeds the username or mobile number into the UID, as follows:

1. When the user registers, apply a function f to the user's mobile number to generate an N-bit gene, mobile_gen = f(mobile).
2. Generate a globally unique M-bit ID as the user's identity.
3. Concatenate the M bits and the N bits and assign the result to the user as the UID.
4. Take the N-bit gene modulo the number of databases and insert the record into that database.
5. When looking up user data, take the last N bits of the UID modulo the number of databases to find the target database.
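The five steps can be sketched as follows, assuming N = 4 gene bits and 16 databases. The hash function (SHA-256 used only for distribution, not security) and the counter standing in for a global ID service are hypothetical choices. The key property is that routing by UID and routing by mobile number land on the same database.

```python
import hashlib
import itertools

GENE_BITS = 4                      # N (assumed): 4 bits allow up to 16 shards
DB_COUNT = 16                      # number of databases (assumed), at most 2**GENE_BITS
GENE_MASK = (1 << GENE_BITS) - 1
_next_id = itertools.count(1)      # hypothetical stand-in for a global unique-ID service


def mobile_gen(mobile: str) -> int:
    """f(mobile): the low N bits of a stable hash of the mobile number."""
    digest = hashlib.sha256(mobile.encode()).digest()
    return int.from_bytes(digest[:8], "big") & GENE_MASK


def register_uid(mobile: str) -> int:
    unique_id = next(_next_id)                            # the M-bit unique id
    return (unique_id << GENE_BITS) | mobile_gen(mobile)  # splice M bits and N bits


def db_for_uid(uid: int) -> int:
    return (uid & GENE_MASK) % DB_COUNT      # route by the UID's last N bits


def db_for_mobile(mobile: str) -> int:
    return mobile_gen(mobile) % DB_COUNT     # login by mobile: same shard, no index table
```

A username-based login gets no such shortcut, since the gene is derived from the mobile number only; that is the limitation the next paragraph points out.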

As the process above shows, the gene method suits only specific, frequently used query patterns, such as login by mobile number; login by username becomes more troublesome. So teams choose a horizontal-splitting approach according to their own business scenarios.

IV. Flexible Token degradation

After the user logs in, the other important task is generating and verifying the Token. User Tokens fall into two categories. One is generated by web login; combined with cookies, it can provide single sign-on, which is not discussed in detail here. The other is generated by login in the App. After the user enters a username and password in our App, the server verifies them. On success, the server fetches the encryption algorithm version and the secret key from the configuration center, arranges the user's ID, mobile number, a random code, and the expiration time in a fixed format, and encrypts the result through a series of steps to produce the Token, which is stored in Redis. Verifying a Token means combining the user ID with the Token and checking whether it exists in Redis. So what if Redis is unavailable? This calls for a highly available, automatically degrading design: when Redis is unavailable, the server generates a Token in a special format, and at validation time it checks the Token's format to decide how to verify it.
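The issue and verify flow, including the format check, can be sketched as below. Two substitutions are loud assumptions: Python's standard library has no symmetric cipher, so the payload is HMAC-signed rather than encrypted (a real deployment would encrypt it so the mobile number is not readable), and `FakeRedis`, the `D:` prefix, and the key values are hypothetical. Rejecting normal tokens while Redis is down, forcing a re-login, is also an assumption the article does not spell out.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Callable, Optional

SECRET_KEY = b"demo-key-v1"  # hypothetical; the article loads key and version from a config center
KEY_VERSION = "v1"
DEGRADED_PREFIX = "D:"       # hypothetical format marker for tokens issued while Redis is down


class FakeRedis:
    """In-memory stand-in for Redis; set available = False to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.available = True

    def setex(self, key, ttl, value):
        if not self.available:
            raise ConnectionError("redis down")
        self.data[key] = (value, time.time() + ttl)

    def get(self, key):
        if not self.available:
            raise ConnectionError("redis down")
        item = self.data.get(key)
        return item[0] if item and item[1] > time.time() else None


def _encode(payload: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    sig = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def _decode(token: str) -> Optional[dict]:
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not body or not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


def issue_token(redis, user_id: int, mobile: str, ttl: int = 3600) -> str:
    payload = {"uid": user_id, "mobile": mobile, "nonce": secrets.token_hex(8),
               "exp": int(time.time()) + ttl, "ver": KEY_VERSION}
    token = _encode(payload)
    try:
        redis.setex(f"token:{user_id}", ttl, token)
        return token
    except ConnectionError:
        return DEGRADED_PREFIX + token  # self-contained token, verifiable without Redis


def verify_token(redis, user_id: int, token: str,
                 db_mobile: Callable[[int], Optional[str]]) -> bool:
    if token.startswith(DEGRADED_PREFIX):
        payload = _decode(token[len(DEGRADED_PREFIX):])
        if payload is None or payload["uid"] != user_id or payload["exp"] < time.time():
            return False
        return db_mobile(user_id) == payload["mobile"]  # degraded path queries the database
    try:
        return redis.get(f"token:{user_id}") == token
    except ConnectionError:
        return False  # assumption: normal tokens force a re-login during an outage
```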

If the Token was generated while Redis was unavailable, the server decrypts it. Because the Token was produced by arranging the user ID, mobile number, random code, expiration time, and other data in a specific order and encrypting them, the decrypted data contains the ID, mobile number, random code, and expiration time. The server queries the database with this data and then tells the client whether the login is valid. Because of the performance gap between the in-memory Redis cache and the database, the degraded path may overwhelm the database while Redis is unavailable, so the degraded method must be rate-limited.
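The article does not name a rate-limiting algorithm; a token bucket is one common choice and is used here as an assumption. The limiter below would sit in front of the database-backed degraded verification, and the rate and capacity values are placeholders to be sized against what the database can absorb.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


def guarded(limiter: TokenBucket, degraded_check: Callable[[], T]) -> Optional[T]:
    """Run the database-backed check only if allowed; None means 'rejected, retry later'."""
    if not limiter.try_acquire():
        return None
    return degraded_check()
```

For instance, `guarded(TokenBucket(rate=200, capacity=200), lambda: verify_token(...))` (numbers hypothetical) caps degraded verifications at roughly 200 per second.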

V. Data security

Data security is critical for the user center. Sensitive data must be masked, and passwords must be hashed through multiple rounds. Although each application has its own security policies, the application becomes much more secure if attackers can be stopped at login. Cases of users' plaintext data leaking online are frequent, so major companies now treat data security as a top priority. Even MD5 with a salt can still be cracked, for example with rainbow tables or brute force. So how should the user center store user information?

First of all, as mentioned above, login information such as passwords and mobile numbers is separated from other information and stored in different databases. Secondly, passwords are checked against a blacklist, and any weak password that matches is rejected, because weak passwords are extremely easy to crack no matter which hashing method is used. Why? People have poor memories, so most tend to choose birthdays, common words, and the like as passwords. Six digits yield only 1 million possible passwords, and eight characters of lowercase letters and digits yield about 2.8 trillion. A password dictionary of 7.8 trillion entries is enough to cover most users' passwords, and such dictionaries can be built for each hashing algorithm, which is why most websites recommend passwords of at least 8 characters mixing letters and digits. Of course, adding a salt on the one hand and storing the key separately on the other makes cracking dramatically harder.
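The arithmetic behind those counts, plus a minimal blacklist check, is sketched below. The blacklist contents and the "at least 8 characters with letters and digits" rule are illustrative, taken from the recommendation in the text; a production blacklist would be far larger.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


SIX_DIGITS = keyspace(10, 6)              # 1,000,000
EIGHT_LOWER_ALNUM = keyspace(26 + 10, 8)  # 2,821,109,907,456, about 2.8 trillion

# Tiny illustrative blacklist; a real one holds many common and leaked passwords.
WEAK_PASSWORDS = {"123456", "12345678", "password", "qwerty", "111111", "abc123"}


def is_acceptable(password: str) -> bool:
    if password.lower() in WEAK_PASSWORDS:
        return False
    if len(password) < 8:
        return False
    # Require both letters and digits, following the recommendation in the text.
    return any(c.isdigit() for c in password) and any(c.isalpha() for c in password)
```

Pure-digit passwords such as birthdays (`"19900101"`) fail the letters-and-digits rule even when they are not on the blacklist.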

Finally, passwords can be hashed with bcrypt or scrypt. bcrypt is based on the Blowfish block cipher. It builds in a random salt, so hashing the same password twice produces different outputs. It also relies on memory that is constantly read and rewritten during hashing, which runs acceptably on a CPU but is hard to parallelize efficiently on a GPU. Newer FPGAs with large integrated RAM remove this memory-access bottleneck, although bcrypt remains difficult to crack. scrypt addresses bcrypt's weakness by making both its CPU cost and its memory cost tunable, so each guess can be made far more expensive. Both bcrypt and scrypt resist rainbow tables effectively, but the extra security slows down login. Login and registration are not extremely high-concurrency interfaces, so the impact is usually small. Still, security and performance must be balanced according to business type and scale, and not every application needs this kind of hashing to protect user passwords.
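A minimal scrypt sketch using `hashlib.scrypt` from Python's standard library, which is present when Python is built against OpenSSL 1.1 or later. bcrypt would need a third-party package, so it is not shown. The cost parameters are assumed values to tune against your own latency budget; storing them alongside the hash would let you raise them later.

```python
import hashlib
import hmac
import os

# scrypt cost parameters (assumed values): N = CPU/memory cost, r = block size, p = parallelism.
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow about 64 MiB; this N and r need roughly 16 MiB
SALT_LEN = 16


def hash_password(password: str) -> bytes:
    salt = os.urandom(SALT_LEN)  # random salt per password, so equal passwords differ
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                         maxmem=MAXMEM, dklen=32)
    return salt + key            # store the salt next to the derived key


def verify_password(password: str, stored: bytes) -> bool:
    salt, key = stored[:SALT_LEN], stored[SALT_LEN:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                               maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(candidate, key)  # constant-time comparison
```

Raising `N` doubles both the time and memory per guess for every step, which is the knob that trades login latency for cracking cost.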

VI. Asynchronous consumption design

The asynchronous consumption here refers to the asynchronous consumer service mentioned above. After a user logs in or registers, the user's operation log must be recorded; downstream businesses also need to react, for example by adding points or issuing gift vouchers. If these systems depended on the user center synchronously, the user center would become huge and its call chain very long, violating the industry principle of "making big systems small", and the unavailability of any dependent service would prevent users from logging in and registering. Therefore, after a user operation completes, the user center sends a user event to MQ, and third-party services subscribe to those events. This decouples the user center from downstream businesses, and because user operation events are also stored in the database, they can be compensated (re-sent) when MQ is unavailable or a message is lost. User profile data also comes largely from these events.
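The "store the event, then publish, then compensate" flow can be sketched as below. SQLite and `queue.Queue` are stand-ins for the event database and MQ, and `FlakyMQ`, `record_event`, and `compensate` are hypothetical names. A failed publish never blocks login or registration; a later compensation run re-sends whatever is still unsent.

```python
import json
import queue
import sqlite3


class FlakyMQ:
    """Stand-in for a message queue; set available = False to simulate an outage."""

    def __init__(self):
        self.q = queue.Queue()
        self.available = True

    def publish(self, msg: dict) -> None:
        if not self.available:
            raise ConnectionError("MQ unavailable")
        self.q.put(msg)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_event (id INTEGER PRIMARY KEY, user_id INTEGER,"
           " type TEXT, payload TEXT, sent INTEGER DEFAULT 0)")


def _mark_sent(event_id: int) -> None:
    with db:
        db.execute("UPDATE user_event SET sent = 1 WHERE id = ?", (event_id,))


def record_event(mq: FlakyMQ, user_id: int, event_type: str, payload: dict) -> None:
    # 1. Persist first, so the event survives an MQ outage.
    with db:
        cur = db.execute("INSERT INTO user_event (user_id, type, payload) VALUES (?, ?, ?)",
                         (user_id, event_type, json.dumps(payload)))
    event_id = cur.lastrowid
    # 2. Publish best-effort; failure must not block the user's operation.
    try:
        mq.publish({"id": event_id, "user_id": user_id, "type": event_type, "payload": payload})
    except ConnectionError:
        return  # left unsent; compensate() picks it up later
    _mark_sent(event_id)


def compensate(mq: FlakyMQ) -> int:
    """Re-send events that never reached MQ; returns how many were re-sent."""
    rows = db.execute("SELECT id, user_id, type, payload FROM user_event"
                      " WHERE sent = 0 ORDER BY id").fetchall()
    resent = 0
    for event_id, user_id, event_type, payload in rows:
        try:
            mq.publish({"id": event_id, "user_id": user_id,
                        "type": event_type, "payload": json.loads(payload)})
        except ConnectionError:
            break  # MQ still down; retry on the next run
        _mark_sent(event_id)
        resent += 1
    return resent
```

A crash between publishing and marking an event as sent leads to a duplicate on the next compensation run, so downstream consumers (points, vouchers) should deduplicate by the event `id`.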

VII. Flexible and diverse monitoring

The user center handles core functions such as login, registration, and password changes, so whether system problems are detected promptly becomes a key indicator, and business monitoring is especially important. We need detailed monitoring of the QPS of the user center's key interfaces, machine memory usage, garbage collection time, service call latency, and so on. When an interface's call volume drops, the monitor raises an alarm promptly. Beyond these, there is monitoring of database binlog writes, of front-end components, and of full-link call latency based on Zipkin, giving end-to-end coverage from the user's request onward, so that when something goes wrong, monitoring shows where at any moment. For example, when registrations from an operations promotion campaign drop, the user center raises an alarm, so the business side can be notified in time to fix the problem and recover losses.
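The "alarm when call volume drops" rule can be sketched as a comparison of the latest per-minute count against a moving average of recent minutes. The window size and the 50% threshold are assumptions; real systems often compare against the same time on previous days as well, since traffic is seasonal.

```python
from collections import deque


class DropAlarm:
    """Alert when the current count falls below `threshold` times the recent average."""

    def __init__(self, window: int = 5, threshold: float = 0.5):
        self.history = deque(maxlen=window)  # the last `window` per-minute counts
        self.threshold = threshold

    def observe(self, count: int) -> bool:
        alarm = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alarm = baseline > 0 and count < baseline * self.threshold
        self.history.append(count)
        return alarm
```

Feeding per-minute registration counts `100, 110, 90, 95, 40` into `DropAlarm(window=3)` stays quiet until the final minute, when 40 falls below half of the recent average of about 98.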
