How Was QQ Built? Demystifying the Design of a Friend System

Updated: 2025-04-05. From: SLTechnology News&Howtos (shulou)

This article introduces the first backend system I ever worked on. It starts from my own understanding, and the content is fairly basic, so backend experts may want to skip it.

What is the friend system?

Simply put, a friend system is a system that maintains the friend relationships between users. The most familiar example is QQ. QQ is, at heart, an instant messaging tool, but its friend system accumulated an enormous friend relationship chain, and on that foundation a seemingly unshakable business empire was built. The importance of a friend system is evident.

Anyone familiar with Internet products knows that once a product reaches a certain number of users, it often grows a friend system. The main goal is to increase user stickiness (users with friends come back more often) or community activity (users with friends communicate more).

My backend development career started with such a system.

At that time, the friend system was completely new to most of our team, because most of us were fresh graduates. The architecture of the whole system was certainly not something a bunch of green newcomers like us could have designed. The original architecture diagram can no longer be found, but from memory and the experience accumulated since, I can still sketch its outline.

As shown in the figure, the friend system uses a common three-layer architecture: the access layer, the logic layer, and the data layer.

Let's start with the data layer.

Because QQ is so familiar, it is easy to list the data a friend system holds: user profiles, the friend relationship chain, messages (chat messages and system messages), online status, and so on.

Internet products often face massive numbers of concurrent requests, and traditional relational databases struggle to meet the read and write load. For storage, relational databases such as MySQL are generally used for read-heavy, write-light data, and often need a cache in front of them to ensure performance; NoSQL (Not Only SQL) stores are probably the mainstream choice at present.

In our friend system, user profiles and the friend relationship chain were stored in a key-value (kv) store, while messages were stored in tlist, a list store developed in-house by the company (a Redis list could serve as a replacement). Online status is described below.
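A minimal sketch of how this data might be laid out, assuming plain Python dictionaries as stand-ins for the key-value store and for tlist. The key formats (`profile:<uid>`, `friends:<uid>`, `msgs:<uid>`) and function names are hypothetical and are not taken from the original system.

```python
from collections import defaultdict, deque

# Stand-ins for the real stores (assumption: the original used a
# key-value service and an in-house list store called tlist).
kv = {}                       # key -> value
tlist = defaultdict(deque)    # key -> ordered list of messages


def save_profile(uid, nickname):
    kv[f"profile:{uid}"] = {"nickname": nickname}


def add_friend(uid, friend_uid):
    # The relationship chain is stored once per direction,
    # so each user's friend list is a single key lookup.
    kv.setdefault(f"friends:{uid}", set()).add(friend_uid)
    kv.setdefault(f"friends:{friend_uid}", set()).add(uid)


def append_message(uid, text):
    tlist[f"msgs:{uid}"].append(text)


save_profile(10001, "alice")
save_profile(10002, "bob")
add_friend(10001, 10002)
append_message(10002, "hello bob")
```

Storing the relationship in both directions doubles the writes but keeps reads cheap, which suits a read-heavy workload.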

Then there is the logic layer.

The most complex part is probably the message service (which I did not develop myself).

In the message service, messages are divided by type into chat messages and system messages (system messages include friend requests, global notification pushes, and so on), and by status into online messages and offline messages. The implementation maintains three lists: chat messages, system messages, and offline messages. A chat message list is shared between two users, while the system message list and the offline message list belong to a single user. When the user is online, chat messages and system messages are sent directly; when the user is offline, messages are stored in the offline message list and pulled when the user next logs in.

So is messaging not complicated after all? In fact, the normal-path flow in system design is often relatively simple. For Internet products, however, abnormal conditions are the norm, and once all kinds of abnormal conditions are taken into account, the system becomes very complex.

In this example, message packet loss is one such abnormal condition. How to keep the system working correctly when packets are lost is a major problem.

A common solution is for the receiver to reply with an acknowledgement, and for the sender to retransmit if no acknowledgement arrives. However, the acknowledgement packet itself may be lost. Acknowledging the acknowledgement does not help, because that leads to an endless chain of acknowledgements.

The solution can borrow from TCP's retransmission mechanism. So the question is: why not simply use TCP? Because TCP was still too slow for this purpose. Chat messages do not need the reliability that transaction data requires, and dropping a few messages has no serious consequences; but if users have to wait a long time for every message they send to be delivered, the experience is very poor.

A compromise is for the receiver to reply with an acknowledgement packet and for the sender to retransmit if no acknowledgement arrives within a certain period of time. If the receiver gets two identical packets (with the same custom seq), it simply discards the duplicate.
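The routing rule above can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: the in-memory containers, the `online` set, and the function names are all hypothetical, and real delivery would go over the network rather than into a `delivered` list.

```python
from collections import defaultdict

chat_msgs = defaultdict(list)     # key: frozenset({a, b}), shared by both users
system_msgs = defaultdict(list)   # key: uid, one list per user
offline_msgs = defaultdict(list)  # key: uid, one list per user
online = set()                    # uids currently online (stand-in for presence)
delivered = defaultdict(list)     # what each client has received so far


def _deliver(uid, msg):
    if uid in online:
        delivered[uid].append(msg)       # push immediately
    else:
        offline_msgs[uid].append(msg)    # hold until the next login


def send_chat(sender, receiver, text):
    chat_msgs[frozenset((sender, receiver))].append((sender, text))
    _deliver(receiver, ("chat", sender, text))


def send_system(receiver, text):
    system_msgs[receiver].append(text)
    _deliver(receiver, ("system", None, text))


def login(uid):
    online.add(uid)
    # Pull everything that accumulated while the user was offline.
    pending, offline_msgs[uid] = offline_msgs[uid], []
    delivered[uid].extend(pending)
    return pending


online.add(1)
send_chat(1, 2, "hi")                        # user 2 is offline
send_system(2, "user 1 wants to add you")
pending = login(2)                           # user 2 pulls offline messages
send_chat(1, 2, "welcome back")              # now delivered directly
```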
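The acknowledge, retransmit, and de-duplicate scheme can be illustrated with a small simulation. It assumes a scripted "network" (`drop_plan`) in place of real UDP sockets and timers; `Receiver`, `send_reliably`, and `drop_plan` are hypothetical names introduced only for this sketch.

```python
class Receiver:
    def __init__(self):
        self.seen = set()    # seq numbers already processed
        self.inbox = []      # payloads handed to the application

    def on_packet(self, seq, payload):
        if seq not in self.seen:     # a duplicate is dropped, but still acked
            self.seen.add(seq)
            self.inbox.append(payload)
        return ("ack", seq)


def send_reliably(receiver, seq, payload, drop_plan, max_tries=5):
    """Send one message, retransmitting whenever no ack comes back.

    drop_plan scripts the network, one entry per attempt:
    "data" = data packet lost, "ack" = ack lost, "ok" = both arrive.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_tries + 1):
        outcome = drop_plan[attempt - 1] if attempt <= len(drop_plan) else "ok"
        if outcome == "data":
            continue                 # packet never arrived: time out and retry
        ack = receiver.on_packet(seq, payload)
        if outcome == "ack":
            continue                 # ack was lost: time out and retry
        if ack == ("ack", seq):
            return attempt
    return None


rx = Receiver()
tries = send_reliably(rx, seq=1, payload="hello",
                      drop_plan=["data", "ack", "ok"])
```

In this run the message is transmitted three times and reaches the receiver twice, but the application sees it once, because the second copy carries a seq that has already been processed.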

A discussion triggered by an interview question:

During interviews, I often ask candidates this question: in a distributed system, how do you ensure that a user can have only one terminal online at a time (when the user logs in to the same account from two places, the later login kicks the earlier one offline)? This is a very basic feature of Internet products, and it tests the candidate's basic architectural design ability.

The design starts with the access server (hereinafter the interface machine). The interface machine is the friend system's window to the outside world. Its main functions are maintaining user connections, authenticating logins, encrypting and decrypting data, and forwarding data to the backend services. When a user connects to the friend system, the connection first goes to an interface machine. After authentication succeeds, the interface machine keeps the user's session in memory, and all subsequent operations are performed on the basis of that session.

As shown in the figure, if the user logs in a second time, the interface machine can use the session to kick the first login offline, ensuring that only one terminal is online.

Is the problem solved?

No. A real system will never have just one interface machine, and with multiple interface machines the above method no longer works. Each interface machine holds the sessions of only some users, so if a user connects to different interface machines one after another, the user ends up logged in more than once.

Naturally, the solution is to maintain a global view of user state. In our friend system, this is the presence service (also called the online status service).

The presence service, as its name implies, maintains each user's online status (login time, interface machine IP, and so on). User logins and logouts trigger state changes here via the interface machine. Because login and logout packets may be lost, heartbeat packets are also used to maintain online status: receiving a heartbeat marks the user as online, and missing n consecutive heartbeats marks the user as offline.

A common approach is to store online status in a bitmap, that is, a block of memory in which each bit represents one user ID (such as a QQ number): 1 means online, 0 means offline. A 32-bit unsigned integer can represent 4294967296 distinct values, so covering all of them takes only 4294967296 / (8 × 1024 × 1024) = 512 MB (8 bits = 1 byte). Of course, an implementation can allocate more bits to each user as needed.
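A small sketch makes the failure concrete. It assumes each interface machine keeps sessions only in its own memory; the class and function names are hypothetical.

```python
class InterfaceMachine:
    """Keeps sessions in local memory only (a deliberately naive sketch)."""

    def __init__(self):
        self.sessions = {}   # uid -> connection id of the live login
        self.kicked = []     # connection ids that were kicked offline

    def login(self, uid, conn_id):
        old = self.sessions.get(uid)
        if old is not None and old != conn_id:
            self.kicked.append(old)      # kick the earlier login
        self.sessions[uid] = conn_id


def live_logins(uid, machines):
    """Count how many machines believe uid is logged in."""
    return sum(1 for m in machines if uid in m.sessions)


# One machine: the second login replaces the first.
single = InterfaceMachine()
single.login(42, "phone")
single.login(42, "laptop")

# Two machines: neither knows about the other's session.
m1, m2 = InterfaceMachine(), InterfaceMachine()
m1.login(42, "phone")
m2.login(42, "laptop")
```

With one machine the phone is kicked; with two machines nobody is kicked and `live_logins(42, [m1, m2])` reports two live logins for the same user.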
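The heartbeat rule can be sketched as below. This is an assumption-laden illustration: the class layout, the 30-second interval, and the choice to store a last-seen timestamp are all hypothetical, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class PresenceService:
    def __init__(self, heartbeat_interval=30, max_missed=3):
        self.interval = heartbeat_interval   # seconds between heartbeats
        self.max_missed = max_missed         # the "n" in the text
        self.last_seen = {}                  # uid -> (timestamp, interface machine IP)

    def on_login(self, uid, ip, now):
        self.last_seen[uid] = (now, ip)

    def on_heartbeat(self, uid, ip, now):
        # A heartbeat also (re)marks the user online,
        # which covers a lost login packet.
        self.last_seen[uid] = (now, ip)

    def on_logout(self, uid):
        self.last_seen.pop(uid, None)

    def is_online(self, uid, now):
        entry = self.last_seen.get(uid)
        if entry is None:
            return False
        timestamp, _ip = entry
        # Offline once max_missed heartbeat intervals pass in silence,
        # which covers a lost logout packet.
        return now - timestamp < self.interval * self.max_missed


presence = PresenceService(heartbeat_interval=30, max_missed=3)
presence.on_heartbeat(42, "10.0.0.1", now=0)
```

Here user 42 counts as online until 90 seconds have passed without another heartbeat.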
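A minimal bitmap along these lines, assuming one bit per user ID and a `bytearray` as the backing memory; `OnlineBitmap` is a hypothetical name. The demo allocates a small bitmap rather than the full 512 MB.

```python
class OnlineBitmap:
    def __init__(self, max_users):
        # One bit per possible user ID, rounded up to whole bytes.
        self.bits = bytearray((max_users + 7) // 8)

    def set_online(self, uid):
        self.bits[uid >> 3] |= 1 << (uid & 7)

    def set_offline(self, uid):
        self.bits[uid >> 3] &= ~(1 << (uid & 7)) & 0xFF

    def is_online(self, uid):
        return bool(self.bits[uid >> 3] & (1 << (uid & 7)))


# Size needed to cover every 32-bit user ID, in MB.
full_size_mb = 2**32 // 8 // 1024 // 1024

bitmap = OnlineBitmap(max_users=1000)
bitmap.set_online(10)
```

`uid >> 3` selects the byte and `uid & 7` selects the bit inside it, so lookups and updates are constant time.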

The kick-offline function therefore works as shown in the figure.

When a user logs in, the interface machine first checks whether it already holds a session for that user, updates the session if so, and then sends a login packet to the presence service. The presence service checks whether the user is already online; if so, it updates the status information and sends a kick-offline packet to the IP of the interface machine the user last logged in through. On receiving the kick-offline packet, that interface machine checks whether it holds a session for the user ID in the packet; if it does, it sends a kick-offline packet to the client and deletes the session.
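The cross-machine flow can be sketched as follows, with direct method calls standing in for network packets. All names are hypothetical. As a simplification, this sketch matches sessions by user ID only and skips the kick packet when the new login lands on the same machine; the same-machine case and its packet-reordering pitfall are what the article turns to next.

```python
class PresenceService:
    def __init__(self):
        self.machines = {}   # machine name -> InterfaceMachine
        self.location = {}   # uid -> name of the machine holding the live session

    def on_login(self, uid, machine_name):
        previous = self.location.get(uid)
        self.location[uid] = machine_name            # update status information
        if previous is not None and previous != machine_name:
            self.machines[previous].on_kick(uid)     # kick the earlier login


class InterfaceMachine:
    def __init__(self, name, presence):
        self.name = name
        self.presence = presence
        self.sessions = {}         # uid -> client connection id
        self.kicked_clients = []   # connection ids told to go offline
        presence.machines[name] = self

    def login(self, uid, conn_id):
        old = self.sessions.get(uid)
        if old is not None and old != conn_id:
            self.kicked_clients.append(old)          # same-machine relogin
        self.sessions[uid] = conn_id                 # create or update the session
        self.presence.on_login(uid, self.name)       # report to the presence service

    def on_kick(self, uid):
        conn_id = self.sessions.pop(uid, None)       # delete the session if present
        if conn_id is not None:
            self.kicked_clients.append(conn_id)      # notify that client


presence = PresenceService()
m1 = InterfaceMachine("m1", presence)
m2 = InterfaceMachine("m2", presence)
m1.login(42, "phone")
m2.login(42, "laptop")    # the presence service tells m1 to kick "phone"
```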

In practice, the kick-offline function still has many details that need attention.

Back to the case where a user logs in to the same interface machine twice in a row:

The kick-offline flow is correct, but what happens if steps 10 and 13 arrive in reverse order (which is common with UDP)? You can work it out for yourself: the late-arriving kick-offline packet kicks A', the second login, offline. This is not what we want. What should we do?

The solution comes down to several details:

① When the interface machine receives login-success packet No. 13, it first replaces session A with session A', and then sends a kick-offline packet to client A (this avoids two live sessions kicking each other offline).
② The kick-offline packet must carry identifying information beyond the user ID. The unique identifier of a session should take the form ID+XXX (I initially used ID+LoginTime), where XXX distinguishes one particular login.
③ When the interface machine receives a kick-offline packet, it only needs to check whether ID+XXX matches to decide whether to send a kick-offline packet to the client.

In reality, problems are always strange, but there are always more solutions than problems.

For example, in my projects I have run into clock drift (of a few seconds) between the interface machine and the presence service. In that case, the unique identifier used for kicking offline cannot take the form user ID+LoginTime. One solution is to generate a unique UUID for each login. There are many similar problems, and I will not go through them all.
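These three details can be sketched as below. The session identifier is modeled as a `(uid, login_id)` pair, where `login_id` plays the role of XXX; the class, method names, and the values "t1" and "t2" are hypothetical.

```python
class InterfaceMachine:
    def __init__(self):
        self.sessions = {}         # uid -> (login_id, conn_id)
        self.kicked_clients = []   # connection ids told to go offline

    def on_login_success(self, uid, login_id, conn_id):
        old = self.sessions.get(uid)
        self.sessions[uid] = (login_id, conn_id)     # detail 1: replace A with A' first
        if old is not None:
            self.kicked_clients.append(old[1])       # then kick client A

    def on_kick_packet(self, uid, login_id):
        current = self.sessions.get(uid)
        # Details 2 and 3: act only if ID+XXX matches the session still held.
        if current is not None and current[0] == login_id:
            del self.sessions[uid]
            self.kicked_clients.append(current[1])


machine = InterfaceMachine()
machine.on_login_success(42, "t1", "A")     # first login
# Reordered arrival: the login success for A' overtakes the kick aimed at A.
machine.on_login_success(42, "t2", "A'")
machine.on_kick_packet(42, "t1")            # stale kick, no longer matches
```

The stale kick packet names login "t1", but the machine now holds "t2", so A' stays online and client A is kicked exactly once.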
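Generating such an identifier is straightforward with Python's standard `uuid` module. This is only a sketch: the function names and the "uid+login_id" string format are hypothetical.

```python
import uuid


def new_login_id():
    """Return an identifier that is unique per login and needs no clock."""
    return uuid.uuid4().hex


def session_key(uid, login_id):
    # Plays the role of ID+XXX, with XXX being a UUID instead of LoginTime.
    return f"{uid}+{login_id}"


first = session_key(42, new_login_id())
second = session_key(42, new_login_id())
```

Because the identifier is random rather than time-based, two machines with drifting clocks can still agree on which login a kick-offline packet refers to.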

To sum up, this article has introduced the overall architecture of the friend system and the implementation of some of its modules. Implementing each module of a distributed system is not actually difficult. The difficulty lies mainly in handling the problems caused by a complex network environment (packet loss, delay, and so on) and by server failures (for example, adding redundant servers to cope with downtime, which in turn creates other problems).

Although the friend system is simple, it is small yet complete, and it touches on most of the techniques of architecture design: layered structure, load balancing, horizontal scaling, disaster recovery, service discovery, server development frameworks, and so on. I will introduce these techniques in later articles on other projects. Please stay tuned.
