Shulou (Shulou.com), 11/24 report. Updated 2025-02-02.
This article comes from the WeChat official account Low Concurrency Programming (ID: dibingfa). Author: flash.
Here is a scenario that calls for an architectural design.
First, there is a group of authors who can publish articles.
Second, there is a group of users who can follow authors.
Third, each user has a page that shows the articles published by the authors they follow, sorted by publication time.
Yes, this is really a simplified official account message list.
Each author is an official account, the user is you, and this follow feed is the page you see in mobile WeChat after tapping the Subscription Accounts message.
Sound familiar now?
How would you design it?
First, we need three tables to store the basic entity information.
Author table: stores author information
User table: stores user information
Article table: stores the articles published by authors
Then there are the relationship tables.
Users can follow authors, so there is a follow table.
Note that this follow table is an abstraction and can be split into a forward table and a reverse table (the authors a user follows, and the followers of an author). Either way, the purpose is that users can look up the authors they follow from this table.
Authors can likewise look up their followers from it.
Authors publish articles, so there is an author publishing table.
Each row records the author ID, the article ID, and the publication time. From this table we can look up a given author's articles in chronological order. We can call it the author's outbox.
Users effectively subscribe to articles, so there is a user-article subscription table.
Each row records the user ID, the article ID, and the publication time. From this table we can look up the articles to show on a user's follow feed page, that is, the articles published by all the authors the user follows. We can call it the user's inbox.
OK, so far we have:
Three metadata tables: user table, author table, article table
Three relationship tables: follow table, author outbox, user inbox
Everything is in place, although some of these tables are clearly redundant. Let's walk through how the follow feed architecture evolves, and you'll see why it is designed this way.
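As a rough illustration of the relationship tables listed above, here is a minimal in-memory sketch in Python. The class name `FeedStore`, the field names, and the use of dictionaries in place of real database tables are all assumptions made for illustration; the article does not prescribe a schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One row of the (hypothetical) article table."""
    article_id: int
    author_id: int
    published_at: int  # e.g. a Unix timestamp


class FeedStore:
    """In-memory stand-in for the article table and the relationship tables."""

    def __init__(self):
        self.articles = {}                  # article_id -> Article
        self.following = defaultdict(set)   # forward follow table: user_id -> author_ids
        self.followers = defaultdict(set)   # reverse follow table: author_id -> user_ids
        self.outbox = defaultdict(list)     # author_id -> [(published_at, article_id)]
        self.inbox = defaultdict(list)      # user_id -> [(published_at, article_id)]

    def follow(self, user_id, author_id):
        # Keep the forward and reverse follow tables in sync.
        self.following[user_id].add(author_id)
        self.followers[author_id].add(user_id)


store = FeedStore()
store.follow(user_id=1, author_id=100)
store.follow(user_id=2, author_id=100)
```

Storing `(published_at, article_id)` pairs in the outbox and inbox means a plain sort on the pair already gives chronological order.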
Push model
In the beginning, there were few official account authors and few readers.
For the reader's experience, the priority is to minimize the time it takes to refresh the follow feed.
At this point, there are two ways to produce a reader's follow feed:
The reader first queries the follow table to find all the authors they follow, then queries each author's outbox for articles, merges them in chronological order, and finally displays them on the phone.
This approach requires many table lookups and is very inefficient.
There is, of course, another way.
The reader goes directly to their own inbox and fetches the articles in one query.
This is clearly much more efficient: the result comes back already in order and can be displayed on the phone directly.
Of course, this requires that every time an official account author publishes an article, it is written into the inbox of every user who follows that author.
So although reading the follow feed becomes simple and efficient, the cost moves into the author's publishing logic.
From this angle, we are sacrificing write performance for read performance.
Also, the article is already stored in the author's outbox, and now it is stored redundantly in users' inboxes as well, in many copies, taking up extra space.
From this angle, we are trading space for time.
You can also look at it across the whole timeline: the time that reading the follow feed would have cost is spread across each author's publish operations.
From this angle, we are amortizing complexity.
Honestly, does such simple logic need so many architecture-level buzzwords? Either way, remember that in follow feed architecture this thing is called the push model.
The name means that the author pushes each published article to every reader who follows them.
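The push model described above can be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: the dictionaries stand in for the follow table, outbox, and inbox, and the function names `publish` and `read_feed` are hypothetical.

```python
from collections import defaultdict

followers = defaultdict(set)   # author_id -> user_ids (reverse follow table)
outbox = defaultdict(list)     # author_id -> [(published_at, article_id)]
inbox = defaultdict(list)      # user_id -> [(published_at, article_id)]


def publish(author_id, article_id, published_at):
    """Write path: store in the outbox, then fan out to every follower's inbox."""
    entry = (published_at, article_id)
    outbox[author_id].append(entry)
    for user_id in followers[author_id]:
        inbox[user_id].append(entry)   # one extra copy per follower


def read_feed(user_id, limit=20):
    """Read path: a single inbox lookup, newest first."""
    newest_first = sorted(inbox[user_id], reverse=True)
    return [article_id for _, article_id in newest_first[:limit]]


followers["author_a"] = {"u1", "u2"}
followers["author_b"] = {"u1"}
publish("author_a", "a1", published_at=100)
publish("author_b", "b1", published_at=200)
```

The loop inside `publish` is where the write cost grows with the number of followers, and the copies in `inbox` are the extra space being traded for a cheap read.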
This is no problem at all in the early stage, when there are not many official accounts or readers.
Delayed push model
The push model from the previous section held up for a long time in the early days.
But gradually more and more readers discovered official accounts and followed more of them, and official accounts with more than ten thousand followers started to appear.
Once an official account has a large number of followers, every article it publishes has to be pushed to a large number of readers.
Publishing an article becomes quite expensive.
Still, the reader's experience cannot be ignored, so what should we do?
We divide users into two categories: active users, who often browse official account articles, and inactive users, who may only open the feed once every several days.
After an author publishes an article, it is pushed immediately only to active users; inactive users are left alone for now.
What about inactive users? Delayed push.
That is, when an inactive user opens the follow feed again, this first triggers the push logic, which copies the articles not yet synchronized into the user's inbox, and then the normal logic reads the follow feed data from the inbox.
This is a form of hot/cold separation.
The cold data is the inactive users; the hot data is the active users.
This design is the second stage of the follow feed: the delayed push model.
Of course, it also sacrifices something: the experience of an inactive user refreshing the follow feed for the first time.
But that hardly matters. Inactive users rarely open official accounts anyway, only the first refresh is affected, and they may later become active users.
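A minimal sketch of the delayed push idea, under some assumptions that are not in the article: the set `active_users` is maintained elsewhere, publication timestamps are strictly increasing, and a per-user cursor `synced_until` (a hypothetical name) records how far that user's inbox has been synchronized.

```python
from collections import defaultdict

followers = defaultdict(set)      # author_id -> user_ids
following = defaultdict(set)      # user_id -> author_ids
outbox = defaultdict(list)        # author_id -> [(published_at, article_id)]
inbox = defaultdict(list)         # user_id -> [(published_at, article_id)]
active_users = set()              # assumed to be maintained elsewhere
synced_until = defaultdict(int)   # user_id -> newest timestamp already in the inbox


def follow(user_id, author_id):
    following[user_id].add(author_id)
    followers[author_id].add(user_id)


def publish(author_id, article_id, published_at):
    entry = (published_at, article_id)
    outbox[author_id].append(entry)
    # Push immediately to active followers only.
    for user_id in followers[author_id] & active_users:
        inbox[user_id].append(entry)
        synced_until[user_id] = published_at


def catch_up(user_id):
    """Delayed push: copy everything published since the user's cursor."""
    cursor = synced_until[user_id]
    newest = cursor
    for author_id in following[user_id]:
        for published_at, article_id in outbox[author_id]:
            if published_at > cursor:
                inbox[user_id].append((published_at, article_id))
                newest = max(newest, published_at)
    synced_until[user_id] = newest


def read_feed(user_id, limit=20):
    if user_id not in active_users:
        catch_up(user_id)   # the slower first refresh for inactive users
    newest_first = sorted(inbox[user_id], reverse=True)
    return [article_id for _, article_id in newest_first[:limit]]


follow("busy", "a")
follow("rare", "a")
active_users.add("busy")
publish("a", "a1", published_at=100)
inbox_before_open = list(inbox["rare"])   # nothing was pushed to the inactive user
```

Reading as `rare` triggers `catch_up`, so the first refresh pays the synchronization cost; later refreshes find the cursor already at the newest article and copy nothing.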
Pull model
The delayed push model is already enough to support considerable traffic, but as things develop further, monsters appear.
These are the big V accounts: they have a huge number of followers, they post frequently, and their followers are all active users.
For example, accounts such as the rich cow and cat or the big boss in Japan, whose articles each get 100,000+ reads.
For accounts like these, neither the push model nor the delayed push model can cope. What do we do?
Remember that, besides the push model, there was another way to produce a reader's follow feed?
The reader first queries the follow table to find all the authors they follow, then queries each author's outbox for articles, merges them in chronological order, and finally displays them on the phone.
This approach is the opposite of the push model: the pull model.
It solves the problem that an article published by a big V author has to be pushed to too many people.
But its drawback was also mentioned before: if a user follows many authors, many table lookups are needed to assemble the final data.
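The pull model can be sketched like this. It is an in-memory illustration under the assumption that each outbox is kept sorted newest first, so the standard library's `heapq.merge` can combine the per-author lists lazily; the function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import islice

following = defaultdict(set)   # user_id -> author_ids (forward follow table)
outbox = defaultdict(list)     # author_id -> [(published_at, article_id)], newest first


def publish(author_id, article_id, published_at):
    """Write path: only the author's own outbox is touched."""
    # Assumes timestamps increase per author, so inserting at the front
    # keeps the outbox sorted newest first.
    outbox[author_id].insert(0, (published_at, article_id))


def read_feed(user_id, limit=20):
    """Read path: one outbox lookup per followed author, then a k-way merge."""
    streams = [outbox[author_id] for author_id in following[user_id]]
    merged = heapq.merge(*streams, reverse=True)
    return [article_id for _, article_id in islice(merged, limit)]


following["u1"] = {"a", "b"}
publish("a", "a1", published_at=100)
publish("b", "b1", published_at=200)
publish("a", "a2", published_at=300)
```

Publishing is now constant work regardless of follower count, while `read_feed` does one lookup per followed author, which is exactly the drawback noted above.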
Our actual problem is that pushing articles is expensive only for big V authors; authors who are not big V can still use the push model.
So we split along the author dimension, into big V authors and ordinary authors.
This model clearly cannot be a pure push model or a pure pull model; it has to combine push and pull.
So how exactly do push and pull combine?
First, big V authors no longer push articles to followers; the articles are kept only in the author's own outbox.
When a reader refreshes the follow feed, the first step is to fetch all the articles from their own inbox. This is the push model part.
Next, we find all the big V authors among the official accounts the reader follows.
If there are none, return the data to the phone for display right away.
If there are some, query each big V author's outbox for articles (note that only the most recent articles are queried; there is a time limit, otherwise...). This is the pull model part.
Now there are two sets of data: one from the push model, read from the reader's inbox, and one from the pull model, read from the big V authors' outboxes.
Merge them in chronological order and return the result to the phone.
This is the push-pull combination.
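The push-pull steps above might look roughly like the following sketch. The cutoff `BIG_V_THRESHOLD` and the time limit `RECENT_WINDOW` are hypothetical values chosen to keep the demo small; the article does not say how a big V author is identified or how long the time limit is.

```python
from collections import defaultdict

BIG_V_THRESHOLD = 3   # hypothetical follower cutoff, tiny for the demo
RECENT_WINDOW = 100   # hypothetical time limit for pulled articles

followers = defaultdict(set)   # author_id -> user_ids
following = defaultdict(set)   # user_id -> author_ids
outbox = defaultdict(list)     # author_id -> [(published_at, article_id)]
inbox = defaultdict(list)      # user_id -> [(published_at, article_id)]


def follow(user_id, author_id):
    following[user_id].add(author_id)
    followers[author_id].add(user_id)


def is_big_v(author_id):
    return len(followers[author_id]) >= BIG_V_THRESHOLD


def publish(author_id, article_id, published_at):
    entry = (published_at, article_id)
    outbox[author_id].append(entry)
    if not is_big_v(author_id):            # ordinary authors: push
        for user_id in followers[author_id]:
            inbox[user_id].append(entry)


def read_feed(user_id, now, limit=20):
    entries = set(inbox[user_id])          # push part: the reader's inbox
    for author_id in following[user_id]:
        if is_big_v(author_id):            # pull part: recent big V articles only
            entries.update(e for e in outbox[author_id]
                           if e[0] >= now - RECENT_WINDOW)
    newest_first = sorted(entries, reverse=True)
    return [article_id for _, article_id in newest_first[:limit]]


for reader in ("u1", "u2", "u3"):
    follow(reader, "star")
follow("u1", "indie")
publish("indie", "i1", published_at=100)
publish("star", "s1", published_at=200)
```

Collecting the entries in a set avoids showing an article twice if an author was pushed as an ordinary author earlier and later crossed the big V cutoff.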
Reviewing the architecture ideas involved
At first, to keep the response time of reading the follow feed short, we adopted the push model: as soon as an official account author publishes an article, it is pushed to all followers.
We described this design idea from three angles:
Sacrifice write performance for read performance
Trade space for time
Amortize complexity
Later, follower counts kept growing and the cost of the push stage in the push model became the bottleneck, so we had to find a way to flatten it.
Considering that only a small fraction of readers are active and most are inactive (if most readers were active, every article on our official account would have passed 10,000 reads long ago), the push stage pushes only to active users and inactive users get a delayed push. This is the delayed push model.
From an architectural point of view, this is the idea of hot/cold separation.
Later still, splitting at the user level could no longer cope with big V authors, so we split authors into big V authors and ordinary authors. Ordinary authors use the push model or the delayed push model, and big V authors use the pull model. This is the push-pull model.
Of course, this is also hot/cold separation.
Going into more detail: if official account authors' outboxes are stored in a database such as MySQL or HBase, the articles published by big V authors can additionally be cached in Redis, so readers pull big V data faster. This is a finer-grained hot/cold separation.
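One common way to put a cache in front of the big V outbox is the cache-aside pattern, sketched below. This is an assumption about how the caching could work, not something the article specifies: plain dictionaries stand in for both the database and Redis, and all names are hypothetical.

```python
db_outbox = {"star": [(200, "s2"), (100, "s1")]}   # stand-in for MySQL/HBase
cache = {}                                          # stand-in for Redis
stats = {"db_reads": 0}                             # counts slow-path lookups


def load_outbox_from_db(author_id):
    stats["db_reads"] += 1
    return list(db_outbox.get(author_id, []))


def get_big_v_outbox(author_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    if author_id not in cache:
        cache[author_id] = load_outbox_from_db(author_id)
    return cache[author_id]


def publish_big_v(author_id, article_id, published_at):
    """Write to the database, then invalidate the cached copy."""
    db_outbox.setdefault(author_id, []).insert(0, (published_at, article_id))
    cache.pop(author_id, None)


first = get_big_v_outbox("star")    # miss: reads the database
second = get_big_v_outbox("star")   # hit: served from the cache
```

Because a big V outbox is read by many followers but written rarely, nearly all reads after the first are served from the cache.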
As another example, most users may not follow any big V author, yet each refresh still has to query the user's follow table, find all the authors they follow, and check one by one whether each is a big V author.
Since in most cases the result is that there is no big V author, we can optimize by storing a field in the user table that indicates whether the user follows any big V author.
This way, following a big V author costs one extra step to update the field, but in return most readers can skip the big V lookup altogether when reading, which is clearly worthwhile. This is again sacrificing write performance for read performance, or trading space for time.
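The flag optimization can be sketched as follows. The set `big_v_authors` and the dictionary `follows_big_v` (standing in for the extra field on the user table) are hypothetical names, and unfollowing is left out to keep the sketch short.

```python
from collections import defaultdict

big_v_authors = {"star"}        # assumed to be known in advance
following = defaultdict(set)    # user_id -> author_ids (follow table)
follows_big_v = {}              # user_id -> bool, the extra field on the user table


def follow(user_id, author_id):
    following[user_id].add(author_id)
    if author_id in big_v_authors:
        follows_big_v[user_id] = True   # the one extra write step


def big_v_authors_for(user_id):
    """Return the big V authors a user follows, skipping the scan when possible."""
    if not follows_big_v.get(user_id, False):
        return set()                    # fast path: no follow table lookup
    return following[user_id] & big_v_authors


follow("u1", "indie")
follow("u2", "star")
follow("u2", "indie")
```

A real version would also need to recompute the flag when a user unfollows their last big V author, or keep a counter instead of a boolean.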
Taken as a whole, we keep finding optimization points and giving up one thing to gain another. This is the balancing act of architectural design, or in a more pretentious word, Trade-Off, which is simply the English term for it.
If the idea of Trade-Off stays with you throughout architectural design, then whether you are studying how components are designed, working on business architecture, or answering an interviewer's questions, people will feel that your thinking is clear and that communicating with you is smooth.
Remember it: Trade-Off!
Oh, I almost forgot to say what today's topic was. Let me just add that this thing is called the follow feed.