The Evolution History of LinkedIn's Architecture


Shulou (Shulou.com) 06/02 Report --

Today I will walk you through the evolution of LinkedIn's architecture. Many people may not know much about this topic, so to help you understand it better, the editor has summarized the following for you. I hope you get something out of this article.

LinkedIn was founded in 2003 with the main goal of connecting your professional contacts to open up better job opportunities. There were only 2,700 members in the launch week; over the next few years, LinkedIn's products, membership, and server load all grew rapidly.

Today, LinkedIn has more than 350 million members worldwide. We serve tens of thousands of page views every second of every day, and mobile traffic accounts for more than 50%. All of these pages fetch their data from backend APIs, at a rate of up to * per second.

So, how do we do it?

The early years: Leo

LinkedIn began like many websites, with a single application doing all the work. We called this app "Leo". It hosted all the website pages (Java Servlets) as well as the business logic, and connected to a lightweight LinkedIn database.

Ha! The website in its early years: simple and practical.
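
For illustration only, here is a minimal sketch of what a Leo-style monolith looks like: a single Java Servlet that does page rendering, business logic, and database access in one process. The servlet, the table, and the connection URL are hypothetical, not LinkedIn's actual code.

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of a monolith: presentation, business logic, and data access in one servlet.
public class MemberProfileServlet extends HttpServlet {

    private static final String DB_URL = "jdbc:mysql://localhost/linkedin"; // placeholder

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        long memberId = Long.parseLong(req.getParameter("id"));
        try (Connection conn = DriverManager.getConnection(DB_URL);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name, headline FROM members WHERE id = ?")) {
            ps.setLong(1, memberId);
            try (ResultSet rs = ps.executeQuery()) {
                resp.setContentType("text/html");
                PrintWriter out = resp.getWriter();
                if (rs.next()) {
                    // Business logic and HTML generation live side by side in the monolith.
                    out.println("<h1>" + rs.getString("name") + "</h1>");
                    out.println("<p>" + rs.getString("headline") + "</p>");
                } else {
                    resp.sendError(HttpServletResponse.SC_NOT_FOUND);
                }
            }
        } catch (SQLException e) {
            throw new IOException("database error", e);
        }
    }
}
```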

The member graph

One of the first things we needed to do was manage the social network between members. We needed a system that could traverse the graph of connections between users while keeping it resident in memory for efficiency and high performance. Given this very different requirement, it was clear it had to scale independently of Leo. A separate in-memory graph system for members, called "Cloud", was born: LinkedIn's first service. To keep this graph service independent of Leo, we used Java RPC for communication.
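
Here is a minimal sketch of the core idea behind such a graph service, assuming an in-memory adjacency-set representation and a bounded breadth-first traversal. The class and method names are illustrative, and the Java RPC layer that would sit in front of it is omitted.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Illustrative in-memory member graph: adjacency sets plus a bounded breadth-first traversal.
public class MemberGraph {

    private final Map<Long, Set<Long>> connections = new HashMap<>();

    public void connect(long a, long b) {
        connections.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        connections.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Members reachable within maxDegrees hops of the starting member (excluding that member). */
    public Set<Long> network(long memberId, int maxDegrees) {
        Set<Long> visited = new HashSet<>();
        Set<Long> result = new HashSet<>();
        Queue<long[]> queue = new ArrayDeque<>();   // each entry is {memberId, depth}
        queue.add(new long[] { memberId, 0 });
        visited.add(memberId);
        while (!queue.isEmpty()) {
            long[] current = queue.poll();
            if (current[1] >= maxDegrees) continue;
            for (long next : connections.getOrDefault(current[0], Set.of())) {
                if (visited.add(next)) {
                    result.add(next);
                    queue.add(new long[] { next, current[1] + 1 });
                }
            }
        }
        return result;
    }
}
```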

Around the same time, we also needed search, and the member graph service began feeding data into the search engine, which was built on Lucene.

Read-only replica databases

As the site continued to grow, so did Leo, taking on more roles and responsibilities and naturally becoming more complex. Load balancing across multiple Leo instances helped, but the added load also strained other critical LinkedIn systems, such as the member profile database.

A common solution is to scale vertically, that is, to add more CPU and memory. That bought us time, but we would still need to scale further. The member profile database handled both reads and writes. To scale it, we introduced slave replica databases: copies of the member master database, kept in sync using an early version of Databus (now open source). The replicas took over all read traffic, and logic was added to make sure reads from the replicas stayed consistent with the master.
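
A minimal sketch of the resulting read/write split follows, with the replication pipeline left implicit. The class and method names are illustrative, not LinkedIn's code.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Illustrative read/write split: writes go to the master, reads go to a replica
// that a change-capture pipeline (Databus, in LinkedIn's case) keeps in sync.
public class MemberDbRouter {

    private final DataSource master;   // accepts reads and writes
    private final DataSource replica;  // read-only copy of the master

    public MemberDbRouter(DataSource master, DataSource replica) {
        this.master = master;
        this.replica = replica;
    }

    public Connection connectionFor(boolean isWrite) throws SQLException {
        // All writes must hit the master; reads can be served from the replica,
        // subject to extra logic (not shown) that checks the replica is caught up.
        return isWrite ? master.getConnection() : replica.getConnection();
    }
}
```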

After the master-slave architecture had worked for a while, we moved on to partitioning the databases.

As the site began handling more and more traffic, our monolithic application Leo frequently went down in production. It was difficult to troubleshoot and recover, and difficult to release new code. High availability is critical to LinkedIn, so it became clear that we had to "kill" Leo and break it up into many small functional blocks and stateless services.

Service-oriented architecture (SOA)

Engineers started by extracting the services that carried the API interfaces and business logic, such as search, profiles, communications, and the groups platform. Next the presentation layer was decomposed, for example the recruiting features and the public profile pages. New products and services were built outside of Leo, and before long the vertical service stack for each functional area was in place.

We built front-end servers that fetch data models from different domains and handle presentation, such as generating HTML (through JSPs). We also built back-end data services that give mid-tier services consistent, API-based access to the data models and databases. By 2010 we had more than 150 individual services; today we have more than 750.
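
A sketch of that split, assuming a hypothetical back-end profile data service: after the decomposition, the front-end servlet no longer touches any database; it only fetches the data model and forwards it to a JSP for presentation.

```java
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// Illustrative front-end server after the SOA split: it asks a back-end data service
// for the data model and leaves rendering to a JSP.
public class ProfilePageServlet extends HttpServlet {

    /** Hypothetical client stub for the back-end profile data service. */
    public interface ProfileDataService {
        String fetchProfileJson(long memberId);
    }

    private ProfileDataService profileService; // injected by the container or a factory (not shown)

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        long memberId = Long.parseLong(req.getParameter("id"));
        String profileModel = profileService.fetchProfileJson(memberId); // remote call to the data service
        req.setAttribute("profile", profileModel);
        req.getRequestDispatcher("/WEB-INF/profile.jsp").forward(req, resp); // presentation only
    }
}
```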

Because the services are stateless, scaling is simply a matter of spinning up new instances behind hardware load balancers. We drew a "red line" for each service so we knew its load limits, which fed into early countermeasures and performance monitoring.

Caching

LinkedIn kept growing at a predictable pace, so further scaling was needed. We knew we could reduce the load of centralized access by adding more layers of caching. Many applications started to use mid-tier caches such as memcached/Couchbase, caching was also added at the data layer, and in some scenarios we started to use Voldemort to serve precomputed results.

After a while, we actually removed many of the mid-tier caches. They held data from multiple domains, and while caching did reduce load at first, the complexity of keeping caches updated and the resulting call graph became harder and harder to control. Keeping caches closest to the data layer reduced that hard-to-control impact, while still allowing horizontal scaling to keep load down.
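
A minimal cache-aside sketch of what "keeping the cache closest to the data layer" looks like. The in-process map stands in for an external cache such as memcached/Couchbase (normally accessed through its own client library), and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustrative cache-aside pattern kept close to the data layer.
public class CachedMemberStore {

    private final ConcurrentMap<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, Optional<String>> database; // loads a member record by id

    public CachedMemberStore(Function<Long, Optional<String>> database) {
        this.database = database;
    }

    public Optional<String> getMember(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);                   // cache hit
        }
        Optional<String> loaded = database.apply(id);     // cache miss: go to the data layer
        loaded.ifPresent(value -> cache.put(id, value));  // populate for the next reader
        return loaded;
    }

    public void invalidate(long id) {
        // Called after the underlying record changes. Getting invalidation right across
        // many domains is exactly what made broad mid-tier caches hard to manage.
        cache.remove(id);
    }
}
```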

Kafka

To collect its growing volume of data, LinkedIn built many custom pipelines for streaming and queueing data. For example, we needed to load data into the data warehouse, send batches of data into Hadoop workflows for analysis, aggregate huge numbers of logs from every service, and track lots of user behavior such as page views. We needed to queue messages for the InMail messaging service, keep search data up to date whenever a user updated a profile, and so on.

As the site kept growing, more and more of these custom pipelines appeared, and because the site had to scale, each individual pipeline had to scale as well, which was sometimes a hard choice. The solution was Kafka, our distributed publish-subscribe messaging platform. Kafka became the unified pipeline, built around the abstraction of a commit log, and was designed from the start to be fast and scalable. It provides near-real-time access to every data source, drives Hadoop jobs, lets us build real-time analytics, greatly improves our site monitoring and alerting, and supports call-graph visualization. Today, Kafka handles more than 500 million events a day.
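
As a concrete illustration, here is a minimal sketch of publishing one tracking event with the standard Kafka Java producer client. The broker address, topic name, and JSON payload are placeholders; in practice events would normally carry a schema (for example Avro) rather than a raw string.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

// Illustrative use of the standard Kafka Java producer to publish a tracking event.
public class PageViewPublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic and payload: one page-view event keyed by member.
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "page-view-events", "member-123", "{\"page\":\"/in/someone\",\"ts\":1690000000}");
            producer.send(event); // warehouse loads, Hadoop jobs, and monitoring subscribe independently
        }
    }
}
```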

Inversion

Scale can be measured along many dimensions, including organizational ones. In late 2011, LinkedIn began an internal initiative called "Inversion". We paused the development of new features and let the whole engineering organization focus on improving tooling, deployment, infrastructure, and developer productivity. This has been very helpful in enabling the agility we need today to build scalable new products.

Recent years: Rest.li

When we moved from Leo to a service-oriented architecture, the earlier Java-based RPC APIs had begun to fragment across teams and were too tightly coupled to the presentation layer, and it was only getting worse. To solve this, we developed a new API model called Rest.li: a data-model-centric architecture that ensures consistent, stateless RESTful APIs across the whole company.

Because it transfers JSON over HTTP, our new API is finally easy to consume from clients not written in Java. LinkedIn today is still primarily a Java shop, but it also uses Python, Ruby, Node.js, and C++ depending on the existing technology mix. Moving away from RPC also decoupled us from the front-end presentation layer and from backward-compatibility problems. In addition, by using Dynamic Discovery (D2) with Rest.li, every service-layer API gains automatic load balancing, discovery, and scalability.
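
To make the "JSON over HTTP" point concrete, here is a minimal client sketch that fetches a resource as plain JSON with the JDK's HttpClient; non-Java clients can do the same with their own HTTP stacks. The host and resource path are hypothetical, and this does not use the actual Rest.li client libraries.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative call against a Rest.li-style resource: plain JSON over HTTP.
public class RestLiStyleClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/greetings/1")) // hypothetical resource URI
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON representation of the data model
    }
}
```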

Today, all LinkedIn data centers have more than 975 Rest.li resources and more than 100 billion Rest.li calls per day.

Rest.li R2/D2 technology stack

Super blocks

Service-oriented architecture is very good for decoupling domains and scaling services independently. The downside is that most of our applications need many different types of data, which in turn produces hundreds of downstream calls. This is commonly referred to as the "call graph", or the "fan-out" of all those downstream calls. For example, any profile page request fetches much more than the profile itself: photos, connections, groups, subscriptions, followers, blog posts, connection degrees, recommendations, and more. Such call graphs can be difficult to manage, and ours were only getting more and more unruly.

We introduced the concept of a "super block": a grouping of backend services behind a single API. This lets a dedicated team optimize the whole block while keeping the call graph of each client under control.
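
A sketch of the idea, with hypothetical downstream services: the super block exposes one call, fans out to its grouped backends in parallel, and returns a single aggregated view, so the client's call graph stays flat.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Illustrative "super block": one API that fans out to a group of backend services
// in parallel and returns an aggregated result.
public class ProfileSuperBlock {

    // Hypothetical downstream lookups, each normally a separate remote service.
    private final Function<Long, CompletableFuture<String>> profiles;
    private final Function<Long, CompletableFuture<List<String>>> connections;
    private final Function<Long, CompletableFuture<List<String>>> recommendations;

    public ProfileSuperBlock(Function<Long, CompletableFuture<String>> profiles,
                             Function<Long, CompletableFuture<List<String>>> connections,
                             Function<Long, CompletableFuture<List<String>>> recommendations) {
        this.profiles = profiles;
        this.connections = connections;
        this.recommendations = recommendations;
    }

    /** Single entry point clients call instead of issuing many downstream calls themselves. */
    public CompletableFuture<ProfileView> profilePage(long memberId) {
        CompletableFuture<String> profile = profiles.apply(memberId);
        CompletableFuture<List<String>> conns = connections.apply(memberId);
        CompletableFuture<List<String>> recs = recommendations.apply(memberId);
        return CompletableFuture.allOf(profile, conns, recs)
                .thenApply(ignored -> new ProfileView(profile.join(), conns.join(), recs.join()));
    }

    /** Aggregated view returned by the super block. */
    public record ProfileView(String profile, List<String> connections, List<String> recommendations) {}
}
```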

Multiple data centers

As a global company with fast-growing membership, we needed to scale beyond a single data center, and we have worked on this for several years. First, we began serving public profiles out of two data centers (Los Angeles and Chicago). Once that was proven, we enhanced all of our services to handle data replication, calls from different origins, one-way data replication events, and pinning users to a geographically closer data center.
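
The "pinning users to a geographically closer data center" idea can be sketched as follows. The data center list and the nearest-coordinate rule are assumptions for illustration; real routing involves PoPs, DNS/geo data, and sticky assignment.

```java
import java.util.List;

// Illustrative "pinning" of a member to the geographically closest data center.
public class DataCenterPinner {

    public record DataCenter(String name, double lat, double lon) {}

    private final List<DataCenter> dataCenters;

    public DataCenterPinner(List<DataCenter> dataCenters) {
        this.dataCenters = dataCenters;
    }

    public DataCenter pin(double userLat, double userLon) {
        DataCenter best = dataCenters.get(0);
        double bestDist = Double.MAX_VALUE;
        for (DataCenter dc : dataCenters) {
            double dLat = dc.lat() - userLat;
            double dLon = dc.lon() - userLon;
            double dist = dLat * dLat + dLon * dLon; // crude squared-degree distance, enough for a sketch
            if (dist < bestDist) {
                bestDist = dist;
                best = dc;
            }
        }
        return best;
    }
}
```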

Most of our databases now run on Espresso (a new in-house multi-tenant datastore). Espresso supports multiple data centers, provides master-master support, and handles complex data replication.

Multiple data centers are incredibly important for high availability: you need to avoid any single point of failure that could take down not just one service but the entire site. Today, LinkedIn runs three primary data centers, plus PoPs around the globe.

What else have we done?

Of course, our scaling story is bigger than this. Our engineering and operations teams have done countless things over the years, including these major initiatives:

Over the years, most of our key systems have had their own rich scaling history, including the member graph service (the first service outside Leo), search (the second service), the news feed, the communications platform, and the member profile backend.

We also built out the data infrastructure platform to support long-term growth, first proven in practice with Databus and Kafka. Later we added Samza for data streams, Espresso and Voldemort for storage, Pinot for analytics, and other custom solutions. In addition, our tooling has improved so that engineers can deploy this infrastructure automatically.

We have also developed a large number of offline workflows on Hadoop and Voldemort data for data-driven features such as "People You May Know", "Similar Profiles", "Notable Alumni", and profile browse maps.

We rethought our front-end approach, adding client-side templates into the mix (on the profile pages and university pages, for example) so applications could be more interactive by requesting only JSON or partial JSON. These template pages are also cached by CDNs and browsers. We likewise began using BigPipe and the Play framework to move from threaded servers to a non-blocking, asynchronous model.

Beyond the code itself, we use multiple tiers of Apache proxies and HAProxy for load balancing, data center routing, security, intelligent routing, server-side rendering, and more.

Finally, we keep improving server performance, including hardware optimization, advanced memory and system tuning, and taking advantage of newer JREs.

After reading the above, do you have a better understanding of the evolution of LinkedIn's architecture? If you want to learn more, please follow the industry information channel. Thank you for your support.
