How Reddit Built Its Advertising Service System

Updated 2025-02-25. From: SLTechnology News&Howtos (shulou)

This article describes how Reddit's advertising service system was built.

It covers how Reddit used Go to build its ad serving system and what the team learned in the process.

Summary

The Reddit engineering team recently introduced Go into its stack to write a new ad serving system that replaces a third-party system. Deval Shah walks through the architecture of the new service, the Reddit team's experience of using Go for the first time, and the lessons they learned while building the ad server in Go.

Introduction to Reddit

Reddit describes itself as the front page of the internet: a social network with tens of thousands of interest-based communities where people can discuss the things that matter to them.

Reddit by the numbers:

Alexa rank: No. 5 in the USA, No. 18 worldwide

330M+ monthly active users (MAU)

138K active communities

12M posts per month

2B votes per month

Any system built by Reddit must be able to handle this level of traffic.

Overview of Advertising Architecture

The advertising server handles the entire ad lifecycle, from serving the ad to all post-processing after the ad has been shown.

Advertising service @ Reddit

Reddit Advertising Server has several requirements:

Scale: every request on Reddit goes through the advertising system, so it has to handle traffic at that scale.

Speed: the advertising server must be fast. They do not want ads to become a performance bottleneck that degrades the user experience. The target is to return an ad response within 30 milliseconds.

Auction: the server must be able to choose the best ad based on bids.

Pacing: the server must be able to pace ad delivery so that ads are allocated in the best way.

The previous advertising service @ Reddit:

Previously, every time a user visited reddit.com, Reddit's monolithic backend sent a request to a third-party advertising server. The third-party server responded with one or more ads of its choosing, which were then returned to the user.

After a while, they realized that continuing with a third-party advertising server would not work for them because it was:

Slow

Less customizable: the third party did not support many of the changes they wanted to make.

Opaque in operation: they could not see how certain things were implemented, could not control ad quality, and so on.

They decided to build their own advertising server with a team of three people, starting with the infrastructure, then writing the services, and then scaling the system in production.

Advertising service infrastructure:

Some noteworthy tools used in the advertising server infrastructure:

Apache Thrift for all RPC. Thrift has been around since 2007, and Reddit has been using it from the very beginning.

RocksDB for data storage. It is an open-source key-value store built by Facebook. It is embeddable, which avoids network hops, and it is optimized for high read and write throughput.

They also chose Go as the main back-end language. This was the first time Reddit had used Go in production; before that, Reddit mainly used Python and Java. The team wanted to make sure that Go became a first-class citizen among the languages used at Reddit and supported everything Reddit needs.

Advertising server architecture:

This is the architecture of the new advertising server:

A brief overview of how it works:

Reddit.com calls a service called the ad selector, the first service in the advertising infrastructure. It is a Thrift service that receives requests from reddit.com. The request goes to a function called getAds, which handles fetching and returning the ads to be displayed to the user. The ad selector then calls the enrichment service.

The enrichment service is responsible for gathering more data about the request, the user, and anything else needed to find and select the most relevant ads. It collects all this information and returns it to the ad selector.

After receiving the response from the enrichment service, the ad selector selects the ads and returns them to reddit.com for display to the user. It also sends the response to Kafka.

After the ad has been shown to the user, some post-processing is needed. The client sends an event as an HTTP request to the event tracker service. This event confirms that the ad was shown. The event is also written to Kafka.

Kafka feeds two Apache Spark jobs:

The event statistics streaming job runs continuously and writes to the enrichment service, supplying ad information that helps the system learn to choose better.

There is also the pacing loop, which involves the pacing Spark jobs: a streaming job that counts how many ads have been shown for each advertiser, and another job that ensures ads are delivered in the best way.

In this architecture, Go services are:

Ad selector:

P99 latency requirement of 30 ms

Involves complex business rules for targeting and selection

Auction: all ads that satisfy the business rules compete for the ad impression, and the ad selector runs this competition.

Event tracker:

P99 latency requirement of 1 ms

Logs and acknowledges events

Must be highly reliable

Enrichment service:

Thrift service

Returns data to the ad selector

Has an embedded RocksDB database

P99 latency of 4 ms

For each request, it does a prefix scan in Go and fetches a batch of data for computation and aggregation. The idea is to avoid network hops when fetching information so that responses stay fast.
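To make the prefix scan idea concrete, here is a minimal sketch that uses an in-memory sorted key list as a stand-in for the embedded store. It is not RocksDB and not Reddit's code; the key layout (`user:1:clicks:...`) and the names `store` and `sumPrefix` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a toy stand-in for an embedded, sorted key-value store.
// It only mimics the sorted-key property that makes prefix scans cheap.
type store struct {
	keys []string       // kept sorted
	vals map[string]int // hypothetical per-key counters
}

func newStore(data map[string]int) *store {
	s := &store{vals: data}
	for k := range data {
		s.keys = append(s.keys, k)
	}
	sort.Strings(s.keys)
	return s
}

// sumPrefix seeks to the first key >= prefix, then walks forward while
// keys still share the prefix, aggregating values along the way.
func (s *store) sumPrefix(prefix string) (count, total int) {
	i := sort.SearchStrings(s.keys, prefix)
	for ; i < len(s.keys) && strings.HasPrefix(s.keys[i], prefix); i++ {
		count++
		total += s.vals[s.keys[i]]
	}
	return count, total
}

func main() {
	s := newStore(map[string]int{
		"user:1:clicks:sports": 3,
		"user:1:clicks:tech":   5,
		"user:2:clicks:tech":   7,
	})
	n, total := s.sumPrefix("user:1:")
	fmt.Println(n, total)
}
```

Because the data lives in the same process, the scan and the aggregation happen without any network round trip, which is the property the enrichment service relies on.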

Some of Reddit's other Go tools and services are not discussed in depth:

Reporting service

Vault management tools

Advertising event generation service

Our Go experience

This was Reddit's first experience with Go, and Deval says it has been a great one so far. The work started with two or three engineers using Go and has grown to about a dozen engineers working in Go.

The main advantages they see in Go are:

Developer velocity: new engineers can join and quickly get familiar with the code. Go's emphasis on simplicity, together with fast compile times and quick deployment, gives a tight feedback loop, which helps a lot.

Excellent out-of-the-box performance: little tuning or optimization was needed to run fast beyond following best practices. For Deval, this was a welcome change from his past experience of tuning the JVM and dealing with garbage collection.

Easy to focus on business logic: the business logic is the hard part, and Go's simplicity and out-of-the-box performance let the team focus on it.

Finally, ad serving latency dropped sharply: response time fell from 90 milliseconds to less than 10 milliseconds.

Lessons learned

This section covers a series of problems, how Reddit dealt with them, and what the team learned from each challenge.

Question 1: how to build production-ready microservices?

Reddit had built production services in Python before, but not in Go.

The original prototype came together through a lot of StackOverflow reading and Google searches, but that approach clearly would not scale as more developers joined.

Some of the problems they saw were:

Logging, metrics, and so on were scattered everywhere.

It was difficult to change the transport layer.

They needed repeatable patterns.

They realized that the Go community had already solved these problems, so they looked at existing frameworks for doing so and considered several options.

They found that Go-Kit made the most sense. The main reasons Reddit chose Go-Kit are:

Support for Thrift

It is flexible and not very prescriptive. If Reddit wants to move to gRPC, they want to be able to migrate easily.

It has tools for logging, metrics, rate limiting, tracing, circuit breaking, and so on, which are standard requirements when running microservices in production.

Go-Kit @ Reddit. This is a diagram using Go-Kit:

There are some noteworthy things about this architecture. The central service has two implementations: an in-memory implementation, which works well for prototyping, and a RocksDB implementation for production. Local development still uses the in-memory implementation.
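A minimal sketch of the "one interface, two implementations" idea follows. The interface name `EnrichmentStore`, its methods, and the `lookup` helper are hypothetical; only the in-memory variant is shown, and a RocksDB-backed type would satisfy the same interface in production.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned when no data exists for a key.
var ErrNotFound = errors.New("not found")

// EnrichmentStore is a hypothetical interface for the central service.
// Callers depend on it, never on a concrete storage engine.
type EnrichmentStore interface {
	Get(key string) (string, error)
	Put(key, value string) error
}

// memStore is the in-memory implementation, suitable for prototypes
// and local development.
type memStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string]string)}
}

func (m *memStore) Get(key string) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

func (m *memStore) Put(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

// lookup uses only the interface, so swapping the implementation
// does not touch this code.
func lookup(s EnrichmentStore, key string) string {
	v, err := s.Get(key)
	if err != nil {
		return "unknown"
	}
	return v
}

func main() {
	var s EnrichmentStore = newMemStore()
	_ = s.Put("user:1", "sports")
	fmt.Println(lookup(s, "user:1"), lookup(s, "user:2"))
}
```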

There are several middleware layers: tracing, logging, and metrics. Finally, the Thrift transport sits at the top. This structure makes changes easier. For example, to change the transport layer from Thrift to gRPC, they would only need to change the top layer.
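The layering can be sketched with plain functions. The `Endpoint` and `Middleware` types below mirror the shape of Go-Kit's endpoint and middleware types but are defined locally so the example runs with the standard library only; `getAds`, `logging`, `timing`, and `chain` are illustrative names, not Reddit's code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Endpoint is one request/response operation, independent of transport.
type Endpoint func(ctx context.Context, request interface{}) (interface{}, error)

// Middleware wraps an Endpoint with extra behavior.
type Middleware func(Endpoint) Endpoint

// chain applies middlewares so that the first one listed is the outermost layer.
func chain(e Endpoint, mws ...Middleware) Endpoint {
	for i := len(mws) - 1; i >= 0; i-- {
		e = mws[i](e)
	}
	return e
}

// logging records a line before and after each call.
func logging(log *[]string) Middleware {
	return func(next Endpoint) Endpoint {
		return func(ctx context.Context, req interface{}) (interface{}, error) {
			*log = append(*log, "log: request")
			resp, err := next(ctx, req)
			*log = append(*log, "log: response")
			return resp, err
		}
	}
}

// timing measures the call; a real service would report the duration
// to a metrics backend instead of discarding it.
func timing(log *[]string) Middleware {
	return func(next Endpoint) Endpoint {
		return func(ctx context.Context, req interface{}) (interface{}, error) {
			start := time.Now()
			resp, err := next(ctx, req)
			_ = time.Since(start)
			*log = append(*log, "metrics: observed")
			return resp, err
		}
	}
}

// getAds is a hypothetical business-logic endpoint.
func getAds(_ context.Context, req interface{}) (interface{}, error) {
	return fmt.Sprintf("ads for %v", req), nil
}

// run wires the layers together and returns the response plus the
// order in which the layers executed.
func run(user string) (string, []string) {
	var log []string
	e := chain(getAds, logging(&log), timing(&log))
	resp, _ := e(context.Background(), user)
	return resp.(string), log
}

func main() {
	resp, log := run("user-1")
	fmt.Println(resp)
	fmt.Println(log)
}
```

Since the business logic only sees an `Endpoint`, a transport such as Thrift or gRPC can be attached on top without changing the inner layers.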

Using Go-Kit was also beneficial because it gave the team a good example of how to structure Go code. They had no previous experience in this area, so Go-Kit helped them understand the typical structure of a Go service.

Lesson 1: use frameworks / toolkits. This is not necessary for everything you write in Go, but for production services that need metrics, logging, and so on, use a library that has already solved the problem instead of trying to do it yourself.

Question 2: how to launch a new system safely and quickly?

The ultimate goal was to launch the new advertising server with minimal impact on Reddit users, paying advertisers, and other internal teams that depend on the advertising team. The third-party advertising server was a black box, and Reddit needed a quick way to iterate, learn, and improve.

It was like rebuilding a plane in mid-flight. They gradually added new infrastructure around the third-party service and, once it was ready, removed the third party:

They first inserted the ad selector into the request path, using it purely as a proxy. The system behaved exactly as before, but the ad selector was in place. This let them scale the ad selector to handle the request volume before it did any real work.

Next, in addition to proxying, they implemented and launched native ad selection inside the ad selector service. The ad selector now processed each request internally, but it still acted as a proxy and forwarded the request to the third party, and the system still used the third-party response.

Then they added the event logger to log the native responses, and set up Kafka.

They continued building out the remaining services, starting with stub services and adding logic along the way.

Eventually, once everything was in place, they cut off the third-party advertising server.
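The intermediate stage, where the ad selector proxies to the third party while also running native selection, can be sketched as follows. This is an illustration under assumptions, not Reddit's implementation: the `selector` type, `proxyWithShadow`, and `shadowLog` are hypothetical names, and a slice stands in for the event logger and Kafka.

```go
package main

import (
	"fmt"
	"sync"
)

// Ad is a hypothetical ad response.
type Ad struct{ ID string }

// selector picks an ad for a request.
type selector func(req string) (Ad, error)

// shadowLog collects native results for offline comparison.
type shadowLog struct {
	mu      sync.Mutex
	entries []string
}

func (l *shadowLog) add(s string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, s)
}

// proxyWithShadow serves the third-party answer while running native
// selection in the background. Native failures are only logged, so they
// never affect the user-facing response.
func proxyWithShadow(req string, thirdParty, native selector, log *shadowLog, wg *sync.WaitGroup) (Ad, error) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		ad, err := native(req)
		if err != nil {
			log.add("native error: " + err.Error())
			return
		}
		log.add("native chose " + ad.ID)
	}()
	return thirdParty(req)
}

// demo runs one request through the proxy and returns what the user
// would see plus what the shadow path recorded.
func demo() (Ad, []string) {
	thirdParty := func(req string) (Ad, error) { return Ad{ID: "third-party-ad"}, nil }
	native := func(req string) (Ad, error) { return Ad{ID: "native-ad"}, nil }

	log := &shadowLog{}
	var wg sync.WaitGroup
	ad, _ := proxyWithShadow("home-feed", thirdParty, native, log, &wg)
	wg.Wait() // wait for the shadow path before reading the log
	return ad, log.entries
}

func main() {
	ad, entries := demo()
	fmt.Println("served:", ad.ID)
	fmt.Println("shadow:", entries)
}
```

Cutting over then amounts to returning the native result instead of the third-party one, once the logged native responses look right.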

These Go features helped them migrate to the new advertising server safely and easily:

Fast compiler

Support for cross-platform compilation

Self-contained binaries

Strong concurrency primitives

Lesson 2: Go makes fast iteration simple and safe.

Question 3: how to debug latency problems?

After deploying the new advertising server, they did run into problems such as slowness, network failures, and bad deploys.

Pprof is great if you know exactly which service has the problem. Distributed tracing, on the other hand, gives you a view across services. The ads stack did not yet support distributed tracing, although other parts of the Reddit stack did.

Why is tracing useful?

Identify hotspots that cause high overall latency

Help find other errors / unexpected behavior

Tracing is usually easy: you have a client and a server. On the client side, you extract the trace identifiers and inject them into the request you send to the server. On the server side, when you receive the request and the identifiers, you put them into the context object and pass it along. With HTTP and gRPC this is very simple, and there is no reason not to do it.

However, Reddit uses Thrift, so they ran into some problems.

They looked at two Thrift variants, Facebook Thrift and Apache Thrift. The two key features they were looking for were support for headers and for context objects:

They tried Facebook Thrift, but ran into problems, mainly the lack of context objects, which made the code messy and complex. Apache Thrift supported context objects but not headers. The solution was therefore to add headers to Apache Thrift. This had been done for other languages, but not for Go, so they added THeader support for Go to Apache Thrift. With that change, context objects are supported and the headers can carry trace identifiers.

If you want to see these changes, you can check https://github.com/devalshah88/thrift. Deval wants to take the changes through the contribution process and merge them upstream.

In the tracing code, the client wrapper simply extracts the trace information from the context object and adds it to the headers.

The server wrapper takes the information from the headers and injects it into the context object so that it can be passed along.

This code is from https://github.com/devalshah88/thrift-tracing.
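The listings from that repository are not reproduced in this text. As a rough illustration only, and not the code from the repository, the two wrappers can be sketched with a plain map standing in for Thrift headers. The header name `trace-id` and all function names here are assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// traceKey is an unexported context key type, which avoids collisions
// with keys defined in other packages.
type traceKey struct{}

// traceHeader is a hypothetical header name.
const traceHeader = "trace-id"

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

func traceIDFrom(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(traceKey{}).(string)
	return id, ok
}

// injectTrace is the client side: copy the trace ID from the context
// into the outgoing headers.
func injectTrace(ctx context.Context, headers map[string]string) {
	if id, ok := traceIDFrom(ctx); ok {
		headers[traceHeader] = id
	}
}

// extractTrace is the server side: read the trace ID from the incoming
// headers and attach it to the context passed to handlers.
func extractTrace(ctx context.Context, headers map[string]string) context.Context {
	if id, ok := headers[traceHeader]; ok {
		return withTraceID(ctx, id)
	}
	return ctx
}

// roundTrip simulates one client-to-server hop and returns the trace ID
// that the server ends up seeing.
func roundTrip(clientTraceID string) string {
	clientCtx := withTraceID(context.Background(), clientTraceID)
	headers := map[string]string{}
	injectTrace(clientCtx, headers)

	serverCtx := extractTrace(context.Background(), headers)
	id, _ := traceIDFrom(serverCtx)
	return id
}

func main() {
	fmt.Println(roundTrip("abc123"))
}
```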

After all this work, distributed tracing proved very useful for debugging latency issues. Still, the conclusion they came to is the third lesson: distributed tracing with Thrift and Go is hard.

Question 4: how to deal with slowness / timeout?

At Reddit, they want the system to handle slowness gracefully. They never want users to be affected, so if ad serving is slow, Reddit would rather show no ad than degrade the user experience.

Their two goals are:

Don't keep users waiting too long.

Don't waste resources doing unnecessary work.

Use context objects to enforce timeouts within a service: the enrichment service adds a deadline to the context object, passes it along, and exits early when the deadline expires.
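The enrichment service code itself is not included in this text. A minimal sketch of the pattern, assuming a hypothetical `enrich` function that does its work in steps and checks the context between them, might look like this:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// enrich is a hypothetical handler. It performs several steps of work
// and stops as soon as the context is done.
func enrich(ctx context.Context, steps int, perStep time.Duration) (int, error) {
	done := 0
	for i := 0; i < steps; i++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err() // deadline hit or caller cancelled: exit early
		case <-time.After(perStep): // stands in for one unit of real work
			done++
		}
	}
	return done, nil
}

// callWithBudget attaches a deadline to the context and reports whether
// the work finished and whether it was cut off by the deadline.
func callWithBudget(budget time.Duration, steps int, perStep time.Duration) (finished, timedOut bool) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel() // always release the timer
	_, err := enrich(ctx, steps, perStep)
	return err == nil, errors.Is(err, context.DeadlineExceeded)
}

// demo runs one call that fits its budget and one that does not.
func demo() (fastFinished, slowTimedOut bool) {
	fastFinished, _ = callWithBudget(500*time.Millisecond, 2, time.Millisecond)
	_, slowTimedOut = callWithBudget(20*time.Millisecond, 100, 10*time.Millisecond)
	return fastFinished, slowTimedOut
}

func main() {
	fast, slow := demo()
	fmt.Println("fast call finished:", fast)
	fmt.Println("slow call timed out:", slow)
}
```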

The result is good, but not enough:

The first chart shows the time the client takes to get a response from the enrichment service. This particular time window is a bit slow, but users are never kept waiting more than 25 milliseconds.

The second chart shows that, on the server side, the enrichment service spends up to 70 milliseconds processing requests, so the server wastes resources after the client has already timed out and no longer needs the response.

With HTTP, the usual approach is to propagate the deadline: the client adds a timeout, which is passed to the server through the context object.
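The original listing is not included here. The sketch below shows one way this looks with Go's standard `net/http`, under the assumption of a hypothetical slow handler. Note that `net/http` does not transmit the deadline value itself; the server's request context is cancelled when the client abandons the connection, which has the same effect of stopping wasted work.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowHandler simulates slow work but stops as soon as the request
// context is cancelled, which happens when the client goes away.
func slowHandler(cancelled chan<- bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done():
			cancelled <- true // client gave up: abandon the work
		case <-time.After(2 * time.Second):
			cancelled <- false
			fmt.Fprintln(w, "done")
		}
	}
}

// callWithTimeout issues a request bounded by timeout and reports whether
// the client timed out and whether the server observed the cancellation.
func callWithTimeout(timeout time.Duration) (clientTimedOut, serverSawCancel bool) {
	cancelled := make(chan bool, 1)
	srv := httptest.NewServer(slowHandler(cancelled))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return false, false
	}
	resp, err := http.DefaultClient.Do(req)
	if err == nil {
		resp.Body.Close()
	}
	clientTimedOut = err != nil && ctx.Err() == context.DeadlineExceeded

	select {
	case serverSawCancel = <-cancelled:
	case <-time.After(3 * time.Second): // handler never reported: give up
	}
	return clientTimedOut, serverSawCancel
}

// demo runs one request with a 100 ms budget against the slow handler.
func demo() (bool, bool) {
	return callWithTimeout(100 * time.Millisecond)
}

func main() {
	client, server := demo()
	fmt.Println("client timed out:", client)
	fmt.Println("server saw cancellation:", server)
}
```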

Thrift makes this difficult because context objects are not used here. If the client times out, the server goroutine does not know and does not exit.

This is not ideal, but there are ways to solve the problem:

One option is to add the deadline to the request payload. The client includes the deadline in the request, and the server injects it into the context object and uses it. This is not a good option because the change must be made at every endpoint.

Instead, they pass the deadline as a Thrift header, similar to the way they pass trace identifiers. After this change, the latency they see on the server side is similar to the client's.
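A minimal sketch of carrying a deadline in a header follows, again with a plain map standing in for Thrift headers. The header name `deadline-unix-nano` and the function names are assumptions, not the names Reddit uses.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"
)

// deadlineHeader is a hypothetical header name.
const deadlineHeader = "deadline-unix-nano"

// injectDeadline is the client side: if the context has a deadline,
// write it into the outgoing headers.
func injectDeadline(ctx context.Context, headers map[string]string) {
	if d, ok := ctx.Deadline(); ok {
		headers[deadlineHeader] = strconv.FormatInt(d.UnixNano(), 10)
	}
}

// contextFromHeaders is the server side: rebuild a context that expires
// at the client's deadline. The returned cancel func must always be called.
func contextFromHeaders(parent context.Context, headers map[string]string) (context.Context, context.CancelFunc) {
	raw, ok := headers[deadlineHeader]
	if !ok {
		return context.WithCancel(parent)
	}
	nanos, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return context.WithCancel(parent) // malformed header: no deadline
	}
	return context.WithDeadline(parent, time.Unix(0, nanos))
}

// serverStopsEarly simulates a client with the given budget calling a
// server whose work takes workMs. It returns true if the server quit
// because the propagated deadline expired first.
func serverStopsEarly(budgetMs, workMs int) bool {
	clientCtx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMs)*time.Millisecond)
	defer cancel()
	headers := map[string]string{}
	injectDeadline(clientCtx, headers)

	serverCtx, serverCancel := contextFromHeaders(context.Background(), headers)
	defer serverCancel()
	select {
	case <-serverCtx.Done():
		return true
	case <-time.After(time.Duration(workMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("tight budget, server stops early:", serverStopsEarly(20, 2000))
	fmt.Println("loose budget, server stops early:", serverStopsEarly(2000, 10))
}
```

An absolute timestamp assumes the client and server clocks are reasonably in sync; sending the remaining budget as a duration is an alternative when they are not.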

Lesson 4: use deadlines within and between services.

Question 5: how do I ensure that the new features do not degrade performance?

Fast iterations and complex business logic can cause performance problems. Advertising service teams need processes and tools to ensure that they can move quickly without violating the latency SLA. To do this, they used load testing and benchmarking.

Load testing with Bender:

This is the response you get from Bender:

Load testing is useful for testing changes under heavy load, and lets developers optimize new features to handle high load before pushing them to production.
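The Bender configuration and output are not reproduced in this text. To illustrate the general idea only, here is a small hand-rolled load generator. It does not use Bender's API; `runLoad` and `percentile` are hypothetical helpers, and the target is a fake call that sleeps for a millisecond.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// percentile returns the p-th percentile (1..100) of values using the
// nearest-rank method on a sorted copy.
func percentile(values []int64, p int) int64 {
	if len(values) == 0 {
		return 0
	}
	s := append([]int64(nil), values...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	rank := (p*len(s) + 99) / 100 // integer form of ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// runLoad issues total calls using the given number of concurrent workers
// and returns per-call latencies in microseconds plus the failure count.
func runLoad(total, workers int, call func() error) ([]int64, int) {
	var (
		mu        sync.Mutex
		latencies []int64
		failures  int
		wg        sync.WaitGroup
	)
	jobs := make(chan struct{})
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				start := time.Now()
				err := call()
				elapsed := time.Since(start).Microseconds()
				mu.Lock()
				latencies = append(latencies, elapsed)
				if err != nil {
					failures++
				}
				mu.Unlock()
			}
		}()
	}
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()
	return latencies, failures
}

func main() {
	// fakeCall stands in for an RPC to the service under test.
	fakeCall := func() error {
		time.Sleep(time.Millisecond)
		return nil
	}
	lat, failed := runLoad(200, 8, fakeCall)
	fmt.Printf("calls=%d failed=%d p50=%dus p99=%dus\n",
		len(lat), failed, percentile(lat, 50), percentile(lat, 99))
}
```

Reporting a high percentile such as P99 matters here because the latency targets in this article are stated as P99 values.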

They also benchmark all critical systems. This benchmark code:

Produces this output:

Benchmarking helps to:

Prevent regressions, that is, changes that slow things down

Show how performance changes over time

Inform developers of the tradeoffs between different implementations
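The original benchmark listing and its output are not included in this text. As a minimal sketch of comparing two implementations, the example below benchmarks two hypothetical ways of finding the highest bid. Benchmarks normally live in a `_test.go` file and run with `go test -bench`; here `testing.Benchmark` is called from `main` so the example is a single runnable program.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// pickHighest scans the slice once and returns the largest bid.
func pickHighest(bids []int) int {
	best := bids[0]
	for _, b := range bids[1:] {
		if b > best {
			best = b
		}
	}
	return best
}

// pickBySort copies and sorts the slice, then returns the last element.
// It gives the same answer as pickHighest but does more work.
func pickBySort(bids []int) int {
	c := append([]int(nil), bids...)
	sort.Ints(c)
	return c[len(c)-1]
}

// makeBids builds a deterministic slice of hypothetical bid values.
func makeBids(n int) []int {
	bids := make([]int, n)
	for i := range bids {
		bids[i] = (i * 7919) % 1000
	}
	return bids
}

func BenchmarkPickHighest(b *testing.B) {
	bids := makeBids(1000)
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		pickHighest(bids)
	}
}

func BenchmarkPickBySort(b *testing.B) {
	bids := makeBids(1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		pickBySort(bids)
	}
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	fmt.Println("scan:", testing.Benchmark(BenchmarkPickHighest))
	fmt.Println("sort:", testing.Benchmark(BenchmarkPickBySort))
}
```

Keeping both benchmarks in the repository and comparing results between commits is one way to notice a slowdown before it reaches production.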

Lesson 5: benchmarking and load testing are easy. Do it!

Review:

Use frameworks / toolkits

Go makes fast iteration simple and safe

Distributed tracing with Thrift and Go is hard

Use deadlines within and across services

Use load testing and benchmarking

Conclusion:

Go helped Reddit build and scale a new ad serving platform that was easy to build and fast.

They shared five important lessons learned in the process.

Try to use at least one of them in your next Go project.
