Analysis of Uber's Rewrite of the Schemaless Database Sharding Layer in Go

2025-02-23 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces how Uber rewrote the sharding layer of its Schemaless database in Go. Many readers face similar migration dilemmas in practice, so we hope working through this real-world case proves useful.

Abstract: In 2014, Uber built Schemaless, a scalable, fault-tolerant database. As the business grew, the original implementation consumed more resources and request latency increased. To maintain Schemaless's performance, Uber rewrote the database's sharding layer in Go without affecting production services, completing the Frontless project, which migrated the production system from the old implementation to the new one.

In 2014, Uber built Schemaless, a scalable, fault-tolerant database that supported the company's rapid growth. By 2016 we had deployed more than 40 Schemaless instances and thousands of storage nodes.

As our business grew, so did our resource consumption and latency; to keep Schemaless performing well, we needed a solution that could operate at large scale. After establishing that rewriting the existing Python worker nodes of the Schemaless clusters in Go (a language with built-in lightweight concurrency) would yield a significant performance improvement, we migrated the production system from the old implementation to the new one without disrupting normal operation. This effort, called the Frontless project, proved that we could rewrite the frontend of a large database without affecting production services.

In this article, we discuss how we migrated the Schemaless sharding layer from Python to Go, a change that lets us handle more traffic with fewer resources and improves the user experience of our service.

Background of Schemaless

Schemaless was first launched in October 2014 as part of the Mezzanine project, which migrated Uber's core trip database from a single Postgres instance to a highly available database.

The Mezzanine database containing core trip data was built as the first Schemaless instance. Since then, more than 40 Schemaless instances have been deployed for many client services. For the complete history of our internal database, see our three-part series covering Schemaless's design, architecture, and triggers.

By 2016, thousands of worker nodes were running across Schemaless instances, each consuming significant resources. The worker nodes were originally built with Python, using the Flask microframework inside uWSGI application server processes fronted by NGINX, with each uWSGI process handling one request at a time.

This model is simple and easy to build, but it could not meet our needs efficiently. To handle additional concurrent requests, we had to add more uWSGI processes, each a new Linux process with its own overhead, which fundamentally limited our concurrency. In Go, concurrent programs are built with goroutines, lightweight threads managed by Go's runtime system.
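To illustrate the difference, the sketch below (illustrative only, not Uber's actual code) serves a thousand simulated requests with one cheap goroutine each, rather than one OS process each as in the uWSGI model:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// handleRequest stands in for serving one Schemaless read request
// (query a storage node, build a response, and so on).
func handleRequest(id int, served *atomic.Int64) {
	served.Add(1)
}

// serveAll handles n requests concurrently, one goroutine per request.
// Each goroutine is a lightweight thread scheduled by the Go runtime,
// so thousands of in-flight requests are cheap compared to spawning
// one uWSGI worker process per concurrent request.
func serveAll(n int) int64 {
	var served atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			handleRequest(id, &served)
		}(i)
	}
	wg.Wait() // block until every request has been served
	return served.Load()
}

func main() {
	fmt.Println(serveAll(1000)) // prints 1000
}
```

In a real server this fan-out is implicit: Go's `net/http` already runs each incoming request in its own goroutine.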

To study the potential gain from rewriting the Schemaless sharding layer, we created an experimental worker node that implemented a single high-frequency, resource-intensive endpoint. The rewrite reduced latency by 85%, and reduced resource consumption by even more.

Figure 1: Median request latency for the endpoint as implemented in Frontless.

After this experiment, it was clear that the rewrite would free enough CPU and memory for Schemaless to better support its dependent services and the worker nodes across all of its instances. With this knowledge, we launched the Frontless project to rewrite the entire Schemaless sharding layer in Go.

Frontless architecture design

To successfully rewrite this important part of the Uber technology stack, we needed to ensure that our reimplementation was 100% compatible with the existing worker nodes. We made a key decision: validate the new implementation against the original code, meaning that every request sent to a new Go worker node had to return the same result as the same request sent to a Python worker node.

We estimated that a complete rewrite would take six months. During that time, new features and bug fixes for Schemaless would continue to land in Uber's production system, so we were migrating toward a moving target. We chose iterative development so that we could continuously port functionality, one endpoint at a time, from the legacy Python codebase and validate it in the new Go codebase.

Initially, the Frontless worker node was simply a proxy in front of the existing uWSGI Schemaless worker nodes, through which all requests passed. An iteration would begin by reimplementing an endpoint and then validating it in production; once no more errors appeared, the new implementation went live.

From a deployment perspective, Frontless and the uWSGI Schemaless workers were built and deployed together, which let us roll out a unified Frontless across all instances while validating against all production scenarios at the same time.

Figure 2: During the migration, a service calls a worker node in which Frontless and Schemaless run in the same container. Frontless receives each request and decides whether to forward it to Schemaless or handle it itself. Finally, Schemaless or Frontless fetches the result from the storage nodes and returns it to the service.

Read endpoints: comparison verification

We first focused on reimplementing the read endpoints in Go. In our original implementation, read endpoint handling averaged 90% of the traffic on a Schemaless instance, and it was also the most resource-intensive.

When an endpoint was implemented in Frontless, a validation process was launched to detect differences from the Python implementation. Both Frontless and Schemaless executed the request, and their responses were compared.

Figure 3: When a service sends a request to Frontless, Frontless forwards it to Schemaless, which generates a response by querying the storage nodes. Schemaless's response is returned through Frontless to the service. Frontless also builds its own response by querying the storage nodes. The two responses, one from Frontless and one from Schemaless, are then compared, and any difference is sent to the Schemaless development team as a bug report.
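The comparison step can be sketched as follows. The `Response` shape here is a simplified stand-in (real Schemaless responses carry cells, not a flat map), and the function name is illustrative:

```go
package main

import (
	"fmt"
	"reflect"
)

// Response is a simplified stand-in for a Schemaless read response.
type Response struct {
	Status int
	Body   map[string]string
}

// matches reports whether the legacy (Python) response and the new
// (Go) response agree; in Frontless, a mismatch would be filed as a
// bug report against the rewritten endpoint.
func matches(legacy, rewritten Response) bool {
	return legacy.Status == rewritten.Status &&
		reflect.DeepEqual(legacy.Body, rewritten.Body)
}

func main() {
	py := Response{Status: 200, Body: map[string]string{"cell": "v1"}}
	ok := Response{Status: 200, Body: map[string]string{"cell": "v1"}}
	bad := Response{Status: 200, Body: map[string]string{"cell": "v2"}}
	fmt.Println(matches(py, ok), matches(py, bad)) // true false
}
```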

This validation method doubles the number of requests sent to the storage worker nodes. To keep the request volume manageable, we added configuration flags to activate validation per endpoint and to adjust the percentage of requests validated. This let us enable or disable validation for any portion of a given endpoint's traffic within seconds.

Write endpoints: automatic integration testing

Write requests in Schemaless can only be executed once, so we could not validate them with the shadowing strategy used for reads. However, because the write endpoints in Schemaless are much simpler than the read endpoints, we decided to test them through automated integration tests.

We set up an integration test environment in which the Python Schemaless and the Go Frontless run the same test scenarios. The tests are automated and can be executed locally or in continuous integration within minutes, which speeds up the development cycle.

To test our implementation at scale, we set up a Schemaless test instance on which a load test simulated production traffic. On this test instance we migrated Schemaless's Python write implementation to Frontless and ran validation to ensure that the writes were correct.

Finally, once the implementation was ready for production, we slowly shifted write traffic from Schemaless's Python implementation to Frontless via runtime configuration, so that a portion of the writes could be moved to the new implementation in a matter of seconds.
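A runtime-configured traffic split can be sketched like this. Hashing the request ID keeps a given request pinned to one implementation as the rollout percentage grows; the scheme and names are an illustrative guess, not Uber's exact mechanism:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeWrite chooses which implementation handles a write request.
// rolloutPct (0-100) would be read from runtime configuration, so
// shifting more write traffic to Frontless takes effect in seconds
// without a redeploy.
func routeWrite(requestID string, rolloutPct int) string {
	h := fnv.New32a()
	h.Write([]byte(requestID))
	if int(h.Sum32()%100) < rolloutPct {
		return "frontless"
	}
	return "schemaless"
}

func main() {
	fmt.Println(routeWrite("trip-42", 0))   // schemaless: rollout off
	fmt.Println(routeWrite("trip-42", 100)) // frontless: rollout complete
}
```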

The achievements of Frontless

As of December 2016, the entire Mezzanine database was served by Frontless. As figure 4 shows, median latency for all requests dropped by 85%, and p99 request latency dropped by 70%:

Figure 4: Processing time of database requests as implemented in Python (the original Schemaless worker, in red) and Go (Frontless, in blue).

With our Go implementation, Schemaless's CPU usage dropped by more than 85%. This efficiency gain allowed us to reduce the number of worker nodes across all Schemaless instances while sustaining the same QPS as before, improving node utilization.

Figure 5: CPU usage under a steady request stream as handled by Python (Schemaless, red) and Go (Frontless, blue) in our database.

The Future of Frontless

The Frontless project shows that it is possible to rewrite a critical system in an entirely new language with zero downtime. Because we reimplemented the service without changing Schemaless's existing clients, we could implement, validate, and enable endpoints in days rather than weeks or months. Crucially, the validation process, which compared each new endpoint implementation against the existing one in production, gave us confidence that Frontless and Schemaless returned the same results.

Most importantly, our ability to rewrite a critical system in production demonstrates the scalability of Uber's iterative development process.

This concludes our look at how Uber rewrote the sharding layer of its Schemaless database in Go. Thank you for reading; to learn more about the industry, follow this site for more practical articles.
