
How online DDL works and how Vitess helps handle schema migration

This article analyzes how online DDL works and how Vitess helps handle schema migration. The explanations are detailed and easy to follow, and should serve as a useful reference for anyone interested in the topic.

Vitess introduces a new approach to running schema migrations: non-blocking, asynchronous, scheduled online DDL. With online DDL, Vitess simplifies the schema migration process by taking ownership of the operational overhead and giving users a simple, familiar interface: the standard ALTER TABLE statement.

Let's start with some background and explain why schema migration is such an important issue in the database world, and then dive into the implementation details.

Relational models and operational overhead

The relational model is one of the oldest models in the software world, introduced decades ago and still widely used today. SQL is equally old and reliable, and SQL or SQL-like languages can be found even in non-relational databases.

The relational model makes sense for many common use cases: entities with attributes (tables and columns, respectively) map well to familiar structures such as users, products, memberships, and messages, and SQL is expressive enough to formulate both simple and complex queries.

But historically, the relational model has come at a cost. While many database systems are optimized for reads and writes, they are not optimized for metadata changes, especially schema changes. One of the biggest challenges with such changes is that they require an operational procedure and are mostly outside the developer's domain.

In the early days, it was common for database administrators (DBAs) to act as gatekeepers of the database, pushing back on "crazy requests" from developers. Requests for changes went through lengthy procedures and paperwork.

Fortunately, those days are over and we are much more collaborative in the era of continuous deployment and rapid development. However, this new pace has exacerbated the problem. In the past, you needed to change the schema once a month, maybe once every few months, and you would prepare for it as part of a new version launch. Today, it's not uncommon for the world's busiest database deployments to run multiple schema migrations each day.

This reintroduces and reinforces the schema migration problem: the process is largely outside the developer's domain and requires them to be database experts. With multiple migrations per day, they need to collaborate and synchronize with other developers in ways that are incompatible with their own development processes (this is completely different from, say, comparing and merging git branches). In small companies you'll see developers owning and running their migrations whenever they see fit, but that doesn't scale: the larger the product and organization, the more formal the process needs to be.

In the MySQL world, direct schema migrations are blocking, if not on the primary server then on the replicas. They are aggressive with resources and cannot be interrupted or throttled. Online schema change tools have been around for more than a decade, but they introduce their own complexity: you need to install them alongside the database, grant access, schedule execution, log in, execute, tell the tools how to throttle, handle errors, provide visibility into their operation, and so on. In larger companies it is common to have dedicated DBA or Ops teams performing schema changes manually. These teams can spend hours per week, or even per day, dealing with the operational overhead of schema migrations.

For developers, this is a loss of ownership. Even though adding a column to a table is their idea, they need to ask an outside team for help and often wait without much visibility into the progress. This breaks their flow. Perhaps one of the greatest attractions of NoSQL databases is that they don't impose this level of constraint on a developer's workflow.

For DBAs, schema migrations are a burden: a developer's request arrives unexpectedly and interrupts their own workflow.

Some of the operational complexities

The operational overhead begins with the fact that a schema migration spans multiple domains. Let's look at an incomplete breakdown of the schema migration flow:

Formalization: someone (the developer?) needs to formalize the migration. Typically someone authors a CREATE, DROP, or ALTER TABLE statement. Is it correct? Is it syntactically valid? Does it conflict with existing conventions?

Discovery: where in the production environment does this statement need to run? Developers may not know how schemas are deployed across the different clusters. What is the discovery mechanism? And once we find the right cluster, which server is the primary for that cluster? Is the data sharded? If so, how do we locate all the shards?

Scheduling: is a migration already running on the required cluster? Databases respond poorly to concurrent migrations; it is best to run them sequentially. Do we have to wait? For how long? If we go to sleep, who takes over for us? Do we lose another day of work?

Execution: do we need to log in to a server? Where should we run our online schema migration tool? What command-line flags should we pass?

Monitoring: can we tell how far along the migration is? Can we make sure everyone can see it? How do we notify interested parties when the migration completes?

Cleanup: MySQL's schema migration tools leave behind artifacts: large tables that need to be dropped. Dropping tables is a problem in its own right. How do we automatically clean up these artifacts?

Recovery: if the migration fails, how do we proceed? Is there any other cleanup to be done?

For multiple clusters (some of which are sharded) in a production environment, what is the discovery mechanism used to identify the correct cluster? And which host in that cluster should the schema change be applied to?

Vitess helps with the schema migration process

Vitess's architecture puts it in a unique position to handle most of the schema migration process. For example:

Discovery is trivial. Vitess internally maps all schemas to shards and clusters and knows where to apply a migration (or a query) at any given time.

Vitess emulates a single database. Users access Vitess through vtgate, a smart proxy that understands queries semantically. When Vitess intercepts a query, it does not have to pass the query to the backend database servers as-is. For online DDL, Vitess notices that the request is a schema change and schedules it for later execution.

The appropriate backend tablet receives the request and schedules it on its own, avoiding concurrent migrations.

The tablet runs the online schema migration tool itself. Today, Vitess supports pt-online-schema-change and gh-ost. The tablet decides how to execute these tools, supplying the command-line flags and any necessary hooks/plugins. Moreover, on Linux/amd64, gh-ost is precompiled and bundled with Vitess, so no installation is required.

Vitess instructs the tool to use Vitess's own internal throttling mechanism, so the tool doesn't have to worry about which replicas to monitor. The throttling mechanism adapts dynamically to topology changes.

Vitess provides an interface for querying migration progress across all shards. It also provides an interface for cancelling a migration, or retrying a cancelled or failed one.

Vitess knows which artifacts the schema migration tools generate; in fact, it instructs them which artifacts to generate. On success or failure, Vitess can clean up after the migration: it hands the artifact list to its garbage collection mechanism and removes leftover triggers from pt-osc migrations.

Vitess creates a temporary account for each migration and destroys it once the migration completes.

Vitess knows when a migration fails and runs the appropriate cleanup, even if Vitess itself fails during the migration. On top of this, Vitess provides a one-time automatic retry for migrations that fail because of a failover.

Vitess knows where the schema is deployed, which shards exist, and which server is the primary at any given time, and it can apply DDL on the right database server without user intervention.
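As a rough sketch of how the progress/cancel/retry interface mentioned above can be used from a client connected to vtgate (the exact statements depend on your Vitess version, and the UUID below is only a placeholder for a real migration job ID):

SHOW VITESS_MIGRATIONS;  -- list migrations and their status/progress

ALTER VITESS_MIGRATION '1a2b3c4d_0000_0000_0000_000000000000' CANCEL;  -- placeholder job ID

ALTER VITESS_MIGRATION '1a2b3c4d_0000_0000_0000_000000000000' RETRY;

Similar functionality is also exposed to operators through vtctlclient's OnlineDDL command.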

What does it look like for users?

The goal of Vitess online DDL is to hide as much of this complexity as possible from the user. All the user needs to run is:

SET @@ddl_strategy='gh-ost';

ALTER TABLE my_table ADD COLUMN some_column INT NOT NULL;

The ALTER TABLE statement itself is perfectly normal, but the response is different: it returns immediately with a job ID that users can use to track the progress of the migration. Users can choose between a gh-ost strategy, a pt-osc strategy, or a plain direct strategy (no online DDL).
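For comparison, here is a minimal sketch of the other two strategies mentioned above; the statement is the same, only the session's @@ddl_strategy changes, and 'direct' bypasses online DDL and runs the ALTER as a regular blocking statement:

SET @@ddl_strategy='pt-osc';

ALTER TABLE my_table ADD COLUMN some_column INT NOT NULL;

SET @@ddl_strategy='direct';

ALTER TABLE my_table ADD COLUMN some_column INT NOT NULL;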

About "the principle of online DDL and how Vitess helps deal with schema migration" introduced to this, more related content can be searched for previous articles, I hope to help you answer questions, please support the website!
