This article looks at what autovacuum does in PostgreSQL and how to tune it.
Anyone familiar with PostgreSQL internals knows how important vacuum is, and among the related background processes, autovacuum is arguably the most important.
Autovacuum runs continuously as a background process of PostgreSQL.
Besides doing what its name says, autovacuum gathers information about dead tuples and bloat, and it analyzes tables to update their statistics so that the optimizer can choose the best execution plan for SQL statements.
Another PostgreSQL process, the statistics collector, tracks usage and activity information, and autovacuum relies on that information to decide what to clean up.
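For example, the collector's per-table counters record when autovacuum last touched each table. A quick sanity check against the standard pg_stat_user_tables view:
SELECT relname, last_autovacuum, last_autoanalyze, autovacuum_count, autoanalyze_count
FROM pg_stat_user_tables;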
The switches for these two features in postgresql.conf are autovacuum and track_counts; both are on by default, and autovacuum requires track_counts in order to work.
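To verify the current values from any session (a minimal check, nothing installation-specific assumed):
SHOW autovacuum;
SHOW track_counts;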
While the system is vacuuming, DBAs will want to analyze what it did, so it is best to log the activity: setting log_autovacuum_min_duration = 250 logs every autovacuum action that runs for more than 250 milliseconds.
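One way to set this without editing the file by hand (ALTER SYSTEM writes to postgresql.auto.conf; the 250 ms value is just the example from above):
ALTER SYSTEM SET log_autovacuum_min_duration = 250;
SELECT pg_reload_conf();  -- takes effect on reload, no restart needed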
Once the setting takes effect, autovacuum entries appear in the server log. PostgreSQL's logging here is comprehensive, noticeably better than some other databases (SQL Server and MySQL).
On large, busy systems it is therefore necessary to give the PostgreSQL log directory ample, fast storage.
The next question is which tables autovacuum will pick as candidates. A table qualifies for vacuuming when:
dead tuples >= autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples
This is how a table is selected for autovacuum as inserts, updates, and deletes accumulate. An analogous pair of parameters (autovacuum_analyze_threshold and autovacuum_analyze_scale_factor) triggers automatic ANALYZE.
For example, with the default settings (autovacuum_vacuum_scale_factor = 0.2 and autovacuum_vacuum_threshold = 50), a table of 1000 rows gives (1000 * 0.2) + 50 = 250. When the number of dead rows in that table exceeds 250, vacuum is triggered. Tuning these parameters matters: for large tables, the 20% scale factor means dead tuples pile up for a long time before cleanup gets slower and slower, so large tables usually need custom settings, as the query below helps to show.
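A sketch of that trigger calculation against the live catalogs, assuming the default values above (reltuples is the planner's row estimate, so treat the result as approximate):
SELECT s.relname,
       c.reltuples::bigint AS estimated_rows,
       s.n_dead_tup AS dead_tuples,
       round(c.reltuples * 0.2 + 50) AS autovacuum_trigger
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;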
How do you identify such tables? First, confirm which tables keep accumulating dead tuples:
SELECT relname, n_tup_ins as "inserts", n_tup_upd as "updates", n_tup_del as "deletes",
       n_live_tup as "live_tuples", n_dead_tup as "dead_tuples"
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
For such tables, autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold can be adjusted per table to make vacuums more or less frequent, as in the example below.
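A minimal sketch of a per-table override (big_table is a hypothetical name; the values simply illustrate vacuuming a large table much sooner than the 20% default would):
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 1000);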
That raises the question of whether there is a limit on how many workers can vacuum tables at once. autovacuum_max_workers controls the number of autovacuum worker processes (the default is 3). With the default, if you have four databases that all need attention, the fourth has to wait for the next cycle before its tables are processed. So if a cluster holds many databases, this parameter can be raised appropriately.
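Raising it is straightforward, though this particular parameter only takes effect after a server restart (the value 6 is just an illustration):
ALTER SYSTEM SET autovacuum_max_workers = 6;
-- autovacuum_max_workers is a postmaster-context parameter: restart required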
Having said that, there is still a problem to consider: should autovacuum run during busy business periods? Will it make things worse when load is already high, hurting performance further?
Autovacuum reads a table's 8 KB pages (the default block_size) from disk and modifies/rewrites the pages containing dead tuples, which involves both read and write IO. It can therefore be an IO-intensive operation, and running it on a huge table with many dead tuples during peak transaction hours is not a good thing. To avoid that situation, PostgreSQL throttles vacuum with cost-based parameters.
vacuum_cost_page_hit is the cost of a page found in the buffer cache (default 1)
vacuum_cost_page_miss is the cost of a page that must be read from disk (default 10, the value assumed in the formulas below)
vacuum_cost_page_dirty is the cost of dirtying a page for rewrite after dead tuples are found in it (default 20)
Work is throttled in cycles: each cycle the vacuum accumulates up to vacuum_cost_limit (200 by default) cost points, then sleeps for vacuum_cost_delay. With vacuum_cost_delay = 20 ms, there are 1000 / 20 = 50 cycles per second, which gives the following per-second throughput:
200 / 1 * 50 * 8 KB = 80,000 KB/s of pages processed in the buffer cache
200 / 10 * 50 * 8 KB = 8,000 KB/s of pages read from disk
200 / 20 * 50 * 8 KB = 4,000 KB/s of dead-tuple pages dirtied and rewritten
These three rates decrease in that order. Note that the cost budget is shared: it is divided evenly among all autovacuum workers running in the instance, so increasing autovacuum_max_workers without raising autovacuum_vacuum_cost_limit actually slows down each currently running worker. Conversely, increasing autovacuum_vacuum_cost_limit may cause IO bottlenecks.
Of course, these cost parameters can also be set specially for a table, as in the sketch below. So tuning autovacuum is very much a math-and-judgment problem.
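A minimal per-table sketch (again, big_table is a hypothetical name; a table with its own cost settings is handled outside the shared budget, so use this sparingly):
ALTER TABLE big_table SET (autovacuum_vacuum_cost_delay = 10, autovacuum_vacuum_cost_limit = 1000);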
That covers what autovacuum is for in PostgreSQL and the main knobs for tuning it. Thank you for reading!