Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Monitoring SQL Server transactional replication

2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Monitoring SQL Server transactional replication

Typically, we can use SSMS's replication monitor to monitor replication. But we can't stare 24 hours a day, we have to monitor it in an automated way. Microsoft provides a system stored procedure dbo.sp_replmonitorsubscriptionpendingcmds in the distribution database to return the number of commands waiting on the subscription and an estimate of the time it takes to deliver all these commands to the subscriber. I created a job that runs every 10 minutes, saves the historical data of the state to a table, and the data is retained for 14 days.

This table is created in the subscriber server's DBA database with the following code:

CREATE TABLE dbo.Replication_Qu_History (Subscriber_db varchar (50) NOT NULL,Records_In_Que numeric (18,0) NULL,CatchUpTime numeric (18,0) NULL,LogDate datetime NOT NULL,CONSTRAINT PK_EPR_Replication_Que_History PRIMARY KEY CLUSTERED (Subscriber_db ASC, LogDate DESC) ON PRIMARYGO

The data in the table is generated by monitoring stored procedures, and problems can be found through historical data. However, it is more necessary to monitor what is happening now.

There are three things that can help determine the health of replication.

1. The status of the replication-related job.

two。 Latency, especially the distribution delay measured by the counter Dist:Delivery Latency.

3. The large number of outstanding commands that the subscription is waiting for.

I focused on distribution latency, because past experience tells me that the problem of distribution latency is more prominent than log read latency. Most of the time, the distribution delay is due to an increase in the number of transactions. For example, index rebuilding on a large table that publishes data can lead to a sudden increase in transaction logs, resulting in more data that needs to be replicated than normal.

If there are a large number of commands waiting to be distributed, sometimes the Distribution Agent job is not running. On the other hand, sometimes the job is running, but it doesn't keep up. By restarting the agent, the job begins to process outstanding commands.

Before we begin, we need to know the replication information, such as the names of publishers and subscribers, the names of distribution agent jobs, and so on. Microsoft provides stored procedures in the distribution database to collect this information. The author's distribution database is with the subscriber database, so the script is simpler than on different servers.

1. First, sp_replmonitorhelppublisher is executed in the distribution database to get the monitoring information of all publishers.

two。 Then, sp_replmonitorhelppublication is executed in the distribution database to return all published monitoring information.

3. Finally, sp_replmonitorhelpsubscription is executed to return monitoring information for all subscriptions.

This information contains some latency metric data, so after executing this stored procedure, I already have some key information.

The following is the code used to collect information:

DECLARE @ cmd NVARCHAR (max) DECLARE @ publisher SYSNAME, @ publisher_db SYSNAME, @ publication SYSNAME, @ pubtype INTDECLARE @ subscriber SYSNAME, @ subscriber_db SYSNAME, @ subtype INTDECLARE @ cmdcount INT, @ processtime INTDECLARE @ ParmDefinition NVARCHAR DECLARE @ JobName SYSNAMEDECLARE @ minutes INT, @ threshold INT, @ maxCommands INT @ mail CHAR (1) = 'N'SET @ minutes = 60-> Define how many minutes latency before you would like to be notifiedSET @ maxCommands = 80000-> change this to represent the max number of outstanding commands to be proceduresed before notificationSET @ threshold = @ minutes * 60SELECT * INTO # PublisherInfoFROM OPENROWSET (' SQLOLEDB', 'SERVER= (LOCAL) TRUSTED_CONNECTION=YES;', 'SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelppublisher') SELECT @ publisher = publisher FROM # PublisherInfoSET @ cmd =' SELECT * INTO # # PublicationInfo FROM OPENROWSET (''SQLOLEDB'',''SERVER= (LOCAL)) TRUSTED_CONNECTION=YES'',''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelppublication @ publisher='+ @ publisher +'')'--select @ cmdEXEC sp_executesql @ cmdSELECT @ publisher_db=publisher_db, @ publication=publication, @ pubtype=publication_type FROM # # PublicationInfoSET @ cmd = 'SELECT * INTO # # SubscriptionInfo FROM OPENROWSET (' 'SQLOLEDB'',''SERVER= (LOCAL) TRUSTED_CONNECTION=YES'',''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelpsubscription @ publisher='+ @ publisher +', @ publication_type=' + CONVERT (CHAR (1), @ pubtype) +'')'--select @ cmdEXEC sp_executesql @ cmdALTER TABLE # # SubscriptionInfoADD PendingCmdCount INT NULL,EstimatedProcessTime INT NULL

After knowing the basic information about publishers and subscribers, check the status of the distribution job. They should be running all the time. If it's not running, you need to start it. If I need to restart a job, I will set the logo to force email alerts to be sent.

I'm not trying to send email alerts, but to check the status of all subscriptions. If the set data exceeds the set threshold, a mail alarm will be triggered. I use a cursor to traverse all subscriptions, which is the easiest way to gather information. I use this information as a parameter for other stored procedures to determine whether the distribution agent is running and to restart the agent.

DECLARE cur_sub CURSOR READ_ONLY FORSELECT @ publisher, s.publisher_db, s.publication, s.subscriber, s.subscriber_db, s.subtype, s.distribution_agentnameFROM # # SubscriptionInfo sOPEN cur_subFETCH NEXT FROM cur_sub INTO @ publisher, @ publisher_db, @ publication, @ subscriber, @ subscriber_db, @ subtype, @ JobNameWHILE @ @ FETCH_STATUS = 0BEGINSET @ cmd = 'SELECT @ cmdcount=pendingcmdcount, @ processtime=estimatedprocesstime FROM OPENROWSET (' 'SQLOLEDB'',''SERVER= (LOCAL) TRUSTED_CONNECTION=YES'',''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorsubscriptionpendingcmds @ publisher=' + @ publisher+', @ publisher_db=' + @ publisher_db +', @ publication=' + @ publication+', @ subscriber=' + @ subscriber +', @ subscriber_db=' + @ subscriber_db+', @ subscription_type=' + CONVERT (CHAR (1), @ subtype) +' '+') 'SET @ ParmDefinition = N'@cmdcount INT OUTPUT,@processtime INT OUTPUT'--select @ cmdEXEC sp_executesql @ cmd,@ParmDefinition,@cmdcount OUTPUT,@processtime OUTPUTUPDATE # # SubscriptionInfoSET PendingCmdCount = @ cmdcount, EstimatedProcessTime = @ processtimeWHERE subscriber_db = @ subscriber_dbINSERT INTO DBA.dbo.Replication_Que_HistoryVALUES (@ subscriber_db, @ cmdcount, @ processtime GETDATE ()-- find out if the distribution job with the high number of outstanding commands running or not-- if it is running then sometimes stopping and starting the agent fixes the issueIF EXISTS (SELECT * FROM tempdb.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE'# # JobInfo%') DROP TABLE # # JobInfoSET @ cmd = 'SELECT * INTO # # JobInfo FROM OPENROWSET (' 'SQLOLEDB'',''SERVER= (LOCAL) TRUSTED_CONNECTION=YES'',''SET FMTONLY OFF EXEC msdb.dbo.sp_help_job @ job_name='+ @ JobName +', @ job_aspect=''''JOB') 'EXEC sp_executesql @ cmdIF @ cmdcount > @ maxCommands OR (@ processtime > @ threshold AND @ cmdcount > 0) BEGINIF (SELECT current_execution_status FROM # # JobInfo) = 1-- This means job is currently executing so stop/start itBEGINEXEC distribution.dbo.sp_MSstopdistribution_agent@publisher = @ publisher @ publisher_db = @ publisher_db, @ publication = @ publication, @ subscriber = @ subscriber, @ subscriber_db = @ subscriber_dbWAITFOR DELAY '00VO 05'-5 Second DelaySET @ mail =' Y'ENDEND--SELECT name, current_execution_status FROM # # JobInfoIF (SELECT current_execution_status FROM # # JobInfo) 1-- if the job is not running start itBEGINEXEC distribution.dbo.sp_MSstartdistribution_agent@publisher = @ publisher, @ publisher_db = @ publisher_db, @ publication = @ publication @ subscriber = @ subscriber, @ subscriber_db = @ subscriber_dbSET @ mail ='Y'-- Send email if job has stopped and needed to be restartedENDDROP TABLE # # JobInfoFETCH NEXT FROM cur_sub INTO @ publisher, @ publisher_db, @ publication, @ subscriber, @ subscriber_db, @ subtype, @ JobNameENDCLOSE cur_subDEALLOCATE cur_sub

Run sp_replmonitorsubscriptionpendingcmds to collect outstanding commands and expected catch-up time.

This is the information I want to store in the history table, so I can see how the replication is going.

We need to determine an acceptable delay threshold. I use it here for 6 minutes, which means that if the replicated database lags behind the published database by more than 6 minutes, you will be alerted. Also determine the maximum number of undistributed commands. If this number fluctuates upward, there may be problems. You can choose how high you want this number to be set before you take action. I chose to have this system have 80000 undistributed commands.

I got this data two weeks after having the replication queue check job run. Make sure that these jobs run like index rebuild jobs. I looked at the maximum number and maximum delay of undistributed commands for a period of time and determined that my setting value would be higher. I don't want to be woken up at night because of the temporary backup of the system caused by the index rebuild job, which will be automatically restored.

The following code needs to enable the Ad Hoc Distributed Queries server configuration option. Assuming that the previous script found a problem, I created a script to send mail.

IF @ mail = 'Y'BEGINDECLARE @ msg VARCHAR (MAX) =' Replication on'+ @ @ SERVERNAME+ 'may be experiencing some problems. Attempts to restart the distribution agent have been made. '+' If this is not the first message like this that you have received within the last hour, please investigate.'DECLARE @ body NVARCHAR (MAX) DECLARE @ xml1 NVARCHAR (MAX) DECLARE @ tab1 NVARCHAR (MAX) DECLARE @ xml2 NVARCHAR (MAX) DECLARE @ tab2 NVARCHAR (MAX) SET @ xml1 = CAST (SELECT subscriber AS 'td','',subscriber_db AS' td','',latency AS 'td','', PendingCmdCount AS' td','', EstimatedProcessTime AS 'td'FROM # # SubscriptionInfo sFOR XML PATH (' tr') ELEMENTS) AS NVARCHAR (MAX) SET @ tab1 = 'Subscription Information Subscriber Subscriber Database Latency (seconds) Undistributed Commands Estimated CatchUpTime'-this command gives us the last 10 measurements of latency for each subscriberSET @ xml2 = CAST ((SELECT s.Subscriber_db AS 'td','', s.Records_In_Que AS' td','', s.CatchUpTime AS 'td','', CONVERT (CHAR (22), LogDate) AS 'td'FROM (SELECT ROW_NUMBER () OVER (PARTITION BY subscriber_db ORDER BY Logdate DESC) AS' RowNumber',subscriber_db, Records_In_Que, CatchUpTime, LogdateFROM DBA.dbo.Replication_Que_History) sWHERE RowNumber

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report