A typo in code took Microsoft's Azure DevOps service offline in Brazil for ten hours.

2025-04-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

CTOnews.com, June 4: Microsoft's Azure DevOps service in the South Brazil region was down for about ten hours because of a simple code error. CTOnews.com noted that Eric Mattingly, a software engineering manager at Microsoft, apologized for the outage on Friday and disclosed its cause: a typo led to the deletion of 17 production databases.

Azure DevOps provides an integrated set of services and tools for managing software projects, from planning and development to testing and deployment. Mattingly explained that Azure DevOps engineers sometimes take snapshots of production databases to investigate reported problems or to test performance improvements. A background job that runs every day deletes those snapshots once they are older than a set age. Recently, Azure DevOps engineers carried out a code upgrade that replaced the deprecated Microsoft.Azure.Management.* packages with the supported Azure.ResourceManager.* NuGet packages. The result was a large pull request that swapped API calls from the old packages for their equivalents in the new ones.
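The daily cleanup described above comes down to an age check. The following is a minimal Python sketch of that selection step, not Microsoft's code: the `Snapshot` type, the `expired_snapshots` helper, the snapshot names, and the 14-day retention period are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record for one snapshot database."""
    name: str
    created_at: datetime


def expired_snapshots(snapshots, now, max_age):
    """Return the snapshots that are strictly older than max_age."""
    cutoff = now - max_age
    return [s for s in snapshots if s.created_at < cutoff]


# Fixed reference time so the example is deterministic.
now = datetime(2023, 6, 1, tzinfo=timezone.utc)
snapshots = [
    Snapshot("snap-old", now - timedelta(days=30)),
    Snapshot("snap-new", now - timedelta(days=2)),
]

# The 14-day retention period is an assumed value, not one from the report.
to_delete = expired_snapshots(snapshots, now, max_age=timedelta(days=14))
print([s.name for s in to_delete])  # ['snap-old']
```

When no snapshot is past the cutoff, the list is empty and the job has nothing to delete, which is why the deletion path only runs in environments that actually hold an old snapshot.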

The typo was in this pull request: it replaced a call that deletes the snapshot database with a call that deletes the Azure SQL Server hosting that database. Azure DevOps has tests meant to catch this kind of problem, but Mattingly said the faulty code runs only under certain conditions, so the existing tests did not cover it.
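To see why tests can miss a bug like this, consider a hedged Python sketch with an in-memory stand-in for the SQL host. Everything here is hypothetical (`FakeSqlHost`, `cleanup_buggy`, `cleanup_fixed`, `run`, and the database names); it does not reproduce the Azure SDK or the actual pull request, only the shape of the mistake.

```python
class FakeSqlHost:
    """In-memory stand-in for a SQL server that hosts several databases."""

    def __init__(self, databases):
        self.databases = set(databases)
        self.server_exists = True

    def delete_database(self, name):
        self.databases.discard(name)

    def delete_server(self):
        # Deleting the server takes every hosted database with it.
        self.databases.clear()
        self.server_exists = False


def cleanup_buggy(host, old_snapshots):
    for _name in old_snapshots:
        host.delete_server()  # wrong call: deletes the whole server


def cleanup_fixed(host, old_snapshots):
    for name in old_snapshots:
        host.delete_database(name)  # intended call: deletes one snapshot


def run(cleanup, old_snapshots):
    """Run a cleanup against a fresh host and report what survives."""
    host = FakeSqlHost({"prod-1", "prod-2", "snap-old"})
    cleanup(host, old_snapshots)
    return host.server_exists, sorted(host.databases)


# With no old snapshot the loop body never runs, so both versions agree.
assert run(cleanup_buggy, []) == run(cleanup_fixed, [])

# Only a run that includes an old snapshot tells the two apart.
print(run(cleanup_fixed, ["snap-old"]))  # (True, ['prod-1', 'prod-2'])
print(run(cleanup_buggy, ["snap-old"]))  # (False, [])
```

A test environment with no snapshot past the retention age behaves like the first case: the faulty line is never executed, so the test passes either way.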

A few days later, the change was deployed to the customer environment of the South Brazil scale unit (a server cluster with a specific role). That environment contained an old snapshot database, which triggered the bug and caused the background job to delete "the entire Azure SQL Server and all seventeen production databases".

All the data was recovered, but recovery took more than ten hours. Mattingly gave several reasons. One is that customers cannot restore a deleted Azure SQL Server themselves; the restore had to be handled by the on-call Azure engineers, which took about an hour. Another is that the databases had different backup configurations: some used zone-redundant backups and others used the newer geo-zone-redundant backups, and reconciling this mismatch added considerable time to the recovery.

To keep the problem from happening again, Mattingly said, Microsoft has made a number of fixes and configuration changes. He again apologized to all customers affected by the outage.

© 2024 shulou.com SLNews company. All rights reserved.
