Ant Group responds to Yuque outage: caused by a bug in an operations upgrade tool; all users to receive a 6-month membership

2025-01-31 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)11/24 Report--

Thanks to CTOnews.com readers Alejandro86, I want to Kangkang, Rain maker, Zhong Wenze, Bach Russell, cold water drink belly, soft media new friend 2277523, and Na Ni for the tips! CTOnews.com, October 24 (Xinhua)-- Yuque is an online document editing and collaboration tool owned by Ant Group. It organizes content through "structured knowledge base management," with a layout similar to a book's table of contents. The app officially launched on iOS and Android on February 22, 2022.

According to a number of CTOnews.com readers, the tool experienced a massive server failure today, and neither online documents nor the official website can be opened.

After nearly 10 hours of downtime, the Yuque service has returned to normal: all Yuque clients are accessible and full functionality has been restored. Yuque has now officially published a full incident report and announced that it will give all users a 6-month Yuque membership.

Cause of the failure and handling process

On the afternoon of October 23, while the data storage operations team serving Yuque was performing an upgrade, a bug in a new operations upgrade tool mistakenly took production storage servers in the East China region offline. As a result, Yuque's data service suffered a serious failure that caused a widespread service interruption.

To restore service as quickly as possible, we worked with the data storage operations team to recover the data, but the recovery took a long time because of the recovery approach, the volume of data, and other factors.

The specific process is as follows:

14:07 The data storage operations team received an alarm from the monitoring system and traced it to node machines that had gone offline during the upgrade because of a bug in the new operations tool.

14:15 The team contacted the hardware team to try to bring the offline machines back online.

15:00 It was confirmed that, because the storage system ran on an older machine model, the machines could not be brought back online directly. The recovery plan was immediately changed to restoring the stored data from the backup system.

15:10 Work began on building a new storage system and restoring data from backup. Because of the very large data volume, this took a long time.

19:00 Data restoration was completed. To ensure data integrity, a further 2 hours were spent verifying the data.

21:00 The storage system passed the integrity check, and joint debugging with the Yuque team began. Full service was restored at 22:00. No user data was lost.
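
The timestamps above make it easy to see where the recovery time went. The following is a minimal sketch that only does arithmetic on the times reported in the timeline; the event labels are paraphrases, and the function names are illustrative, not part of any Yuque tooling.

```python
from datetime import datetime, timedelta

# Times as reported in the incident timeline (October 23, same day).
# Labels are paraphrased summaries of each entry.
TIMELINE = [
    ("14:07", "alarm received, offline nodes identified"),
    ("14:15", "hardware team contacted"),
    ("15:00", "recovery plan switched to restore from backup"),
    ("15:10", "new storage system started, restore in progress"),
    ("19:00", "restore complete, data verification"),
    ("21:00", "integrity check passed, joint debugging"),
    ("22:00", "full service restored"),
]


def parse(hhmm: str) -> datetime:
    """Parse an HH:MM string; all events fall on the same day."""
    return datetime.strptime(hhmm, "%H:%M")


def phase_durations(timeline):
    """Return (label, duration) for each phase between consecutive events."""
    return [
        (label, parse(end) - parse(start))
        for (start, label), (end, _) in zip(timeline, timeline[1:])
    ]


def total_duration(timeline) -> timedelta:
    """Elapsed time from the first event to the last."""
    return parse(timeline[-1][0]) - parse(timeline[0][0])


if __name__ == "__main__":
    for label, delta in phase_durations(TIMELINE):
        minutes = int(delta.total_seconds() // 60)
        print(f"{minutes:4d} min  {label}")
    print("total:", total_duration(TIMELINE))
```

By this timeline, the restore from backup (15:10 to 19:00) and the subsequent verification (19:00 to 21:00) account for most of the elapsed time between the first alarm and full recovery.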

The Yuque team said: "This failure has made us deeply aware that, as a document product serving tens of millions of customers, Yuque should have better technical risk safeguards and a highly available architecture. In particular, we need systematic 'monitoring, grayscale release, and rollback' mechanisms and process audits for technical change operations, and we need to upgrade from multi-replica disaster recovery within a single Region to a high-availability setup with three data centers across two locations. We must design enough data and system redundancy to enable rapid recovery and conduct regular disaster recovery drills. Only in this way can we recover faster from serious infrastructure failures and fundamentally prevent such failures from recurring."

To this end, the Yuque team has formulated the following improvement measures:

1. Upgrade the hardware version and machine model so that machines can be brought back online quickly after going offline. This measure was completed during this incident's repair.

2. The operations team will strengthen quality assurance and testing of operations tools to prevent this kind of operations bug from happening again.

3. Reduce the grayscale scope of operations actions and extend the grayscale period so that bugs are discovered earlier.

4. Improve the service at the architecture and high-availability level by adding remote (off-site) disaster recovery for Yuque's storage system.
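
The "monitoring, grayscale, and rollback" idea behind measures 2 and 3 can be sketched as a staged rollout: apply a change to a small batch first, observe, check health, and undo everything if a problem appears. This is a minimal illustration only, not Yuque's or Ant Group's actual tooling; `staged_rollout`, its parameters, and the batch sizes are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    nodes: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    batch_sizes: Sequence[int] = (1, 2, 4),
    observe: Callable[[], None] = lambda: None,
) -> List[str]:
    """Apply a change in growing batches (hypothetical sketch).

    After each batch, `observe` is called (a stand-in for the grayscale
    soak period) and every node in the batch is health-checked. On the
    first unhealthy node, all nodes changed so far are rolled back in
    reverse order and an empty list is returned. Otherwise the list of
    changed nodes is returned.
    """
    changed: List[str] = []
    remaining = list(nodes)
    step = 0
    while remaining:
        # Small batches first; the last size is reused once exhausted.
        size = batch_sizes[min(step, len(batch_sizes) - 1)]
        batch, remaining = remaining[:size], remaining[size:]
        for node in batch:
            apply_change(node)
            changed.append(node)
        observe()  # longer observation means more time to catch a bug
        if not all(is_healthy(node) for node in batch):
            for node in reversed(changed):
                rollback(node)
            return []
        step += 1
    return changed


if __name__ == "__main__":
    versions = {}
    nodes = ["n0", "n1", "n2", "n3", "n4"]
    result = staged_rollout(
        nodes,
        apply_change=lambda n: versions.__setitem__(n, "new"),
        is_healthy=lambda n: n != "n1",  # simulate a bug surfacing on n1
        rollback=lambda n: versions.__setitem__(n, "old"),
    )
    print(result, versions)
```

In this sketch, a smaller first batch limits how many machines a faulty tool can affect, and the untouched nodes (`n3`, `n4` in the usage example) keep serving while the change is rolled back.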

The Yuque team said that, by way of apology, it will provide the following compensation to all users affected by the failure:

For individual Yuque users, we are offering a 6-month membership. To claim it: open "Account Settings" in the workbench, click "Membership Information" on the left, and then click "Get Now" on the membership information page.

For Yuque Space users, because the situation is more complex, we will work out a separate compensation plan. Space administrators should watch for Yuque in-site messages.

Related reading: "After more than 8 hours of failure, the Yuque service has now returned to normal"

"Ant Group's Yuque goes down: online documents and official website cannot be opened; officials say emergency recovery is under way"
