How to solve the problem when the computing node is down

2025-07-08 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01 report

This article explains how to recover instances when a compute node goes down, a topic that many people may not understand well. The summary below walks through the recovery process step by step.

The Rebuild operation can recover a corrupted instance.

But what if the host itself is broken? For example, if a hardware failure or power outage takes down the whole compute node, how can the instances running on that node be restored?

Can Shelve or Migrate be used? Unfortunately, both operations require the nova-compute service on the compute node hosting the instance to be working properly. Fortunately, there is also the Evacuate operation.

Evacuate can move the instances on a node to another compute node even when nova-compute on the original node is not working. There is one precondition: the instance's image files must be stored on shared storage.
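The choice between these operations can be summarized as a small decision function. This is only a sketch of the reasoning above, not Nova code: the `InstancePlacement` class and `recovery_options` function are hypothetical names introduced for illustration, and the rule that Evacuate needs shared storage follows the premise stated in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InstancePlacement:
    """Hypothetical summary of an instance's situation (not a Nova type)."""
    compute_service_up: bool  # is nova-compute on the source node working?
    on_shared_storage: bool   # are the instance image files on shared storage?


def recovery_options(placement: InstancePlacement) -> list[str]:
    """Return the operations discussed in the text that could apply."""
    if placement.compute_service_up:
        # Shelve and Migrate need a working nova-compute on the source node.
        return ["shelve", "migrate"]
    if placement.on_shared_storage:
        # Evacuate does not need the source nova-compute, but (per this
        # article's premise) it does need the image files on shared storage.
        return ["evacuate"]
    # Source node is down and nothing is shared: none of the three applies.
    return []


print(recovery_options(InstancePlacement(compute_service_up=False,
                                         on_shared_storage=True)))
```

For the scenario in this article (node powered off, image files on shared storage), the function returns only `evacuate`.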

The Evacuate flow for an instance consists of the following steps:

Send a request to nova-api

nova-api sends a message

nova-scheduler performs scheduling

nova-scheduler sends a message

nova-compute executes the operation

Let's discuss each step in detail.

Send a request to nova-api

Our experimental scenario is as follows: Instance c2 runs on devstack-compute1.

We simulate a node failure by powering off the node, and then perform an Evacuate operation to restore instance c2. Currently, Evacuate can only be executed through the CLI.

You need to specify the --on-shared-storage parameter.

View the log at /opt/stack/logs/n-api.log.
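As a rough illustration, the command line for this scenario could be assembled as follows. The sketch assumes the legacy `nova` client and its `nova evacuate [--on-shared-storage] <server> [<host>]` form; the helper name `build_evacuate_command` is hypothetical, and newer client or API versions may no longer accept the flag. The code only builds the argument list and does not run anything.

```python
import shlex


def build_evacuate_command(server, target_host=None, on_shared_storage=True):
    """Build the argv for a legacy `nova evacuate` call (nothing is executed)."""
    argv = ["nova", "evacuate"]
    if on_shared_storage:
        # Tell Nova that the instance image files live on shared storage.
        argv.append("--on-shared-storage")
    argv.append(server)
    if target_host is not None:
        # Optional: name the destination host instead of letting
        # nova-scheduler choose one.
        argv.append(target_host)
    return argv


cmd = build_evacuate_command("c2")
print(shlex.join(cmd))  # nova evacuate --on-shared-storage c2
```

Leaving out the target host matches the scenario here, where nova-scheduler picks the destination node.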

nova-api sends a message

nova-api sends a message to Messaging (RabbitMQ): "Evacuate this instance". In the source code, /opt/stack/nova/nova/compute/api.py, the method used is evacuate.

Notice that evacuate is actually implemented through the rebuild operation. This makes sense, because evacuate recreates the virtual machine from the instance's image files on shared storage.
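The relationship between evacuate and rebuild can be pictured with a toy model. This is not the Nova source: `ToyTaskAPI`, `ToyComputeAPI`, and the `recreate` flag as written here are illustrative assumptions that only mimic the idea that both operations funnel into one rebuild request.

```python
class ToyTaskAPI:
    """Stand-in for the component that receives rebuild requests."""

    def __init__(self):
        self.requests = []

    def rebuild_instance(self, instance, recreate, on_shared_storage, host=None):
        # Record the request instead of sending it over a message queue.
        self.requests.append({
            "instance": instance,
            "recreate": recreate,
            "on_shared_storage": on_shared_storage,
            "host": host,
        })


class ToyComputeAPI:
    """Toy API layer: rebuild and evacuate share one code path."""

    def __init__(self, task_api):
        self.task_api = task_api

    def rebuild(self, instance):
        # Plain rebuild: the instance stays on its current host.
        self.task_api.rebuild_instance(instance, recreate=False,
                                       on_shared_storage=False)

    def evacuate(self, instance, on_shared_storage=True, host=None):
        # Evacuate: the same rebuild request, but the instance is
        # recreated on a different host.
        self.task_api.rebuild_instance(instance, recreate=True,
                                       on_shared_storage=on_shared_storage,
                                       host=host)


task_api = ToyTaskAPI()
ToyComputeAPI(task_api).evacuate("c2")
print(task_api.requests[0])
```

The only difference between the two entry points in this model is the set of arguments passed to the shared rebuild call.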

nova-scheduler performs scheduling

When nova-scheduler receives the message, it selects an appropriate compute node for the instance. Check the log at /opt/stack/logs/n-sch.log.

nova-scheduler finally chose to rebuild the instance on the devstack-controller compute node.
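The scheduling step can be sketched as a filter followed by a ranking. This is a simplified stand-in, not the real filter scheduler: the `Host` class, the `select_destination` function, and the RAM figures are assumptions made up for illustration, and Nova's actual filters and weighers are more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of a compute node as seen by a scheduler."""
    name: str
    service_up: bool
    free_ram_mb: int


def select_destination(hosts, source_host, required_ram_mb):
    """Pick a host that is up, is not the failed source, and has enough RAM."""
    candidates = [
        h for h in hosts
        if h.service_up
        and h.name != source_host
        and h.free_ram_mb >= required_ram_mb
    ]
    if not candidates:
        raise RuntimeError("no valid host found")
    # Rank the surviving hosts: prefer the one with the most free RAM.
    return max(candidates, key=lambda h: h.free_ram_mb)


hosts = [
    Host("devstack-compute1", service_up=False, free_ram_mb=4096),
    Host("devstack-controller", service_up=True, free_ram_mb=2048),
]
print(select_destination(hosts, "devstack-compute1", 512).name)
```

With the failed devstack-compute1 filtered out, the only remaining candidate in this example is devstack-controller.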

nova-scheduler sends a message

nova-scheduler sends a message informing the compute node that the instance can be created. The source code is on line 95 of /opt/stack/nova/nova/scheduler/filter_scheduler.py, and the method is select_destinations.

nova-compute executes the operation

The job of the compute node is to rebuild the instance from the image files on shared storage. The log is in devstack-controller:/opt/stack/logs/n-cpu.log.

Allocate resources to the instance

Use the image files on shared storage

Start the instance

After the Evacuate operation is complete, instance c2 runs on devstack-controller.
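The three steps above can be sketched as a toy model of the destination node. Nothing here is Nova code: `ToyComputeNode`, its method, the RAM sizes, and the disk path `/shared/instances/c2/disk` are hypothetical and exist only to show the order of operations.

```python
from dataclasses import dataclass, field


@dataclass
class ToyComputeNode:
    """Hypothetical destination compute node."""
    name: str
    free_ram_mb: int
    running: list = field(default_factory=list)

    def rebuild_from_shared_storage(self, instance, ram_mb, shared_disk_path):
        # Step 1: allocate resources to the instance.
        if ram_mb > self.free_ram_mb:
            raise RuntimeError("not enough free RAM on " + self.name)
        self.free_ram_mb -= ram_mb
        # Step 2: reuse the existing disk on shared storage; nothing is
        # copied, the node simply points at the same files.
        disk = shared_disk_path
        # Step 3: start the instance on this node.
        self.running.append(instance)
        return {"instance": instance, "host": self.name,
                "disk": disk, "state": "ACTIVE"}


node = ToyComputeNode("devstack-controller", free_ram_mb=2048)
result = node.rebuild_from_shared_storage("c2", 512, "/shared/instances/c2/disk")
print(result["host"], result["state"])
```

Because the disk is reused rather than copied, the instance keeps its data, which is why the shared-storage precondition matters in this flow.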
