How to switch to nohz and hres in the Linux kernel

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how the Linux kernel switches from the periodic tick to nohz and hres modes, walking through the relevant kernel code step by step.

The people who designed the Linux kernel are really thoughtful. As mentioned earlier, the character of the Linux kernel is passion: as long as the hardware is flexible enough, the designers exploit it as far as they can, never passing up any corner that leaves room to maneuver, caring little about the consequences, and sometimes flatly ignoring the recommendations of the hardware. The kernel's nohz is a pioneering example. A clock interrupt is essential to a computer system, just as a heartbeat is to a person. The human heartbeat is periodic, and the "heartbeat" of a computer system is periodic too, so clock interrupts occur at regular intervals.

Is that really necessary? The designers of the Linux kernel reasoned that if the CPU is idle, there is no need for a heartbeat. After all, a computer is not a self-organizing system: its energy comes entirely from an external power supply. A human is a self-organizing entity, so a person must have a periodic heartbeat to generate energy. As long as the computer's external power supply is continuous and the clock is programmable, an aperiodic heartbeat, or even cardiac arrest, is possible, and the Linux kernel achieves exactly this. Before the 2.6.21 kernel, clock interrupts were strictly periodic. From 2.6.21 on, the clock abstractions clock_event_device and clocksource make it possible to drive the clock more flexibly, which gives us nohz mode and hres mode. Of course, when the system boots, the clock interrupt is still periodic. When timer_interrupt is called, it raises the timer softirq, and the softirq handler then looks for an opportunity to switch to nohz or hres. The specific code is as follows:

void run_local_timers(void)
{
    hrtimer_run_queues();           // high-resolution timer queues get priority
    raise_softirq(TIMER_SOFTIRQ);   // raise the timer softirq; its handler follows below
    softlockup_tick();
}

// timer softirq handler
static void run_timer_softirq(struct softirq_action *h)
{
    struct tvec_base *base = __get_cpu_var(tvec_bases);

    hrtimer_run_pending();          // the chance to switch to nohz or hres

    if (time_after_eq(jiffies, base->timer_jiffies))
        __run_timers(base);
}

void hrtimer_run_pending(void)
{
    struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);

    if (hrtimer_hres_active())      // already in hres mode: nothing to switch
        return;

    // this check is the code that actually switches to hres or nohz
    if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
        hrtimer_switch_to_hres();

    run_hrtimer_pending(cpu_base);
}

int tick_check_oneshot_change(int allow_nohz)
{
    struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

    // the checks below show that several conditions must hold before switching
    if (!test_and_clear_bit(0, &ts->check_clocks))
        return 0;

    if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
        return 0;

    if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
        return 0;

    if (!allow_nohz)    // hres is enabled: return 1 so the caller switches to hres mode
        return 1;

    // hres is not enabled, but all the checks above passed,
    // so at least switch to nohz mode
    tick_nohz_switch_to_nohz();
    return 0;
}
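The order of the checks in tick_check_oneshot_change can be easier to follow as a small user-space model. The sketch below is only an illustration: the struct, the enum, and the function name decide_switch are hypothetical stand-ins, not kernel API, and the boolean fields merely mimic the conditions tested in the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one pass through the checks. */
enum tick_switch { NO_CHANGE, SWITCH_TO_NOHZ, SWITCH_TO_HRES };

/* Hypothetical stand-ins for the state the kernel function consults. */
struct tick_state {
    bool check_clocks;       /* set when a clock source or event device changed */
    bool nohz_active;        /* nohz_mode has already left NOHZ_MODE_INACTIVE */
    bool timekeeping_ok;     /* clock source is good enough for hres */
    bool oneshot_available;  /* clock event device supports one-shot mode */
};

/* Applies the checks in the same order as the listing. */
enum tick_switch decide_switch(struct tick_state *ts, bool hres_enabled)
{
    assert(ts != NULL);

    /* Test-and-clear: only re-evaluate after something changed. */
    if (!ts->check_clocks)
        return NO_CHANGE;
    ts->check_clocks = false;

    if (ts->nohz_active)
        return NO_CHANGE;
    if (!ts->timekeeping_ok || !ts->oneshot_available)
        return NO_CHANGE;

    /* All preconditions hold: hres if enabled, otherwise at least nohz. */
    return hres_enabled ? SWITCH_TO_HRES : SWITCH_TO_NOHZ;
}
```

Calling decide_switch twice on the same state shows the effect of the test-and-clear step: the second call reports NO_CHANGE until check_clocks is set again.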

hrtimer_switch_to_hres and tick_nohz_switch_to_nohz perform the actual switch to hres mode and nohz mode respectively. Tracing the code alone is not enough, though: how are hres and nohz related? hres does not use a periodic interrupt. Instead it fires interrupts at precisely determined times, programming the clock device with the expiry time of the nearest hrtimer so that the interrupt arrives exactly then. nohz, by contrast, only means that the clock can be programmed with aperiodic intervals, with no requirement on precision.
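To make "programming the clock with the expiry time of the nearest hrtimer" concrete, here is a minimal sketch of how an interval for a one-shot device might be derived from an absolute expiry time. The struct oneshot_dev, its fields, and oneshot_delta are hypothetical names chosen for illustration; the clamping to a device minimum and maximum is an assumption about what real hardware requires, not a copy of kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a one-shot capable clock event device. */
struct oneshot_dev {
    int64_t min_delta_ns;  /* shortest interval the hardware can program */
    int64_t max_delta_ns;  /* longest interval the hardware can program */
};

/*
 * Returns the interval to program so that the device fires at expires_ns,
 * or -1 if the expiry is not in the future (the caller would then handle
 * the timer right away instead of programming the device).
 */
int64_t oneshot_delta(const struct oneshot_dev *dev,
                      int64_t now_ns, int64_t expires_ns)
{
    int64_t delta = expires_ns - now_ns;

    assert(dev->min_delta_ns > 0 && dev->min_delta_ns <= dev->max_delta_ns);

    if (delta <= 0)
        return -1;
    if (delta < dev->min_delta_ns)
        delta = dev->min_delta_ns;   /* too close: fire as soon as possible */
    if (delta > dev->max_delta_ns)
        delta = dev->max_delta_ns;   /* too far: wake early and reprogram */
    return delta;
}
```

In periodic mode no such computation is needed, because the device simply fires every tick; the computation only appears once the device runs in one-shot mode.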

In hres mode, everything is handled by hrtimers. Operations such as the scheduler tick and accounting the run time of the current process were originally performed directly in timer_interrupt; in hres mode a dedicated hrtimer takes care of them. All processing is encapsulated in the event_handler of the clock_event_device, and this event_handler is assigned when switching to hres or nohz. When it runs, it walks the hrtimers, which are organized in a red-black tree. hrtimers fall into two classes: those that must not run in softirq context have their callbacks executed immediately, while the less urgent ones that may run in softirq context are linked onto a list when they expire and have their callbacks executed in the softirq. In pure nohz, non-hres mode, the event_handler still works the traditional way, except that the next interrupt can be programmed freely. In hres mode, timer resolution can reach the nanosecond level.
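The expiry handling described here can be sketched in user space. The example below is a simplification under stated assumptions: a sorted array stands in for the in-order walk of the red-black tree, every callback runs immediately (the softirq class is not modeled), and struct mini_timer and run_expired are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature timer; the kernel's struct hrtimer is richer. */
struct mini_timer {
    int64_t expires_ns;
    void (*fn)(struct mini_timer *t);  /* callback, may be NULL */
    int fired;
};

/*
 * timers[] must be sorted by expires_ns. Runs every timer that has expired
 * at now_ns and returns the expiry of the first timer still pending, or
 * INT64_MAX when nothing is left, so the caller knows when the device
 * should fire next.
 */
int64_t run_expired(struct mini_timer *timers, size_t n, int64_t now_ns)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* The sorted order is what lets us stop at the first pending timer. */
        assert(i == 0 || timers[i - 1].expires_ns <= timers[i].expires_ns);

        if (timers[i].expires_ns > now_ns)
            return timers[i].expires_ns;

        if (!timers[i].fired) {
            timers[i].fired = 1;
            if (timers[i].fn)
                timers[i].fn(&timers[i]);
        }
    }
    return INT64_MAX;
}
```

The return value is the piece that distinguishes this from a periodic tick: the handler itself tells the caller when the next interrupt is needed.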

Whenever a CPU enters cpu_idle, the kernel looks for an opportunity to stop the system heartbeat and then trigger it again at the right moment instead of periodically. If hrtimers are in charge of everything, this is the moment to look up the expiry time of the soonest-expiring timer. Although the periodic clock interrupt has been stopped, other hardware interrupts have not, and they may trigger events such as a reschedule or the arming of a new timer. So after every hardware interrupt the kernel checks for expired hrtimers and pending reschedule requests; if there are any, it immediately leaves the heartbeat-skipping mode and switches away from the idle task. The following code shows this; irq_enter is called on entry to every hardware interrupt handler:

void irq_enter(void)
{
#ifdef CONFIG_NO_HZ
    int cpu = smp_processor_id();

    if (idle_cpu(cpu) && !in_interrupt())
        tick_nohz_stop_idle(cpu);
#endif
    __irq_enter();
#ifdef CONFIG_NO_HZ
    /*
     * Update the time; nohz mode uses it as the reference for programming
     * the next interrupt. Note the condition: the time is updated only
     * when the CPU is idle, because the periodic tick may have been
     * stopped while idle, so the lost time has to be made up on
     * interrupt entry.
     */
    if (idle_cpu(cpu))
        tick_nohz_update_jiffies();
#endif
}
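Why the time has to be made up on interrupt entry can be shown with a small model of the catch-up arithmetic. This is a sketch, not the kernel's implementation: struct tick_clock, its fields, and catch_up_jiffies are hypothetical, and the model assumes that time is available as a nanosecond counter independent of the tick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick bookkeeping; field names are illustrative only. */
struct tick_clock {
    uint64_t jiffies;         /* tick counter */
    int64_t  last_update_ns;  /* time already accounted for in jiffies */
    int64_t  tick_period_ns;  /* e.g. 4000000 for HZ=250 */
};

/*
 * Folds every whole tick period that elapsed since last_update_ns into
 * jiffies and returns the number of ticks added. After a long idle
 * period with the tick stopped, this can be far more than one.
 */
uint64_t catch_up_jiffies(struct tick_clock *tc, int64_t now_ns)
{
    int64_t elapsed = now_ns - tc->last_update_ns;
    uint64_t ticks;

    assert(tc->tick_period_ns > 0);

    if (elapsed < tc->tick_period_ns)
        return 0;

    ticks = (uint64_t)(elapsed / tc->tick_period_ns);
    tc->jiffies += ticks;
    /* Keep the remainder so partial periods are not lost. */
    tc->last_update_ns += (int64_t)ticks * tc->tick_period_ns;
    return ticks;
}
```

With a 4 ms period, an interrupt arriving 41 ms after the last update advances jiffies by 10 and leaves 1 ms to be counted toward the next tick.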

Interrupts in nohz mode are "almost" periodic. nohz literally means aperiodic, but it remains essentially periodic because it has no time base of its own from which to choose the next clock interrupt. hres, in contrast, produces clock interrupts at entirely arbitrary times: its event_handler operates on the hrtimers in the red-black tree, so it can take the expiry time of the next hrtimer to expire as the time of the next clock interrupt. In hres mode, all time-related operations such as timekeeping and the scheduler tick are handled by hrtimers. The time of the next clock interrupt cannot be decided inside an individual hrtimer handler; it has to be decided in a global event_handler function that handles all hrtimers. That covers the mechanism. Now let's take a look at cpu_idle:

void cpu_idle(void)
{
    int cpu = smp_processor_id();

    current_thread_info()->status |= TS_POLLING;

    /* endless idle loop with no priority at all */
    while (1) {
        tick_nohz_stop_sched_tick(1);
        while (!need_resched()) {
            check_pgt_cache();
            rmb();

            if (rcu_pending(cpu))
                rcu_check_callbacks(cpu, 0);

            if (cpu_is_offline(cpu))
                play_dead();

            local_irq_disable();
            __get_cpu_var(irq_stat).idle_timestamp = jiffies;
            /* Don't trace irqs off for idle */
            stop_critical_timings();
            pm_idle();
            start_critical_timings();
        }
        tick_nohz_restart_sched_tick();
        preempt_enable_no_resched();
        schedule();
        preempt_disable();
    }
}

tick_nohz_stop_sched_tick calls next_jiffies = get_next_timer_interrupt(last_jiffies); this finds the nearest pending timer or hrtimer so that its expiry time can be used for the next clock interrupt. tick_nohz_stop_sched_tick also checks the reschedule flag; if it is set, the function returns immediately without entering nohz. In fact, tick_nohz_stop_sched_tick is also called from irq_exit after a hardware interrupt, to reprogram the clock if possible.
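The decision made at this point can be sketched as a pure function. The example is a simplified model, not the kernel routine: plan_idle_tick and struct tick_plan are hypothetical names, and the rule "keep the tick when the next event is at most one jiffy away" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of the hypothetical decision helper below. */
struct tick_plan {
    bool stop_tick;              /* true: skip the periodic ticks */
    unsigned long wake_jiffies;  /* when to fire next, valid if stop_tick */
};

struct tick_plan plan_idle_tick(bool need_resched,
                                unsigned long last_jiffies,
                                unsigned long next_jiffies)
{
    struct tick_plan plan = { false, 0 };
    /* Signed difference stays meaningful across jiffies wrap-around. */
    long delta = (long)(next_jiffies - last_jiffies);

    /* A pending reschedule means the CPU is about to leave idle anyway. */
    if (need_resched)
        return plan;

    /* Assumption: an event due within one tick is not worth stopping for. */
    if (delta <= 1)
        return plan;

    assert(delta > 1);
    plan.stop_tick = true;
    plan.wake_jiffies = next_jiffies;
    return plan;
}
```

For example, with last_jiffies at 100 and the nearest timer at 500, the model stops the tick and asks for a single wake-up at 500 instead of 400 periodic interrupts.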

The designers of Linux really are considerate, and this is yet another example of using the hardware wildly and flexibly. Linux itself does not distinguish interrupt priorities, which in a sense indulges the emergence and development of nohz and hres. If one day the Linux kernel becomes rule-bound and principled, like Windows or in line with Unix, then the era of Linux will be over and its character will have been worn away.

Appendix: the scheduling-related hrtimer. The kernel calls the task_tick function of the scheduling class in two places: in scheduler_tick, which runs from the clock interrupt handler (setting nohz and hres aside), and in hrtick, the handler of the per-run-queue hrtimer:

void scheduler_tick(void)
{
    int cpu = smp_processor_id();
    struct rq *rq = cpu_rq(cpu);
    struct task_struct *curr = rq->curr;

    sched_clock_tick();

    spin_lock(&rq->lock);
    update_rq_clock(rq);
    update_cpu_load(rq);
    curr->sched_class->task_tick(rq, curr, 0);  // note the last argument: 0
    spin_unlock(&rq->lock);

#ifdef CONFIG_SMP
    rq->idle_at_tick = idle_cpu(cpu);
    trigger_load_balance(rq, cpu);
#endif
}

static enum hrtimer_restart hrtick(struct hrtimer *timer)
{
    struct rq *rq = container_of(timer, struct rq, hrtick_timer);

    WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());

    spin_lock(&rq->lock);
    update_rq_clock(rq);
    rq->curr->sched_class->task_tick(rq, rq->curr, 1);  // note the last argument: 1
    spin_unlock(&rq->lock);

    return HRTIMER_NORESTART;
}

Take the fair scheduling class as an example. Its task_tick is task_tick_fair, which walks up the scheduling-group hierarchy and calls entity_tick at each level:

static void
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
    update_curr(cfs_rq);

#ifdef CONFIG_SCHED_HRTICK
    if (queued) {
        /*
         * Called from the hrtimer path (task_tick with queued == 1):
         * force a reschedule and return. Why so drastic? The
         * per-run-queue hrtimer exists solely to mark a scheduling
         * point. When it is armed, the kernel first computes how long
         * the task may still run; once that time is up the hrtimer
         * expires, so reaching hrtick means a reschedule must happen
         * right away.
         */
        resched_task(rq_of(cfs_rq)->curr);
        return;
    }

    // if the hrtimer above is armed, leave the work to it and go no further
    if (!sched_feat(DOUBLE_TICK) &&
            hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
        return;
#endif

    // otherwise do the regular update, check, and preemption here
    if (cfs_rq->nr_running > 1 || !sched_feat(WAKEUP_PREEMPT))
        check_preempt_tick(cfs_rq, curr);
}
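The three exits of entity_tick can be summarized as a classification of each call. The sketch below models only the control flow and assumes CONFIG_SCHED_HRTICK is enabled; enum tick_action, struct tick_input, and classify_tick are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tick_action {
    TICK_FORCE_RESCHED,  /* hrtick fired: the slice is over */
    TICK_SKIP,           /* hrtick is armed and will handle preemption */
    TICK_CHECK_PREEMPT,  /* ordinary periodic preemption check */
    TICK_NOTHING         /* no check needed on this tick */
};

/* Hypothetical snapshot of the conditions entity_tick looks at. */
struct tick_input {
    bool queued;            /* called from the hrtick handler */
    bool double_tick;       /* DOUBLE_TICK scheduler feature */
    bool hrtick_active;     /* per-run-queue hrtimer is armed */
    bool wakeup_preempt;    /* WAKEUP_PREEMPT scheduler feature */
    unsigned int nr_running;
};

enum tick_action classify_tick(const struct tick_input *in)
{
    assert(in != NULL);

    if (in->queued)
        return TICK_FORCE_RESCHED;
    if (!in->double_tick && in->hrtick_active)
        return TICK_SKIP;
    if (in->nr_running > 1 || !in->wakeup_preempt)
        return TICK_CHECK_PREEMPT;
    return TICK_NOTHING;
}
```

The TICK_SKIP case is the one that prevents the periodic tick and the hrtick from both doing the preemption work for the same slice.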

Why is this section appended? Because the per-run-queue hrtimer has to call task_tick, and if task_tick were also carried out from the event_handler, wouldn't the same work be done in two places? In fact, the real task_tick work is carried out in only one place. As the code above shows, on the regular task_tick path the checks if (queued) { and if (!sched_feat(DOUBLE_TICK) && ... come first: if the per-run-queue hrtimer is active, the function returns immediately without doing anything, so there is no duplication. Now look at how the per-run-queue hrtimer is armed:

static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
{
    struct sched_entity *se = &p->se;
    struct cfs_rq *cfs_rq = cfs_rq_of(se);

    WARN_ON(task_rq(p) != rq);

    if (hrtick_enabled(rq) && cfs_rq->nr_running > 1) {
        // from the weight, compute how long the process should run
        u64 slice = sched_slice(cfs_rq, se);
        // compute how long the process has actually run
        u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
        // the difference between the two
        s64 delta = slice - ran;

        if (delta < 0) {
            if (rq->curr == p)      // slice overrun: reschedule immediately
                resched_task(p);
            return;
        }

        if (rq->curr != p)
            delta = max_t(s64, 10000LL, delta);

        hrtick_start(rq, delta);    // otherwise arm the hrtimer for the remaining time
    }
}
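The arithmetic in hrtick_start_fair is compact enough to try out in isolation. The following sketch reproduces only the slice computation under stated assumptions: plan_hrtick, enum hrtick_plan, and MIN_HRTICK_NS are hypothetical names, and the 10000 ns floor is taken from the constant in the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_HRTICK_NS 10000LL  /* floor applied when the task is not current */

enum hrtick_plan {
    HRTICK_ARM,          /* arm the timer with *delay_ns */
    HRTICK_RESCHED_NOW,  /* slice already used up by the running task */
    HRTICK_NONE          /* slice used up, but the task is not running */
};

/*
 * slice_ns: how long the task is entitled to run.
 * ran_ns:   how long it has run since it was last picked.
 */
enum hrtick_plan plan_hrtick(uint64_t slice_ns, uint64_t ran_ns,
                             bool is_current, int64_t *delay_ns)
{
    /* Unsigned subtraction reinterpreted as signed, as in the listing. */
    int64_t delta = (int64_t)(slice_ns - ran_ns);

    assert(delay_ns != NULL);
    *delay_ns = 0;

    if (delta < 0)
        return is_current ? HRTICK_RESCHED_NOW : HRTICK_NONE;

    if (!is_current && delta < MIN_HRTICK_NS)
        delta = MIN_HRTICK_NS;

    *delay_ns = delta;
    return HRTICK_ARM;
}
```

A task with a 6 ms slice that has run for 2 ms gets a timer 4 ms in the future, which is exactly the point at which hrtick will force the reschedule.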
