Performance Analysis of Locks and Atomic Operations in OpenMP-Created Threads

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article analyzes the performance of locks and atomic operations in threads created by OpenMP.

The test compares the performance of Windows CriticalSection and the OpenMP lock functions on a multicore CPU.

Atomic operations are tested with InterlockedIncrement.

For each type of lock and for the atomic operation, the test measures the time taken to perform 2,000,000 lock and unlock operations, in both single-task and multitask execution.
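The measurement described above can be sketched in portable C++ as follows. This is only an illustrative analog, not the article's test program: it assumes std::atomic in place of InterlockedIncrement, std::mutex in place of the Windows and OpenMP locks, and std::thread in place of an OpenMP parallel loop. The names timed_run, atomic_test, mutex_test and Result are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: final counter value and elapsed milliseconds.
struct Result { long count; long long ms; };

// Hypothetical helper: call body() `total` times, split evenly across
// `threads` worker threads, and return the elapsed wall-clock time in ms.
// Assumes `total` is divisible by `threads`.
template <typename Body>
long long timed_run(int threads, long total, Body body) {
    const long per_thread = total / threads;
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread, &body] {
            for (long i = 0; i < per_thread; ++i) body();
        });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Atomic increment of a shared counter (stand-in for InterlockedIncrement).
Result atomic_test(int threads, long total) {
    std::atomic<long> a{0};
    const long long ms = timed_run(threads, total, [&a] { a.fetch_add(1); });
    return Result{a.load(), ms};
}

// Lock, add 1, unlock (stand-in for the CriticalSection and omp_lock tests).
Result mutex_test(int threads, long total) {
    long a = 0;
    std::mutex m;
    const long long ms = timed_run(threads, total, [&a, &m] {
        std::lock_guard<std::mutex> guard(m);
        a += 1;
    });
    return Result{a, ms};
}
```

Calling atomic_test(1, 2000000) and atomic_test(2, 2000000) gives a single-task and a two-task measurement of the same total work. Note that in this sketch the single-task case still runs in one spawned worker thread rather than in the calling thread, and the absolute times will differ from the figures reported in this article.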

The detailed code for the test is shown below.

Test machine environment: Intel 2.66 GHz dual-core CPU

The results of the test run are as follows:

SingleThread, InterlockedIncrement 2,000,000: a = 2000000, time = 78

MultiThread, InterlockedIncrement 2,000,000: a = 2000000, time = 156

SingleThread, Critical_Section 2,000,000:a = 2000000, time = 172

MultiThread, Critical_Section, 2,000,000:a = 2000000, time = 3156

SingleThread,omp_lock 2,000,000:a = 2000000, time = 250

MultiThread,omp_lock 2,000,000:a = 2000000, time = 1063

In the single-task case, the elapsed time is as follows:

Atomic operation 78ms

Windows CriticalSection 172ms

OpenMP lock operation 250ms

In the single-task case, then, the atomic operation is the fastest, Windows CriticalSection is second, and the OpenMP lock is the slowest. The differences are not large, however: the lock operations are about 2 to 3 times slower than the atomic operation.

In the case of multiple tasks running, the elapsed time is as follows:

Atomic operation 156ms

Windows CriticalSection 3156ms

OpenMP lock operation 1063ms

With multiple tasks, the results changed unexpectedly. The atomic operation took twice as long as in the single-task case, so running on two cores was twice as slow as running on one. This was probably caused by task-switching overhead.

Windows CriticalSection fared even worse: it took 3156ms, about 18 times as long as in the single-task case.

OpenMP's lock does better than Windows CriticalSection, but it still takes 1063ms, about four times as long as in the single-task case (1063 / 250 ≈ 4.3).
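The slowdown factors can be recomputed directly from the reported times. The short sketch below only does that arithmetic; the function names slowdown and print_slowdowns are hypothetical, and the numbers are the ones listed in the results above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Ratio of the multitask time to the single-task time for one primitive.
double slowdown(double multi_ms, double single_ms) {
    return multi_ms / single_ms;
}

// Print the slowdown of each primitive using the times reported above.
void print_slowdowns() {
    std::printf("InterlockedIncrement: %.2f\n", slowdown(156.0, 78.0));    // 2.00
    std::printf("CriticalSection:      %.2f\n", slowdown(3156.0, 172.0));  // 18.35
    std::printf("omp_lock:             %.2f\n", slowdown(1063.0, 250.0));  // 4.25
}
```

Calling print_slowdowns() shows that the atomic operation slows down by a factor of 2, CriticalSection by roughly 18, and the OpenMP lock by a little over 4.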

From this we can see that, in a multitasking environment on a multicore CPU, the atomic operation is the fastest, the OpenMP lock is second, and Windows CriticalSection is the slowest.

The gap between the single-task and multitask results for these locks also shows that programming for multicore CPUs is very different from earlier single-core multitasking programming.

Note that this is an extreme test: the locked region is only a single increment, and the lock is contended up to 2 million times. In real programs, much of the code in each task does not need a lock, so actual performance will be much better than in this test.
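As a rough illustration of how much less contention a task has when most of its work needs no lock, the sketch below lets each thread count into a private variable and touch the shared counter only once. It is an assumption-laden example in standard C++ (std::thread and std::atomic rather than OpenMP and the Windows API), not part of the test measured in this article, and the name count_with_local_sums is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread accumulates privately and performs one synchronized update,
// instead of one synchronized update per increment.
// Assumes `total` is divisible by `threads`.
long count_with_local_sums(int threads, long total) {
    std::atomic<long> shared{0};
    const long per_thread = total / threads;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&shared, per_thread] {
            long local = 0;
            for (long i = 0; i < per_thread; ++i) {
                local += 1;  // private to this thread, no lock needed
            }
            shared.fetch_add(local);  // single contended operation per thread
        });
    }
    for (auto& th : pool) th.join();
    return shared.load();
}
```

With two threads and 2,000,000 increments, the shared counter is updated twice rather than 2,000,000 times, while the final value is the same.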

The test code is as follows:

// TestLock.cpp: Atomic operations and lock performance tester in OpenMP tasks.
//
// The header names are missing from the listing; these are the headers
// that the calls below require.
#include <windows.h>
#include <omp.h>
#include <stdio.h>
#include <time.h>

void TestAtomic()
{
    clock_t t1, t2;
    int i = 0;
    volatile LONG a = 0;

    // Single task: 2,000,000 atomic increments.
    t1 = clock();
    for (i = 0; i < 2000000; i++)
    {
        InterlockedIncrement(&a);
    }
    t2 = clock();
    printf("SingleThread, InterlockedIncrement 2,000,000: a = %ld, time = %ld\n", a, (long)(t2 - t1));

    // Reset the counter so that the multitask run also ends at 2,000,000,
    // as in the reported results.
    a = 0;

    // Multiple tasks: the same loop split across OpenMP threads.
    t1 = clock();
#pragma omp parallel for
    for (i = 0; i < 2000000; i++)
    {
        InterlockedIncrement(&a);
    }
    t2 = clock();
    printf("MultiThread, InterlockedIncrement 2,000,000: a = %ld, time = %ld\n", a, (long)(t2 - t1));
}

void TestOmpLock()
{
    clock_t t1, t2;
    int i;
    int a = 0;
    omp_lock_t mylock;

    omp_init_lock(&mylock);

    // Single task: lock, add 1, unlock.
    t1 = clock();
    for (i = 0; i < 2000000; i++)
    {
        omp_set_lock(&mylock);
        a += 1;
        omp_unset_lock(&mylock);
    }
    t2 = clock();
    printf("SingleThread,omp_lock 2,000,000:a = %d, time = %ld\n", a, (long)(t2 - t1));

    a = 0; // reset before the multitask run

    t1 = clock();
#pragma omp parallel for
    for (i = 0; i < 2000000; i++)
    {
        omp_set_lock(&mylock);
        a += 1;
        omp_unset_lock(&mylock);
    }
    t2 = clock();
    printf("MultiThread,omp_lock 2,000,000:a = %d, time = %ld\n", a, (long)(t2 - t1));

    omp_destroy_lock(&mylock);
}

void TestCriticalSection()
{
    clock_t t1, t2;
    int i;
    int a = 0;
    CRITICAL_SECTION cs;

    InitializeCriticalSection(&cs);

    // Single task: enter, add 1, leave.
    t1 = clock();
    for (i = 0; i < 2000000; i++)
    {
        EnterCriticalSection(&cs);
        a += 1;
        LeaveCriticalSection(&cs);
    }
    t2 = clock();
    printf("SingleThread, Critical_Section 2,000,000:a = %d, time = %ld\n", a, (long)(t2 - t1));

    a = 0; // reset before the multitask run

    t1 = clock();
#pragma omp parallel for
    for (i = 0; i < 2000000; i++)
    {
        EnterCriticalSection(&cs);
        a += 1;
        LeaveCriticalSection(&cs);
    }
    t2 = clock();
    printf("MultiThread, Critical_Section, 2,000,000:a = %d, time = %ld\n", a, (long)(t2 - t1));

    DeleteCriticalSection(&cs);
}

int main(int argc, char* argv[])
{
    TestAtomic();
    TestCriticalSection();
    TestOmpLock();
    return 0;
}
