How to Implement a CUDA Timer

Updated 2025-01-19. From: SLTechnology News&Howtos

This article explains how to implement a CUDA timer and how to use it to measure how long a kernel takes to run.

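In CUDA programming, you need a timing method to check how fast your program runs. The timer shown here is built on CUDA events, which are recorded on the GPU, so it reports the time in milliseconds between two points in a stream rather than host-side wall-clock time.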

First, define the header file gputimer.h:

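    #ifndef __GPU_TIMER_H__
    #define __GPU_TIMER_H__

    #include <cuda_runtime.h>

    struct GpuTimer {
        cudaEvent_t start;
        cudaEvent_t stop;

        // Create the two events that mark the timed interval.
        GpuTimer() {
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
        }

        ~GpuTimer() {
            cudaEventDestroy(start);
            cudaEventDestroy(stop);
        }

        // Record the start event on the default stream (stream 0).
        void Start() { cudaEventRecord(start, 0); }

        // Record the stop event on the default stream (stream 0).
        void Stop() { cudaEventRecord(stop, 0); }

        // Wait for the stop event, then return the elapsed time in milliseconds.
        float Elapsed() {
            float elapsed;
            cudaEventSynchronize(stop);
            cudaEventElapsedTime(&elapsed, start, stop);
            return elapsed;
        }
    };

    #endif /* __GPU_TIMER_H__ */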
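The header ignores the cudaError_t value that each CUDA runtime call returns, so a failed event call would go unnoticed and the reported time would be meaningless. The following is a minimal sketch of an error-checked variant, not part of the original header. The names CUDA_CHECK, CheckedGpuTimer, and time_empty_interval are hypothetical and introduced only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original header): print a message and
// exit when a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical variant of GpuTimer with the runtime calls checked.
struct CheckedGpuTimer {
    cudaEvent_t start;
    cudaEvent_t stop;

    CheckedGpuTimer() {
        CUDA_CHECK(cudaEventCreate(&start));
        CUDA_CHECK(cudaEventCreate(&stop));
    }

    ~CheckedGpuTimer() {
        // Do not exit from a destructor; the return codes are ignored here.
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }

    void Start() { CUDA_CHECK(cudaEventRecord(start, 0)); }
    void Stop() { CUDA_CHECK(cudaEventRecord(stop, 0)); }

    // Wait for the stop event, then return the elapsed time in milliseconds.
    float Elapsed() {
        float ms = 0.0f;
        CUDA_CHECK(cudaEventSynchronize(stop));
        CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
        return ms;
    }
};

// Times an interval with no work between Start() and Stop().
float time_empty_interval() {
    CheckedGpuTimer timer;
    timer.Start();
    timer.Stop();
    return timer.Elapsed();
}

int main() {
    std::printf("Empty interval = %g ms\n", time_empty_interval());
    return 0;
}
```

Timing an empty interval like this gives a rough idea of the overhead of the event pair itself on a given machine.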

General usage:

    GpuTimer timer;
    timer.Start();
    // launch the kernel; numBlocks and threadsPerBlock stand for your launch configuration
    kernel<<<numBlocks, threadsPerBlock>>>(d_out, d_in);
    timer.Stop();
    printf("Time elapsed = %g ms\n", timer.Elapsed());  // output
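A single timed launch can be noisy, and the first launch of a kernel often includes one-time setup cost. One common approach, sketched here under assumptions, is to run an untimed warm-up launch and then average over several timed launches. The kernel scale, the function average_kernel_ms, and the block size of 256 are hypothetical choices for illustration. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to have something to time.
__global__ void scale(float* data, int n, float factor) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] *= factor;
    }
}

// Returns the average time in milliseconds of one launch of scale,
// measured over `repeats` back-to-back launches after one warm-up launch.
float average_kernel_ms(int n, int repeats) {
    float* d_data = nullptr;
    cudaMalloc((void**)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;  // assumed block size
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch, not included in the measurement.
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    cudaEventRecord(start, 0);
    for (int r = 0; r < repeats; ++r) {
        scale<<<blocks, threads>>>(d_data, n, 2.0f);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return total_ms / repeats;
}

int main() {
    std::printf("Average time per launch = %g ms\n",
                average_kernel_ms(1 << 20, 100));
    return 0;
}
```

Recording the events once around the whole loop, rather than once per launch, keeps the event overhead out of each individual measurement.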

As a practical example, compute the squares of 1000 numbers:

    #include <cuda_runtime.h>
    #include "device_launch_parameters.h"
    #include "gputimer.h"
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void square(float* d_out, float* d_in) {
        int idx = threadIdx.x;
        float f = d_in[idx];
        d_out[idx] = f * f;
    }

    int main() {
        GpuTimer timer;
        const int ARRAY_SIZE = 1000;
        const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

        // generate the input array on the host
        float h_in[ARRAY_SIZE];
        for (int i = 0; i < ARRAY_SIZE; i++) {
            h_in[i] = float(i);
        }
        float h_out[ARRAY_SIZE];

        // declare GPU memory pointers
        float* d_in;
        float* d_out;

        // allocate GPU memory
        cudaMalloc((void**)&d_in, ARRAY_BYTES);
        cudaMalloc((void**)&d_out, ARRAY_BYTES);

        // transfer the array to the GPU
        cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

        timer.Start();
        // launch the kernel: one block of ARRAY_SIZE threads,
        // because the kernel indexes by threadIdx.x only
        square<<<1, ARRAY_SIZE>>>(d_out, d_in);
        timer.Stop();

        // copy back the result array to the CPU
        cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

        // print out the resulting array, four values per line
        for (int i = 0; i < ARRAY_SIZE; i++) {
            printf("%f", h_out[i]);
            printf(((i % 4) != 3) ? "\t" : "\n");
        }
        printf("Time elapsed = %g ms\n", timer.Elapsed());

        // free GPU memory allocation
        cudaFree(d_in);
        cudaFree(d_out);

        system("pause");  // Windows only: keeps the console window open
        return 0;
    }
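The example times only the kernel launch. For a small array like this one, the host-to-device and device-to-host copies may cost as much as the kernel or more, so it can be useful to time each phase separately. The sketch below does this with four events on the default stream and also checks the results. The names PhaseTimes, elapsed_between, and time_phases are hypothetical and not from the original article, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(float* d_out, const float* d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

// Hypothetical result record for the three timed phases.
struct PhaseTimes {
    float h2d_ms;     // host-to-device copy
    float kernel_ms;  // kernel execution
    float d2h_ms;     // device-to-host copy
    bool correct;     // true if every output equals the square of its input
};

// Milliseconds between two recorded events, waiting for the later one first.
static float elapsed_between(cudaEvent_t first, cudaEvent_t second) {
    float ms = 0.0f;
    cudaEventSynchronize(second);
    cudaEventElapsedTime(&ms, first, second);
    return ms;
}

PhaseTimes time_phases() {
    const int ARRAY_SIZE = 1000;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    float h_in[ARRAY_SIZE];
    float h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) {
        h_in[i] = float(i);
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc((void**)&d_in, ARRAY_BYTES);
    cudaMalloc((void**)&d_out, ARRAY_BYTES);

    // Four events mark the boundaries of the three phases.
    cudaEvent_t ev[4];
    for (int i = 0; i < 4; i++) {
        cudaEventCreate(&ev[i]);
    }

    cudaEventRecord(ev[0], 0);
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);
    cudaEventRecord(ev[1], 0);
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);
    cudaEventRecord(ev[2], 0);
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);
    cudaEventRecord(ev[3], 0);

    PhaseTimes t;
    t.h2d_ms = elapsed_between(ev[0], ev[1]);
    t.kernel_ms = elapsed_between(ev[1], ev[2]);
    t.d2h_ms = elapsed_between(ev[2], ev[3]);

    // i * i is exactly representable as a float for i < 1000.
    t.correct = true;
    for (int i = 0; i < ARRAY_SIZE; i++) {
        if (h_out[i] != h_in[i] * h_in[i]) {
            t.correct = false;
        }
    }

    for (int i = 0; i < 4; i++) {
        cudaEventDestroy(ev[i]);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return t;
}

int main() {
    PhaseTimes t = time_phases();
    std::printf("H2D copy = %g ms\nkernel   = %g ms\nD2H copy = %g ms\n",
                t.h2d_ms, t.kernel_ms, t.d2h_ms);
    std::printf("results %s\n", t.correct ? "correct" : "incorrect");
    return 0;
}
```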

This concludes "How to implement a CUDA timer". Thank you for reading.
