
How to solve the OOM error in the TensorFlow GPU version


This article explains how to solve the out-of-memory (OOM) error in the GPU version of TensorFlow. Many people run into this problem in daily work, so the steps below have been sorted into a simple, practical method. Follow along and try it on your own setup.

Question:

The following error occurred when using mask_rcnn to run prediction on my own dataset:

ResourceExhaustedError: OOM when allocating tensor with shape [1120 …] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node rpn_model/rpn_conv_shared/convolution}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fpn_p2/BiasAdd, rpn_conv_shared/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node roi_align_mask/strided_slice_17/_4277}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3068_roi_align_mask/strided_slice_17", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
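The hint in the error message itself suggests a way to get more detail: pass report_tensor_allocations_upon_oom through RunOptions when running the session. Below is a minimal sketch, assuming a TF 1.x session-based setup; the trivial matmul graph is only a stand-in for the real Mask R-CNN prediction op, which is what you would actually fetch.

import tensorflow as tf

# Ask TensorFlow to list the allocated tensors if an OOM occurs,
# as the hint in the error message suggests.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Trivial graph just to show where options= goes; in the real script
# the fetch would be the Mask R-CNN prediction op.
x = tf.ones([2, 2])
y = tf.matmul(x, x)

with tf.Session() as sess:
    result = sess.run(y, options=run_options)
    print(result)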

Reason:

First, the image is 3200 × 4480 pixels, which is simply too large.

Second, I use the GPU version of TensorFlow, but my GPU has only 8 GB of video memory, which is not enough for an image of this size.

Solution:

First, resize the image so that it no longer needs more memory than the GPU has available; a minimal sketch of this follows below.
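For the first option, the idea is simply to shrink the input before it ever reaches the network. A minimal sketch, assuming Pillow is installed; the file name and the downscale factor are only examples and should be adapted to your data and your card.

from PIL import Image

# Load the over-sized image (file name is just an example).
img = Image.open("example.jpg")  # e.g. 3200 x 4480, too large for 8 GB of VRAM

# Shrink it so the network's activations fit into GPU memory.
# The factor of 4 is a guess; tune it until prediction no longer OOMs.
img = img.resize((img.width // 4, img.height // 4), Image.BILINEAR)
img.save("example_small.jpg")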

Second, do not use the GPU for prediction at all and predict on the CPU only, because CPU memory is usually much larger than video memory. Since the GPU version of TensorFlow is installed, the prediction script needs a small change.

Add the following two lines at the very top of the prediction script:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

The quotation marks hold the index of the GPU to use (for example "0" for the first card); leaving them empty means no GPU is used at all.
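Putting it together: the environment variable must be set before TensorFlow is imported, otherwise the GPU has already been claimed. A minimal sketch for checking that the GPU really is hidden; the device_lib check is just one way to verify and is not part of the original script.

import os

# Set this BEFORE importing TensorFlow, otherwise the GPU is already initialised.
# "" hides every GPU (CPU-only prediction); "0" would expose only the first GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf
from tensorflow.python.client import device_lib

# With an empty list above, TensorFlow should report no GPU devices here.
print([d.name for d in device_lib.list_local_devices()])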

That concludes the study of how to solve the OOM error in the TensorFlow GPU version. Hopefully it clears up your doubts; combining theory with practice is the best way to learn, so go and try it. If you want to keep learning more related knowledge, please continue to follow the site for more practical articles.
