How to use Python to generate lmdb files and txt list manifest files needed for caffe


2025-07-04 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report

This article explains how to use Python to generate the lmdb file and the txt list file that Caffe needs. It walks through the problem and a solution in detail, in the hope of giving readers who face the same problem a simpler, workable method.

For a set of images you already have, these notes cover how to convert the image format, how to calculate the mean of the image data, and how to write the prototxt configuration file. This part mainly records how to convert image data into db files.

I. Review of Caffe training and learning steps

1. Prepare the dataset (training set and test set)

2. Convert the image data to db (leveldb/lmdb) files

3. Calculate the mean of the image data

4. Write the prototxt configuration file

5. Train the model

Note: there is also a training method that needs neither db files nor the mean of the image data, only a txt list file. That alternative set of steps is explained after this method.

II. Converting picture data into db (leveldb/lmdb) files

1. Overview

In practical deep learning applications, the raw data is usually image files such as jpg, jpeg, png, or tif, and the images may not all be the same size. Caffe, however, usually works with lmdb or leveldb data. This raises the question: how do we convert the original image files into a db (leveldb/lmdb) file that Caffe can use?

Caffe provides a file for this: convert_imageset.cpp, stored in the tools directory under the Caffe root directory. After Caffe is compiled, the corresponding executable is generated in the build/tools/ directory. The purpose of this executable, convert_imageset, is to convert image files into a db file that can be used directly in the Caffe framework.

The command is used as follows:

    convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME

It takes four parameters:

- FLAGS: the group of image options, described in detail later
- ROOTFOLDER/: the absolute path of the directory where the images are stored, starting from the Linux system root directory
- LISTFILE: the image list file, usually a txt file, with one image per line
- DB_NAME: the directory where the resulting db file is stored

So to generate a db file with the convert_imageset tool, we first need the image list txt file. The examples/images directory under the Caffe root directory contains two test images, cat.jpg and fish-bike.jpg. We can view them from a terminal with the eog command (this does not work over a plain ssh remote login; vnc works, and so does a local session).

We can use these two images to learn how to write the image list txt file. Each line of the list file has the following format:

    image_file_name label

Taking cat.jpg and fish-bike.jpg as an example, the list txt file for these two images is:

    cat.jpg 1
    fish-bike.jpg 2

And so on, one image and its label per line. Here we define label 1 as the cat and label 2 as the bicycle. With only two images we can write the list txt file by hand, but what if there are many images?

We can use a script, and there are many options: shell scripts, Python scripts, and so on. My approach is to use a Python script to process the files and generate the final image list txt file.

2. Using a Python script to write the image list txt file

(1) Create a project directory my-caffe-project under the Caffe root directory, using the following command:

    cd /home/Jack-Cui/caffe-master && mkdir my-caffe-project

(2) Create and edit the create_db.py file, using the following command:

    vim create_db.py

The contents of the file are as follows:

    # -*- coding: UTF-8 -*-
    import os
    import re


    def createFileList(images_path, txt_save_path):
        """
        Function description: generate the image list txt file
        Parameters:
            images_path - image storage directory
            txt_save_path - path where the image list txt file is saved
        Returns:
            None
        Author:
            Jack Cui
        Modify:
            2017-03-29
        """
        # Open the image list txt file
        fw = open(txt_save_path, "w")
        # List the files in the image directory, equivalent to the shell command ls
        images_name = os.listdir(images_path)
        # Iterate through all file names
        for eachname in images_name:
            # The regular expressions here can be changed as needed
            # Rule: names that start with cat, followed by 0 to 10 digits, and end with .jpg
            pattern_cat = r'(^cat\d{0,10}\.jpg$)'
            # Rule: names that start with fish-bike, followed by 0 to 10 digits, and end with .jpg
            pattern_bike = r'(^fish-bike\d{0,10}\.jpg$)'
            # Match the regular expressions against the file name
            cat_name = re.search(pattern_cat, eachname)
            bike_name = re.search(pattern_bike, eachname)
            # Write "file name, space, label" to the txt file according to the rules
            if cat_name is not None:
                fw.write(cat_name.group(0) + ' 1\n')
            if bike_name is not None:
                fw.write(bike_name.group(0) + ' 2\n')
        # Close fw
        fw.close()
        # Print success message
        print("generate txt file successfully")


    if __name__ == '__main__':
        # Caffe root directory
        caffe_root = '/home/Jack-Cui/caffe-master/'
        # my-caffe-project directory
        my_caffe_project = caffe_root + 'my-caffe-project/'
        # Image storage directory
        images_path = caffe_root + 'examples/images/'
        # Name of the generated image list txt file
        txt_name = 'filelist.txt'
        # Path where the generated image list txt file is saved
        txt_save_path = my_caffe_project + txt_name
        # Generate the txt file
        createFileList(images_path, txt_save_path)

(3) Run the create_db.py script, using the following command:

    python create_db.py

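(4) Use cat to view the generated filelist.txt (saved in the my-caffe-project directory). For the two sample images it contains the lines cat.jpg 1 and fish-bike.jpg 2, in whatever order os.listdir returned the file names.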
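With more than two classes, adding one hand-written pattern and one if-branch per class gets unwieldy. A minimal sketch of a table-driven variant follows, assuming the same file-name conventions as the script in step (2). The names LABEL_RULES, build_file_list, and write_file_list are hypothetical and not part of the original script, and sorting the directory listing is a choice made here so that the output does not depend on the order os.listdir happens to return.

```python
import os
import re

# Hypothetical rule table: (compiled regex, label). The two entries mirror the
# cat / fish-bike rules of the script in step (2); add rows for more classes.
LABEL_RULES = [
    (re.compile(r'^cat\d{0,10}\.jpg$'), 1),
    (re.compile(r'^fish-bike\d{0,10}\.jpg$'), 2),
]


def build_file_list(images_path, rules=LABEL_RULES):
    """Return 'filename label' lines for every file matching one of the rules."""
    lines = []
    # sorted() makes the result independent of os.listdir ordering
    for name in sorted(os.listdir(images_path)):
        for pattern, label in rules:
            if pattern.search(name):
                lines.append('%s %d' % (name, label))
                break  # the first matching rule wins
    return lines


def write_file_list(images_path, txt_save_path, rules=LABEL_RULES):
    """Write the list file and return the number of entries written."""
    lines = build_file_list(images_path, rules)
    with open(txt_save_path, 'w') as fw:
        for line in lines:
            fw.write(line + '\n')
    return len(lines)
```

Calling write_file_list(images_path, txt_save_path) with the paths from the script would then play the role of createFileList; files that match no rule are simply skipped.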

3. Use a Python script to run convert_imageset and generate the db file

The generated filelist.txt file can be used directly as the third parameter (LISTFILE).
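Before handing the list to convert_imageset, it can be worth checking that every line has the expected form and that each file really exists under ROOTFOLDER. The following is only a sketch: parse_list_file and missing_images are hypothetical helper names, and splitting each line at its last space is an assumption made here so that file names containing spaces are kept intact.

```python
import os


def parse_list_file(list_path):
    """Parse 'filename label' lines into (filename, int label) tuples."""
    entries = []
    with open(list_path) as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split at the last space: everything before it is the file name
            name, sep, label = line.rpartition(' ')
            name = name.strip()
            if not sep or not name or not label.isdigit():
                raise ValueError(
                    'line %d is not "filename label": %r' % (line_no, line))
            entries.append((name, int(label)))
    return entries


def missing_images(root_folder, entries):
    """Return the listed file names that do not exist under root_folder."""
    return [name for name, _ in entries
            if not os.path.isfile(os.path.join(root_folder, name))]
```

An empty result from missing_images(images_path, parse_list_file(txt_save_path)) suggests the list and the image directory agree.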

Next, let's look at what the FLAGS parameter group contains:

gray: whether to open the image as a grayscale image. The program calls the imread() function of the OpenCV library to open the image. The default is false.

backend: the db format to convert to, either leveldb or lmdb. The default is lmdb.

resize_width/resize_height: resize the images. All images must be the same size at run time, so they need to be resized. The program calls the resize() function of the OpenCV library to scale the images. The default is 0, which leaves the size unchanged.

check_size: check that all data have the same size. The default is false (no check).

encoded: whether to store the original encoded image in the final data. The default is false.

encode_type: used together with the previous parameter; the format the image is encoded into, such as 'png' or 'jpg'.

Now that we know these parameters, we can call the command to generate the final data in lmdb format.
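One way to keep the flags and the three positional arguments in the right order is to assemble the command as a list in Python. This is a sketch, not part of the original script: build_convert_command is a hypothetical helper, it writes flags in the --name=value form, and it only formats whatever flags it is given without checking them against the tool.

```python
def build_convert_command(tool_path, root_folder, list_file, db_name, **flags):
    """Return the convert_imageset command as a list of arguments.

    flags maps flag names from the list above (gray, backend, resize_width,
    and so on) to values; booleans are written as true/false.
    """
    args = [tool_path]
    for name in sorted(flags):  # sorted only to make the result predictable
        value = flags[name]
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        args.append('--%s=%s' % (name, value))
    # The usage line writes ROOTFOLDER/ with a trailing slash, so add one
    if not root_folder.endswith('/'):
        root_folder += '/'
    args.extend([root_folder, list_file, db_name])
    return args


if __name__ == '__main__':
    # Relative paths here are placeholders for illustration only
    cmd = build_convert_command(
        'build/tools/convert_imageset', 'examples/images',
        'filelist.txt', 'img_train.lmdb',
        resize_height=256, resize_width=256)
    print(' '.join(cmd))
```

Keeping the command as a list rather than one formatted string also avoids quoting problems when it is later passed to a process-launching function.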

(1) Continue editing the create_db.py file, using the following command:

    vim create_db.py

The file, with the additions, is as follows:

    # -*- coding: UTF-8 -*-
    # Python 3: subprocess.getstatusoutput (on Python 2 the same call is commands.getstatusoutput)
    import subprocess
    import os
    import re


    def createFileList(images_path, txt_save_path):
        """
        Function description: generate the image list txt file
        Parameters:
            images_path - image storage directory
            txt_save_path - path where the image list txt file is saved
        Returns:
            None
        Author:
            Jack Cui
        Modify:
            2017-03-29
        """
        # Open the image list txt file
        fw = open(txt_save_path, "w")
        # List the files in the image directory, equivalent to the shell command ls
        images_name = os.listdir(images_path)
        # Iterate through all file names
        for eachname in images_name:
            # The regular expressions here can be changed as needed
            # Rule: names that start with cat, followed by 0 to 10 digits, and end with .jpg
            pattern_cat = r'(^cat\d{0,10}\.jpg$)'
            # Rule: names that start with fish-bike, followed by 0 to 10 digits, and end with .jpg
            pattern_bike = r'(^fish-bike\d{0,10}\.jpg$)'
            # Match the regular expressions against the file name
            cat_name = re.search(pattern_cat, eachname)
            bike_name = re.search(pattern_bike, eachname)
            # Write "file name, space, label" to the txt file according to the rules
            if cat_name is not None:
                fw.write(cat_name.group(0) + ' 1\n')
            if bike_name is not None:
                fw.write(bike_name.group(0) + ' 2\n')
        # Close fw
        fw.close()
        # Print success message
        print("generate txt file successfully")


    def create_db(caffe_root, images_path, txt_save_path):
        """
        Function description: generate the lmdb file
        Parameters:
            caffe_root - Caffe root directory
            images_path - image storage directory
            txt_save_path - path where the image list txt file is saved
        Returns:
            None
        Author:
            Jack Cui
        Modify:
            2017-03-29
        """
        # lmdb file name
        lmdb_name = 'img_train.lmdb'
        # Directory where the generated db file is saved
        lmdb_save_path = caffe_root + 'my-caffe-project/' + lmdb_name
        # Path of the convert_imageset tool
        convert_imageset_path = caffe_root + 'build/tools/convert_imageset'
        cmd = "%s --shuffle --resize_height=256 --resize_width=256 %s %s %s"
        status, output = subprocess.getstatusoutput(
            cmd % (convert_imageset_path, images_path, txt_save_path, lmdb_save_path))
        print(output)
        if status == 0:
            print("lmdb file generated successfully")


    if __name__ == '__main__':
        # Caffe root directory
        caffe_root = '/home/Jack-Cui/caffe-master/'
        # my-caffe-project directory
        my_caffe_project = caffe_root + 'my-caffe-project/'
        # Image storage directory
        images_path = caffe_root + 'examples/images/'
        # Name of the generated image list txt file
        txt_name = 'filelist.txt'
        # Path where the generated image list txt file is saved
        txt_save_path = my_caffe_project + txt_name
        # Generate the txt file
        createFileList(images_path, txt_save_path)
        # Generate the lmdb file
        create_db(caffe_root, images_path, txt_save_path)

The --shuffle parameter shuffles the order of the images. The --resize_height and --resize_width parameters resize all images to 256×256. /home/xxx/caffe-master/examples/images/ is the absolute path where the images are stored; my Caffe is placed in the /home/Jack-Cui directory. Running the script generates the lmdb file, and it is done!
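The script above formats the whole command into one string and runs it through the shell, so a path containing spaces would be split into several arguments. A hedged alternative is to pass the arguments as a list with subprocess.run. The helper name run_tool is hypothetical, and the demo runs the Python interpreter as a stand-in command so that the sketch works without Caffe installed.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as an argument list and return (status, output)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,   # decode the captured bytes to str
    )
    return proc.returncode, proc.stdout


if __name__ == '__main__':
    # Stand-in command for illustration. In practice args would be the
    # convert_imageset path followed by its flags and the ROOTFOLDER/,
    # LISTFILE and DB_NAME arguments.
    status, output = run_tool([sys.executable, '-c', 'print("hello")'])
    print(output)
    if status == 0:
        print("command finished successfully")
```

Because no shell is involved, each list element reaches the program as exactly one argument, whatever characters it contains.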

