How to Extract and Collect Information Automatically with Python

2025-07-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shows how to use Python to extract and collect information automatically. The method is simple and quick to apply, and the steps below walk through it in detail.

I. Brief introduction

The goal is to extract the information printed on receipts and invoices automatically, so that a machine replaces manual data entry and work becomes more efficient.

The approach is to crop the required regions out of the image with the cv2 module (OpenCV) and send each crop to Baidu's character recognition (OCR) API.

II. Code implementation

1. Import the required libraries: Baidu's API client (aip) and the cv2 image library

    import cv2
    from aip import AipOcr

    # read the picture and use imshow to display it
    pic = cv2.imread('img1.png')  # path to the receipt image; adjust to your own file
    pic = cv2.resize(pic, None, fx=0.5, fy=0.5)
    cv2.imshow('img', pic)
    cv2.waitKey(0)

2. Crop the picture to get the information you need. The fields are listed below; the name after the colon is the variable used in the code.

Time: time

Merchant: business

Goods: goods

Price: money

Order number: num

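    # remove the parts that are not needed
    img = pic[210:500, 100:580]

    # crop each field
    # only one range survives for time in the source; the other fields also slice columns 100:580
    time = pic[400:580]
    business = pic[370:400, 100:580]
    goods = pic[350:380, 100:580]
    money = pic[210:300, 100:580]
    num = pic[460:500, 100:580]

    # check whether the cropped parts are appropriate (for example with cv2.imshow)
    # field names used when saving the crops; only the last item survives in the source,
    # the rest is reconstructed from the field list above
    gener = ['time', 'business', 'goods', 'money', 'num']

    excel_data = {}
    pd_columns = ["a", "b", "c", "d", "e"]  # column titles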
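The crop coordinates above are tied to one receipt layout and to the image size after resizing. The following is a minimal sketch of keeping the boxes in one table and checking them against the image size before slicing. The names REGIONS and check_regions are illustrative and not part of the original code, and the time box is left out because its coordinates are not fully given above.

```python
# Crop boxes as (row_start, row_stop, col_start, col_stop), in pixels of the
# resized image. The values are copied from the slices above.
REGIONS = {
    "business": (370, 400, 100, 580),
    "goods": (350, 380, 100, 580),
    "money": (210, 300, 100, 580),
    "num": (460, 500, 100, 580),
}


def check_regions(regions, height, width):
    """Return the names of boxes that are empty or extend past the image."""
    bad = []
    for name, (r0, r1, c0, c1) in regions.items():
        inside = 0 <= r0 < r1 <= height and 0 <= c0 < c1 <= width
        if not inside:
            bad.append(name)
    return bad
```

With an OpenCV image, the size comes from `height, width = pic.shape[:2]`, and each valid box is cropped as `pic[r0:r1, c0:c1]`. NumPy slicing does not raise when a range runs past the edge of the array; it returns a smaller or empty crop, so an explicit check like this can catch a layout mismatch early.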

3. Define a function that saves the cropped pictures to a folder.

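    def shotcut_image(crops):
        # crops maps each name in gener to its cropped image,
        # e.g. {'time': time, 'business': business, ...}
        for name in gener:
            cv2.imwrite('image/{}.png'.format(name), crops[name])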
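As far as I know, cv2.imwrite does not create a missing folder; it reports failure through its return value instead. The sketch below uses only the standard library to build the output path and make sure the folder exists first. crop_path is a hypothetical helper, not part of the original code.

```python
from pathlib import Path


def crop_path(name, folder="image"):
    """Return folder/<name>.png as a string, creating the folder if needed."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / "{}.png".format(name))
```

It could be used as `ok = cv2.imwrite(crop_path('time'), time)`, followed by a check that `ok` is true before moving on to recognition.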

4. Call the Baidu API to perform character recognition

    # API credentials: replace these with those of your own Baidu AI application
    AppID = '24177719'
    API_Key = 'p8skmRYfHGoVGR4UU03Q5jiM'
    Secret_Key = 'dyM0tzSILBZu9CFqZ7IkjWwECGaws4xo'
    client = AipOcr(AppID, API_Key, Secret_Key)

    def get_words(img_name):
        # read the saved crop and send its bytes to the high-accuracy OCR interface
        with open('image/{}.png'.format(img_name), 'rb') as f:
            result = client.basicAccurate(f.read())
        return result
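The dictionary returned by the recognition call carries the text under 'words_result', one entry per recognized line, each with a 'words' string. When a request fails, Baidu's API responses generally carry 'error_code' and 'error_msg' keys instead. The sketch below pulls out the lines and surfaces such errors. extract_lines is a hypothetical helper, and the sample dictionaries are made up for illustration.

```python
def extract_lines(result):
    """Return the recognized text lines from an OCR response dictionary."""
    if "error_code" in result:
        raise RuntimeError(
            "OCR request failed: {} {}".format(
                result["error_code"], result.get("error_msg", "")
            )
        )
    return [item["words"] for item in result.get("words_result", [])]


# Made-up responses in the shape described above.
sample_ok = {
    "words_result": [{"words": "2021-05-01 12:30"}],
    "words_result_num": 1,
}
sample_error = {"error_code": 17, "error_msg": "Open api daily request limit reached"}
```

Calling `extract_lines(get_words('time'))` would then give a plain list of strings, or raise instead of silently passing an error dictionary further down the pipeline.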

5. Finally, convert the information into a DataFrame and write the data to Excel with the pandas to_excel method.

    from pandas import DataFrame

    def convert_to_dataframe(words, column='a'):
        # collect the recognized lines under one column of excel_data;
        # column is an added parameter and must be one of pd_columns,
        # otherwise the data is dropped when the DataFrame is built
        result = words['words_result']
        for word in result:
            excel_data.setdefault(column, []).append(word['words'])

    # after all the words have been read, write them to Excel
    def convert_to_excel():
        frame = DataFrame(excel_data, columns=pd_columns)
        # todo: the header requires extra processing, so no header is written here
        # .xlsx is used because recent pandas versions no longer write .xls
        frame.to_excel('out.xlsx', index=False, header=False)
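For the spreadsheet to be useful, each receipt should end up as one row with one cell per field. The following is a minimal sketch, in plain Python, of joining the recognized lines of each field into a single cell. FIELDS and build_row are illustrative names, and the sample responses are invented.

```python
FIELDS = ["time", "business", "goods", "money", "num"]


def build_row(results, fields=FIELDS):
    """Map each field name to the text recognized in its crop.

    results maps a field name to the OCR response for that crop.
    Several recognized lines are joined with a space, and a field
    without a response becomes an empty string.
    """
    row = {}
    for field in fields:
        lines = results.get(field, {}).get("words_result", [])
        row[field] = " ".join(item["words"] for item in lines)
    return row


# Invented responses for two of the five fields.
sample = {
    "business": {"words_result": [{"words": "Example Store"}]},
    "money": {"words_result": [{"words": "Total"}, {"words": "12.50"}]},
}
row = build_row(sample)
```

A list of such rows can be passed to pandas as `DataFrame(rows, columns=FIELDS)`, which gives every receipt one row and avoids columns of unequal length.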