How to use Python to realize data filtering and matching 07/08 Update SLTechnology News&Howtos

How to use Python to realize data filtering and matching

2025-07-08 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

This article is about how to use Python to filter and match data. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.

Data filtering requires us to filter out qualified data in the table.

Data matching requires us to match relevant data between multiple tables.

As before, to complete the code of the project problem, we need to first analyze the requirements of data filtering and data matching, and then find the corresponding knowledge points and determine the execution order of the code, so as to realize the project code.

Case 1: data filtering

This case requires us to screen out the information of those who are late and take a look at it in detail.

In the [October attendance Statistics. Xlsx] workbook, the late information of 100 employees of the company is saved, including the time of being late and the number of times.

According to the regulations of the company, employees who are late for more than 45 minutes and have been late for more than 3 times are counted as unqualified for attendance and need to deduct a 300 attendance deposit.

Former colleagues need to save the screened results as [October latecomer information. Xlsx] and report the sorted information to the leader.

So how do you implement this scenario in code?

Before we write code, we need to be clear about the task requirements.

According to the company's regulations, select the employee information in [October attendance Statistics .xlsx] who are late for more than 45 minutes and have been late more than 3 times. Print out the information of those who are late and then store them in a new workbook [October latecomer information .xlsx].

Code implementation:

From openpyxl import load_workbook Workbook # Open [October attendance statistics .xlsx] workbook wb = load_workbook ('. / material/ October attendance statistics .xlsx') # get the active worksheet ws = wb.activeprint (ws) print (ws [1]) print ('-') # get the header late_header = [] for cell in ws [1]: late_header.append (cell.value) print ( Cell.value) # New workbook new_wb = Workbook () # get the worksheet in the new workbook new_ws = new_wb.active # write the header to the worksheet in the new workbook new_ws.append (late_header) # traverse the table for row in ws.iter_rows from the second row (min_row=2 Values_only=True): # take out the name Late time and number of late times name = row [1] time = row [3] number = row [- 1] # determine whether or not to be late if time > 45 and number > 3: print ('{} late {} minutes Late {} times' .format (name, time, number) # write late information to the worksheet of the new workbook new_ws.append (row) # Save the new workbook as [October late information .xlsx] new_wb.save ('. / material/ October late information .xlsx')

Running result:

According to the requirements of the task, we need to obtain two parts of data: header data and all data except headers.

You may be wondering why you want to get the header data separately.

Because the task requires us to generate a new workbook [October latecomer information .xlsx], the header in the new workbook is the same as [October attendance statistics .xlsx], so we need to get the data in the header for later use.

Usage data

We need to implement the data filtering function in this step, and three filtering criteria can be summarized by analyzing the task requirements:

1) the late time is more than 45 minutes.

2) the number of lateness is more than 3 times.

3) meet the above two conditions at the same time.

After defining the filter conditions, we can filter the data with the help of the basic knowledge of Python, such as conditional judgment statements, comparison operators, member operators and logical operators.

If we use time to represent late time and number to represent late times, then the filter criteria can be written as follows:

If time > 45 and number > 3:

Data output

After the screening is completed, we need to output the filter results to the terminal or save the filter results according to the actual needs.

This task requires us to print out the screened employee information and store it in [October latecomer information .xlsx].

If you need to get data in a workbook that meets certain criteria, this scenario can be classified as a data filtering scenario.

When dealing with this scenario, you can follow the three steps of getting data, using data, and data output.

First of all, get the data, use the relevant knowledge of table reading and writing learned in the last lesson, and determine whether to obtain scattered cells, single row / single column, or multi-row / multi-column data according to the task requirements.

The key to data filtering lies in the word screening, and we can implement the filtering function in the step of using data.

In this step, we should carefully understand the task requirements, clarify the filter conditions, and then according to the actual situation, select the relevant knowledge of Python basic syntax (conditional judgment statement, comparison operator, member operator and logic operator) to construct the filter condition.

Finally, there is the data output part, which outputs the filter results according to the actual needs, or saves the filter results. To sum up, it can be divided into three categories:

1) store the filtering results in learned data structures, such as lists, tuples, or dictionaries.

2) save the filtering results to the file.

3) print the results of the filter.

Case 2: data matching

This case requires us to match the number of lateness specified in the two tables. Let's take a look at the case scenario first.

There are two forms, [October attendance Statistics. Xlsx], which record the number of lateness of employees in October, which is manually recorded by the company's administrative staff.

[monthly statistics on the number of lateness (updated in October). Xlsx] records the monthly data on the number of lateness of employees. This form is automatically generated by the company's attendance system.

The data in the two tables can be corresponded one by one through the work number.

Now it is necessary to check whether the times of being late in October match in the two tables (that is, whether the number of lateness of the same work number in October is the same in the two tables), and remind the relevant personnel at the terminal to check the mismatch.

Code implementation:

From openpyxl import load_workbook # Open workbook [October attendance Statistics .xlsx] Get active worksheet wb = load_workbook ('. / material/ October attendance statistics. Xlsx') ws = wb.active # create latecomer dictionary info_dict = {} # cycle to read tabular data for row in ws.iter_rows (min_row=2) except header Values_only=True): # take out the employee ID staff_id = row [0] # take out the number of late times staff_late = row [- 1] # add the information to the dictionary The dictionary format is {'employee ID': 'number of lateness'} info_ employees [staff _ id] = staff_late # Open the workbook [monthly statistics of lateness (updated in October) .xlsx] Get active worksheet monthly_wb = load_workbook ('. / material/ monthly statistics of lateness (updated in October). Xlsx') monthly_ws = monthly_wb.active # cycle to read out the tabular data outside the header for monthly_row in monthly_ws.iter_rows (min_row=3, max_col=13) Values_only=True): # take out the employee's work number member_id = monthly_row [0] # take out the number of lateness in November member_late = monthly_row [- 1] # match whether the number of lateness is equal if member_late! = info_ employee [member _ id]: print ('job number {} tardiness does not match Please update '.format (member_id) after checking)

Running result:

Why did you choose to store it in a dictionary?

Because the dictionary can well reflect the corresponding relationship between the work number and the number of late arrivals, that is, {'work number': 'late times'}.

Then match the number of lateness in [monthly Statistics of lateness (updated in October) .xlsx] with the number of lateness stored in the dictionary, and then determine whether the number of lateness corresponding to the same job number is the same.

Thank you for reading! On "how to use Python to achieve data filtering and matching" this article is shared here, I hope the above content can be of some help to you, so that you can learn more knowledge, if you think the article is good, you can share it out for more people to see it!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.