Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How does Python deal with Excel data

2025-01-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly introduces "how Python deals with Excel data". In daily operation, I believe many people have doubts about how Python deals with Excel data. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubts about "how Python handles Excel data". Next, please follow the editor to study!

I like to use actual cases to illustrate the problem. This article uses the passenger information table from the Titanic sinking:

Implement several simple analysis requirements:

Find out the number of groups with multiple people (2 or more) on board.

List the information about these people.

Are there the luckiest friends and relatives (many people boarded the ship and all rescued)?

Is "operation Excel" equal to "data processing"?

Beginners often mistakenly think that operating Excel is processing data, but they are actually two different things.

The requirement is: "the name and address column content is usually very long, and you want to fill it with a reduced font when the final Excel is displayed."

For this formatting setting, vba is definitely the best choice! Because we can get the approximate code automatically by recording macros.

By simply recording a macro, we can write the following implementation:

If we use python to implement the same requirements, there must be a lot of code and it will be difficult to debug.

If you read my column "take you to play with Python data processing-pandas", you will think that this is the process of "data presentation" in the section on data processing flow.

In other words, if your data task eventually needs to output Excel files, vba is the best tool for automating the "data presentation" process.

Unfortunately, most of the requirements in reality are not simple, they all need "data processing", so how to deal with data in vba?

Vba uses array + dictionary, which means high efficiency.

Most of the students who open their mouths against python without thinking are fascinated by their confidence in their vba "array + dictionary" skills.

Let's look at a processing requirement related to data analysis:

We have noticed that some people are friends and relatives on board, such as:

From the "ticket stub", we can see that they boarded the ship together.

From the same "address", we can see that they are a family of four.

We need to figure out how many groups of these two or more people are on board?

The following attempts to solve with vba:

In fact, there is not much code, and the skills are only basic, but if you can pandas, the students will definitely say, "it's too tedious."

Because for pandas, it is as follows:

There are only four sentences in the code, and the most important one is only three sentences, which indicate:

Load data

Count the number in groups by "ticket stub"

The sum of quantities greater than 1

Isn't this the processing thinking of a normal person? This is simplicity.

Can be similar to the semantic expression of requirements, the less redundant expression, the more concise.

Looking back at the expression of vba, there are many superfluous expressions.

Excel has a very useful statistical tool-PivotTable. You can try to get the PivotTable operation code by recording macros, but you will still find a lot of redundant expressions.

Sql's expression is more concise, but to achieve the above requirements, you will find that his expression order needs to be "bypassed".

Some unconvinced students will say, "I can only write this vba code for 1 minute, and I can get the correct result anyway."

When the requirements are constantly changing, you will find that such code will eventually lead to an unachievable dead end.

We just learned that there are 216 groups of relatives and friends who boarded the ship together, but there is only one number, and we would like to see what the 216 data looks like.

Which means outputting these passenger data.

Implementation of python:

By the way, it's easy to observe.

The extra expression of the code here is the lambda word. If it were sql, it would be very concise.

The implementation of vba is too troublesome, so leave it to those unconvinced vba fans.

Now you can probably understand why Python is so popular in the data world.

Data processing in data analysis requires your code to catch up with your thinking speed, which can only be done in a concise language.

In theory, sql should be a better choice, but in fact many complex requirements are implemented, and sql requires a large number of nested queries, which is not concise at all. We will give examples later.

The real demand is "operating Excel" + "data processing". What should we do?

Ideally, use vba to manipulate Excel and leave data processing to Python, and you need a bridge between vba and python, which is the best practice for xlwings or other similar libraries.

If you completely use xlwings to control Excel,Python code and manipulate Excel is very awkward to write, once you understand the difference between "manipulating Excel" and "data processing", you will naturally know how to organize your code.

Python needs to be installed separately, so it is not as good as vba

This is a boring conclusion, because any automation tool needs to be installed, such as learning the most cost-effective Sql, it also needs to install the corresponding driver to execute.

While part of my work environment needs to be performed on the server, it is impossible to install Excel,vba or use it, but I can't draw the conclusion that "vba is not as good as python".

If you can't install python in your work environment, but you need to do a lot of data processing tasks, then I can only say that unfortunately, you can only sacrifice a lot of your time to use vba to complete the requirements.

Can't Vba have pandas?

"after talking for a long time, isn't it just that python has one more pandas library than vba? just find a vba guy to write a vba-pandas!"

Vba inherently lacks a language feature that makes it impossible for you to play even if you have the ability to write libraries.

This is the ability to extract logic.

In general, if some data in a piece of code is not fixed, we can extract the parameters of the function, such as the simplest numerical calculation:

Define three parameters respectively for you to enter, but the calculation method is fixed.

He can do the same for vba, but if a piece of code is not fixed, can it be extracted as a parameter?

Python can do this, such as the following function, which allows you to enter three numbers, and it is up to you to decide how to calculate the first two numbers, and finally multiply with the third number:

As a result, the first number is multiplied by 10 + the second number times 100 (this is the logic of change, determined by the person who uses the function), and finally multiplied by the third number (this is fixed logic, determined by the person who defines the function).

Vba needs to use interfaces to achieve the same effect, which is very troublesome and does not have any practical value for data tasks.

What's the use of this? State with the last requirement:

Are there the luckiest friends and relatives (many people boarded the ship and all rescued)?

If I only give you a set of passenger data with the same "ticket stub", how can I tell that they are all alive?

Just see if the "Survival" column is all 1. There are many ways, in addition to the above, you can also "add the sum to see if it is equal to the number of rows recorded."

The filter used in the previous requirement can accept a piece of logic (function). Pandas is responsible for grouping you. You only need to describe the logic that meets the conditions in the function:

Pass the function just defined to the filter function

And vba can't do this, which means he can't make a library as good as pandas!

Because many seemingly complex processes are actually made up of a lot of fixed logic + changing logic.

For example, the principle of grouping is similar to using dictionaries in vba, which is relatively fixed and can be done by the library.

However, after grouping, the processing logic of each group is changed, which is temporarily decided by the users. for example, in the previous requirements grouping, we sometimes need to count and sometimes we need to filter. The logic of screening is also ever-changing.

At this point, the study on "how Python handles Excel data" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report