What are the eight points that are easy to forget when using Python to do data science?

Updated: 2025-03-29. From: SLTechnology News&Howtos (Shulou), Development

Shulou (Shulou.com), 06/02 report

This article covers eight things that are easy to forget when doing data science with Python. The editor found it practical and is sharing it here in the hope that you take something away from it. Let's get straight to it.

If you find yourself looking up the same question, concept, or syntax over and over again while programming, you are not alone!

While it's normal to look up answers on StackOverflow or other sites, it does take time, and it can make you wonder whether you fully understand the programming language.

We now live in a world with a seemingly unlimited supply of free resources, all just one search away. This is both a blessing and a curse. If you lean on these resources too heavily instead of using them effectively, you will develop bad habits that hold you back in the long run.

When I Google a question and find that someone asked the same thing, but there is only one answer below it and nothing new since 2003, I really feel for the asker. Weak, pitiful, and helpless!

"Who are you! Where are you! What did you finally find?"

Personally, I have caught myself copying code from similar questions and answers many times (as in the cartoon quoted above), rather than taking the time to learn and consolidate the concept so that I could write the code myself next time.

Searching the Internet for every answer is a lazy habit. It may be the easiest path in the short term, but it does not help you grow, and it reduces your productivity and your fluency with the syntax (ahem, which matters a lot in interviews).

Goal

Recently, I have been taking an online data science course on Udemy called Python for Data Science and Machine Learning. The early lectures reminded me of some concepts and syntax that I had been consistently overlooking when using Python for data analysis.

To consolidate my understanding of these concepts once and for all, and to save you a few StackOverflow searches, I have collected the things I always forget when working with Python, NumPy, and Pandas.

I provide a brief description and an example for each point. For readers' benefit, I have also added links to videos and other resources so that you can explore the concepts further.

Single-line List Comprehension

Writing a for loop every time you need to define a list is tedious, but Python has a built-in way to do it in a single line of code. The syntax may be a little hard to read at first, but once you are familiar with the technique, you will use it often.

* The last assignment in the example below is the single-line equivalent of the for loop.

Compare the for loop template you usually use when creating a list with the single-line version that builds the same list.

    x = [1, 2, 3, 4]
    out = []
    for item in x:
        out.append(item**2)
    print(out)
    # [1, 4, 9, 16]

    # vs.

    x = [1, 2, 3, 4]
    out = [item**2 for item in x]
    print(out)
    # [1, 4, 9, 16]
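A comprehension can also carry an optional filter clause after the loop. The following is a minimal sketch of that form; the names `numbers` and `even_squares` are chosen only for illustration and do not come from the original article.

```python
# Illustrative data; not from the original article.
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: square only the even values.
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Comprehension version: expression first, then the loop, then the filter.
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(even_squares)  # [4, 16, 36]
```

Reading the comprehension left to right as "what to produce, from where, under which condition" usually makes the syntax easier to remember.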

Lambda function

While programming, we often end up writing one small helper function after another on the way to the final result, and many of them are used only once or twice. That gets tedious. This is where lambda functions come to the rescue!

Lambda functions are used to create small, one-off, anonymous function objects in Python. Basically, they let you create a function "without creating a new function".

The basic syntax of a lambda function is as follows:

    lambda arguments: expression

So a lambda function can do anything a regular function can do, as long as the body can be written as a single expression. Take a look at the simple example below to get a feel for the power of lambda functions.

    double = lambda x: x * 2
    print(double(5))
    # 10
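Lambdas are most useful when passed directly to another function rather than assigned to a name. As a hedged illustration, the sketch below uses one as a sort key; the `records` data and the `score_of` helper are hypothetical and exist only for comparison.

```python
# Hypothetical data for illustration: (name, score) pairs.
records = [("ana", 82), ("bo", 95), ("cy", 67)]

# A lambda used once as a sort key, instead of defining a named helper.
by_score = sorted(records, key=lambda pair: pair[1], reverse=True)

# The equivalent named function, shown only for comparison.
def score_of(pair):
    return pair[1]

by_score_named = sorted(records, key=score_of, reverse=True)

print(by_score)                    # [('bo', 95), ('ana', 82), ('cy', 67)]
print(by_score == by_score_named)  # True
```

If the same logic is needed in more than one place, a named function is usually the clearer choice.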

Map and Filter

Once you have a grasp of lambda functions, learning to pair them with the map and filter functions gives you a powerful tool.

Specifically, map takes a list and transforms it into a new list by performing some operation on each element. In the following example, it goes through each element and maps the result of multiplying it by 2 to a new list. Note that in Python 3 map returns a lazy iterator, so the list function is used here simply to convert the output to a list.

    # Map
    seq = [1, 2, 3, 4, 5]
    result = list(map(lambda var: var * 2, seq))
    print(result)
    # [2, 4, 6, 8, 10]

The filter function takes a list and a rule, much like map, but it returns a subset of the original list by testing each element against a Boolean filtering rule.

    # Filter
    seq = [1, 2, 3, 4, 5]
    result = list(filter(lambda x: x > 2, seq))
    print(result)
    # [3, 4, 5]
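The two functions can also be chained. The sketch below is one way to do it, with the equivalent list comprehension shown for comparison; the variable names are illustrative.

```python
seq = [1, 2, 3, 4, 5]

# filter keeps the elements greater than 2; map then doubles each survivor.
# Both return lazy iterators in Python 3, so list() materializes the result.
chained = list(map(lambda v: v * 2, filter(lambda v: v > 2, seq)))

# The same result written as a list comprehension.
comprehension = [v * 2 for v in seq if v > 2]

print(chained)        # [6, 8, 10]
print(comprehension)  # [6, 8, 10]
```

Which form reads better is largely a matter of taste; the comprehension avoids the nested lambdas, while map and filter fit well when the functions already exist under a name.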

Arange and Linspace

For creating quick and simple NumPy arrays, look no further than the arange and linspace functions. Each has its own purpose, but the appeal here (compared with the built-in range) is that both output NumPy arrays, which are usually easier to work with in data science.

arange returns evenly spaced values within a given interval. Along with the start and stop values, you can also define a step size or data type if needed. Note that the stop value is a "cut-off" value, so it is not included in the output array.

    import numpy as np

    # np.arange(start, stop, step)
    np.arange(3, 7, 2)
    # array([3, 5])

linspace is very similar to arange, with a slight twist. It returns a specified number of evenly spaced values over a specified interval. So given a start and stop value plus the number of values you want, linspace spaces them out evenly in a NumPy array, and by default the stop value is included. This is especially helpful for data visualization and for declaring axes when plotting.

    # np.linspace(start, stop, num)
    np.linspace(2.0, 3.0, num=5)
    # array([2.  , 2.25, 2.5 , 2.75, 3.  ])
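The practical difference between the two is what you fix in advance: the step (arange) or the number of points (linspace). A small side-by-side sketch, with illustrative variable names:

```python
import numpy as np

# arange: fixed step, stop value excluded.
by_step = np.arange(0, 10, 2)

# linspace: fixed number of points, stop value included by default.
by_count = np.linspace(0, 10, num=6)

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 10, num=5, endpoint=False)

print(by_step.tolist())    # [0, 2, 4, 6, 8]
print(by_count.tolist())   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(half_open.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

With non-integer steps, linspace is generally the safer choice, because the number of elements arange produces can be affected by floating-point rounding.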

The true meaning of Axis

You may have run into this when dropping a column in Pandas or summing values in a NumPy matrix. If not, you surely will at some point. Let's look at the example of dropping a row or a column, where df is an existing DataFrame:

    df.drop('Row A', axis=0)
    df.drop('Column A', axis=1)

I don't know how many times I wrote this line of code before I actually knew why the axis is declared this way. As you can see above, to drop a column you set axis to 1, and to drop a row you set it to 0.

But why is that? My favorite explanation is this one:

    df.shape
    # (# of Rows, # of Columns)

Reading the shape attribute of a Pandas DataFrame gives back a tuple whose first value is the number of rows and whose second value is the number of columns. If you think about how that tuple is indexed in Python, rows sit at position 0 and columns at position 1, which matches how we set the axis value. Isn't that interesting!
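The same convention carries over to reductions in NumPy: the axis you name is the dimension that gets collapsed. The following sketch uses a small made-up matrix and made-up labels to show both cases.

```python
import numpy as np
import pandas as pd

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])  # shape (2, 3): 2 rows, 3 columns

# axis=0 collapses the row dimension, leaving one sum per column.
column_sums = matrix.sum(axis=0)
# axis=1 collapses the column dimension, leaving one sum per row.
row_sums = matrix.sum(axis=1)

print(matrix.shape)          # (2, 3)
print(column_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())     # [6, 15]

# In pandas drop: axis=0 targets row labels, axis=1 targets column labels.
df = pd.DataFrame(matrix, index=["Row A", "Row B"],
                  columns=["Column A", "Column B", "Column C"])
without_row = df.drop("Row A", axis=0)
without_column = df.drop("Column A", axis=1)

print(without_row.shape)     # (1, 3)
print(without_column.shape)  # (2, 2)
```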

Concat, Merge, and Join

If you are familiar with SQL, these concepts will probably come more easily to you. In any case, these functions are essentially ways to combine DataFrames in specific ways. It can be hard to keep track of which one is best to use when, so let's review them.

concat lets you append one or more DataFrames either below or next to another one (depending on how you set the axis).

merge combines two DataFrames on one or more shared columns that serve as the key.

join, much like merge, combines two DataFrames. However, it joins them on their indexes rather than on a specified column.

You can check out the helpful Pandas documentation for the syntax, concrete examples, and some special cases you may run into.
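As a rough illustration of the three functions side by side, the sketch below uses two small hypothetical tables (`left` and `right`, with a shared `key` column); the data and names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical tables for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: stack frames along an axis (axis=0 appends rows).
stacked = pd.concat([left, left], axis=0, ignore_index=True)

# merge: combine on a shared key column; the default is an inner join.
merged = pd.merge(left, right, on="key")

# join: combine on the index; the default is a left join.
joined = left.set_index("key").join(right.set_index("key"))

print(stacked.shape)           # (6, 2)
print(merged["key"].tolist())  # ['b', 'c']
print(joined.index.tolist())   # ['a', 'b', 'c']
```

Because join defaults to a left join, the row for key "a" is kept and its `y` value is missing, whereas the inner merge drops it.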

Pandas Apply

Think of apply as a map function, but made for Pandas DataFrames or, more specifically, for Series. If you're not familiar with them, a Series is very similar to a NumPy array.

apply sends a function to every element along a column or row, depending on what you specify. You can imagine how useful this is, especially for formatting and manipulating values across a whole DataFrame column without writing a loop.
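A short sketch of the element-wise, row-wise, and column-wise forms follows. The `price` and `qty` columns are hypothetical data invented for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"price": [10.0, 12.5, 8.0], "qty": [3, 2, 5]})

# Series.apply: call the function once per element of a column.
labels = df["price"].apply(lambda p: f"${p:.2f}")

# DataFrame.apply with axis=1: call the function once per row.
totals = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# DataFrame.apply with axis=0 (the default): call the function once per column.
ranges = df.apply(lambda col: col.max() - col.min())

print(labels.tolist())  # ['$10.00', '$12.50', '$8.00']
print(totals.tolist())  # [30.0, 25.0, 40.0]
print(ranges.tolist())  # [4.5, 3.0]
```

For plain arithmetic such as the row totals, the vectorized form `df["price"] * df["qty"]` gives the same values and is typically faster; apply is most useful when the logic cannot be expressed with built-in vectorized operations.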

Pivot Tables

The last thing I want to cover is pivot tables. If you are familiar with Microsoft Excel, you have probably heard of them. The built-in pivot_table function in Pandas creates a spreadsheet-style pivot table as a DataFrame. Note that the levels in the pivot table are stored in MultiIndex objects on the index and columns of the resulting DataFrame.
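A minimal sketch of pivot_table on a small hypothetical sales table is shown below; the column names `region`, `product`, and `units` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sales records for illustration.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "units":   [10, 4, 7, 3, 5],
})

# One row per region, one column per product, summing units in each cell.
table = pd.pivot_table(sales, values="units", index="region",
                       columns="product", aggfunc="sum")

print(table.loc["East", "A"])  # 15
print(table.loc["West", "B"])  # 3

# Passing several keys to index produces a MultiIndex on the rows.
nested = pd.pivot_table(sales, values="units",
                        index=["region", "product"], aggfunc="sum")
print(nested.index.nlevels)    # 2
```

When aggfunc is omitted, pivot_table averages the values in each cell, so it is worth stating the aggregation explicitly.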

Conclusion

That's it for my Python programming tips. I hope this walk through some of the important but somewhat tricky methods, functions, and concepts I keep running into when using Python for data science will help you.

I also benefited a lot from organizing this material and trying to explain it in simple terms.

Those are the eight points that are easy to forget when doing data science with Python. Some of them are likely to come up in your day-to-day work, and I hope this article helps you learn more about them.
