

What are the 10 simple techniques for accelerating data analysis using Python


In this issue, the editor brings you 10 simple techniques for accelerating data analysis with Python. The article is rich in content and written from a professional point of view. I hope you get something out of it after reading.

Tips and tricks, especially in programming, can be very useful. Sometimes a little trick saves both time and effort. A small shortcut or add-on can turn out to be a godsend and really boost productivity. Here are some of my favorite tips, compiled together in this article. Some may be familiar, some may be new, but I am sure they will come in handy the next time you work on a data analysis project.

1. Analyze pandas DataFrames

Profiling is a process that helps us understand our data, and pandas-profiling is a Python package that does exactly that. It is a simple and fast way to perform exploratory data analysis (EDA) on a pandas DataFrame. The df.describe() and df.info() functions are usually the first step of the EDA process, but they only give a very basic overview of the data, which is not very helpful for large datasets. The pandas-profiling package, on the other hand, extends pandas DataFrames with df.profile_report() for quick data analysis. It displays a lot of information with a single line of code, and also as an interactive HTML report.

For a given dataset, the pandas profiling package calculates the following statistics:

[Figure: the statistics calculated by the pandas-profiling package]

Installation
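
A minimal install sketch (the package was published as pandas-profiling at the time; a conda package exists as well):

    pip install pandas-profiling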

Usage

Let's use the venerable Titanic dataset to demonstrate the capabilities of this versatile Python profiler.
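
A sketch of the original (pre-2.0.0) usage; the CSV path is an assumption about a local copy of the Titanic data:

    import pandas as pd
    import pandas_profiling

    # load the Titanic training data (path is illustrative)
    df = pd.read_csv('titanic/train.csv')

    # generate the profiling report (pre-2.0.0 syntax)
    pandas_profiling.ProfileReport(df)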

Editor's note: a week after this article was released, pandas-profiling shipped a major upgrade, version 2.0.0. The syntax has changed somewhat: the functionality is now exposed directly on pandas DataFrames via df.profile_report(), and the report has become more comprehensive. Here is the newer usage syntax:

Usage

To display the report in Jupyter notebook, run the following code:
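
A one-line sketch, assuming pandas_profiling has been imported so that the DataFrame accessor is registered:

    # display the interactive profiling report inline in the notebook
    df.profile_report()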

You only need this line of code to display the data analysis report in a Jupyter notebook. The report is very detailed and includes the necessary charts.

You can also use the following code to output the report to an interactive HTML file.
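
A sketch assuming the 2.x API; the report title and output file name are illustrative:

    profile = df.profile_report(title='Pandas Profiling Report')
    profile.to_file('titanic_profiling_report.html')  # writes an interactive HTML file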

2. Bring interactivity to pandas plots

Pandas has a built-in .plot() function as part of the DataFrame class. However, the visualizations it renders are not interactive, which makes them less appealing. At the same time, the convenience of plotting with DataFrame.plot() is hard to give up. What if we could draw interactive, plotly-like charts with pandas without making major changes to our code? We can, using the Cufflinks library.

The Cufflinks library combines the power of plotly with the flexibility of pandas for easy plotting. Now let's see how to install the library and get it working with pandas.

Installation
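
Cufflinks sits on top of plotly, so a sketch of the install covers both:

    pip install plotly
    pip install cufflinks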

Usage

Time to see the magic unfold, again with the Titanic dataset.
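
A minimal setup sketch; the CSV path and the Fare column are assumptions about a local copy of the Titanic data:

    import pandas as pd
    import cufflinks as cf
    from plotly.offline import init_notebook_mode

    init_notebook_mode(connected=True)  # render plotly charts inside the notebook
    cf.go_offline()                     # use cufflinks in offline mode

    df = pd.read_csv('titanic/train.csv')  # illustrative path
    df['Fare'].iplot(kind='hist')          # interactive plotly histogram
    df['Fare'].plot(kind='hist')           # static matplotlib histogram, for comparison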

df.iplot() vs. df.plot()

The visualization on the right is a static chart, while the one on the left is interactive and more detailed, and neither requires any significant change to the syntax.

3. A little magic.

Magic commands are a set of convenience functions in Jupyter Notebook designed to solve some of the common problems in standard data analysis. You can see all available magic commands with the %lsmagic command.

[Figure: a list of all available magic functions]

There are two kinds of magic commands: line magics and cell magics. The former are prefixed with a single % character and operate on a single line of input, while the latter are prefixed with %% and operate on multiple lines of input. If automagic is enabled (set to 1), you can call line magics without typing the initial %.

Let's take a look at some magic functions that might be useful in common data analysis tasks:

%pastebin

%pastebin uploads code to Pastebin and returns its URL. Pastebin is an online content-hosting service where we can store plain text, such as source code snippets, and then share the URL with others. GitHub gists are similar to Pastebin, although with version control.

Let's assume there is a Python script file.py with the following contents:
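
The original contents are not reproduced here; a trivial stand-in:

    # file.py -- contents are illustrative
    def foo(x):
        return x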

Using %pastebin in Jupyter Notebook generates a Pastebin URL for it.
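
A sketch of the call; %pastebin resolves its argument through IPython's user-code lookup, which, as far as I know, also accepts file names:

    %pastebin file.py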

%matplotlib notebook

The %matplotlib inline function renders static matplotlib plots in a Jupyter notebook. Try replacing inline with notebook and you get zoomable, resizable plots with no extra effort. Be sure to call the magic before importing the matplotlib library.
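
A small sketch; the plotted values are arbitrary:

    %matplotlib notebook
    import matplotlib.pyplot as plt

    # an interactive, resizable figure instead of a static inline image
    plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
    plt.show()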

%matplotlib inline vs. %matplotlib notebook

%run

The %run magic runs a Python script inside a notebook.
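
For example, to execute the file.py script mentioned above in the notebook's namespace:

    %run file.py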

%%writefile

%%writefile writes the contents of the cell to a file. Here the cell's code is written to a file called foo.py and saved in the current directory.
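
A sketch; the function below is only an illustration of cell contents:

    %%writefile foo.py
    # everything in this cell is written to foo.py in the current directory
    def greet(name):
        return f"Hello, {name}!"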

%%latex

The %%latex cell magic renders the contents of the cell as LaTeX. It is useful for writing mathematical formulas and equations in a cell.
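
A small sketch with an arbitrary formula (the sample mean):

    %%latex
    $$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$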

4. Find and eliminate errors

The interactive debugger is also a magic function, but I have given it a category of its own. If an exception is raised while running a code cell, type %debug in a new cell and run it. This opens an interactive debugging environment at the point where the exception occurred. You can inspect the values of variables assigned in the program and perform operations there. To exit the debugger, press q.
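
A sketch of the workflow, with a deliberately failing cell:

    # Cell 1: running this raises a ZeroDivisionError
    def divide(a, b):
        return a / b

    divide(1, 0)

    # Cell 2: run the %debug line magic afterwards to open the debugger
    # at the frame where the exception was raised; press q to quit
    # %debug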

5. Printout can also be beautiful.

If you want a beautiful representation of your data structures, pprint is the module of choice. It is especially useful when printing dictionaries or JSON data. Let's look at an example that displays the output using both print and pprint.
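
A sketch with a made-up nested dictionary:

    from pprint import pprint

    employee = {
        'name': 'Ada',
        'skills': ['python', 'sql', 'statistics'],
        'projects': {'2023': ['churn model'], '2024': ['forecasting', 'dashboard']},
    }

    print(employee)    # everything on one long line
    pprint(employee)   # indented, one key per line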

6. Make notes stand out

We can use alert/note boxes in Jupyter Notebook to highlight something important or anything that needs to stand out. The color of the box depends on the alert type you specify. Just add any or all of the following markup to a Markdown cell that needs highlighting (see the sketch after this list).

Blue alert box: info

Yellow alert box: warning

Green alert box: success

Red alert box: danger
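
A sketch of the markup, using the standard Bootstrap alert classes styled by the classic notebook; place it in a Markdown cell:

    <div class="alert alert-block alert-info">
      <b>Tip:</b> blue boxes are for informational notes.
    </div>

    <div class="alert alert-block alert-warning">
      <b>Warning:</b> yellow boxes flag things to watch out for.
    </div>

    <div class="alert alert-block alert-success">
      <b>Success:</b> green boxes mark things that went well.
    </div>

    <div class="alert alert-block alert-danger">
      <b>Danger:</b> red boxes highlight critical issues.
    </div>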

7. Print all the output in a cell

Suppose you have a Jupyter Notebook cell with the following lines of code:
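
For instance (the expressions are arbitrary stand-ins):

    # three expressions in one cell; by default only the last result is shown
    10 + 5
    11 + 6
    12 + 7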

By default the cell displays only its last output; for the others we would have to add print() calls. In fact, we can display all the outputs simply by adding the following snippet at the top of the notebook.
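
A sketch of the setting (this is IPython's ast_node_interactivity option):

    from IPython.core.interactiveshell import InteractiveShell

    # display the result of every top-level expression, not just the last one
    InteractiveShell.ast_node_interactivity = "all"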

Now all the output is printed out one by one.

To restore the original setting:
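
Assuming the same InteractiveShell import as above:

    # revert to the default: only the last expression's result is displayed
    InteractiveShell.ast_node_interactivity = "last_expr"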

8. Run Python scripts with the -i option

A typical way to run a Python script from the command line is python hello.py. However, running the same script with an additional -i, as in python -i hello.py, offers more advantages. Let's see how.

First, once the program finishes, Python does not exit the interpreter, so we can inspect the values of variables and check that the functions defined in the program behave correctly.

Second, because we are still in the interpreter, we can easily invoke the Python debugger with the following code:
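
A sketch of the post-mortem call (pdb.pm() re-enters the most recent traceback):

    import pdb

    # jump to the frame where the last unhandled exception was raised
    pdb.pm()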

This will take us to where the exception occurred, and then we can deal with the code.

The original source of this technique: http://www.bnikolic.co.uk/blog/python-running-cline.html

9. Comment the code automatically

Ctrl/Cmd + / comments out the selected lines in a cell; pressing the combination again uncomments them.

10. To delete is human, to restore divine

Have you ever accidentally deleted a cell in Jupyter Notebook? If so, there is a shortcut to undo this deletion.

If you have deleted the contents of a cell, press Ctrl/Cmd + Z to restore them.

If you need to restore an entirely deleted cell, press Esc + Z or use Edit > Undo Delete Cells.

Conclusion

I have listed the main techniques I have collected while working with Python and Jupyter Notebook.

These are the 10 simple techniques for accelerating data analysis with Python shared by the editor. If you have run into similar questions, the analysis above should help clarify them. If you want to learn more, you are welcome to follow the industry information channel.
