2025-02-24 Update, From: SLTechnology News&Howtos (shulou)
In this issue, the editor brings you a set of tips for accelerating Python data analysis, written from a practical point of view. I hope you get something out of this article.
Sometimes a little hack can save a lot of time. A small shortcut or add-on can turn out to be a godsend and really boost productivity. These are some of my favorite tricks, compiled together in this article. Some of them may be well known and some may be new, but I'm sure they will come in handy the next time you work on a data analysis project.
1. Profiling the pandas dataframe
Profiling is the process of getting to know our data, and pandas-profiling is a python package that does exactly that. It is a simple and fast way to perform exploratory data analysis (EDA) on a pandas DataFrame. The pandas df.describe() and df.info() functions are usually the first step in the EDA process. However, they only give a very basic overview of the data and are not much help on large datasets. Pandas-profiling, on the other hand, extends the pandas DataFrame with a df.profile_report() method for quick data analysis. With a single line of code it displays a large amount of information, as well as an interactive HTML report. For a given dataset, the pandas-profiling package computes the following statistics:
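For comparison, here is the kind of basic overview that df.describe() and df.info() give, sketched on a tiny made-up frame (the values are illustrative, not the real Titanic data); the profiling report goes far beyond this:

```python
import pandas as pd

# A tiny stand-in for the Titanic data (values are made up for illustration)
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# The usual first EDA steps -- a very basic overview only:
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column
df.info()             # column dtypes and non-null counts
```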
(Image: statistics computed by the pandas-profiling package)
Installation
pip install pandas-profiling
or
conda install -c anaconda pandas-profiling
Use
Let's use the old Titanic dataset to demonstrate the functionality of the generic Python parser.
# importing the necessary packages
import pandas as pd
import pandas_profiling  # deprecated: pre-2.0.0 syntax

df = pd.read_csv('titanic/train.csv')
pandas_profiling.ProfileReport(df)
Note: a week after this article was published, pandas-profiling released an upgraded version, 2.0.0. The syntax has changed somewhat; in fact, the functionality is now available directly on the pandas DataFrame, and the report has become more comprehensive. Here is the latest syntax:
Use
To display the report in Jupyter notebook, run:
# Pandas-Profiling 2.0.0
df.profile_report()
This single line of code is all that is needed to display the data analysis report in a Jupyter notebook. The report is very detailed and includes charts where necessary.
You can also export the report to an interactive HTML file with the following code.
profile = df.profile_report(title='Pandas Profiling Report')
profile.to_file(output_file="Titanic data profiling.html")

2. Bring interactivity to pandas plots
Pandas has a built-in .plot() function as part of the DataFrame class. However, the visualizations it renders are not interactive, which makes them less attractive. On the other hand, the sheer convenience of plotting with pandas.DataFrame.plot() is hard to give up. What if we could draw interactive, plotly-style charts without making major changes to our code? You can do exactly that with the help of the Cufflinks library. Cufflinks combines the power of plotly with the flexibility of pandas for easy plotting. Let's see how to install the library and get it working with pandas.
Installation
pip install plotly  # plotly is a prerequisite for installing cufflinks
pip install cufflinks
Use
# importing pandas
import pandas as pd
# importing plotly and cufflinks in offline mode
import cufflinks as cf
import plotly.offline
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
It's time to show off its magic using the Titanic dataset.
df.iplot()

df.iplot() vs df.plot()
The view on the right is the static chart, while the one on the left is interactive and more detailed, and all this without any major change in syntax.
3. A little bit of Magic
Magic commands are a set of convenient functions in Jupyter notebook, designed to solve some of the common problems in standard data analysis. You can see all available magics with the help of %lsmagic.
(Image: list of all available magic functions)
There are two kinds of magic commands: line magics (prefixed with a single % character, operating on one line of input) and cell magics (prefixed with %%, operating on multiple lines of input). If the automagic setting is on (which it is by default, and can be toggled with %automagic), magic functions can even be called without typing the initial %.
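As a sketch of the difference, using %timeit (one of the magics from the list above) in two separate notebook cells. First, a line magic, with a single % acting on one line of input:

```python
%timeit sum(range(1000))
```

And a cell magic, with a double %% that must be the first line of its cell and times the whole cell:

```python
%%timeit
total = 0
for i in range(1000):
    total += i
```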
Let's look at some of these features that might be useful in common data analysis tasks:
%pastebin
%pastebin uploads code to Pastebin and returns the URL. Pastebin is an online content-hosting service where we can store plain text (such as source-code snippets) and then share the URL with others. GitHub gists are, in fact, similar to Pastebin, though with version control.
Consider a python script file.py that contains the following:
# file.py
def foo(x):
    return x
Generating a pastebin URL with %pastebin in Jupyter notebook:
%matplotlib notebook
The %matplotlib inline magic renders static matplotlib plots in the Jupyter notebook. Try replacing inline with notebook to get zoomable, resizable plots instead. Make sure the magic is called before importing the matplotlib library.
(Image: %matplotlib inline vs %matplotlib notebook)
%run

The %run magic runs a python script inside the notebook.

%run file.py
%%writefile

%%writefile writes the contents of the cell to a file. Here, the code will be written to a file called foo and saved in the current directory.
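A minimal sketch of such a cell (the filename foo follows the text above; the function body is illustrative):

```python
%%writefile foo
def foo(x):
    return x
```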
%%latex

The %%latex magic renders the contents of the cell as LaTeX. It is useful for writing mathematical formulas and equations in a cell.
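For example, a cell like the following renders as a typeset equation (the quadratic formula here is just a placeholder):

```latex
%%latex
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```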
4. Find and eliminate errors
The interactive debugger is also a magic function, but I have given it a category of its own. If you hit an exception while running a cell, type %debug on a new line and run it. This opens an interactive debugging environment that takes you to the position where the exception occurred. You can inspect the values of variables assigned in the program and perform operations there. To exit the debugger, type q.
5. The output can also be beautiful.
If you want to produce aesthetically pleasing representations of your data structures, pprint is the module you want. It is especially useful when printing dictionaries or JSON data. Let's look at an example that displays the same output using both print and pprint.
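A minimal sketch of that comparison (the nested dictionary is made up for illustration):

```python
from pprint import pprint

# A nested structure that is hard to read with a plain print()
data = {
    "name": "Titanic",
    "columns": ["PassengerId", "Survived", "Pclass", "Name"],
    "stats": {"rows": 891, "missing": {"Age": 177, "Cabin": 687}},
}

print(data)             # everything on one long line
pprint(data, width=40)  # nicely indented, one key per line
```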
6. Highlight with alert boxes

We can use alert/note boxes in a Jupyter notebook to highlight important content or anything that needs to stand out. The color of the note depends on the type of alert. Just add any of the following to a cell that needs highlighting.
Blue alert box: info

Tip: use blue boxes (alert-info) for tips and notes. If it's a note, you don't have to include the word "Note".

Yellow alert box: warning

Example: yellow boxes are generally used to include additional examples or mathematical formulas.

Green alert box: success

Use green boxes only when necessary, for example to display links to related content.

Red alert box: danger

It is best to avoid red boxes, but they can be used, for example, to warn users not to delete an important part of the code.
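These boxes are plain HTML divs with Bootstrap alert classes, which Jupyter renders inside Markdown cells. A sketch of all four (the wording is illustrative):

```html
<div class="alert alert-block alert-info">
  <b>Tip:</b> blue boxes are for tips and notes.
</div>
<div class="alert alert-block alert-warning">
  <b>Example:</b> yellow boxes hold extra examples or formulas.
</div>
<div class="alert alert-block alert-success">
  <b>Success:</b> green boxes can point to related content.
</div>
<div class="alert alert-block alert-danger">
  <b>Danger:</b> red boxes warn against destructive actions.
</div>
```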
7. Print all the output of the cell
Consider a Jupyter notebook cell containing the following lines of code:
In [1]: 10+5
        11+6
Out[1]: 17
Usually, only the last output in the cell is printed; for the other outputs we would need to add print() calls. Well, in fact, we just need to add the following snippet at the top of the notebook to print all the outputs.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Now all the output is printed out one by one.
In [1]: 10+5
        11+6
        12+7
Out[1]: 15
Out[1]: 17
Out[1]: 19
To revert to the original settings:
InteractiveShell.ast_node_interactivity = "last_expr"

8. Run python scripts with the -i option
A typical way to run a python script from the command line is python hello.py. However, running the same script with an extra -i, as in python -i hello.py, offers more advantages. Let's see how. First, the python interpreter does not exit once the program ends. Therefore, we can inspect the values of variables and check the correctness of the functions defined in the program.
Second, since we are still inside the interpreter, we can easily invoke the python debugger:

import pdb
pdb.pm()

This takes us to the position where the exception occurred, and we can then work on the code there.
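A hypothetical session might look like this (hello.py and the error shown are made up for illustration):

```
$ python -i hello.py
Traceback (most recent call last):
  File "hello.py", line 3, in <module>
    result = 1 / 0
ZeroDivisionError: division by zero
>>> import pdb
>>> pdb.pm()   # drops into the post-mortem debugger at the failing line
```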
9. Comment out code automatically

Ctrl/Cmd + / comments out the selected lines in a cell; pressing the combination again uncomments the same lines.
10. It is easy to delete and difficult to restore
Have you ever accidentally deleted a cell in a Jupyter notebook? If so, there are shortcuts to undo the deletion. If you deleted the contents of a cell, you can easily recover them by pressing Ctrl/Cmd + Z. If you need to recover an entire deleted cell, press Esc + Z or use Edit > Undo Delete Cells.
Conclusion: those are the tips for accelerating Python data analysis. I hope the walkthrough above clears up any similar questions you may have had, and that some of these shortcuts find their way into your next project.