2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article collects practical tips for speeding up data analysis in Python. Many people run into questions about this in their day-to-day work, so the editor has consulted a range of materials and gathered methods that are simple and easy to use. We hope they help. Let's go through them.
1. Profile a Pandas DataFrame
Profiling is a process that helps us understand our data, and Pandas Profiling is a Python package that does exactly this. It is a simple and fast way to perform exploratory data analysis (EDA) on a Pandas DataFrame. The df.describe() and df.info() functions in Pandas are usually the first step of the EDA process, but they give only a very basic overview of the data and are of little help with large datasets. The Pandas Profiling function, on the other hand, displays a lot of information with a single line of code, and it can also produce an interactive HTML report.
For a given dataset, the Pandas Profiling package calculates a range of summary statistics for each column.
[Figure: Statistics calculated by the pandas-profiling package]
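To make the idea of per-column statistics concrete, here is a minimal hand-rolled sketch using only the standard library. It is not pandas-profiling itself and does not reproduce its report; it only assumes that a profile contains summaries such as missing counts, distinct counts and basic numeric statistics. The helper name profile_columns and the sample rows are made up for illustration and are not taken from the Titanic dataset.

```python
import csv
import io
import statistics

# Made-up sample rows for illustration only (not real Titanic data).
SAMPLE_CSV = """name,age,fare
Ann,22,7.25
Bob,,71.25
Cy,26,8.0
Dee,36,
"""


def profile_columns(text):
    """Return a few summary statistics per column of a CSV string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [value for value in values if value != ""]
        summary = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Treat the column as numeric only if every present value parses.
        try:
            numbers = [float(value) for value in present]
        except ValueError:
            numbers = None
        if numbers:
            summary["mean"] = statistics.mean(numbers)
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
        report[column] = summary
    return report


report = profile_columns(SAMPLE_CSV)
for column, summary in report.items():
    print(column, summary)
```

A profiling report automates this kind of bookkeeping for every column and adds charts on top, which is why it saves time on larger datasets.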
Code example:
Installation
Install the pandas-profiling package with pip or conda:
pip install pandas-profiling
or
conda install -c anaconda pandas-profiling
Use
Now let's use the classic Titanic dataset to demonstrate what this versatile Python profiler can do:
# import the necessary packages
import pandas as pd
import pandas_profiling  # the newly installed pandas-profiling package

df = pd.read_csv('titanic/train.csv')  # read the data into a DataFrame
pandas_profiling.ProfileReport(df)  # use pandas-profiling to analyze the data
The final ProfileReport line is all you need to generate a data profiling report in a Jupyter notebook. The report is very detailed and includes charts wherever they are needed.
The report can also be exported as an interactive HTML file with the following code:
profile = pandas_profiling.ProfileReport(df)
profile.to_file(outputfile="Titanic data profiling.html")  # writes the report to "Titanic data profiling.html"
2. Interactive Pandas plots
Pandas has a built-in .plot() function as part of the DataFrame class. The visualizations it renders are not interactive, which makes it less appealing. On the other hand, drawing a chart with pandas.DataFrame.plot() is very easy. What if we want to draw interactive charts with pandas without making major changes to the code? This is where the Cufflinks library comes in.
The Cufflinks library binds the power of plotly to the flexibility and ease of use of pandas, which makes plotting easy. Let's see how to install the library and use it with pandas.
Code example:
Installation
Use pip to install plotly and cufflinks:
pip install plotly  # plotly is a prerequisite and must be installed before cufflinks
pip install cufflinks
Use
Set things up as follows:
# import pandas
import pandas as pd
# import plotly and cufflinks in offline mode
import cufflinks as cf
import plotly.offline
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
Now let's see the magic at work on the Titanic dataset:
df.iplot()
[Figure: df.iplot() vs. df.plot()]
In the comparison, the chart on the right is the static plot, while the one on the left is interactive and more detailed, with no significant change in the code.
3. A little magic.
Magic commands are a set of convenient functions in Jupyter Notebook designed to solve some common problems in data analysis. You can list all available magic commands with %lsmagic.
[Figure: list of all available magic functions]
There are two kinds of magic commands. Line magics are prefixed with a single % character and operate on a single line of input. Cell magics are prefixed with a double %% and operate on multiple lines of input. If automagic is set to 1, line magics can be called without typing the leading %.
Let's take a look at some commands that might be used in common data analysis tasks.
%pastebin
%pastebin uploads code to Pastebin and returns the URL. Pastebin is an online content hosting service where we can store plain text, such as source code snippets, and share the resulting link with others. GitHub Gist is similar to Pastebin, except that it comes with version control.
Code example:
Consider a Python file, file.py, with the following contents:
# file.py
def foo(x):
    return x
Running %pastebin on this file in a Jupyter Notebook generates a Pastebin URL.
%matplotlib notebook
The %matplotlib inline magic renders static matplotlib plots inside a Jupyter notebook. Replacing inline with notebook gives plots that can be zoomed and resized easily. Be sure to call the magic before importing the matplotlib library.
[Figure: %matplotlib inline vs. %matplotlib notebook]
%run
The %run magic runs a Python script inside a Jupyter notebook.
%%writefile
%%writefile writes the contents of a cell to a file. For example, a cell that begins with %%writefile foo.py is saved as foo.py in the current directory.
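Magic commands only work inside IPython or Jupyter, so they cannot be shown as a plain Python script. As a rough analogue, the sketch below uses the standard library to do what %%writefile foo.py followed by %run foo.py would do in a notebook: save some source to a file, execute it, and look at the names it defined. The file contents are an assumption for illustration, and the file is written to a temporary directory rather than the current one.

```python
import runpy
import tempfile
from pathlib import Path

# Illustrative script contents (an assumption, not from the original article).
SOURCE = """def foo(x):
    return x

result = foo(42)
"""

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "foo.py"
    # Roughly what a cell starting with %%writefile foo.py does.
    script.write_text(SOURCE, encoding="utf-8")
    # Roughly what %run foo.py does: execute the script and
    # return the global names it defined.
    namespace = runpy.run_path(str(script))

print(namespace["result"])
```

In a notebook the two magics are shorter and keep the file in the working directory, which is usually what you want when moving code out of a notebook into a script.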
%%latex
The %%latex magic renders the contents of a cell as LaTeX. It is useful for writing mathematical formulas and equations in a cell.
4. Finding and eliminating bugs
The interactive debugger is also a magic function, but it deserves a category of its own. If an exception is raised while running a cell, type %debug on a new line and run it. This opens an interactive debugging environment that takes you to the point where the exception occurred. You can also inspect the values of the variables assigned in the program and run operations there. Press q to exit the debugger.
5. Output can be pretty too
If you want a readable representation of your data structures, pprint is the module to use. It is especially useful for printing dictionaries or JSON data: print puts everything on one line, while pprint wraps long nested structures across several lines.
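A small comparison makes the difference between print and pprint visible. The nested dictionary below is a made-up record used only for illustration.

```python
from pprint import pformat, pprint

# Made-up nested record, wide enough that it does not fit on one 80-character line.
passenger = {
    "name": "Example Passenger",
    "ticket": {"class": 3, "fare": 7.25, "embarked": "Southampton"},
    "family": {"siblings_or_spouses": 1, "parents_or_children": 0},
    "tags": ["training-set", "adult", "travelling-with-family"],
}

# print() writes the whole structure on a single long line.
print(passenger)

# pprint() wraps it over several lines because it exceeds the default width.
pprint(passenger)

# pformat() returns the same pretty text as a string, e.g. for logging.
formatted = pformat(passenger)
```

pprint only wraps structures that do not fit within its width (80 characters by default), so short dictionaries look the same with both functions.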
6. Make notes stand out
You can use alert (note) boxes in your Jupyter Notebook to highlight anything important. The color of the box depends on the alert type you specify. Add any of the following snippets to a cell that needs highlighting.
Blue alert box: info
Code example:
<div class="alert alert-block alert-info">
<b>Tip:</b> Use blue boxes (alert-info) for tips and notes. If it's a note, you don't have to include the word "Note".
</div>
Yellow alert box: warning
Code example:
<div class="alert alert-block alert-warning">
<b>Example:</b> Yellow boxes are generally used to include additional examples or mathematical formulas.
</div>
Green alert box: success
Code example:
<div class="alert alert-block alert-success">
Use green boxes only when necessary, for example to display links to related content.
</div>
Red alert box: danger
Code example:
<div class="alert alert-block alert-danger">
It is good to avoid red boxes, but they can be used to alert users not to delete some important part of the code, etc.
</div>
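If you produce many such boxes, the markup can be generated instead of typed. The sketch below is one possible helper, assuming the standard Bootstrap alert classes (alert-info, alert-warning, alert-success, alert-danger) that the notebook styles by color. The function name alert_box is hypothetical and not part of Jupyter.

```python
from html import escape

# Assumed mapping from alert type to Bootstrap class name.
ALERT_CLASSES = {
    "info": "alert-info",        # blue
    "warning": "alert-warning",  # yellow
    "success": "alert-success",  # green
    "danger": "alert-danger",    # red
}


def alert_box(kind, text):
    """Return the HTML for a colored alert box (hypothetical helper)."""
    if kind not in ALERT_CLASSES:
        raise ValueError(f"unknown alert type: {kind!r}")
    css_class = ALERT_CLASSES[kind]
    # escape() keeps characters such as < and & from breaking the markup.
    return f'<div class="alert alert-block {css_class}">\n{escape(text)}\n</div>'


print(alert_box("info", "Use blue boxes for tips and notes."))
print(alert_box("danger", "Do not delete the <setup> cell."))
```

The returned string can be pasted into a Markdown cell. Inside a notebook it could also be displayed from code with IPython.display.HTML, assuming IPython is available.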
7. Print all the outputs of a cell
Consider a Jupyter Notebook cell that contains the following lines of code:
In [1]: 10+5
        11+6
Out[1]: 17
Normally a cell displays only the result of its last line; for the other results we have to add print() calls. It turns out that we can display the result of every line by adding the following code at the top of the notebook:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Now all the results are displayed one after another:
In [1]: 10+5
        11+6
        12+7
Out[1]: 15
Out[1]: 17
Out[1]: 19
To revert to the initial settings:
InteractiveShell.ast_node_interactivity = "last_expr"
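The name ast_node_interactivity hints at how the setting works: a cell is parsed into syntax-tree nodes, and the setting decides which expression nodes have their values displayed. The sketch below imitates that idea with the standard ast module. It is a conceptual illustration under that assumption, not IPython's actual implementation, and evaluate_cell is a hypothetical helper.

```python
import ast


def evaluate_cell(source, interactivity="last_expr"):
    """Run a cell's source and return the values that would be displayed."""
    tree = ast.parse(source)
    namespace = {}
    shown = []
    last_index = len(tree.body) - 1
    for index, node in enumerate(tree.body):
        if isinstance(node, ast.Expr):
            # A bare expression: evaluate it and decide whether to show it.
            code = compile(ast.Expression(body=node.value), "<cell>", "eval")
            value = eval(code, namespace)
            if interactivity == "all" or index == last_index:
                shown.append(value)
        else:
            # Any other statement (assignment, def, ...) is just executed.
            module = ast.Module(body=[node], type_ignores=[])
            exec(compile(module, "<cell>", "exec"), namespace)
    return shown


cell = "10+5\n11+6\n12+7"
print(evaluate_cell(cell, "all"))        # every expression result
print(evaluate_cell(cell, "last_expr"))  # only the last one
```

With "all" the helper returns the three results 15, 17 and 19, while "last_expr" returns only 19, mirroring the two notebook outputs shown above.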
8. Run Python scripts with the -i option
The typical way to run a Python script from the command line is python hello.py. If you add -i when running the same script, as in python -i hello.py, you get a couple of extra benefits:
First, Python does not exit the interpreter once the program has finished. We can therefore check the values of variables and the correctness of the functions defined in the program.
Second, because we are still in the interpreter, we can easily invoke the Python debugger:
import pdb
pdb.pm()
This takes us to the place in the code where the exception occurred, and we can then work on the code from there.
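pdb.pm() is interactive, so it is hard to show in a static listing. The sketch below does not start a debugger; it walks a traceback by hand to show the information a post-mortem session starts from, namely the innermost frame where the exception was raised and the local variables in it. The average function is a made-up example.

```python
def average(values):
    # Raises ZeroDivisionError when values is empty.
    return sum(values) / len(values)


try:
    average([])
except ZeroDivisionError as exc:
    # Follow the traceback to its innermost entry, the frame that failed.
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)

print("exception raised in:", failing_function)
print("local variables there:", failing_locals)
```

In an interactive session started with python -i, the debugger gives you the same frame but lets you explore it with commands instead of code.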
9. Comment out code automatically
Pressing Ctrl/Cmd + / automatically comments out the selected lines in a cell. Pressing the combination again uncomments the same lines.
10. Deleting is easy, restoring is harder
Have you ever accidentally deleted a cell in Jupyter Notebook? If so, there are shortcuts to undo the deletion.
If you have deleted the contents of a cell by mistake, you can restore them by pressing CTRL/CMD+Z.
If you need to recover an entire deleted cell, press ESC+Z or choose Edit > Undo Delete Cells.
That concludes this collection of tips for speeding up data analysis in Python. We hope it has answered your questions. Theory is best absorbed together with practice, so go and try these tips out.