How to Use pandas_profiling for Exploratory Data Analysis

2025-02-23 Update. From: SLTechnology News&Howtos (shulou)

This article explains in detail how to use pandas_profiling for exploratory data analysis (EDA), in the hope of giving readers who face this task a simpler and more practical approach.

I recently came across pandas_profiling, a package that quickly turns a pandas DataFrame into a descriptive data analysis report. One line of code generates a rich EDA report, and a second line saves it as an .html file. I started out in data analysis myself, so I know how convenient a tool like this is for analysts, and I would like to share it here.

We take the census dataset adult.data from the UCI Machine Learning Repository as an example.

Dataset address:

https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

Normally, these are the calls we reach for when we first get a dataset for EDA:

See what the data looks like:

import numpy as np
import pandas as pd

# adult.data has no header row, so tell pandas not to treat the first record as one
adult = pd.read_csv('../adult.data', header=None)
adult.head()

Make a statistical description of the data:

adult.describe()

View variable information and missing information:

adult.info()

This is the easiest and fastest way to get to know a dataset. Deeper EDA, of course, relies on statistical graphics. The plotting work based on tools such as scipy, matplotlib and seaborn is skipped here.
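The three calls above can be folded into one small per-column summary. The following is only a minimal sketch: the helper name quick_overview is hypothetical (it is not part of pandas or pandas_profiling), and the toy frame is made-up data standing in for a couple of census columns.

```python
import numpy as np
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, missing share and distinct count.

    Hypothetical helper for illustration only.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "pct_missing": df.isna().mean() * 100,
            "n_unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Toy data standing in for a few census columns (values are made up).
toy = pd.DataFrame(
    {
        "age": [39, 50, 38, np.nan],
        "workclass": ["State-gov", "Private", None, "Private"],
    }
)
overview = quick_overview(toy)
print(overview)
```

A table like this answers the "what types, how many gaps, how many distinct values" questions in one place, which is roughly the starting point that a generated report automates.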

Now we have pandas_profiling. It takes care of the whole process above, together with the various statistical correlation calculations and statistical plots. pandas_profiling can be installed with pip, with conda, or from source.

pip:

pip install pandas-profiling
pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

conda:

conda install -c conda-forge pandas-profiling

Source:

Download the source archive first, unpack it, and then run the following from the directory that contains setup.py:

python setup.py install
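Whichever route is used, it can be worth confirming that the interpreter actually sees the package. A minimal sketch using only the standard library; the helper name is_installed is hypothetical. Note that the import name uses an underscore (pandas_profiling), while the pip and conda package name uses a hyphen.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Prints True or False depending on the current environment.
print("pandas_profiling installed:", is_installed("pandas_profiling"))
```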

The basic usage of pandas_profiling is simple: after reading the data into pandas, call the profile_report method directly on the DataFrame to generate the EDA report, and then use the to_file method to save it as an .html file.

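from pathlib import Path
import pandas_profiling  # importing the package makes df.profile_report available

profile = df.profile_report(title="Census Dataset")
profile.to_file(output_file=Path("./census_report.html"))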

Let's see what the report looks like. The pandas-profiling EDA report covers five areas: overall data overview, variable exploration, correlation calculation, missing values and sample display.

Overall data overview:

Variable exploration:

Correlation calculation:

Here are five kinds of correlation coefficients.
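The text above does not name the five coefficients, so the following is only an illustration of why more than one is useful, not a reproduction of the report. It is a minimal sketch with made-up data, assuming two common coefficients that pandas computes directly (Pearson and Spearman); whether these match the report's list is an assumption.

```python
import pandas as pd

# Made-up data: y grows monotonically with x, but not linearly.
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
toy["y"] = toy["x"] ** 3

# Pearson measures linear association; Spearman works on ranks,
# so it only cares whether the relationship is monotonic.
pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because the relationship is perfectly monotonic but curved, the rank-based coefficient reaches its maximum while the linear one falls short of it, which is the kind of difference that comparing several coefficients side by side makes visible.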

Missing values:

pandas-profiling presents missing values in four different forms.

Data sample display:

This corresponds to the df.head() and df.tail() methods in pandas.

The complete code for the example above:

from pathlib import Path

import numpy as np
import pandas as pd
import requests

import pandas_profiling

if __name__ == "__main__":
    # Download the dataset once and cache it locally
    file_name = Path("census_train.csv")
    if not file_name.exists():
        data = requests.get(
            "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
        )
        file_name.write_bytes(data.content)

    # Names based on https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names
    df = pd.read_csv(
        file_name,
        header=None,
        index_col=False,
        names=[
            "age",
            "workclass",
            "fnlwgt",
            "education",
            "education-num",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "sex",
            "capital-gain",
            "capital-loss",
            "hours-per-week",
            "native-country",
        ],
    )

    # Prepare missing values: the raw file marks unknown entries with "?"
    df = df.replace(r"\?", np.nan, regex=True)

    profile = df.profile_report(title="Census Dataset")
    profile.to_file(output_file=Path("./census_report.html"))
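The "Prepare missing values" step matters because pandas does not treat a literal question mark as missing. A minimal sketch with made-up rows (not the real file) showing the effect of the same replace call on missing-value counts:

```python
import numpy as np
import pandas as pd

# Made-up rows imitating the raw file, where unknown values appear as "?".
raw = pd.DataFrame(
    {
        "workclass": ["Private", " ?", "State-gov"],
        "occupation": [" ?", "Sales", " ?"],
        "age": [39, 50, 38],
    }
)

# With a non-string replacement value, any text cell that matches the
# pattern is replaced as a whole, so " ?" becomes NaN.
cleaned = raw.replace(r"\?", np.nan, regex=True)

print(raw.isna().sum().to_dict())      # missing counts before the replacement
print(cleaned.isna().sum().to_dict())  # missing counts after the replacement
```

Without this step the placeholder would be counted as just another category, and the missing-values part of a report would have nothing to show for those columns.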

In addition, pandas_profiling can also be configured for use from within PyCharm:

Once the configuration is complete, right-click pandas_profiling under external_tool in the project pane on the left side of PyCharm to generate an EDA report directly. For more information, see the project's GitHub page:

That covers how to use pandas_profiling for exploratory data analysis. I hope the content above is of some help to you.

© 2024 shulou.com SLNews company. All rights reserved.