2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article shows how to explore a dataset with a single line of Python. The technique is quite practical, so it is shared here in the hope that you will get something out of it. Without further ado, let's take a look.
The Simple Pandas Way
Anyone who works with data in Python will be familiar with the Pandas package. Pandas is the go-to package for most data in row-and-column format. If you don't have Pandas yet, install it from your terminal with pip install:
pip install pandas
Now, let's see what the default methods in the Pandas package can do:
For beginners who are not sure what is going on here:
Every Pandas DataFrame has a .describe() method that returns the output above. However, categorical variables are left out by default. In the example above, the "method" column is omitted entirely from the output.
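As a small illustration of this behavior, the sketch below builds a tiny DataFrame by hand and calls .describe() on it. The column names are borrowed from the planets data used in this article, but the rows are made up for the example. It also shows the include="all" argument, which makes .describe() report on non-numeric columns as well, though only with basic counts.

```python
import pandas as pd

# Hand-made sample rows (illustrative values, not the real planets data).
df = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Transit", "Imaging"],
    "number": [1, 2, 1, 3],
    "distance": [77.4, 56.95, None, 19.84],
})

# By default, describe() summarizes numeric columns only,
# so the text column "method" does not appear in the result.
numeric_summary = df.describe()
print(numeric_summary)

# include="all" adds non-numeric columns, reporting count, unique,
# top (most frequent value) and freq for them.
full_summary = df.describe(include="all")
print(full_summary)
```

Even with include="all", the summary for a categorical column stays fairly thin, which is the gap the rest of this article addresses.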
Let's see if we can solve this problem.
Pandas Profiling
What if I told you that Python could produce the following statistics in just three lines? In fact, if you don't count the imports, a single line is enough.
Essentials: type, unique values, missing values
Quantile statistics: e.g. minimum, Q1, median, Q3, maximum, range, interquartile range
Descriptive statistics: e.g. mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
Most frequent values
Histogram
Correlations: Spearman, Pearson and Kendall matrices, with highly correlated variables highlighted
Missing values: matrix, count, heatmap and dendrogram of missing values
(Feature list directly from Pandas Profiling GitHub)
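To give a feel for what the report automates, here is a minimal sketch that computes a handful of the statistics listed above by hand in plain Pandas. The tiny DataFrame and its values are invented for illustration (the column names echo the planets data used in this article), and this is not how Pandas Profiling itself is implemented.

```python
import pandas as pd

# Invented sample data: one categorical column with a missing value,
# one numeric column with an obvious outlier.
df = pd.DataFrame({
    "method": ["Transit", "Transit", "Imaging", None, "Transit"],
    "distance": [1.0, 2.0, 3.0, 4.0, 100.0],
})

# Essentials: type, number of unique values, number of missing values.
essentials = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "unique": df.nunique(),
    "missing": df.isna().sum(),
})

# Quantile and descriptive statistics for one numeric column.
col = df["distance"]
q1, median, q3 = col.quantile([0.25, 0.5, 0.75])
stats = {
    "min": col.min(),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": col.max(),
    "range": col.max() - col.min(),
    "iqr": q3 - q1,
    "mean": col.mean(),
    "std": col.std(),
    "skewness": col.skew(),
    "kurtosis": col.kurt(),
}

# Most frequent values of the categorical column (missing values excluded).
most_frequent = df["method"].value_counts().head(3)

print(essentials)
print(stats)
print(most_frequent)
```

Doing this for every column, plus histograms and correlation matrices, quickly becomes tedious, which is where a generated report saves time.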
Well, with the Pandas Profiling package, it can! To install Pandas Profiling, simply use pip install in the terminal:
pip install pandas_profiling
Experienced data analysts may scoff at this as superficial, or dismiss it at first glance as "flashy", but it is certainly useful for quickly getting a first impression of the data:
The first thing we see is an overview, which provides some very high-level statistics about the data and variables, as well as warnings about high correlations between variables, high skewness, etc.
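Warnings of this kind can be approximated by hand. The sketch below only illustrates the idea and is not Pandas Profiling's actual logic: the function name simple_warnings, both thresholds, and the sample data are assumptions made up for this example.

```python
import pandas as pd

def simple_warnings(df, corr_threshold=0.9, skew_threshold=2.0):
    """Return plain-text warnings for highly correlated or skewed numeric columns.

    Hypothetical helper with arbitrary thresholds, for illustration only.
    """
    numeric = df.select_dtypes(include="number")
    warnings = []

    # Pairwise Pearson correlation (the default of DataFrame.corr).
    corr = numeric.corr()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= corr_threshold:
                warnings.append(f"{a} is highly correlated with {b} (r = {r:.2f})")

    # Sample skewness of each numeric column.
    for name, skew in numeric.skew().items():
        if abs(skew) >= skew_threshold:
            warnings.append(f"{name} is highly skewed (skewness = {skew:.2f})")

    return warnings

# Made-up data: y closely follows x, and z has a single large outlier.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 6, 8, 10, 12.5],
    "z": [1, 1, 1, 1, 1, 50],
})

for message in simple_warnings(df):
    print(message)
```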
But that's nothing. Scrolling down we will see that the report has multiple sections, and simply showing the output of this 1-line program in pictures is not enough to fully present them, so I made a gif:
I strongly recommend that you explore the package's features for yourself. After all, this is just one line of code, and this package may be useful for future data analysis.
import pandas as pd
import pandas_profiling
pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/planets.csv').profile_report()
That is how to explore a dataset with one line of Python. Some of these points may already be familiar from day-to-day work, and hopefully this article adds to what you know.