Shulou (Shulou.com), SLTechnology News&Howtos, 06/01 report. Updated 2025-03-28.
Today I would like to go over three tips for dealing with big data that many people may not know well. To make them easier to follow, I have summarized them below, and I hope you take something useful away from this article.
Data processing is everywhere; mastering a few common techniques lets you get twice the result with half the effort.
This series uses Pandas for data processing and analysis, and summarizes commonly used, practical data analysis techniques.
The version of Pandas I use is shown below; the same snippet also imports the Pandas library.
>>> import pandas as pd
>>> pd.__version__
'0.25.1'
The dataset used today is IMDB-Movie-Data, taken from Kaggle. It can also be downloaded from Baidu Netdisk:
Link: https://pan.baidu.com/s/15u7Hf2y5dSFwek2vA1-zjg Extraction code: bvfx
Before you begin, make sure the interpreter's working directory is the directory that contains the dataset:
>>> import os
>>> os.chdir('DJUR Accord source _ source _ dataset')  # the directory where my dataset is located; use your own path
>>> os.listdir()  # confirm that IMDB-Movie-Data.csv exists in this directory
['drinksbycountry.csv', 'IMDB-Movie-Data.csv', 'movietweetings', 'titanic_eda_data.csv', 'titanic_train_data.csv']
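The same check can be done without changing the working directory. The sketch below is only an illustration: the helper name dataset_available is hypothetical, and the file name IMDB-Movie-Data.csv is taken from the listing above.

```python
from pathlib import Path


def dataset_available(directory, filename="IMDB-Movie-Data.csv"):
    """Return True if `filename` exists as a regular file inside `directory`.

    `dataset_available` is a hypothetical helper, not part of Pandas.
    """
    return (Path(directory) / filename).is_file()


# Check the current working directory; prints True or False depending on your setup.
print(dataset_available(Path.cwd()))
```

If the check fails, passing the full path to pd.read_csv is an alternative to calling os.chdir.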
With the preparation done, we can start on the data processing tips.
1 Removing a column with Pandas
Import the data:
>>> df = pd.read_csv("IMDB-Movie-Data.csv")
>>> df.head(1)  # import and display the first row
   Rank                    Title                    Genre  ...   Votes  Revenue (Millions)  Metascore
0     1  Guardians of the Galaxy  Action,Adventure,Sci-Fi  ...  757074              333.13       76.0

[1 rows x 12 columns]
Use the pop method to remove the specified column:
>>> meta = df.pop("Title").to_frame()  # remove the Title column
Confirm that it has been removed:
>>> df.head(1)  # df now has 11 columns
   Rank                    Genre  ...  Revenue (Millions)  Metascore
0     1  Action,Adventure,Sci-Fi  ...              333.13       76.0

[1 rows x 11 columns]
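To see what pop does in isolation, here is a minimal sketch on a toy DataFrame. The titles come from the dataset, but the Rank and Score values are made up for illustration. It also shows drop, which is not used in this article, as a non-mutating alternative.

```python
import pandas as pd

# Toy frame standing in for the movie dataset (Rank and Score values are illustrative only).
toy = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Title": ["Guardians of the Galaxy", "Prometheus", "Split"],
    "Score": [7.0, 6.0, 5.0],
})

# pop() removes the column from `toy` in place and returns it as a Series.
title = toy.pop("Title")

# to_frame() turns that Series into a one-column DataFrame.
popped_frame = title.to_frame()

# drop() returns a new frame and leaves `toy` untouched.
without_rank = toy.drop(columns=["Rank"])

print(list(toy.columns))           # ['Rank', 'Score']
print(list(popped_frame.columns))  # ['Title']
print(list(without_rank.columns))  # ['Score']
```

pop is convenient when the removed column is still needed, as it is here, because the removal and the extraction happen in one call.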
2 Counting the number of words in the title
The pop call returned the Title column, which was saved as meta. The first three rows of meta are:
>>> meta.head(3)
                     Title
0  Guardians of the Galaxy
1               Prometheus
2                    Split
Each title is made up of words separated by single spaces, so the number of words is the number of spaces plus one.
>>> # .str.count(" ") + 1 gives the number of words
>>> meta["words_count"] = meta["Title"].str.count(" ") + 1
>>> meta.head(3)  # the words_count column holds the number of words
                     Title  words_count
0  Guardians of the Galaxy            4
1               Prometheus            1
2                    Split            1
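Counting spaces assumes that words are separated by exactly one space and that there are no leading or trailing spaces. The sketch below compares that approach with splitting on whitespace. The third title is a deliberately messy, made-up string, not a row from the dataset.

```python
import pandas as pd

# The last entry has a double space and a trailing space on purpose (made-up example).
titles = pd.Series(["Guardians of the Galaxy", "Prometheus", "La  La Land "])

# Approach from the article: one more word than there are spaces.
by_spaces = titles.str.count(" ") + 1

# Alternative: split on any run of whitespace, then count the pieces.
by_split = titles.str.split().str.len()

print(by_spaces.tolist())  # [4, 1, 5]
print(by_split.tolist())   # [4, 1, 3]
```

Both approaches agree on cleanly formatted titles; they differ only when the spacing is irregular.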
3 Genre frequency statistics
The frequency of each value in the Genre column is counted below.
>>> vc = df["Genre"].value_counts()
The top 5 genres are shown below. The most frequent is Action,Adventure,Sci-Fi with 50 occurrences, followed by Drama with 48:
>>> vc.head()
Action,Adventure,Sci-Fi    50
Drama                      48
Comedy,Drama,Romance       35
Comedy                     32
Drama,Romance              31
Name: Genre, dtype: int64
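Note that value_counts treats each full Genre string as one category, so "Drama" and "Comedy,Drama" are counted separately. The sketch below, on a small made-up Series, contrasts this with counting individual genres by splitting on commas. The split-and-explode variant is an addition for illustration and is not part of the original walkthrough; Series.explode requires Pandas 0.25 or later.

```python
import pandas as pd

# Small made-up sample in the same format as the Genre column.
genres = pd.Series(
    ["Action,Adventure,Sci-Fi", "Drama", "Action,Adventure,Sci-Fi", "Comedy,Drama"],
    name="Genre",
)

# As in the article: each combined string is its own category.
combined = genres.value_counts()

# Variant: split on commas, put each genre on its own row, then count.
individual = genres.str.split(",").explode().value_counts()

print(combined["Action,Adventure,Sci-Fi"])  # 2
print(individual["Drama"])                  # 2
```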
Show a pie chart of the top 5:
>>> import matplotlib.pyplot as plt
>>> vc[:5].plot(kind='pie')
>>> plt.show()
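When no display is available, plt.show() will not open a window. The following sketch saves the chart to a file instead. The counts are copied from the top 5 output above; the output file name genre_top5.png, the temporary directory, and the percentage labels are assumptions made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Top 5 counts copied from the value_counts output in this article.
top5 = pd.Series(
    [50, 48, 35, 32, 31],
    index=["Action,Adventure,Sci-Fi", "Drama", "Comedy,Drama,Romance", "Comedy", "Drama,Romance"],
    name="Genre",
)

fig, ax = plt.subplots(figsize=(6, 6))
top5.plot(kind="pie", ax=ax, autopct="%.1f%%")  # autopct adds a percentage label to each slice
ax.set_ylabel("")  # hide the default y-axis label (the Series name)

# Hypothetical output location: a file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "genre_top5.png")
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```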