What are the three tips for dealing with big data?

2025-03-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

Today I would like to talk about three tips for dealing with big data, a topic many people may not understand well. To make it easier to follow, I have summarized the content below. I hope you get something useful out of this article.

Data processing is everywhere. Master the common techniques and you will get twice the result with half the effort.

This series uses Pandas for data processing and analysis, and summarizes commonly used, practical data-analysis techniques.

The Pandas version I use is shown below; the same snippet also imports the Pandas library.

>>> import pandas as pd
>>> pd.__version__
'0.25.1'

The dataset used today is IMDB-Movie-Data, taken from Kaggle. It can be downloaded from Baidu Netdisk:

Link: https://pan.baidu.com/s/15u7Hf2y5dSFwek2vA1-zjg (extraction code: bvfx)

Before you begin, make sure the interpreter's working directory is the directory that contains the dataset:

>>> import os
>>> os.chdir('path/to/dataset')  # replace with the directory where your dataset is located
>>> os.listdir()  # confirm that IMDB-Movie-Data.csv is in this directory
['drinksbycountry.csv', 'IMDB-Movie-Data.csv', 'movietweetings', 'titanic_eda_data.csv', 'titanic_train_data.csv']
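Changing the working directory with os.chdir affects everything else in the session. As an alternative, here is a minimal sketch that builds the file path explicitly with pathlib and reads the CSV from there. The helper name load_imdb and its data_dir argument are hypothetical, introduced only for illustration.

```python
from pathlib import Path

import pandas as pd


def load_imdb(data_dir):
    """Read IMDB-Movie-Data.csv from data_dir without calling os.chdir.

    load_imdb is a hypothetical helper, not part of pandas.
    """
    csv_path = Path(data_dir) / "IMDB-Movie-Data.csv"
    if not csv_path.exists():
        # Fail early with a message that names the path we actually looked at
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)


# Usage (replace the placeholder with your own dataset directory):
# df = load_imdb("path/to/dataset")
```

Passing the directory explicitly keeps the rest of the session's relative paths unchanged, which matters when several datasets live in different folders.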

With the preparation done, let's begin our tour of data-processing techniques.

1 Removing a column with Pandas

Import the data and display the first row:

>>> df = pd.read_csv("IMDB-Movie-Data.csv")
>>> df.head(1)  # display the first row
   Rank                    Title                    Genre  ...   Votes  Revenue (Millions)  Metascore
0     1  Guardians of the Galaxy  Action,Adventure,Sci-Fi  ...  757074              333.13       76.0

[1 rows x 12 columns]

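Use the pop method to remove a specified column. pop deletes the column from df in place and returns it as a Series; to_frame() turns that Series into a one-column DataFrame:

>>> meta = df.pop("Title").to_frame()  # remove the Title column and keep it in meta

Confirm that it has been removed:

>>> df.head(1)  # df now has 11 columns
   Rank                    Genre  ...  Revenue (Millions)  Metascore
0     1  Action,Adventure,Sci-Fi  ...              333.13       76.0

[1 rows x 11 columns]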
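To see exactly what pop does, here is a self-contained sketch on a toy three-row frame. The titles come from the output shown in this article; the frame itself (including the Rank values 2 and 3) is an assumption standing in for the real dataset. It also contrasts pop with DataFrame.drop.

```python
import pandas as pd

# Toy frame standing in for the IMDB data (titles taken from the article's output)
df = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Title": ["Guardians of the Galaxy", "Prometheus", "Split"],
})

# pop removes the column from df in place and returns it as a Series
title = df.pop("Title")
meta = title.to_frame()  # wrap the Series in a one-column DataFrame

print(df.columns.tolist())    # ['Rank']
print(meta.columns.tolist())  # ['Title']

# For comparison: drop returns a new DataFrame and leaves the original untouched
df2 = pd.DataFrame({"Rank": [1], "Title": ["Split"]})
trimmed = df2.drop(columns="Title")
print(df2.columns.tolist())      # ['Rank', 'Title']
print(trimmed.columns.tolist())  # ['Rank']
```

pop is convenient when you want to keep the removed column for later use, as the article does with meta; drop is the choice when the original DataFrame must stay unchanged.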

2 Counting the words in a title

pop returned the removed column, which we stored in meta. Display its first three rows:

>>> meta.head(3)
                     Title
0  Guardians of the Galaxy
1               Prometheus
2                    Split

Each title is made up of words separated by single spaces, so the number of words is the number of spaces plus one:

>>> # number of words = number of spaces + 1
>>> meta["words_count"] = meta["Title"].str.count(" ") + 1
>>> meta.head(3)  # the words_count column holds the number of words
                     Title  words_count
0  Guardians of the Galaxy            4
1               Prometheus            1
2                    Split            1
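Counting spaces assumes exactly one space between words. Here is a small sketch comparing it with a count based on splitting on whitespace. The fourth title, with an accidental double space, is hypothetical and was added only to show where the two methods differ.

```python
import pandas as pd

titles = pd.Series([
    "Guardians of the Galaxy",
    "Prometheus",
    "Split",
    "The  Lost City",  # hypothetical title with an accidental double space
])

# The article's approach: one more word than there are spaces
by_spaces = titles.str.count(" ") + 1

# Whitespace-robust alternative: split on runs of whitespace, count the pieces
by_split = titles.str.split().str.len()

print(by_spaces.tolist())  # [4, 1, 1, 4]
print(by_split.tolist())   # [4, 1, 1, 3]
```

Both give the same result on clean data. The split version also tolerates repeated or leading/trailing spaces.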

3 Counting Genre frequencies

Next, count how often each movie Genre occurs:

>>> vc = df["Genre"].value_counts()

value_counts sorts the counts in descending order, so the first five entries are the top 5 genres. The most frequent is Action,Adventure,Sci-Fi with 50 occurrences, followed by Drama with 48:

>>> vc.head()
Action,Adventure,Sci-Fi    50
Drama                      48
Comedy,Drama,Romance       35
Comedy                     32
Drama,Romance              31
Name: Genre, dtype: int64
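value_counts treats each full Genre string as one category, so "Action,Adventure,Sci-Fi" and "Drama" are counted as unrelated labels. The sketch below uses a toy Series (labels in the IMDB style, counts made up) to show this, the normalize option, and one way to count individual genres with str.split and Series.explode, which is available from pandas 0.25.

```python
import pandas as pd

# Toy Genre column; labels mimic the IMDB format, counts are made up
genre = pd.Series([
    "Action,Adventure,Sci-Fi",
    "Drama",
    "Drama",
    "Comedy,Drama,Romance",
    "Action,Adventure,Sci-Fi",
    "Action,Adventure,Sci-Fi",
], name="Genre")

# value_counts treats each full string as one category, sorted by count (descending)
vc = genre.value_counts()
print(vc.head(2).to_dict())  # {'Action,Adventure,Sci-Fi': 3, 'Drama': 2}

# normalize=True returns proportions instead of raw counts
share = genre.value_counts(normalize=True)

# To count individual genres, split each label and explode the lists into rows
single = genre.str.split(",").explode().value_counts()
print(single.sort_index().to_dict())
# {'Action': 3, 'Adventure': 3, 'Comedy': 1, 'Drama': 3, 'Romance': 1, 'Sci-Fi': 3}
```

Which count you want depends on the question: combined labels describe how films are tagged, while exploded counts show how often each individual genre appears.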

Plot the top 5 genres as a pie chart:

>>> import matplotlib.pyplot as plt
>>> vc[:5].plot(kind='pie')
>>> plt.show()
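plt.show() opens a window, which fails or does nothing in a script running without a display. Here is a minimal sketch that renders the same top-5 pie chart to a PNG file instead, using the counts from the vc.head() output above. It assumes matplotlib is installed; the Agg backend choice and the file name genre_top5_pie.png are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Top-5 counts as printed in the article's vc.head() output
vc = pd.Series(
    [50, 48, 35, 32, 31],
    index=["Action,Adventure,Sci-Fi", "Drama", "Comedy,Drama,Romance",
           "Comedy", "Drama,Romance"],
    name="Genre",
)

# autopct is passed through to matplotlib's pie() and labels each wedge with its share
ax = vc.head(5).plot(kind="pie", autopct="%1.1f%%")
ax.set_ylabel("")  # hide the default y-label (the Series name)
ax.set_title("Top 5 movie genres")
plt.tight_layout()
plt.savefig("genre_top5_pie.png")  # hypothetical output file name
plt.close()
```

Note that the percentages are shares of the top 5 only, not of all films in the dataset.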

After reading the above, do you have a better understanding of these three tips for dealing with big data?

© 2024 shulou.com SLNews company. All rights reserved.
