2025-04-14 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/02 Report--
This article introduces the basic terms of Python. Many people run into trouble with them when working through real cases, so let's walk through how to handle these situations. I hope you read it carefully and come away with something useful!
1. Two groups of basic Python terms you must know
a. Variables and assignments
In Python you can define a variable simply by assigning a value to its name. For example, when we write a = 4, the Python interpreter does two things:
It creates an integer object with the value 4 in memory
It creates a variable named a and points it to that 4
(Figure: schematic diagram showing the key points of Python variables and assignments.)
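The two steps above can be sketched in a few lines. This is only an illustration: the second name b and the text 'four' are not from the article and are introduced here to make the "name points to object" idea visible.

```python
a = 4           # create the integer 4 and point the name a at it
b = a           # point a second name at the same object; nothing is copied
print(a is b)   # True: both names refer to one object

a = 'four'      # re-point a at a new text object; b is unaffected
print(b)        # 4
print(type(a).__name__, type(b).__name__)   # str int
```

Reassigning a name never changes the object it used to point to, which is why b still holds 4.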
For example, "=" assigns a value and Python automatically recognizes the data type: adding the two integers 4 and 2 gives the number 6, while adding the two texts "4" and "2" merges them into the text "42".
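A minimal sketch of that behaviour follows. The variable names a, b, c, and d are assumptions chosen for illustration; only the results 6 and "42" come from the article.

```python
a = 4      # no type declaration: Python recognizes an integer
b = 2
c = '4'    # quotes make these values text (str)
d = '2'

print('a + b gives', a + b)   # integers are added: 6
print('c + d gives', c + d)   # texts are merged: 42
print(type(a).__name__, type(c).__name__)   # int str

# To add the texts as numbers, convert them explicitly first.
print(int(c) + int(d))        # 6
```

The same "+" operator does different work depending on the type Python has recognized, which is why checking the data type early saves confusion later.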
b. Data types
In introductory data analysis, three data types are common:
List, list (built into Python)
Dictionary, dict (built into Python)
DataFrame (a data type from the pandas toolkit, which must be imported with import pandas before use)
They are written as follows:
List (list):
# list
liebiao = [1, 2.223, -3, 'Liu Qiangdong', 'Zhang Zetian', 'Jay Chou', 'Kun Ling', ['Weibo', 'bilibili', 'Douyin']]
A list is an ordered collection whose elements can be of any of the data types mentioned above (integers, floats, lists, etc.). Elements can be added at the end or at a specified position at any time, like this:
# a list is a mutable ordered collection, so you can append an element to the end:
liebiao.append('thin')
print(liebiao)
# result 1
> [1, 2.223, -3, 'Liu Qiangdong', 'Zhang Zetian', 'Jay Chou', 'Kun Ling', ['Weibo', 'bilibili', 'Douyin'], 'thin']
# you can also insert an element at a specified position, for example 'fat' at index 5:
liebiao.insert(5, 'fat')
print(liebiao)
# result 2
> [1, 2.223, -3, 'Liu Qiangdong', 'Zhang Zetian', 'fat', 'Jay Chou', 'Kun Ling', ['Weibo', 'bilibili', 'Douyin'], 'thin']
Dictionary (dict):
# dictionary
zidian = {'Liu Qiangdong': '46', 'Zhang Zetian': '36', 'Jay Chou': '40', 'Kun Ling': '26'}
Dictionaries store data as key-value pairs and offer extremely fast lookup. Taking the dictionary above as an example, if you want to find Jay Chou's age quickly, you can write:
zidian['Jay Chou']
> '40'
A dict is looked up by key, not by position. In older versions of Python the storage order had nothing to do with the order in which keys were added. Since Python 3.7 a dict keeps insertion order, so "Zhang Zetian" does come after "Liu Qiangdong" when iterating, but there is still no index number as in a list.
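The lookups described above can be sketched as follows, assuming the dictionary holds the four name/age pairs from this section. The 'Nobody' key and the 'New Person' entry are hypothetical and exist only to show the behaviour.

```python
zidian = {'Liu Qiangdong': '46', 'Zhang Zetian': '36', 'Jay Chou': '40', 'Kun Ling': '26'}

print(zidian['Jay Chou'])            # look up a value by its key
print('Kun Ling' in zidian)          # membership test checks keys, not values
print(zidian.get('Nobody', 'n/a'))   # .get returns a default instead of raising KeyError

zidian['New Person'] = '30'          # hypothetical entry, added for illustration only
print(list(zidian))                  # keys come back in insertion order (Python 3.7+)
```

Using .get is a common choice when a key may be missing, for example when crawled data is incomplete.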
DataFrame:
A DataFrame can be understood simply as a table like the ones in Excel. After importing the pandas package, both dictionaries and lists can be converted to a DataFrame. Taking the dictionary above as an example, the conversion looks like this:
import pandas as pd
df = pd.DataFrame.from_dict(zidian, orient='index', columns=['age'])  # note that the D and F of DataFrame are uppercase
df = df.reset_index().rename(columns={'index': 'name'})  # give the name column a field name
As in Excel, any column or row of a DataFrame can be selected on its own for analysis.
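A short sketch of selecting columns and rows, assuming the df built from the dictionary above. The names ages, first_row, jay, and over_35, and the threshold 35, are illustrative choices and not part of the article.

```python
import pandas as pd

zidian = {'Liu Qiangdong': '46', 'Zhang Zetian': '36', 'Jay Chou': '40', 'Kun Ling': '26'}
df = pd.DataFrame.from_dict(zidian, orient='index', columns=['age'])
df = df.reset_index().rename(columns={'index': 'name'})

ages = df['age']                          # one column, returned as a Series
first_row = df.iloc[0]                    # one row, selected by position
jay = df.loc[df['name'] == 'Jay Chou']    # rows that match a condition

# The ages were stored as text, so convert them before comparing numbers.
df['age'] = df['age'].astype(int)
over_35 = df[df['age'] > 35]
print(over_35)
```

Square brackets with a column name select a column, while .iloc and .loc select rows by position and by label or condition.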
These three data types are the most commonly used in Python data analysis. That covers the basic syntax, so we can now start writing some functions to work on the data.
2. Learning loops from Python crawlers
Having mastered the basic concepts above, we can start learning something more interesting. Let's take traversing URLs, which no crawler can avoid, as an example and look at the hardest part to get used to, the for loop:
a. The for loop
The for loop is the most common kind of loop. First, see what it does in a simple piece of code:
zidian = {'Liu Qiangdong': '46', 'Zhang Zetian': '36', 'Jay Chou': '40', 'Kun Ling': '26'}
for key in zidian:
    print(key)
> Liu Qiangdong
> Zhang Zetian
> Jay Chou
> Kun Ling
By default, iterating over a dict yields its keys. Since Python 3.7 they come out in insertion order; in older versions the order was not guaranteed and could differ from the order of a list. To iterate over the values, use for value in d.values(), and to iterate over keys and values at the same time, use for k, v in d.items().
As you can see, the names in the dictionary have been printed out one by one. The for loop is used to traverse data, and mastering it is a real first step into programming with Python.
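The three ways of iterating mentioned above can be sketched like this, assuming the same four-entry dictionary. The names ages_as_int and oldest are introduced here for illustration.

```python
zidian = {'Liu Qiangdong': '46', 'Zhang Zetian': '36', 'Jay Chou': '40', 'Kun Ling': '26'}

for key in zidian:                  # default: iterate over the keys
    print(key)

for value in zidian.values():       # iterate over the values only
    print(value)

for name, age in zidian.items():    # iterate over key and value together
    print(name, 'is', age)

# A loop can also build new data: convert each age from text to a number.
ages_as_int = {name: int(age) for name, age in zidian.items()}
oldest = max(ages_as_int, key=ages_as_int.get)
print(oldest)
```

The last two lines show why .items() is handy: once both key and value are available in each step, derived results such as the oldest person take one line.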
b. Crawlers and loops
The for loop is used all the time when writing Python crawlers, because a crawler usually has to visit every page to collect information, so building complete and correct page links is critical. Take a box office data website as an example. Its page looks like this:
The address of the site's weekly box office JSON data can be found with a packet capture tool: http://www.cbooo.cn/BoxOffice/getWeekInfoData?sdate=20190114
Look closely at the site's box office data URLs for different dates: only the date at the end changes. Visiting the different URLs shows the box office data for the different dates:
What we need to do is traverse the URL for each date and download the data with Python code. This is where the for loop comes in handy. With it, we can quickly generate all the URLs we need:
import pandas as pd
url_df = pd.DataFrame({
    'urls': ['http://www.cbooo.cn/BoxOffice/getWeekInfoData?sdate=' for i in range(5)],
    'date': pd.date_range('20190114', freq='W-MON', periods=5),
})
'''
Generate the shared part of the URL 5 times, and use the pandas time series function to generate the 5 matching weekly dates.
Several data types from the first section are used: the list comprehension over range(5) builds a list,
{'urls': [...]} is a dictionary, and pd.DataFrame builds a DataFrame.
'''
# format each date as YYYYMMDD so that it matches the sdate value in the URL shown above
url_df['urls'] = url_df['urls'] + url_df['date'].dt.strftime('%Y%m%d')
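The same URLs can also be built with a plain for loop and the standard library, which makes the traversal explicit. This is a sketch under two assumptions: that the site expects sdate in YYYYMMDD form, as in the URL shown earlier, and that the dates advance one week at a time from 2019-01-14. The names BASE_URL, start, and urls are introduced here for illustration.

```python
from datetime import date, timedelta

BASE_URL = 'http://www.cbooo.cn/BoxOffice/getWeekInfoData?sdate='
start = date(2019, 1, 14)   # first date, taken from the URL in the article

urls = []
for i in range(5):                          # five consecutive weeks
    day = start + timedelta(weeks=i)        # move forward i weeks
    urls.append(BASE_URL + day.strftime('%Y%m%d'))

for url in urls:
    print(url)
```

Each pass through the loop changes only the date at the end of the address, which is exactly the pattern observed in the site's URLs.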
To make it easier to understand, I drew a schematic diagram of how the for loop traverses the data:
The rest of the crawling process is omitted here, and the relevant crawler code is shown at the end of the article. We used the crawler to collect more than 5,800 records with 20 fields, including the weekly box office, cumulative box office, cinema attendance, average ticket price per screening, and changes in the number of screenings, covering the 11 years from January 2008 to February 2019.
3. How does Python implement data analysis?
Besides crawling, data analysis is another important use of Python. Can Python do what Excel can do, and can it do what Excel can't? Let's use the movie box office data as an example:
a. Python analysis
After the data has been collected and imported, selecting fields for a first look is an essential step in any analysis. With the DataFrame format, this step becomes simple.
For example, to see which movies ranked first at the weekly box office, we can use common pandas methods to filter out all the rows ranked first for their week and keep the highest weekly box office of each movie for analysis:
import pandas as pd
data = pd.read_csv('Chinese box office data crawling test 20071-20192.csv', engine='python')
data = data[data['average number of seats'] > 20]
# import the data and keep only rows with an average attendance of more than 20 as valid data
dataTop1_week = data[data['ranking'] == 1][['movie name', 'weekly box office']]
# take out all the rows ranked first at the weekly box office, keeping the 'movie name' and 'weekly box office' columns
dataTop1_week = dataTop1_week.groupby('movie name').max()['weekly box office'].reset_index()
# group the data by 'movie name' and keep the highest weekly box office of each movie, dropping the other rows
dataTop1_week = dataTop1_week.sort_values(by='weekly box office', ascending=False)
# sort the data in descending order of 'weekly box office'
dataTop1_week.index = dataTop1_week['movie name']
del dataTop1_week['movie name']
# make the movie name the index and delete the original movie name column
dataTop1_week
# view the data
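To see what the filter, group, and sort steps do, here is a sketch on a few toy rows. The rows and the film names are invented for illustration and are not the article's data; only the column names follow the code above. Selecting the column before calling max() is equivalent to the article's .max()[column] form.

```python
import pandas as pd

# Toy rows invented for illustration; not the article's box office data.
data = pd.DataFrame({
    'movie name': ['Film A', 'Film A', 'Film B', 'Film C', 'Film B'],
    'ranking': [1, 1, 1, 2, 1],
    'weekly box office': [300, 500, 400, 900, 250],
})

# Step 1: keep only the weeks in which a film ranked first.
top1 = data[data['ranking'] == 1][['movie name', 'weekly box office']]

# Step 2: one row per film, holding its best first-place week.
best_week = top1.groupby('movie name')['weekly box office'].max()

# Step 3: highest box office first.
best_week = best_week.sort_values(ascending=False)
print(best_week)
```

Film C never ranked first, so it drops out in step 1 even though its weekly figure is the largest in the toy table.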
With 9 lines of code, we have done what takes a PivotTable plus dragging, sorting, and other mouse clicks in Excel. Finally, use Python's visualization package matplotlib to quickly create the chart:
b. Analysis with functions
The above is a simple statistical analysis. Next, let's look at something the basic features of Excel can't do: making work more efficient with custom functions. Looking at the data, we find that it records both the weekly box office and the total box office. Can the code we just wrote for the weekly box office ranking be reused for a total box office analysis?
Of course. Just use def to wrap the code you just wrote in a custom function and spell out its rules:
import matplotlib.pyplot as plt

def pypic(pf):
    # define a function pypic whose parameter is pf
    dataTop1_sum = data[['movie name', pf]]
    # take the 'movie name' and pf columns from the source data
    dataTop1_sum = dataTop1_sum.groupby('movie name').max()[pf].reset_index()
    # group the data by 'movie name' and keep the highest pf value of each movie, dropping the other rows
    dataTop1_sum = dataTop1_sum.sort_values(by=pf, ascending=False)
    # sort the data in descending order of pf
    dataTop1_sum.index = dataTop1_sum['movie name']
    del dataTop1_sum['movie name']
    # make the movie name the index and delete the original movie name column
    dataTop1_sum[:20].iloc[::-1].plot.barh(figsize=(6, 10), color='orange')
    name = pf + 'top20 Analysis'
    plt.title(name)
    # the chart title is built from the function argument
Once the function is defined, plotting in batches is easy:
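A sketch of what batch use of such a function can look like. To keep it runnable without the original CSV or a display, it uses invented toy rows and a hypothetical helper top_table that returns a table instead of drawing a chart. The column name 'cumulative box office' is an assumption based on the fields listed earlier.

```python
import pandas as pd

# Toy rows invented for illustration; not the article's box office data.
data = pd.DataFrame({
    'movie name': ['Film A', 'Film A', 'Film B', 'Film C'],
    'weekly box office': [300, 500, 400, 900],
    'cumulative box office': [300, 800, 400, 900],
})

def top_table(pf, n=20):
    """Return the n highest films for column pf (hypothetical helper, same steps as pypic without the plot)."""
    best = data.groupby('movie name')[pf].max()
    return best.sort_values(ascending=False).head(n)

# One loop runs the same analysis for every column of interest.
results = {}
for column in ['weekly box office', 'cumulative box office']:
    results[column] = top_table(column, n=2)
    print(column)
    print(results[column])
```

Replacing top_table(column, n=2) with pypic(column) in the loop would, under the same assumptions, draw one chart per column.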
By learning how to build functions, a data analyst can finally leave Excel's point-and-click mode behind and move on to efficient analysis.
That is all for "what are the basic terms of Python". Thank you for reading. If you want to learn more about the industry, you can follow this website, where the editor will publish more high-quality practical articles.
© 2024 shulou.com SLNews company. All rights reserved.