2025-01-20 Update From: SLTechnology News&Howtos > Development
Shulou(Shulou.com)06/02 Report--
This article introduces some basic tips for using Python with pandas. Many people run into these situations in real work, so let's walk through how to handle them. I hope you read carefully and get something out of it!
1. read_csv
Everyone knows this command. But if you are reading a large file, try adding the parameter nrows=5 so that only a small portion of the table is read before you load the whole thing. That way you can catch a wrongly chosen delimiter early (files are not always comma-separated).
(Alternatively, you can use the 'head' command on Linux to check the first five lines of any text file, for example: head -n 5 data.txt)
Then you can use df.columns.tolist() to extract all column names as a list, and add the parameter usecols = ['c1', 'c2', ...] to load only the columns you need. In addition, if you know the data types of a few specific columns, you can add the parameter dtype = {'c1': str, 'c2': int, ...} so the data loads faster. Another advantage of this parameter: if a column contains both strings and numbers, declaring its type as str lets you use that column as a key when merging tables without errors.
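Here is a minimal runnable sketch of those parameters, using an in-memory CSV via io.StringIO to stand in for a large file on disk (the column names and data are made up for illustration):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a large file on disk.
csv_data = "c1,c2,c3\n1,a,0.5\n2,b,1.5\n3,c,2.5\n"

# Peek at the first rows only, to confirm the delimiter and the columns.
preview = pd.read_csv(io.StringIO(csv_data), nrows=2)
print(preview.columns.tolist())  # ['c1', 'c2', 'c3']

# Load only the columns we need, forcing c1 to be read as a string.
df = pd.read_csv(io.StringIO(csv_data), usecols=['c1', 'c2'], dtype={'c1': str})
print(df['c1'].tolist())  # ['1', '2', '3']
```

With a real file you would pass the path instead of the StringIO object; everything else stays the same.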
2. select_dtypes
If data preprocessing must be done in Python, this command can save you some time. After reading a table, the default data type of each column can be bool, int64, float64, object, category, timedelta64 or datetime64. You can first run
df.dtypes.value_counts()
to see all the data types present in the DataFrame, and then execute
df.select_dtypes(include=['float64', 'int64'])
to select a sub-DataFrame containing only the numeric columns.
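A quick sketch of this workflow on a toy DataFrame (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [0.1, 0.2], 'z': ['a', 'b']})

# Survey which dtypes are present in the frame.
print(df.dtypes.value_counts())

# Keep only the numeric columns.
numeric = df.select_dtypes(include=['float64', 'int64'])
print(numeric.columns.tolist())  # ['x', 'y']
```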
3. copy
This is an important command. If you execute the following:
import pandas as pd
df1 = pd.DataFrame({'a': [0, 0, 0], 'b': [1, 1, 1]})
df2 = df1
df2['a'] = df2['a'] + 1
df1.head()
You will find that df1 has changed as well. This is because df2 = df1 does not copy the values of df1 into df2; it creates a reference pointing to df1. Therefore, any change to df2 also changes df1. To solve this problem, you can use:
df2 = df1.copy()
or:
from copy import deepcopy
df2 = deepcopy(df1)
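A quick check that both forms really decouple the copy from the original (a sketch on a toy frame):

```python
import pandas as pd
from copy import deepcopy

df1 = pd.DataFrame({'a': [0, 0, 0], 'b': [1, 1, 1]})

df2 = df1.copy()          # independent copy
df2['a'] = df2['a'] + 1
print(df1['a'].tolist())  # [0, 0, 0] -- df1 is untouched

df3 = deepcopy(df1)       # same effect via the copy module
df3['a'] = 99
print(df1['a'].tolist())  # still [0, 0, 0]
```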
4. map
This is a command for simple data transformations. First define a dictionary in which the keys are the old values and the values are the new values:
level_map = {1: 'high', 2: 'medium', 3: 'low'}
df['c_level'] = df['c'].map(level_map)
A few example uses: converting True/False to 1/0 (for modeling); defining levels; user-defined lexical encodings.
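The examples above can be sketched as follows (toy data; the column name 'c' is an assumption):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 3, 2, 1]})
level_map = {1: 'high', 2: 'medium', 3: 'low'}
df['c_level'] = df['c'].map(level_map)
print(df['c_level'].tolist())  # ['high', 'low', 'medium', 'high']

# The same idea converts True/False to 1/0 for modeling.
flags = pd.Series([True, False, True]).map({True: 1, False: 0})
print(flags.tolist())  # [1, 0, 1]
```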
5. Apply or not apply?
The apply function is useful when we want to create a new column using other columns as input:
def rule(x, y):
    if x == 'high' and y > 10:
        return 1
    else:
        return 0

df = pd.DataFrame({'c1': ['high', 'high', 'low', 'low'], 'c2': [0, 23, 17, 4]})
df['new'] = df.apply(lambda x: rule(x['c1'], x['c2']), axis=1)
df.head()
In the code above, we define a function with two input variables and use apply to apply it to columns 'c1' and 'c2'.
But the problem with apply is that it is sometimes too slow. If you want to compute the maximum of the two columns 'c1' and 'c2', you can write:
df['maximum'] = df.apply(lambda x: max(x['c1'], x['c2']), axis=1)
But you will find that it is much slower than this command:
df['maximum'] = df[['c1', 'c2']].max(axis=1)
Note: if you can do the same thing with a built-in function (they are usually faster), do not use apply. For example, to round column 'c' to integers, execute round(df['c'], 0) instead of using apply:
df.apply(lambda x: round(x['c'], 0), axis=1)
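To see that the two forms agree (apply is only slower, not different), here is a small comparison on toy data:

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 7], 'c2': [5, 2], 'c': [1.4, 2.6]})

# Row-wise apply: flexible, but slow on large frames.
slow_max = df.apply(lambda x: max(x['c1'], x['c2']), axis=1)

# Vectorized equivalent: same result, much faster.
fast_max = df[['c1', 'c2']].max(axis=1)
print(slow_max.tolist())  # [5, 7]
print(fast_max.tolist())  # [5, 7]

# Rounding with the built-in method instead of apply.
print(df['c'].round(0).tolist())  # [1.0, 3.0]
```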
6. value_counts
This is a command for checking the distribution of values. For example, to check the possible values in column 'c' and the frequency of each, you can run:
df['c'].value_counts()
It has some useful tricks/parameters:
A. normalize=True: if you want to see frequencies instead of counts.
B. dropna=False: if you also want to count the missing values in the data.
C. df['c'].value_counts().reset_index(): if you want to convert the statistics into a pandas DataFrame and work with it.
D. df['c'].value_counts().reset_index().sort_values(by='index'): to show the statistics sorted by value rather than by count.
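A short demonstration of options A and B on a toy Series with one missing value:

```python
import pandas as pd
import numpy as np

s = pd.Series(['a', 'b', 'a', np.nan, 'a'])

print(s.value_counts())                # counts; NaN is excluded by default
print(s.value_counts(normalize=True))  # relative frequencies instead of counts
print(s.value_counts(dropna=False))    # include the missing value in the count
```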
7. The number of missing values
When building a model, you may want to exclude rows with too many missing values (or rows that are all missing). You can use .isnull() and .sum() to count the missing values in the specified columns:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': [1, 2, 3], 'c1': [0, 0, np.nan], 'c2': [np.nan, 1, 1]})
df = df[['id', 'c1', 'c2']]
df['num_nulls'] = df[['c1', 'c2']].isnull().sum(axis=1)
df.head()
8. Select rows with specific IDs
In SQL, we can use SELECT * FROM ... WHERE ID in ('A001', 'C022') to get records with specific IDs. If you want to do the same thing in pandas, you can use:
df_filter = df['ID'].isin(['A001', 'C022'])
df[df_filter]
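Put together on toy data (the 'ID' column and values follow the SQL example above):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A001', 'B002', 'C022'], 'val': [10, 20, 30]})

# Boolean mask, equivalent to SQL: WHERE ID IN ('A001', 'C022')
df_filter = df['ID'].isin(['A001', 'C022'])
print(df[df_filter]['val'].tolist())  # [10, 30]
```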
9. Percentile groups
You have a numeric column and want to classify its values into groups, say: the top 5% into group 1, the next 5-20% into group 2, 20-50% into group 3, and the bottom 50% into group 4. Of course, you can do this with pandas.cut, but here is another option:
import numpy as np
cut_points = [np.percentile(df['c'], i) for i in [50, 80, 95]]
df['group'] = 1
for i in range(3):
    df['group'] = df['group'] + (df['c'] < cut_points[i])
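For comparison, here is a sketch that runs the loop above on random toy data and reproduces the same grouping with pandas.cut, which the text mentions as an alternative (the ±inf bin edges and the label order are my choices, not from the article):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'c': rng.normal(size=1000)})

# The loop-based approach from the article.
cut_points = [np.percentile(df['c'], i) for i in [50, 80, 95]]
df['group'] = 1
for i in range(3):
    df['group'] = df['group'] + (df['c'] < cut_points[i])

# The same grouping via pd.cut, using the percentile values as bin edges.
# Labels are ordered so that the highest values land in group 1.
edges = [-np.inf] + cut_points + [np.inf]
df['group_cut'] = pd.cut(df['c'], bins=edges, labels=[4, 3, 2, 1])
print((df['group'] == df['group_cut'].astype(int)).all())
```

The boolean trick works because each comparison `df['c'] < cut_points[i]` adds 1 for every percentile threshold the value falls below, so the smallest values accumulate the largest group number.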