Shulou (Shulou.com), SLTechnology News&Howtos, Internet Technology. Updated 2025-01-17.
1. Add Age and Fullname fields
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.derive(age(BIRTHDAY):Age, NAME+" "+SURNAME:Fullname)
A5  =interval@ms(A1,now())
A4: T denotes a table sequence, and T.derive() adds computed fields to it. Here age(BIRTHDAY) computes each employee's age from the birth date as the Age field, and NAME and SURNAME are joined with a space to form Fullname.
A5: calculates the elapsed time. interval() computes the interval between two times, and @ms returns it in milliseconds.
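On the Python side, the scripts time themselves with time.time(), which reads the wall clock and reports seconds. For a closer counterpart to interval@ms, a minimal sketch could use time.perf_counter(), Python's high-resolution clock for measuring short intervals, and report milliseconds. The helper name time_it is hypothetical and is not part of either tool.

```python
import time


def time_it(func, *args, **kwargs):
    """Call func once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


if __name__ == "__main__":
    total, ms = time_it(sum, range(1_000_000))
    print(total, f"{ms:.3f} ms")
```

A single run that takes only a few milliseconds gives a noisy measurement. The standard timeit module repeats the measurement for that reason.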
Python:
import time
import pandas as pd
import datetime
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
today = datetime.datetime.today().year
data["Age"] = today - pd.to_datetime(data["BIRTHDAY"]).dt.year
data["Fullname"] = data["NAME"] + " " + data["SURNAME"]
print(data)
e = time.time()
print(e - s)
The Age field is the number of years from the year in BIRTHDAY to the current year. The Fullname field is NAME, a space and SURNAME joined together.
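Subtracting the birth year from the current year overstates the age by one for anyone whose birthday has not yet come this year. The sketch below counts completed years instead. It assumes the BIRTHDAY values parse with pd.to_datetime. The function name completed_years and the fixed reference date in the example are illustrative assumptions.

```python
import pandas as pd


def completed_years(birthdays: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Age in completed years on the reference date ref."""
    b = pd.to_datetime(birthdays)
    years = ref.year - b.dt.year
    # True where the birthday has not yet occurred in ref's year
    not_yet = (b.dt.month > ref.month) | (
        (b.dt.month == ref.month) & (b.dt.day > ref.day)
    )
    return years - not_yet.astype(int)


if __name__ == "__main__":
    births = pd.Series(["1990-06-15", "1990-01-01", "1990-12-31"])
    print(completed_years(births, pd.Timestamp("2020-06-15")).tolist())
```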
Time taken (s): esProc 0.008, Python 0.020
2. Extract the required records or fields (the first 3 fields, records 3-10)
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.new(#1,#2,#3).to(3:10)
A5  =interval@ms(A1,now())
A4: T.new() creates a new table sequence. Here the first three fields (#1, #2, #3) of each record become the fields of the new table. A.to(a:b) then takes the members at positions a to b, which here are records 3 through 10.
Python:
import time
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
data = data.iloc[2:10, :3]
print(data)
e = time.time()
print(e - s)
Use df.iloc[] slicing to take records 3-10 and the first three fields. DataFrame rows and columns are numbered from 0, and the end of a slice is excluded, so the expression is iloc[2:10, :3].
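A quick way to check the slice bounds is to run the same iloc expression on a small made-up table with numbered records. The 12-row DataFrame below is hypothetical and only stands in for EMPLOYEE.txt.

```python
import pandas as pd

# Hypothetical 12-record table standing in for EMPLOYEE.txt
df = pd.DataFrame({
    "EID": range(1, 13),
    "NAME": [f"name{i}" for i in range(1, 13)],
    "SURNAME": [f"surname{i}" for i in range(1, 13)],
    "STATE": ["California"] * 12,
})

# Rows at positions 2..9 (records 3-10) and columns at positions 0..2
subset = df.iloc[2:10, :3]
print(subset)
```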
Time taken (s): esProc 0.008, Python 0.010
3. Filter eligible records
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.select(STATE=="California")
A5  =interval@ms(A1,now())
A4: T.select() keeps the records that satisfy the condition. Here it keeps the records for which STATE=="California" is true.
Python:
import time
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
data = data[data['STATE'] == "California"]
print(data)
e = time.time()
print(e - s)
Select the records where data['STATE'] == "California". The comparison produces a boolean Series that is used as a row mask.
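The comparison data['STATE'] == "California" builds a boolean Series with one entry per row, and indexing with it keeps the True rows. DataFrame.query() expresses the same filter as a string. The sketch below uses a hypothetical three-row table to check that both forms select the same records.

```python
import pandas as pd

df = pd.DataFrame({
    "NAME": ["Ann", "Bob", "Cid"],
    "STATE": ["California", "Texas", "California"],
})

mask = df["STATE"] == "California"          # boolean Series, one value per row
by_mask = df[mask]                          # keeps the rows where mask is True
by_query = df.query('STATE == "California"')
print(by_mask)
```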
Time taken (s): esProc 0.007, Python 0.028
4. Calculate common statistics of a field
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.min(SALARY)
A5  =A3.max(SALARY)
A6  =A3.avg(SALARY)
A7  =A3.sum(SALARY)
A8  =A3.(SALARY).median()
A9  =A3.(float(SALARY)).variance()
A10 =interval@ms(A1,now())
A4: T.min() calculates the minimum of the field.
A5: T.max() calculates the maximum of the field.
A6: T.avg() calculates the average of the field.
A7: T.sum() calculates the sum of the field.
A8: A3.(SALARY) takes the SALARY values as a sequence, and median() computes their median. The result is the middle value if the sequence length is odd, or the average of the two middle values if it is even. The general form of the function is A.median(k:n).
A9: A3.(float(SALARY)) converts the SALARY values to floating point, and variance() calculates their variance.
Python:
import time
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
sal_min = data["SALARY"].min()
sal_max = data["SALARY"].max()
sal_avg = data["SALARY"].mean()
sal_sum = data["SALARY"].sum()
sal_median = data["SALARY"].median()
sal_var = data["SALARY"].var()
print(sal_min, sal_max, sal_avg, sal_sum, sal_median, sal_var)
e = time.time()
print(e - s)
df[field name] selects a field. min(), max(), mean(), sum(), median() and var() calculate the minimum, maximum, average, sum, median and variance, respectively.
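pandas can also compute all six statistics in a single call with Series.agg(), passing the aggregation names as a list. Here is a sketch on a few hypothetical salary values:

```python
import pandas as pd

salary = pd.Series([3000, 7000, 16000, 5000], name="SALARY")  # hypothetical values

# One call; the result is a Series indexed by the aggregation names
stats = salary.agg(["min", "max", "mean", "sum", "median", "var"])
print(stats)
```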
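Result
| Statistic | esProc | Python |
|---|---|---|
| Minimum | 3000 | 3000 |
| Maximum | 16000 | 16000 |
| Average | 7395 | 7395.0 |
| Total | 3697500 | 3697500 |
| Median | 7000.0 | 7000.0 |
| Variance | 5324475 | 5335145.29 |
| Time taken (s) | 0.004 | 0.007 |
The two variances differ because esProc returns the population variance, while pandas var() returns the sample variance by default (ddof=1).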
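The two variance figures are consistent with each other. The sample variance equals the population variance multiplied by n/(n-1), and the table gives n = 3697500 / 7395 = 500 records. The sketch below first checks this relationship with pandas' ddof parameter on a small hypothetical series. It then applies the factor to the esProc figure. Passing ddof=0 to var() is how pandas would reproduce the population variance.

```python
import pandas as pd

salary = pd.Series([3000.0, 5000.0, 7000.0, 16000.0])  # hypothetical values
n = len(salary)

sample_var = salary.var()            # pandas default ddof=1: divides by n - 1
population_var = salary.var(ddof=0)  # ddof=0: divides by n

# The sample variance is the population variance scaled by n / (n - 1)
assert abs(sample_var - population_var * n / (n - 1)) < 1e-6

# Same factor applied to the esProc figure from the table (n = 500 records)
print(round(5324475 * 500 / 499, 2))  # 5335145.29, the pandas figure
```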
5. Count the male and female employees in each department
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.groups(DEPT:Dept;count(GENDER=="M"):Mcount,count(GENDER=="F"):Fcount)
A5  =interval@ms(A1,now())
A4: T.groups() groups the records by DEPT. Within each group, count(GENDER=="M") and count(GENDER=="F") count the records for which each condition is true. The result is the number of male and female employees in each department.
Python:
import time
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
group = data.groupby(['DEPT', 'GENDER']).size()
print(group)
e = time.time()
print(e - s)
groupby() groups the records by DEPT and GENDER, and size() counts the records in each group. The result is the number of male and female employees in each department.
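The groupby() result is a Series with a (DEPT, GENDER) index and one row per combination. The esProc result instead has one row per department with Mcount and Fcount columns. To get that layout, the counts can be unstacked. This sketch uses a hypothetical five-row table. pd.crosstab(df["DEPT"], df["GENDER"]) is an equivalent shortcut for the count table.

```python
import pandas as pd

df = pd.DataFrame({
    "DEPT": ["Sales", "Sales", "HR", "HR", "HR"],
    "GENDER": ["M", "F", "F", "F", "M"],
})

counts = df.groupby(["DEPT", "GENDER"]).size()   # Series indexed by (DEPT, GENDER)
wide = (
    counts.unstack(fill_value=0)                  # one column per GENDER value
          .rename(columns={"M": "Mcount", "F": "Fcount"})
          [["Mcount", "Fcount"]]
)
print(wide)
```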
Time taken (s): esProc 0.004, Python 0.008
6. Calculate the average age of male and female employees
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =A3.groups(GENDER;avg(age(BIRTHDAY)):Age)
A5  =interval@ms(A1,now())
A4: T.groups() groups the records by GENDER. age() calculates each employee's age from the birth date, and avg() averages it within each group.
Python:
import time
import datetime
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
today = datetime.datetime.today().year
data["Age"] = today - pd.to_datetime(data["BIRTHDAY"]).dt.year
avg_age = data.groupby('GENDER')['Age'].mean()
print(avg_age)
e = time.time()
print(e - s)
First the Age field is calculated. Then groupby() groups the records by GENDER, and mean() gives the average age of each group.
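Here is a self-contained variant of the same calculation. It uses assign(), which returns a new DataFrame with the Age column added and leaves the original unchanged. The reference year is fixed here as an assumption for reproducibility, and the four-row table is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "GENDER": ["M", "F", "M", "F"],
    "BIRTHDAY": ["1980-05-01", "1990-01-01", "1970-05-01", "2000-01-01"],
})
ref_year = 2020  # fixed reference year (assumption) instead of today's year

avg_age = (
    df.assign(Age=ref_year - pd.to_datetime(df["BIRTHDAY"]).dt.year)
      .groupby("GENDER")["Age"]
      .mean()
)
print(avg_age)
```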
Time taken (s): esProc 0.005, Python 0.008
7. Find the maximum number of consecutive employees whose salary is higher than that of the previous employee
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt")
A3  =A2.import@t()
A4  =a=0,A3.max(a=if(SALARY>SALARY[-1],a+1,0))
A5  =interval@ms(A1,now())
A4: if(condition,x1,x2) returns x1 if the condition is true and x2 otherwise. SALARY[-1] is the salary of the previous record. So a increases by 1 when an employee earns more than the previous one, and resets to 0 otherwise. Evaluating a for every record of A3 gives a sequence of running counts, and max() returns the largest of them.
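The running counter a can also be computed in pandas without an explicit loop. In this sketch, each row that is not an increase starts a new block, and cumsum() numbers the blocks. A grouped cumulative sum of the increases then reproduces a at every row, and the maximum is the answer. The function name longest_increasing_run is hypothetical. Like the Python loop, it treats the first record as not being an increase.

```python
import pandas as pd


def longest_increasing_run(salary: pd.Series) -> int:
    """Longest run of consecutive rows whose value exceeds the previous row's."""
    up = salary > salary.shift(1)      # first row compares with NaN -> False
    block_id = (~up).cumsum()          # each non-increase starts a new block
    # Cumulative count of increases within each block = the loop's counter a
    run = up.astype(int).groupby(block_id).cumsum()
    return int(run.max()) if len(run) else 0


if __name__ == "__main__":
    print(longest_increasing_run(pd.Series([3000, 3500, 4000, 3800, 3900, 4100, 4200])))  # 3
```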
Python:
import time
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE.txt", sep="\t")
a = 0
m = 0
for i in data['SALARY'].shift(0) > data['SALARY'].shift(1):
    a = 0 if not i else a + 1
    m = a if m < a else m
print(m)
e = time.time()
print(e - s)
df.shift(0) is the current record and df.shift(n) is the record n positions earlier. So data['SALARY'].shift(0) > data['SALARY'].shift(1) gives a pandas boolean Series. In the loop, False means the current salary is less than or equal to the previous one, so a is reset to 0. True adds 1 to a. The role of m: when m < a, m is set to a, so m always holds the longest run found so far.
…
B4  =A3.sort(rand()).to(5+rand(6)).field(A4,null)
A5  =interval@ms(A1,now())
A4: T.fno() gets the number of fields in the table sequence.
B4: T.field(F,x) assigns the members of sequence x, in turn, to field F of T. F can also be a string parameter naming the field.
…
A4  >A3.field("EID",to(A3.len()))
A5  for 2,A3.fno()
B5      =A3.select(!~.field(A5))
B6      =A3\B5
B7      >B5.run(~.field(A5,B6(rand(B6.len())+1).field(A5)))
A8  =interval@ms(A1,now())
A4: T.field(F,A) assigns the members of sequence A, in turn, to field F of T. Here to(A3.len()) fills EID with the sequence 1, 2, …, n.
A5: loops over the fields from the second one to the last.
B5: selects the records whose value in the current field is null.
B6: takes the difference set A3\B5, which holds the records whose value in this field is not null.
B7: pay special attention to the field() function here. r.field(F) returns the value of field F of record r, where F may also be a string parameter, and r.field(F,x) sets that value to x. B5 holds the records of A3 whose current field is null. run() loops through them and sets each one to the field value of a randomly chosen record from B6.
Python:
import time
import numpy as np
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE_nan.txt", sep="\t")
data['EID'] = pd.Series([i for i in range(1, len(data) + 1)])
for col in data.columns:
    nan_mask = pd.isna(data[col])
    nonan_list = list(data[col][~nan_mask])
    fill_list = [nonan_list[np.random.randint(0, len(nonan_list))] for i in range(nan_mask.sum())]
    data.loc[nan_mask, col] = fill_list
e = time.time()
print(e - s)
Change the values of the EID field to an incremental sequence, generated here with pd.Series().
Loop through all fields. pd.isna(data[col]) marks the NaN values of the column, and ~ negates the mask. nonan_list is a list of all non-NaN values in the current column. fill_list is a randomly generated list of values to fill the NaNs.
Assign the values in fill_list to the records that contain NaN. data.loc[mask, col] is used instead of chained indexing, so the assignment modifies data itself.
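The random fill can also be packaged as a small function applied to one column at a time. This is a sketch, not the article's code. fill_random is a hypothetical name. It draws replacements with NumPy's Generator.choice, seeded so that results are reproducible. The filled column is then assigned back to the DataFrame rather than modified through chained indexing.

```python
import numpy as np
import pandas as pd


def fill_random(col: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Return a copy of col with each NaN replaced by a randomly drawn non-NaN value."""
    missing = col.isna()
    pool = col[~missing].to_numpy()
    col = col.copy()
    if missing.any() and len(pool) > 0:
        # Draw with replacement, one value per missing entry
        col[missing] = rng.choice(pool, size=int(missing.sum()))
    return col


if __name__ == "__main__":
    df = pd.DataFrame({"STATE": ["California", None, "Texas", None]})
    rng = np.random.default_rng(42)
    for name in df.columns:
        df[name] = fill_random(df[name], rng)
    print(df)
```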
Time taken (s): esProc 0.008, Python 0.144
11. Replace the EID field with a sequence, fill missing SALARY values with the mean, and fill missing values in the other fields with a randomly chosen existing value
esProc:
A1  =now()
A2  =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE_nan.txt")
A3  =A2.import@t()
A4  >A3.field("EID",to(1,A3.len()))
A5  for 2,A3.fno()-1
B5      =A3.group(!~.field(A5))
B6      =B5(1).(~.field(A5))
B7      >B5(2).run(~.field(A5,B6(rand(B6.len())+1)))
A8  =A3.avg(SALARY)
A9  =A3.select(!SALARY).run(~.field("SALARY",A8))
A10 =interval@ms(A1,now())
In the previous example, B5 and B6 each traverse the sequence, so it is traversed twice. This version improves on that.
B5: A.group(xi) groups a sequence or table sequence into groups of equal values of one or more fields or expressions, and returns a sequence of groups. Here the table is split into two groups. The first contains the records whose field is not null, and the second contains the records whose field is null.
B6: collects the values of this field from the first (non-null) group.
B7: again, note the field() function. r.field(F) gets the value of field F of record r, or of the field named by string parameter F, and r.field(F,x) sets it to x. r.run(xi,…) evaluates the expression x for record r and finally returns r. It is usually used to modify the field values of r. Here every record in the null group gets a value picked at random from B6.
A9: works on the same principle as B7. field() sets the SALARY value of each record with a missing salary to the average calculated in A8.
Python:
import time
import numpy as np
import pandas as pd
s = time.time()
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE_nan.txt", sep="\t")
data['EID'] = pd.Series([i for i in range(1, len(data) + 1)])
for col in list(data.columns)[1:-1]:
    nan_mask = pd.isna(data[col])
    nonan_list = list(data[col][~nan_mask])
    fill_list = [nonan_list[np.random.randint(0, len(nonan_list))] for i in range(nan_mask.sum())]
    data.loc[nan_mask, col] = fill_list
data['SALARY'] = data['SALARY'].fillna(data['SALARY'].mean())
print(data.loc[180:190])
e = time.time()
print(e - s)
Change the values of the EID field to an incremental sequence, generated here with pd.Series().
Loop through the fields from the second to the second-to-last, which skips EID and SALARY. pd.isna(data[col]) marks the NaN values of the column, and ~ negates the mask. nonan_list is a list of all non-NaN values in the current column. fill_list is a randomly generated list of values to fill the NaNs.
Assign the values in fill_list to the records that contain NaN.
df[s].fillna(df[s].mean()) fills the missing values of field s with the field's average.
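fillna() also accepts a dict that maps column names to fill values, so a column can be filled without chained in-place calls. Here is a sketch on a hypothetical three-row table. It also rebuilds EID as an incremental sequence.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EID": [np.nan, np.nan, np.nan],
    "SALARY": [4000.0, np.nan, 8000.0],
})

df["EID"] = range(1, len(df) + 1)                # 1, 2, ..., n
df = df.fillna({"SALARY": df["SALARY"].mean()})  # mean() skips NaN by default
print(df)
```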
Time taken (s): esProc 0.006, Python 0.165
Summary: in this section we used 11 examples of simple data calculations. Both esProc and Python rely on a range of functions here, as well as some relatively complex combinations of them. One shortcoming of esProc at this stage is that it has few users, so there is little material and few use cases to consult. In terms of concise description and execution speed, however, esProc's advantages in these examples are clear. It is therefore worth using esProc more to improve work efficiency, which will in turn help make up for that shortcoming. The functions in esProc are very powerful. Only by using them constantly can you fully understand them and, with practice, become proficient.
EMPLOYEE.txt
EMPLOYEE_nan.txt