How to use Dataframe in R language

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use data frames in R. The material is simple and easy to follow; work through it step by step.

A data frame is created with the function data.frame(). The R help documentation describes its usage as follows:

Usage

data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())

default.stringsAsFactors()

Arguments

...: these arguments are of either the form value or tag = value. Component names are created based on the tag (if present) or the deparsed argument itself.

row.names: NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.

(This signature comes from an older version of R. Since R 4.0.0 the default is stringsAsFactors = FALSE.)

data.frame() has more arguments than are covered here; the first two are the ones used most. The "..." argument is the table data, that is, the columns that make up the body of the data frame, and row.names gives the row names. Since a data frame is R's equivalent of a table, it needs column names as well as row names, so where do the column names come from? Much data-processing software and many algorithms work column by column (matrix() likewise fills by column, with byrow = FALSE as the default), and the column names of a data frame are fixed when it is created: each column takes its tag or, failing that, the name of the variable passed in. See the following code for details:

To create a data frame called "mydataframe", first define the columns it will contain, then call data.frame():

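> C1 <- c(1, 2, 3, 4)
> C2 <- c(5, 6, 7, 8)
> C3 <- c(9, 10, 11, 12)
> C4 <- c(13, 14, 15, 16)
> C5 <- c(17, 18, 19, 20)
> mydataframe <- data.frame(C1, C2, C3, C4, C5, row.names = c("R1", "R2", "R3", "R4"))
> mydataframe
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20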
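The usage quoted from the help page also allows arguments of the form tag = value, in which case the tag becomes the column name. The following is a minimal sketch of that form; the data frame `df` and its columns `height` and `weight` are hypothetical names used only for illustration.

```r
# Hypothetical example data; the names are illustrative only.
df <- data.frame(
  height = c(150, 160, 170),   # tag = value: the tag becomes the column name
  weight = c(45, 55, 65),
  row.names = c("a", "b", "c") # row names given as a character vector
)

colnames(df)   # column names, taken from the tags
rownames(df)   # row names, taken from row.names
dim(df)        # number of rows, then number of columns
```

Naming columns with tags avoids creating one free-standing vector per column before the call.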

So a data frame is a data structure that splices existing columns together into a table. Careful readers will notice that it looks just like the matrix discussed in the previous section. Recall how that matrix was created:

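> mydata <- 1:20
> cnames <- c("C1", "C2", "C3", "C4", "C5")
> rnames <- c("R1", "R2", "R3", "R4")
> myarray <- matrix(mydata, nrow = 4, ncol = 5, dimnames = list(rnames, cnames))
> myarray
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20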

Indeed, the two look the same, but every element of a matrix must have the same type, whereas a data frame can hold data of several types. The mix is not arbitrary: it is organized by column. Different columns may have different element types, but all elements within one column must share a type. Conceptually, then, a matrix can be seen as the special case of a data frame in which every column has the same type. Why does this matter? Statistical data usually combines several types. A simple report card, for example, contains character fields such as "name", "student number" and "subject", numeric fields such as "scores", and Boolean fields such as "pass". The data frame is therefore the more general structure, while matrices are used mostly for mathematical computation. With that said, let's create a data frame and see how it works:
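The difference in type handling can be seen in a small sketch. The objects `m` and `d` and their column names are hypothetical; stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
# A matrix holds one type: mixing numbers and strings coerces everything to character.
m <- cbind(id = c(1, 2), name = c("x", "y"))
typeof(m)

# A data frame keeps one type per column.
d <- data.frame(
  id = c(1, 2),
  name = c("x", "y"),
  pass = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
sapply(d, class)   # class of each column
```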

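> names <- c("Xiaoming", "Xiaohong", "Xiaolan")
> StudentID <- c("2014", "2015", "2016")
> subjects <- c("English", "English", "English")
> scores <- c(87, 98, 93)
> Result <- data.frame(StudentID, names, subjects, scores)
> Result
  StudentID    names subjects scores
1      2014 Xiaoming  English     87
2      2015 Xiaohong  English     98
3      2016  Xiaolan  English     93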

As you can see, when a data frame is given no row names, R numbers the rows from 1 by default, much like an Excel sheet. Next, the basic operations on a data frame.

Accessing data frame elements: a data frame can be indexed by row and column in the same way as a matrix, and it can also be indexed by column alone. dataframe[1, ] returns the first row and dataframe[, 1] returns the first column; in the latter case the values come back as a plain vector and are printed across a single line. dataframe[1] or dataframe["column name"] also selects a column, but returns it as a one-column data frame. Several rows or columns can be selected at once, as the code shows:

> Result[1, ]   # first row
  StudentID    names subjects scores
1      2014 Xiaoming  English     87
> Result[, 1]   # first column, as a vector (a factor here, because the strings were converted to factors)
[1] 2014 2015 2016
Levels: 2014 2015 2016
> Result[1]   # first column, as a data frame
  StudentID
1      2014
2      2015
3      2016
> Result["names"]   # the named column
     names
1 Xiaoming
2 Xiaohong
3  Xiaolan
> Result[1:3, ]   # rows 1 to 3
  StudentID    names subjects scores
1      2014 Xiaoming  English     87
2      2015 Xiaohong  English     98
3      2016  Xiaolan  English     93
> Result[1:3]   # columns 1 to 3
  StudentID    names subjects
1      2014 Xiaoming  English
2      2015 Xiaohong  English
3      2016  Xiaolan  English
> Result[c(1, 3), ]   # rows 1 and 3 only; note the c()
  StudentID    names subjects scores
1      2014 Xiaoming  English     87
3      2016  Xiaolan  English     93
> Result[c(1, 4)]   # columns 1 and 4 only
  StudentID scores
1      2014     87
2      2015     98
3      2016     93
> Result[c("names", "scores")]   # the names and scores columns only; note the c()
     names scores
1 Xiaoming     87
2 Xiaohong     98
3  Xiaolan     93
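Single cells and plain-vector columns can be reached with the same bracket syntax. The sketch below rebuilds the report-card data so that it runs on its own; the variable names `cell`, `col_vec` and `col_df` are hypothetical.

```r
# Rebuild the example data so the sketch is self-contained.
Result <- data.frame(
  StudentID = c("2014", "2015", "2016"),
  names = c("Xiaoming", "Xiaohong", "Xiaolan"),
  subjects = c("English", "English", "English"),
  scores = c(87, 98, 93),
  stringsAsFactors = FALSE
)

cell <- Result[2, "scores"]     # a single element: row 2, column "scores"
col_vec <- Result[["scores"]]   # a whole column as a plain vector
col_df <- Result["scores"]      # the same column as a one-column data frame
```

The distinction between `[[ ]]` and `[ ]` matters whenever the result is passed to a function that expects a vector.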

From the above: to select several rows or columns of a data frame, pass a vector of indices or names built with c(). Every access so far spells out the data frame together with a row or column name, which can become cumbersome. To compute the average score, for instance, the data frame name has to be repeated alongside the column name each time. So how can the columns of a data frame be used without repeating its name?

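Method 1: use the attach() and detach() functions. For example, to print all the names and scores, write the following. The masking messages appear because vectors with the same names already exist in the global environment, and because Result had already been attached once; in that case the global objects take precedence.

> attach(Result)
The following objects are masked _by_ .GlobalEnv:

    names, scores, StudentID, subjects

The following objects are masked from Result (pos = 3):

    names, scores, StudentID, subjects

> name <- names
> score <- scores
> detach(Result)
> name
[1] "Xiaoming" "Xiaohong" "Xiaolan"
> score
[1] 87 98 93
> mean(score)
[1] 92.66667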

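Method 2: use the with() function. Assignments made inside with() are local to the call, so <<- is used here to keep the result afterwards:

> with(Result, {
+   score <<- scores
+ })
> score
[1] 87 98 93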
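with() also returns the value of its expression, so the result can be assigned directly without touching the global environment from inside the call. A minimal sketch, with the report-card data rebuilt so that it runs on its own; `avg` and `top` are hypothetical names.

```r
# Rebuild the relevant columns of the example data.
Result <- data.frame(
  names = c("Xiaoming", "Xiaohong", "Xiaolan"),
  scores = c(87, 98, 93),
  stringsAsFactors = FALSE
)

# Column names are visible inside with(); the value of the expression is returned.
avg <- with(Result, mean(scores))
top <- with(Result, names[which.max(scores)])   # name of the highest scorer
```

Compared with attach(), this keeps the column names out of the search path, so there is nothing to detach and nothing to be masked.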

So much for creating and reading a data frame. What about adding or deleting a column?

> Result$age <- c(12, 14, 13)   # add an age column
> Result
  StudentID    names subjects scores age
1      2014 Xiaoming  English     87  12
2      2015 Xiaohong  English     98  14
3      2016  Xiaolan  English     93  13
> Result2 <- Result[-2]   # drop the second column (names)
> Result2
  StudentID subjects scores age
1      2014  English     87  12
2      2015  English     98  14
3      2016  English     93  13
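Columns can also be dropped by name rather than by position, which is less fragile when the column order changes. A small sketch under that assumption; the data is rebuilt here and `no_names` is a hypothetical variable name.

```r
# Rebuild part of the example data.
Result <- data.frame(
  StudentID = c("2014", "2015", "2016"),
  names = c("Xiaoming", "Xiaohong", "Xiaolan"),
  scores = c(87, 98, 93),
  stringsAsFactors = FALSE
)

Result$age <- c(12, 14, 13)   # add a column

# Drop a column by name, producing a copy; drop = FALSE keeps a data frame.
no_names <- Result[, names(Result) != "names", drop = FALSE]

Result$age <- NULL            # remove a column in place
```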

What about retrieving the records of students whose score equals 98?

> Result[which(Result$scores == 98), ]
  StudentID    names subjects scores age
2      2015 Xiaohong  English     98  14
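The same kind of row filter can be written with a logical index or with subset(). This is a sketch of those two alternatives, not the article's own code; the data is rebuilt and `hits1` and `hits2` are hypothetical names.

```r
# Rebuild part of the example data.
Result <- data.frame(
  names = c("Xiaoming", "Xiaohong", "Xiaolan"),
  scores = c(87, 98, 93),
  stringsAsFactors = FALSE
)

# A logical vector selects the rows where the condition is TRUE.
hits1 <- Result[Result$scores == 98, ]

# subset() evaluates the condition inside the data frame and can pick columns too.
hits2 <- subset(Result, scores >= 90, select = c(names, scores))
```

One difference worth knowing: which() drops missing values from the comparison, while a bare logical index returns a row of NAs for each NA in the condition.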

As noted above, matrices and data frames are two different data types, and R can convert between types. is.***() tests whether a variable is of type *** and as.***() converts a variable to type ***. Accordingly, a matrix is converted to a data frame like this:

> myarray
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20
> myarrayframe <- as.data.frame(myarray)
> myarrayframe
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20
> is.data.frame(myarray)
[1] FALSE
> is.data.frame(myarrayframe)
[1] TRUE

As with matrices, rbind() and cbind() also work on data frames, in much the same way. Interested readers can try them out; they are not covered in detail here.
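For readers who want a starting point, here is a minimal sketch of both functions on data frames. The data frames `a` and `b` and their columns are hypothetical.

```r
# Two small data frames with the same columns.
a <- data.frame(id = 1:2, score = c(87, 98))
b <- data.frame(id = 3L, score = 93)

# rbind() stacks rows; the column names must match.
stacked <- rbind(a, b)

# cbind() adds columns; the number of rows must match.
widened <- cbind(stacked, age = c(12, 14, 13))
```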

Finally, let's look at data processing with data frames:

As mentioned above, dataframe[column number] or dataframe["column name"] reads a column of a data frame, and the return value is still a data frame. That makes it inconvenient to apply the summation and averaging functions discussed earlier directly: mean() expects a numeric or logical vector, and a one-column data frame, which carries row and column names around its values, is not one. One might ask why not simply create the data frame without row and column names. First, a data frame is always given row and column names by default; second, without them there would be little point in building a data frame at all.

> mydataframe
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20
> mydataframe["C4"]
   C4
R1 13
R2 14
R3 15
R4 16
> mean(mydataframe["C4"])
[1] NA
Warning message:
In mean.default(mydataframe["C4"]) :
  argument is not numeric or logical: returning NA
> is.data.frame(mydataframe["C4"])
[1] TRUE

Method 1: convert the data frame back to a matrix, pick out the data to be processed by matrix indexing, and apply the usual matrix or vector functions.

> myarray2 <- as.matrix(mydataframe)
> is.matrix(myarray2)
[1] TRUE
> myarray2
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20
> x <- myarray2[, 3]
> x
R1 R2 R3 R4 
 9 10 11 12 
> is.vector(x)   # check whether x is a vector
[1] TRUE
> mean(x)
[1] 10.5
> sum(x)
[1] 42

Method 2: read the column with dataframe$columnname instead; the return value is a vector. ($ works with column names only, not row names.)

> c <- mydataframe$C3
> c
[1]  9 10 11 12
> is.vector(c)
[1] TRUE
> mean(c)
[1] 10.5
> sum(c)
[1] 42
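Two further ways of getting a column as a vector, plus a shortcut for all columns at once, are sketched below. The data frame is rebuilt so the block runs on its own; `m1`, `m2` and `all_means` are hypothetical names.

```r
# Rebuild the example data frame.
mydataframe <- data.frame(
  C1 = 1:4, C2 = 5:8, C3 = 9:12, C4 = 13:16, C5 = 17:20,
  row.names = c("R1", "R2", "R3", "R4")
)

m1 <- mean(mydataframe[["C4"]])      # [[ ]] returns the column as a vector
m2 <- mean(mydataframe[, "C4"])      # so does [ , j] with a single column
all_means <- colMeans(mydataframe)   # the mean of every column at once
```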

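You can also use dataframe$newcolumnname to add computed columns to the data frame:

> mydataframe$sum <- mydataframe$C1 + mydataframe$C4
> mydataframe$mean <- (mydataframe$C1 + mydataframe$C4) / 2
> mydataframe
   C1 C2 C3 C4 C5 sum mean
R1  1  5  9 13 17  14    7
R2  2  6 10 14 18  16    8
R3  3  7 11 15 19  18    9
R4  4  8 12 16 20  20   10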
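When the new column is a sum or mean across several existing columns, rowSums() and rowMeans() avoid spelling out each term. This is a sketch of that alternative on a hypothetical two-column data frame `df`, not the article's own code.

```r
# Hypothetical data frame with two numeric columns.
df <- data.frame(C1 = 1:4, C3 = 9:12)

cols <- c("C1", "C3")             # the columns to combine
df$sum <- rowSums(df[cols])       # row-wise sum of the selected columns
df$mean <- rowMeans(df[cols])     # row-wise mean of the same columns
```

Selecting the columns by name before each call keeps the newly added `sum` column out of the mean.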

The most recommended approach is the next one, which uses the transform() function to build a new data frame directly. It is used as follows:

> x1 <- mydataframe$C1
> x2 <- mydataframe$C3
> mydataframe2 <- transform(mydataframe, sum2 = x1 + x2, mean2 = (x1 + x2) / 2)
> mydataframe2
   C1 C2 C3 C4 C5 sum mean sum2 mean2
R1  1  5  9 13 17  14    7   10     5
R2  2  6 10 14 18  16    8   12     6
R3  3  7 11 15 19  18    9   14     7
R4  4  8 12 16 20  20   10   16     8

That covers the basics of using data frames in R. As always, the details are best confirmed by trying them out in practice.
