How to make a column chart in R

2025-02-22 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article explains how to make column (bar) charts in R. Many people get stuck on this in practice, so let's walk through how to handle the common cases. I hope you read it carefully and get something out of it.

In fact, R itself has a variety of plotting functions, such as plot, barplot, and pie, and their syntax is simple and clear. So why use ggplot2, a plotting package with its own self-contained grammar?

You can feel it with an example:

plot(mpg$cty, mpg$hwy)  # scatter plot function built into R (plot itself needs no extra package; the mpg data ships with ggplot2)

ggplot(mpg, aes(cty, hwy)) + geom_point(colour = "steelblue") + labs(x = "City mpg", y = "Highway")  # ggplot function from the ggplot2 package (load it first with library(ggplot2))

This is a very simple example: the two charts show the same variables in the same chart form, and there is almost no difference in accuracy.

But even two dishes with the same taste can affect appetite differently depending on how they look. The two charts compare like a Nokia phone and an iPhone: there is little difference in function, but the look sets them far apart.

That is why, when I had just started with R and was still struggling through the various built-in chart functions, I saw that the experts were already using ggplot and immediately decided to start with ggplot myself.

Today I will introduce column charts in ggplot. It is a large family: single-series, clustered, stacked, percentage stacked, and faceted column charts.

Strictly speaking, R's plotting functions make no strict distinction between column charts and bar charts, because both express the same data types and information in both form and function. They share a common name: barplot.

Converting between the two usually takes only one additional call:

coord_flip()
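As a minimal sketch of that conversion, assuming ggplot2 is installed (the object names p_column and p_bar are only illustrative), the same layers can be drawn both ways:

```r
library(ggplot2)

# Vertical column chart: number of cars in each class
p_column <- ggplot(mpg, aes(class)) +
  geom_bar(fill = "steelblue")

# Same layers, with the axes swapped to give a horizontal bar chart
p_bar <- p_column + coord_flip()

print(p_column)
print(p_bar)
```

Nothing else in the chart definition changes; coord_flip() only swaps the x and y axes.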

Today, let's introduce the column chart:

For the time being, use the dataset mpg built into the ggplot2 package.

The first six records of the dataset can be viewed with head(mpg), the variable types with str(mpg), and a simple statistical summary with summary(mpg).
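Put together, the inspection steps look roughly like this (a small sketch; the last two lines are an addition that checks the types of the two variables used most in the charts below):

```r
library(ggplot2)  # provides the mpg dataset

head(mpg)     # first six records
str(mpg)      # variable types
summary(mpg)  # simple statistical summary

# class is a character (categorical) variable, year is an integer
class(mpg$class)
class(mpg$year)
```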

Single-series column chart:

ggplot(mpg, aes(class, displ)) + geom_bar(stat = "identity", fill = "steelblue")

In the code above, mpg is the name of the dataset, and the arguments in aes() are the x value, class (a categorical variable), and the y value, displ (a continuous variable).

geom_bar adds a bar layer on top of the ggplot coordinate system. stat is the statistical transformation applied to the data (the default is count; "identity" uses the y values as they are). fill sets the fill color: it can be mapped to a categorical variable or set directly to a color.

ggplot(mpg, aes(reorder(class, displ), displ)) + geom_bar(stat = "identity", fill = "steelblue")  # same chart, with the classes reordered by displ

This is the simplest single-series column chart. There are many more parameters to adjust, which I will not explain in detail here. If you are interested, see "ggplot2: Data Analysis and Graphic Art" (the Chinese edition of "ggplot2: Elegant Graphics for Data Analysis"), written by the author of the package.
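One detail worth knowing about the reordered chart: reorder() sorts by the mean of displ within each class by default, while the bars drawn with stat = "identity" stack every row and therefore show the sum. The sketch below is a variation, not the code used above. It passes FUN = sum so that the order follows the bar heights, and uses geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"). The name p_sorted and the axis labels are illustrative.

```r
library(ggplot2)

# With several rows per class the values are stacked, so each bar shows
# the total displ of that class; FUN = sum sorts classes by the same total.
p_sorted <- ggplot(mpg, aes(reorder(class, displ, FUN = sum), displ)) +
  geom_col(fill = "steelblue") +
  labs(x = "class", y = "total displ")

print(p_sorted)
```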

Multi-series clustered column chart:

with(mpg, table(class, year))

The summary shows the cross-tabulation of class and year. These two variables are used below to build a series of clustered column charts.

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "identity")  # class on the x axis, one series per year

Because year is an integer variable, it has to be converted with factor() before it is mapped to fill. The chart above shows the two series with no further settings: the two series sit at the same positions and overlap, so the actual height of the 1999 columns cannot be seen clearly.
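An alternative to calling factor() inside every aes() is to convert the column once in a copy of the data. The following is a small sketch of that variation; mpg_f and p_overlap are illustrative names, not part of the original example.

```r
library(ggplot2)

class(mpg$year)  # "integer": a continuous variable, so it is not used for grouping

# Convert once in a copy of the data instead of inside each aes() call
mpg_f <- transform(mpg, year = factor(year))
levels(mpg_f$year)  # "1999" "2008"

# Same overlapping chart as above, now with a discrete fill scale
p_overlap <- ggplot(mpg_f, aes(class, fill = year)) +
  geom_bar(position = "identity")

print(p_overlap)
```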

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "identity", alpha = 0.3)

Even with the transparency of the columns set through the alpha parameter, it is still hard to tell the 1999 columns from the 2008 columns. What we want is for the 1999 and 2008 columns to sit side by side rather than overlap, so the position parameter needs to be adjusted.

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "dodge")

After setting the position parameter to dodge we get the desired effect: the two series sit side by side, and the height of each can be seen clearly.

Of course, we can also stack the two series.

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "stack")

Setting the position parameter to stack places the two years on top of each other, which also lets us compare them.

If we want to see the share of each of the two years within each category, we can again change the position parameter.

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "fill")

This gives the share of each year within each category. If you look closely, you may find (depending on the ggplot2 version) that the color order in the legend is the reverse of the color order in the chart. There are pitfalls everywhere.

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "stack") + guides(fill = guide_legend(reverse = TRUE))

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "fill") + guides(fill = guide_legend(reverse = TRUE))

Reversing the display order of the legend makes the color order in the legend consistent with that in the chart.
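The shares drawn by position = "fill" can also be checked numerically. The sketch below builds on the cross-tabulation shown earlier and uses base R's prop.table(). The percent axis labels come from scales::percent, which assumes the scales package is available (it is installed as a dependency of ggplot2). The names counts, shares, and p_share are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

counts <- with(mpg, table(class, year))   # cars per class and year
shares <- prop.table(counts, margin = 1)  # each row (class) sums to 1
round(shares, 2)

# Percentage stacked chart with the y axis labelled in percent
p_share <- ggplot(mpg, aes(class, fill = factor(year))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "share", fill = "year")

print(p_share)
```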

The last type is the faceted column chart:

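ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "fill") + facet_grid(. ~ drv)  # panels arranged in columns

ggplot(data = mpg, aes(class, fill = factor(year))) + geom_bar(position = "fill") + facet_grid(drv ~ .)  # panels arranged in rows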

In addition, we can apply an existing theme and refine the detailed elements of the chart (legend, axis labels, data labels, column spacing, background, color theme, and so on). These details have many dedicated parameters; for a thorough understanding it is best to read Hadley's monograph.

With the facet function facet_grid, we can draw one panel for each level of a categorical variable.
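As a rough sketch of how facets and theme adjustments combine, the example below uses one of ggplot2's built-in themes, theme_minimal(), plus two common theme() tweaks. The specific choices (legend at the bottom, rotated x labels, the name p_facet) are illustrative, not recommendations from the original text.

```r
library(ggplot2)

p_facet <- ggplot(mpg, aes(class, fill = factor(year))) +
  geom_bar(position = "dodge") +
  facet_grid(drv ~ .) +                          # one row of panels per drive type
  labs(x = "class", y = "count", fill = "year") +
  theme_minimal() +                              # apply a built-in theme
  theme(legend.position = "bottom",              # move the legend below the panels
        axis.text.x = element_text(angle = 45, hjust = 1))  # rotate x labels

print(p_facet)
```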

However, since most people have done more charting in Excel, note that charting in R is quite different from charting in Excel:

Excel draws charts from summarized wide data (also the only format that Office recognizes).

R, however, follows the charting rules of standard data sources (long data, that is, data in database-style format).

This large difference in the data formats that charts expect is often a major reason beginners stumble and get confused by R charts. (I am also a beginner.)

Therefore, if you want to do visualization in R, you must get used to long data as the standard storage format and understand how variable types affect chart rendering.

To adapt to charting in R, I think there are two approaches:

1. Suppose you are fully immersed in, or cannot break away from, Excel's wide-data way of charting, so the datasets you import are usually in wide format. Then you need to be proficient with R's data reshaping packages (dplyr, tidyr, reshape2, and so on) to reshape wide data into the long format that R charting expects. The advantage is that you gradually get used to R's way of organizing chart data; the cost is that you have to learn many additional reshaping tools and functions.

2. If you already understand long data well (for example, you often use statistical software, most of which works with standard long data, that is, one-dimensional tables), you can convert wide data to long data (two-dimensional to one-dimensional) directly in Excel, or import long data from a database straight into R, and only make some basic settings. At least you will not waste much time and energy converting between long and wide formats.
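For the first approach, the reshaping step might look like the sketch below. It assumes the tidyr package is installed and uses a hypothetical wide table whose values are made up for illustration; the names wide, long, and p_long are likewise illustrative.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide table, the shape typically used for Excel charts:
# one row per category, one column per series (values are made up).
wide <- data.frame(
  class  = c("compact", "midsize", "suv"),
  `1999` = c(25, 20, 29),
  `2008` = c(22, 21, 33),
  check.names = FALSE
)

# Reshape to long format: one row per class-year combination
long <- pivot_longer(wide, cols = c("1999", "2008"),
                     names_to = "year", values_to = "count")

# The long table maps directly onto aesthetics: x, y and fill
p_long <- ggplot(long, aes(class, count, fill = year)) +
  geom_col(position = "dodge")

print(p_long)
```

After reshaping, year is a single column, so it can be mapped to fill just like in the mpg examples above.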

I recommend the second approach, because Excel is not a standard visualization tool. Its capabilities should not be underestimated, but because it also has to serve as office software for aggregating data, it imposes few constraints on how data is stored and is very flexible. To fit this, the chart engine developed by Microsoft's engineers uses aggregated two-dimensional tables as its charting data. As a result, one-dimensional tables (long data) freshly exported from a database are in many cases not suitable for charting directly in Excel.

Professional statistical tools such as EViews, SPSS, Stata, R, and Python, and even data visualization software such as Tableau and Power BI, expect long data by default. Variables are formatted before the data is imported, although some of these tools also provide functions for converting between long and wide data.
