Shulou (Shulou.com), SLTechnology News&Howtos, Servers. 06/01 report; updated 2025-03-31.
This article explains why the R language is worth learning. Many people are not very familiar with it, so the editor has summarized the main points below in the hope that readers will take something away from them.
R is an excellent open source software environment that integrates data manipulation, statistics and visualization. It offers efficient data processing and storage, is good at matrix operations, provides a large number of tools for data analysis, and supports many forms of graphical output. One of its advantages is that analysts can describe a processing pipeline in a simple programming language and thereby build powerful analysis functions. R is also highly extensible: researchers in open source communities around the world contribute a rich variety of packages. Because R can combine many mining algorithms, it simplifies the data analysis process and is well suited to data mining.
The past and present of the R language
R is a dialect of the S language, which is widely used in statistics and dates from around 1980. S is an interpreted language developed at AT&T Bell Laboratories for data exploration, statistical analysis and graphics. Its main early implementation was S-PLUS, a commercial product based on S and further improved by the Statistical Sciences division of MathSoft. Later, Robert Gentleman and Ross Ihaka of the University of Auckland in New Zealand, together with other volunteers, developed the R system, whose development is the responsibility of the R Development Core Team. R can therefore be regarded as an implementation of the S language. Because S is also the foundation of S-PLUS, the syntax of the two is almost identical, with only slight differences in some functions; programs port easily from one to the other, and many S programs run in R with slight modification.
Why R has earned its place
(1) Free and open source
Many data mining tools exist today. Mainstream commercial tools such as Unica, SAS Enterprise Miner (SAS/EM), Insightful Miner, IBM Intelligent Miner (IBM IM), Matlab and SPSS target general mining problems and offer complete functionality and good performance, but they generally suffer from limited extensibility and high cost. Open source software, including Weka, YALE, KNIME, Orange and R, can effectively overcome these shortcomings. As described above, R combines data manipulation, statistics and visualization with a rich set of community-contributed packages, which makes it well suited to data mining.
(2) Easy to learn and convenient
For many people the first programming language is C, because it pays attention to detail and trains programming thinking. Many researchers, however, focus on theory and ideas, and implementing in C an algorithm they have worked hard to derive is close to impractical for them. Is there a programming language that frees data scientists from onerous programming and lets them concentrate on theoretical research? Matlab was an early answer, with its powerful vectorization and matrix computation. If Matlab addressed the problem, the language that took the solution to its peak is R. Anyone who has studied R knows that it is easy to learn, interpreted, and easy to read and understand. With the datasets and models built into its packages, a few statements are sometimes enough to go from data to model construction to visual output of the results, which greatly helps the work of data scientists. After learning the basics, users can also write function modules better suited to their needs on top of existing packages, which reflects R's strong extensibility.
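As a rough illustration of the "data to model to plot" path described above, the sketch below uses only base R and the built-in mtcars dataset. The choice of dataset and of the variables mpg and wt is an assumption made for illustration; the article does not name a specific example.

```r
# Minimal sketch: built-in data -> model -> plot, using only base R.
# The dataset (mtcars) and variables (mpg, wt) are illustrative assumptions.
data(mtcars)

# Fit a simple linear regression of fuel economy on vehicle weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Visualize the data and overlay the fitted regression line.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mtcars: mpg vs. weight")
abline(fit, col = "red")
```

The whole analysis fits in a handful of statements, which is the convenience the paragraph above refers to.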
(3) Powerful functions
Like Matlab, R is a vectorized programming language, and it forms a complete system for data processing, computation and graphics. Its features include: a data storage and processing system; array operation tools (especially strong for vector and matrix operations); a complete and coherent set of statistical analysis tools; excellent statistical graphics; and a simple yet powerful programming language that handles data input and output and supports branches, loops and user-defined functions.
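The language features listed above can be sketched in a few lines of base R. The values and the helper name count_above are hypothetical and chosen only for illustration.

```r
# Vectorized arithmetic: the operation applies element-wise, no explicit loop.
x <- c(1, 2, 3, 4)
squares <- x^2                 # 1 4 9 16

# Matrix operations: %*% is matrix multiplication, t() is the transpose.
m <- matrix(1:4, nrow = 2)     # filled column-wise: rows are (1, 3) and (2, 4)
prod_m <- m %*% t(m)

# A user-defined function with a loop and a branch.
# count_above is a hypothetical helper name used for illustration.
count_above <- function(values, threshold) {
  n <- 0
  for (v in values) {
    if (v > threshold) {
      n <- n + 1
    }
  }
  n
}

count_above(squares, 5)        # equivalent to sum(squares > 5)
```

In everyday R code the vectorized form sum(squares > 5) would usually be preferred over the explicit loop; the loop is shown only to illustrate the control structures.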
R language popularity index
The R programming language is widely used in statistics and the sciences, and is described as holding a leading position in the field of cloud computing. In RedMonk's recently released ranking of programming languages, R was ranked 13th. In IEEE Spectrum's ranking of the most popular programming languages, R ranks third among data languages. In the TIOBE Software index, R became the 18th most popular programming language in January, up from 44th a year earlier. PYPL (the PopularitY of Programming Language index) is based on how often language tutorials are searched for on Google; by this measure of global search popularity, R ranks ninth.
Current applications of the R language
Medicine
One kind of analysis is the meta-analysis of survival data. Survival analysis is a family of statistical methods that considers patients' outcomes together with their survival times.
The R packages mainly used for meta-analysis are meta, rmeta and metafor. They can analyze binary data, continuous data, correlation coefficients, survival data and more. Meta-analyses of survival data are becoming increasingly common, and once indicators such as the hazard ratio (HR) and its 95% CI have been obtained, calculating and pooling the HRs is the key step. After extracting from the original literature the hazard ratio and its 95% confidence interval, or the difference between the observed and expected number of events (O - E) and its standard deviation, R can be used to calculate the pooled HR and, from it, the survival rates of patients of different ages. RevMan is simple and easy to learn but has some limitations; R is powerful, flexible and versatile and can draw a wide variety of plots, but it requires some programming.
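To make the pooling step concrete, here is a minimal base R sketch of fixed-effect inverse-variance pooling of hazard ratios. This is one common pooling approach, swapped in for illustration; it is not claimed to be the method of any study the article has in mind, nor the internal procedure of the meta, rmeta or metafor packages, which would normally be used in practice. The three studies and their numbers are hypothetical.

```r
# Hypothetical study-level results (NOT real data): hazard ratios with 95% CIs.
studies <- data.frame(
  study = c("A", "B", "C"),
  hr    = c(0.80, 0.65, 0.90),
  lower = c(0.60, 0.45, 0.70),
  upper = c(1.07, 0.94, 1.16)
)

# Work on the log scale, where the confidence interval is symmetric.
log_hr <- log(studies$hr)
z      <- qnorm(0.975)                                  # about 1.96
se     <- (log(studies$upper) - log(studies$lower)) / (2 * z)

# Fixed-effect inverse-variance weights and the pooled log hazard ratio.
w         <- 1 / se^2
pooled    <- sum(w * log_hr) / sum(w)
pooled_se <- sqrt(1 / sum(w))

# Back-transform the pooled estimate and its 95% CI to the HR scale.
result <- c(
  HR    = exp(pooled),
  lower = exp(pooled - z * pooled_se),
  upper = exp(pooled + z * pooled_se)
)
print(round(result, 2))
```

Studies with narrower confidence intervals receive larger weights, so the pooled HR is pulled toward the more precise studies.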
Data mining
The main mining steps are:
(1) Task definition. Determine the mining task through analysis, and describe the task accurately and succinctly.
(2) Data preparation. Data mining operates on data, so it requires data acquisition, data extraction and data transformation (for example, word vectorization in text mining).
(3) Mining and modeling. Select a model suited to the mining task to describe the data objects.
(4) Model evaluation. Evaluate the modeling results against the actual background and significance of the problem, and propose a reasonable solution where needed.
Open source R integrates a wide range of data analysis and visualization methods, has powerful analysis functions and good extensibility, and is well suited to data mining. For example, a data mining case on a city's main economic indicators can be used to show how R applies at each main stage of the process: data extraction, data selection and statistical analysis in the data preparation stage; typical clustering and classification applications in the modeling stage; and decision tree evaluation in the model evaluation stage. The concise R scripts and the good analysis results illustrate the basic characteristics of R and its advantages in data mining.
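The preparation, modeling and evaluation stages above can be sketched with base R alone. The city economic indicator data mentioned in the text is not available here, so the built-in iris dataset is used as a stand-in, and the choice of k-means with three clusters is an assumption made for illustration.

```r
# Data preparation: take the numeric columns of the built-in iris data
# (a stand-in for the article's economic indicators) and standardize them.
data(iris)
features <- scale(iris[, 1:4])

# Mining and modeling: k-means clustering with k = 3
# (an assumption that matches the three species in iris).
set.seed(42)   # k-means uses random starting points; fix the seed for repeatability
fit <- kmeans(features, centers = 3, nstart = 20)

# Model evaluation: cross-tabulate the clusters against the known species labels.
confusion <- table(cluster = fit$cluster, species = iris$Species)
print(confusion)
```

A classification model such as a decision tree would follow the same three stages, with a held-out test set used in the evaluation step.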
Teaching experiments
This is another consequence of R being free. As everyone knows, a copy of Microsoft Office is still expensive; many students only occasionally use it for writing, so should they have to spend hundreds of yuan on it? The same reasoning applies to teaching. The course on experimental design and data processing is a compulsory basic course for all engineering majors and a highly theoretical, applied and practical methodological discipline. As a branch of research methodology in the natural sciences, it draws on probability theory, mathematical statistics, professional and technical knowledge and practical experience to give basic training to students who will work in scientific research, engineering experiments and engineering design, and to cultivate their ability to choose sound research and test plans and to process data correctly. At present, almost all experimental design and data analysis is done in software, and commercial packages such as SAS, SPSS and Matlab are widely used. Because such software is expensive and is needed only temporarily in teaching experiments, using R in the teaching of experimental design and data processing saves cost without losing sight of the course's original purpose, and it is of great significance for cultivating students' research, innovation and practical abilities.
E-business
With the development of e-commerce, distribution centers face higher requirements for service, speed, low cost and flexibility, and coordinating demand and inventory through order-inventory analysis has become more important. Since the "Singles Day" promotion caused a huge sensation in 2011, promotion dates such as "Singles Day", "Double Twelve", "618", and weekly and year-end sales events have been copied by the major e-commerce companies. Every year a series of concentrated promotions sets off one consumption boom after another, and online shopping promotions have gradually become routine. Besides efficient logistics, sufficient inventory is undoubtedly the backbone of customer satisfaction during the promotion season; yet sufficient inventory does not mean stockpiling far beyond capacity: the right amount is best. Routine promotions therefore call for accurate order demand forecasting. Given its efficiency, extensibility and ability to process massive data, R can be chosen to analyze the customer information at the front end of e-commerce companies, in order to reduce the huge cost caused by mismatches between inventory and demand at e-commerce logistics companies.
Sentiment analysis
Social media has become an important vehicle for people to express their feelings, and Weibo, as a widely used platform, has become an important channel for understanding them. Faced with huge and seemingly disorganized Weibo data, how to extract valuable information, analyze online public opinion and present it clearly has become an important research field. With R's powerful natural language processing packages, the process from model building to result visualization can be completed easily. Most existing sentiment analysis studies focus on sentiment polarity, lack a detailed description of the different kinds of emotion, and cannot show visually how the emotions of social groups change. To address this, one study proposes a sentiment analysis method that combines dependency syntax with manual tagging. The method uses three-dimensional facial expressions to show vividly how the emotions of social groups change and, for different social events, visualizes the emotions of Weibo users in different regions. The experimental results show that the model describes people's emotions effectively, and the research offers a new approach to the analysis of online public opinion based on big data.
The current state of the R language
(1) Microsoft acquires Revolution Analytics, a commercial R provider
Joseph Sirosh, vice president of machine learning at Microsoft, wrote on his blog that organizations need strong data analysis tools to support data-driven decisions in areas such as finance, manufacturing, health, retail and academic research, and that the R language can help employees fill the gaps in a company's data analysis. After the acquisition, Revolution Analytics said that it would continue to support open source R projects and to provide customers with subscription-based technical support services.
(2) Google publishes its internal R style guide
In September 2016, Google released an internal style guide for R consisting of 15 formatting rules. This shows that R is widely accepted within Google and that its use there may expand, which is why a unified guide was issued to standardize future code.
The R language is playing an increasingly important role in many fields because it is easy to learn, free and open source. Its rise is no flash in the pan, so there is reason to believe that a broad road lies ahead and that R's own "Cambrian era" has arrived, because both modern data science and social development need it.
After reading the above, do you have a better understanding of why the R language is worth learning? To learn more about related topics, follow the industry information channel. Thank you for your support.
© 2024 shulou.com SLNews company. All rights reserved.