Example Analysis of Apache Zeppelin Notebook and R

2025-01-30 Update From: SLTechnology News&Howtos

This article walks through an example of using R in an Apache Zeppelin notebook: building Zeppelin with an R interpreter from source, starting it, and running a short analysis.

Introduction

The purpose of this article is to help you get started with Apache Zeppelin Notebook for data science in R. Zeppelin is a web-based notebook for interactive data analysis. It makes it easy to produce documents that are data-driven, interactive, and collaborative, and it supports a variety of languages, including Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, and Shell. Zeppelin also lets you write your own language plug-ins, so it is easy to extend.

However, the latest official release, 0.5.0, does not support the R programming language. Fortunately, NFLabs started an open source project that makes it possible to provide an R interpreter. An interpreter is a Zeppelin plug-in that lets users run a particular language as the data-processing back end. For example, to use Scala code in Zeppelin, you need the Spark interpreter. So, if you, like me, want to integrate R into Zeppelin, this tutorial will show you how to set up Zeppelin with R by building from source.

Preparatory work

We will install Zeppelin on Linux using the Bash shell. If you are on Windows, I recommend installing the Cygwin terminal, which provides Linux-like functionality on Windows.

Make sure that Java 1.7 and Maven 3.2.x are installed and configured in the environment variables.
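A quick way to confirm both tools are reachable is sketched below. This is a minimal check, not part of the original tutorial: check_tool is a hypothetical helper name, and the script only reports what it finds on PATH rather than enforcing the Java 1.7 and Maven 3.2.x versions named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a tool is on PATH and, if so,
# print the first line of whatever the given command outputs.
check_tool() {
  local name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    echo "$name: found at $(command -v "$name")"
    # java prints its version on stderr, so merge stderr into stdout.
    "$name" "$@" 2>&1 | head -n 1
  else
    echo "$name: NOT found on PATH"
    return 1
  fi
}

check_tool java -version || true
check_tool mvn -version || true
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```

Compare the reported versions with the ones listed above before starting the build.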

Building Zeppelin from source code

Step 1: download the Zeppelin source code

Download the source code from the following GitHub branch (copy and paste the link into your browser): https://github.com/elbamos/incubator-zeppelin/tree/rinterpreter

In my example, I downloaded the archive and unzipped it into a folder on my desktop.
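If you prefer the command line to a browser download, the same branch can probably be fetched with git. The sketch below assumes git is installed and that the branch is still published. The helper names repo_of and branch_of are hypothetical, and the destination path is chosen only to match the folder used in this example. It prints the clone command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split a GitHub ".../tree/<branch>" URL
# into the repository URL and the branch name.
repo_of()   { printf '%s.git\n' "${1%%/tree/*}"; }
branch_of() { printf '%s\n' "${1##*/tree/}"; }

url="https://github.com/elbamos/incubator-zeppelin/tree/rinterpreter"
# Assumed destination, mirroring the folder used in this example.
dest="$HOME/Desktop/Apache/incubator-zeppelin-rinterpreter"

# Print the command instead of running it; drop the "echo" to clone for real.
echo git clone --depth 1 --branch "$(branch_of "$url")" "$(repo_of "$url")" "$dest"
```

The --depth 1 option fetches only the latest commit, which is enough for a one-off build.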

Step 2: build Zeppelin

Assuming you are installing on a single machine, open your terminal and run the following commands. Installing on a cluster is somewhat more involved; see the Zeppelin documentation for the specific steps.

$ cd Desktop/Apache/incubator-zeppelin-rinterpreter
$ mvn clean package -DskipTests

It will take about 16 minutes to build Zeppelin, Spark, and all the interpreters, including R, Markdown, Shell, Hive, and so on.
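Because the build runs for a while, it can help to keep its output in a log and check the result afterwards. The sketch below assumes the log file name build.log and a hypothetical helper, build_succeeded. It relies only on the "BUILD SUCCESS" banner that Maven prints at the end of a successful run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a Maven log contains the
# "BUILD SUCCESS" banner.
build_succeeded() {
  grep -q "BUILD SUCCESS" "$1"
}

# Typical use from the source directory (not run here):
#   time mvn clean package -DskipTests 2>&1 | tee build.log
#   if build_succeeded build.log; then echo "build OK"; else echo "build failed, see build.log"; fi
```

The time keyword also lets you compare your build duration with the figure quoted above.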

Step 3: start Zeppelin

Run the following command to start Zeppelin:

$ ./bin/zeppelin-daemon.sh start

Open a web browser and visit http://localhost:8080. At this point, you are ready to start creating interactive, code-driven notebooks in Zeppelin.
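The daemon may need a moment before the port answers. A small polling loop, sketched below, can confirm that the page is reachable before you open the browser. It assumes curl is installed. The function name wait_for_url and the default attempt count and delay are illustrative choices, not part of Zeppelin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP error responses.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempt(s)" >&2
  return 1
}

# Typical use after starting the daemon (not run here):
#   wait_for_url http://localhost:8080
```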

Interactive data science

Step 1: create a notebook

Click the drop-down arrow next to Notebook and click "Create new note".

Name your notebook, or keep the default name. I named mine "Base R in Apache Zeppelin".

Step 2: start your analysis

As shown in the following figure, you call R by starting a paragraph with "%spark.r" or "%spark.knitr". First, let's write a short introduction in Markdown.

Next, let's install the packages our analysis may need.

We will use the "flights" dataset, which records flights that departed New York in 2013. Now let's read the dataset.

Now, let's use dplyr (with the pipe operator) to do some data manipulation.

You can also use bar and pie charts to visualize some descriptive statistics.

Now, let's dance with ggplot2.

Now, let's do some statistical machine learning with the caret package.

Finally, draw some maps.
