The following big data material was compiled by Old Boy Education. Please credit the source when reprinting: http://www.oldboyedu.com
Hadoop
Hadoop is a distributed computing framework made up of four modules: Common, HDFS, MapReduce, and YARN.
Concurrency and parallelism
Concurrency usually refers to a single node's ability to respond to multiple requests at the same time; it is a measure of the computing capability of one node. Parallelism usually refers to multiple nodes working together in a distributed fashion, which we call parallel computing.
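As a rough, hedged illustration (a minimal Scala sketch, not from the original material): concurrency can be modelled with futures inside a single JVM, while parallelism in Spark means splitting the same work across executor nodes.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Concurrency: several requests handled by one node (one JVM),
// interleaved on a thread pool.
val requests = (1 to 4).map(i => Future { s"handled request $i" })
Await.result(Future.sequence(requests), 10.seconds).foreach(println)

// Parallelism: the same work split across the nodes of a cluster,
// e.g. an RDD whose partitions are processed by different executors
// (requires a SparkContext, so it is left commented out here):
// sc.parallelize(1 to 1000000, 8).map(_ * 2).count()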
Spark
Spark is a lightning-fast cluster computing engine: a fast, general-purpose engine for large-scale data processing that relies heavily on in-memory computation.
1. Speed
In-memory computation is more than 100 times faster than Hadoop MapReduce, and disk-based computation is more than 10 times faster. Spark uses an advanced DAG (Directed Acyclic Graph) execution engine.
2. Ease of use
Spark provides more than 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, and R shells (a short sketch follows this list).
3. Versatility
SQL, streaming, and complex analytics can be combined in the same application. Spark ships with a stack of libraries including Spark SQL, MLlib, GraphX, and Spark Streaming.
4. Architecture
These include Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX.
5. Runs everywhere
Spark can run on Hadoop, Mesos, standalone, or in the cloud, and it can access a variety of data sources such as HDFS, HBase, Hive, Cassandra, and S3.
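As mentioned under "Ease of use", here is a hedged sketch of what a few of those high-level operators look like in practice. The data and variable names are made up for illustration; sc and spark are the objects that spark-shell predefines.

// A few high-level operators, runnable in spark-shell:
val nums    = sc.parallelize(1 to 100)     // distribute a local range as an RDD
val evens   = nums.filter(_ % 2 == 0)      // transformation (lazy)
val doubled = evens.map(_ * 2)             // another lazy transformation
println(doubled.reduce(_ + _))             // action: builds and runs the DAG

// Libraries can be mixed in one program, e.g. Spark SQL on the same session:
val df = spark.range(1, 6).toDF("n")
df.createOrReplaceTempView("t")
spark.sql("SELECT sum(n) AS total FROM t").show()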
Spark cluster deployment modes
1. local
No Spark processes need to be started; a single JVM runs all of Spark's components. This mode is mainly used for debugging and testing.
2. standalone
In standalone mode you install a Spark cluster and start the master node and the worker nodes separately. The master is the management node and the workers are the nodes that execute tasks.
3. yarn
There is no need to deploy a separate Spark cluster; strictly speaking, there is no Spark cluster at all in this mode.
Hadoop's job-submission and scheduling process is used end to end, and Spark's own task execution only kicks in when the tasks actually start. In effect, the Spark application runs as a Hadoop job: all of Spark's jar packages are shipped as the job's dependencies, and the whole flow follows Hadoop's execution process.
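To make the three modes concrete, here is a hedged sketch of how the master URL is chosen when building a SparkSession; the application name, host name, and port below are placeholders, not values from this article.

import org.apache.spark.sql.SparkSession

// local: one JVM runs the driver and executors, good for debugging/testing.
val spark = SparkSession.builder()
  .appName("deploy-mode-demo")
  .master("local[*]")
  .getOrCreate()

// standalone: point the application at the cluster's master process
// (host and port are placeholders):
//   .master("spark://master-host:7077")

// yarn: no separate Spark cluster; YARN allocates the containers and
// Spark's jars travel with the job as dependencies:
//   .master("yarn")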
Install Spark
1. Download spark-2.3.0-bin-hadoop2.7.tgz
The following is the official download address of Spark:
https://www.apache.org/dyn/closer.lua/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
2. Extract the file to the /soft directory
$> tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz -C /soft
3. Create a symbolic link
A symbolic link makes it convenient to edit the various configuration files and to upgrade or swap versions later.
$> cd /soft
$> ln -s spark-2.3.0-bin-hadoop2.7 spark
4. Configure environment variables
Edit the /etc/profile environment variable file:
$> sudo nano /etc/profile
Add the following at the end of the file:
...
SPARK_HOME=/soft/spark
PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Note: both Spark's bin and sbin directories are added to PATH; Linux uses ":" as the path delimiter.
5. Make the environment variables take effect
$> source /etc/profile
6. Enter the spark-shell command line
$> /soft/spark/bin/spark-shell
# this drops you into the Scala prompt
scala>
7. Try out spark-shell
Because spark-shell is built on the Scala REPL, ordinary Scala expressions work exactly as they do in Scala.
scala> 1 + 1
# the shell prints the result, e.g. res0: Int = 2
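A slightly longer, hedged sample session (the res names and their numbering depend on your session; the values shown are simply what these expressions evaluate to):

scala> val rdd = sc.parallelize(1 to 100)
scala> rdd.sum()
res1: Double = 5050.0

scala> rdd.filter(_ % 2 == 0).count()
res2: Long = 50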