Shulou(Shulou.com)06/01 Report--
This article introduces the two core technologies of big data: Hadoop and Spark. Many people have questions about what these two core technologies are, so the editor has consulted a range of materials and put together a simple, easy-to-follow explanation. Hopefully it will help resolve those doubts. Please follow along and study with the editor!
What is Hadoop?
Hadoop started as a Yahoo project in 2006 and has since grown into a top-level Apache open-source project. It is a general-purpose distributed system infrastructure with several components: the Hadoop Distributed File System (HDFS), which stores files in a Hadoop-native format and parallelizes them across a cluster; YARN, a scheduler that coordinates application runtimes; and MapReduce, the algorithm that actually processes the data in parallel. Hadoop is built in the Java programming language, but applications that run on it can be written in other languages as well; through a Thrift client, for example, users can write MapReduce code in Python.
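To make the MapReduce model concrete, here is a minimal word-count example in Python. It is only a sketch: it uses Hadoop Streaming (which pipes records through stdin/stdout) rather than the Thrift client mentioned above, and the file names mapper.py and reducer.py are chosen purely for illustration.

#!/usr/bin/env python3
# mapper.py -- emits "word<TAB>1" for every word on every input line.
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

#!/usr/bin/env python3
# reducer.py -- sums counts per word; Hadoop Streaming delivers the
# mapper output sorted by key, so identical words arrive consecutively.
import sys
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))

In a typical setup, both scripts would be submitted with the hadoop-streaming JAR that ships with a Hadoop installation, passed as the -mapper and -reducer arguments.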
Beyond these core components, Hadoop includes Sqoop, which moves relational data into HDFS; Hive, a SQL-like interface that lets users run queries against data in HDFS; and Mahout, for machine learning. In addition to using HDFS for file storage, Hadoop can now also be configured to use S3 buckets or Azure blobs as input.
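As a rough illustration of Hive's SQL-like interface, the following sketch assumes a HiveServer2 instance reachable on its default port 10000 and uses the third-party PyHive library; the host, database, and table names are hypothetical.

# Query Hive from Python over HiveServer2 (assumes `pip install pyhive`).
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000,
                       username="hadoop", database="default")
cursor = conn.cursor()
# HiveQL looks like SQL but runs against files stored in HDFS.
cursor.execute("SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")
for page, hits in cursor.fetchall():
    print(page, hits)
cursor.close()
conn.close()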
Hadoop is available open source through the Apache distribution, or from vendors such as Cloudera (the largest Hadoop vendor), MapR, or HortonWorks.
What is Spark?
Spark is a newer project, launched in 2012 at the AMPLab at the University of California, Berkeley. It is also a top-level Apache project focused on processing data in parallel across a cluster, with one big difference: it works in memory.
Whereas Hadoop reads and writes files to HDFS, Spark processes data in RAM using a concept known as an RDD (Resilient Distributed Dataset). Spark can run in stand-alone mode, with a Hadoop cluster serving as the data source, or in conjunction with Mesos. In the latter case, a Mesos master replaces the Spark master or YARN for scheduling.
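As a minimal sketch of what working with an RDD looks like, the following Python snippet assumes a local pyspark installation; the HDFS path is invented for illustration.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# Read a file into an RDD and keep it cached in memory so that
# repeated actions do not go back to disk (or HDFS).
lines = sc.textFile("hdfs:///data/access.log").cache()

# Transformations such as filter() are lazy; the count() action
# is what actually triggers distributed execution.
errors = lines.filter(lambda line: "ERROR" in line)
print("error lines:", errors.count())

sc.stop()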
Spark is built around Spark Core, the engine that drives scheduling, optimization, and the RDD abstraction, and that connects Spark to the right storage system (HDFS, S3, an RDBMS, or Elasticsearch). Several libraries run on top of Spark Core, including Spark SQL, which lets users run SQL-like commands on distributed datasets; MLlib for machine learning; GraphX for graph problems; and Spark Streaming for processing continuous streams of incoming log data.
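To show what a SQL-like command on a distributed dataset looks like, here is a short Spark SQL sketch in Python; the sample rows and the table name people are made up, and it again assumes pyspark is available.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()

# A tiny in-memory DataFrame registered as a temporary SQL view.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.createOrReplaceTempView("people")

# Spark SQL plans and runs this query on the cluster (or locally here).
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()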
Spark has several APIs. The original interface was written in Scala, and Python and R interfaces were later added because of heavy use by data scientists. Java is another option for writing Spark jobs.
Databricks, the company founded by Spark creator Matei Zaharia, now leads Spark development and provides Spark distributions to its customers.
This concludes our study of the two core technologies of big data. Hopefully it has resolved your doubts. Combining theory with practice is the best way to learn, so go and try it out! If you want to keep learning more on this topic, please continue to follow this site, where the editor will keep working to bring you more practical articles!