This article introduces the big data tools and frameworks that Java developers commonly need. The content is detailed, easy to understand, and quick to put into practice, and it should serve as a useful reference. Let's take a look.
1. MongoDB -- the most popular cross-platform, document-oriented database.
MongoDB is a database based on distributed file storage, written in C++. It is designed to provide scalable, high-performance data storage for Web applications. Application performance depends on database performance, and among non-relational databases MongoDB is the most feature-rich and the closest to a relational database. With the release of MongoDB 3.4, the range of scenarios it fits has expanded further.
The core advantages of MongoDB are its flexible document model, highly available replica sets, and scalable sharded clusters. A good way to learn MongoDB is to look at it from several angles: real-time monitoring tools, memory usage and page faults, connections, database operations, replica sets, and so on.
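As a starting point, here is a minimal sketch of the document model from Java, assuming the MongoDB Java sync driver (org.mongodb:mongodb-driver-sync) on the classpath; the connection string, database and collection names are placeholders.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("demo");
            MongoCollection<Document> users = db.getCollection("users");

            // Insert a flexible, schema-less document.
            users.insertOne(new Document("name", "alice").append("age", 30));

            // Query it back by field value.
            Document found = users.find(new Document("name", "alice")).first();
            System.out.println(found != null ? found.toJson() : "not found");
        }
    }
}
```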
2. Elasticsearch -- a distributed, RESTful search engine built for the cloud.
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license, and it is a popular enterprise search engine.
Elasticsearch is not only a full-text search engine but also a distributed, real-time document store in which every field is indexed and searchable; it is likewise a distributed search engine with real-time analytics that can scale out to hundreds of servers to store and process petabytes of data. Under the hood, Elasticsearch uses Lucene for indexing, so many of its basic concepts come from Lucene.
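Because everything goes through the REST interface, a full-text query needs nothing beyond an HTTP client. The sketch below sends a "match" query with the JDK 11+ HttpClient; the host, index name ("articles"), and field ("title") are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EsSearchExample {
    public static void main(String[] args) throws Exception {
        String query = "{ \"query\": { \"match\": { \"title\": \"distributed search\" } } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/articles/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON response lists the matching documents under hits.hits.
        System.out.println(response.body());
    }
}
```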
3. Cassandra -- an open-source distributed database management system, originally developed by Facebook, designed to handle large amounts of data across many commodity servers while providing high availability and no single point of failure.
Apache Cassandra is an open-source distributed NoSQL database system. It combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo, and it was open-sourced in 2008. Since then, thanks to its good scalability, Cassandra has been adopted by Web 2.0 sites such as Digg and Twitter and has become a popular distributed structured data store.
Because Cassandra is written in Java, it can in theory run on any machine with JDK 6 or later; both OpenJDK and Sun's JDK are officially tested. Cassandra's query commands resemble those of the relational databases we usually work with, so anyone familiar with MySQL will find them easy to pick up, as the sketch below shows.
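A minimal sketch of running CQL from Java, assuming the DataStax Java driver 4.x (com.datastax.oss:java-driver-core); the contact point and data center name are placeholders.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

import java.net.InetSocketAddress;

public class CassandraExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            // CQL reads much like SQL, which is why MySQL users find it familiar.
            ResultSet rs = session.execute("SELECT release_version FROM system.local");
            Row row = rs.one();
            System.out.println("Cassandra version: " + row.getString("release_version"));
        }
    }
}
```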
4. Redis -- an open-source (BSD-licensed) in-memory data structure store, used as a database, cache, and message broker.
Redis is an open-source Key-Value database written in ANSI C that supports networking, runs in memory, and can persist to disk, and it provides APIs for many languages. Three features distinguish Redis from many of its competitors: it holds the entire dataset in memory and uses disk only for persistence; it offers a comparatively rich set of data types compared with many key-value stores; and it can replicate its data to any number of replicas.
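The following is a minimal sketch of those data types from Java, assuming the Jedis client (redis.clients:jedis); the host, port, and key names are placeholders.

```java
import redis.clients.jedis.Jedis;

public class RedisExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Plain string key/value.
            jedis.set("page:home:title", "Welcome");
            System.out.println(jedis.get("page:home:title"));

            // One of Redis's richer data types: a list used as a simple queue.
            jedis.rpush("jobs", "job-1", "job-2");
            System.out.println(jedis.lpop("jobs"));
        }
    }
}
```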
5. Hazelcast -- an open-source, Java-based in-memory data grid.
Hazelcast is an in-memory data grid that gives Java programmers support for mission-critical transactions and very large in-memory workloads. Although Hazelcast has no "Master", it does have a Leader node (the oldest member), similar in concept to the Leader in ZooKeeper although implemented very differently. At the same time, data in Hazelcast is distributed: each member holds part of the data plus the corresponding backups, which also sets it apart from ZooKeeper.
Developers love how easy Hazelcast is to adopt, but it still deserves careful evaluation before being put into production.
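A minimal sketch of that ease of use, assuming the Hazelcast 4/5 API (com.hazelcast:hazelcast) embedded in the application; the map name is a placeholder.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class HazelcastExample {
    public static void main(String[] args) {
        // Each call starts an embedded member; members discover each other and
        // partition the map's entries (plus backups) across the cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        IMap<String, Integer> stock = hz.getMap("stock");
        stock.put("widget", 42);
        System.out.println("widget stock = " + stock.get("widget"));

        hz.shutdown();
    }
}
```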
6. Ehcache -- a widely used open-source Java distributed cache, mainly intended for general-purpose caching, Java EE, and lightweight containers.
Ehcache is a pure-Java, in-process cache framework that is fast and lean, and it is the default CacheProvider in Hibernate. Its main features: it is fast and simple, with multiple eviction policies; cached data has two tiers, memory and disk, so capacity is not a worry; cached data can be persisted to disk so that it survives JVM restarts; distributed caching is possible via RMI or a pluggable API; it exposes listener interfaces for caches and cache managers; it supports multiple cache manager instances and multiple cache regions per instance; and it provides a Hibernate cache implementation.
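A minimal sketch of programmatic configuration, assuming the Ehcache 3 API (org.ehcache:ehcache); the cache name, key/value types, and heap size are illustrative only.

```java
import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;

public class EhcacheExample {
    public static void main(String[] args) {
        // Build a cache manager with one on-heap cache holding up to 100 entries.
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("users",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                Long.class, String.class, ResourcePoolsBuilder.heap(100)))
                .build(true);

        Cache<Long, String> users = cacheManager.getCache("users", Long.class, String.class);
        users.put(1L, "alice");
        System.out.println(users.get(1L));

        cacheManager.close();
    }
}
```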
7. Hadoop -- an open-source software framework written in Java for distributed storage and processing of very large data sets; users can develop distributed programs without knowing the low-level details of distribution.
It makes full use of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). The core of Hadoop's design is HDFS plus MapReduce: HDFS provides storage for huge amounts of data, while MapReduce provides computation over it.
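On the storage side, a minimal sketch of writing and reading an HDFS file with the Hadoop FileSystem API (org.apache.hadoop:hadoop-client); the fs.defaultFS URI and path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/demo/hello.txt");

            // HDFS splits the file into blocks and replicates them across DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```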
8. Solr -- an open-source enterprise search platform, written in Java, from the Apache Lucene project.
Solr is a standalone enterprise search application server that exposes a Web-service-like API. Users can submit XML files in a specified format to the search server via HTTP requests to build an index, or issue search requests with HTTP GET and receive the results in XML.
Like Elasticsearch, it is based on Lucene, but it extends Lucene with a richer query language while remaining configurable and extensible and optimizing query performance.
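Besides raw HTTP, Java programs usually talk to Solr through the SolrJ client. A minimal query sketch, assuming org.apache.solr:solr-solrj on the classpath; the Solr URL and core name ("articles") are placeholders.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/articles").build()) {

            // Lucene-style query syntax: search the "title" field.
            SolrQuery query = new SolrQuery("title:search");
            query.setRows(10);

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}
```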
9. Spark -- one of the most active projects in the Apache Software Foundation, an open-source cluster computing framework.
Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark superior for certain workloads. In particular, Spark keeps distributed datasets in memory, which, besides supporting interactive queries, optimizes iterative workloads.
Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, which lets Scala manipulate distributed datasets as easily as local collection objects; the same APIs are also exposed to Java, as shown below.
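A minimal word-count sketch using Spark's Java API (org.apache.spark:spark-core); the local[*] master and input path are placeholders for illustration.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt");

            // Transformations build an in-memory distributed dataset lazily;
            // only the final action (collect) triggers execution.
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.collect().forEach(pair -> System.out.println(pair._1 + ": " + pair._2));
        }
    }
}
```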
10. Memcached -- a general-purpose distributed memory caching system.
Memcached is a distributed cache system originally developed by Danga Interactive for LiveJournal but now used by many other projects (such as MediaWiki). As a high-speed distributed cache server, Memcached is characterized by a simple protocol, event handling based on libevent, and a built-in in-memory storage model.
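A minimal Java sketch, assuming the spymemcached client (net.spy:spymemcached); the host, port, key, and expiry are placeholders.

```java
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;

public class MemcachedExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Store a value with a 60-second expiry, then read it back.
        client.set("greeting", 60, "hello memcached").get();
        System.out.println(client.get("greeting"));

        client.shutdown();
    }
}
```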
11. Apache Hive -- a SQL-like layer on top of Hadoop.
Hive is a data warehouse platform built on Hadoop. With Hive you can easily do ETL work. Hive defines a SQL-like query language and translates the SQL users write into corresponding MapReduce programs executed on Hadoop. At the time of writing, Apache Hive 2.1.1 had been released.
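Java applications typically reach Hive through its JDBC driver and HiveServer2. A minimal sketch, assuming org.apache.hive:hive-jdbc; the connection URL, credentials, and table name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://localhost:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {

            // Hive turns this SQL-like query into Hadoop jobs under the hood.
            try (ResultSet rs = stmt.executeQuery("SELECT count(*) FROM page_views")) {
                if (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }
}
```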
12. Apache Kafka -- a high-throughput, distributed publish-subscribe messaging system originally developed by LinkedIn.
Apache Kafka is an open-source messaging system written in Scala. The project's goal is to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka keeps messages organized into categories called topics: producers publish messages to Kafka topics, and consumers subscribe to topics and receive the messages published to them.
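A minimal producer sketch using the Kafka Java client (org.apache.kafka:kafka-clients); the broker address and topic name are placeholders.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KafkaProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Messages are appended to the "events" topic; consumers subscribed
            // to that topic receive them.
            producer.send(new ProducerRecord<>("events", "user-1", "clicked"));
        }
    }
}
```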
13. Akka -- a toolkit for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
Akka is a library written in Scala that simplifies writing fault-tolerant, highly scalable Java and Scala applications based on the Actor model. It has been used successfully in the telecommunications industry, where systems built on it almost never go down.
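A minimal sketch of the Actor model from Java, assuming the classic Akka actor API (com.typesafe.akka:akka-actor); the actor-system and actor names are placeholders.

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class AkkaExample {

    // An actor processes its mailbox one message at a time; no shared locks.
    static class Greeter extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(String.class, name -> System.out.println("Hello, " + name))
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef greeter = system.actorOf(Props.create(Greeter.class), "greeter");

        // Asynchronous, message-driven communication.
        greeter.tell("Akka", ActorRef.noSender());

        system.terminate();
    }
}
```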
14. HBase -- an open-source, non-relational, distributed database, modeled after Google's BigTable, written in Java, and running on top of HDFS.
Unlike commercial big data products such as FUJITSU Cliq, HBase is an open-source implementation of Google Bigtable, and the parallels are close: where Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; where Google runs MapReduce over the massive data in Bigtable, HBase uses Hadoop MapReduce; and where Bigtable relies on Chubby as its coordination service, HBase uses ZooKeeper.
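A minimal sketch of a put and a get with the HBase client API (org.apache.hbase:hbase-client); the table name, column family, and qualifier are placeholders and must already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml (ZooKeeper quorum etc.) from the classpath.
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```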
15. Neo4j -- an open-source graph database implemented in Java.
Neo4j is a high-performance NoSQL graph database that stores structured data in graphs of nodes and relationships rather than in tables. It is an embeddable, disk-based, fully transactional Java persistence engine.
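A minimal sketch of creating and querying nodes from Java, assuming the Neo4j Java driver over Bolt (org.neo4j.driver:neo4j-java-driver); the URI, credentials, and node labels are placeholders.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

import static org.neo4j.driver.Values.parameters;

public class Neo4jExample {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Data is stored as nodes and relationships instead of table rows.
            session.run("CREATE (:Person {name: $name})-[:KNOWS]->(:Person {name: $friend})",
                    parameters("name", "Alice", "friend", "Bob"));

            Result result = session.run("MATCH (p:Person) RETURN p.name AS name");
            while (result.hasNext()) {
                System.out.println(result.next().get("name").asString());
            }
        }
    }
}
```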
That concludes this overview of the big data tools and frameworks that Java developers should know. Thank you for reading; hopefully you now have a working picture of each of them and a starting point for digging deeper.