How to compare MongoDB, Cassandra and HBase NoSQL databases

Updated: 2025-02-25. Source: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/02 Report--

This article compares three NoSQL databases: MongoDB, Cassandra and HBase.

Hadoop gets much of the credit for big data applications, but in reality NoSQL databases are more widely deployed and more widely developed. Choosing Hadoop as application storage is relatively straightforward. Choosing a NoSQL database takes more thought: after all, there are more than 100 of them.

Which one should we choose?

Narrowing the choice

"Any enterprise of decent size will use different types of data storage technologies to cope with different types of data," as Martin Fowler puts it. The reality, however, is that you do not have the energy to learn more than a few storage technologies.

Fortunately, the choice is getting easier because the market is converging on three NoSQL databases: MongoDB, Cassandra (born at Facebook and now mainly developed by DataStax), and HBase (closely associated with Hadoop and developed by the same community).

I have deliberately left out Redis. It is used mainly as a high-speed in-memory cache rather than for big data storage.

According to data from LinkedIn, the NoSQL databases attracting the most interest on the market are MongoDB, Cassandra and HBase.

That is LinkedIn profile data. A broader view comes from the DB-Engines ranking, which aggregates job postings, search activity and other data to gauge database popularity. Oracle, SQL Server and MySQL dominate, but MongoDB (5th), Cassandra (9th) and HBase (15th) are close behind.

To better explain why these three database technologies stand out, I asked a representative of each to identify the key factors in its success: Kelly Stirman, MongoDB's product director; Patrick McFadin, DataStax's chief evangelist for Cassandra; and Justin Kestelyn, senior director at Cloudera.

But first, we need to understand why we use NoSQL.

The world is made up of unstructured data

We live in a world where data is becoming more and more abundant, but much of this data does not fit neatly into the rows and columns of an RDBMS (relational database management system). Mobile, social and cloud computing have spawned huge amounts of data. It is estimated that 90% of the world's data was created in the past two years, and that 80% of business data is unstructured. More importantly, unstructured data is growing twice as fast as structured data.

As the world changes, data management requirements are starting to exceed what traditional relational databases can handle effectively. The first organizations to look for a solution to this problem included pioneers of Web technology, government agencies, and companies providing information technology services.

Now more and more companies want to use alternatives such as NoSQL and Hadoop: NoSQL to build operational business applications, and Hadoop to build data mining applications that give companies deeper insight into their business data.

MongoDB: by developers, for developers

Among the many NoSQL solutions, MongoDB's Stirman points out, MongoDB aims for a balanced approach that suits a wide variety of applications. Its functionality is close to that of a traditional relational database, yet MongoDB users can take advantage of cloud infrastructure by scaling out across machines, and can store many different kinds of datasets because its flexible data model is easy to define.

MongoDB is usually the first NoSQL database developers try because it is easy to learn. Will Shulman, CEO of MongoLab, a MongoDB service provider, says:

MongoDB's success is largely due to its innovation as a data structure store, which lets us define the data models in our applications more easily and more expressively. In most development scenarios, having the same basic data model in the application code and in the database is a great advantage, because it simplifies application development and eliminates the layers of complex mapping code that would otherwise be needed.

Of course, like any other technology, MongoDB has its strengths and weaknesses. MongoDB is designed for OLTP (online transaction processing) workloads. If you need complex transactions, it is not a good choice. However, its simplicity makes it an excellent data store to start with.

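(Note: MongoDB stores data as documents. At the time this comparison was written, it supported neither multi-document transactions nor table joins, which made queries much easier to write, understand, and optimize. Later releases added the $lookup aggregation stage in version 3.2 and multi-document transactions in version 4.0.)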
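To make the document-model point concrete, here is a minimal sketch in plain Python. It does not use a MongoDB driver, and the order record, its field names, and the helper functions are all hypothetical. It only illustrates the idea Shulman describes: an embedded document has the same shape as the object the application works with, whereas a relational layout needs a join-like step to reassemble that object.

```python
import json

# Relational layout: one logical order is spread over two "tables",
# so the application has to reassemble it (a join) before using it.
orders = [{"order_id": 1, "customer": "alice"}]
order_items = [
    {"order_id": 1, "sku": "book", "qty": 2},
    {"order_id": 1, "sku": "pen", "qty": 5},
]


def load_order_relational(order_id):
    """Rebuild one order by joining the two tables in application code."""
    order = next(o for o in orders if o["order_id"] == order_id)
    items = [
        {"sku": i["sku"], "qty": i["qty"]}
        for i in order_items
        if i["order_id"] == order_id
    ]
    return {"_id": order["order_id"], "customer": order["customer"], "items": items}


# Document layout: the whole order is one nested document, which is
# the same shape the application already works with.
order_document = {
    "_id": 1,
    "customer": "alice",
    "items": [
        {"sku": "book", "qty": 2},
        {"sku": "pen", "qty": 5},
    ],
}


def total_quantity(order):
    """Works on the document directly; no mapping layer is needed."""
    return sum(item["qty"] for item in order["items"])


if __name__ == "__main__":
    print(json.dumps(order_document, indent=2))
    print("total quantity:", total_quantity(order_document))
```

The trade-off is the one the article names: the embedded form is easy to read and query as a unit, but anything that spans many documents has to be handled by the application or by features the database adds later.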

Cassandra: running safely at scale

There are at least two kinds of database simplicity: ease of development and ease of operation. MongoDB wins people over because applications are simple to develop with it, and Cassandra wins people over because it is easy to manage at scale.

DataStax's McFadin tells me that users turn to Cassandra because it is very difficult to improve the performance and reliability of relational databases, especially in large clusters. A former Oracle DBA, McFadin was delighted to find that in Cassandra "replication and scalability are the foundation": Cassandra was designed to solve this problem from the very beginning.

In the RDBMS world, scaling and replication are a challenge for many developers and users. This was not a serious problem in the past, when enterprises operated at a smaller scale. Today, it quickly becomes one.

I learned from McFadin and others that Cassandra is particularly good at scaling out across machines. Its replication mechanism keeps data safe in every data center. As for adding capacity to the cluster, "all you have to do is start a new machine and tell Cassandra about the new node there," McFadin said, "and then it does the rest."
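One way to see why adding a node can be this undramatic is a toy consistent-hashing ring, sketched below in plain Python. This is an illustration of the general technique, not Cassandra's implementation: the `Ring` class, the node names, and the use of SHA-256 are assumptions made for the example (Cassandra's default partitioner is based on Murmur3). The point it shows is that when a node joins, only the keys that fall into the new node's token ranges change owner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is used only for illustration)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring in which each node owns several tokens (virtual nodes)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def node_for(self, key: str) -> str:
        """A key belongs to the first token at or after its hash, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[idx % len(self._tokens)]]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose owner differs between two placements."""
    moved = sum(1 for k in keys if before[k] != after[k])
    return moved / len(keys)


if __name__ == "__main__":
    ring = Ring()
    for name in ("node-a", "node-b", "node-c"):
        ring.add_node(name)
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")  # "start a new machine and tell the cluster about it"
    after = {k: ring.node_for(k) for k in keys}
    print(f"keys moved after adding a node: {moved_fraction(keys, before, after):.1%}")
```

With four equally weighted nodes, roughly a quarter of the keys should move, and every key that moves goes to the new node; the existing nodes never exchange data among themselves.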

Excellent scalability, coupled with excellent write performance and solid query performance, is at the core of Cassandra's high performance.

One article on NoSQL argues that Cassandra is excellent at managing cluster size but requires a doctorate to get started. This is not the case, McFadin insists:

How data is replicated, read and written is deliberately simple. You can learn the core features of Cassandra in a few hours. When deploying a new technology, this gives developers a lot of confidence, because there is much less of a "black box" and there are fewer complex failure modes.

This means that the main development cost lies in understanding the Cassandra data model and how to integrate it with your application. Given Cassandra's CQL query language (similar to SQL, but not actually SQL), McFadin says this is not difficult to learn.

More importantly, he told me, "what Cassandra gives you in return is what you want in a database: no drama (no failures). That's why users like to use Cassandra."

HBase: Hadoop's close companion

HBase, like Cassandra, is a column-oriented key-value store. It is widely used because it shares a "common pedigree" with Hadoop. In fact, as Cloudera's Kestelyn puts it, "HBase provides a record-based storage layer that can read and write data quickly and randomly, which makes up for a shortcoming of Hadoop. Hadoop focuses on system throughput at the expense of I/O read efficiency."

Kestelyn continues:

Changes are written efficiently to memory to maximize access speed, while the data is also persisted to HDFS. This design lets a Hadoop-based EDH (enterprise data hub) read, write and store data randomly in real time, while keeping the high fault tolerance and durability of HDFS.
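The write path Kestelyn describes (apply the change in memory for fast access, while also persisting it) can be sketched as a log-then-memory store. The code below is a simplified illustration in plain Python, not HBase code: the `TinyStore` class, its file names, and the row-count flush threshold are hypothetical, and a local temporary directory stands in for HDFS.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: log first, then memory, then flush to a sorted file when full."""

    def __init__(self, directory: str, flush_threshold: int = 3):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.memstore = {}       # recent writes, served from memory
        self.flushed_files = []  # paths of immutable files, oldest first
        self.wal_path = os.path.join(directory, "wal.log")

    def put(self, row_key: str, value: str) -> None:
        # 1. Append to a log on durable storage so the write survives a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"row": row_key, "value": value}) + "\n")
        # 2. Apply the change in memory, where reads can find it quickly.
        self.memstore[row_key] = value
        # 3. Once memory holds enough rows, write them out as one sorted file.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        path = os.path.join(self.directory, f"file-{len(self.flushed_files)}.json")
        with open(path, "w", encoding="utf-8") as out:
            json.dump(dict(sorted(self.memstore.items())), out)
        self.flushed_files.append(path)
        self.memstore.clear()
        # Flushed rows no longer need the log (a simplification).
        open(self.wal_path, "w", encoding="utf-8").close()

    def get(self, row_key: str):
        # Newest data wins: check memory, then flushed files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for path in reversed(self.flushed_files):
            with open(path, encoding="utf-8") as handle:
                rows = json.load(handle)
            if row_key in rows:
                return rows[row_key]
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = TinyStore(tmp)
        for i in range(4):
            store.put(f"row-{i}", f"value-{i}")
        print(store.get("row-0"), store.get("row-3"), len(store.flushed_files))
```

Random writes become cheap because they touch only an append-only log and memory, and durability comes from the log and the flushed files rather than from updating data in place.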

Its affinity with Hadoop is not the only reason for HBase's rise in the popularity rankings. Like Cassandra, HBase draws on Google's Bigtable: it is an open source implementation of Bigtable, and so it was designed from the start to be highly scalable.

HBase can take advantage of the disk, memory, and CPU resources of any number of servers, and has excellent scaling features such as automatic sharding. As system load and performance requirements grow, HBase can be scaled out simply by adding server nodes. HBase is designed from the ground up to ensure data consistency while still delivering high performance.
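Automatic sharding can be pictured as a table made of contiguous row-key ranges that split themselves when they grow too large. The sketch below is a toy model in plain Python under loud assumptions: the `Region` and `Table` classes are hypothetical, and the split is triggered by row count, whereas a real HBase deployment decides splits by data size and a configurable policy.

```python
import bisect


class Region:
    """A contiguous range of row keys that begins at start_key."""

    def __init__(self, start_key: str, rows=None):
        self.start_key = start_key
        self.rows = dict(rows or {})


class Table:
    """Toy table that splits a region in two once it holds too many rows."""

    def __init__(self, max_rows_per_region: int = 4):
        self.max_rows = max_rows_per_region
        self.regions = [Region("")]  # one region covers every key at first

    def _region_index(self, row_key: str) -> int:
        starts = [r.start_key for r in self.regions]
        # Pick the last region whose start key is <= row_key.
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key: str, value: str) -> None:
        idx = self._region_index(row_key)
        region = self.regions[idx]
        region.rows[row_key] = value
        if len(region.rows) > self.max_rows:
            self._split(idx)

    def get(self, row_key: str):
        return self.regions[self._region_index(row_key)].rows.get(row_key)

    def _split(self, idx: int) -> None:
        region = self.regions[idx]
        keys = sorted(region.rows)
        middle = keys[len(keys) // 2]
        # The upper half becomes a new region; the lower half stays in place.
        upper = Region(middle, {k: region.rows[k] for k in keys if k >= middle})
        region.rows = {k: v for k, v in region.rows.items() if k < middle}
        self.regions.insert(idx + 1, upper)


if __name__ == "__main__":
    table = Table()
    for i in range(10):
        table.put(f"row-{i:02d}", str(i))
    for region in table.regions:
        print(repr(region.start_key), sorted(region.rows))
```

Because each region is an independent key range, regions can be placed on different servers, which is what lets capacity grow by adding nodes.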

But scale is not its only strength. Kestelyn points out that "because of its tight integration with Hadoop's ecosystem, data is easily accessible to users and applications, and can be queried through SQL (using Cloudera Impala, Apache Phoenix, or Apache Hive), or even free text search (using Cloudera Search)." HBase therefore gives developers a way to use the familiar SQL language on top of a mature distributed database.

Each database technology has its own advantages and disadvantages, but the three databases reviewed here hold an important position in the big data field. A new NoSQL database may challenge their top-three standing in the future, but the reality is that many developers and many large, established enterprises have already made their choice: MongoDB, Cassandra, and HBase.
