
NoSQL Database Cassandra (1)

2025-01-18 Update From: SLTechnology News&Howtos (shulou)

As Internet technology develops, the demands on data storage keep rising: capacity, security, backup, high availability and so on. Popular relational databases include SQL Server, MySQL and Oracle. Non-relational databases include the key-value stores Redis and Memcached, the document databases MongoDB and CouchDB, and the column-family stores HBase and Cassandra. There are many kinds, and more and more to learn. When choosing a technology, we should follow the principle that "there is no best technology, only the most suitable one". Because our business now needs some new technology, these are notes from a first study of Cassandra, kept for later reference.

1. A first look at Cassandra

Apache Cassandra is a highly scalable, high-performance distributed NoSQL database. Cassandra is designed to handle large amounts of data on many servers, providing high availability without worrying about a single point of failure.

Cassandra has a distributed architecture: data is spread across different machines and copied according to a replication factor, so the cluster stays available even if a single machine fails.
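To make the idea of a replication factor concrete, here is a minimal sketch of placing copies of a key on a ring of nodes. It is only an illustration under simplifying assumptions: the `Ring` class, the node names and the use of MD5 as the hash are hypothetical choices for this example, each node owns exactly one position on the ring, and real Cassandra placement (partitioners, virtual nodes, replication strategies) is more involved.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to an integer ring position (MD5 is used only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy token ring: one token per node, replicas placed on successive nodes."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by their ring position so we can walk the ring clockwise.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes that hold a copy of `key`."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct node names
```

With a replication factor of 3, losing any single node still leaves two copies of every key, which is the sense in which there is no single point of failure.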

Official website: http://cassandra.apache.org/

Help documentation: http://cassandra.apache.org/doc/latest/contactus.html

Current mainstream versions: Apache Cassandra 3.11, Apache Cassandra 3.0, Apache Cassandra 2.2 and Apache Cassandra 2.1

So far I have not found any relatively recent books on Cassandra. The books that can be found online, such as the authoritative guide to Cassandra and "Practical Cassandra", are based on versions 0.6 and 0.7, which are very old compared with the versions in common use. The best way to learn Cassandra is therefore to study the official documentation.

1.1 Comparison between Cassandra and relational databases

| Cassandra | Relational database (RDBMS) |
| --- | --- |
| Cassandra is used to deal with unstructured data. | RDBMS is used to process structured data. |
| Cassandra has a flexible schema. | RDBMS has a fixed schema. |
| In Cassandra, a table is a list of "nested key-value pairs" (row x column key x column value). | In RDBMS, a table is an array of arrays (rows x columns). |
| In Cassandra, a keyspace is the outermost container that holds the data for an application. | In RDBMS, a database is the outermost container that holds the data for an application. |
| In Cassandra, a table or column family is the entity of a keyspace. | In RDBMS, a table is the entity of a database. |
| In Cassandra, a row is a unit of replication. | In RDBMS, a row is a single record. |
| In Cassandra, a column is a unit of storage. | In RDBMS, columns represent the attributes of a relation. |
| In Cassandra, collections are used to represent relationships. | In RDBMS, there are the concepts of foreign keys, joins, and so on. |
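The "nested key-value pairs" description can be pictured with plain Python dictionaries. This is a conceptual sketch only, not how Cassandra stores data on disk or how a client talks to it; the `insert` and `read` helpers and the sample table and column names are hypothetical.

```python
# keyspace -> table -> row key -> column name -> column value
keyspace = {}


def insert(keyspace, table, row_key, columns):
    """Add or overwrite columns of one row; tables and rows are created on demand."""
    keyspace.setdefault(table, {}).setdefault(row_key, {}).update(columns)


def read(keyspace, table, row_key, column):
    """Return one column value, or None if the row has no such column."""
    return keyspace.get(table, {}).get(row_key, {}).get(column)


# Two rows in the same table may carry different sets of columns.
insert(keyspace, "users", "alice", {"email": "alice@example.com", "city": "Hangzhou"})
insert(keyspace, "users", "bob", {"email": "bob@example.com"})

print(read(keyspace, "users", "alice", "city"))  # Hangzhou
print(read(keyspace, "users", "bob", "city"))    # None
```

By contrast, the "array of arrays" view of a relational table would give every row the same fixed list of columns, with empty cells stored as NULL.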

Relational databases such as MySQL have the concepts of databases and tables, and different kinds of databases create a database in different ways. In MySQL and similar systems you must first use CREATE statements to create the database and the table structure before you can insert data. In Redis, a fixed number of databases is generated from the setting in the configuration file, and you only need to switch between them with SELECT. MongoDB is a special case: it has no tables, only databases and collections, and in some cases you do not need to create them yourself but can insert data directly, which is very convenient. Cassandra has no concept of a database; instead it has keyspaces and tables. Some of its usage is similar to relational databases such as MySQL, while in other places the two differ greatly.

1.2 Comparison between Cassandra and HBase

| HBase | Cassandra |
| --- | --- |
| HBase is based on Bigtable (Google). | Cassandra is based on Dynamo (Amazon). It was originally developed at Facebook by a former Amazon engineer. This is one of the reasons why Cassandra supports multiple data centers. |
| HBase uses the Hadoop infrastructure (ZooKeeper, NameNode, HDFS). Organizations that deploy it must have knowledge of both Hadoop and HBase. | Cassandra and Hadoop are developed separately, and Cassandra's infrastructure and operational knowledge differ from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses ZooKeeper) and/or Cassandra + Hadoop. |
| The HBase-Hadoop infrastructure has several "moving parts": ZooKeeper, NameNode, HBase master and data nodes. ZooKeeper is clustered and naturally fault tolerant. The NameNode needs to be clustered to be fault tolerant. | Cassandra uses a single node type. All nodes are equal and perform all functions. Any node can act as a coordinator, which ensures there is no single point of failure (SPOF). Adding Storm or Hadoop certainly adds complexity to the infrastructure. |
| HBase is ideal for range-based scans. | Cassandra does not support range-based row scans, which may be limiting in some use cases. |
| HBase provides asynchronous replication across HBase clusters. | With random partitioning, Cassandra replicates data row by row. |
| HBase only supports ordered partitioning. | Cassandra officially supports ordered partitioning, but production users do not use it, because it creates "hot spots" that are difficult to operate. |
| Thanks to ordered partitioning, HBase scales out horizontally easily while also supporting rowkey range scans. | If data is stored in the columns of a row to support range scans, the practical limit on row size in Cassandra is 10 megabytes. |
| HBase supports atomic compare-and-set, and it supports transactions within a row. | Cassandra does not support atomic compare-and-set (releases since 2.0 add lightweight transactions for conditional writes). |
| HBase does not support load balancing of single-row reads; a row is served by only one region server at a time. | Cassandra supports load balancing of single-row reads. |
| Bloom filters can be used in HBase as another form of index. | Cassandra uses Bloom filters for key lookups. |
| Triggers are supported through the coprocessor feature in HBase. | Cassandra does not support coprocessor functionality. |
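Both systems are described above as using Bloom filters for key lookups. A minimal sketch of the data structure follows, to show why it is useful: it can say "definitely not present" without touching the data files, at the cost of occasional false positives. The `BloomFilter` class, its sizes and the SHA-256-based hashing are assumptions made for this example, not the implementation used by either database.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: never reports a stored key as absent, may report false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was certainly never added; True means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ["user:1", "user:2", "user:3"]:
    bloom.add(row_key)

print(bloom.might_contain("user:2"))  # True
```

A store can consult such a filter first and skip reading any file whose filter answers False for the requested key.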

In recent years, with the development of big data technology and its industry chain, Hadoop, Spark, Storm and related technologies have grown rapidly. At the same time, engineers with big data skills are in short supply and their market value has risen a lot, which people like me can only envy. HBase is a pioneer and cornerstone of big data storage and plays a very important role. But its overall architecture is large and much more complex than Cassandra's, which increases the complexity of the system and the effort needed to maintain it.

1.3 Internet companies using Cassandra

Abroad:

eBay: 200+ TB, 400M+ writes, 100M reads. Application scenarios: social signals on the product details page, such as Like, Want, Own and Favorites; the Hunch taste graph; time series for users and products, such as mobile notifications, anti-cheating, SOA, monitoring and log services.

Netflix: a large cluster of 288 to 96,000 60 instances, 1.1 million writes per second; with automatic replication across 3 zones in the AWS EC2 US East region, a total of 3.3 million writes per second.

Apple: 75,000+ nodes, 10s of PBs, millions of ops/s, largest cluster 1,000+ nodes

Domestic:

According to public information, the cluster should have at least 1,500 servers. The reasons for choosing Cassandra were: the team is small and deadlines are tight, so an open-source project was chosen; there is no single point of failure and no central node, which suits online services; the code is easy to understand and the team has the background to read it; and the community is active.

Hangzhou Tongdun Technology: the specific usage is not clear, except that its underlying data storage architecture is mainly based on Cassandra. It is a big data risk control and anti-fraud company that is growing very rapidly.

2. Installation and practice

1. Environment requirements

The official website lists the following:

Installing Cassandra

Prerequisites

The latest version of Java 8, either the Oracle Java Standard Edition 8 or OpenJDK 8. To verify that you have the correct version of java installed, type java -version.

For using cqlsh, the latest version of Python 2.7. To verify that you have the correct version of Python installed, type python --version.

In other words, Java 8 and Python 2.7 are required. Many production environments already run CentOS 7.x, which ships with Python 2.7. Check your own system; if Python 2.7 or Java 8 is missing, install it yourself.
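The version check can also be scripted. The sketch below is one possible way to do it, not part of the official instructions: the helper names `parse_java_major` and `installed_java_major` are hypothetical, the script itself assumes Python 3.7 or later (separate from the Python 2.7 that cqlsh needs), and it relies on `java -version` printing a line such as `java version "1.8.0_151"` or `openjdk version "11.0.2"`.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_151).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` if java is on PATH and return its major version."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=False
    )
    # `java -version` writes to stderr; stdout is included just in case.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("Java 8 found")
    else:
        print("Java 8 not found (detected: %s)" % major)
```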

2. Common installation methods

Binary installation

Source code installation

Installation through a package manager such as yum

Installation instructions page: http://cassandra.apache.org/download/

The binary installation is simple and quick and requires no compilation. Once the package has been downloaded, it depends little on the network.

3. Stand-alone installation test

Operating system: CentOS 7.1

Cassandra: Cassandra 3.11.1

Installation method: yum (the machine needs Internet access)

Yum source information:

/etc/yum.repos.d/cassandra.repo

    [Cassandra]
    name=Apache Cassandra
    baseurl=
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=

Install:

    sudo yum install cassandra

Start the service:

    service cassandra start

Start the service at boot:

    chkconfig cassandra on

There is much more to Cassandra. Later notes will cover common keyspace and table operations, inserting, deleting, updating and querying data, daily monitoring, security and backup, high-availability clustering and related topics.
