Open Source NoSQL Database Cassandra 3.0 in Practice: Cluster Deployment and Plug-in Use

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Brief introduction

Cassandra is an open source distributed NoSQL database. Its main feature is a masterless (decentralized) design: a cluster is a set of peer database nodes that together form a distributed network service. A write to Cassandra is replicated to other nodes, and a read is routed to a node that holds the data. Scaling a Cassandra cluster is relatively simple: just add nodes to the cluster.

With the rise of NoSQL, HBase and MongoDB have become the representative NoSQL databases, while Cassandra is rarely used in China (although large-scale use is reported). According to the Baidu Index, Cassandra is far less popular than MongoDB and HBase.

Internationally, according to the latest data from the database ranking site DB-Engines (16.10), Cassandra has risen to seventh place, far ahead of HBase.

Advantages:

1. Flexible schema

With Cassandra, as with a document store, you don't have to define the fields of a record in advance. You can add or remove fields at will while the system is running. This is a large efficiency gain, especially on large deployments.

2. True scalability

Cassandra scales horizontally in the pure sense. To add capacity to the cluster, you simply add another machine. You don't have to restart any processes, change application queries, or manually migrate any data.

3. Multi-data-center awareness

You can arrange your node layout so that if one data center is lost, for example to a fire, a backup data center still holds at least one complete copy of every record.

4. Range queries

If plain key-value lookups are not enough, you can query over a range of keys.

5. List data structure

In mixed mode, super columns can extend the data model to five dimensions. This is very convenient for per-user indexes.

6. Distributed writes

You can read or write any data, anywhere and at any time, and there is no single point of failure.

Disadvantages:

1. Read performance is slow

The masterless design triggers anti-entropy processing when data is read, which costs a great deal of performance and can even seriously affect the operation of the server.

2. Data synchronization is slow (the eventual-consistency delay can be very large)

Because there is no central node, the cluster relies on the nodes passing information to each other and notifying each other of their state. If data has multiple replicas and a node goes down, reaching consistency may take a long time and be inefficient.

3. Inserts and updates are favored over queries; querying is inflexible, and all queries need to be defined in advance

Unlike most databases, which are optimized for reads, Cassandra's write performance is in theory higher than its read performance, so it suits streaming data storage, especially where the write load exceeds the read load. Compared with HBase, its random access performance is much higher, but it is not very good at range scans. It can therefore serve as a real-time query cache for HBase: HBase handles batch big-data processing, and Cassandra provides the random-query interface.

4. Direct access to Hadoop is not supported, and MapReduce cannot be run

Hadoop is now synonymous with big data. A framework for massive data that does not support Hadoop and MapReduce will be replaced, unless Cassandra can find another position or offer its own solution for massive data. DataStax is re-implementing the HDFS file system on top of Cassandra; whether that will succeed is unknown.

One: deploy Cassandra

Planning:

Cluster nodes: 3

10.10.8.3

10.10.8.4

10.10.8.5

(1) configure the JDK

Run on 10.10.8.3, 10.10.8.4, and 10.10.8.5:

$ wget http://download.oracle.com/otn-pub/java/jdk/8u112-b15/jdk-8u112-linux-x64.tar.gz
$ tar xf jdk-8u112-linux-x64.tar.gz -C /opt
$ vim /etc/profile
Add the following lines:
export JAVA_HOME=/opt/jdk1.8.0_112
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
$ source /etc/profile
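The profile edit above can also be scripted so that it is repeatable on all three nodes. The following is a minimal sketch, not the author's method: the function name add_java_env is hypothetical, and the demo writes to a temporary file rather than to /etc/profile.

```shell
#!/bin/sh
# add_java_env is a hypothetical helper: it appends the JDK environment
# variables to a profile file, but only if JAVA_HOME is not exported there yet.
add_java_env() {
    profile=$1
    jdk_home=$2
    # Skip if a JAVA_HOME export is already present.
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        echo "export JAVA_HOME=$jdk_home"
        echo 'export PATH=$PATH:$JAVA_HOME/bin'
        echo 'export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar'
    } >> "$profile"
}

# Demo against a scratch file; on a real node the target would be /etc/profile.
demo_profile=$(mktemp)
add_java_env "$demo_profile" /opt/jdk1.8.0_112
add_java_env "$demo_profile" /opt/jdk1.8.0_112   # second call changes nothing
cat "$demo_profile"
```

Running the helper twice leaves a single set of export lines, which makes it safe to rerun after a failed deployment.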

(2) install Cassandra

Run on 10.10.8.3, 10.10.8.4, and 10.10.8.5:

$ wget http://apache.fayea.com/cassandra/3.0.9/apache-cassandra-3.0.9-bin.tar.gz
$ tar xvf apache-cassandra-3.0.9-bin.tar.gz -C /opt
$ ln -s /opt/apache-cassandra-3.0.9 /opt/cassandra

(3) configuration

Run on 10.10.8.3, 10.10.8.4, and 10.10.8.5:

$ cp conf/cassandra.yaml conf/cassandra.yaml.bak
$ vim conf/cassandra.yaml

# Simplified cassandra-3.0.9 configuration: the minimum needed to run the cluster.
cluster_name: 'My Cluster'          # cluster name
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.8.3,10.10.8.4,10.10.8.5"    # node IP list
listen_address: 10.10.8.5           # process listen address (each node's own IP)
storage_port: 7000                  # inter-node communication port
start_native_transport: true        # enable the native protocol
native_transport_port: 9042         # port for client connections
data_file_directories:
    - /data/cassandra/dbdata        # data location; several directories may be listed
commitlog_directory: /data/cassandra/commitlog    # keep the commitlog on a different disk from the data directories to improve performance
saved_caches_directory: /data/cassandra/caches    # cache data directory
hints_directory: /data/cassandra/hints
commitlog_sync: batch               # write the commitlog in batches
commitlog_sync_batch_window_in_ms: 2              # batch mode: how long writes are buffered before a batch is synced
#commitlog_sync: periodic           # write the commitlog periodically
#commitlog_sync_period_in_ms: 10000 # periodic mode: interval between commitlog syncs
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
endpoint_snitch: SimpleSnitch

If you start from Cassandra's default configuration, you only need to modify the following lines. For other performance parameters, please refer to the official documentation.

cluster_name: 'My Cluster'
hints_directory: /data/cassandra/hints
data_file_directories:
    - /data/cassandra/dbdata
commitlog_directory: /data/cassandra/commitlog
saved_caches_directory: /data/cassandra/caches
          - seeds: "10.10.8.3,10.10.8.4,10.10.8.5"
listen_address: 10.10.8.5    # the default is localhost; use each node's own IP
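Because only a handful of top-level keys differ from the defaults, the edits can be applied with sed instead of an editor. This is a sketch under assumptions: set_yaml_scalar is a hypothetical helper that handles only simple top-level "key: value" lines (not the nested seeds entry), and the demo works on a scratch file holding two of the shipped defaults.

```shell
#!/bin/sh
# set_yaml_scalar is a hypothetical helper: it rewrites one top-level
# "key: value" line and leaves every other line untouched.
# Values containing "|" or "&" would need escaping and are not handled.
set_yaml_scalar() {
    file=$1 key=$2 value=$3
    sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a scratch file that mimics two default lines of cassandra.yaml.
conf=$(mktemp)
printf '%s\n' "cluster_name: 'Test Cluster'" "listen_address: localhost" > "$conf"

set_yaml_scalar "$conf" cluster_name "'My Cluster'"
set_yaml_scalar "$conf" listen_address 10.10.8.4   # this node's own IP
cat "$conf"
```

On a real node the file argument would be /opt/cassandra/conf/cassandra.yaml, and the listen_address value changes per node.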

(4) create the corresponding directory

Run on 10.10.8.3, 10.10.8.4, and 10.10.8.5:

$ mkdir -p /data/cassandra/{dbdata,commitlog,caches,hints}
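The brace expansion in the mkdir command is a bash feature. A portable variant that also checks the result might look like the sketch below; make_cassandra_dirs is a hypothetical helper, and the demo uses a scratch directory instead of /data/cassandra.

```shell
#!/bin/sh
# make_cassandra_dirs is a hypothetical helper that creates the four
# directories with a plain loop, so it does not depend on brace expansion.
make_cassandra_dirs() {
    base=$1
    for d in dbdata commitlog caches hints; do
        mkdir -p "$base/$d" || return 1
        # Fail early if the current user cannot write to the directory.
        # This assumes the same user will later start Cassandra.
        [ -w "$base/$d" ] || { echo "not writable: $base/$d" >&2; return 1; }
    done
}

# Demo under a scratch directory; on a real node the base is /data/cassandra.
base_dir=$(mktemp -d)
make_cassandra_dirs "$base_dir" && ls "$base_dir"
```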

(5) start the process

Run on 10.10.8.3, 10.10.8.4, and 10.10.8.5:

$ /opt/cassandra/bin/cassandra
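The cassandra command returns before the node is ready to serve clients, so a deployment script may want to wait until the native transport port (9042 in the configuration above) accepts connections. A minimal sketch, assuming bash (it uses bash's /dev/tcp pseudo-device); wait_for_port is a hypothetical helper, not a Cassandra tool.

```shell
#!/bin/bash
# wait_for_port is a hypothetical helper: it tries to open a TCP connection
# to host:port up to "attempts" times, one second apart.
# Returns 0 as soon as a connection succeeds, 1 if every attempt fails.
wait_for_port() {
    host=$1; port=$2; attempts=$3
    i=1
    while [ "$i" -le "$attempts" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then sleep 1; fi
        i=$((i + 1))
    done
    return 1
}

# On a node: start Cassandra, then wait up to about a minute for port 9042.
#   /opt/cassandra/bin/cassandra
#   wait_for_port 10.10.8.3 9042 60 && echo "native transport is up"
```

A port that silently drops packets (for example behind a firewall) makes each attempt wait for the TCP timeout, so the total wait can be longer than the attempt count suggests.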

Two: use of plug-in tools

(1) nodetool

nodetool is the tool for managing Cassandra clusters and nodes and for viewing information about them.

1: view cluster status

$ /opt/cassandra/bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.8.3  304.71 KB  256     68.0%             64b6a935-caa6-4ed5-857b-70963e74a81d  rack1
UN  10.10.8.4  173.84 KB  256     65.3%             db77bd8a-2655-41c6-b13e-584cf44b8162  rack1
UN  10.10.8.5  297.2 KB   256     66.7%             8fac64f8-1ed9-4ca3-af70-dee9ebcf77c2  rack1
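The two-letter prefix on each row encodes status and state, so UN means Up and Normal. A script can count such rows to confirm that all three nodes have joined. The sketch below is illustrative only: count_up_normal is a hypothetical helper, and the sample rows are copied from the output above so the example runs without a cluster.

```shell
#!/bin/sh
# count_up_normal is a hypothetical helper: it reads `nodetool status`
# output on stdin and prints how many rows start with UN (Up/Normal).
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Sample rows taken from the status output; on a live node use:
#   /opt/cassandra/bin/nodetool status | count_up_normal
sample_status='UN 10.10.8.3 304.71 KB 256 68.0% 64b6a935-caa6-4ed5-857b-70963e74a81d rack1
UN 10.10.8.4 173.84 KB 256 65.3% db77bd8a-2655-41c6-b13e-584cf44b8162 rack1
UN 10.10.8.5 297.2 KB 256 66.7% 8fac64f8-1ed9-4ca3-af70-dee9ebcf77c2 rack1'

up=$(printf '%s\n' "$sample_status" | count_up_normal)
echo "nodes up and normal: $up of 3"
```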

2: current node status

$ /opt/cassandra/bin/nodetool info
ID                     : db77bd8a-2655-41c6-b13e-584cf44b8162
Gossip active          : true
Thrift active          : true
Native Transport active: true
Load                   : 173.84 KB
Generation No          : 1478159246
Uptime (seconds)       : 4554
Heap Memory (MB)       : 297.65 / 7987.25
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 14, size 1.08 KB, capacity 100 MB, 110 hits, 127 requests, 0.866 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token                  : (invoke with -T/--tokens to see all 256 tokens)
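The "Heap Memory (MB)" line reports used and maximum heap, which is convenient to turn into a percentage for monitoring. A small sketch, with heap_used_percent as a hypothetical helper and the sample values taken from the output above:

```shell
#!/bin/sh
# heap_used_percent is a hypothetical helper: it reads `nodetool info`
# output on stdin and prints used heap as a percentage of the maximum.
heap_used_percent() {
    awk -F: '/^Heap Memory/ {
        split($2, mb, "/")               # mb[1] = used MB, mb[2] = maximum MB
        printf "%.1f\n", mb[1] / mb[2] * 100
    }'
}

# Sample lines with the values shown above; on a live node use:
#   /opt/cassandra/bin/nodetool info | heap_used_percent
heap_pct=$(printf '%s\n' \
    'Heap Memory (MB)       : 297.65 / 7987.25' \
    'Off Heap Memory (MB)   : 0.00' | heap_used_percent)
echo "heap used: ${heap_pct}%"
```

The pattern is anchored at the start of the line so that the "Off Heap Memory" line is not matched.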

3: shut down the Cassandra process

$ /opt/cassandra/bin/nodetool stopdaemon
Cassandra has shutdown.
error: Connection refused
-- StackTrace --
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

4: view data details, read and write counts, latencies, and so on for each table (column family)

$ /opt/cassandra/bin/nodetool cfstats
Keyspace: system_traces
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: events
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Space used by snapshots (total): 0
        Off heap memory used (total): 0
        SSTable Compression Ratio: 0.0
        Number of keys (estimate): 0
        Memtable cell count: 0
        Memtable data size: 0
        Memtable off heap memory used: 0
        Memtable switch count: 0

(2) cqlsh command line tool

cqlsh is Cassandra's command-line client. It replaces cassandra-cli from earlier versions and can insert, delete, update, and query data, among other operations.

$ /opt/cassandra/bin/cqlsh
Usage: cqlsh [options] [host [port]]
CQL Shell for Apache Cassandra

1: install Python 2.7 (cqlsh depends on Python)

$ yum install openssl-devel    # prevents Python from being compiled without the ssl module, which would make cqlsh unusable
$ wget https://www.python.org/ftp/python/2.7/Python-2.7.tgz
$ tar xf Python-2.7.tgz
$ cd Python-2.7
$ mkdir /usr/local/python27
$ ./configure --prefix=/usr/local/python27
$ make && make install
$ ln -s /usr/local/python27/bin/python2.7 /usr/bin/python2.7

If you encounter "ImportError: No module named _ssl", install openssl-devel, then recompile and reinstall Python.

2: connect to a host

$ /opt/cassandra/bin/cqlsh 10.10.8.3 9042
Connected to My Cluster at 10.10.8.3:9042.
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
cqlsh> show host
Connected to My Cluster at 10.10.8.3:9042.

3: the help command lists the shell commands and the CQL data manipulation topics

cqlsh> help

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING

CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON

cqlsh>
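Several of the help topics listed above (CREATE_KEYSPACE, CREATE_TABLE, ALTER_TABLE, INSERT, UPDATE, SELECT) can be tried together in one short script. The sketch below only generates the CQL file so that it runs without a cluster; the keyspace name demo, the table users, and the helper demo_cql are made-up names for illustration, and the replication factor of 3 is an assumption that matches the three-node plan.

```shell
#!/bin/sh
# demo_cql is a hypothetical helper that prints a small CQL script.
# The keyspace "demo" and table "users" are illustrative names only.
demo_cql() {
    cat <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text);
INSERT INTO demo.users (id, name) VALUES (1, 'alice');
ALTER TABLE demo.users ADD email text;
UPDATE demo.users SET email = 'alice@example.com' WHERE id = 1;
SELECT id, name, email FROM demo.users WHERE id = 1;
EOF
}

# Write the script to a file; with a running cluster it could be executed as:
#   /opt/cassandra/bin/cqlsh -f "$cql_file" 10.10.8.3 9042
cql_file=$(mktemp)
demo_cql > "$cql_file"
cat "$cql_file"
```

The ALTER TABLE statement adds a column while the table is in use, which is the CQL form of the flexible-schema advantage described in the introduction.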

References

Link: http://cassandra.apache.org/doc/latest/ (official documentation)

Link: http://jingyan.baidu.com/article/7e440953ec8a7e2fc0e2ef9b.html (advantages)

Link: https://www.zhihu.com/question/19592244/answer/21430967 (shortcomings)

Link: http://yikebocai.com/2014/06/cassandra-principle/ (principles)
