

Aerospike: a brief introduction and how to use it

2025-01-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

What is Aerospike and how is it used? Many people who have not worked with it do not know where to start. This article gives a brief introduction to Aerospike and walks through its architecture, installation, basic operations, Java client, and cluster deployment.

Preface

After working for a few years I had never studied systematically, and in many areas I lacked experience with complex production scenarios. I finally decided to sign up for the Lagou high-salary training camp, and I have learned a great deal there. There are also regular internal referrals and more opportunities, which have really helped me improve. Special thanks to the gentle and lovely head teacher Little Bamboo and the serious, responsible, and handsome mentor Old Coke for their help!

Aerospike introduction

Aerospike (AS for short) is a distributed, scalable key-value NoSQL database. It handles highly concurrent reads and writes of structured data at terabyte scale with sub-millisecond latency: 99% of responses complete within 1 millisecond. It uses a hybrid architecture in which the index is kept in memory while the data is stored on a mechanical hard disk (HDD) or solid-state disk (SSD), or also in memory. When accessing an SSD, AS bypasses the file system and addresses the device directly, which keeps reads fast. AS supports secondary indexes and client-side aggregation, and offers a simple SQL-like command-line tool (aql). These features give it some advantages over other NoSQL databases.

Aerospike application scenario

As a high-capacity NoSQL solution, Aerospike has not been widely adopted by large domestic (Chinese) companies. It suits scenarios where the capacity requirement is relatively large and the QPS is relatively low. At present it is mainly used in the Internet advertising industry, mostly abroad.

Personalized recommendation advertising application

Personalized recommendation advertising is built on understanding each consumer's individual preferences and habits. It anticipates or guides the consumer's purchase needs and presents advertisements that closely match those needs in the right place, at the right time, and in an appropriate form, so as to promote the user's consumption behavior.

After the user behavior log collection system collects the logs, they are pushed to ETL for data cleaning and conversion. The data after ETL is sent to the recommendation engine, which calculates the recommendation result for each consumer. The recommendation logic includes rules and algorithms. The rules include the user's recent browsing, items added to the shopping cart, items added to favorites, and so on.

The algorithm includes commodity similarity, user similarity, text similarity, picture similarity and so on. The results of the recommendation engine are stored in the Aerospike cluster and provided to the advertising engine for real-time acquisition.

Application of real-time bidding advertising

When a user browses a site that has joined an SSP (supply-side platform), the SSP sends the request to the ad exchange (ADX, the advertising trading platform), and the ADX forwards it to several DSPs (demand-side platforms). Each DSP bids according to what its own DMP (data management platform) knows about the user, and the winning DSP gets the opportunity to show its advertisement. The key to winning in real-time bidding (RTB) is that the DMP can analyze and pin down user attributes from the user's browsing history and other data, so the user profile (UserProfile) is a very important part of real-time bidding advertising. Logs are analyzed offline and in real time through HDFS and HBase respectively, and the resulting user profile tags (for example: programmer, otaku) are stored in the high-performance NoSQL database Aerospike, with the data backed up to a remote data center at the same time. For each front-end advertising request, the decision engine (delivery engine) reads the corresponding profile data from the user profile database and bids according to the bidding algorithm. Once the bid succeeds, the advertisement can be displayed. Which advertisement is shown to the user is decided by the personalized recommendation described above.
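The bidding round described above can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the class `RtbSketch`, the DSP names, the tag-matching rule, and the bid amounts are all hypothetical, and an in-memory map stands in for the user profile store.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of one bidding round; every name and bidding rule here is hypothetical. */
final class RtbSketch {
    /** A DSP doubles its base bid when the user's profile tags contain its target tag. */
    static double bid(Set<String> userTags, String targetTag, double baseBid) {
        return userTags.contains(targetTag) ? baseBid * 2 : baseBid;
    }

    /** The exchange awards the impression to the DSP with the highest bid. */
    static String winner(Map<String, Double> bids) {
        return Collections.max(bids.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Stand-in for the user profile store (Aerospike in the article): user id -> tags.
        Map<String, Set<String>> profiles = Map.of("user-1", Set.of("programmer", "otaku"));
        Set<String> tags = profiles.getOrDefault("user-1", Set.of());

        Map<String, Double> bids = new LinkedHashMap<>();
        bids.put("dsp-a", bid(tags, "programmer", 1.0)); // target tag matches the profile
        bids.put("dsp-b", bid(tags, "gardener", 1.5));   // target tag does not match
        System.out.println(bids + " -> winner: " + winner(bids));
    }
}
```

The profile lookup sits on the critical path of every bid request, which is why the article places the tags in a low-latency key-value store.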

Comparison between Aerospike and Redis

1. Aerospike is a NoSQL data store, while Redis is mainly a cache.
2. Aerospike is multithreaded, while Redis is single-threaded.
3. Redis requires developers to manage shards and provide a sharding scheme to balance data between them: on the client (hashing or consistent hashing), through a proxy such as Codis, or with Redis Cluster hash slots. AerospikeDB handles the equivalent of sharding automatically.
4. In Redis, increasing throughput means adding shards, reworking the sharding algorithm, and rebalancing the data, which usually requires downtime. In AerospikeDB, data volume and throughput can be increased dynamically without downtime, and AerospikeDB balances data and traffic automatically.
5. In Redis, if replication and failover are needed, developers have to synchronize data at the application layer. In AerospikeDB you only set the replication factor. AerospikeDB then replicates synchronously to maintain immediate consistency and completes failover transparently.
6. Redis runs in memory. AerospikeDB keeps indexes in memory and stores data on HDD, SSD, or in memory.
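To make the Redis Cluster hash slots mentioned in the comparison concrete, here is a minimal sketch of the key-to-slot mapping, which takes the CRC16 (XMODEM variant) of the key modulo 16384. The class name `RedisSlotSketch` is hypothetical, and the sketch deliberately ignores hash tags (the `{...}` syntax).

```java
import java.nio.charset.StandardCharsets;

/** Sketch of Redis Cluster's key-to-slot mapping: CRC16 (XMODEM) of the key, modulo 16384. */
final class RedisSlotSketch {
    static final int SLOT_COUNT = 16384;

    /** CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Hash tags are not handled in this sketch. */
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // 0x31C3 is the standard check value of CRC16/XMODEM for "123456789".
        System.out.println(Integer.toHexString(crc16("123456789".getBytes(StandardCharsets.US_ASCII))));
        System.out.println(slotFor("user:1000"));
    }
}
```

In Redis Cluster the slot-to-node assignment still has to be managed when nodes are added, whereas the article describes Aerospike as rebalancing its partitions on its own.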

Aerospike architecture

Aerospike is divided into three layers.

Client layer: provides access to the data in the Aerospike server, including CRUD, batch operations, and queries based on secondary indexes. It is a "smart" client available for most mainstream languages, such as C/C++, Java, Go, Python, C#, PHP, Ruby, and JavaScript. The client tracks the nodes, knows where data is stored, and notices cluster configuration changes as soon as a node starts or stops. It is efficient and stable, and it maintains an internal connection pool.

Distribution layer: responsible for balanced data distribution, backup, fault tolerance, and data synchronization between different clusters. It mainly consists of three modules:

Cluster Management Module: tracks cluster nodes. The key algorithm is a Paxos-like consensus voting process that determines which nodes are part of the cluster. Aerospike implements specialized heartbeat detection (active and passive) to monitor connectivity between nodes.

Data Migration Module: ensures that data is redistributed when nodes are added or removed, so that each data block is replicated across nodes and data centers according to the configured replication factor.

Transaction Processing Module: ensures the consistency and isolation of reads and writes. A write goes to the master copy first and then to the replicas. The module includes:
Sync/Async Replication (synchronous/asynchronous replication): to ensure write consistency, an update is propagated to all replicas before the data is committed and the result is returned to the client.
Proxy: during cluster reconfiguration the client's view of the cluster may briefly be out of date, in which case the request is transparently proxied to another node.
Duplicate Resolution: resolves conflicts between different copies of the data when the cluster recovers from a network partition.

Data layer: responsible for data storage. Aerospike is a schema-less key-value database. The data storage model is as follows:

Namespace: data is stored in namespaces, which are equivalent to databases in an RDBMS. A maximum of 32 can be configured. A namespace can be associated with multiple SSDs, but an SSD can be associated with only one namespace. Each namespace is divided into 4096 partitions.
Set: similar to a database table. A namespace can have up to 1023 sets.
Record: similar to a row in a database table. Records are schema-less, so bins can be added or removed at any time.
Key: the primary key, similar to the primary key of a table.
Bin: similar to a database field. Bins support the Java basic data types as well as List, Map, and Blob. A namespace can hold up to 32767 bins.
Index: Aerospike has a primary index and secondary indexes. Indexes are kept in memory and are not stored on disk. The primary index is on the primary key and is a hybrid of a hash table and red-black trees. A secondary index is on a non-primary-key bin, allows one-to-many relationships, and is a hybrid of a hash table and B-trees.
Disk: Aerospike addresses raw blocks on the disk directly, and is specially optimized for small reads, large block writes, and parallel SSD access to improve response time and throughput.

Using Aerospike

Installation of Aerospike

1. Download aerospike-server-community-5.0.0.7-el6.tgz from the Internet

wget https://www.aerospike.com/download/server/latest/artifact/el6

This downloads the latest version of aerospike-server-community.

2. Extract aerospike-server-community-5.0.0.7-el6.tgz

tar -zxvf aerospike-server-community-5.0.0.7-el6.tgz
mv aerospike-server-community-5.0.0.7-el6 aerospike-server

3. Install aerospike-server and aerospike-tools

cd aerospike-server
./asinstall

4. Verify that the installation is successful

[root@192 aerospike-server]# yum list installed | grep aerospike
aerospike-server-community.x86_64 5.0.0.7-1.el6 installed
aerospike-tools.x86_64 3.26.2-1.el6 installed

# uninstall aerospike
[root@localhost ~]# rpm -e aerospike-server-community.x86_64
[root@localhost ~]# rpm -e aerospike-tools.x86_64
[root@localhost ~]# rm -rf /etc/aerospike/

5. Aerospike-server start, stop, restart, status

systemctl start aerospike
systemctl stop aerospike
systemctl restart aerospike
systemctl status aerospike

6. Aerospike-server management: asadm

asadm    # enter the management console
Admin> info
Admin> i net

7. Aerospike-server operation: aql

aql> show namespaces
+------------+
| namespaces |
+------------+
| "test"     |
| "bar"      |
+------------+

Basic concepts of Aerospike

Namespace: a policy container, similar to a database in an RDBMS. It sets the number of replicas, memory size, validity period, storage engine, and file storage location.
Set: similar to a table in an RDBMS.
Record: similar to a row in an RDBMS.
Bin: similar to a column in an RDBMS. A record can have multiple bins.
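One way to picture these four concepts is as nested maps. The sketch below is an in-memory illustration only, not the Aerospike client: the class `DataModelSketch` and its methods are hypothetical, and `put` is assumed to merge the given bins into any record that already exists under the same key.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory picture of namespace -> set -> record (by primary key) -> bins. Not the Aerospike client. */
final class DataModelSketch {
    // namespace -> set -> primary key -> (bin name -> bin value)
    private final Map<String, Map<String, Map<Object, Map<String, Object>>>> namespaces = new HashMap<>();

    /** Writes bins under a key; bins of an existing record are merged (assumed behaviour). */
    void put(String ns, String set, Object key, Map<String, Object> bins) {
        namespaces.computeIfAbsent(ns, n -> new HashMap<>())
                  .computeIfAbsent(set, s -> new HashMap<>())
                  .computeIfAbsent(key, k -> new HashMap<>())
                  .putAll(bins);
    }

    /** Returns the bins of a record, or null when the namespace, set, or key is unknown. */
    Map<String, Object> get(String ns, String set, Object key) {
        Map<String, Map<Object, Map<String, Object>>> sets = namespaces.get(ns);
        if (sets == null) return null;
        Map<Object, Map<String, Object>> records = sets.get(set);
        if (records == null) return null;
        return records.get(key);
    }

    public static void main(String[] args) {
        DataModelSketch db = new DataModelSketch();
        db.put("test", "user2", 1, Map.of("name", "zhaoyun", "age", 23));
        // Records in the same set do not need to share the same bins.
        db.put("test", "user2", 2, Map.of("name", "diaochan", "address", "beijing"));
        // Writing to an existing key merges the new bins into the record.
        db.put("test", "user2", 1, Map.of("age", 18));
        System.out.println(db.get("test", "user2", 1));
    }
}
```

Because a record is just a map of bins, two records in the same set can carry different bins, which is what schema-less means here.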

Data manipulation in Aerospike

Records are addressed by primary key (PK), and the bins inserted for different records can differ.

INSERT INTO <ns>[.<set>] (PK, <bins>) VALUES (<key>, <values>)
DELETE FROM <ns>[.<set>] WHERE PK = <key>

<ns> is the namespace for the record.
<set> is the set name for the record.
<key> is the record's primary key.
<bins> is a comma-separated list of bin names.
<values> is a comma-separated list of bin values.

There is no UPDATE statement: when an INSERT uses a PK that already exists, the existing record is modified.

Examples:

INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
DELETE FROM test.demo WHERE PK = 'key1'

insert into test.user1 (PK,name,age,sex,address) VALUES (2, ..., 21, ...)
insert into test.user2 (pk,name,sex,age) values (1, ..., 23)
-- the PK is 1 in both statements, so the second one modifies the original record
insert into test.user2 (pk,name,sex,age) values (1, ..., 18)

QUERY

SELECT <bins> FROM <ns>[.<set>]
SELECT <bins> FROM <ns>[.<set>] WHERE <bin> = <value>
SELECT <bins> FROM <ns>[.<set>] WHERE <bin> BETWEEN <lower> AND <upper>
SELECT <bins> FROM <ns>[.<set>] WHERE PK = <key>
SELECT <bins> FROM <ns>[.<set>] IN <indextype> WHERE <bin> = <value>
SELECT <bins> FROM <ns>[.<set>] IN <indextype> WHERE <bin> BETWEEN <lower> AND <upper>

<ns> is the namespace for the records to be queried.
<set> is the set name for the record to be queried.
<key> is the record's primary key.
<bin> is the name of a bin.
<value> is the value of a bin.
<indextype> is the type of index the user wants to query (LIST/MAPKEYS/MAPVALUES).
<bins> can be either a wildcard (*) or a comma-separated list of bin names.
<lower> is the lower bound for a numeric range query.
<upper> is the upper bound for a numeric range query.

Examples:

SELECT * FROM test.demo
SELECT * FROM test.demo WHERE PK = 'key1'
SELECT foo, bar FROM test.demo WHERE PK = 'key1'
SELECT foo, bar FROM test.demo WHERE foo = 123
SELECT foo, bar FROM test.demo WHERE foo BETWEEN 0 AND 999

select * from test.user2 where name='zhaoyun'
-- no index has been created, so the query fails
Error: (201) AEROSPIKE_ERR_INDEX_NOT_FOUND

create index idx_1 on test.user2 (name) string
select * from test.user2 where name='zhaoyun'
+-----------+-----+-----+-----------+
| name      | sex | age | address   |
+-----------+-----+-----+-----------+
| "zhaoyun" | "M" | 21  | "beijing" |
+-----------+-----+-----+-----------+

CREATE INDEX <index> ON <ns>[.<set>] (<bin>) NUMERIC|STRING|GEO2DSPHERE
CREATE LIST/MAPKEYS/MAPVALUES INDEX <index> ON <ns>[.<set>] (<bin>) NUMERIC|STRING|GEO2DSPHERE

CREATE INDEX idx_foo ON test.demo (foo) NUMERIC
DROP INDEX test.demo idx_foo

Client of Aerospike (Java)

Add aerospike-client to pom.xml

<dependency>
    <groupId>com.aerospike</groupId>
    <artifactId>aerospike-client</artifactId>
    <version>4.4.9</version>
</dependency>

API usage of aerospike-client

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;
import com.aerospike.client.policy.WritePolicy;

// IP + port
AerospikeClient client = new AerospikeClient("192.168.127.128", 3000);
// write policy
WritePolicy wp = new WritePolicy();
// timeout
wp.setTimeout(1000);
/* key */
Key k1 = new Key("test", "user1", 1);
/* bins (key-value pairs) */
Bin b11 = new Bin("name", "zhangfei");
Bin b12 = new Bin("sex", "M");
Bin b13 = new Bin("age", 23);
// write value
client.put(wp, k1, b11, b12, b13);
// read value
Record r1 = client.get(wp, k1, "name", "age", "sex");
System.out.println(r1);
System.out.println("==");
Key k2 = new Key("test", "user1", 2);
/* bins (key-value pairs) */
Bin b21 = new Bin("name", "diaochan");
Bin b22 = new Bin("sex", "F");
Bin b23 = new Bin("age", 21);
// write value
client.put(wp, k2, b21, b22, b23);
/* get data for the specified keys */
// batch execution policy
BatchPolicy bp = new BatchPolicy(wp);
// array of keys
Key[] ks = {k1, k2};
// loop over the results
for (Record r : client.get(bp, ks)) {
    System.out.println(r);
}

Aerospike cluster implementation

Aerospike cluster management

Cluster management handles node membership and ensures that all nodes in the cluster agree on the current members. It covers the cluster view, node discovery, and view changes.

Cluster view

Each Aerospike node is automatically assigned a unique node identifier, which consists of its MAC address and listening port. The cluster view includes:
cluster_key: a randomly generated 8-byte value that identifies an instance of the cluster view.
succession_list: the set of unique node identifiers that are part of the cluster.

Node discovery

Nodes detect each other's availability or failure through heartbeat messages. Each node in the cluster maintains an adjacency table, which lists the other nodes that have recently sent heartbeat messages to it. If a node's heartbeat times out, the node is considered failed and is removed from the adjacency table. Aerospike also uses other messages exchanged between nodes as a backup, secondary heartbeat mechanism. Each node in the cluster evaluates the health score of each neighboring node by calculating the average message loss.
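The adjacency table can be sketched as a map from node identifier to the time of the last heartbeat. This is a simplified illustration, not Aerospike's implementation: the class `AdjacencySketch` is hypothetical, and it assumes a node is dropped once it has been silent for longer than interval × timeout (for example 150 ms × 10 = 1500 ms).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical adjacency table: remembers when each neighbour last sent a heartbeat. */
final class AdjacencySketch {
    private final long intervalMs;      // time between heartbeats
    private final int timeoutIntervals; // missed intervals tolerated before a node is dropped
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    AdjacencySketch(long intervalMs, int timeoutIntervals) {
        this.intervalMs = intervalMs;
        this.timeoutIntervals = timeoutIntervals;
    }

    /** Records a heartbeat received from nodeId at time nowMs. */
    void onHeartbeat(String nodeId, long nowMs) {
        lastHeartbeatMs.put(nodeId, nowMs);
    }

    /** Removes and returns the nodes that have been silent for more than interval * timeout. */
    List<String> expire(long nowMs) {
        long limitMs = intervalMs * timeoutIntervals;
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > limitMs) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    /** Current neighbours in sorted order. */
    Set<String> neighbours() {
        return new TreeSet<>(lastHeartbeatMs.keySet());
    }

    public static void main(String[] args) {
        AdjacencySketch table = new AdjacencySketch(150, 10);
        table.onHeartbeat("A", 0);
        table.onHeartbeat("B", 1000);
        // At t = 1600 ms, A has been silent for 1600 ms (> 1500 ms) and B for 600 ms.
        System.out.println("removed: " + table.expire(1600) + ", remaining: " + table.neighbours());
    }
}
```

Any change to the set returned by `neighbours()` is the kind of event that would trigger a new cluster view.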

View change

Once the adjacency table changes, a Paxos consensus round is triggered to determine a new cluster view. Aerospike treats the node with the highest node identifier in the adjacency table as the Proposer, which proposes a new cluster view. If the proposal is accepted, the nodes begin to redistribute data (rebalance).
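Choosing the proposer amounts to taking the maximum node identifier. The sketch below assumes a 64-bit identifier with the listening port in the top 16 bits and the 48-bit MAC address below it. That layout, the class `ProposerSketch`, and the sample MAC addresses are illustrative assumptions, not taken from the article.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical node-id layout and proposer choice; not Aerospike's actual code. */
final class ProposerSketch {
    /** Assumed layout: listening port in the top 16 bits, 48-bit MAC address in the low bits. */
    static long nodeId(long mac48, int port) {
        return ((long) (port & 0xFFFF) << 48) | (mac48 & 0xFFFFFFFFFFFFL);
    }

    /** The node with the highest identifier acts as proposer; ids are compared as unsigned values. */
    static long proposer(Collection<Long> nodeIds) {
        return Collections.max(nodeIds, Long::compareUnsigned);
    }

    public static void main(String[] args) {
        long a = nodeId(0x000C29AABBCCL, 3000);
        long b = nodeId(0x000C29AABBCDL, 3000);
        System.out.println("proposer: " + Long.toHexString(proposer(List.of(a, b))));
    }
}
```

Because every node applies the same rule to the same adjacency information, the nodes can agree on the proposer without an extra election step.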

Aerospike data distribution

Aerospike has an intelligent partitioning algorithm: the key supplied by the user is hashed with RIPEMD-160 into a 20-byte digest, and 12 bits of that digest select the partition, so the data is distributed across the nodes in a relatively balanced way. The distribution also honours the namespace configuration (for example, how many replicas are kept and whether data lives on disk or in memory). Each digest space (namespace) is divided into 4096 non-overlapping partitions, which are the smallest unit of Aerospike data storage.
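A minimal sketch of the key-to-partition step, under two loudly stated assumptions. First, SHA-1 is used only as a stand-in for RIPEMD-160, because it also produces a 20-byte digest and a stock JDK may not provide RIPEMD-160, so the partition numbers it yields will not match a real cluster. Second, the choice of which 12 bits to keep (the low 12 bits of the first two bytes, read little-endian) is assumed for illustration. The class `PartitionSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative key -> digest -> partition mapping; SHA-1 replaces RIPEMD-160 in this sketch. */
final class PartitionSketch {
    static final int PARTITIONS = 4096; // 2^12, so 12 bits of the digest are enough

    /** 20-byte digest of the user key (stand-in hash, see the note above). */
    static byte[] digest(String userKey) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(userKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** Keeps 12 bits of the digest: assumed here to be the low 12 bits of the first two bytes. */
    static int partitionId(byte[] digest) {
        int firstTwoBytes = (digest[0] & 0xFF) | ((digest[1] & 0xFF) << 8);
        return firstTwoBytes & (PARTITIONS - 1);
    }

    public static void main(String[] args) {
        byte[] d = digest("key1");
        System.out.println("digest bytes: " + d.length + ", partition: " + partitionId(d));
    }
}
```

Since the partition depends only on the key's digest, any client can compute it locally and send the request straight to the node that owns that partition.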

For example, in a 4-node cluster, each node is the master for 1/4 of the data and also holds a replica of another 1/4. If Node 1 becomes unreachable, the copies of Node 1's data are replicated to other nodes. The replication factor is a configuration parameter that cannot exceed the number of cluster nodes. The more replicas, the higher the reliability, but also the higher the write load, because every write request must go through all copies of the data. In practice, most deployments use a replication factor of 2 (one master copy and one replica). Synchronous replication ensures immediate consistency and no data loss: a write transaction is propagated to all replicas before the data is committed and the result is returned to the client, so the client sees success only after both the master and the replica have succeeded. During cluster reconfiguration, when the Aerospike smart client sends a request to a node that is no longer the right one, the Aerospike cluster transparently proxies the request to the correct node.
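The 1/4 master and 1/4 replica split can be reproduced with a toy partition map. The round-robin placement below is purely illustrative and is not Aerospike's real assignment algorithm. The class `ReplicaSketch` and its methods are hypothetical.

```java
/** Toy partition map: owners[p][0] is the master node of partition p, owners[p][1] the replica, and so on. */
final class ReplicaSketch {
    /** Round-robin placement, used here only to illustrate the proportions. */
    static int[][] assign(int partitions, int nodes, int replicationFactor) {
        if (replicationFactor > nodes) {
            throw new IllegalArgumentException("replication factor cannot exceed the number of nodes");
        }
        int[][] owners = new int[partitions][replicationFactor];
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < replicationFactor; r++) {
                owners[p][r] = (p + r) % nodes; // each copy of a partition lands on a different node
            }
        }
        return owners;
    }

    /** Counts the partitions for which the given node holds the given role (0 = master). */
    static int count(int[][] owners, int node, int role) {
        int n = 0;
        for (int[] copies : owners) {
            if (copies[role] == node) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[][] owners = assign(4096, 4, 2);
        for (int node = 0; node < 4; node++) {
            System.out.println("node " + node + ": master of " + count(owners, node, 0)
                    + " partitions, replica of " + count(owners, node, 1));
        }
    }
}
```

With 4096 partitions, 4 nodes, and a replication factor of 2, every node ends up as master of 1024 partitions and replica of another 1024, which matches the 1/4 and 1/4 split described above.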

Aerospike cluster configuration and deployment

There are two ways to build a cluster: Multicast (UDP) and Mesh (TCP).

Multicast (UDP)

heartbeat {
    mode multicast
    multicast-group 239.1.139.1
    port 3000
    address 192.168.127.131
    interval 150
    timeout 10
}

UDP is an unreliable protocol, so nodes may drop out of the cluster, and some networks do not support multicast.

Mesh (TCP)

heartbeat {
    mode mesh
    # add current node address here
    address 192.168.127.131
    port 3000
    # add all cluster node address here
    mesh-seed-address-port 192.168.127.131 3002
    mesh-seed-address-port 192.168.127.128 3002
    interval 150
    timeout 10
}

Cluster deployment

After installing Aerospike on 192.168.127.128, modify the configuration file /etc/aerospike/aerospike.conf

vim /etc/aerospike/aerospike.conf

service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
        access-address 192.168.127.128 3002
    }

    heartbeat {
        mode mesh
        address 192.168.127.128
        port 3002
        # all cluster nodes
        mesh-seed-address-port 192.168.127.128 3002
        mesh-seed-address-port 192.168.127.131 3002
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }

    fabric {
        address any
        port 3001
    }

    info {
        address any
        port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 256M
    storage-engine memory
}

namespace bar {
    replication-factor 2
    memory-size 256M
    storage-engine memory
}

After installing Aerospike on 192.168.127.131, modify the configuration file

service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
        access-address 192.168.127.131 3002
    }

    heartbeat {
        mode mesh
        address 192.168.127.131
        port 3002
        # all cluster nodes
        mesh-seed-address-port 192.168.127.131 3002
        mesh-seed-address-port 192.168.127.128 3002
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }

    fabric {
        address any
        port 3001
    }

    info {
        address any
        port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 256M
    storage-engine memory
}

namespace bar {
    replication-factor 2
    memory-size 256M
    storage-engine memory
}

Aerospike cluster access

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Host;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.WritePolicy;

// seed nodes of the cluster
Host[] hosts = new Host[] {
    new Host("192.168.127.128", 3000),
    new Host("192.168.127.131", 3000)
};
ClientPolicy policy = new ClientPolicy();
AerospikeClient client = new AerospikeClient(policy, hosts);
// write policy
WritePolicy wp = new WritePolicy();
// timeout
wp.setTimeout(500);
Key key1 = new Key("test", "SUser", "11");
Bin bin11 = new Bin("name", "zhangfei-c");
Bin bin12 = new Bin("age", 25);
Bin bin13 = new Bin("sex", "M-c");
client.put(wp, key1, bin11, bin12, bin13);
Key key2 = new Key("test", "SUser", "22");
Bin bin21 = new Bin("name", "zhaoyun-c");
Bin bin22 = new Bin("age", 24);
Bin bin23 = new Bin("sex", "M-c");
client.put(wp, key2, bin21, bin22, bin23);
Record r1 = client.get(wp, key1, "name", "age", "sex");
System.out.println(r1);

After reading the above, you should have a grasp of what Aerospike is and how to use it. If you want to learn more, you are welcome to follow the industry information channel. Thank you for reading!
