
Elasticsearch does not support transactions. Is there any good way to make up for it?

2025-01-18 Update From: SLTechnology News&Howtos (Shulou.com)

Elasticsearch does not support transactions. Is there a good way to compensate for that? This article analyzes the problem and walks through the corresponding solutions in detail, in the hope of helping readers who face it find a simple and feasible approach.

1. Question

How does ES work together with Hive or MySQL? Is there a good way to compensate for the transactions that ES does not support?

2. Core concepts of transactions

If a database claims to support transactional operations, the database must have the following four ACID features:

Atomicity

Atomicity means that the operations contained in a transaction either all succeed or are all rolled back.

Consistency

Consistency means that a transaction must move the database from one consistent state to another, that is, the database must be in a consistent state both before and after the transaction executes.

Isolation

Isolation means that when multiple users access the database concurrently, for example by operating on the same table, the transaction opened for each user must not be disturbed by the operations of other transactions. Concurrent transactions are isolated from each other.

Durability

Durability means that once a transaction is committed, its changes to the data are permanent, and the committed changes are not lost even if the database system fails.

To better understand ACID, take bank account transfers as an example:

1  START TRANSACTION;
2  SELECT balance FROM checking WHERE customer_id = 10233276;
3  UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
4  UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
5  COMMIT;

Atomicity: the transfer either commits completely (the checking balance of customer 10233276 decreases by 200 yuan and the savings balance increases by 200 yuan) or rolls back completely (the balance in neither table changes).

Consistency: in this example, the 200 yuan does not go missing even if the database system crashes after line 3 and before line 4, because the transaction has not yet been committed and its partial changes do not take effect.

Isolation: the statements in one transaction are isolated from the statements of other transactions. For example, if transaction A has run line 3 but not yet line 4 and transaction B queries the checking balance at that moment, B still sees the balance from before A subtracted the 200 yuan (the account is unchanged from B's point of view), because transactions A and B are isolated from each other. Transaction B does not observe the change until transaction A commits.

Durability: once a transaction is committed, its changes to the data are permanent.
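The atomicity point can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a transactional database. The starting balances, the function names and the `fail_midway` switch are assumptions made up for this illustration, not part of the original example.

```python
import sqlite3

def make_accounts():
    """Create an in-memory database with the two tables from the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checking (customer_id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute("CREATE TABLE savings (customer_id INTEGER PRIMARY KEY, balance REAL)")
    # Starting balances are invented for the demonstration.
    conn.execute("INSERT INTO checking VALUES (10233276, 500.0)")
    conn.execute("INSERT INTO savings VALUES (10233276, 100.0)")
    conn.commit()
    return conn

def transfer(conn, customer_id, amount, fail_midway=False):
    """Move money from checking to savings inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE checking SET balance = balance - ? WHERE customer_id = ?",
                (amount, customer_id),
            )
            if fail_midway:
                # Simulates a failure after line 3 and before line 4.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE savings SET balance = balance + ? WHERE customer_id = ?",
                (amount, customer_id),
            )
        return True
    except RuntimeError:
        return False

def balances(conn, customer_id):
    """Return (checking balance, savings balance) for one customer."""
    checking = conn.execute(
        "SELECT balance FROM checking WHERE customer_id = ?", (customer_id,)
    ).fetchone()[0]
    savings = conn.execute(
        "SELECT balance FROM savings WHERE customer_id = ?", (customer_id,)
    ).fetchone()[0]
    return checking, savings
```

Calling `transfer(conn, 10233276, 200.0, fail_midway=True)` leaves both balances untouched, while a normal call moves the 200 in one step.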

3. Elasticsearch does not support transactions

Data stores that support ACID include Postgres, SQLite, Oracle, MySQL (with InnoDB), and MongoDB (4.0+). Elasticsearch is not among them.

The underlying technology of Elasticsearch is Lucene, an information retrieval library that pursues speed rather than redundancy. Lucene has a completely different architecture that delivers extremely fast performance, but at the cost of being more vulnerable to data loss.

There are many ways to lose data, and when it happens you need to recreate the data. Elasticsearch does have a snapshot / restore feature, but it only recovers part of the data after a loss. Unless the data is also stored on another system, the updates made between the latest snapshot and the interruption are lost. Snapshot / restore does not help in a split-brain situation either, because there is no mechanism to reconcile the updates made on each partition. Those updates are lost.

4. Scenarios supported by Elasticsearch

Data security: Elasticsearch shards support replication, so multiple copies of the data can be kept. If one machine dies, the data still exists on other machines, which greatly reduces the risk of losing it.

Access security: now that X-Pack has been opened up, Elasticsearch supports authentication, which guards against unauthorized access. A third-party plugin such as Search Guard is another option.

Migration: Elasticsearch supports many plug-ins, and it is easy to import and export data between Elasticsearch and other open source systems.

Data integrity: Elasticsearch can preserve the original text of the data.

5. Scenarios not supported by Elasticsearch

Transactions are not supported, as mentioned earlier.

Complex multi-table joins of the kind a relational database performs through foreign keys are something Elasticsearch inherently lacks support for.

There is a delay between writing and reading: with the default settings, newly written data becomes searchable only after a delay of up to about 1 second.

The official documentation explains near real-time search as follows:

In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second.

The default refresh interval is 1 second, which means that a document can take up to 1 second from the index request until it is visible to search, no matter how fast your network and CPU are. This is a latency sacrifice Lucene makes to improve the throughput of write operations. The setting can be adjusted manually, but changing it is not recommended because it can significantly affect performance. Different applications define real-time differently, so it depends on your needs.

ES is not a relational database, so if your data would benefit from foreign keys and similar features, ES is not a good choice for your primary data store.
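The refresh behaviour can be pictured with a toy model. The sketch below is not Elasticsearch code: `ToyShard`, its methods and the simulated clock are all invented here purely to show why a document is not searchable until the next refresh.

```python
class ToyShard:
    """Toy model of near real-time search (an illustration, not Elasticsearch)."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval
        self._buffer = []        # indexed, but not yet visible to search
        self._searchable = []    # visible to search after a refresh
        self._last_refresh = 0.0

    def index(self, doc):
        """Accept a document; it only lands in the in-memory buffer."""
        self._buffer.append(doc)

    def tick(self, now):
        """Advance the simulated clock and refresh if the interval has elapsed."""
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.extend(self._buffer)
            self._buffer.clear()
            self._last_refresh = now

    def search(self, term):
        """Search only the refreshed (visible) documents."""
        return [doc for doc in self._searchable if term in doc]
```

In this model, a document indexed at time 0 is invisible at `tick(0.5)` and appears in `search` results once `tick(1.0)` triggers the refresh.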

6. Database selection considerations in system design

Which product is used as the data warehouse or master database storage depends entirely on the specific application scenario.

If searching and analyzing information is your primary need, then Elasticsearch is a good choice.

If your data is not updated frequently and needs no transactional operations, you can use Elasticsearch in place of other storage.

For scenarios with strict real-time requirements, it is necessary to combine a database that has ACID guarantees with Elasticsearch.

The core issues of type selection are as follows:

7. How can a database be used in conjunction with Elasticsearch?

Note at design time:

Each Elasticsearch index created should be supported by an ACID-compliant data store.

The database should be the ultimate source of truth from which the index is populated.

If a failure (node loss, interruption, or operator error) causes the index to be lost, you can then rebuild it completely from the database.
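As a rough illustration of these design notes, the sketch below rebuilds a search index from the database after the index is lost. It rests on loud assumptions: sqlite3 stands in for the ACID-compliant store, a plain dict stands in for the Elasticsearch index, and the `news` table, its rows and both function names are hypothetical.

```python
import sqlite3

def open_source_of_truth():
    """Create the ACID store that holds the authoritative copy of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO news VALUES (?, ?, ?)",
        [
            (1, "Elasticsearch basics", "near real-time search"),
            (2, "ACID explained", "atomicity and durability"),
        ],
    )
    conn.commit()
    return conn

def rebuild_index(conn):
    """Recreate the whole index (a dict standing in for ES) from database rows."""
    index = {}
    for news_id, title in conn.execute("SELECT news_id, title FROM news"):
        index[news_id] = {"title": title}
    return index
```

Because every indexed field comes from the database, discarding the dict and calling `rebuild_index` again yields the same index.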

The usual approach is to keep one copy of the data in another database, such as a NoSQL store, and synchronize it to ES in real time: one serves key-value lookups and the other serves all other kinds of queries. If ES is upgraded and, for example, the data structure changes so that the old version's data can no longer be used, the copy in the NoSQL store can be imported into the new version to restore it.

Logstash synchronization plug-ins such as logstash_input_jdbc do not synchronize delete operations. It is recommended to replace the delete with an update that sets a flag, or to implement the synchronized delete in business logic.
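The flag approach might look like the following sketch. Everything in it is an assumption for illustration: sqlite3 stands in for the source database, a dict stands in for the ES index, and the `is_deleted` and `updated_at` columns and the function names are hypothetical. It does not use Logstash itself; it only mimics an incremental sync that picks up changed rows.

```python
import sqlite3

def setup():
    """Create a news table with a soft-delete flag and a change timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT, "
        "is_deleted INTEGER DEFAULT 0, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO news (news_id, title, updated_at) VALUES (?, ?, ?)",
        [(1, "ES intro", 100), (2, "ACID basics", 100)],
    )
    conn.commit()
    return conn

def soft_delete(conn, news_id, now):
    """Mark the row as deleted instead of removing it, so the sync can see it."""
    with conn:
        conn.execute(
            "UPDATE news SET is_deleted = 1, updated_at = ? WHERE news_id = ?",
            (now, news_id),
        )

def sync_changes(conn, index, since):
    """Copy rows changed after `since` into the index; return the new checkpoint."""
    rows = conn.execute(
        "SELECT news_id, title, is_deleted, updated_at FROM news "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    for news_id, title, is_deleted, updated_at in rows:
        index[news_id] = {"title": title, "is_deleted": bool(is_deleted)}
        since = max(since, updated_at)
    return since

def search_titles(index, term):
    """Return matching ids, filtering out documents flagged as deleted."""
    return sorted(
        news_id
        for news_id, doc in index.items()
        if not doc["is_deleted"] and term.lower() in doc["title"].lower()
    )
```

The flagged document stays in the index, and queries simply exclude it. A hard delete in the source table would never reach the index, because a missing row produces no change for the sync to pick up.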

Core operations:

Only the fields used for retrieval are stored in ES, which keeps search and full-text retrieval fast.

All fields are stored in MySQL, taking advantage of its ACID transaction support.

The two are linked through a shared field, for example news_id, which must have the same value in ES and MySQL.

For core data, first obtain the ID (such as news_id) quickly from ES, then run a second query against MySQL to fetch the full record.
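The core operations above can be sketched as a two-step lookup. This is an illustration under stated assumptions, not a production setup: sqlite3 stands in for MySQL, a small in-memory inverted index stands in for ES, and the table layout, sample rows and function names are hypothetical.

```python
import sqlite3

def build_store():
    """Create the full-record store plus a search index over the title only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO news VALUES (?, ?, ?, ?)",
        [
            (1, "Elasticsearch transactions", "Li", "full text 1"),
            (2, "MySQL transactions", "Wang", "full text 2"),
            (3, "Lucene segments", "Zhao", "full text 3"),
        ],
    )
    conn.commit()
    # Inverted index: term -> set of news_id. Only the retrieval field is indexed.
    inverted = {}
    for news_id, title in conn.execute("SELECT news_id, title FROM news").fetchall():
        for term in title.lower().split():
            inverted.setdefault(term, set()).add(news_id)
    return conn, inverted

def search_ids(inverted, term):
    """Step 1: ask the search index for matching ids only."""
    return sorted(inverted.get(term.lower(), set()))

def fetch_full_rows(conn, ids):
    """Step 2: load the complete records from the database by id."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    query = (
        "SELECT news_id, title, author, body FROM news "
        f"WHERE news_id IN ({placeholders}) ORDER BY news_id"
    )
    return conn.execute(query, ids).fetchall()
```

Keeping only the shared id and the searchable fields in the index means the database remains the single place where complete records live.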

That is the answer to the question of how to compensate for Elasticsearch's lack of transaction support. I hope the above content is of some help to you.
