Comparative Analysis of Hadoop, Spark, Hive, Solr, ES, and YDB for Ad Hoc Vehicle Analysis


Since 2012, the Traffic Administration Bureau of the Ministry of Public Security has promoted the vehicle inspection and control system (the "checkpoint" or bayonet system for short) nationwide. By integrating and sharing local intelligent vehicle monitoring records and other information resources, a national system with horizontal networking and vertical interconnection has been established. It enables wide-area vehicle monitoring and early-warning interception, vehicle trajectory and traffic flow analysis, key vehicle control, screening of traffic violations, and detection of vehicle-related cases. It plays an important role in solving hit-and-run cases, investigating illegal vehicle-related activity, public security prevention and control, and counter-terrorism and stability maintenance.

As networked units and connected checkpoints keep growing, the vehicle inspection and control systems deployed across provinces, cities, and regions have accumulated a huge volume of vehicle-passing data. To date, 32 provinces (autonomous regions and municipalities) have completed networking of their systems, with more than 50,000 connected checkpoints and more than 200 billion motor vehicle passing records. Taking a medium-sized city as an example: 3 million passing records are collected every day, about 1 billion per year, and more than 20 billion records accumulate across the whole province each year. Managing and making good use of such a huge volume of data has become a major challenge for every province and city.

As vehicle networks and checkpoints continue to expand, vehicle data keeps accumulating. This severely tests the storage, processing, and querying of the raw data, so a distributed computing platform with real-time processing and multi-dimensional query capability is needed.

I. Decomposition of key requirements

1. Vehicle track query

Query a vehicle's status by exact license plate number or by fuzzy plate search, and track its movements in time order: passing-record query, passing-track query, stay-point (foothold) analysis, and track playback (see the SQL examples at the end of this article).

2. Geographical location retrieval

Vehicles can be filtered quickly by longitude and latitude, for example specifying a coordinate and quickly delineating all vehicles within 10 kilometers of it (a sketch follows).
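A minimal sketch of the bounding-box form of this filter, assuming hypothetical longitude/latitude columns jd and wd (the article does not name them). One degree of latitude is about 111 km, so a 10 km radius is roughly 0.09 degrees of latitude, and longitude is scaled by cos(latitude), about 0.117 degrees at 40°N:

-- vehicles within ~10 km of (116.39 E, 39.90 N), bounding-box approximation
select hphm, kkbh, jgsj
from ydb_jiaotong
where jd between 116.39 - 0.117 and 116.39 + 0.117
  and wd between 39.90 - 0.090 and 39.90 + 0.090;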

3. Multi-dimensional collision, multi-dimensional query

A typical multi-dimensional query involves around five conditions; the most commonly used are time, checkpoint number, and vehicle type.

Any combination of conditions across multiple dimensions can be used to filter and collide the data.

Vehicle collision analysis can also be carried out across multiple geographic coordinates, as in the sketch below.
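A Hive-style sketch (not YDB-specific) of a two-location collision: plates captured in both area A and area B within their respective time windows. The table and columns follow the SQL examples at the end of this article; the area values are placeholders:

-- plates seen in both areas within the given windows
select distinct a.hphm
from ydb_jiaotong a
join ydb_jiaotong b on a.hphm = b.hphm
where a.quyu = 'AreaA' and a.jgsj between '201604290000' and '201604290200'
  and b.quyu = 'AreaB' and b.jgsj between '201604290300' and '201604290500';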

4. Analysis of vehicle travel patterns

Statistical analysis can be performed on a single vehicle or a group of vehicles to understand their travel patterns: travel times and frequently visited entry and exit locations (see the sketch below).
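A sketch of such a statistic, using the table and columns from the examples at the end of this article (jgsj there appears to be in yyyyMMddHHmm form): the most frequent checkpoint/hour combinations for one plate:

-- frequent locations and hours of day for one vehicle
select kkbh, substr(jgsj, 9, 2) as hour_of_day, count(*) as passes
from ydb_jiaotong
where hphm = 'Cloud NEW336'
group by kkbh, substr(jgsj, 9, 2)
order by passes desc
limit 20;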

5. Analysis of vehicles with abnormal travel patterns

Identify stranger (first-time) people and vehicles in a given area, and identify people or vehicles whose travel patterns are abnormal.

6. Companion (adjoint) analysis

Fit person and vehicle trajectories together to determine whether there is designated-driver behavior, and to identify tailing and stalking.

7. Data collision analysis

Data collisions can be carried out across multiple geographic locations and time windows, and collision analysis can also be performed across a series of times.

8. Key vehicle analysis

Count key vehicles (passenger transport, dangerous goods transport, special vehicles) in a given area and discover their traffic patterns. Issue early warnings for vehicles traveling on a road section at abnormal times, key vehicles appearing on the section for the first time, and passenger vehicles still on the road between 2 and 5 a.m. (see the sketch below).
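A sketch of the 2-to-5 a.m. screen, assuming a hypothetical vehicle-type column cllx (the article does not name one) and jgsj in yyyyMMddHHmm form:

-- passenger vehicles captured between 02:00 and 04:59
select hphm, kkbh, jgsj
from ydb_jiaotong
where cllx = 'passenger'
  and substr(jgsj, 9, 2) >= '02' and substr(jgsj, 9, 2) < '05';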

9. Statistical analysis of vehicle entry and exit

Within a given period and area (configurable: central urban areas, prefecture-level areas, provincial areas, highways, etc.), mine and count vehicles that frequently enter, exit, or travel the main trunk roads, "migratory bird" vehicles, and the number of passing vehicles, with classified statistics by vehicle type and license-issuing place (see the sketch below).
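A sketch of the classified statistics, again assuming the hypothetical vehicle-type column cllx; the license-issuing place is read from the plate's first character:

-- passing-vehicle counts by issuing place and vehicle type, June 2016
select substr(hphm, 1, 1) as issuing_place, cllx, count(*) as cnt
from ydb_jiaotong
where jgsj like '201606%'
group by substr(hphm, 1, 1), cllx;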

II. Key technical capability requirements

1. Data scale and number of data nodes

It must sustain tens of billions of new records per day, with long-term data retention.

It must also support daily increments of tens of billions, or even hundreds of billions, over the next three to five years.

Each data node should handle about 2 billion records per day.

2. Flexibility of query and statistics functions

Query logic often differs greatly across manufacturers and device models, and each business has its own query logic, so the query system must be flexible enough to handle complex business logic and algorithms rather than only conventional simple statistics.

It must support complex SQL.

When built-in functionality cannot meet a need, SQL must be extensible with custom logic: UDF, UDAF, UDTF (a sketch follows).
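A Hive-style sketch of registering and using a custom UDF (the jar path and class name are hypothetical):

-- register a UDF, then use it like a built-in function
add jar /opt/udf/vehicle-udf.jar;
create temporary function plate_region as 'com.example.udf.PlateRegion';
select plate_region(hphm) as region, count(*) as cnt
from ydb_jiaotong
group by plate_region(hphm);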

It must support fuzzy retrieval.

For mailboxes, mobile phone numbers, license plate numbers, web addresses, IP addresses, program class names, and other letter-digit combinations, default tokenization produces incomplete matches, so searches miss data; in scenarios requiring exact matching, such fuzzy retrieval poses a real business risk.

Multi-dimensional analysis and multi-dimensional collision.

As above: a typical multi-dimensional query involves around five conditions, most commonly time, checkpoint number, and vehicle type.

3. Retrieval and concurrency performance

Each query returning fewer than 100 records must complete within 1 second, with concurrency of at least 200 on no more than 6 nodes; concurrency should scale proportionally with the number of nodes.

4. Data import and timeliness

Timeliness requirements are high: data generated by a vehicle must become searchable and analyzable in the system within minutes. Retrieval performance requirements are also high: all the typical requirements above must return results and details within seconds.

Batch import via SQL must be supported, as must streaming import from Kafka (a sketch of the batch-import shape follows).
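A minimal sketch of the SQL batch import, assuming a hypothetical staging table (the target table and columns follow the examples at the end of this article):

-- batch import one day of records from a staging table
insert into table ydb_jiaotong
select hphm, kkbh, jgsj, jgsk, quyu
from staging_passing_records
where dt = '20160619';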

5. Stability and absence of single points of failure

Easy to deploy, easy to expand, easy to migrate data.

Multi-copy data protection, so hardware damage causes no data loss.

Service anomalies must be detected and recovered automatically, sparing operators the pain of getting up in the middle of the night.

The system must have no single point of failure: when one server has a problem, online business must not be affected.

At the tens-of-billions scale there must be no frequent OOM and no dropped shards or nodes.

When an exception occurs, the system must detect the abnormal service and restart it automatically, rather than operations staff going to the machine room at night to restart it; services must migrate and recover automatically, greatly reducing the operations workload.

It must provide rate limiting for import and query, overload protection, and even degraded (lossy) queries and services in extreme scenarios.

6. High sorting performance

Sorting is a hard metric for many log systems (for example, sorting in reverse time order). If a big data system cannot sort, it is essentially unusable; sorting is a "rigid requirement" of big data systems. Whether the platform is Hadoop, Spark, Impala, or Hive, sorting is essential, and sorting performance must be tested.

7. User interface

Prefer a SQL interface; programmatic interfaces carry high learning and integration costs.

8. Easy import and export with surrounding systems

It must integrate with existing common systems such as hadoop, hive, traditional databases, kafka, etc., to facilitate data import and export.

Support export of the original data along any dimension (a sketch follows this list).

You can export either the complete table or a filtered subset.

Support exporting data after filtering through various combined calculations.

You can join multiple tables of the system with multiple tables of other systems after filter calculations, then export.

You can copy data from one table into another.

Data can be exported to other systems (such as hive, hbase, databases, etc.).

Data from other systems can likewise be imported into the current system.

Data can be exported to files or imported from files.

Streaming import from kafka must be supported, and plug-ins can be written to export to kafka.
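A Hive-style sketch of a filtered export (the output directory is hypothetical):

-- export a filtered subset to files for downstream systems
insert overwrite directory '/tmp/export/passing_201606'
select hphm, kkbh, jgsj
from ydb_jiaotong
where jgsj like '201606%';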

9. Data storage and recovery

Data must not live only on local disks; otherwise migration and recovery are difficult:

1) There is no good speed control for disk reads and writes and no good flow control for imported data, so traffic cannot be throttled. In production, disk speed control and flow control are essential: a business peak must not hammer the system into hung or dropped disks.

2) Local disk bad sectors cause local data corruption the system may not recognize; for an index, even a single-byte read exception corrupts index pointers, losing retrieval results or even invalidating the whole index, and local disks cannot detect and correct these errors in time.

3) With data on local disks, once a nearly 20 TB disk is damaged, the node must restore from a replica before it can serve again, and recovery takes too long.

Data should instead be stored on HDFS:

1) Disk and network read/write speed control logic is built on top of HDFS.

2) HDFS carries CRC32 checksums, so local bad sectors are discovered immediately, do not affect service, and reads automatically switch to an undamaged replica.

3) If a local disk is damaged, HDFS recovers the data automatically without interrupting reads, writes, or service.

10. Data migration

Such schemes cannot be adopted:

For example, when relocating a machine room, operations staff must not have to carefully copy indexes one to one; such relocation plans often take several weeks and are very error-prone.

Nor one where, to keep data consistent during migration, service or real-time import must be interrupted so the data is static on disk before migration can proceed; that interrupts service for far too long.

The migration plan should instead be:

1. HDFS migrates data automatically via its balancer.

2. Bandwidth used during migration can be throttled.

3. Service is not interrupted during migration; adding or removing HDFS machines has no effect on the service.

11. Active/standby Kafka

Kafka runs in an active/standby configuration: when the active Kafka has a problem, the system automatically switches to the standby without affecting online business.

12. Scalability: early warning and online expansion

When storage approaches a bottleneck, the system must alarm in time, and expanding storage capacity and rebalancing data must be easy; capacity expansion must be possible online.

13. System monitoring

A mature monitoring platform must watch the platform's running state in real time and notify monitoring staff promptly when problems occur.

III. Existing industry solutions: advantages and disadvantages

1. Open-source big data systems (Hadoop, Spark, Hive, Impala)

Data scale / number of data nodes: √
Based on HDFS, data can be expanded almost without limit; storing PB-scale data is easy.

Query and statistics flexibility: √
1. SQL support is relatively complete.
2. Integration with surrounding systems is very convenient, and data import and export are flexible.
3. JDBC is supported, so these systems integrate seamlessly with common report systems.

Retrieval and concurrency performance: ×
These systems were not designed for ad hoc queries; they suit offline analysis. A HiveSQL job typically runs from minutes to hours, and at the tens-of-billions scale an analysis may take hours (by the existing XX department's estimates, possibly days). The root cause is brute-force scanning: with 10 billion records, the system still scans from beginning to end, with predictable performance.
There is effectively no concurrency: a single query takes hours.

Data import and timeliness: ×
Because of HDFS's characteristics, data latency is high; conventional applications work with T+1 data, i.e., a one-day delay.

Stability / single point of failure: √
Relatively mature.

Sorting performance: ×
Brute-force sorting. Tencent, first in the industry, used 512 machines and still responded in more than 90 seconds.

User interface: √
Uses the Hive JDBC interface; Hive is currently the de facto standard for ad hoc big data SQL.

Import and export with surrounding systems: √
Because the Hive interface is adopted and the ecosystem has grown around it, integration with the surrounding ecosystem is very convenient, and a full set of ecosystem tools exists for moving data to and from common systems.

Data storage and recovery: √
This is Hadoop's strength: on hardware damage or machine downtime, tasks migrate automatically, with no human intervention and no service impact.
1. From the beginning, Hadoop was designed on the assumption that all hardware is unreliable; when hardware is damaged, data is not lost, and multiple replicas recover it automatically.
2. Data migration and machine expansion have relatively complete schemes, with no service stop and dynamic scaling.

Data migration: √
Active/standby Kafka: × (Hive cannot consume Kafka directly)
Early warning and online expansion: √ (the industry has complete schemes)
System monitoring: √ (HDP has a complete solution)

2. Stream computing systems (Storm, Spark Streaming)

Data scale / number of data nodes: √
Data scale expands with the number of nodes.

Query and statistics flexibility: ×
Detailed records cannot be viewed; only summary results at a predetermined granularity are visible. Passing records cannot be pre-computed: it is impossible to predict in advance which car will commit a crime or have an accident, so the needed results cannot be calculated ahead of time.

Retrieval and concurrency performance: √
1. The data to be queried is computed in advance, and queries read the pre-computed results directly, so performance is very good.
2. The pre-computed result set is stored in HBase or a traditional database; since the result data is small, concurrency is good.

Data import and timeliness: √
Very timely: data generally arrives through a message queue such as Kafka, becoming visible within seconds.

Stability / single point of failure: √
Mature.

Sorting performance: √
Pre-computation: results are sorted in advance, so performance is good.

User interface: ×
Java interfaces and standalone APIs; programs similar to MapReduce must be written.

Import and export with surrounding systems: ×
Interfacing programs must be developed independently, which is difficult.

Data storage and recovery: √
Damaged machines are removed and migrated automatically, and service is not interrupted.

Data migration: √
Data migration, expansion, and disaster recovery all have complete solutions; expanding Storm requires only a simple rebalance.

Active/standby Kafka: √ supported.
Early warning and online expansion: √
System monitoring: √ complete schemes exist.

3. Full-text retrieval systems (Solr, ElasticSearch)

Data scale / number of data nodes: ×
1. Typical usage scenarios are at the tens-of-millions level; with a lot of memory, the data volume can reach the hundreds of millions.

2. These systems are limited by their own in-memory design; beyond 10 billion records is a huge challenge, unless one uses around 20 to 30 machines with 512 GB of memory each, and even then the total is 10 billion records overall, not 10 billion per day.

Query and statistics flexibility: ×
1. Born for search-engine scenarios, their analysis capabilities are weak: only the simplest statistics are available, which cannot meet the complex statistical analysis needs of passing records, and there is no support for complex SQL, multi-table joins, nested SQL, or custom functions.

2. Integration with surrounding systems is troublesome, and data import/export is painful or even infeasible. Third-party SQL engine plug-ins exist, but they support only simple SQL, and because the merger server is a single node, many SQL queries perform poorly; the plug-ins are not general-purpose.

3. They cannot be integrated with common report systems that support the JDBC standard, so custom development is expensive.

4. For mailboxes, mobile phone numbers, license plate numbers, web addresses, IP addresses, program class names, and other letter-digit combinations, tokenization produces incomplete matches, so searches miss data; in scenarios requiring exact matching, this fuzzy retrieval poses a real business risk.

5. Fuzzy matching is implemented with Lucene tokenization, which does not consider the order of tokens or guarantee their adjacency; other tokens can be interspersed between matches.

6. Solr and ES do not support true multi-column group by and statistics (columns cannot be crossed); the so-called implementation is a single-column group by followed by a Cartesian product of per-cell queries.

Retrieval and concurrency performance: √
1. The inverted index locates relevant records directly, without brute-force full-table scans, so retrieval performance is especially high.

2. Concurrency is good below the tens-of-millions level when plenty of memory is provided.

Data import and timeliness: ×
1. Real-time import is supported, and import performance is decent at the tens-of-millions scale.

2. Beyond about 100 million records, real-time import in production frequently hits OOM and high CPU load, so data can no longer be imported in real time. Systems above 10 billion records generally use offline indexing, i.e., data timeliness is delayed by a day.

3. Without a good merge-control strategy, the system goes through phases (lasting several minutes) of extremely high load from index merging; during these phases resource usage is very high and front-end query responses are very slow.

Stability / single point of failure: ×
1. Once the data scale exceeds 10 billion, OOM and dropped shards occur frequently.

2. After a shard is dropped, the service cannot recover automatically; operations staff must restart the relevant services.

3. There is no overload protection; one person issuing a complex query can bring down the whole cluster.

During Lucene index merging, every commit must remap the full ord relation. When the data scale is small, mapping the whole index file is fine, but at the hundreds-of-millions or tens-of-billions scale this mapping consumes large amounts of CPU, memory, and disk. Beyond roughly 100 million records, real-time indexing in Solr and ES becomes almost impossible, and the frequent ord remapping can render the entire system unavailable.

Sorting performance: ×
Sorting is done by brute-force full-table traversal; performance is poor, and the whole system is often paralyzed by a sort.

Lucene's Sort interface essentially relies on brute-force scanning of docvalues; with large data volumes, sorting consumes a great deal of memory and IO and takes a long time.

User interface: ×
Java APIs; the learning cost for users is high.

Because there is no general communication protocol, integrating with other big data systems is troublesome.

Import and export with surrounding systems: ×
Difficult: interfacing programs must be developed independently.

Exporting data to other systems is hard; exports beyond the millions level are basically infeasible, and there is no highly available export solution.

Exporting full data is basically impossible, let alone after complex multi-table operations.

Data storage and recovery: ×
The index is stored on local disks, making recovery difficult:

1. There is no good speed control for disk reads/writes and no good flow control for imported data, so traffic cannot be throttled; in production, disk and flow speed control are essential, and a business peak must not hammer the system into hung or dropped disks.

2. Local bad sectors cause local data corruption that Lucene cannot recognize; for an index, even a single-byte read exception corrupts index pointers, losing retrieval results or scrapping the whole index, and Solr and ES cannot detect or correct these errors in time.

3. Data sits on local disks; once a nearly 20 TB local disk is damaged, the node must restore from a replica before it can serve again, and recovery takes far too long.

Data migration: ×
1. For example, when relocating a machine room, operations staff must carefully copy indexes one to one; such relocations often take weeks and are very error-prone.

2. To keep data consistent during migration, service or real-time import must be interrupted so the data is static on disk before migration can proceed.

Adding fields: √ supported.
Active/standby Kafka: × not supported; a separate import API must be developed.
Early warning and online expansion: ×
The number of shards cannot be changed at will; expanding the shard count requires rebuilding all historical indexes, the legendary reindex. Moreover, the service cannot recover automatically after problems occur; operations staff must go on site to restore it.
System monitoring: √ ES itself has a paid monitoring product.

IV. The final solution: the Yanyun YDB hybrid solution, integrating the strengths of multiple systems

In view of the typical scenarios above, we ultimately combined multiple systems, playing to each one's strengths while avoiding its weaknesses, and integrated them deeply. As an ad hoc analysis engine for vehicle inspection and control, Yanyun YDB has been deployed or tested in more than 10 cities with very good results, some exceeding customer expectations.

YDB is a real-time, multi-dimensional, interactive query, statistics, and analysis engine based on a Hadoop distributed architecture. It offers ten-thousand-dimension statistical analysis at the trillion-record scale, with enterprise-grade stability and reliability.

YDB is a fine-grained, precise index: data is imported immediately, the index is generated immediately, and relevant data is located efficiently through the index. YDB is deeply integrated with Spark, and Spark analyzes and computes directly on YDB's retrieval result set, speeding up Spark in the same scenarios by up to a hundredfold (a sketch of the division of labor follows).
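The division of labor can be sketched with the pushdown markers used in the SQL examples at the end of this article: the inner block is resolved by YDB's index, and the outer SQL is computed by Spark. A minimal sketch, assuming the ydb_jiaotong table and columns from those examples:

-- the inner block (between the pushdown markers) is resolved by YDB's
-- index; Spark then aggregates the retrieved result set
select quyu, count(*) as cnt from (
/*ydb.pushdown('->')*/
select hphm, quyu from ydb_jiaotong where jgsj like '([201604290000 TO 201604292359])'
/*('<-')ydb.pushdown*/
) t group by quyu order by cnt desc limit 10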

Yanyun recommended configuration

Yanyun YDB high performance configuration (millisecond response)

1. Machine memory: 128 GB

2. Disks: enterprise SSD, 600-800 GB × 12 disks

3. CPU: 32 threads (2 sockets, 16 cores, 32 threads)

4. 10-gigabit network card

General configuration of Yanyun YDB (second response)

1. Machine memory: 128 GB

2. Disks: 2 TB × 12 disks

3. CPU: 24 threads (2 sockets, 12 cores, 24 threads)

4. Gigabit Ethernet

Metric comparison

Data scale / number of data nodes: √
1. At Tencent, 53 machines handle a daily increment of 180 billion records, with total data in the trillions (about 1 KB per record).
2. On the ordinary machines Yanyun recommends, estimates from sample data suggest each node can process roughly 3 to 5 billion records per day in real time.
The data scale handled and the query response speed grow linearly with the number of nodes.

Query and statistics flexibility: √
1. Supports Hive SQL expressions and all Hive built-in functions; SQL can be nested, multiple tables can be joined, and UDF/UDAF can be custom-built (see the sketch after this list).

2. Built-in tokenization types guarantee query accuracy with no omissions, solving the missed-data problem caused by Lucene's default tokenization. In addition, YDB can plug in any custom Lucene tokenizer, such as lexicon-based, semantic, or Pinyin tokenization.

3. Supports multi-dimensional queries, multi-dimensional statistics, and analysis along any dimension.
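As a sketch of the nesting and aggregation this enables, here is a plain Hive-style query (not YDB-specific) that finds plates captured at more than 50 distinct checkpoints in one day; table and column names follow the article's examples:

-- nested SQL sketch: inner aggregation, outer filter
select hphm, kk_cnt from (
  select hphm, count(distinct kkbh) as kk_cnt
  from ydb_jiaotong
  where jgsj like '20160429%'
  group by hphm
) t
where kk_cnt > 50;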

Retrieval and concurrency performance: √
Normally supports 20,000 queries at 300 concurrency, verified by continuous stress testing for more than 20 days.

That said, real production systems so far see little true concurrency; the largest case is a system-triggered burst of 100 concurrent queries every 5 minutes, with a 5-minute pause after each round completes.

Data import and timeliness: √
Data becomes searchable in the system about 1 to 2 minutes after it is generated.

Daily increments of hundreds of billions are sustained, and the total volume can reach the trillions of records.

Stability / single point of failure: √
1. With the Spark-on-Yarn deployment, if a machine goes down or hardware is damaged, services migrate automatically and no data is lost.

2. A daemon detects abnormal services and restarts them automatically; operations staff need not go to the machine room to restart machines.

3. Yanyun YDB needs to be deployed on only one machine and is distributed automatically by Yarn; there is no pile of per-machine configuration to maintain, so changing parameters is easy. Easy to deploy, easy to expand, easy to migrate data.

4. Multi-copy data protection: hardware damage causes no data loss.

5. Service anomalies are detected and recovered automatically, sparing operators the pain of getting up in the middle of the night.

The system has no single point of failure: a problem on one server does not affect online business.

At the tens-of-billions scale there is no frequent OOM and no dropped shards.

When an exception occurs, the system detects the abnormal service and restarts it automatically, without operations staff restarting machines in the middle of the night; services migrate and recover automatically, greatly reducing the operations workload.

6. Rate limiting for import and query, overload protection, and even degraded (lossy) queries and services in extreme scenarios.

7. We fixed many Spark bugs, making the system more stable than the open-source baseline:

http://blog.csdn.net/qq_33160722/article/details/60583286

Sorting performance: √
Yanyun's unique BLOCK SORT technology sorts 10 billion records in seconds.

For the technical principle, see:

http://blog.csdn.net/muyannian/article/details/60755273

User interface: √
SQL is used, so the learning cost is low.

Supports Hive JDBC access (programmatic), command-line access (scheduled tasks), and HTTP access.

Hive's JDBC protocol is already big data's de facto standard.

It docks seamlessly with conventional big data systems (hive, spark, kafka, etc.) and also provides extension interfaces.

Massive data import and export are flexible and convenient, and it also integrates with common report tools and SQL visualization tools that support JDBC.

Import and export with surrounding systems: √

Export

Support any dimension export of original data

You can export either the complete table or a filtered subset.

Support the export of data after filtering through various combined calculations

You can join multiple tables in YDB with multiple tables in other systems after filter calculations, then export.

You can copy data from one YDB table into another YDB table.

You can export the data in YDB to other systems (such as hive,hbase, database, etc.)

You can also import data from other systems into YDB.

Can be exported as a file or imported from a file.

Import

Batch import via SQL; streaming import from Kafka is also supported.

1. The index design does not load all data into memory for mapping the way solr and es do, which greatly reduces the risk of OOM during both import and query.

2. Different merge strategies for multiple regions of memory and disk, combined with speed-control logic, keep import overhead within a bounded range, make the system more stable, and minimize the minutes-long resource spikes caused by index merging; spreading out resource usage keeps front-end queries steadier.

3. Borrowing the strengths of Storm-style stream processing, data arrives through a message queue (such as Kafka) and becomes searchable in YDB about 1 to 2 minutes after it lands in Kafka.

Data storage and recovery: √ data is stored on HDFS.

1. YDB builds disk and network read/write speed control logic on top of HDFS.

2. HDFS carries CRC32 checksums, so local bad sectors are discovered immediately, do not affect service, and reads automatically switch to an undamaged replica.

3. If a local disk is damaged, HDFS recovers the data automatically without interrupting reads, writes, or service.

Data migration: √
1. HDFS migrates data automatically via its balancer.

2. Bandwidth used during migration can be throttled.

3. Service is not interrupted during migration; adding or removing HDFS machines has no effect on the service.

Active/standby Kafka: √ supported; service is not interrupted during switchover.

Scheduled aggregation: √
Compatible with Hive's own characteristics, it suits scheduled back-end computation over large volumes of data.

Built-in support exports YDB computation results directly to Oracle, with no separate ETL program to write.

Real-time aggregation: √
Compatible with the index's own characteristics, it suits scans over small ranges of data.

Aggregation performance depends heavily on the hit count and the number of disks per machine. Selecting 150 million records out of 300 billion for statistical summary exceeds the IOPS of a 6-node SATA-disk cluster; SSDs are required for that performance.

Early warning and online expansion: √
1. Data is stored on HDFS rather than local disks, so expansion, migration, and disaster recovery are as stable and reliable as Hadoop itself.

2. Kafka consumption delays and node downtime trigger early warnings, visible on the monitor page.

System monitoring: √
1. A complete metrics monitoring system shows the cluster's running state in real time.

2. The accompanying early-warning system sends notification alarms when problems occur.

3. Machine anomalies also raise warning alarms.

4. Alarm logic can be customized.

V. Examples of common SQL writing methods

Track query / key vehicle analysis

Description: typically searches the track of a particular vehicle by license plate number, for example in the XX system to trace a suspect's movements, or to analyze key vehicles.

SQL:

/*ydb.pushdown('->')*/
select hphm,kkbh,jgsj,jgsk,quyu from ydb_jiaotong where hphm='Cloud NEW336' order by jgsj desc limit 10
/*('<-')ydb.pushdown*/

/*ydb.pushdown('->')*/
select hphm,jgsj,kkbh from ydb_oribit where ydbpartion='20160619' and ((jgsj like '([201604290000 TO 201604290200])' and kkbh='10000000') or (jgsj like '([201604290001 TO 201604290301])' and kkbh='10000001') or (jgsj like '([201604290002 TO 201604290202])' and kkbh='10000002'))
/*('<-')ydb.pushdown*/

/*ydb.pushdown('->')*/
select hphm,jgsj,quyu from ydb_jiaotong where ((jgsj like '([201604111454 TO 201604131554])' and quyu='Anyi Road GE Xin') or (jgsj like '([201609041043 TO 201609041050])' and quyu='Hengren Road Sun Star City') or (jgsj like '([201609040909 TO 201609040914])' and quyu='Huanjiang Road Dayou Tianyuan'))
/*('<-')ydb.pushdown*/

/*ydb.pushdown('->')*/
select hphm,jgsk,kkbh from t_kk_clxx where jgsk like '([20161201153315 TO 20161204153315])' and kkbh in ('320600170000089780','320600170000000239','320600170000000016','3206001700000018','3206001700002278','320600170000092977','320600170000092097','320600170000092117')
/*('<-')ydb.pushdown*/

/*ydb.pushdown('->')*/
select jgsk,jgsj,quyu from ydb_jiaotong where hphm='black NET458'
/*('<-')ydb.pushdown*/

/*ydb.pushdown('->')*/
select hphm,kkbh from ydb_jiaotong where hphm_search='NEW33'
/*('<-')ydb.pushdown*/
