2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
What does lightweight mean? Setting technical terms aside, it means reaching the operational goal with a lighter, less time-consuming method. And what does high performance mean? Put simply, it means being more efficient and faster than the common methods.
The following is an introduction to the lightweight, high-performance multidimensional analysis suite provided by Runqian.
More precisely, lightweight describes a programming style that contrasts with heavyweight frameworks. Its advantages are that it does not depend on a container, is easy to configure, is more general, starts faster, and reduces development complexity. High performance means obtaining the desired results faster than common methods do. For Runqian Report specifically, this article shows how the multidimensional analysis page can work directly with simple SQL queries.
The process is simple: we enter a simple SQL statement on the multidimensional analysis page, the statement is submitted through the aggregator JDBC, the SQL query is executed on the group table, and the results are returned to the multidimensional analysis front end. The structure diagram is as follows:
The group table file is produced by the aggregator, which collects data from various heterogeneous data sources and computes it. For the specific method, see the "aggregator tutorial-Group Table".
Because a group table file can be computed on independently, it can serve as a data source for the front end without a database, which makes it well suited as the middleware of this OLAP suite. Next we use the aggregator to run a test that shows how group table files are used, and compare the results with Oracle to see the suite's lightweight and high-performance characteristics more concretely.
Test environment
Processor: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz 2.40GHz
Memory: 8G
Hard disk: 1TB
Operating system: Windows 10 Home, Chinese edition (64-bit)
The machine used in the test is not high-end. However, the purpose of the test is to compare the group table with Oracle, and both are measured in the same environment, so the hardware has little influence on the comparison and the results remain a valid reference.
Set Authorization
We use the aggregator to query and compare the data. To do this, download a trial license for the aggregator with high-performance storage from the official website or from Qian College, and then select the license file under "tools-options-Environment". The results are as follows:
Generate test data
To make the test meaningful, the test data must be large enough. We will create a table with 50 million rows and at least ten fields.
First, generate a txt format file of the test data in the aggregator:
A1: =create(ID, product number, color, memory, expanded capacity, number of cores, home screen size, battery capacity, rear camera, front camera, body weight, warranty time, dual card dual standby, time to market, unit price, inventory)
A2: [Black, White, Silver, Rose Gold, Tu Hao Jin]
B2: [0, 0.5, 0.75, 0.9, 0.9, 1]
A3: [32, 128, 64, 16, 8]
A4: [64, 16, 8, 128, 32]
A6: 2018-01-01
B6: =workdays(A6, A6+364)
A7: for 500
B7: =100000.new(…: product number, A2(B2.pseg(rand())): color, A3(B2.pseg(rand())): memory, A4(B2.pseg(rand())): expanded capacity, A5(B5.pseg(rand())): number of cores, (string(round((rand()*(6-5)+5), 1)) + "inch"): home screen size, (string(int(round((rand()*(4-3)+3), 1)*1000)) + "mAh"): battery capacity, (string(int(round((rand()*(3-2)+2), 1)*1000)) + "10 / pixel"): rear camera, (string(int(round((rand()*(2-1)+1), 1)*1000))): front camera, (string(int(round((rand()*(2-1)+1), 2)*100)) + "g"): body weight, (string(int(rand()*3+1)) + "year"): warranty time, if(rand() …)
B8: =file("D:\\test.txt").export@ta(B7)
The generated txt file is 6284 MB in size.
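For readers who want a scaled-down file with the same sixteen columns without running the aggregator, the following Python sketch writes a small tab-separated sample. It is only an approximation under stated assumptions: the names `make_row` and `write_sample`, the uniform choice of color, the core counts, the yes/no encoding of dual card dual standby, and the ranges for product number, camera ratings, unit price, and inventory are all illustrative and are not taken from the SPL script.

```python
import csv
import random
from datetime import date, timedelta

FIELDS = ["ID", "product number", "color", "memory", "expanded capacity",
          "number of cores", "home screen size", "battery capacity",
          "rear camera", "front camera", "body weight", "warranty time",
          "dual card dual standby", "time to market", "unit price", "inventory"]
COLORS = ["Black", "White", "Silver", "Rose Gold", "Tu Hao Jin"]
MEMORY = [32, 128, 64, 16, 8]
EXPANDED = [64, 16, 8, 128, 32]


def make_row(row_id, rng):
    """Build one product row; ranges marked 'assumed' are illustrative only."""
    launch = date(2018, 1, 1) + timedelta(days=rng.randrange(365))  # assumed: any day in 2018
    return [
        row_id,
        rng.randrange(1_000_000),              # product number (assumed range)
        rng.choice(COLORS),                    # assumed: uniform choice of color
        rng.choice(MEMORY),
        rng.choice(EXPANDED),
        rng.choice([2, 4, 8]),                 # number of cores (assumed values)
        f"{rng.randrange(50, 61) / 10}inch",   # 5.0 to 6.0 inch
        f"{rng.randrange(30, 41) * 100}mAh",   # 3000 to 4000 mAh
        rng.randrange(20, 31) * 100,           # rear camera rating (assumed, no unit)
        rng.randrange(10, 21) * 100,           # front camera rating (assumed, no unit)
        f"{rng.randrange(100, 201)}g",         # 100 to 200 g
        f"{rng.randrange(1, 4)}year",          # 1 to 3 years
        rng.choice(["yes", "no"]),             # dual card dual standby (assumed encoding)
        launch.isoformat(),
        rng.randrange(500, 5001),              # unit price (assumed range)
        rng.randrange(0, 1001),                # inventory (assumed range)
    ]


def write_sample(path, n_rows, seed=0):
    """Write a header line plus n_rows tab-separated rows to path."""
    rng = random.Random(seed)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(FIELDS)
        for row_id in range(1, n_rows + 1):
            writer.writerow(make_row(row_id, rng))


if __name__ == "__main__":
    write_sample("test_sample.txt", 1000)
```

A file produced this way is far smaller than the 50-million-row file used in the test, so it is suitable only for checking that the import and query steps run, not for reproducing the timings.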
Next, we dump the mobile product data from this txt file into the aggregator group table file test.ctx with an SPL script. The SPL script is as follows:
A1: =file("D:\\test.ctx")
A2: =A1.create(ID, product number, color, memory, expanded capacity, number of cores, home screen size, battery capacity, rear camera, front camera, body weight, warranty time, dual card dual standby, time to market, unit price, inventory)
A3: =file("D:\\test.txt").cursor@t()
A4: =A2.append(A3)
test.ctx is a group table file. It is stored by column by default and supports parallel computation over arbitrary segments, which can effectively improve query speed. Note that when generating a group table, the data needs to be sorted in advance and the dimension fields defined sensibly.
The group table file is about 3312 MB, and dumping the data from txt into the group table takes 98s.
For comparison, we use SQL*Loader to import the data from the txt file into Oracle, which takes about 18 hours; the Oracle table with the same fields and the same amount of data occupies 6683 MB.
The import times differ greatly. As for space, the data in Oracle occupies about as much as the txt file, while the group table file occupies only about half of that, thanks to the compression applied to group table files. We discuss the features behind this high performance further below.
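The size and load-time comparison can be made concrete with a little arithmetic. The Python sketch below uses only the figures reported above; the constant names are illustrative, and "about 18 hours" is treated as exactly 18 hours, which is an assumption.

```python
# Figures as reported in the text (sizes in MB, times in seconds).
TXT_MB = 6284
CTX_MB = 3312
ORACLE_MB = 6683
CTX_LOAD_S = 98
ORACLE_LOAD_S = 18 * 3600  # assumption: "about 18 hours" taken as exactly 18 h

ctx_vs_txt = CTX_MB / TXT_MB               # group table size relative to txt
oracle_vs_txt = ORACLE_MB / TXT_MB         # Oracle table size relative to txt
load_speedup = ORACLE_LOAD_S / CTX_LOAD_S  # how many times faster the ctx load was

print(f"group table / txt size: {ctx_vs_txt:.2f}")
print(f"Oracle / txt size:      {oracle_vs_txt:.2f}")
print(f"load time ratio:        {load_speedup:.0f}x")
```

With these inputs the group table comes out at roughly 0.53 of the txt size, the Oracle table at roughly 1.06 of it, and the load at several hundred times faster.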
Query data
Let's compare the time it takes to query the Oracle table and the group table file with SQL, and let the results speak for themselves:
Query the specified field:
Query group table: select ID, product number, inventory from test.ctx limit 5000
Query Oracle: select ID, product number, inventory from myTestTable where rownum 2000
Query rows where the color is black, the inventory is greater than 500, and the unit price is greater than 2000: the group table query takes 92s; the Oracle query takes 2353s
At this point, we can also try parallel queries:
select /*+ parallel(4) */ ID, product number, unit price, inventory from myTestTable/test.ctx where color = 'black' and inventory > 500 and unit price > 2000
It takes 84s to query the group table and 2105s to query the Oracle table.
After executing the queries above, we find that querying the group table is much faster than querying the Oracle table directly. Of course, detail queries are rarely used in multidimensional analysis, and their performance requirements are not high. We care more about the performance of summary statistics, that is, queries with GROUP BY. Let's try some:
Query group table / Oracle: select color, max(inventory) from test.ctx/myTestTable group by color
Query the maximum inventory for each product color: 12s for the group table and 297s for the Oracle table
Query group table / Oracle: select color, memory, max(inventory) from test.ctx/myTestTable group by color, memory
Query the maximum inventory for each memory size within each color: 17s for the group table and 334s for the Oracle table
Query group table / Oracle: select /*+ parallel(4) */ color, memory, max(inventory) from test.ctx/myTestTable group by color, memory
Run the same query for the maximum inventory per color and memory size in parallel: 13s for the group table and 308s for the Oracle table
Query group table: select memory, sum(inventory) from test.ctx where time to market between date('2018-01-01') and date('2018-12-31') group by memory
Query Oracle: select memory, sum(inventory) from myTestTable where time to market between to_date('2018-01-01', 'yyyy-mm-dd') and to_date('2018-12-31', 'yyyy-mm-dd') group by memory
Query the total inventory per memory size for products brought to market in 2018: 5s for the group table and 10s for the Oracle table
Query group table: select memory, min(body weight), avg(unit price) from test.ctx where time to market between date('2018-01-01') and date('2018-12-31') group by memory
Query Oracle: select memory, min(body weight), avg(unit price) from myTestTable where time to market between to_date('2018-01-01', 'yyyy-mm-dd') and to_date('2018-12-31', 'yyyy-mm-dd') group by memory
Query the minimum body weight and average unit price per memory size for products brought to market in 2018: 6s for the group table and 11s for the Oracle table
Query group table: select memory, min(body weight), avg(unit price) from test.ctx where time to market between date('2018-01-01') and date('2018-12-31') group by memory having avg(unit price) >= 1500
Query Oracle: select memory, min(body weight), avg(unit price) from myTestTable where time to market between to_date('2018-01-01', 'yyyy-mm-dd') and to_date('2018-12-31', 'yyyy-mm-dd') group by memory having avg(unit price) >= 1500
Query the minimum body weight and average unit price per memory size for products brought to market in 2018, keeping only groups whose average unit price is at least 1500: 14s for the group table and 38s for the Oracle table
Clearly, for summary statistics as well, querying the group table is significantly faster than querying Oracle.
Comparing the query speed of the group table and the database on this large data set, we find that the fewer the query conditions, the larger the gap in efficiency between the two, and that in every test querying the group table was faster than querying the database. This clearly reflects the high-performance character of group tables. From this we infer that, on the multidimensional analysis page too, querying the group table is more efficient than querying the database.
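To see the size of the gap in each case, the timings reported above can be turned into ratios. The Python sketch below simply divides the Oracle time by the group table time for each query; the labels and the names `TIMINGS` and `speedups` are illustrative, and the numbers are copied from the text without any new measurement.

```python
# (label, group table seconds, Oracle seconds), as reported in the text.
TIMINGS = [
    ("detail query with filter", 92, 2353),
    ("detail query with filter, parallel(4)", 84, 2105),
    ("max(inventory) by color", 12, 297),
    ("max(inventory) by color, memory", 17, 334),
    ("max(inventory) by color, memory, parallel(4)", 13, 308),
    ("sum(inventory) by memory, 2018 only", 5, 10),
    ("min/avg by memory, 2018 only", 6, 11),
    ("min/avg by memory, 2018 only, having", 14, 38),
]


def speedups(timings):
    """Return (label, Oracle time / group table time) for each query."""
    return [(label, oracle_s / ctx_s) for label, ctx_s, oracle_s in timings]


for label, ratio in speedups(TIMINGS):
    print(f"{label:<46} {ratio:5.1f}x")
```

On these figures the ratio ranges from a little under 2x for the date-filtered aggregations to more than 25x for the detail query.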
Next, we will combine the group table with the analysis interface for query.
Combine with the analysis interface
(1) add the aggregator JDBC to the report and connect to the data source:
(2) copy the aggregator raqsoftConfig.xml to the classpath of the report WEB-INF:
Copy raqsoftConfig.xml from [aggregator directory]\esProc\config to [report directory]\report\web\webapps\demo\WEB-INF\classes
(3) put the group table file under the aggregator addressing path:
In this example, we will use the group table file test.ctx.
(4) modify the multi-dimensional analysis page
Open [report directory]\report\web\webapps\demo\raqsoft\guide\jsp\olap.jsp and change the DataSource in the jsp to "esproc", the data source name set in (1). Continuing the example above, here we use "select memory, avg(unit price) as average unit price from test.ctx where time to market between date('2018-01-01') and date('2018-12-31') group by memory"
(5) visit the page
We start the server by double-clicking startdemo.bat under [report directory]\report\bin, or by clicking the Tomcat server button in the report IDE.
Enter "http://localhost:6868/demo/raqsoft/guide/jsp/olap.jsp?sqlId=sqlId1" in the browser address bar to access the page. The page displays the data returned through the aggregator JDBC, and you can drag and drop fields and perform other operations on the page.
Of course, we still need to compare the time of querying the group table file with that of querying Oracle: it takes 7s to query the group table file and 22s to query Oracle, so the group table is clearly faster.
By combining the group table with the multidimensional analysis interface and fetching data from the group table rather than from the database, users can build reports over large amounts of data more conveniently, and the wait for data to display is greatly shortened. Compared with expensive professional databases and relatively closed BI data sources, the aggregator offers a more economical and simpler solution, and it can collect data from a variety of heterogeneous data sources to generate group table files for use. The whole configuration process is also very simple, which reflects the lightweight and high-performance characteristics of this suite.
Compared with an ordinary database scheme, the binary files stored by the aggregator, that is, group table files, improve performance directly. The dimension fields are specified when the group table is generated, and the data is stored in order of those fields, so common conditional filtering achieves high performance without relying on an index. In addition, group table files are stored compressed, which significantly reduces the disk space they occupy and makes them faster to read; this is another respect in which the suite is lightweight.
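Why ordered storage makes range filters cheap without an index can be illustrated in a few lines. The Python sketch below is a generic illustration of the idea using binary search on a sorted column; it is not the aggregator's actual implementation, and the function name `range_bounds` and the sample data are invented for the example.

```python
from bisect import bisect_left, bisect_right


def range_bounds(sorted_keys, low, high):
    """Return (start, end) so that sorted_keys[start:end] lies in [low, high].

    sorted_keys must be in ascending order; two binary searches replace a full scan.
    """
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return start, end


# Hypothetical column data, already sorted by the dimension field "time to market".
time_to_market = ["2017-11-30", "2018-01-01", "2018-06-15", "2018-12-31", "2019-02-01"]
inventory = [10, 20, 30, 40, 50]

start, end = range_bounds(time_to_market, "2018-01-01", "2018-12-31")
total_2018 = sum(inventory[start:end])  # only the matching slice is read
print(start, end, total_2018)
```

Because ISO-formatted date strings sort in chronological order, the two searches locate the 2018 slice directly, and only that slice of the other columns needs to be read.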