2025-03-28 Update From: SLTechnology News & Howtos > Internet Technology
Shulou (Shulou.com) 06/03 Report
[Report Query Performance] 1. Query performance is low because of large data volumes or high concurrency, and BI drag-and-drop interfaces respond slowly.
The aggregator lets you write simpler, more efficient algorithms that speed up the calculation process and improve query performance.
Its controllable storage and indexing mechanism provides high-speed data storage (a CUBE) for BI.
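As a rough illustration of the pre-aggregation idea behind a BI CUBE (a general technique, not the aggregator's actual API; the field names are hypothetical), the sketch below folds detail rows into a small summary keyed by dimension values, so repeated BI queries read the summary instead of rescanning the detail table:

```python
from collections import defaultdict

def build_cube(rows, dims, measure):
    """Pre-aggregate detail rows into a summary keyed by dimension values."""
    cube = defaultdict(float)
    for r in rows:
        key = tuple(r[d] for d in dims)
        cube[key] += r[measure]
    return dict(cube)

# Hypothetical sales detail rows.
rows = [
    {"region": "east", "year": 2023, "amount": 100.0},
    {"region": "east", "year": 2023, "amount": 50.0},
    {"region": "west", "year": 2024, "amount": 70.0},
]
cube = build_cube(rows, ["region", "year"], "amount")
# Fast BI lookups now hit the small cube, not the detail table.
print(cube[("east", 2023)])  # 150.0
```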
2. A T+0 real-time full-volume report queries a large amount of data, which affects the production system; after the database is split, cross-database mixed computation is difficult to implement.
Separate hot and cold data: keep only current hot data in the production database and move cold data to the file system or another database. The aggregator performs the cross-source (cross-database) computation, summarizing and combining data from multiple sources to realize T+0 real-time queries over the full data set.
The aggregator also provides basic SQL translation for different databases, so after the data is split (homogeneous or heterogeneous databases both work), cross-database queries can still be written in generic SQL.
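The hot/cold split can be sketched in plain Python (a minimal stand-in for the technique, not the product's API): hot rows live in a database, cold rows in a file, and one calculation merges both for a full-volume result. Here sqlite3 stands in for the production database and an in-memory CSV for file storage:

```python
import csv
import io
import sqlite3

# Hot data: current rows kept in the database (sqlite stands in for production).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(3, 30.0), (4, 40.0)])

# Cold data: historical rows moved out to a file (CSV stands in for file storage).
cold_csv = io.StringIO("id,amount\n1,10.0\n2,20.0\n")
cold = [(int(r["id"]), float(r["amount"])) for r in csv.DictReader(cold_csv)]

# Cross-source merge: combine both parts for a T+0 full-volume total.
hot = db.execute("SELECT id, amount FROM orders").fetchall()
total = sum(a for _, a in hot) + sum(a for _, a in cold)
print(total)  # 100.0
```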
3. Too many association operations, with a dozen or even dozens of tables in a JOIN, give poor performance.
The aggregator redefines the join operation and can choose different, efficient join algorithms according to the characteristics of the calculation, improving multi-table join performance.
One-to-many primary/foreign key tables can be joined by pointer to improve performance.
One-to-one same-dimension tables and many-to-one sub-tables can be joined by ordered merge to improve performance.
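The "pointer join" idea above can be sketched generically (hypothetical table and field names, not the aggregator's syntax): resolve each foreign key to a direct object reference once, so later traversals are plain field access rather than a hash probe per query:

```python
# Dimension table keyed by primary key.
customers = {1: {"id": 1, "name": "Ann"}, 2: {"id": 2, "name": "Bob"}}
orders = [
    {"cust_id": 1, "amount": 50.0},
    {"cust_id": 2, "amount": 25.0},
    {"cust_id": 1, "amount": 10.0},
]

# "Pointerize" once: replace the foreign key with a direct reference.
for o in orders:
    o["cust"] = customers[o["cust_id"]]

# Afterwards the join is just attribute access, repeated at no lookup cost.
ann_total = sum(o["amount"] for o in orders if o["cust"]["name"] == "Ann")
print(ann_total)  # 60.0
```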
4. Data-source SQL is complex, with many levels of nesting; the database's optimization path is uncontrollable and computing performance is low.
The aggregator uses procedural computation: the logic is implemented step by step, without nesting, which simplifies the code.
Intermediate results can be reused across steps, yielding higher performance.
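The step-by-step style can be illustrated in ordinary Python (a sketch of the general technique under hypothetical data): instead of one deeply nested query, each stage gets a name, and the intermediate result is reused twice rather than recomputed:

```python
orders = [("east", 120.0), ("east", 80.0), ("west", 60.0), ("west", 140.0)]

# Step 1: group totals, a named intermediate result.
totals = {}
for region, amount in orders:
    totals[region] = totals.get(region, 0.0) + amount

# Step 2: reuse the intermediate result twice instead of re-nesting the query.
grand = sum(totals.values())
share = {region: t / grand for region, t in totals.items()}
print(share["east"])  # 0.5
```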
5. Reports fetch large amounts of data from the database, and JDBC transfer performance is low.
The aggregator improves fetch performance by opening multiple database connections and retrieving data in (multithreaded) parallel.
Large volumes of cold data can be moved to the file system outside the database in advance; the aggregator then queries and computes directly on the files, avoiding retrieval through JDBC.
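Parallel fetch over multiple connections can be sketched with Python's standard library (sqlite stands in for the source database; the partition bounds are arbitrary). Each worker opens its own connection and pulls one key-range partition, mimicking multi-connection JDBC fetch:

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# A file-backed table stands in for the source database.
path = os.path.join(tempfile.mkdtemp(), "src.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER, v REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, float(i)) for i in range(1000)])
con.commit()
con.close()

def fetch_partition(bounds):
    lo, hi = bounds
    # Each worker uses its own connection, like one JDBC connection per thread.
    c = sqlite3.connect(path)
    rows = c.execute("SELECT id, v FROM t WHERE id >= ? AND id < ?", (lo, hi)).fetchall()
    c.close()
    return rows

parts = [(0, 250), (250, 500), (500, 750), (750, 1000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(fetch_partition, parts))
rows = [r for chunk in chunks for r in chunk]
print(len(rows))  # 1000
```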
6. Large detail-list reports are hard to present promptly, and page turning is very inefficient when database-side paging is used.
The aggregator splits calculation and presentation into two asynchronous threads. The fetch thread issues the SQL once and caches the data locally, while the presentation thread renders the report quickly from that cache. Because the fetch thread involves only a single transaction, there is no data inconsistency between pages, ensuring accuracy.
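The two-thread arrangement is a classic producer/consumer pattern; the sketch below shows the shape of it in Python (simulated rows, not the product's implementation). The fetch thread streams rows into a bounded local buffer while the presentation thread consumes them as soon as they arrive:

```python
import queue
import threading

buf = queue.Queue(maxsize=100)  # local row cache between the two threads
SENTINEL = object()

def fetch():
    # Fetch thread: pulls rows from the source and caches them locally.
    for i in range(10):
        buf.put(("row", i))
    buf.put(SENTINEL)  # signal end of the result set

presented = []

def present():
    # Presentation thread: renders pages as soon as rows arrive.
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        presented.append(item)

t1 = threading.Thread(target=fetch)
t2 = threading.Thread(target=present)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(presented))  # 10
```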
[Report Query Development] 7. Report development is endless, consumes too much programmer time, and no cost-effective solution can be found.
The aggregator makes report development fully tool-based: not only the presentation layer but also the data-calculation layer is handled by tools, which lowers the difficulty of report development and makes implementation faster and simpler.
Staffing requirements are lower; professional programmers are not needed.
Unstable report requirements lead to an endless stream of reports that can never be eliminated; the aggregator provides the lowest-cost way to respond.
8. Business users need a lot of data, agile BI falls short, and the technology department spends too much time and effort handling these requests.
As a complete computing engine, the aggregator supports procedural calculation and rapid development.
Its algorithms are simple enough for ordinary technical staff.
It provides a visual programming environment that is ready to use out of the box.
With multi-source support, it computes directly on Excel/TXT/database data without loading anything into a warehouse.
9. Data-source SQL or stored procedures are too complex, with heavy nesting or many steps, making development and debugging very difficult.
The aggregator reduces the difficulty of algorithm development through procedural, step-by-step programming. Algorithms stay short and run step by step, which eases maintenance and greatly reduces the difficulty of writing, debugging, and maintaining what would otherwise be thousands of lines of SQL.
10. SQL (stored procedure) syntax involves database dialects and is difficult to port.
As a general computing engine outside the database, the aggregator can express algorithms that do not depend on any particular database. When the database changes, the core algorithm does not need to change, making migration easy.
11. Complex procedural operations are hard to write in SQL and require a lot of external Java computation, so development efficiency is low.
The aggregator provides complete structured-data computing capabilities, solving the difficulty of set-oriented operations in Java, so Java programming is no longer needed for them.
The aggregator is also easy to integrate and fits seamlessly into existing applications.
12. Java and SQL data-source code is stored separately from report templates, coupling the program too tightly to the reports and making hot switching hard to achieve.
Aggregator scripts can be stored and managed together with report templates as a separate calculation layer for reports, and can be deployed independently of the application, reducing coupling.
Aggregator scripts are interpreted at execution time, which enables hot switching.
13. Open-source databases such as MySQL lack window functions and much other advanced syntax, making development difficult.
As a complete computing engine, the aggregator provides rich structured-data operations, easing the coding difficulties caused by MySQL's lack of window functions.
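As a hedged illustration of computing a window-style result outside the database (the general technique, not the aggregator's syntax; the result set is hypothetical), the sketch below emulates `SUM(amount) OVER (ORDER BY month)` on rows fetched from a database that lacks window functions:

```python
# Rows fetched from a database without window-function support (hypothetical).
rows = [("2024-01", 10.0), ("2024-02", 15.0), ("2024-03", 5.0)]

# Emulate SUM(amount) OVER (ORDER BY month) outside the database.
running = 0.0
out = []
for month, amount in sorted(rows):
    running += amount
    out.append((month, amount, running))
print(out[-1])  # ('2024-03', 5.0, 30.0)
```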
14. Some databases, such as Vertica, support stored procedures poorly, making complex procedures hard to implement.
As a complete structured-data computing engine, the aggregator can act as a general out-of-database stored procedure, providing strong computing power that is independent of the database and easy to port.
15. NoSQL data, text, Excel, and similar sources are involved, and SQL cannot be used on them.
The aggregator can run SQL queries directly against files.
You can also write scripts that read NoSQL, text, and Excel data and perform calculations with complexity comparable to, or lower than, SQL.
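The "SQL over files" idea can be approximated with the standard library (a sketch of the technique, not the aggregator's feature): load the file's rows into an in-memory sqlite table once, then query that file data with ordinary SQL. The CSV content and column names are made up for the example:

```python
import csv
import io
import sqlite3

# A CSV file stands in for text/Excel data that plain SQL cannot reach.
text = io.StringIO("name,dept,salary\nann,dev,100\nbob,dev,90\ncat,ops,80\n")
rows = [(r["name"], r["dept"], int(r["salary"])) for r in csv.DictReader(text)]

# Load into an in-memory table, then query the file's data with ordinary SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
db.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
avg = db.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
print(avg)  # [('dev', 95.0), ('ops', 80.0)]
```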
16. Real-time data from the Web or IoT arrives in JSON/XML format; loading it into a database first is not only inefficient but also hurts real-time performance.
The aggregator supports hierarchical data such as JSON/XML directly. Computing on this kind of data is simple to code, performs well, and preserves real-time behavior.
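Computing directly on hierarchical data, with no database load step, can look like this in Python (the IoT payload and its field names are hypothetical; in practice it would come from an HTTP response):

```python
import json

# A hypothetical IoT payload; in practice this would come from an HTTP response.
payload = '''{"device": "sensor-1", "readings": [
  {"ts": 1, "temp": 20.5}, {"ts": 2, "temp": 21.0}, {"ts": 3, "temp": 19.5}]}'''

doc = json.loads(payload)
# Aggregate directly over the nested structure, no database load step.
temps = [r["temp"] for r in doc["readings"]]
avg = sum(temps) / len(temps)
print(round(avg, 2))  # 20.33
```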
[ETL Development and Performance] 17. ETL tools cannot directly express complex business logic, so many scripts must still be written, and ETL tool scripting is often weaker than SQL, making development difficult.
The aggregator has strong computing power and is very good at complex calculations. It can assist or replace existing ETL tools in implementing complex business logic, with far lower implementation complexity than hand-coded ETL scripts.
18. SQL (stored procedures) lacks a debugging mechanism, so development efficiency is low.
The aggregator supports procedural computation in a visual programming environment where the result of each step is immediately visible, and provides editing and debugging features such as breakpoints, single-step execution, and run-to-cursor, giving high development efficiency.
19. Stored procedures have many steps and long code, hundreds or even thousands of lines, with heavy use of temporary tables; performance is low and maintenance is difficult.
Compared with stored procedures, which repeatedly read and write the disk, the aggregator provides rich calculations that greatly reduce the landing of intermediate results, giving higher performance.
Procedural computation with a rich function library keeps algorithms short and easy to maintain.
Aggregator scripts can be written and run outside the database, reducing database security risks.
20. Data outside the database, such as NoSQL, text, and Excel, cannot be queried with SQL and can only be hard-coded; development efficiency is too low and maintenance is difficult.
The aggregator can query files through SQL.
It can also perform multi-source mixed calculations directly over NoSQL, text, and Excel, with coding efficiency far higher than hard coding.
21. Integration involves multiple databases and non-database sources. SQL cannot compute across data sources, so everything must first be gathered into a single database, turning ETL into ELT or even LET; the database becomes bloated and performance suffers.
As a complete computing engine, the aggregator can implement real ETL: with its multi-source mixed computing capability, it first extracts and transforms (E, T) the data from the various sources, then loads (L) the cleaned result into the target database, avoiding the excessive time, space, and management overhead of consolidating everything into a single database.
22. Complex operations can only be completed with Java development or UDFs written outside the database, and the labor cost is high.
The aggregator's procedural, step-by-step coding, together with its rich libraries and methods, makes development simple and maintenance easy, greatly reducing coding difficulty and improving delivery efficiency.
23. The production database and the analysis database live together, so heavy analytical operations can affect the production system; after splitting them, real-time full-volume calculation is hard to achieve.
The aggregator can compute over the production and analysis databases together: small amounts of real-time hot data are read from the production database, minimizing the impact on production, while large volumes of historical cold data come from the analysis database. Mixing the two parts realizes real-time calculation over the full data set.
24. As data volumes grow, data warehouse performance drops and expansion is constantly needed, at high cost.
The aggregator allows large amounts of historical data that no longer change to be exported from the database to the file system. With its complete computing capability, it calculates directly on the files and supports mixed calculation with the database, relieving the pressure to expand the warehouse at low implementation cost.
25. The central data warehouse supports too many applications; excessive concurrency makes performance uncontrollable and the front-end user experience poor.
The aggregator is easy to integrate, so part of the warehouse's data and calculations can be migrated to the application layer. Storage and computation there are handled by the aggregator, sharing the load on the data warehouse.
26. The data warehouse accumulates a large number of intermediate tables of non-original data, which are redundant and hard to manage over time.
The aggregator supports migrating these intermediate tables to a file system with higher I/O performance, reducing redundancy in the database. Computing directly on files is faster, and parallel calculation is easy to implement, further improving efficiency.
Outside the database, intermediate tables can be organized in the file system's tree structure, which is easier to manage than the database's flat, linear structure.
27. Many business applications require separate data marts or front-end databases to be deployed, which is costly.
The aggregator's strong computing power, combined with data caching, a data gateway, and multi-source mixed calculation, can replace a dedicated data mart or front-end database at low cost.
[Hadoop Big Data Platform] 28. The Hadoop cluster is small, with only a few or a dozen nodes, and does not manage much data, so Hadoop's advantages are hard to realize while maintenance remains very complicated.
As a lightweight big-data solution, the aggregator is well suited to clusters of several to dozens of nodes. Compared with Hadoop it uses resources more efficiently: the same computing target needs less hardware, and the same hardware computes faster.
29. Hadoop struggles to perform the required computations, so a traditional database is deployed alongside it to do them, which is cumbersome and inefficient.
The aggregator can use Hadoop as a data source and perform the calculations Hadoop finds difficult.
It also supports real-time queries, avoiding the high ETL time cost, poor data freshness, and high license cost that come with deploying a commercial RDB.
30. The computing interfaces provided by Hadoop/Spark are insufficient, and UDFs must be written for complex operations, so development efficiency is low.
The aggregator implements complex calculations simply and efficiently, making it well suited to the scenarios where Hadoop or Spark would otherwise require UDFs, greatly improving development efficiency.
31. Hadoop/Spark storage and scheduling are too automated to let you control data distribution and task scheduling for optimal performance.
The aggregator offers flexible control over data distribution and computation distribution, so big-data calculations can be tailored to the characteristics of the data, the computation, and the hardware to obtain the highest performance. This solves the problem that Hadoop/Spark is too much of a black box for hand-tuned high-performance computing.
32. Spark consumes too much memory, hardware costs are too high, and many calculations exceed available memory and cannot be carried out.
The aggregator provides both in-memory and external-memory computing. Thanks to its efficient computing model, in-memory calculation is faster and uses less memory, reducing cost.
When memory is insufficient, or full in-memory computing is unnecessary, the aggregator computes on external storage, reducing dependence on memory capacity and lowering hardware costs.
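External-memory computing in its simplest form means streaming data from disk and keeping only constant-size state in memory. The sketch below (a generic illustration with a made-up file, not the aggregator's mechanism) folds a file's values into a total without ever holding the whole data set in memory:

```python
import os
import tempfile

# Write a file that, in principle, would be too large to hold in memory.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    for i in range(100_000):
        f.write(f"{i}\n")

# External-memory style: stream and fold, keeping only constant-size state.
total = 0
with open(path) as f:
    for line in f:
        total += int(line)
print(total)  # 4999950000
```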
33. HBase is tried as the solution to big-data query problems, but the results are poor.
With its high-performance computing and high-performance data storage, the aggregator can optimize such systems and solve the low efficiency of batch queries against KV databases like HBase.
34. Python is not specifically designed for structured-data computing; its open-source packages come from different contributors with inconsistent styles, and complex procedures are not simple to write.
The aggregator is designed for structured-data calculation: it supports procedural computation, provides rich structured data-set functions, and offers a ready-to-use visual editing and debugging environment, making it well suited to desktop data analysis.
35. For non-database data such as Excel/JSON, Python and other open-source technologies offer rich interfaces, but their versions are chaotic and hard to control.
The aggregator has complete data-computing capabilities. As commercial software it provides rich interfaces for processing Excel/JSON and other non-database data, ready to use out of the box, avoiding the version confusion and difficulty of open-source stacks.
36. Python lacks big-data capabilities of its own, parallel programs are hard to write, and multi-CPU power cannot be fully used.
The aggregator provides multithreaded parallel computing and distributed computing. Parallelism can be achieved with simple scripts, fully exploiting multiple CPU cores for high-performance computing.
37. Python code is difficult to integrate with Java, so algorithms often have to be rewritten when they need to be embedded in production systems.
The aggregator can be embedded seamlessly into applications as computing middleware; scripts written for desktop data analysis can be moved directly into production systems without rewriting.
© 2024 shulou.com SLNews company. All rights reserved.