Big data era: how far can traditional BI go?

2025-02-25 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

I have worked in BI for many years. I lived through the large-scale construction and rapid growth of business analysis systems, and I have also had the good fortune to witness the handover from traditional BI to big data. So I want to ask: how far can traditional BI go?

Technology serves the business, so rather than discuss technology for its own sake, I explain the reasons mainly from the user's point of view, across eight aspects, each of which I have experienced first-hand. Of course, no list of examples can prove a conclusion absolutely correct, but I hope it provokes some thought.

1. Resource application-from months to days, the two cannot be compared

Since the company built three resource pools (MPP, HADOOP and stream processing), a tenant's resources take effect almost as soon as they are requested. To make applications easier, the company even introduced resource packages, and we applied for one ourselves. This model gives a basic guarantee for opening data flexibly to the outside. In half a year, more than 100 internal and external tenants (what used to be called data marts) have been opened. In retrospect, without this capability, monetizing the company's data externally would have been basically impossible.

Aliyun and AWS work the same way, so why do enterprises build this themselves? Because a large enterprise is itself a huge market with all kinds of application requirements, and in terms of data, security, interfaces and technology, external platforms are not a good fit.

In the minicomputer phase of traditional BI there was no concept of a resource pool. Resources were requested per piece of hardware, and the budget had to be applied for in advance. Even once the hardware was in place, integration took too long. I remember that we once planned 11 data marts for 11 prefectures and cities, and dividing four 570s into 12 partitions took more than a month.

In resource granularity, application speed and dynamic expansion of resources, the new platform has advantages that traditional BI cannot match when the business needs rapid deployment, and it lays a good foundation for business innovation. If you have done DB2 project integration, where every round involves planning, sizing, partitioning and installation, you know what waiting means.

2. Data collection-diversity can create more application scenarios.

The basic routine of traditional ETL is to export text from the source database and load it into the destination database with client tools: export with EXPORT, transfer over FTP, import with IMPORT. Databases of the same type may use shortcuts such as DBLINK, and programs may connect to the database through ODBC. Many companies have developed dedicated tools to move data between multiple databases, but these are generally not used on enterprise-level platforms because their scalability and flexibility are too poor. Traditional ETL is well suited to static applications whose analysis cycle is days or months.
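
The export, transfer and import routine described above can be sketched in a few lines. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the source and destination systems, a temporary CSV file stands in for the text file sent over FTP, and the `orders` table and the helper names `export_to_text` and `import_from_text` are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_to_text(conn, table, path):
    """EXPORT step: dump every row of `table` to a delimited text file."""
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)


def import_from_text(conn, table, path):
    """IMPORT step: bulk-load the text file into the destination table."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # skip the header, use it to size the INSERT
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()


# Hypothetical source system with a small table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.5)],
)

# Hypothetical destination warehouse with the same table layout.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")

with tempfile.TemporaryDirectory() as workdir:
    dump = Path(workdir) / "orders.csv"  # stands in for the file sent over FTP
    export_to_text(source, "orders", dump)
    import_from_text(target, "orders", dump)

loaded_rows = target.execute(
    "SELECT id, region, amount FROM orders ORDER BY id"
).fetchall()
```

Every step is a full-table batch through an intermediate file, which is why this pattern fits daily or monthly cycles and struggles with anything close to real time.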

I think that in most enterprises the data analysis cycle is still basically daily. I have been doing BI for 10 years, and I remember that for a long time enterprises ran ETL on a monthly basis. From a business point of view that may be enough. Some will ask how much practical value there is in shortening the data cycle to hours, minutes, seconds or even real time. But is it true that the business has no need for shorter-cycle analysis? Is the obstacle the routine habits of BI analysis, or a lack of capability?

From the point of view of fetching data, business staff always want the numbers as soon as possible. We originally published only monthly reports; later, as performance improved, complex daily reports became feasible, and daily reports became standard. After daily reports, will real time become standard next?

From the application point of view, beyond the pile of operational indicator reports, enterprises generally have practical data needs in marketing and risk control. Real-time marketing clearly works better than static marketing; BAT (Baidu, Alibaba, Tencent) cannot live without it. Real-time risk control is clearly better than offline risk control. Take an anti-fraud system: without real-time monitoring, how can it intervene in a fraud in progress?

From a trend point of view, if you agree that the future is a personalized world, then only real-time data can carry enough information to give you more personalized services, and you will think of many scenarios that require real-time collection.

Even if you have none of the requirements above, technology and business always interact. If you can deliver data by the hour, people will create hourly business scenarios; if you can deliver it in real time, they will create real-time scenarios. It is hard to say which is the chicken and which is the egg, but if you want to serve the business better, you should be more forward-looking at the technical level.

But can traditional BI support it? When the BI of a traditional enterprise is not real-time, it is not necessarily for lack of demand; it may be for lack of capability. I remember how hard it was just to put real-time indicator monitoring for CRM into production. In the past there were only monthly reports, but today, could you live without daily reports? I remember that many years ago the first daily report was built by the IT staff on their own initiative, because the capability had arrived. What about the next 10 years?

ETL is a concept from the traditional data warehouse, and I think it is time to upgrade it. Diversified collection is the trend of the times. Three things matter most. First, collection modes should flourish: messages, data streams, crawlers, files and incremental logs should all be supported. Second, the flow of data is not one-way: it is not only E (extract) but also X (exchange), which greatly extends the meaning of ETL. Third, data acquisition is distributed and can be scaled out dynamically in parallel, so read and write bottlenecks are handled well. These are exactly what traditional BI cannot do.

3. Computing performance-cost performance is the king, and the speed of change is faster than expected.

DB2 and Teradata have long held a huge share of the data warehouse market. It took us half a year to replace two P780s with GBASE+HADOOP. Overall performance is about 1.5 times the original, at only a fraction of the investment. Some tuning was needed early on and the demands on the code are higher, but the price-performance ratio is very high. The key points are that it can be expanded dynamically, supports multiple tenants, and has disaster recovery that also surpasses DB2. I remember that when a DB2 node failed, it could fail over, but performance often dropped by half, which hurt the business badly.

Different kinds of data processing are often treated alike, but the requirements differ structurally from stage to stage. For simple transformation and summarization, processing outside the database is more cost-effective than processing inside it. Traditional BI, however, is used to loading all data into the data warehouse, which wastes precious minicomputer resources and gives a poor price-performance ratio. That is why the hybrid MPP+HADOOP data warehouse is becoming a trend: HADOOP is good at massive, simple batch processing, and MPP is good at join-heavy relational analysis. eBay, China Mobile and others have adopted similar schemes.

Taken as a whole, data warehouses such as DB2 certainly have advantages, such as the stability they are proud of. But these technologies depend too heavily on foreign vendors, and it feels as though their operations and maintenance support keeps getting worse and key problems are resolved less and less adequately, so even the word "stability" deserves a big question mark. I don't know how other enterprises feel. Please believe that I am not advertising for the domestic GBASE: it has many pitfalls, but it is worth having.

4. Report system-aesthetic fatigue is inevitable, personalization is the trend

I have used many commercial reporting systems, such as BRIO, BO and BIEE. They provide a good visual interface and work well for presenting lightweight data, but I do not think they are attractive to large enterprises.

First, they are too easily replaced. There are now many open source components with the same functions, so why use a standardized, bundled product? For a company with some development capability it seems unnecessary.

Second, they are not open enough. Enterprises have many individual requirements, such as security controls, but these products are closed and often cannot meet them.

Third, they are not flexible. However general they are, can they do better than EXCEL? Do not expect to pull a table out of a reporting system and paste it straight into a written report; it always needs secondary processing. That being so, it is simpler to pour the data directly into EXCEL.

Fourth, they are too slow. Today's reports are no longer reports in the traditional BI sense: because dimension and granularity requirements are very fine, result sets of more than 100 million records are common. Our indicator database, for example, grows by 10 billion records a year, which traditional BI reporting simply cannot support. Good looks are temporary; what business staff always care about most is report speed.

Of course, these products may still be attractive to small businesses. But in this open era, with an endless stream of demands and new technologies, can such standardized products keep up with change? What if you want to combine HBASE with BIEE? Do you wait for the vendor to slowly release a version, or do it yourself?

5. Multi-dimensional analysis-poor adaptability, customization is the direction

I have used some commercial multidimensional analysis systems, also known as OLAP, such as ESSBASE, which IBM offered as DB2 OLAP Server. OLAP is a concept proposed decades ago that promises to deliver the desired results quickly through dimensional analysis, but how practical is it really?

OLAP products always try to solve specialized analysis problems by general-purpose means, which has been a hard road since their birth. Analysis is changeable: would you rather use SQL freely in the background, or perform fixed, complex multi-dimensional operations through a rigid interface? As a technical person I do not like using OLAP, and business staff do not like it either, because the operating threshold is rather high.

In terms of openness, the back-end engine of traditional OLAP is still a traditional database, which obviously does not support massive big data systems. Building a CUBE is a design activity and very time-consuming; having to rebuild the CUBE every time the data is updated always drove me crazy, and I do not know whether this has improved. Roughly 10 million records with 10 dimensions is probably its performance limit. Finally, can a CUBE designed earlier really answer the analysis question you have today?
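
To see why rebuilding a CUBE is expensive, consider a naive full cube, which pre-aggregates a measure for every subset of the dimensions. With n dimensions there are 2^n such group-bys, so 10 dimensions already mean 1,024 aggregations over the data. The sketch below is a deliberately simple illustration, not how any particular OLAP product works; real engines usually materialize only some of these aggregations. The `build_cube` function and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions` (a full cube)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:  # one full pass over the data per group-by
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


rows = [
    {"region": "north", "product": "A", "month": "01", "sales": 10.0},
    {"region": "north", "product": "B", "month": "01", "sales": 5.0},
    {"region": "south", "product": "A", "month": "02", "sales": 7.0},
]
cube = build_cube(rows, ["region", "product", "month"], "sales")

cuboids_for_3_dims = len(cube)   # 2 ** 3 = 8 group-bys
cuboids_for_10_dims = 2 ** 10    # 1024 group-bys for 10 dimensions
sales_by_region = cube[("region",)]
```

In this naive form, any change to the rows invalidates every aggregate, so the whole structure has to be recomputed, which matches the experience described above.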

Taobao's Data Rubik's Cube illustrates, to some extent, where OLAP is heading: to provide a specific multi-dimensional data solution for a specific business problem, we need to give users a specialized system that is good enough in experience, performance and speed.

Business-oriented, customized back-end data solutions (built on various big data components) are the future direction of OLAP.

6. Mining platform-from the sample to the full, the equipment needs to be fully upgraded

SAS and SPSS are the classic tools of traditional data mining, but most of the time they can only do sampled analysis on a PC. They clearly cannot carry full-data analysis on big data, such as social network or time series analysis.

The traditional data mining platforms do not seem to have much to offer. IBM DB2 used to have a DATA MINER but later gave it up. Teradata does have its own algorithm library, but its computing power is clearly inadequate for massive data, and it is a level below SPARK. Most of the partners we have dealt with have started to use SPARK as the standard suite for massively parallel algorithms.

Even with traditional algorithms such as logistic regression and decision trees, SPARK can train on far more sample data or even on the full data set, which is much better than SPSS and SAS working only on a PC.

The SAS and SPSS of traditional BI are still useful, but full-data algorithms on big data platforms should also be brought into BI's field of view.

7. Data management-if you don't keep pace with the times, you will die.

A data management system is very hard to build: the production system will not die without it, its value is hard to evaluate, its operation and maintenance cost is high, and it easily falls into the question of who is serving whom.

My first contact with a metadata management system was in 2006-2007. At that time, doing metadata at all was quite forward-looking. After many years of doing it, I learned one truth: if you treat metadata as a bolt-on, the metadata system is unlikely to succeed. Recording metadata after the fact seems feasible, but no matter how polished the system or how powerful its analysis, it eventually leads to the source system and the metadata drifting apart into two separate layers, and the metadata loses its value.

As long as this problem is not solved, I seriously doubt that traditional BI metadata management can truly succeed. In the big data era, as data volumes, data types and technical components keep growing, after-the-fact metadata is even less feasible.

What does a data management system for the new era look like? First, we advocate integrating production and management: the rules of metadata management are built into the production process by the system itself. We advocate data development without separate documents, because the documents are the metadata. All metadata requirements are distilled into rules and become part of the data development environment. For example, when you create a table through the visual development interface, the table definition forces you to enter the necessary descriptions online, and the code you write follows conventions so that metadata can be parsed automatically and become part of data quality control.

Second, it must be able to evaluate the value of data. By some means, data is linked to applications, so that the value of an application can be passed down to the data behind it, giving a standard for managing data value. The most depressing thing about working with data is creating a model without knowing its value: my work becomes dispensable, I do not know how to optimize, and with hundreds of thousands of tables rotting away, I dare not clean any of them up.

Third, cross-platform management. With so many technical components (HADOOP, MPP, stream processing and so on), the management system should connect to all of them seamlessly and transparently. Every new type of component must be brought under the management system promptly; otherwise, each time one is added without it, the data on that component becomes unmanaged data and data management becomes impossible.
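
The first requirement in this section, building metadata rules into the development environment so that a table cannot be created without its descriptions, can be sketched as follows. This is a minimal illustration, not the author's system: the `Column` and `TableSpec` classes, the `missing_metadata` and `create_table_ddl` helpers, and the table and owner names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    sql_type: str
    description: str = ""


@dataclass
class TableSpec:
    name: str
    owner: str
    description: str
    columns: list = field(default_factory=list)


def missing_metadata(spec):
    """Return a list of metadata rules that the table definition violates."""
    problems = []
    if not spec.owner.strip():
        problems.append("table owner is empty")
    if not spec.description.strip():
        problems.append("table description is empty")
    for col in spec.columns:
        if not col.description.strip():
            problems.append(f"column {col.name} has no description")
    return problems


def create_table_ddl(spec):
    """Generate DDL only when every metadata rule is satisfied."""
    problems = missing_metadata(spec)
    if problems:
        raise ValueError("; ".join(problems))
    cols = ",\n  ".join(f"{c.name} {c.sql_type}" for c in spec.columns)
    return f"CREATE TABLE {spec.name} (\n  {cols}\n)"


draft = TableSpec(
    name="dm_user_daily",
    owner="bi_team",
    description="Daily active users per region",
    columns=[
        Column("stat_date", "DATE", "Statistics date"),
        Column("region", "VARCHAR(32)"),  # description forgotten on purpose
    ],
)
problems = missing_metadata(draft)  # the undocumented column is reported
draft.columns[1].description = "Sales region code"
ddl = create_table_ddl(draft)       # succeeds once the metadata is complete
```

Because the check sits in the only path that produces DDL, the metadata is captured at creation time instead of being recorded afterwards.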

Data management fears half-finished projects most. To be systematic it must be done thoroughly; otherwise it is no better than keeping documents.
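
The second requirement in this section, passing the value of applications down to the data they depend on, can be illustrated with a small lineage graph. This is a sketch under loud assumptions: the application scores, the lineage, the table names and the rule of splitting value equally among inputs are all hypothetical choices for illustration, and the lineage is assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical business value assigned to each application.
app_value = {"churn_model": 10.0, "daily_kpi_report": 4.0}

# Hypothetical lineage: which tables each application reads directly.
app_reads = {
    "churn_model": ["dm_user_features"],
    "daily_kpi_report": ["dm_kpi", "dm_user_features"],
}

# Hypothetical lineage: which upstream tables each table is built from.
table_upstream = {
    "dm_user_features": ["dw_user", "dw_usage"],
    "dm_kpi": ["dw_usage"],
    "dw_user": [],
    "dw_usage": [],
    "dw_legacy_tmp": [],
}


def propagate_value(app_value, app_reads, table_upstream):
    """Push application value down the lineage, split equally among inputs."""
    value = defaultdict(float)

    def credit(table, amount):
        value[table] += amount
        parents = table_upstream.get(table, [])
        for parent in parents:
            credit(parent, amount / len(parents))

    for app, tables in app_reads.items():
        for table in tables:
            credit(table, app_value[app] / len(tables))
    return {t: value.get(t, 0.0) for t in table_upstream}


table_value = propagate_value(app_value, app_reads, table_upstream)
cleanup_candidates = [t for t, v in table_value.items() if v == 0.0]
```

Tables that end up with zero value are not referenced by any application, directly or indirectly, so they are the ones that can be reviewed for cleanup with some confidence.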

8. Review and position-BI does BI's work, each performing his own duties

Traditional BI has too many reports and too few research platforms and algorithms, too much repetitive work and too little creative work. As the business develops, BI people gradually grow old, but not much of lasting value remains in the system, which is a great pity.

The big data era has arrived, and this situation needs to change. It is time to re-examine our positioning. Producing reports is indeed the basic work of BI, but BI people should not always play the donkey pulling the mill; they should end up as the ones at the helm. I can pull for a while, but I need to study how to pull faster, and eventually let the machine take my place, or make pulling the mill pleasant enough that those who need it can pull it themselves.

BI people have a great deal to innovate and learn. If there are too many data requests, build a data-fetching robot; if there are too many reports, build an indicator system; if there are too many requirements, create a self-service tool or provide a tenant environment and entice business staff to do it themselves. Demand is endless and desire is never satisfied; filling the hole with manual labor will never fill it. It needs the guidance of BI people: giving a man a fish is not as good as teaching him to fish.
