This article offers a detailed analysis of how to create a data architecture that promotes innovation, in the hope of helping readers facing this problem find a simpler, easier path forward.
The data architectures of the past can no longer meet today's demands for speed, flexibility, and innovation. The key to a successful upgrade (and to the huge potential returns it brings) is agility.
Over the past few years, organizations have had to deploy new data technologies quickly on top of their existing infrastructure to enable market-driven innovations such as customized offers, real-time alerts, and predictive maintenance.
However, the addition of technologies such as data lakes, customer analytics platforms, and stream processing has greatly increased the complexity of data architectures, often seriously hindering organizations' ability to deliver new capabilities, maintain existing infrastructure, and continuously ensure the integrity of their artificial intelligence models.
Current market dynamics do not allow for slowdowns, either. Leading companies such as Amazon and Google have been using artificial intelligence innovations to upend traditional business models, forcing laggards to rethink every aspect of their business to keep pace. Cloud providers have launched cutting-edge offerings, such as serverless data platforms that can be deployed immediately, giving adopters faster time to market and greater agility. Analytics users demand more seamlessly compatible tools, such as automated model-deployment platforms, so they can put new models to work faster. Many organizations have adopted application programming interfaces (APIs) to expose data from disparate systems to their data lakes and to integrate insights directly into front-end applications. Today, as companies navigate the unprecedented humanitarian crisis caused by the COVID-19 pandemic and prepare for the next normal, the need for flexibility and speed will only increase, not decrease.
Companies that want to build a competitive advantage (or even simply maintain parity) must define, implement, and integrate their data stacks in a new way, leveraging the cloud (beyond infrastructure as a service) along with new concepts and components.
Six shifts to create a game-changing data architecture
We found that companies are making six fundamental shifts to their data-architecture blueprints that deliver new capabilities faster and greatly simplify existing architectural approaches. These shifts touch almost all data activities, including ingestion, processing, storage, analysis, and exposure. While organizations can implement some shifts without affecting their core technology stack, many require careful architectural adjustments to existing data platforms and infrastructure, including both legacy technologies and newer technologies adopted previously.
Such work is not trivial. Investment in the capabilities needed for basic use cases, such as automated reporting, can run into the tens of millions of dollars, while the architectural components needed to deploy cutting-edge capabilities, such as real-time services that compete with the most innovative disruptors, can run into the hundreds of millions. Therefore, it is critical for organizations to have a clear strategic plan, and data and technology leaders must make bold choices: prioritizing the shifts that most directly affect business goals and investing in architecture of appropriately modest complexity. As a result, data-architecture blueprints often look very different from one company to the next.
When the investment is made well, the returns can be generous (more than $500 million a year for one US bank, and 12 to 15 percent profit-margin growth for one oil and gas company). We find these benefits come from many sources: IT cost savings, productivity gains, reduced regulatory and operational risk, and the delivery of entirely new capabilities, services, and even businesses.
So what are the key shifts organizations need to consider?
1. From on-premise to cloud-based data platforms
The cloud is probably the most disruptive driver of new data-architecture approaches because it gives companies a way to rapidly scale artificial intelligence tools and capabilities for competitive advantage. Major cloud providers such as Amazon Web Services (AMZN), Google Cloud Platform (GOOG), and Microsoft Azure (MSFT) have revolutionized the way organizations purchase, deploy, and run data infrastructure, platforms, and applications at scale.
For example, one utility company combined a cloud-based data platform with container technology, which modularizes application capabilities into microservices such as searching billing data or adding new properties to an account. This enabled the company to deploy new self-service capabilities to roughly 100,000 business customers in days rather than months, to deliver large amounts of real-time inventory and transaction data to end users for analysis, and to reduce costs by "buffering" transactions in the cloud rather than on more expensive legacy on-premise systems.
Concepts and components that work
Serverless data platforms, such as Amazon S3 and Google BigQuery, allow organizations to build and operate data-centric applications with virtually unlimited scale, without having to install and configure solutions or manage workloads. Such offerings lower the required expertise, cut deployment time from weeks to minutes, and incur almost no operational overhead.
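As a sketch of how little setup such platforms demand, the following Python snippet runs an ad hoc query with the official google-cloud-bigquery client. The project, dataset, and table names are hypothetical placeholders, and credentials are assumed to be already configured.

```python
# A minimal sketch, assuming the google-cloud-bigquery package is installed
# and application-default credentials are configured. The project, dataset,
# and table names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project ID

# BigQuery provisions and scales the compute for this query automatically;
# there is no cluster to install, size, or manage.
query = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `example-project.sales.transactions`
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.customer_id, row.total_spend)
```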
Containerized data solutions using Kubernetes (available through cloud providers and open source, and quick to integrate and deploy) enable companies to decouple and automate the deployment of additional compute power and data-storage systems. This capability is particularly useful in ensuring that data platforms with complicated setups (for example, those that must retain data from one application session to the next, or those with complex backup and recovery requirements) can scale to meet demand.
2. From batch processing to real-time data processing
The cost of real-time data messaging and streaming capabilities has fallen dramatically, paving the way for mainstream use. These technologies enable a host of new business applications: transportation companies, for instance, can give customers second-by-second arrival predictions as their taxi approaches; insurance companies can analyze real-time behavioral data from smart devices to individualize rates; and manufacturers can predict infrastructure problems based on real-time sensor data.
Real-time streaming functions, such as a subscription mechanism, allow data consumers, including data marts and data-driven employees, to subscribe to "topics" covering the transactions they need, so that they receive a constant feed of updates. A common data lake typically serves as the "brain" for such services, retaining all granular transactions.
Concepts and components that work
Messaging platforms such as Apache Kafka provide fully scalable, durable, and fault-tolerant publish/subscribe services that can process and store millions of messages per second for immediate or later consumption. Compared with traditional enterprise messaging queues, this supports real-time use cases, bypasses existing batch-based solutions, and has a much lighter footprint (and cost base).
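To make the publish/subscribe pattern concrete, here is a minimal Python sketch using the kafka-python client. The broker address and the "transactions" topic are assumptions for illustration, not part of any specific deployment.

```python
# A minimal publish/subscribe sketch, assuming a Kafka broker is reachable
# at localhost:9092 and the kafka-python package is installed.
# The topic name "transactions" is a hypothetical example.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish a transaction event to the "transactions" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("transactions", {"account": "A-123", "amount": 42.5})
producer.flush()

# Consumer: any downstream system (a data mart, an alerting service)
# can subscribe to the same topic and receive a continuous feed.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break  # stop after one message in this illustration
```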
Stream processing and analysis solutions, such as Apache Kafka streams, Apache Flume, Apache Storm, and Apache Spark streams, enable direct analysis of messages in real time. The analysis can be rule-based or can include advanced analysis to extract events or signals from the data. Analysis tends to integrate a large number of historical data to compare various patterns, which is particularly important in recommendation and prediction engines.
Alerting platforms such as Graphite or Splunk can trigger business actions for users, for example, notifying sales representatives who are falling short of their daily sales targets, or integrating those alerts into existing processes that may run on ERP or CRM systems.
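One common integration pattern, sketched below, is to post an alertable event to Splunk's HTTP Event Collector (HEC). The host, port, and token are placeholders, and HEC is assumed to be enabled on the Splunk side.

```python
# A minimal sketch of sending an alertable event to Splunk's HTTP Event
# Collector. The URL and token are hypothetical placeholders.
import requests

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
SPLUNK_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

event = {
    "event": {
        "type": "sales_target_alert",
        "rep_id": "rep-42",          # hypothetical sales representative
        "daily_target": 10000,
        "current_sales": 6200,
    },
    "sourcetype": "_json",
}

resp = requests.post(
    SPLUNK_HEC_URL,
    headers={"Authorization": f"Splunk {SPLUNK_TOKEN}"},
    json=event,
    timeout=10,
)
resp.raise_for_status()  # a downstream workflow or CRM could react here
```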
3. From pre-integrated business solutions to modular best-in-class platforms
To scale applications, companies often need to move beyond the limitations of the legacy data ecosystems provided by large solution vendors. Many are now moving toward highly modular data architectures built from best-of-breed, frequently open-source components that can be swapped for new technologies as needed without affecting the rest of the architecture.
The utility company mentioned earlier is transitioning to this approach so it can quickly deliver new, data-heavy digital services to millions of customers and connect cloud-based applications at scale. For example, it gives customers an accurate daily view of their energy consumption, along with real-time analytics insights comparing their consumption with that of their peers. The company established an independent data layer that includes commercial databases and open-source components. Data is synchronized with the back-end systems through a proprietary enterprise service bus, and microservices hosted in containers run business logic on the data.
Concepts and components that work
Data pipelines and API-based interfaces simplify integration between disparate tools and platforms by shielding data teams from the complexity of the underlying layers, shortening time to market, and reducing the chance of introducing new problems into existing applications. These interfaces also make it easier to replace individual components as requirements change.
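A minimal sketch of such an API-based interface, using the FastAPI framework, follows. The endpoint and the in-memory data set are hypothetical stand-ins for a real data platform.

```python
# A minimal sketch of an API-based data interface using FastAPI.
# The dataset and endpoint names are hypothetical; in practice the
# handler would query a data lake or warehouse rather than a dict.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Customer Data API")

# Stand-in for a governed data asset.
CUSTOMERS = {
    "C-001": {"name": "Acme Corp", "segment": "enterprise"},
    "C-002": {"name": "Globex", "segment": "smb"},
}

@app.get("/customers/{customer_id}")
def get_customer(customer_id: str) -> dict:
    """Expose a common data set behind a stable, replaceable interface."""
    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        raise HTTPException(status_code=404, detail="customer not found")
    return customer
```

Served with, for example, `uvicorn module:app`, consumers depend only on the HTTP contract, so the storage behind it can be swapped without touching them.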
Analytics workbenches such as Amazon SageMaker and Kubeflow simplify building end-to-end solutions in a highly modular architecture. Such tools can connect to a wide variety of underlying databases and services, making highly modular designs a reality.
4. From point-to-point to decoupled data access
Exposing data via APIs ensures that direct access to view and modify data is limited and secure, while offering faster access to common data sets. This allows data to be easily reused among teams, accelerating access and enabling seamless collaboration among analytics teams so that artificial intelligence use cases can be developed more efficiently.
For example, one pharmaceutical company is setting up an internal "data marketplace" for all employees via APIs, to simplify and standardize access to core data assets rather than relying on proprietary interfaces. The company is gradually migrating its most valuable existing data feeds to an API-based structure over 18 months and deploying an API management platform to expose the APIs to users.
Concepts and components that work
Enterprises need an API management platform (often called an API gateway) to create and publish data-centric APIs, enforce usage policies, control access, and measure usage and performance. The platform also allows developers and users to search for existing data interfaces and reuse them rather than building new ones. The gateway is usually embedded as a separate zone within the data platform, but it can also be developed as an independent capability outside it.
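The sketch below illustrates two of the gateway concerns named above, access control and usage measurement, in FastAPI terms. A real API management product (for example, Apigee or Kong) handles these through configuration, so this is only a conceptual sketch with hypothetical keys.

```python
# A conceptual sketch of gateway-style access control and usage metering.
# Keys and counters here are illustrative; a production gateway would use
# a key store and a metrics backend rather than in-process state.
from collections import Counter
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI(title="Data API gateway sketch")

API_KEYS = {"team-analytics-key", "team-marketing-key"}  # hypothetical keys
usage = Counter()  # per-key request counts, standing in for real metering

def check_api_key(x_api_key: str = Header(...)) -> str:
    """Enforce an access policy: reject unknown keys, meter known ones."""
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=403, detail="invalid API key")
    usage[x_api_key] += 1
    return x_api_key

@app.get("/feeds/orders")
def orders_feed(api_key: str = Depends(check_api_key)) -> dict:
    # In practice this would proxy to the owning domain's data service.
    return {"feed": "orders", "requests_by_this_key": usage[api_key]}
```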
Enterprises often need a data platform to "buffer" transactions outside the core systems. Such a buffer may be provided by a central data platform such as a data lake, or by a distributed data mesh: an ecosystem of best-of-breed platforms (including data lakes, data warehouses, and so on), each built for the expected data usage and workloads of a particular business domain. For example, one bank built a columnar database to feed customer information (such as recent financial transactions) directly to online and mobile banking applications and to reduce expensive workloads on its mainframe.
5. From an enterprise warehouse to domain-based architecture
Many data-architecture leaders have moved from a central enterprise data lake toward "domain-driven" designs that can be customized and made "fit for purpose" to reduce time to market for new data products and services. Under this approach, although the data sets may still reside on the same physical platform, "product owners" in each business domain (for example, marketing, sales, or manufacturing) are tasked with organizing their data sets in an easily consumable way, both for users within the domain and for downstream data consumers in other business domains. This approach requires a careful balance to avoid becoming fragmented and inefficient, but in return it can cut the time spent building new data models in the lake, often from months to days, and it can be a simpler, more effective choice when mirroring a federated business structure or complying with regulatory constraints on data mobility.
One European telecom provider uses a distributed domain-based architecture so that sales and operations staff can expose customer, order, and billing data to data scientists for use in artificial intelligence models, or directly to customers via digital channels. Rather than building one central data platform, the company deployed logical platforms managed by product owners within its sales and operations teams. The company also encourages product owners to promote analytical uses of the data and drives adoption through digital channels, forums, and hackathons.
Concepts and components that work
Data infrastructure provided "as a platform" offers common tools and capabilities for storage and management, speeding implementation and sparing data producers the need to build their own data-asset platforms.
Data virtualization techniques, which started in niche areas such as customer data, are now being adopted across enterprises to organize access to distributed data assets and integrate them.
Data cataloging tools allow companies to search and explore data even without full access or extensive preparation. A catalog typically also provides metadata definitions and an end-to-end interface that simplifies access to data assets.
6. From rigid data models to flexible, extensible data models
Predefined data models from software vendors, and proprietary data models built to serve specific business-intelligence needs, are often constructed as highly normalized schemas with rigid database tables and data elements to minimize redundancy. While this approach remains the standard for reporting and regulatory-focused use cases, it forces organizations to endure long development cycles and requires deep system knowledge whenever new data elements or data sources are merged in, because any change can affect data integrity.
To gain greater flexibility and a competitive edge when exploring data or supporting advanced analytics, companies are moving toward "schema-light" approaches, using denormalized data models with fewer physical tables to organize data for maximum performance. This approach offers many benefits: agile data exploration, greater flexibility in storing structured and unstructured data, and reduced complexity, because data leaders no longer need to introduce additional layers of abstraction, such as multiple "joins" between highly normalized tables, to query relational data.
Concepts and components that work
Data-point modeling techniques, such as Data Vault 2.0, ensure that data models are extensible, so that data elements can be added or removed in the future with limited disruption.
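The toy sketch below illustrates the hub-and-satellite idea behind Data Vault modeling, using Python's built-in sqlite3. The tables are deliberately simplified and are not a complete Data Vault 2.0 implementation.

```python
# A toy sketch of the hub/satellite idea behind Data Vault modeling,
# using Python's built-in sqlite3. Table and column names are
# illustrative and deliberately simplified.
import sqlite3

conn = sqlite3.connect(":memory:")

# Hub: the stable business key, which never changes shape.
conn.execute("""
    CREATE TABLE hub_customer (
        customer_hk TEXT PRIMARY KEY,   -- hash key
        customer_id TEXT NOT NULL,      -- business key
        load_date   TEXT NOT NULL
    )
""")

# Satellite: descriptive attributes, versioned over time.
conn.execute("""
    CREATE TABLE sat_customer_profile (
        customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
        load_date   TEXT NOT NULL,
        name        TEXT,
        segment     TEXT,
        PRIMARY KEY (customer_hk, load_date)
    )
""")

# Extensibility: a new data source becomes a NEW satellite, added
# without altering the hub or any existing satellite.
conn.execute("""
    CREATE TABLE sat_customer_preferences (
        customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
        load_date   TEXT NOT NULL,
        channel     TEXT,
        PRIMARY KEY (customer_hk, load_date)
    )
""")
```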
Graph databases, a type of NoSQL database, have attracted particular attention in recent years. NoSQL databases in general are well suited to digital applications that require massive scalability and real-time performance, and to data layers serving artificial intelligence applications, because they can exploit unstructured data. Graph databases in particular offer the ability to model the relationships within data in a powerful and flexible way, and many companies are using them to build master data repositories that accommodate ever-changing information models.
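As a sketch of this relationship-first modeling, the snippet below uses the official neo4j Python driver. The connection URI, credentials, and graph model are hypothetical.

```python
# A minimal sketch using the official neo4j Python driver. The URI,
# credentials, and graph model are hypothetical placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # Relationships are first-class: a new edge type can be introduced
    # without migrating a rigid table schema.
    session.run(
        "MERGE (c:Customer {id: $cid}) "
        "MERGE (p:Product {sku: $sku}) "
        "MERGE (c)-[:PURCHASED {at: $ts}]->(p)",
        cid="C-001", sku="SKU-42", ts="2020-06-01",
    )

    result = session.run(
        "MATCH (c:Customer)-[:PURCHASED]->(p:Product) "
        "RETURN c.id AS customer, p.sku AS product"
    )
    for record in result:
        print(record["customer"], "->", record["product"])

driver.close()
```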
Technology services such as Azure Synapse Analytics allow file-based data to be queried much like a relational database by applying table structures to the files dynamically. This gives users the flexibility to keep using familiar interfaces such as SQL while accessing data stored in files.
Storing information using JavaScript Object Notation (JSON) allows organizations to change a database's structure without changing the business information model.
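The small sketch below shows the flexibility this buys: records of the same entity can carry different fields without a coordinated schema migration. The field names are illustrative.

```python
# A minimal sketch of schema flexibility with JSON: records of the same
# entity can gain new fields without a coordinated schema migration.
import json

# An early record, written before "loyalty_tier" existed.
old_record = json.loads('{"customer_id": "C-001", "name": "Acme Corp"}')

# A newer record carries an extra attribute; the old one is untouched.
new_record = json.loads(
    '{"customer_id": "C-002", "name": "Globex", "loyalty_tier": "gold"}'
)

for record in (old_record, new_record):
    # Consumers read new fields defensively instead of requiring them.
    tier = record.get("loyalty_tier", "unknown")
    print(record["customer_id"], record["name"], tier)
```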
How to start
Data technology is evolving so rapidly that the traditional approach of defining a three- to five-year target architecture and working toward it is both risky and inefficient. Data and technology leaders will be better served by adopting practices that let them evaluate and deploy new technologies quickly, so they can adapt fast. Four practices are crucial here:
Apply a test-and-learn mindset to architecture construction, and experiment with different components and concepts. Such agile practices have long been applied in application development and have more recently migrated to the data realm. For example, rather than engaging in lengthy discussions of ideal designs, products, and vendors in search of the "perfect" choice, followed by protracted budget approvals, leaders can start with a small budget and create a minimum viable product, or string together existing open-source tools into a provisional product and put it into production (using the cloud to accelerate the process), so its value can be demonstrated before it is scaled up and evolved further.
Establish data "tribes," in which squads of data stewards, data engineers, and data modelers have end-to-end accountability for building the data architecture. These tribes also work on creating standard, repeatable data- and feature-engineering processes to support the development of highly curated data sets that are ready for modeling. Such agile data practices help accelerate the time to market for new data services.
Invest in DataOps (enhanced DevOps for data), which helps accelerate the design, development, and deployment of new components into the data architecture, so that teams can quickly implement and frequently update solutions based on feedback.
Create a data culture in which employees are eager to use new data services within their roles. One important tool for achieving this is ensuring that the data strategy is tied to business goals and reflected in the messages executives send to the organization, which helps reinforce the importance of this work to business teams.
As data, analytics, and artificial intelligence become more deeply embedded in the day-to-day operations of most organizations, it is clear that a radically different approach to data architecture is necessary to build and grow data-centric enterprises. Data and technology leaders who embrace this new approach will better position their companies to be agile, resilient, and competitive in the future.
This concludes our answer to the question of how to create a data architecture that promotes innovation. We hope the content above has been helpful to you.