This article walks through a case analysis of how CDW on Azure provides cost-effective and scalable data warehousing. The walkthrough is concise and easy to follow; I hope you get something out of the detailed introduction.
The Cloudera Data Warehouse (CDW) service is a managed data warehouse that runs Cloudera's powerful engines on a containerized architecture. It is part of the new Cloudera Data Platform (CDP), which launched on Microsoft Azure earlier this year. The CDW service lets you meet SLAs, onboard new use cases with zero friction, and minimize costs. Today, we are pleased to announce that CDW is generally available on Microsoft Azure; the service can be obtained as part of CDP through the Azure Marketplace. Three situations come up again and again when we discuss data warehouses with our customers: the business can never get what it needs as quickly as it wants it; SLAs are routinely missed, especially as the number of users and use cases grows; and there is pressure to move to the public cloud, even where it is not an outright mandate.
Although many factors contribute to this situation, there is one answer to how to deal with it: CDW. This article describes representative examples our customers face and explains how CDW solves these problems. It also looks at the key roles several Azure services, such as Azure Kubernetes Service and ADLS Gen2, play in the solution. Consider a company that manufactures equipment for aircraft. Like many companies, it has a large number of analysts studying curated data, line-of-business (LOB) managers focused on operational excellence, and data scientists looking for competitive advantage in new data sets. And, like many of our customers, it faces challenges, as illustrated by the following four protagonists:
1. Ramesh's team of business analysts produces the reports that run the business. But as the team has grown, the warehouse's ability to meet SLAs while staying within budget has declined.
A) CDW provides Ramesh with cost-effective and scalable reports and dashboards, so his team's SLAs will not be missed.
2. Kelly is a data architect who needs to run ad hoc discovery workloads for campaign analysis. However, because of the risk of contention with SLA-bound workloads, she is not allowed to use the warehouse.
A) CDW enables Kelly to process data in the warehouse without affecting other workloads.
3. Data scientist Olivia cannot get capacity in the warehouse to explore new supply chain data. As a result, opportunities for optimization are missed.
A) CDW provides Olivia with virtually unlimited compute, letting her query any data in the object store within minutes.
4. Mariana is an operations manager who needs to view high-volume sensor data in real time and to combine it with customer experience data. The current warehouse cannot handle that volume or variety, so Mariana must spend precious budget building yet another silo.
A) CDW provides Mariana with a single platform for both traditional data warehousing and new use cases that would otherwise require different technologies, while keeping a single copy of each dataset and taking advantage of shared metadata and security.
In the following sections, we will further explain how CDW and Azure provide these capabilities.
Capability 1 - Cost-effective and scalable reports and dashboards on curated data
Ramesh and his business analysis team deliver reports around the clock. The business runs on the insights his team provides, especially those related to customer sentiment, which is critical given the recent drop in travel spending. They cannot miss their SLAs, or the business is flying blind. And no matter how the data volume and the number of analysts grow, the reports must keep coming, even as the budget shrinks.
Whenever there are no queries, the compute resources in a CDW Virtual Warehouse (VW) stay suspended and incur no cost. The VW resumes automatically on the first query after Ramesh arrives in the morning. If the query load later climbs toward saturation, because many of Ramesh's colleagues come online mid-morning, the VW detects this and provisions additional compute to handle the load while maintaining performance. This is called auto-scaling. Once the load drops back to a low level (his colleagues take lunch without him), the extra compute resources are released, so they cost nothing. Finally, at the end of the day, when Ramesh has left and the queries have stopped, the VW suspends itself and drops back to zero cost. CDW delivers this pay-as-you-go capability by using Azure Kubernetes Service (AKS) to provision compute pods quickly and release them when they are no longer needed. The pods use the Standard_E16_v3 compute instance size (16 vCPUs, 128 GiB RAM, 400 GiB local SSD). Under the hood, AKS uses VM scale sets to enable and control the auto-scaling. Once Ramesh's team is running queries, they can largely meet their SLAs thanks to three cache levels built into the service:
Data cache - when data is first read from ADLS, it is cached on the compute nodes that use it. Subsequent queries that need the same data get it from the local cache rather than from ADLS. This cache type is supported by both Hive LLAP and Impala VWs.
Result set cache - after a result is sent back to the client, the result set is also cached in storage on the HiveServer2 node. If exactly the same query arrives again (common in dashboard and BI use cases), the result is served directly from the HS2 cache. Currently, only Hive LLAP VWs support this cache type.
Materialized views - you can define the structure and content of a materialized view (MV), and Hive will select data from the base tables to populate it. For subsequent queries that access the base tables, if Hive detects that the data can be served from the MV, it transparently rewrites the query to use the MV, avoiding the need to re-scan the base tables, join the data, aggregate it, and so on. Currently, only Hive LLAP VWs support this feature; a sketch of an MV and a rewritten query follows this list.
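As a sketch of how the MV rewrite works, here is illustrative HiveQL; the table and column names are hypothetical, and Hive requires the base table to be a transactional (ACID) table for the rewrite to apply:

```sql
-- Hypothetical base table: customer_sentiment(region, event_ts, score),
-- assumed to be a managed, transactional table.
CREATE MATERIALIZED VIEW daily_sentiment AS
SELECT region, to_date(event_ts) AS day, avg(score) AS avg_score
FROM customer_sentiment
GROUP BY region, to_date(event_ts);

-- A later dashboard query written against the base table...
SELECT region, avg(score)
FROM customer_sentiment
GROUP BY region;
-- ...can be transparently rewritten by the optimizer to roll up
-- daily_sentiment instead of rescanning customer_sentiment.
```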
With this level of intelligence and performance optimization, Ramesh and his team can keep up as data volumes and business needs grow, paying only for the resources needed to do the actual work. A worked cost example with hypothetical numbers follows below.
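To make the pay-for-use point concrete, consider a worked example with hypothetical numbers (the node count, active hours, and usage pattern are assumptions for illustration, not published figures): suppose Ramesh's VW auto-scales up to 10 Standard_E16_v3 nodes but is under load only 8 hours per working day. It then consumes at most 10 × 8 = 80 node-hours per day, while a statically provisioned 10-node cluster running around the clock consumes 10 × 24 = 240 node-hours. Whatever the per-node-hour price, that is roughly a two-thirds reduction in compute spend, before even counting idle weekends.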
Capability 2 - Ad hoc exploration alongside SLA-bound workloads
The chief marketing officer has asked data architect Kelly to provide metrics that quantify the impact of recent marketing campaigns. The warehouse has the required data, but it is also running at full capacity. Kelly will need several query types to explore the data, and she is not sure how long the work will take or how much CPU and memory she will need. With requirements that vague, IT will not allow her to do this work on the data warehouse because of the risk to the SLA-bound operational workloads: her queries could starve them of CPU and evict all the hot data from the cache. As a result, the CMO has no metrics with which to understand the impact of the marketing investments.
With CDW, Kelly can have her own compute environment that queries the warehouse data but is completely isolated from the SLA-bound workloads. CDW achieves this by managing the data context (table definitions, authorization policies, metadata) separately from the storage and compute layers, so that multiple compute environments can share the same data context. Cloudera Shared Data Experience (SDX) is the name for this managed context. The key enabling feature of SDX is the ability to store metadata and security rules reliably in a persistent database; for this, the service uses Azure Database for PostgreSQL with the memory-optimized Gen5 4-vCore option. This managed Postgres service is easy to integrate, highly available, and easy to operate. With it as the single source of truth for metadata and other persistent state, CDW can safely run as many parallel compute environments as your workloads need.
Another way CDW provides compute in situations like this is by bursting workloads from an on-premises CDH or HDP cluster to CDP running in the public cloud. In this scenario, the Workload Manager tool analyzes your on-premises workloads, identifies candidates suitable for bursting (here, ad hoc exploratory queries interfering with SLA-bound queries), and then replicates the data and metadata to CDP. You can then run the workload safely in your cloud environment. If you do this, you may want to use Microsoft ExpressRoute to ensure good performance and consistent latency for the data movement.
Capability 3 - Fast provisioning to keep up with the business
Data scientist Olivia occasionally needs to test supply chain optimization hypotheses against new data files that are not yet in the warehouse. But central IT never planned for such sudden workloads, nor does it have the resources for a new ETL project to integrate this new data, whose value has not yet been proven, into the warehouse. The result is missed opportunities to reduce supply chain cost and risk.
With CDW, Olivia can simply start a new Hive LLAP VW, which takes only a few minutes, and then create external table definitions over the data files so she can start querying them. Hive can natively query semi-structured text files and delimited files such as CSV or TSV; there are standard open-source libraries for querying JSON and other file formats, and you can always define your own serializer-deserializer (SerDe) for custom formats. Even with these basic file formats, Hive converts the data into its in-memory columnar format to benefit from its caching and IO-efficiency optimizations. The ability to quickly stand up query access to any data in the object store provides great flexibility and agility; a sketch of what Olivia's first external table could look like follows below.
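As a minimal sketch of Olivia's first step, assuming hypothetical CSV files and a hypothetical storage account and container, the external table could look like this in HiveQL:

```sql
-- External table over raw CSV files already landed in ADLS Gen2;
-- no ETL is needed before the data becomes queryable.
CREATE EXTERNAL TABLE supply_chain_raw (
  part_id        STRING,
  supplier       STRING,
  lead_time_days INT,
  unit_cost      DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'abfs://data@examplestorage.dfs.core.windows.net/landing/supply_chain/'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Immediately queryable:
SELECT supplier, avg(lead_time_days) AS avg_lead_time
FROM supply_chain_raw
GROUP BY supplier;
```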
You can quickly explore new data and take on new use cases at the pace of the business. This is made possible by the scalable, high-performance ADLS Gen2 service: the ABFS connector in Hadoop provides the critical junction, bridging enterprise data already stored in ADLS Gen2 with the ecosystem of analytic capabilities Cloudera provides.
Capability 4 - Multi-modal analytics on shared data for new use cases
Manufacturing LOB operations manager Mariana has been asked by her COO to increase production by avoiding unplanned equipment downtime. She estimates that this will require storing 1 million sensor readings per second, retaining 15 months of data to support historical trend analysis, the ability to run arbitrary SQL on the data, and access to both raw data and aggregates. In short, she needs a highly scalable real-time data warehouse with time-series capabilities that does not break the bank. The current data warehouse team cannot come close to these performance requirements, and the conventional time-series database one of its teams uses can handle neither that much history nor arbitrary SQL.
With the CDP platform, Mariana can stand up the infrastructure to support this application within an hour; in this case it uses Azure compute VMs with standard locally redundant SSD storage. Cloudera's time-series offering relies mainly on the Apache Kudu storage engine, with Apache Impala for SQL queries. Apache NiFi can be used to ingest data from Azure Event Hub, Kafka, or one of many other supported sources. The combination of powerful Cloudera engines and robust Azure infrastructure means Mariana's ambitious requirements can be met.
She did such a good job for her COO that the CEO took notice and asked her to improve customer (that is, air passenger) satisfaction by building more reliable aircraft engines. However, the warehouse knows nothing about the machines running on the factory floor in real time, so there is no easy way to integrate that data with the customer experience data, and therefore no way to know what to adjust in the factory to improve quality. With Cloudera, Mariana can run queries that combine the data in the time-series application with other data in the warehouse, deriving correlations between the manufacturing process and the customer experience (as reflected, for example, in flight delays). As described above, this is enabled by SDX, but in this case with an extra layer of security, because Mariana is not allowed to view personally identifiable information (PII) in the customer data. Because CDP integrates with Azure Active Directory for user identities and group memberships, it can use Apache Ranger to enforce sophisticated role-based or attribute-based access controls, dynamically masking all PII whenever Mariana accesses the data. She can now do her job safely, making the CEO happy by doing her part to improve customer satisfaction. A sketch of the time-series table and a correlating query follows at the end of this section.
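Here is a minimal sketch of the time-series side, using Impala's Kudu DDL; the table names, columns, partitioning choices, and the warehouse table flight_delays are all illustrative assumptions:

```sql
-- Kudu-backed table for high-volume sensor ingestion (illustrative schema).
CREATE TABLE sensor_readings (
  device_id BIGINT,
  read_ts   TIMESTAMP,
  metric    STRING,
  reading   DOUBLE,
  PRIMARY KEY (device_id, read_ts, metric)
)
PARTITION BY HASH (device_id) PARTITIONS 16
STORED AS KUDU;

-- Correlating production telemetry with customer-experience data in the
-- warehouse (hypothetical flight_delays table); Ranger policies can
-- dynamically mask any PII columns for users like Mariana.
SELECT s.device_id,
       avg(s.reading)       AS avg_vibration,
       avg(f.delay_minutes) AS avg_delay
FROM sensor_readings s
JOIN flight_delays f ON f.engine_id = s.device_id
WHERE s.metric = 'vibration'
GROUP BY s.device_id;
```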
Change your data warehouse experience with CDW on Azure
With Cloudera Data Warehouse running on Azure, you can cost-effectively scale reports and dashboards over curated data without waiting through the traditionally long provisioning cycle. You can enable ad hoc exploration alongside SLA-bound workloads without risking those agreements through resource contention. You can provision resources on demand, so you can always say "yes" to any business request for any form of analytics, and you can take advantage of shared data to support the broad scope of multi-modal analytics that new use cases require. The above is the case analysis of how CDW on Azure provides cost-effective and scalable data warehousing.