2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article looks at the benefits of a hybrid data lake, an architecture that combines a data lake with a data warehouse.
Data lake and data warehouse are both established terms in big data storage, but they are not synonyms. A data lake is a large pool of raw data that has not yet been put to use. A data warehouse, by contrast, is a repository of structured, filtered data that has been prepared for a specific purpose.
Common ground
Both the data warehouse and the data lake are central data stores that a company can use for analytical purposes. Such a system extracts, collects, and stores relevant data from a variety of heterogeneous data sources and supplies it to downstream systems.
Data warehousing can be divided into four sub-processes:
Data acquisition: obtaining and extracting data from various data repositories.
Data storage: storing the data in the data warehouse, including long-term archiving.
Data supply: providing the required data to downstream systems and building data marts.
Data evaluation: analyzing and evaluating the stored data.
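The four sub-processes can be sketched as four small functions. This is only a minimal illustration, not a production design: the two source systems (`CRM_CSV`, `SHOP_ROWS`), the `sales` table, and the `mart_sales_by_customer` view are hypothetical names, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source systems: a CSV export and rows returned by some API.
CRM_CSV = "customer,amount\nalice,120.5\nbob,80.0\n"
SHOP_ROWS = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "carol", "amount": 55.25},
]


def acquire():
    """Data acquisition: extract rows from heterogeneous sources."""
    rows = [
        (r["customer"], float(r["amount"]), "crm")
        for r in csv.DictReader(io.StringIO(CRM_CSV))
    ]
    rows += [(r["customer"], float(r["amount"]), "shop") for r in SHOP_ROWS]
    return rows


def store(conn, rows):
    """Data storage: persist the rows in a structured warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer TEXT NOT NULL, amount REAL NOT NULL, source TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def supply(conn):
    """Data supply: expose a data mart (here a view) to downstream systems."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS mart_sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
    )


def evaluate(conn):
    """Data evaluation: analyze the data held in the mart."""
    query = "SELECT customer, total FROM mart_sales_by_customer ORDER BY total DESC"
    return conn.execute(query).fetchall()


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    store(conn, acquire())
    supply(conn)
    result = evaluate(conn)
    conn.close()
    return result


if __name__ == "__main__":
    print(run_pipeline())
```

Each function maps to one sub-process, so a real system could swap in a different source, store, or reporting layer without touching the other steps.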
Differences
Data warehouses combine classic ETL processes with structured data in relational databases, while data lakes use paradigms such as ELT and schema-on-read and often hold unstructured data [2].
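The contrast between the two paradigms can be made concrete with a small sketch. It assumes a handful of made-up JSON event strings (`RAW_EVENTS`) and uses an in-memory SQLite table as a stand-in for a relational warehouse; the function names are illustrative only.

```python
import json
import sqlite3

# Hypothetical raw events; the third one is incomplete.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "device": {"os": "linux"}}',
    '{"user": "bob", "action": "purchase", "amount": 19.99}',
    '{"user": "carol"}',
]


def etl_schema_on_write(raw_events):
    """ETL: transform and validate first, then load into a fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user TEXT NOT NULL, action TEXT NOT NULL)")
    rejected = 0
    for line in raw_events:
        record = json.loads(line)
        if "user" in record and "action" in record:
            conn.execute(
                "INSERT INTO events VALUES (?, ?)",
                (record["user"], record["action"]),
            )
        else:
            rejected += 1  # does not fit the schema, so it never reaches the table
    loaded = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return loaded, rejected


def elt_schema_on_read(raw_events, field):
    """ELT: load raw records unchanged; apply structure only at read time."""
    lake = list(raw_events)  # the "load" step keeps every record as-is
    return [json.loads(line).get(field) for line in lake]


if __name__ == "__main__":
    print(etl_schema_on_write(RAW_EVENTS))
    print(elt_schema_on_read(RAW_EVENTS, "amount"))
```

In the ETL path, fields outside the schema (such as `device` or `amount`) are dropped at load time and the incomplete record is rejected. In the ELT path, everything is kept, and the reader decides later which fields matter.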
> Differences: Data Warehouse vs. Data Lake
The figure above summarizes the main differences. The technologies involved also differ: data warehouses typically rely on SQL and relational databases, while data lakes may use NoSQL or a mixture of both.
Combining the two in a hybrid data lake
So how can these two concepts be combined? The following figure shows the architecture from a high-level perspective.
The process starts by loading unstructured, untransformed data into the data lake. From there, the data can be used directly for ML and data science tasks. It can also be transformed into a structured form and loaded into the data warehouse, which then serves data marts and (self-service) BI tools in the classic way.
> Hybrid Data Lake Concept - Image from Author
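The flow described above can be sketched in a few lines. This is a toy model under stated assumptions: a temporary directory of JSON files plays the role of the data lake, an in-memory SQLite database plays the role of the warehouse, and all names (`land_in_lake`, `orders`, `mart_revenue_by_region`, the sample records) are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from statistics import mean

REQUIRED = {"customer", "region", "amount"}


def land_in_lake(lake_dir, raw_records):
    """Load raw, untransformed records into the lake (one JSON file each)."""
    for i, record in enumerate(raw_records):
        (lake_dir / f"event_{i}.json").write_text(json.dumps(record))


def read_lake(lake_dir):
    """Read every raw record back from the lake directory."""
    return [json.loads(p.read_text()) for p in sorted(lake_dir.glob("*.json"))]


def ml_features(lake_dir):
    """Path 1: data science works directly on the raw lake data."""
    amounts = [r["amount"] for r in read_lake(lake_dir) if "amount" in r]
    return {"n": len(amounts), "mean_amount": mean(amounts)}


def load_warehouse(lake_dir, conn):
    """Path 2: convert lake data to structured rows and load the warehouse."""
    conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
    rows = [
        (r["customer"], r["region"], r["amount"])
        for r in read_lake(lake_dir)
        if REQUIRED <= r.keys()  # keep only records that fit the schema
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # A data mart, modeled here as a view that a BI tool could query.
    conn.execute(
        "CREATE VIEW mart_revenue_by_region AS "
        "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region"
    )


def run_demo():
    raw = [
        {"customer": "alice", "region": "EU", "amount": 10.0},
        {"customer": "bob", "region": "US", "amount": 20.0},
        {"customer": "carol", "region": "EU", "amount": 30.0},
        {"note": "free-text log line with no order data"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        lake_dir = Path(tmp)
        land_in_lake(lake_dir, raw)
        features = ml_features(lake_dir)
        conn = sqlite3.connect(":memory:")
        load_warehouse(lake_dir, conn)
        mart = conn.execute(
            "SELECT region, revenue FROM mart_revenue_by_region ORDER BY region"
        ).fetchall()
        conn.close()
    return features, mart


if __name__ == "__main__":
    print(run_demo())
```

The unstructured record stays in the lake and remains available for later analysis, while only the records that fit the warehouse schema flow on to the data mart.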
The main technologies that can be used in this architecture are:
ELT / ETL processes: Talend, Google Dataflow, AWS Data Pipeline
Data lake: HDFS, AWS Athena and S3, Google Cloud Storage
Data warehouse: Google BigQuery, AWS Redshift, Snowflake
Note: technologies such as Google BigQuery or AWS Redshift are often regarded as a blend of data warehouse and data lake technology, because they usually already offer some NoSQL characteristics.