0x00 Preface
Do you know your data?
A few days ago I was suddenly inspired to organize my understanding of data, which led to this post, or rather this series of posts, about data.
There are many kinds of data practitioners: data development engineers, data warehouse engineers, data analysts, data mining engineers, data product managers, and so on. People in different roles understand data differently and emphasize different aspects of it. So is there a body of basic knowledge about data that every data practitioner should have? How much does the understanding of data differ from role to role? And does a data development engineer need to understand how a data analyst looks at data?
This series will try to learn about, dig into, and summarize these topics, so that we can explore the ocean of data together.
0x01 Data? Data!
Let's start with a few questions:
Do you know how much data your system ingests?
Do you know how that data is distributed?
Do you know what hidden pitfalls lurk in the data you use most often?
If you are not sure about the answers, then we can happily discuss them together in the posts that follow. If you answered "Yes" to all of them, I will still try to keep you around with new questions. For example:
You know how much data the system ingests, but do you know how much that volume fluctuates from day to day? How much fluctuation is normal?
You know how the data is distributed, but beyond gender, age, and city, what other distributions do you know?
In a data warehouse this large, do you know which data is used the most and which is never accessed?
What are the core dimensions of the data you use most often? When two tables share the same dimension, do they define it the same way?
If any of these questions leaves you a little confused or curious, let's officially begin our journey of understanding data.
0x02 Overview
For now, let's roughly divide data practitioners into data cluster operations, data development engineers, data warehouse engineers, data analysts, data mining engineers, and data product managers. This section is an introduction: it briefly shows that different roles understand data differently, and later posts will go into detail.
First of all, data-related roles overlap a lot in practice, so it is hard to draw sharp lines between their responsibilities. A data development engineer, for example, is a broad title: such an engineer may handle data ingestion, data cleaning, data warehouse development, data mining algorithm development, and more. Likewise, many data analysts do more than analysis: they also field ad hoc data-extraction requests and sometimes have to do all kinds of data processing themselves.
The larger a company's data team, the more finely responsibilities are divided, and vice versa. Here we compare data development engineers with data warehouse engineers to illustrate how people with different responsibilities understand data differently. We assume that data development engineers focus on data ingestion, storage, and basic data processing, while data warehouse engineers focus on designing and developing data models (for example, dimensional modeling).
At the most basic level, a data development engineer needs to know the state of data ingestion: how much data is ingested every day, how many services feed data in, how much each service contributes, and how much day-to-day fluctuation is normal. They also need a clear picture of data retention: for example, how many tables are kept for 30 days and how many for 90 days? How much new storage does the cluster consume each day, and how long until the cluster runs out of space?
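The storage questions above come down to simple arithmetic. The Python sketch below is an illustration only: it assumes each table lands a roughly constant amount of data per day and drops partitions older than its retention period, so that in steady state a table occupies about daily volume times retention days. The `TableStats` class, the table names, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table summary: average daily landing volume and retention."""
    name: str
    daily_gb: float       # average new data landed per day, in GB
    retention_days: int   # partitions older than this are assumed to be dropped


def steady_state_gb(tables):
    """Approximate footprint once every table has filled its retention window."""
    return sum(t.daily_gb * t.retention_days for t in tables)


def days_until_full(capacity_gb, used_gb, net_growth_gb_per_day):
    """Days of runway left at the current net growth rate; inf if usage is not growing."""
    if net_growth_gb_per_day <= 0:
        return math.inf
    return max(capacity_gb - used_gb, 0.0) / net_growth_gb_per_day


tables = [
    TableStats("ods_login_log", daily_gb=200, retention_days=30),
    TableStats("ods_order", daily_gb=50, retention_days=90),
]
print(steady_state_gb(tables))                 # 10500
print(days_until_full(100_000, 70_000, 600))   # 50.0
```

With these made-up inputs the two tables settle at about 10,500 GB, and a cluster with 30,000 GB free that grows by a net 600 GB per day has roughly 50 days of runway. The sketch ignores replication and compression, both of which change the real footprint.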
Data warehouse engineers should also have a sense of the above, but with a different emphasis. They care more about the state of the business data used in their warehouse models, so they need to know how that business data is distributed: by age, gender, geography, and so on. They also need to watch for inconsistent data definitions. For example, if there are many user tables, do they all store gender as the strings male, female, and unknown, or do some use numeric codes such as 1 = male, 2 = female, 0 = unknown?
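One way to make the gender example concrete is to map every encoding onto a single convention before comparing tables. The sketch below assumes only the two encodings mentioned above (text labels and 1/2/0 codes); the alias table, the function names, and the sample tables are hypothetical.

```python
from collections import Counter

# Assumed encodings seen across source tables: text labels and numeric 1/2/0 codes.
GENDER_ALIASES = {
    "male": "male", "female": "female", "unknown": "unknown",
    "1": "male", "2": "female", "0": "unknown",
    1: "male", 2: "female", 0: "unknown",
}
CANONICAL_LABELS = ("female", "male", "unknown")


def normalize_gender(value):
    """Map one raw gender value to a canonical label; unrecognized values become 'unknown'."""
    if isinstance(value, str):
        value = value.strip().lower()
    return GENDER_ALIASES.get(value, "unknown")


def gender_distribution(values):
    """Share of each canonical label, so tables with different encodings can be compared."""
    counts = Counter(normalize_gender(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in CANONICAL_LABELS}


table_a = ["male", "Female", "female", "unknown"]   # text encoding
table_b = [1, 2, 2, 0]                               # numeric encoding
print(gender_distribution(table_a) == gender_distribution(table_b))  # True
```

Mapping unrecognized values to unknown is a design choice: it keeps the distribution well defined, but it can also hide a new encoding that appears upstream, so you may prefer to count such values separately.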
When it comes to data anomalies, a data development engineer will likely focus on whether today's data landed late, whether the total volume fluctuates sharply, and whether the data availability rate is normal.
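These three checks can be sketched as a single function. This is a minimal illustration under assumed inputs: the trailing daily row counts, today's count, when the partition landed, an agreed deadline, and how many rows passed validation. The function name, the rule that "normal" fluctuation stays within the mean plus or minus k standard deviations, and the 99% availability threshold are all assumptions, not a standard.

```python
import statistics
from datetime import datetime


def check_daily_load(history_counts, today_count, landed_at, deadline,
                     valid_rows, min_valid_ratio=0.99, k=3.0):
    """Return a list of issues for today's partition; an empty list means it looks healthy."""
    issues = []

    # 1. Late landing: did the partition arrive after the agreed deadline?
    if landed_at > deadline:
        issues.append(f"late: landed {landed_at - deadline} after the deadline")

    # 2. Volume fluctuation: is today's count outside mean +/- k * stdev of recent days?
    mean = statistics.mean(history_counts)
    stdev = statistics.pstdev(history_counts)
    if abs(today_count - mean) > k * stdev:
        issues.append(f"volume {today_count} outside {mean:.0f} +/- {k * stdev:.0f}")

    # 3. Availability: what share of today's rows passed validation?
    ratio = valid_rows / today_count if today_count else 0.0
    if ratio < min_valid_ratio:
        issues.append(f"availability {ratio:.1%} below {min_valid_ratio:.0%}")

    return issues


history = [1000, 1020, 980, 1010, 990]   # made-up row counts for the last five days
deadline = datetime(2024, 6, 1, 6, 0)

print(check_daily_load(history, 1005, datetime(2024, 6, 1, 5, 0), deadline, valid_rows=1000))
# []
print(len(check_daily_load(history, 500, datetime(2024, 6, 1, 7, 30), deadline, valid_rows=400)))
# 3
```

With a perfectly flat history the band collapses to zero width and any change is flagged, so in practice you might add a relative floor, such as a few percent of the mean.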
A data warehouse engineer, by contrast, will likely focus on whether today's data shows a surge in rows with gender 0 (which can cause data skew) and whether a key dimension is entirely empty.
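The warehouse-side checks can be sketched the same way. The function below assumes today's partition is available as a list of dicts and that you already know the usual share of the skew-prone value (for example, gender 0) from historical partitions; the function name, the "more than double the usual share" rule, and the sample rows are hypothetical.

```python
def dimension_alerts(rows, column, skew_value, usual_share, max_ratio=2.0):
    """Check one key dimension in today's partition.

    rows: today's rows as dicts; column: the dimension to check;
    skew_value: the value whose spikes cause data skew (e.g. gender 0);
    usual_share: that value's typical share of rows in past partitions.
    """
    if not rows:
        return ["partition is empty"]

    alerts = []
    values = [row.get(column) for row in rows]

    # A key dimension that is empty in every row usually points to an upstream break.
    if all(v is None or v == "" for v in values):
        alerts.append(f"{column}: all values empty")

    # A sudden jump in one value concentrates rows on one key, skewing joins and aggregations.
    share = sum(1 for v in values if v == skew_value) / len(values)
    if usual_share > 0 and share > max_ratio * usual_share:
        alerts.append(f"{column}={skew_value!r}: {share:.0%} of rows vs usual {usual_share:.0%}")

    return alerts


today = [{"gender": 1}, {"gender": 2}, {"gender": 0}, {"gender": 0}]
print(dimension_alerts(today, "gender", skew_value=0, usual_share=0.1))
# ['gender=0: 50% of rows vs usual 10%']
```

Comparing against a historical share rather than a fixed threshold matters here because gender 0 (unknown) is normally present in the data; only a sudden jump is suspicious.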
All of these checks could be handled together by a data quality monitoring system, but we will not discuss how to design such a system here. For now, the goal is to build an overall awareness of these issues and a way of thinking about them.