2025-03-04 Update From: SLTechnology News&Howtos
DIKW system
The DIKW system is a hierarchy of data, information, knowledge, and wisdom. Its roots are often traced to T. S. Eliot's pageant play "The Rock" (1934), whose opening chorus asks: "Where is the wisdom we have lost in knowledge? / Where is the knowledge we have lost in information?"
In December 1982, the American scholar Harlan Cleveland quoted these lines and put forward the idea of "information as a resource" (Information as a Resource) in an article of that title published in The Futurist.
The theory was then developed further by systems scientist Milan Zeleny and management thinker Russell Ackoff. Zeleny published "Management Support Systems: Towards Integrated Knowledge Management" (Human Systems Management) in 1987, and Ackoff published "From Data to Wisdom" in 1989.
The DIKW System in Data Engineering
D: Data is the lowest level of the DIKW hierarchy. It generally refers to raw data, which may or may not contain useful information.
I: Information, as a concept, has many meanings. In data engineering it refers to the higher-level, specific data uncovered when raw data is integrated and distilled according to certain rules, whether by a data engineer using the relevant tools or by a data scientist using mathematical methods.
K: Knowledge is understanding distilled from information, with the potential to serve a specific purpose. In data engineering it means making information targeted and practical, so that what has been extracted can support commercial applications or academic research.
W: Wisdom refers to the conclusions drawn from independent thinking about and analysis of knowledge. In data engineering, engineers and scientists do a great deal of work to extract as much value as possible with computer programs, but data analysts are needed to see the deeper value in the data, and even to predict the future.
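The four levels can be sketched in a few lines of code. The daily-sales records below are invented for illustration; only the progression from raw records to a summary to an actionable subset follows the D, I, K, W definitions above.

```python
# D: raw data -- individual transaction records, not yet meaningful on their own.
data = [
    {"day": "Mon", "item": "umbrella", "qty": 1},
    {"day": "Tue", "item": "umbrella", "qty": 7},
    {"day": "Tue", "item": "sunhat", "qty": 0},
    {"day": "Wed", "item": "umbrella", "qty": 8},
]

# I: information -- data integrated and summarized according to a rule
# (here: total umbrella sales per day).
information = {}
for record in data:
    if record["item"] == "umbrella":
        information[record["day"]] = information.get(record["day"], 0) + record["qty"]

# K: knowledge -- information made targeted: the days when sales spiked.
knowledge = {day: qty for day, qty in information.items() if qty > 5}

# W: wisdom is the human conclusion drawn from the knowledge, e.g. "rain
# drives umbrella sales, so stock up before the next storm" -- a judgment
# the program itself cannot make.
print(information)  # {'Mon': 1, 'Tue': 7, 'Wed': 8}
print(knowledge)    # {'Tue': 7, 'Wed': 8}
```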
Division of roles in the field of data engineering:
Data engineering is the end-to-end process of collecting and processing data (D) and extracting its value (turning it into I or K).
First, let us introduce the related roles: Data Engineer, Data Scientist, and Data Analyst. Their tasks overlap heavily and require close cooperation, though each has a slightly different area of responsibility. In most companies one person fills several of these roles according to their own skills, so the boundaries are sometimes hard to draw:
Data Engineer: analyzing data requires computers and various tools to automate data processing, including format conversion, storage, updates, and queries. The data engineer's job is to develop these automation tools, which sit at the infrastructure/tools (Infrastructure/Tools) layer.
However, this role does not always appear as a distinct position: off-the-shelf database technologies such as MySQL and Oracle already exist, and many large companies only need a DBA. With open-source NoSQL technologies such as Hadoop and MongoDB, big-data scenarios often leave little dedicated work for data engineers, and it is generally handed over to data scientists.
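The format-conversion, storage, and query work described above can be sketched with only the standard library. The CSV content and table name here are illustrative assumptions, not part of any particular system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: an event log arriving as CSV text.
raw_csv = io.StringIO("user,event\nalice,login\nbob,login\nalice,purchase\n")

# Storage layer (in-memory SQLite for the demo; a real pipeline would persist).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, event TEXT)")

# Format conversion: CSV rows -> relational rows.
for row in csv.DictReader(raw_csv):
    conn.execute("INSERT INTO events VALUES (?, ?)", (row["user"], row["event"]))

# Query: events per user, ready for an analyst to consume.
counts = dict(conn.execute(
    "SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY user"))
print(counts)  # {'alice': 2, 'bob': 1}
```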
Data Scientist: an intermediate role grounded in mathematics. A data scientist applies mathematical methods to raw data to find the higher-level structure invisible to the naked eye, usually with statistical machine learning (Statistical Machine Learning) or deep learning (Deep Learning).
Some call the Data Scientist a programming statistician (Programming Statistician): the role demands a solid statistical foundation but also participation in program development (on top of the infrastructure layer), and many data-scientist positions now require data-engineering skills as well. Data scientists are the main force in converting D into I or K.
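As a toy illustration of this D-to-I step, the sketch below fits a one-variable least-squares line to raw observations, so that a trend invisible in the individual numbers becomes an explicit parameter. The figures are made up for the example; real statistical learning would use far richer models and libraries.

```python
# Hypothetical raw data: weekly user counts (in thousands) since launch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# The fitted line is the extracted information: growth of roughly
# two thousand users per week.
print(round(slope, 2), round(intercept, 2))
```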
Data Analyst: data engineers and data scientists do a great deal of work to extract value with computer programs, but truly getting more value out of the data requires rich industry experience and insight, which calls for human judgment. A Data Analyst needs a deep understanding of the business and fluency with the tools at hand, whether Excel, SPSS, Python/R, or tools built by the engineers; if necessary, the analyst steps into the engineer's or scientist's role to obtain the tools they need. The analyst examines the data with a clear goal, presents the findings to other departments, and turns them into action. This is how data finally becomes Wisdom.
What is data analysis:
According to Baidu Baike, data analysis is the process of applying appropriate statistical methods to large amounts of collected data, extracting useful information, and forming conclusions, then studying and summarizing the data in detail. It is also a supporting process of a quality-management system. In practice, data analysis helps people make judgments so that appropriate action can be taken.
The data analysis process:
Data collection -> Data processing -> Data analysis -> Data presentation
Data collection: gathering local or network data.
Data processing: regularizing the data and storing it in an integrated form according to a given format.
Data analysis: performing scientific computation on the data with the relevant analysis tools.
Data presentation: data visualization, displaying the analysis results with the relevant tools.
Tools of data analysis:
SAS: SAS (Statistical Analysis System) is a powerful statistical-analysis and database-integration platform. It is expensive, so generally only banks and large enterprises can afford it, typically for offline analysis or modeling.
SPSS: SPSS (Statistical Product and Service Solutions) is IBM's product line for statistical analysis, data mining, predictive analytics, and decision support. It has a history of more than 40 years and is also expensive.
R/MATLAB: well suited to academic data analysis, though code often has to be ported to Python or Scala for production use. R is free and open source, while MATLAB is commercial mathematical software produced by MathWorks.
Scala: a functional programming language that runs on the JVM. It is highly productive once mastered, and together with Spark it is well suited to large-scale data analysis and processing.
Python: Python has many mature frameworks and algorithm libraries for data engineering and machine learning, and data-centric applications can be built with Python alone. It is extremely popular in both fields.
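The four steps of the process can be walked through in miniature with the standard library. The temperature readings below are invented; in practice the collection step would read from a file, database, or API rather than a hard-coded list.

```python
from statistics import mean

# 1. Data collection: a hard-coded sample stands in for a file or API call.
collected = ["21.5", "22.0", "bad", "23.5", "22.5"]

# 2. Data processing: regularize -- drop malformed entries, cast to float.
processed = [float(v) for v in collected if v.replace(".", "", 1).isdigit()]

# 3. Data analysis: scientific calculation (here, mean and range).
avg = mean(processed)
spread = max(processed) - min(processed)

# 4. Data presentation: a minimal textual "visualization" of the result.
print(f"n={len(processed)} mean={avg:.2f} range={spread:.2f}")
```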
© 2024 shulou.com SLNews company. All rights reserved.