2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
What are the general steps of data mining? The basic steps are: defining the problem, building a data mining database, analyzing the data, preparing the data, building a model, evaluating the model, and putting the model into use. The goal is to uncover latent patterns that help decision makers adjust market strategy, reduce risk, and make sound decisions. Let's walk through the steps.
Data mining is the non-trivial process of extracting hidden, previously unknown, and potentially valuable information from large amounts of data in a database. It is a decision support process that draws mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, and visualization to analyze enterprise data in a highly automated way, make inductive inferences, and uncover latent patterns that help decision makers adjust market strategy, reduce risk, and make sound decisions.
At the highest level, data mining finds regularities in a large amount of data by analyzing each record, and it has three main stages: data preparation, pattern discovery, and pattern representation. Data preparation selects the required data from the relevant sources and integrates it into a data set for mining. Pattern discovery uses some method to find the regularities contained in that data set. Pattern representation expresses the discovered regularities in a form the user can understand as easily as possible (for example, through visualization). The tasks of data mining include association analysis, cluster analysis, classification analysis, anomaly analysis, specific group analysis, and evolution analysis.
Data mining steps:
1. Define the problem
The first and most important requirement before starting knowledge discovery is to understand the data and the business problem. The goal must be clearly defined, that is, you must decide what you actually want to achieve. For example, if you want to improve the utilization of an e-mail service, the goal might be to "increase the user usage rate" or to "increase the value of each user session". The models built to solve these two problems are almost completely different, so a choice has to be made up front.
2. Establish a data mining database
Building the data mining database involves the following steps: collecting the data, describing the data, selecting the data, assessing data quality and cleaning the data, merging and integrating, building metadata, loading the data mining database, and maintaining the data mining database.
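The cleaning and merging steps can be illustrated with a minimal sketch in plain Python. The record layout, the field names (`id`, `name`, `age`, `customer_id`, `amount`), and the helper functions are all hypothetical and chosen only for illustration; a real project would more likely use a database or a data-frame library.

```python
# Hypothetical raw records from two sources; all field names are illustrative.
customers = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": ""},      # missing age
    {"id": 2, "name": "Bob", "age": ""},      # duplicate record
    {"id": 3, "name": "Carol", "age": "41"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 3, "amount": 75.5},
]


def clean_customers(rows):
    """Drop repeated ids, trim whitespace, and convert age to int or None."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen:
            continue  # keep only the first record for each id
        seen.add(row["id"])
        age = row["age"].strip()
        cleaned.append({
            "id": row["id"],
            "name": row["name"].strip(),
            "age": int(age) if age else None,
        })
    return cleaned


def merge_orders(customer_rows, order_rows):
    """Attach each customer's total order amount (0.0 when there are no orders)."""
    totals = {}
    for order in order_rows:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [dict(c, total_spent=totals.get(c["id"], 0.0)) for c in customer_rows]


# One integrated table, ready to be loaded into the mining database.
mining_table = merge_orders(clean_customers(customers), orders)
for row in mining_table:
    print(row)
```

Keeping cleaning and merging in separate functions makes it easier to rerun one stage when the database is maintained later.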
3. Analyze the data
The purpose of the analysis is to find the data fields that have the greatest impact on the predicted output and to decide whether derived fields need to be defined. If the data set contains hundreds of fields, browsing and analyzing the data by hand is time-consuming and tiring, so you need tool software with a good interface and strong functionality to help with this work.
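One simple way to screen fields for their influence on the output is to rank them by the absolute value of their Pearson correlation with the target. This is only one possible heuristic, not a method the article prescribes, and it captures linear relationships only. The field names and values below are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_fields(columns, target):
    """Return (field, |correlation|) pairs, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fields and target values.
columns = {
    "monthly_logins": [1, 2, 3, 4, 5, 6],
    "account_age": [5, 3, 6, 2, 4, 1],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
target = [10, 20, 30, 40, 50, 60]

ranking = rank_fields(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Fields at the bottom of such a ranking are candidates for removal, though a field with a strong nonlinear effect could still score low here.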
4. Prepare data
This is the last data preparation step before building a model. It can be divided into four parts: selecting variables, selecting records, creating new variables, and transforming variables.
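The four parts of this step can be sketched on a tiny in-memory data set. Everything here is an assumption made for illustration: the field names, the rule for dropping records, the derived `debt_ratio` variable, and the choice of a log transform and integer region codes.

```python
import math

# Hypothetical records; the "notes" field is not useful for modeling.
records = [
    {"id": 1, "income": 52000.0, "debt": 13000.0, "region": "north", "notes": "n/a"},
    {"id": 2, "income": 0.0, "debt": 500.0, "region": "south", "notes": "n/a"},
    {"id": 3, "income": 98000.0, "debt": 9800.0, "region": "south", "notes": "n/a"},
]

SELECTED_FIELDS = ("income", "debt", "region")
REGION_CODES = {"north": 0, "south": 1}


def prepare(rows):
    prepared = []
    for row in rows:
        # Select records: skip rows whose income is not positive.
        if row["income"] <= 0:
            continue
        # Select variables: keep only the fields chosen for modeling.
        item = {name: row[name] for name in SELECTED_FIELDS}
        # Create a new variable from existing ones.
        item["debt_ratio"] = item["debt"] / item["income"]
        # Transform variables: log-scale income, encode region as an integer.
        item["log_income"] = math.log(item["income"])
        item["region"] = REGION_CODES[item["region"]]
        prepared.append(item)
    return prepared


prepared = prepare(records)
for item in prepared:
    print(item)
```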
5. Establish a model
Modeling is an iterative process. You need to examine different models carefully to determine which one is most useful for the business problem at hand. First use part of the data to build the model, then use the remaining data to test and verify it. Training and testing a data mining model therefore requires dividing the data into at least two parts, one for training and one for testing. Sometimes a third data set, called the validation set, is also held out: because the test set may be influenced by the characteristics of the model, a separate data set is needed to verify the model's accuracy.
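A minimal sketch of such a split, using only the standard library. The function name, the 60/20/20 default proportions, and the fixed seed are assumptions for illustration. The names follow this article's usage, where the third part is the validation set; many other texts swap the roles of "test" and "validation".

```python
import random


def split_dataset(rows, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, test, and validation parts.

    Whatever remains after the train and test parts becomes the validation set.
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


train, test, validation = split_dataset(range(100))
print(len(train), len(test), len(validation))
```

Shuffling before slicing avoids a split that mirrors the order in which the records were collected.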
6. Evaluate the model
After the model is built, its results must be evaluated and its value explained. The accuracy measured on the test set is meaningful only for the data used to build the model. In practice, you also need to understand the types of errors the model makes and the cost associated with each. Experience shows that a valid model is not necessarily a correct model, mainly because of the assumptions implicit in the modeling, so it is important to test the model directly in the real world. Apply it on a small scale first, collect test data, and roll it out widely only once the results are satisfactory. Once the model has been built and verified, there are two main ways to use it: the first is to give analysts a reference, and the second is to apply the model to different data sets.
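To see why error types and their costs matter alongside accuracy, consider a binary classifier whose false negatives are more expensive than its false positives. The labels, predictions, and cost values below are invented for illustration only.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif not a and p:
            counts["fp"] += 1
        elif a and not p:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def total_cost(counts, cost_fp, cost_fn):
    """Weight each error type by its (assumed) business cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
cost = total_cost(counts, cost_fp=1.0, cost_fn=5.0)
print(counts, accuracy, cost)
```

Two models with the same accuracy can have very different total costs if one makes more of the expensive kind of error.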
The above is a brief introduction to the general steps of data mining. The details of each step are best understood by applying them in practice. If you want to know more, you are welcome to follow the industry information channel!
© 2024 shulou.com SLNews company. All rights reserved.