2025-04-08 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
Ten Practical Analysis Methods in Data Mining
1. History-Based MBR (Memory-Based Reasoning) Analysis
The main idea of Memory-Based Reasoning (MBR) is to use known cases to predict certain attributes of new cases, usually by finding the most similar past cases and comparing against them. MBR has two main elements: a distance function and a combination function. The distance function finds the most similar cases; the combination function combines the attributes of those similar cases to make a prediction. One advantage of MBR is that it accepts many types of data, which do not have to satisfy particular assumptions. Another advantage is its ability to learn: it acquires knowledge about new cases by studying old ones. Its main drawback is that it needs a large amount of historical data to make good predictions. In addition, memory-based reasoning is time-consuming to run, and finding the best distance function and combination function is difficult. Typical applications include fraud detection, customer response prediction, medical diagnosis and treatment, and classification of responses.
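The two elements of MBR map naturally onto a k-nearest-neighbour classifier. The sketch below is an illustration under assumptions, not the article's method: Euclidean distance stands in for the distance function, a majority vote stands in for the combination function, and the function names, the value of k, and the toy customer data (hypothetical age and annual spend) are all invented for illustration.

```python
from collections import Counter
import math


def euclidean_distance(a, b):
    """Distance function: how dissimilar two cases are."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_vote(labels):
    """Combination function: merge the neighbours' labels into one prediction."""
    return Counter(labels).most_common(1)[0][0]


def mbr_predict(history, query, k=3, distance=euclidean_distance, combine=majority_vote):
    """Predict an attribute of `query` from its k most similar historical cases.

    `history` is a list of (features, label) pairs.
    """
    nearest = sorted(history, key=lambda case: distance(case[0], query))[:k]
    return combine([label for _, label in nearest])


# Hypothetical historical cases: (age, annual spend in thousands) -> response.
history = [
    ((25, 1.0), "no_response"),
    ((30, 1.2), "no_response"),
    ((45, 5.0), "responds"),
    ((50, 5.5), "responds"),
    ((48, 4.8), "responds"),
]
print(mbr_predict(history, (47, 5.1)))  # responds
```

Because both functions are passed in as parameters, either one can be swapped out, which reflects the article's point that choosing good distance and combination functions is the hard part.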
2. Market Basket Analysis
The main purpose of market basket analysis is to find out which items tend to go together.
In business, it uses customers' purchasing behavior to learn which customers buy which products and why, and to discover the relevant association rules. By mining these rules, enterprises gain benefits and build competitive advantages. For example, a retail store can use this analysis to rearrange goods on its shelves or to design product bundles that attract customers.
The basic process of market basket analysis consists of three steps:
Choose the right items: "right" here means selecting, from hundreds or thousands of items, those that are genuinely useful to the enterprise.
Mine association rules: association rules are derived by examining the co-occurrence matrix of the chosen items.
Overcome practical limitations: the more items you choose, the longer the computation takes (the number of possible item combinations grows exponentially), so techniques are needed to reduce the cost in resources and time.
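The co-occurrence and rule-mining steps can be sketched for item pairs only. This is a simplified illustration, not a full association-rule algorithm such as Apriori: it counts how often each pair of items shares a basket, then keeps rules "A -> B" whose support (share of all baskets containing both) and confidence (share of baskets with A that also contain B) clear thresholds. The threshold values and the toy baskets are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Derive rules lhs -> rhs from frequent pairs that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    rules = []
    for (a, b), count in co_occurrence(baskets).items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be useful
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Restricting the search to pairs is one crude way of "overcoming practical limitations": the number of candidate pairs grows quadratically with the number of items, whereas the number of arbitrary item combinations grows exponentially.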
Market basket analysis can be applied to problems such as the following. For credit card purchases, it can predict what customers are likely to buy in the future. In telecommunications and financial services, it can help design service bundles that increase profits. The insurance industry can use it to detect potentially unusual combinations of insurance policies and take precautions. In healthcare, it can serve as a basis for judging whether particular combinations of treatments are likely to lead to complications.
3. Decision Trees
Decision trees are powerful tools for classification and prediction. A decision tree expresses its knowledge as rules, and these rules take the form of a series of questions: by asking one question after another, the tree finally arrives at the desired result. A typical decision tree has a root at the top and many leaves at the bottom. It breaks records down into different subsets, and the fields in each subset may follow a simple rule. Decision trees also come in different shapes, such as binary trees, ternary trees, or mixed trees.
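A minimal binary decision tree can be grown by repeatedly choosing the question "is feature f <= t?" that best separates the classes. The sketch below assumes a CART-style approach using Gini impurity, which is one common splitting criterion; the article does not name one. The function names, `max_depth`, and the toy customer data (hypothetical income in thousands and years as a customer) are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the subset is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the question 'is feature f <= t?' that most reduces impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split records into subsets; each leaf holds the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Answer the tree's questions from the root down until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical records: (income in thousands, years as customer) -> outcome.
rows = [(20, 1), (25, 2), (60, 8), (70, 10), (65, 1), (22, 9)]
labels = ["leave", "leave", "stay", "stay", "stay", "stay"]
tree = build_tree(rows, labels)
print(classify(tree, (21, 1)))  # leave
```

Each internal node is one question and each leaf is one answer, so the path from root to leaf reads directly as a rule.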
4. Genetic Algorithms
Genetic algorithms imitate the evolution of cells: through repeated selection, replication, mating, and mutation, cells can produce better new cells. A genetic algorithm works in a similar way. A model must be established in advance; the algorithm then runs a series of steps that mimic the production of new cells and uses a fitness function to decide which offspring survive, so that only the results that best fit the model remain. The process repeats until the fitness converges to an optimal (or near-optimal) solution. Genetic algorithms perform well on clustering problems and are often used to support memory-based reasoning and neural networks.
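The selection, replication, mating, and mutation loop can be sketched on a toy problem. The example below assumes the "OneMax" objective (maximize the number of 1 bits in a bit string) as a stand-in fitness function; tournament selection, single-point crossover, bit-flip mutation, and keeping the single best individual each generation are common choices, not ones prescribed by the article. All parameter values are assumptions.

```python
import random


def fitness(bits):
    """Hypothetical fitness function ('OneMax'): count the 1 bits."""
    return sum(bits)


def tournament(population, rng, size=3):
    """Selection: the fittest of a few randomly chosen individuals wins."""
    return max(rng.sample(population, size), key=fitness)


def crossover(a, b, rng):
    """Mating: single-point crossover combines the parents' genes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def genetic_algorithm(n_bits=20, pop_size=30, generations=60, seed=0):
    """Evolve bit strings; return the best individual and best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [fitness(max(population, key=fitness))]
    for _ in range(generations):
        elite = max(population, key=fitness)  # replication: carry the best over unchanged
        children = [
            mutate(crossover(tournament(population, rng), tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
        history.append(fitness(max(population, key=fitness)))
        if history[-1] == n_bits:  # reached the optimum of this toy problem
            break
    return max(population, key=fitness), history


best, history = genetic_algorithm()
print(history[0], "->", history[-1])
```

Carrying the best individual over unchanged guarantees that the best fitness never decreases from one generation to the next, although it does not guarantee reaching the global optimum on harder problems.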
5. Cluster Detection
Cluster detection covers a wide range of techniques, including genetic algorithms, neural networks, and cluster analysis from statistics. Its goal is to discover previously unknown groups of similar records in the data. Many analyses apply cluster detection at the very beginning of the study.
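As one example of statistical cluster analysis, k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is a minimal sketch: it assumes numeric points, a known number of clusters k, and a naive initialization that uses the first k points as starting centroids; the toy points are invented.

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D records with two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Real data rarely announces its k, and a poor initialization can trap k-means in a bad grouping, which is why it is usually run several times with different starting points.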
6. Link Analysis
Link analysis is based on graph theory in mathematics and builds models from recorded relationships between people. It treats relationships as the central object of study, and many applications have been developed from relationships among people, among things, or between people and things. For example, a telecommunications provider can use link analysis on the times and frequencies of customers' calls to infer customer preferences and then propose plans that benefit the company. Beyond telecommunications, more and more marketers are using link analysis in research that benefits their enterprises.
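The telecommunications example can be sketched by turning call records into a weighted graph: customers are nodes, and each edge accumulates the minutes two customers spent talking. The summary measures below (number of distinct contacts, total minutes, strongest link) are simple illustrative choices, not ones named by the article, and the call records and names are hypothetical.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Turn (caller, callee, minutes) records into an undirected weighted graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for caller, callee, minutes in calls:
        graph[caller][callee] += minutes
        graph[callee][caller] += minutes
    return graph


def link_summary(graph):
    """Per customer: number of distinct contacts, total minutes, strongest link."""
    summary = {}
    for person, neighbours in graph.items():
        summary[person] = {
            "contacts": len(neighbours),
            "minutes": sum(neighbours.values()),
            "strongest_link": max(neighbours, key=neighbours.get),
        }
    return summary


# Hypothetical call records: (caller, callee, minutes).
calls = [
    ("alice", "bob", 12.0),
    ("alice", "carol", 3.0),
    ("bob", "alice", 8.0),
    ("dave", "alice", 1.5),
]
summary = link_summary(build_call_graph(calls))
print(summary["alice"])  # {'contacts': 3, 'minutes': 24.5, 'strongest_link': 'bob'}
```

Treating calls as undirected edges assumes that who dialed does not matter; a directed graph would be the natural choice if it did.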
7. OLAP Analysis
Strictly speaking, OLAP (On-Line Analytical Processing) is not a data mining technique in its own right. However, online analytical processing tools help users better understand the patterns hidden in the data. Like visualization techniques that present data as charts or graphics, they feel friendlier to the average user, and they also help turn data into information.
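OLAP tools typically let users "roll up" a measure (such as revenue) over chosen dimensions and "slice" the data by fixing one dimension to a value. The sketch below only mimics those two operations in plain Python on a tiny in-memory fact table; a real OLAP server works very differently, and the table, dimension names, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, revenue).
sales = [
    ("North", "Q1", "laptop", 120.0),
    ("North", "Q2", "laptop", 90.0),
    ("North", "Q1", "phone", 60.0),
    ("South", "Q1", "laptop", 80.0),
    ("South", "Q2", "phone", 70.0),
]
DIMENSIONS = {"region": 0, "quarter": 1, "product": 2}


def roll_up(facts, *dims, where=None):
    """Sum revenue over the chosen dimensions, optionally slicing first.

    `where` maps a dimension name to the single value to keep (a 'slice').
    """
    where = where or {}
    totals = defaultdict(float)
    for row in facts:
        if all(row[DIMENSIONS[d]] == v for d, v in where.items()):
            key = tuple(row[DIMENSIONS[d]] for d in dims)
            totals[key] += row[-1]  # revenue is the last column
    return dict(totals)


print(roll_up(sales, "region"))  # {('North',): 270.0, ('South',): 150.0}
print(roll_up(sales, "quarter", "product", where={"region": "North"}))
```

Rolling up to fewer dimensions gives the summary view; adding dimensions back corresponds to drilling down.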
8. Neural Networks
A neural network learns by training repeatedly on a series of examples, so that it can generalize a pattern that distinguishes them. When faced with new examples, the network applies what it has learned to infer new results; this is a form of machine learning. Data mining problems can also be tackled with neural learning, and the learned models are often quite accurate and can be used for prediction.
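The idea of "repeated learning" from examples can be shown with the smallest possible network: a single artificial neuron trained with the perceptron learning rule. This is a deliberately minimal sketch, not the multi-layer networks usually used in data mining; the function names and the logical-AND training set are illustrative assumptions.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Train one artificial neuron with the perceptron learning rule.

    `examples` is a list of (inputs, target) pairs with targets 0 or 1.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):  # repeated passes over the same examples
        for inputs, target in examples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output
            # Nudge weights and bias toward the correct answer when the guess is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the learned weights to a new example."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Learn logical AND from four examples, then apply the learned rule.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_examples)
print([predict(w, b, x) for x, _ in and_examples])  # [0, 0, 0, 1]
```

A single neuron can only learn classes that a straight line (or hyperplane) separates, which is why practical networks stack many neurons in layers.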
9. Discriminant Analysis
When the dependent variable of a problem is categorical and the independent (predictor) variables are metric, discriminant analysis is a very appropriate technique, commonly used for classification. If the dependent variable consists of two groups, the method is called two-group discriminant analysis; if it consists of multiple groups, it is called multiple discriminant analysis (MDA). The main goals of discriminant analysis are to:
a. Find linear combinations of the predictor variables that maximize the ratio of between-group variation to within-group variation, with each linear combination uncorrelated with those obtained before it.
b. Test whether the group centroids (centers of gravity) differ.
c. Identify which predictor variables have the greatest discriminating power.
d. Assign a new subject to a group based on its predictor variable values.
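Goals (a) and (d) can be sketched for the two-group case with Fisher's linear discriminant: the weight vector w = S_w^-1 (m1 - m0), where m0 and m1 are the group centroids and S_w is the pooled within-group scatter matrix, gives the linear combination of predictors that maximizes between-group relative to within-group variation. The sketch assumes exactly two predictors (so the 2x2 inverse can be written by hand), assigns new subjects by comparing their projected score with the midpoint between the projected centroids, and uses invented data; the significance tests behind goals (b) and (c) are not shown.

```python
def mean(vectors):
    """Component-wise mean of a group (its centroid)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fisher_lda_2d(group0, group1):
    """Two-group Fisher discriminant for two predictors.

    Returns weights w (the linear combination) and a threshold halfway
    between the projected group centroids.
    """
    m0, m1 = mean(group0), mean(group1)
    # Pooled within-group scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((group0, m0), (group1, m1)):
        for x in group:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S_w^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)

    def project(v):
        return w[0] * v[0] + w[1] * v[1]

    threshold = (project(m0) + project(m1)) / 2
    return w, threshold


def assign(w, threshold, x):
    """Assign a new subject to group 1 if its discriminant score exceeds the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Hypothetical subjects measured on two metric predictors.
group0 = [(1.0, 2.0), (2.0, 3.0), (1.5, 1.0), (2.5, 2.5)]
group1 = [(6.0, 5.0), (7.0, 6.5), (6.5, 4.5), (7.5, 6.0)]
w, t = fisher_lda_2d(group0, group1)
print(assign(w, t, (7.0, 5.0)))  # 1
```

The midpoint threshold implicitly assumes equal group sizes and equal misclassification costs; unequal priors would shift it.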
10. Logistic Regression Analysis
When the groups in discriminant analysis do not satisfy the assumption of a normal distribution, logistic regression is a good alternative. Logistic regression does not predict whether an event occurs; instead, it predicts the probability that the event occurs. It assumes that the relationship between the independent variable and the dependent variable follows an S-shaped curve. When the independent variable is very small, the probability is close to zero; as the independent variable increases, the probability rises along the curve, and beyond a certain point the slope of the curve begins to decrease, so the probability always stays between 0 and 1.
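The S-shaped curve is the logistic (sigmoid) function, P(event | x) = 1 / (1 + e^-(a + b*x)). The sketch below fits a and b for a single predictor by batch gradient ascent on the log-likelihood, chosen here only because it is short; many statistics packages use Newton-type methods instead. The learning rate, epoch count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    """The S-shaped logistic curve: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(event | x) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b


# Hypothetical data: predictor value and whether the event occurred (1) or not (0).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
for x in (1, 4.5, 8):
    print(x, round(sigmoid(a + b * x), 3))  # estimated probability of the event
```

The model outputs a probability rather than a yes/no answer; a cut-off such as 0.5 turns it into a classification only when one is needed.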
© 2024 shulou.com SLNews company. All rights reserved.