2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the data mining techniques used in computer networks. It is intended as a reference for interested readers.
Data mining techniques include: (1) statistical techniques; (2) association rules; (3) history-based analysis; (4) genetic algorithms; (5) aggregation detection; (6) connection analysis; (7) decision trees; (8) neural networks; (9) rough sets; (10) fuzzy sets; (11) regression analysis; (12) difference analysis; (13) concept description.
The operating environment of this tutorial: Windows 7, Dell G3 computer.
Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy and random data.
The task of data mining is to discover patterns in data sets. Many kinds of patterns can be found, and they fall into two categories according to their function: predictive patterns and descriptive patterns.
There are many data mining techniques, and they can be classified in different ways depending on the criterion used. The following focuses on thirteen commonly used ones: statistical techniques, association rules, history-based analysis, genetic algorithms, aggregation detection, connection analysis, decision trees, neural networks, rough sets, fuzzy sets, regression analysis, difference analysis, and concept description.
1. Statistical technology
Data mining draws on many scientific fields and technologies, statistics among them. The main idea of statistical mining is to assume a distribution or probability model (such as a normal distribution) for a given data set, and then mine the data set with methods appropriate to that model.
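As a minimal sketch of this idea, the following assumes a normal model for a small sample and then queries the fitted model. The sample values and the name fit_normal are made up for illustration; only the Python standard library is used.

```python
import statistics

def fit_normal(values):
    """Estimate the parameters of an assumed normal model from the data."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return statistics.NormalDist(mu, sigma)

# Hypothetical measurements, chosen only for illustration.
heights = [168.0, 172.0, 170.0, 169.0, 171.0]
model = fit_normal(heights)

# Probability that a new observation falls below 170 under the fitted model.
p_below_170 = model.cdf(170.0)
```

Once the model is fitted, questions about the data (how likely a value is, which values are unusual) become questions about the assumed distribution.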
2. Association rules
Data association is an important kind of knowledge that can be discovered in a database. If there is some regularity among the values of two or more variables, it is called an association. Associations can be divided into simple associations, temporal associations and causal associations. The purpose of association analysis is to find the hidden connections in the database. Sometimes the association function of the data is unknown, and even when it is known it is uncertain, so the rules generated by association analysis carry a confidence measure.
3. History-based MBR (Memory-Based Reasoning) analysis
MBR first looks for similar situations based on past experience, and then applies the information from those situations to the current example. This is the essence of MBR (Memory-Based Reasoning). MBR first finds the neighbors that are most similar to the new record, and then uses these neighbors to classify the new record or estimate its value. Using MBR raises three main issues: selecting suitable historical data; determining the most effective way to represent the historical data; and choosing the distance function, the combination function, and the number of neighbors.
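The confidence attached to a rule can be made concrete with the two standard measures, support and confidence. The sketch below is illustrative only: the basket data and the function names are assumptions, not part of the original article.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst.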
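The three choices named above (distance function, combination function, number of neighbors) map directly onto a nearest-neighbor classifier. This is a minimal sketch under assumed data: Euclidean distance and majority voting are just one possible pair of choices, and all names here are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance function: straight-line distance between two records."""
    return math.dist(a, b)

def majority_vote(labels):
    """Combination function: the most common label among the neighbors."""
    return Counter(labels).most_common(1)[0][0]

def mbr_classify(history, new_record, k=3, distance=euclidean, combine=majority_vote):
    """Classify new_record from its k most similar historical records."""
    neighbors = sorted(history, key=lambda rec: distance(rec[0], new_record))[:k]
    return combine([label for _, label in neighbors])

# Hypothetical historical records: (features, known label).
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"),
    ((5.1, 4.9), "high"),
]

label = mbr_classify(history, (1.1, 1.0))
```

Swapping in a different distance or combination function changes the behavior without touching the search itself, which is why those choices are treated as separate design decisions.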
4. Genetic algorithm (GA)
A genetic algorithm is an optimization technique based on evolutionary theory that borrows the mechanisms of genetic recombination, mutation and natural selection. The main idea is survival of the fittest: a new population is formed from the fittest rules in the current population, together with the offspring of those rules. Typically, the fitness of a rule is measured by its classification accuracy on a set of training samples.
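The selection, recombination and mutation loop can be sketched as follows. This is a toy illustration, not a rule learner: individuals are bit strings and the fitness function is passed in by the caller (here simply the number of 1 bits, standing in for classification accuracy). All parameter names and defaults are assumptions.

```python
import random

def evolve(fitness, length=12, pop_size=20, generations=40, mutation_rate=0.05, seed=0):
    """Evolve bit strings toward higher fitness and return the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Natural selection: keep the fitter half of the population.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)  # recombination (single-point crossover)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        # New population: the fittest individuals plus their offspring.
        population = survivors + children
    return max(population, key=fitness)

best = evolve(fitness=sum)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.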
5. Aggregation detection
The process of grouping a collection of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. The degree of dissimilarity is calculated from the attribute values that describe the objects, and distance is a commonly used measure.
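One common way to group objects by distance is k-means, shown here as a minimal sketch. The article does not name a specific clustering algorithm, so k-means is an assumption made for illustration, as are the sample points and the naive choice of starting centroids.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each to its nearest centroid."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical two-dimensional objects.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
```

Here Euclidean distance plays the role of the dissimilarity measure; a different measure would give a different grouping.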
6. Connection analysis
Connection analysis, also known as link analysis, is founded on graph theory. The approach taken here is to look for an algorithm that gives a good, though not necessarily perfect, result rather than one that finds a perfect solution. Connection analysis rests on the view that if imperfect results are usable, the analysis is a good one. With connection analysis we can extract patterns from the behavior of some users, and the resulting concepts can then be applied to a wider group of users.
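As a small illustration of the graph view, the sketch below turns pairwise links between users into an adjacency structure and picks out the most connected user. The call records and function names are invented for this example, and counting direct links is only one of many possible link measures.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected graph (adjacency sets) from pairs of linked users."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def most_connected(graph):
    """Return the node with the largest number of direct links."""
    return max(graph, key=lambda node: len(graph[node]))

# Hypothetical call records: each pair is one call between two users.
calls = [("ann", "bob"), ("ann", "cid"), ("ann", "dee"), ("bob", "cid")]
graph = build_graph(calls)
hub = most_connected(graph)
```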
7. Decision tree
A decision tree is a way of presenting rules of the form "under these conditions, this value is obtained".
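A hand-built tree makes the "conditions lead to values" idea concrete. The sketch below only evaluates a tree that is already given; it does not learn one from data. The attributes, branch values and outcomes are hypothetical.

```python
# Each internal node tests one attribute; each leaf is the resulting value.
tree = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_guarantor",
            "branches": {"yes": "approve", "no": "reject"},
        },
    },
}

def decide(node, record):
    """Follow the branch matching the record at each node until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

outcome = decide(tree, {"income": "low", "has_guarantor": "no"})
```

Each path from the root to a leaf reads as one rule, for example: if income is low and there is no guarantor, then reject.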
8. Neural network
Structurally, a neural network can be divided into an input layer, an output layer and hidden layers. Each node of the input layer corresponds to one predictor variable. The nodes of the output layer correspond to the target variable, and there can be more than one. Between the input layer and the output layer lie the hidden layers (invisible to users of the neural network). The number of hidden layers and the number of nodes in each layer determine the complexity of the neural network.
Apart from the nodes in the input layer, each node of the neural network is connected to many nodes in the preceding layer (called the input nodes of this node), and each connection has a weight Wxy. The value of a node is obtained by summing the products of the values of all its input nodes and the corresponding connection weights, and passing that sum through a function. This function is called the activation function or squashing function.
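The node computation described above can be written out directly. This is a minimal sketch for a network with one hidden layer and a single output node; the sigmoid is assumed as the activation function, and the weights are arbitrary illustrative numbers rather than trained values.

```python
import math

def sigmoid(z):
    """A common activation (squashing) function that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_value(inputs, weights, activation=sigmoid):
    """Weighted sum of the input node values, passed through the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

def forward(inputs, hidden_weights, output_weights):
    """Compute the output of a network with one hidden layer and one output node."""
    hidden = [node_value(inputs, w) for w in hidden_weights]
    return node_value(hidden, output_weights)

single = node_value([1.0, 2.0], [0.5, -0.25])  # weighted sum is 0.0
prediction = forward([1.0, 2.0], [[0.5, -0.25], [0.1, 0.3]], [0.7, -0.4])
```

Training a network amounts to adjusting the weights; this sketch covers only how values flow from inputs to output.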
9. Rough set
Rough set theory is based on establishing equivalence classes within the given training data. All the data samples in an equivalence class are indiscernible, that is, they are identical with respect to the attributes that describe the data. In real-world data there are usually some classes that cannot be distinguished using the available attributes. Rough sets are used to define such classes approximately, or "roughly".
10. Fuzzy set
Fuzzy set theory brings fuzzy logic into data mining classification systems, allowing "fuzzy" thresholds or boundaries to be defined. Fuzzy logic uses truth values between 0.0 and 1.0 to express the degree to which a particular value belongs to a given class, rather than a crisp cutoff between classes or sets. Fuzzy logic makes it convenient to work at a high level of abstraction.
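The rough definition of a class is usually expressed as a lower approximation (objects that certainly belong) and an upper approximation (objects that possibly belong). The sketch below computes both from equivalence classes; the patient records and the target class are invented for illustration.

```python
from collections import defaultdict

def equivalence_classes(records, attributes):
    """Group object names whose values agree on every listed attribute."""
    classes = defaultdict(set)
    for name, rec in records.items():
        classes[tuple(rec[a] for a in attributes)].add(name)
    return list(classes.values())

def approximations(records, attributes, target):
    """Return the lower and upper approximations of the target set."""
    lower, upper = set(), set()
    for eq in equivalence_classes(records, attributes):
        if eq <= target:   # the whole class lies inside the target
            lower |= eq
        if eq & target:    # the class overlaps the target
            upper |= eq
    return lower, upper

# Hypothetical data: p1 and p2 are indiscernible on the available attributes.
patients = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
flu = {"p1", "p3"}
lower, upper = approximations(patients, ["fever", "cough"], flu)
```

Because p1 and p2 cannot be told apart but only p1 is in the target class, neither can be placed in the lower approximation, while both appear in the upper one.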
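A membership function shows how a value can belong to a class to a degree. The sketch below uses a simple linear ramp for the class "tall"; the class, the function name and the 160 cm and 190 cm bounds are assumptions chosen only for illustration.

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Degree (0.0 to 1.0) to which a height belongs to the fuzzy set 'tall'."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    # Between the bounds, membership rises linearly instead of jumping at a cutoff.
    return (height_cm - low) / (high - low)

degree = tall_membership(175.0)
```

A crisp classifier would have to call 175 cm either tall or not tall; the fuzzy version records that it is tall to a degree of one half.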
11. Regression analysis
Regression analysis is divided into linear regression, multiple regression and nonlinear regression. In linear regression the data is modeled by a straight line. Multiple regression is an extension of linear regression that involves more than one predictor variable. Nonlinear regression adds polynomial terms to the basic linear model to form a nonlinear model.
12. Difference analysis
The purpose of difference analysis is to find abnormal data, such as noise or fraudulent records, in order to obtain useful information.
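For the simplest case, a straight line with one predictor variable, the least-squares fit has a closed form. The sketch below is illustrative: the data points are made up and lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept
```

Multiple and polynomial regression follow the same least-squares principle with more terms in the model.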
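One simple way to flag abnormal values is to measure how many standard deviations each value lies from the mean. The article does not prescribe a method, so this z-score rule, the threshold of 2.0 and the sample amounts are all assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than threshold standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 20, 23, 500]
suspicious = find_outliers(amounts)
```

On small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.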
13. Concept description
Concept description describes the meaning of a class of objects and summarizes its relevant characteristics. It is divided into characteristic description and discriminant description: the former describes the features that objects of a class have in common, and the latter describes the differences between classes of objects. Generating a characteristic description of a class involves only what all objects in that class have in common.
Thank you for reading. I hope this article on data mining techniques in computer networks has been helpful.
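In its simplest form, a characteristic description keeps only the attribute values shared by every object of a class. The sketch below assumes objects are plain dictionaries and that the class is non-empty; the customer records are hypothetical.

```python
def characteristic_description(objects):
    """Return the attribute values common to all objects of one (non-empty) class."""
    first, *rest = objects
    return {
        key: value
        for key, value in first.items()
        if all(other.get(key) == value for other in rest)
    }

# Hypothetical objects belonging to the same class.
students = [
    {"segment": "student", "pays_by": "card", "city": "Lyon"},
    {"segment": "student", "pays_by": "card", "city": "Paris"},
]
common = characteristic_description(students)
```

A discriminant description would instead compare such summaries across two or more classes.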
© 2024 shulou.com SLNews company. All rights reserved.