2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
How much do readers interested in big data development know about data mining? In this article, the editor shares some commonly used data mining techniques with those who enjoy big data development, in the hope that they will be helpful.
1. Statistical techniques
Data mining draws on many scientific fields and techniques, statistics among them. The core idea of the statistical approach is to assume a distribution or probability model (such as a normal distribution) for a given data set and then mine the data with methods suited to that model.
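As a rough illustration of the model-based idea, the sketch below assumes the data follow a normal distribution, estimates its parameters, and flags values that sit far from the mean. The function names, the threshold, and the sample data are all hypothetical and chosen only for this example.

```python
import statistics

def fit_normal(values):
    """Estimate the mean and sample standard deviation of an assumed normal model."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mu, sigma

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = fit_normal(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical daily order counts with one unusual day.
orders = [102, 98, 101, 97, 103, 99, 100, 180, 101, 99]
print(find_outliers(orders, threshold=2.0))
```

Note that an extreme value inflates the estimated standard deviation, so on a sample this small the threshold has to be fairly loose; a more robust model may be preferable in practice.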
2. Association rules
Association is an important kind of knowledge that can be discovered in a database. If the values of two or more variables show some regularity, the variables are said to be associated. Associations can be divided into simple, temporal, and causal associations. The purpose of association analysis is to uncover hidden relationships in the database. Sometimes the function relating the data is unknown, and even when it is known it is uncertain, so the rules produced by association analysis carry a confidence measure.
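The confidence attached to a rule can be made concrete with the two usual measures, support and confidence. The following is a minimal sketch over hypothetical shopping baskets; the helper names and the data are assumptions for illustration, not part of any particular library.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support(baskets, {"diapers", "beer"}))
print(confidence(baskets, {"diapers"}, {"beer"}))
```

Here the rule "diapers implies beer" holds in three of the four baskets that contain diapers, so it is plausible but not certain, which is exactly what the confidence value expresses.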
3. Memory-based reasoning (MBR)
MBR first looks for similar situations from past experience and then applies what is known about those situations to the current case. This is the essence of memory-based reasoning. MBR first finds neighbors that are similar to the new record and then uses those neighbors to classify the new record or estimate its value. Using MBR raises three main questions: choosing suitable historical data; determining the most effective way to represent the historical data; and determining the distance function, the combination function, and the number of neighbors.
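The three choices named above (distance function, combination function, number of neighbors) map directly onto the parameters of a nearest-neighbor classifier. The sketch below assumes Euclidean distance and majority voting; the records, a pair of age and income values with a loan outcome, are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance function: straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def majority_vote(labels):
    """Combination function: the most common label among the neighbors."""
    return Counter(labels).most_common(1)[0][0]

def mbr_classify(history, new_record, k=3, distance=euclidean, combine=majority_vote):
    """Classify `new_record` from the k most similar historical records."""
    neighbors = sorted(history, key=lambda rec: distance(rec[0], new_record))[:k]
    return combine([label for _, label in neighbors])

# Hypothetical history: (age, income in thousands) -> loan outcome.
history = [
    ((25, 30), "declined"),
    ((30, 42), "declined"),
    ((35, 60), "approved"),
    ((45, 80), "approved"),
    ((50, 95), "approved"),
    ((23, 28), "declined"),
]
print(mbr_classify(history, (40, 70)))
```

Swapping in a different `distance` or `combine` function changes the behavior without touching the rest of the code. In practice the features would usually be scaled first so that no single attribute dominates the distance.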
4. Genetic algorithms (GA)
Genetic algorithms are optimization techniques based on evolutionary theory, designed around mechanisms such as genetic recombination, mutation, and natural selection. The main idea is survival of the fittest: a new population is formed from the fittest rules in the current population together with the offspring of those rules. Typically, the fitness of a rule is assessed by its classification accuracy on a set of training samples.
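A toy version of this loop is sketched below. It assumes each "rule" is a single threshold (values above it are classified as positive), uses classification accuracy on the training samples as fitness, keeps the fitter half of the population, and fills the rest with offspring produced by averaging two parents and adding a small mutation. The representation, parameters, and data are illustrative assumptions, not a prescribed design.

```python
import random

def fitness(threshold, samples):
    """Classification accuracy of the rule 'value > threshold means positive'."""
    correct = sum(1 for value, label in samples if (value > threshold) == label)
    return correct / len(samples)

def evolve(samples, pop_size=20, generations=30, seed=0):
    """Evolve a threshold rule; returns the fittest threshold found."""
    rng = random.Random(seed)
    population = [rng.uniform(0, 100) for _ in range(pop_size)]
    for _ in range(generations):
        # Natural selection: keep the fitter half of the current population.
        population.sort(key=lambda t: fitness(t, samples), reverse=True)
        survivors = population[: pop_size // 2]
        # Offspring: crossover (average of two parents) plus mutation (small noise).
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append((a + b) / 2 + rng.gauss(0, 2.0))
        population = survivors + children
    return max(population, key=lambda t: fitness(t, samples))

# Hypothetical training samples: (value, is_positive).
samples = [(12, False), (25, False), (31, False), (44, False),
           (58, True), (63, True), (77, True), (90, True)]
best = evolve(samples)
print(best, fitness(best, samples))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.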
5. Cluster detection
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster produced by clustering is a collection of data objects that are similar to one another within the same cluster and dissimilar to objects in other clusters. The degree of dissimilarity is computed from the attribute values that describe the objects, and distance is a commonly used measure.
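One common way to cluster by distance is k-means, shown here as a minimal sketch. The article does not name a specific algorithm, so k-means, the fixed iteration count, and the choice of starting centroids are assumptions made for illustration.

```python
import math

def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move each centroid to its cluster mean."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Euclidean distance serves as the dissimilarity measure.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
clusters, centroids = k_means(points, centroids=[points[0], points[3]])
print(clusters)
print(centroids)
```

`math.dist` requires Python 3.8 or later. The result depends on the initial centroids, so real uses typically try several starting points.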
6. Link analysis
The underlying theory of link analysis is graph theory. The idea borrowed from graph theory is to look for an algorithm that yields a good, though not perfect, result rather than one that finds a perfect solution. Link analysis takes the view that if an imperfect result is workable, the analysis is a good one. With link analysis we can extract patterns from the behavior of some users and then apply the resulting concepts to a wider group of users.
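To make the graph view concrete, the sketch below builds an undirected graph from hypothetical user-to-user contact records, finds the most connected user, and lists everyone reachable from a given user. These two measures are only simple examples of what link analysis might compute; the article does not prescribe them.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Build an undirected adjacency map from (user, user) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def most_connected(graph):
    """Return the node with the largest number of direct links."""
    return max(graph, key=lambda node: len(graph[node]))

def reachable(graph, start):
    """Breadth-first search: every node connected to `start`, directly or indirectly."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical contact records between users.
links = [("ann", "bob"), ("ann", "cara"), ("ann", "dev"),
         ("bob", "cara"), ("eve", "finn")]
graph = build_graph(links)
print(most_connected(graph))
print(sorted(reachable(graph, "ann")))
```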
7. Decision trees
A decision tree provides a way to present rules of the form "under these conditions, this value is obtained."
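The link between a tree and its rules can be shown with a small hand-built tree. In this sketch the tree is a nested dictionary, `predict` follows one path for a record, and `rules` lists every path as a condition and an outcome. The features, thresholds, and labels are hypothetical, and the tree is written by hand rather than learned from data.

```python
# Hypothetical hand-built tree: internal nodes test a feature, leaves hold a label.
tree = {
    "feature": "income",
    "threshold": 50,
    "low": {"label": "decline"},
    "high": {
        "feature": "debt",
        "threshold": 20,
        "low": {"label": "approve"},
        "high": {"label": "review"},
    },
}

def predict(node, record):
    """Follow the branches that match `record` until a leaf is reached."""
    while "label" not in node:
        branch = "low" if record[node["feature"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]

def rules(node, conditions=()):
    """Yield (condition, label) pairs, one for each path from root to leaf."""
    if "label" in node:
        yield (" and ".join(conditions) or "always", node["label"])
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from rules(node["low"], conditions + (f"{feature} <= {threshold}",))
    yield from rules(node["high"], conditions + (f"{feature} > {threshold}",))

print(predict(tree, {"income": 70, "debt": 10}))
for condition, label in rules(tree):
    print(f"if {condition} then {label}")
```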