2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article surveys the main concepts of data mining and modeling in big data: the four classical families of algorithms, how they divide into supervised and unsupervised learning, and a well-known marketing case that illustrates association analysis.
What is learned on paper always feels shallow; to truly understand something, you have to practice it.
Data mining builds on statistical principles and uses machine learning algorithms to discover valuable information in data. Machine learning is one approach to achieving artificial intelligence, and deep learning is one technique for implementing machine learning.
Four classical algorithm families: classification, association, clustering, and regression
First, supervised learning (put simply, the class of each training sample is known in advance; that is, every sample comes with a label).
1. Classification analysis: find the common characteristics of a group of objects and assign them to different categories according to a classification model. Classification methods are either linear or nonlinear.
Typical linear classification algorithms include logistic regression and linear discriminant analysis. Classical nonlinear classification algorithms include k-nearest neighbors (KNN), support vector machines (SVM), decision trees, and naive Bayes (NB).
2. Regression analysis: captures how attribute values in transaction data behave over time and the correlations used to predict them. Regression differs from classification in that classification predicts a discrete target variable, while regression predicts a continuous one. Typical regression models include linear regression, support vector regression, and k-nearest neighbors regression.
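To make the discrete-versus-continuous distinction concrete, here is a minimal sketch of k-nearest neighbors used both ways, in plain Python. The toy data sets and the function names (`k_nearest`, `knn_classify`, `knn_regress`) are assumptions made up for illustration, not part of any library.

```python
from collections import Counter
from math import dist


def k_nearest(train, query, k):
    """Return the targets of the k training points closest to query."""
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the neighbours' discrete labels."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: mean of the neighbours' continuous targets."""
    neighbours = k_nearest(train, query, k)
    return sum(neighbours) / len(neighbours)


# Hypothetical toy data, each entry is (features, target).
labelled = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
            ((6.0, 6.5), "high"), ((7.0, 6.0), "high"), ((6.5, 7.0), "high")]
priced = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0),
          ((4.0,), 40.0), ((5.0,), 50.0)]

label = knn_classify(labelled, (6.2, 6.1))  # a discrete class
price = knn_regress(priced, (2.9,))         # a continuous value
print(label, price)
```

The neighbour search is identical in both cases; only the last step differs, a vote for classification and an average for regression.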
Second, unsupervised learning (no labeled training samples are provided in advance; the data is modeled directly, so the algorithm has to find structure entirely on its own).
1. Association analysis: discovers rules that describe relationships among items in a database. Association rule mining uses four indicators: confidence, support, expected confidence, and lift. Typical algorithms: Apriori, FP-Tree (FP-Growth), and PrefixSpan.
2. Cluster analysis: the labels of the training samples are unknown, and learning reveals the inherent properties and regularities of the data. Typical algorithms: K-means and DBSCAN (density-based spatial clustering of applications with noise).
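As an illustration of clustering without labels, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. Seeding the centroids with the first k points is an assumption made here to keep the example deterministic; practical implementations usually use random or k-means++ initialization. The data points and the function name `k_means` are hypothetical.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
          (0.8, 1.2), (8.2, 7.8), (7.8, 8.2)]
centers, labels = k_means(points, 2)
print(centers, labels)
```

No class labels are supplied anywhere; the grouping emerges from the distances alone, which is what separates this from the supervised methods above.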
A classic Wal-Mart marketing case: beer and diapers
What the data showed: in the 1990s, managers at Wal-Mart supermarkets in the United States analyzing sales data noticed a puzzling pattern: under certain circumstances, two seemingly unrelated items, beer and diapers, often appeared in the same shopping basket.
Analysis of the cause: in American families with babies, the mother usually stayed at home to look after the baby while the young father went to the supermarket to buy diapers. While buying diapers, the father often picked up beer for himself, which is why the two items so often ended up in the same basket.
Applying the model: Wal-Mart tried placing beer and diapers in the same area of the store so that young fathers could find both items at once and finish shopping quickly. These customers then bought two items per trip instead of one, which increased sales revenue.
Theoretical support: in 1993, the American scholar Agrawal proposed analyzing the sets of items in shopping baskets to find association rules among products, and using those rules to understand customers' purchasing behavior. Approaching the problem from mathematics and computer algorithms, Agrawal put forward a method for computing product associations: the Apriori algorithm.
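The beer-and-diapers finding can be reproduced on a small scale. The sketch below is a simplified, level-wise Apriori search in plain Python, followed by the four association indicators (support, confidence, expected confidence, lift) for the rule "diapers => beer". The eight baskets are invented for illustration and are not Wal-Mart data; the function names `apriori` and `rule_metrics` are likewise assumptions.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s: support(s) for s in singles if support(s) >= min_support}
    frequent = {}
    size = 1
    while current:
        frequent.update(current)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step (Apriori property): every smaller subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, size - 1))}
        current = {c: support(c) for c in candidates if support(c) >= min_support}
    return frequent


def rule_metrics(frequent, antecedent, consequent):
    """Indicators for the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    support = frequent[a | b]             # P(A and B)
    confidence = support / frequent[a]    # P(B | A)
    expected_confidence = frequent[b]     # P(B)
    lift = confidence / expected_confidence
    return {"support": support, "confidence": confidence,
            "expected_confidence": expected_confidence, "lift": lift}


# Hypothetical shopping baskets.
baskets = [frozenset(b) for b in (
    {"diapers", "beer", "milk"}, {"diapers", "beer"}, {"diapers", "beer", "bread"},
    {"milk", "bread"}, {"beer", "chips"}, {"diapers", "milk"},
    {"bread", "chips"}, {"diapers", "beer", "chips"})]

frequent = apriori(baskets, min_support=0.375)
metrics = rule_metrics(frequent, {"diapers"}, {"beer"})
print(metrics)
```

On these made-up baskets the rule has support 0.5 and confidence 0.8 against an expected confidence of 0.625, so the lift is about 1.28. A lift above 1 means that diaper buyers pick up beer more often than shoppers in general, which is the kind of signal the store layout change relies on.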