2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article presents an example-driven analysis of feature engineering algorithms in Python machine learning. It is organized to be easy to follow, and I hope it helps resolve your doubts. Let us study "example analysis of feature engineering algorithms in Python machine learning" together.
I. Overview of machine learning
Machine learning automatically analyzes data to derive rules (models), then uses those rules to predict unknown data.
II. Composition of the data set
1. Data set storage
Historical data for machine learning is usually stored in CSV files.
Reasons for not using MySQL:
1. Reading is slow when the data set is large.
2. The storage format does not match what machine learning workflows expect.
2. Available datasets
Kaggle: a big-data competition platform with 800,000 data scientists; real data in very large volumes.
Kaggle Web site: https://www.kaggle.com/datasets
UCI: 360 data sets covering science, life, economics, and other fields; hundreds of thousands of records.
UCI dataset Web site: http://archive.ics.uci.edu/ml/
Scikit-learn: small data sets that are convenient for learning.
Scikit-learn Web site: http://scikit-learn.org/stable/datasets/index.html#datasets
3. The structure of commonly used data sets
Feature values (the conditions used to determine the target, such as the floor area of a house) + target value (the quantity to be predicted, such as the house price).
Some data sets have no target values.
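To make this feature/target structure concrete, here is a minimal sketch using scikit-learn's bundled iris data set (chosen for illustration; the article does not name a specific data set):

```python
# The iris data set ships with scikit-learn, so no download is needed.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # feature values: 150 samples x 4 features
print(iris.target.shape)   # target values: one class label per sample
print(iris.feature_names)  # the "conditions", e.g. petal length
```

Each row of `iris.data` holds the feature values for one sample, and the matching entry of `iris.target` is its target value.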
III. Feature Engineering
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive model; it can improve prediction accuracy on unknown data. With poor features, even a good algorithm is unlikely to produce satisfactory results.
Pandas can be used for data reading and basic data processing.
Sklearn provides more powerful interfaces for processing features.
Feature extraction:
Feature extraction API: sklearn.feature_extraction
1. Dictionary data feature extraction
API: sklearn.feature_extraction.DictVectorizer
Usage: dictionary feature extraction converts each categorical entry in a dictionary into its own feature column (one-hot encoding), while numeric entries pass through unchanged. Therefore, if the input is an array that contains categorical features, convert it to dictionary form first, then extract.
2. Text feature extraction
Count method (bag of words)
Class: sklearn.feature_extraction.text.CountVectorizer
Usage:
1. Collect all the words across all the articles, listing each word only once.
2. For each article, count how many times each word in the vocabulary appears.
3. Single letters are not counted.
Note: this method does not support Chinese by default. Each Chinese character is treated like a single English letter, words are only split on spaces or commas, and, just as with single letters, a lone Chinese character is not counted. Chinese text can be segmented first with jieba (pip install jieba; usage: jieba.cut("I am a programmer")).
3. Text feature extraction: tf-idf
The count method above cannot down-weight common but uninformative words such as "tomorrow", "noon", "because". The tf-idf method addresses this.
tf (term frequency): how often a word appears in a document (the same count as in the method above).
idf (inverse document frequency): log(total number of documents / number of documents containing the word).
Importance of a word in a document: tf * idf.
Class: sklearn.feature_extraction.text.TfidfVectorizer
4. Feature preprocessing: normalization
Feature preprocessing: converting data into the form an algorithm requires, using specific statistical methods.
Feature preprocessing API: sklearn.preprocessing
Normalization API: sklearn.preprocessing.MinMaxScaler
Normalize when multiple features are equally important but their value ranges differ greatly. However, because the scale is defined by each column's minimum and maximum, normalization is easily distorted by outliers; the method is less robust and suits traditional, accurate, small-data scenarios.
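A minimal sketch of MinMaxScaler on made-up numbers, applying x' = (x - min) / (max - min) to each column:

```python
from sklearn.preprocessing import MinMaxScaler

# Three samples, three features with very different ranges.
data = [[90, 2, 10],
        [60, 4, 15],
        [75, 3, 13]]

scaler = MinMaxScaler(feature_range=(0, 1))
result = scaler.fit_transform(data)
print(result)  # each column's minimum maps to 0 and its maximum to 1
```

After scaling, every feature lies in [0, 1], so no single feature dominates purely because of its units.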
5. Feature preprocessing: standardization
Transform the original data so that each feature has a mean of 0 and a standard deviation of 1.
Standardization API: sklearn.preprocessing.StandardScaler
Standardization suits modern, noisy, big-data scenarios; with enough samples it is relatively stable, since a few outliers barely shift the mean and standard deviation.
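A minimal sketch of StandardScaler on illustrative data, applying x' = (x - mean) / std per column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = [[1., -1., 3.],
        [2., 4., 2.],
        [4., 6., -1.]]

scaler = StandardScaler()
result = scaler.fit_transform(data)

print(result.mean(axis=0))  # each column's mean is now ~0
print(result.std(axis=0))   # each column's standard deviation is now ~1
```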
6. Feature preprocessing: missing value processing
Imputation: fill each missing value with the mean or median of its row or column (filling by column is the usual choice, since a column holds a single feature).
API:sklearn.impute.SimpleImputer
Missing-value marker in the data: defaults to np.nan.
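A minimal sketch of SimpleImputer on made-up data, replacing np.nan entries with the column mean:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One missing value in column 0.
data = [[1, 2],
        [np.nan, 3],
        [7, 6]]

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
result = imputer.fit_transform(data)
print(result)  # the nan in column 0 becomes (1 + 7) / 2 = 4
```

Setting strategy="median" instead would fill with the column median, which is more robust to outliers.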
That is all of "example analysis of feature engineering algorithms in Python machine learning". Thank you for reading! I hope the content shared here helps you; if you want to learn more, you are welcome to follow the industry information channel!