What Is the KNN Algorithm, with an Example Analysis of News Classification

2025-01-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article explains what the KNN algorithm is and walks through an example that uses it to classify news articles. Many readers may not be familiar with the topic, so the key points are summarized below.

1. What is the KNN algorithm

KNN stands for K Nearest Neighbors. It is one of the simplest classification algorithms. KNN classifies a sample by measuring the distance between feature values. The idea is that if most of the k most similar samples in the feature space (that is, its nearest neighbors) belong to a certain category, then the sample also belongs to that category; k is usually an integer less than 20. The selected neighbors are all samples whose categories are already known. When classifying, the method decides the category of the unknown sample based only on the categories of its nearest sample or samples.
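To make "measuring the distance between feature values" concrete, here is a minimal sketch of two common distance measures. The article does not say which metric it has in mind, so the function names `euclidean` and `manhattan` and the sample vectors are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Illustrative vectors, not data from the article
print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The smaller the distance, the more similar two samples are considered to be, which is what "nearest" means in the rest of this section.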

Algorithm flow

For each unknown point:

Calculate the distance from the unknown point to every point whose category is known

Sort the known points by distance (ascending order)

Select the first k points, which are the ones closest to the unknown point

Count how many times each category occurs among those k points

Assign the unknown point the category that occurs most often among those k points
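The steps above can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the code the article uses later: the function name `knn_classify`, the Euclidean metric (`math.dist`, Python 3.8+), and the toy points and labels are all invented for the example.

```python
import math
from collections import Counter


def knn_classify(unknown, points, labels, k=3):
    """Classify `unknown` by majority vote among its k nearest labelled points."""
    # Step 1: distance from the unknown point to every known point
    distances = [(math.dist(unknown, p), label) for p, label in zip(points, labels)]
    # Step 2: sort by distance, ascending
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the k closest points
    nearest = distances[:k]
    # Step 4: count how often each category occurs among them
    votes = Counter(label for _, label in nearest)
    # Step 5: the most frequent category wins
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

print(knn_classify((1.1, 1.0), points, labels, k=3))  # sports
print(knn_classify((5.1, 5.0), points, labels, k=3))  # tech
```

If two categories tie, `Counter.most_common` returns the one it encountered first, which here is the one with the closer neighbor. An odd k avoids ties when there are only two categories.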

Advantages and disadvantages

Advantages:

Simple, effective and easy to understand

Disadvantages:

KNN must keep the entire training set, so it consumes a lot of memory and places high demands on hardware when the data set is large.

The distance from each unknown point to every known point must be calculated, which can be time-consuming.

The classification result is hard to interpret, because KNN produces no explicit model or rules that explain why a sample was assigned to a category.
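A back-of-the-envelope calculation makes the memory and time costs concrete. The sizes below are hypothetical and the helper names are invented for this sketch. It assumes a dense matrix of 8-byte values and brute-force search; note that `CountVectorizer` returns a sparse matrix, so the real memory use for text data is usually far lower than this dense upper bound.

```python
def dense_memory_bytes(n_samples, n_features, bytes_per_value=8):
    """Memory needed to hold the training set as a dense matrix."""
    return n_samples * n_features * bytes_per_value


def distance_computations(n_train, n_queries):
    """Brute-force KNN compares every query against every training point."""
    return n_train * n_queries


# Hypothetical sizes, chosen only for illustration
n_train, n_features, n_queries = 100_000, 50_000, 1_000

print(dense_memory_bytes(n_train, n_features) / 1e9, "GB")  # 40.0 GB
print(distance_computations(n_train, n_queries))            # 100000000
```

Both numbers grow linearly with the size of the training set, which is why KNN becomes expensive on large data.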

2. Example

Classify news articles by their text, for example as science and technology news or sports news.

The training data is a table with an "article" column (the news text) and a "classification" column (the news topic):

# -*- coding: utf-8 -*-
import jieba
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# 1. Fix garbled Chinese text in plots
# Specify the default font
matplotlib.rcParams["font.sans-serif"] = ["SimHei"]
matplotlib.rcParams["font.family"] = "sans-serif"
# Fix the minus sign '-' being displayed as a square
matplotlib.rcParams["axes.unicode_minus"] = False

# 2. Import data
# The column names used below ("article", "classification") must match the CSV headers
raw_train = pd.read_csv("./train_sample_utf8.csv", encoding="utf-8")
raw_test = pd.read_csv("./test_sample_utf8.csv", encoding="utf-8")

# 3. View data
print(raw_train.head(5))
print(raw_train.shape)
print(raw_test.shape)

# 4. Plot the topic distribution of each data set (commented out)
# plt.figure(figsize=(15, 8))
# plt.subplot(1, 2, 1)  # plt.subplot(rows, columns, index): one row, two columns, first plot
# raw_train["classification"].value_counts().sort_index().plot(kind="barh", title="training set news topic distribution")
# plt.subplot(1, 2, 2)
# raw_test["classification"].value_counts().sort_index().plot(kind="barh", title="test set news topic distribution")

# 5. Define the word segmentation function
def news_cut(text):
    # Join the segmented words with spaces so CountVectorizer can split them again
    return " ".join(jieba.cut(text))

# Quick test of the word segmentation
# test_content = "one day in early June, Chinese tourists from Shenzhen picked up their cameras to shoot the novel and exciting Hollywood Universal Studios theme park scene."
# print(news_cut(test_content))

# 6. Segment the news content in the training set and the test set
raw_train["word segmentation article"] = raw_train["article"].map(news_cut)
raw_test["word segmentation article"] = raw_test["article"].map(news_cut)
# View data
print(raw_train.head(5))

# 7. Load the stop words
with open("./stopwords.txt", encoding="utf-8") as file:
    stop_words = [line.strip() for line in file]

# 8. Use CountVectorizer to count word frequencies and convert them to vectors
vectorizer = CountVectorizer(stop_words=stop_words)
X_train = vectorizer.fit_transform(raw_train["word segmentation article"])
X_test = vectorizer.transform(raw_test["word segmentation article"])

# 9. Use the KNN algorithm to predict
# The original argument list is garbled ("distance"); weights="distance" is assumed here,
# which weights closer neighbors more heavily and leaves n_neighbors at its default of 5
knn = KNeighborsClassifier(weights="distance")
knn.fit(X_train, raw_train["classification"])
Y_test = knn.predict(X_test)

# 10. Compare the real test labels with the predictions and draw a heatmap
ax = sns.heatmap(confusion_matrix(raw_test["classification"].values, Y_test),
                 linewidths=.5, cmap="Greens", annot=True, fmt="d",
                 xticklabels=knn.classes_, yticklabels=knn.classes_)
ax.set_ylabel("real")
ax.set_xlabel("predicted")
ax.xaxis.set_label_position("top")
ax.xaxis.tick_top()
ax.set_title("confusion matrix heatmap")
plt.show()
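The script hands two steps to scikit-learn that are easy to treat as black boxes: turning text into word-count vectors and tabulating a confusion matrix. The sketch below reproduces both ideas in plain Python on toy English sentences. It is a simplification, not the library's implementation: the function names, sentences, and labels are invented, and the real `CountVectorizer` also lowercases text and by default ignores single-character tokens.

```python
from collections import Counter


def build_vocabulary(docs, stop_words):
    """Map each distinct non-stop word to a column index (sorted alphabetically)."""
    words = sorted({w for doc in docs for w in doc.split() if w not in stop_words})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(doc, vocab):
    """Count how often each vocabulary word occurs in a space-separated document."""
    counts = Counter(w for w in doc.split() if w in vocab)
    return [counts[word] for word in vocab]


def confusion_counts(y_true, y_pred, classes):
    """Rows are real categories, columns are predicted categories."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for real, predicted in zip(y_true, y_pred):
        matrix[index[real]][index[predicted]] += 1
    return matrix


# Toy documents (hypothetical), already "segmented" by spaces
docs = ["the team won the match", "the new phone chip"]
vocab = build_vocabulary(docs, stop_words={"the"})
print(list(vocab))  # ['chip', 'match', 'new', 'phone', 'team', 'won']
print(to_count_vector("the team won the team match", vocab))  # [0, 1, 0, 0, 2, 1]

print(confusion_counts(["sports", "tech", "tech"],
                       ["sports", "tech", "sports"],
                       ["sports", "tech"]))  # [[1, 0], [1, 1]]
```

In the confusion matrix, the diagonal holds the correctly classified samples, so a good classifier produces a heatmap whose darkest cells lie on the diagonal.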

That covers what the KNN algorithm is and how it can be applied to news classification.
