How Smartbi Solves the Spam Message Problem with Big Data Mining

Updated 2025-04-02 | Source: SLTechnology News&Howtos (shulou) | Category: Internet Technology

Shulou (Shulou.com), 06/01

Many newcomers are unsure how Smartbi tackles the problem of spam messages in a big data setting. This article walks through the approach in detail for readers who need it.

As of December 2020, China had 986 million mobile Internet users. In the mobile Internet era, personal information and user data have become important business resources. Some enterprises and individuals exploit them for economic gain, which has led to a flood of spam messages that users find hard to escape. Protecting people's right to an undisturbed private life has become an urgent social problem.

Spam messages

Spam messages are short messages sent to users without their consent that users do not want to receive, or that users have no way to refuse. They fall into two main categories: (1) commercial, advertising, and similar messages sent without the user's consent; (2) other messages that violate the industry's self-regulatory norms.

The proliferation of spam messages seriously affects people's daily lives, the reputation of operators, and even social stability. For example, a pseudo-base station can send messages to 100,000 mobile phones within a three-kilometer radius. Users can now use Mobile Manager to intercept such text messages.

Users urgently need a fast and effective way to identify spam messages. Accurate identification improves users' communication environment, gives the relevant authorities a sound basis for action, and protects the interests of operators. To address the problem, Smartbi models it in its data mining platform, Smartbi Mining, using a random forest text classification algorithm to build a message recognition model that identifies spam messages for operators and mobile phone users.

The workflow on the Smartbi Mining data mining platform has four steps:

1. Data acquisition: obtain the required data set.

2. Data preprocessing: Chinese word segmentation, stop word filtering, and so on.

3. Model construction and evaluation: build a random forest model and evaluate its classification performance using accuracy, recall, and F1 score.

4. Result analysis: summarize the results and make recommendations.

1 Data acquisition

An operator has accumulated a large volume of spam SMS data. The processed data is shown in Figure 3-2. This case uses 295,755 text messages; the fields are described in Table 3-1.

Table 3-1 Field description

Figure 3-2 Dataset

To make the fields easier to interpret, a metadata editing node is added to assign each field an alias, as shown in Figure 3-3.

Figure 3-3 Metadata editing

2 Data preprocessing

2.1 Word segmentation

Chinese word segmentation splits a passage of text into the smallest units that carry meaning. In other words, it takes the word as the basic unit, segments the Chinese text automatically, and turns the text into a form a machine can process. English words are separated by spaces, whereas Chinese is written as a continuous sequence of characters with no explicit boundary between words, so word segmentation is the foundation of Chinese text processing. The accuracy of segmentation has a significant impact on all subsequent text mining. In feature selection, for example, different segmentations change how important each word appears to be in the text, and therefore which features are selected.

Here, a word segmentation node segments the text column. _c2_seg holds the segmented result as a string, and _c2_seg_words holds it as a WrappedArray. The output is shown in Figure 3-4.
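To make the idea of segmentation and stop word filtering concrete, here is a minimal sketch in Python. It uses forward maximum matching, a classic dictionary-based technique, and is not the algorithm behind Smartbi's word segmentation node, which the article does not specify. The dictionary, stop word list, and sample message are all hypothetical.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Segment text greedily, always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


def remove_stop_words(words, stop_words):
    """Drop words that carry little meaning for classification."""
    return [w for w in words if w not in stop_words]


# Hypothetical toy dictionary and stop word list (assumptions for illustration).
VOCABULARY = {"恭喜", "您", "获得", "免费", "手机", "的"}
STOP_WORDS = {"的", "您"}

message = "恭喜您获得免费的手机"
tokens = forward_max_match(message, VOCABULARY)
filtered = remove_stop_words(tokens, STOP_WORDS)
print(tokens)
print(filtered)
```

The segmented list plays the role of _c2_seg_words in the workflow above; joining it with spaces would give a string comparable to _c2_seg.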

2.3 TF-IDF

Text cannot be used for modeling directly. It first has to be represented in a numeric form that the computer can process. The TF-IDF algorithm performs this conversion. TF stands for term frequency and IDF for inverse document frequency; together they measure how important a word is to one document within a collection or corpus. A word's importance rises with the number of times it appears in the document, but falls as the word becomes more common across the corpus. The higher the TF-IDF value, the more important the word.

We connect a TF-IDF node to perform the feature extraction and transformation; the output is shown in Figure 3-6.
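The weighting described above can be sketched in a few lines of Python. This uses the textbook unsmoothed form, tf = count / document length and idf = log(N / document frequency); the TF-IDF node in Smartbi Mining may well use a smoothed or hashed variant, so treat the exact numbers as illustrative. The three tokenized messages are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical tokenized messages (assumptions for illustration).
docs = [
    ["free", "prize", "free"],
    ["meeting", "tomorrow"],
    ["free", "meeting"],
]
weights = tf_idf(docs)
for doc_weights in weights:
    print(doc_weights)
```

In the first message, "prize" ends up with a higher weight than "free" even though "free" occurs twice, because "free" also appears in another message and is therefore less distinctive.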

3 Model construction

This case uses a random forest model. The feature column is _c2_seg_words_filtered_idf and the target label is target. The overall training and prediction workflow is shown in Figure 3-8.

Figure 3-8 Building the model
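For readers unfamiliar with random forests, the sketch below shows the two ideas that define them: each tree is trained on a bootstrap sample of the rows, and each tree sees only a random subset of the features; the forest then predicts by majority vote. It is deliberately simplified and is not Smartbi's implementation. The trees are one-level stumps rather than full decision trees, the features are binary word-presence flags rather than TF-IDF values, and the data and function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the binary feature whose presence best separates the labels."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        # Count labels on each side of the split "feature f absent / present".
        sides = {0: Counter(), 1: Counter()}
        for row, label in zip(rows, labels):
            sides[row[f]][label] += 1
        # Each side votes for its majority label (or the overall majority if empty).
        votes = {
            side: (c.most_common(1)[0][0] if c else fallback)
            for side, c in sides.items()
        }
        errors = sum(votes[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, votes)
    _, feature, votes = best
    return feature, votes


def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature subset for this tree.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump(sample_rows, sample_labels, feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(tree_votes[row[feature]] for feature, tree_votes in forest)
    return votes.most_common(1)[0][0]


# Hypothetical word-presence features: [free, prize, meeting].
rows = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],  # spam
    [0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1],  # ham
]
labels = ["spam"] * 4 + ["ham"] * 4

forest = train_forest(rows, labels)
print(predict(forest, [1, 1, 0]))
print(predict(forest, [0, 0, 1]))
```

Because every tree sees different rows and features, individual trees can be wrong (here, a stump that happens to split on the noisy "prize" feature), while the vote across many trees stays stable.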

4 Model evaluation

An evaluation node is connected to the model, as shown in Figure 3-8; the evaluation results are shown in Figure 3-10.

The F1 score is 0.91, which indicates that the model performs well.
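As a reminder of what these indicators measure, the following sketch computes accuracy, precision, recall, and F1 from true and predicted labels. The labels are invented for illustration and are unrelated to the 0.91 result reported above; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical labels: 1 = spam, 0 = normal message.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

results = classification_metrics(y_true, y_pred)
print("accuracy=%.2f precision=%.2f recall=%.2f f1=%.2f" % results)
```

F1 is the harmonic mean of precision and recall, so it is high only when the model both catches most spam and rarely flags normal messages.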

The model identifies and filters spam messages effectively, addressing the problem for both operators and users.

This Smartbi data mining case uses SMS data to identify spam messages accurately, and the mining results give operators a workable solution for spam message filtering.
