
How can Python automatically identify from text whether an individual is suicidal?

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many people without experience do not know how to use Python to automatically identify from text whether an individual has suicidal tendencies. This article summarizes the problem and one way to solve it, and we hope it helps you do so.

To simplify the problem, we classify each short text into one of two categories: normal Weibo or suicidal Weibo. With the Weibo "tree hole" data we crawled last time, the training and test sets are easy to obtain. Because this is a binary classification problem on short texts, we can use scikit-learn's SVM classifier.

Note, however, that our classifier cannot guarantee 100% correct results. After all, a person's mental state is hard to identify accurately from text alone; the text only lets us roughly judge depression so that someone can intervene. This is a case where we would rather flag a hundred posts by mistake than miss a single one, because every post we miss may be a life quietly lost.

1. Data preparation

The dataset is divided into two parts, a training set and a test set, and each part is further split into normal Weibo short texts and short texts showing suicidal tendencies.

After manually screening the data crawled from the Weibo tree hole in the previous article, we selected 300 posts as the suicidal part of the training set (rather few; in industrial practice at least 3,000 are needed). We then used the previous Weibo crawler to randomly collect 10,000 normal posts for the training set. In addition, we collected 50 suicidal posts and 50 ordinary posts as the test set.

Each file stores one Weibo post per line. In the training set, normal posts are stored in normal.txt and suicidal posts in die.txt. The test set files carry the suffix _test.txt.
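A minimal sketch of loading these files, assuming UTF-8 encoding and one post per line. The helper names (clean_lines, load_posts, combine, load_dataset) and the label convention 0 = normal, 1 = suicidal are our own choices, not from the original:

```python
def clean_lines(lines) -> list[str]:
    """Strip each line and drop blank ones (one Weibo post per line)."""
    return [line.strip() for line in lines if line.strip()]


def load_posts(path) -> list[str]:
    """Read a txt file of posts, assuming UTF-8 encoding."""
    with open(path, encoding="utf-8") as f:
        return clean_lines(f)


def combine(normal, die):
    """Concatenate both classes and build parallel labels: 0 = normal, 1 = suicidal."""
    texts = list(normal) + list(die)
    labels = [0] * len(normal) + [1] * len(die)
    return texts, labels


def load_dataset(normal_path, die_path):
    return combine(load_posts(normal_path), load_posts(die_path))
```

Usage would look like `train_texts, train_labels = load_dataset("normal.txt", "die.txt")`; the test-set file names (for example `normal_test.txt` and `die_test.txt`) are assumptions based on the `_test.txt` suffix.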

Next, we will use a machine learning toolkit called scikit-learn (sklearn), which packages many machine learning models and preprocessing methods and makes it convenient to build classifiers. Install it by typing the following command in CMD/Terminal:

pip install -U scikit-learn

If you have not installed Python yet, install it first, then run the command above to install sklearn.

2. Data preprocessing

We use a typical Chinese natural language preprocessing pipeline: segment the text into words with jieba, then vectorize it.

Because words such as "die", "I don't want to live" and "I'm gone" appear frequently in suicidal Weibo posts, we can use TF-IDF to turn the strings into numeric vectors. If you are not familiar with TF-IDF, see the article "tf-idf algorithm for text processing and its practice":

https://suool.net/2019/01/26/tf-tdf-exercise/
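To make the idea concrete, here is a from-scratch sketch of the textbook weighting tf(t, d) × log(N / df(t)) on already-segmented posts. It is an illustration only: scikit-learn's TfidfVectorizer uses a smoothed variant by default (idf = ln((1 + N) / (1 + df)) + 1, followed by L2 normalization), so its numbers will differ:

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: term frequency within a document times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    posts = [["我", "不想", "活", "了"], ["今天", "天气", "不错"], ["我", "今天", "很", "开心"]]
    for w in tf_idf(posts):
        print(w)
```

A word like 不想 that occurs in only one post gets a higher weight than 我, which occurs in two; a word present in every post gets weight 0.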


3. Training

Using scikit-learn's SVM classification model, we can quickly train a classifier.

We skip the theory behind SVM here; for the principles, see "Support Vector Machine (SVM) - Principles":

https://zhuanlan.zhihu.com/p/31886934

4. Testing

When testing, we compute the model's precision and recall for each of the two categories. scikit-learn provides a very useful function, classification_report, that calculates them.
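A small sketch of calling `classification_report`, assuming labels 0 = normal and 1 = suicidal (the `report` wrapper is our own name):

```python
from sklearn.metrics import classification_report


def report(y_true, y_pred) -> str:
    """Per-class precision, recall and F1 as a formatted text table."""
    return classification_report(
        y_true,
        y_pred,
        labels=[0, 1],
        target_names=["normal", "suicidal"],
        digits=2,
    )
```

With a trained pipeline this would be used as `print(report(test_labels, model.predict(test_texts)))`.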

Results:

For suicidal posts, precision is 100%, but recall is insufficient: the model found only 60% of the 50 test posts, i.e. 30 suicidal posts.

For normal posts, precision is 71%, meaning that some posts predicted as normal were actually suicidal (the 20 suicidal posts the model missed). Recall is 100%, meaning that every normal post was correctly classified as normal.
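The two sets of figures are consistent. With 50 test posts per class, 30 detected suicidal posts and no normal post flagged imply 20 suicidal posts predicted as normal. A quick check of that arithmetic (the `precision_recall` helper is our own):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Suicidal class: 30 found, none falsely flagged, 20 missed.
suicidal = precision_recall(tp=30, fp=0, fn=20)  # → (1.0, 0.6)
# Normal class: all 50 found, plus the 20 missed suicidal posts predicted as normal.
normal = precision_recall(tp=50, fp=20, fn=0)    # precision 50/70 ≈ 0.714, recall 1.0
```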

This is mainly because the training set is too small: we have only 300 suicidal posts, which is far from enough. If we could increase this to 3,000, we believe the results would improve considerably, especially the recall for suicidal posts. We estimate that the model's precision and recall could eventually reach at least 95%.

