
Case Analysis of Python Data Processing


In this article I'll share a worked example of data processing in Python. The content is detailed and the logic is clear; I hope you get something useful out of it. Let's take a look.

1. Preface

We have a very large data set stored as JSON: nearly 100,000 records that now need to be cleaned and processed.
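For orientation, a single record looks roughly like the following. This is a hypothetical example in the style of the chinese-poetry dataset, which the ./json/poet.song.*.json file names used later suggest; the field names are assumptions, not taken from the original article:

{
    "author": "……",
    "title": "……",
    "paragraphs": ["第一句……，第二句……。"]
}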

2. Python modules

import json
import jieba

We use the json module to parse the JSON files and the jieba library to segment the text into words, which is all we need for this task.
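As a quick illustration of what jieba's segmentation produces (a hypothetical demo, not from the original article; the exact output depends on the jieba version and its dictionary):

import jieba

print(jieba.lcut("床前明月光，疑是地上霜。"))
# e.g. ['床前', '明月光', '，', '疑是', '地上', '霜', '。']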

2.1 Adding the stop word list

Because the text we want to analyze contains punctuation marks, we load a stop word file and store its entries in stopwords.

# Load the stop word list, one entry per line (the original used a Chinese filename).
stopwords = [line.strip() for line in open("stopwords.txt", encoding="utf-8").readlines()]

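The screenshot of the file is omitted here; the stop word file is just plain text with one entry per line, roughly like this (illustrative contents only):

，
。
、
！
？
的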

The file name is built as a + str(b) + c; for example, a + str(b) + c evaluates to ./json/poet.song.0.json when b is 0, and b is incremented to generate each file name dynamically.

with open(a + str(b) + c, encoding="utf-8") as fp:

There are nearly 500 JSON files, each holding thousands of records, so I am still optimizing the code; at the moment a full extraction pass, storing the required data to a file, takes about five minutes.
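A minimal sketch of the outer loop, assuming the files are named ./json/poet.song.0.json, ./json/poet.song.1000.json, and so on in steps of 1000 (the step size and file count are assumptions; adjust them to the actual naming):

import json

a = "./json/poet.song."
c = ".json"

for b in range(0, 500 * 1000, 1000):  # roughly 500 files; b increments per file
    with open(a + str(b) + c, encoding="utf-8") as fp:
        json_data = json.load(fp)     # parse this file's JSON into Python objects
    # ... process json_data as described in section 2.2 ...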

2.2 Sequential reading

Define an empty string to accumulate the text once each JSON object has been converted to a Python object, and define an empty list to store the verses.

Loop over json_data, taking each element i.

Append each element's verses to the list_paragraphs list.

Then loop over every sentence j within it.

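The original post showed this code only as a screenshot, so here is a minimal reconstruction; the "paragraphs" field is an assumption based on the chinese-poetry file layout:

str_s = ""            # empty string that will accumulate all verse text
list_paragraphs = []  # empty list to store the verses

for i in json_data:                          # each element i of json_data
    list_paragraphs.append(i["paragraphs"])  # append the new verses to the list
    for j in i["paragraphs"]:                # every sentence j in the record
        str_s += j                           # concatenate for later segmentation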

Use the jieba library to segment the contents of str_s into words (nouns, verbs, and so on). It happens that the highest-ranked results are two-character words; no limit is placed on word length.

words = jieba.lcut(str_s)  # cut the accumulated text into a list of words

Now traverse words, the list of segmented words.

Exclude special symbols and count the remaining words:

counts = {}  # word -> frequency
for word in words:
    if word not in stopwords:
        if len(word) == 1:  # skip single characters
            continue
        else:
            counts[word] = counts.get(word, 0) + 1  # add one to this word's frequency

2.3 Sorting with a lambda function

Use a lambda function as the sort key, then traverse the sorted result and output the 50 most frequent words.

items = list(counts.items())                  # (word, count) pairs
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending

Then unpack each pair into word and count:

for i in range(50):
    word, count = items[i]
    print("{0:<10}{1:>7}".format(word, count))

3. Running

3.1 Saving the file

f = open('towa.txt', "a", encoding='gb18030')
f.writelines("title:" + textxxx)  # textxxx: the title text built earlier in the script
f.writelines(word_ping)           # word_ping: the word-frequency lines built earlier
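The same save step reads a little more safely with a context manager, so the file is always closed and flushed (a sketch only; textxxx and word_ping are whatever the full script defined earlier):

with open('towa.txt', "a", encoding='gb18030') as f:
    f.writelines("title:" + textxxx)  # write the title line
    f.writelines(word_ping)           # write the word-frequency results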

That's all for this case analysis of Python data processing. Thank you for reading; I hope you found it useful.
