2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the main Python Chinese word segmentation tools and shows how to use each one through short, practical examples.
1. Jieba
"Jieba" (literally "stutter") is the most popular Chinese word segmentation tool on GitHub. It bills itself as the best Python Chinese word segmentation component, supporting multiple segmentation modes as well as custom dictionaries.
GitHub stars: 26k
Code example
import jieba

strs = ["I came to Tsinghua University in Beijing", "The table tennis auction is over", "University of Science and Technology of China"]
for s in strs:
    seg_list = jieba.cut(s, use_paddle=True)  # use paddle mode
    print("Paddle Mode: " + "/".join(list(seg_list)))

seg_list = jieba.cut("I came to Tsinghua University in Beijing", cut_all=True)
print("Full Mode: " + "/".join(seg_list))  # full mode

seg_list = jieba.cut("I came to Tsinghua University in Beijing", cut_all=False)
print("Precise Mode: " + "/".join(seg_list))  # precise mode

seg_list = jieba.cut("He came to NetEase Hangyan Mansion")  # the default is precise mode
print("New word recognition: " + ", ".join(seg_list))

seg_list = jieba.cut_for_search("Xiao Ming graduated with a master's degree from the Institute of Computing, Chinese Academy of Sciences, and later studied at Kyoto University in Japan")  # search engine mode
print("Search Engine Mode: " + ", ".join(seg_list))
Output:
[Full Mode]: I / came / Beijing / Tsinghua / Tsinghua University / Huada / University
[Precise Mode]: I / came / Beijing / Tsinghua University
[New word recognition]: He, came, NetEase, Hangyan, Mansion (here "Hangyan" is not in the dictionary, but it is still recognized by the Viterbi algorithm)
[Search Engine Mode]: Xiao Ming, Master, graduated, from, China, Science, College, Academy of Sciences, Chinese Academy of Sciences, Computing, Institute of Computing, later, in Japan, Kyoto, University, Kyoto University, Japan, further study

2. Pkuseg
Pkuseg is an open-source word segmentation tool from the Language Computing and Machine Learning Group at Peking University. Its distinguishing feature is multi-domain segmentation: it currently provides pre-trained models for news, web, medicine, tourism, and mixed-domain text, and users are free to choose the model that matches their data. Compared with general-purpose segmentation tools, its accuracy within those domains is higher.
GitHub stars: 5.4k
Code example
import pkuseg

seg = pkuseg.pkuseg()  # load the model with the default configuration
text = seg.cut('python is a great language')  # perform word segmentation
print(text)
Output:
['python', 'is', 'one', 'door', 'very', 'great', 'language']

3. FoolNLTK
FoolNLTK is trained on a BiLSTM model and is said to be possibly the most accurate open-source Chinese word segmenter; it also supports user-defined dictionaries.
GitHub stars: 1.6k
Code example
import fool

text = "A fool is in Beijing"
print(fool.cut(text))
# ['one', 'fool', 'in', 'Beijing']

4. THULAC
THULAC is a Chinese lexical analysis toolkit developed by the Natural Language Processing and Social Humanities Computing Lab at Tsinghua University. It includes part-of-speech tagging, so it can tell whether a word is a noun, a verb, an adjective, and so on.
GitHub stars: 1.5k
Code sample 1
import thulac

thu1 = thulac.thulac()  # default mode
text = thu1.cut("I love Tiananmen Square in Beijing", text=True)  # segment a single sentence
print(text)  # I_r love_v Beijing_ns Tiananmen_ns
Code sample 2
thu1 = thulac.thulac(seg_only=True)  # segmentation only, no part-of-speech tagging
thu1.cut_f("input.txt", "output.txt")  # segment the contents of input.txt and write the result to output.txt
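Since THULAC's text=True output is a plain string of word_tag tokens, it is easy to post-process into (word, tag) pairs. A minimal pure-Python sketch (the tagged sample string is copied from code sample 1 above, and the tag-name map covers only the tags that appear there):

```python
# Split THULAC's "word_tag word_tag ..." text output into (word, tag) pairs.
TAG_NAMES = {"r": "pronoun", "v": "verb", "ns": "place name"}  # only the tags seen above

def parse_tagged(line):
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("_")  # split on the last underscore
        pairs.append((word, tag))
    return pairs

tagged = "I_r love_v Beijing_ns Tiananmen_ns"  # output of the default-mode example
for word, tag in parse_tagged(tagged):
    print(word, "->", TAG_NAMES.get(tag, tag))
```

Splitting on the last underscore (rpartition) rather than the first keeps words that themselves contain underscores intact.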
That concludes this overview of Python word segmentation tools and how to use them. Thank you for reading.
© 2024 shulou.com SLNews company. All rights reserved.