2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Adding a user-defined dictionary to pyhanlp: a worked example
pyhanlp is the Python wrapper for HanLP. Project address: https://github.com/hankcs/pyhanlp
In testing, HanLP did better than NLTK at Chinese word segmentation and entity recognition.
How do you add a custom dictionary to pyhanlp? The steps below use Python 2.7.9 as an example:
1. Install pyhanlp: pip install pyhanlp
2. Add a custom dictionary under the dictionary path. The main CustomDictionary text file is data/dictionary/custom/CustomDictionary.txt. You can add your own words to it directly (not recommended), or create a new text file and append it through the configuration file, for example CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; my_dictionary.txt; (recommended).
The absolute paths on your machine can be obtained from hanlp --version:
# hanlp --version
jar  1.6.3: /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp-1.6.3.jar
data 1.6.2: /usr/local/lib/python2.7/site-packages/pyhanlp/static/data
config    : /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp.properties
# cat /usr/local/lib/python2.7/site-packages/pyhanlp/static/hanlp.properties | grep "CustomDictionaryPath"
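Appending a dictionary to CustomDictionaryPath can also be scripted. The following is only a minimal sketch: the function name append_custom_dictionary and the file name my_dictionary.txt are made up for illustration, and it assumes the properties file holds a single uncommented CustomDictionaryPath= line whose entries are separated by semicolons. It works on the text of the file, so you decide how to read and write hanlp.properties.

```python
def append_custom_dictionary(properties_text, new_entry):
    """Return properties_text with new_entry appended to CustomDictionaryPath.

    Hypothetical helper for illustration; it is not part of pyhanlp.
    """
    key = "CustomDictionaryPath="
    out = []
    for line in properties_text.splitlines():
        if line.startswith(key):
            # Keep the existing entries, dropping the empty piece after the last ';'.
            entries = [e for e in line[len(key):].split(";") if e.strip()]
            if new_entry.strip() not in [e.strip() for e in entries]:
                # A leading space means "same directory as the previous entry".
                entries.append(" " + new_entry.strip())
            line = key + ";".join(entries) + ";"
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling it twice with the same file name leaves the line unchanged the second time, so it is safe to rerun. Back up hanlp.properties before writing the result to it.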
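3. It is recommended to put your own dictionary file under this path, for example my_dictionary.txt, and append it to CustomDictionaryPath in the properties file above. Avoid spaces in the file name: in CustomDictionaryPath, text after a space in an entry is read as that dictionary's default part of speech.
# cat my_dictionary.txt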
Codis Cluster nz 1000
Jinri Toutiao nz 1000
Each line has three fields: the entry itself, its part of speech (n by default), and its word frequency.
4. Then delete the cache file so that the new dictionary file is loaded the next time pyhanlp starts:
# rm -f CustomDictionary.txt.bin
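The same cleanup can be done from Python, which saves changing into the dictionary directory first. This is a minimal sketch under stated assumptions: remove_custom_dictionary_cache is a made-up helper name, and data_dir is assumed to be the data path printed by hanlp --version, with the cache at dictionary/custom/CustomDictionary.txt.bin beneath it, as in the paths shown in this article.

```python
import os


def remove_custom_dictionary_cache(data_dir):
    """Delete CustomDictionary.txt.bin under data_dir.

    Hypothetical helper for illustration; returns True if a file was removed,
    False if there was no cache file to delete.
    """
    cache = os.path.join(data_dir, "dictionary", "custom", "CustomDictionary.txt.bin")
    if os.path.isfile(cache):
        os.remove(cache)
        return True
    return False
```

For the installation in this article, data_dir would be /usr/local/lib/python2.7/site-packages/pyhanlp/static/data.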
5. Test the new dictionary:
# python -c "from pyhanlp import *; print(HanLP.segment('codis Cluster, Jinri Toutiao'))"
May 16, 2018 4:43:14 PM com.hankcs.hanlp.corpus.io.IOUtil readBytes
WARNING: exception java.io.FileNotFoundException: /usr/local/lib/python2.7/site-packages/pyhanlp/static/data/dictionary/custom/CustomDictionary.txt.bin (No such file or directory) occurred while reading /usr/local/lib/python2.7/site-packages/pyhanlp/static/data/dictionary/custom/CustomDictionary.txt.bin
This message is harmless. It is only a warning that the cache file deleted in step 4 is missing, and the cache file is then regenerated.
Note:
HanLP part-of-speech list: the full list of part-of-speech tags is available on the HanLP project website.
-
Author: Mingyue San Qianli 68